Datawhale December Pandas Study Group Check-in -- Task01. Prerequisites 2: NumPy, Part 1

2021-02-21 網安雜談

Check-in No. 112

This part introduces the basics of NumPy. A mind-map version is also available on Mubu: https://share.mubu.com/doc/gcPiai7k5T

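According to the introduction on the NumPy website (Chinese: https://www.numpy.org.cn/; English: https://numpy.org/): NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and more.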

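At the core of the NumPy package is the ndarray object. It encapsulates n-dimensional arrays of a single data type, and to keep performance high, many of its operations run in compiled code.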

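There are several important differences between NumPy arrays and Python's built-in sequences (such as lists):

NumPy arrays have a fixed size at creation, unlike Python lists, which can grow dynamically. Changing the size of an ndarray creates a new array and deletes the original.

The elements of a NumPy array must all have the same data type, and therefore the same size in memory. The exception: an array can hold Python objects (including NumPy objects), which allows elements of different sizes.

NumPy arrays facilitate advanced mathematical and other operations on large amounts of data. Such operations typically run more efficiently and need less code than the equivalent written with Python's built-in sequences.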
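
The differences above can be illustrated with a minimal sketch. The variable names are made up for this example, and the object array at the end is just one way to show the "different sized elements" exception.

```python
import numpy as np

# A Python list grows in place.
py_list = [1, 2, 3]
py_list.append(4)

# An ndarray has a fixed size: np.append returns a NEW array
# and leaves the original untouched.
arr = np.array([1, 2, 3])
bigger = np.append(arr, 4)

# Elementwise arithmetic needs no explicit loop.
doubled = arr * 2

# The exception: an object array can hold items of different types and sizes.
mixed = np.array([1, "two", 3.5], dtype=object)

print(py_list)      # [1, 2, 3, 4]
print(arr.shape)    # (3,)
print(bigger)       # [1 2 3 4]
print(doubled)      # [2 4 6]
print(mixed.dtype)  # object
```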

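The most important object in NumPy is the N-dimensional array, ndarray: a collection of items of the same type, indexed starting from 0.

An ndarray is a multidimensional array that stores elements of a single type. Every element occupies a block of memory of the same size. Internally, an ndarray consists of:

A pointer to the data (a block of data in memory or in a memory-mapped file).

A data type, or dtype, describing the fixed-size cell that each value in the array occupies.

A tuple giving the shape of the array, i.e. its size along each dimension.

A tuple of strides, whose integers give the number of bytes to step over to reach the next element along each dimension.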

Figure: internal structure of an ndarray.

Strides can be negative, in which case the array is traversed backward through memory; this is what happens with slices such as obj[::-1] or obj[:,::-1].
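
As a small sketch of how strides behave, the example below fixes the dtype to 8-byte integers so the byte counts are the same on every platform. The array `a` is only an illustration.

```python
import numpy as np

# 3 x 4 array of 8-byte integers in C (row-major) order.
a = np.arange(12, dtype=np.int64).reshape(3, 4)

# Next row is 4 * 8 = 32 bytes away; next column is 8 bytes away.
print(a.strides)            # (32, 8)

# Reversing an axis only flips the sign of that axis's stride.
print(a[::-1].strides)      # (-32, 8)
print(a[:, ::-1].strides)   # (32, -8)

# The reversed slice is a view on the same memory, not a copy.
print(np.shares_memory(a, a[::-1]))  # True
```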

To create an ndarray, simply call NumPy's array function:

numpy.array(object, dtype = None, copy = True, order = 'K', subok = False, ndmin = 0)

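Parameters:

object: an array, or any (nested) sequence to convert into an array

dtype: data type of the array elements, optional

copy: whether the object is copied, optional (default True)

order: memory layout of the array: C for row-major, F for column-major, A for any; the default is K, which keeps the layout of the input where possible

subok: by default (False) the result is forced to be a base-class array; if True, subclasses are passed through

ndmin: minimum number of dimensions the resulting array should have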
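
The parameters above can be tried out with a short sketch. The input values are arbitrary and chosen only for illustration.

```python
import numpy as np

# dtype: force the element type.
a = np.array([1, 2, 3], dtype=float)

# ndmin: pad the shape with leading axes of length 1.
b = np.array([1, 2, 3], ndmin=2)

# order: request column-major (Fortran) memory layout.
c = np.array([[1, 2], [3, 4]], order="F")

# copy=True (the default): the result does not share data with the source.
src = np.array([1, 2, 3])
d = np.array(src)
d[0] = 99

print(a.dtype)                # float64
print(b.shape)                # (1, 3)
print(c.flags.f_contiguous)   # True
print(src[0])                 # 1
```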

2.2 NumPy array attributes

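The number of dimensions of a NumPy array is called its rank. The rank is the number of axes, i.e. the dimensionality of the array: a one-dimensional array has rank 1, a two-dimensional array has rank 2, and so on. In NumPy, each linear array is called an axis, which is the same thing as a dimension. axis=0 means operating along axis 0, which for a two-dimensional array yields one result per column; axis=1 means operating along axis 1, which yields one result per row.

Attribute: description

ndarray.ndim: the rank, i.e. the number of axes or dimensions

ndarray.shape: the dimensions of the array; for a matrix with n rows and m columns, (n, m)

ndarray.size: the total number of elements, equal to n*m for the shape above

ndarray.dtype: the element type of the ndarray

ndarray.itemsize: the size of each element, in bytes

ndarray.flags: memory layout information of the ndarray

ndarray.real: the real part of the elements

ndarray.imag: the imaginary part of the elements

ndarray.data: the buffer containing the actual array elements; since elements are normally accessed by indexing, this attribute is rarely needed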
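
A brief sketch of these attributes and of the axis convention follows. The arrays `m` and `z` are made up for the example, and the dtype is fixed to int32 so that itemsize is predictable.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]], dtype=np.int32)

print(m.ndim)      # 2
print(m.shape)     # (2, 3)
print(m.size)      # 6
print(m.dtype)     # int32
print(m.itemsize)  # 4

# axis=0 collapses the rows: one sum per column.
col_sums = m.sum(axis=0)   # [5 7 9]
# axis=1 collapses the columns: one sum per row.
row_sums = m.sum(axis=1)   # [ 6 15]

# real and imag split a complex array into its two parts.
z = np.array([1 + 2j, 3 - 4j])
print(z.real)  # [1. 3.]
print(z.imag)  # [ 2. -4.]
```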

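2.3 Creating arrays

There are five general ways to create arrays:

(1) Conversion from other Python structures (e.g. lists, tuples)

(2) Native NumPy array creation functions (e.g. arange, ones, zeros)

(3) Reading arrays from disk, in either standard or custom formats

(4) Creating arrays from raw bytes through strings or buffers

(5) Using special library functions (e.g. random)

The rest of this part mainly covers the native NumPy creation functions: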

2.3.1 General creation method

Signature: numpy.array(object, dtype = None, copy = True, order = 'K', subok = False, ndmin = 0)

Parameters: the same as described for numpy.array above.

Example:

np.array([1,2,3]) outputs array([1, 2, 3])

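2.3.2 Generating arrays from numeric ranges

numpy.arange

The arange function in the numpy package creates a range of values and returns it as an ndarray object.

Signature: numpy.arange(start, stop, step, dtype)

Parameters: start is the first value (default 0), stop is the end value (excluded), step is the spacing between values (default 1), and dtype is the data type of the returned ndarray (inferred from the inputs when omitted).

Example: np.arange(1,5,2) # start, stop (exclusive), step

Output: array([1, 3])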
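
A few more arange calls, as a sketch of how start, stop and step interact. The specific values are arbitrary.

```python
import numpy as np

# Only stop given: start defaults to 0, step to 1.
r1 = np.arange(5)            # [0 1 2 3 4]

# start, stop (excluded), step.
r2 = np.arange(1, 5, 2)      # [1 3]

# A float step gives a float array.
r3 = np.arange(0, 1, 0.25)   # [0.   0.25 0.5  0.75]

# A negative step counts down; stop is still excluded.
r4 = np.arange(10, 0, -3)    # [10  7  4  1]

print(r1, r2, r3, r4)
```

For non-integer steps, linspace is often the safer choice, because rounding can make the length of an arange result hard to predict.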

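Arithmetic sequences

The numpy.linspace function creates a one-dimensional array made up of an arithmetic sequence.

Signature: np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)

Parameters:

start: the first value of the sequence

stop: the last value of the sequence; included in the sequence when endpoint is True

num: the number of evenly spaced samples to generate, 50 by default

endpoint: when True, the sequence includes the stop value, otherwise it does not; True by default

retstep: when True, the function returns the pair (samples, step), where step is the spacing between samples; otherwise only the samples are returned

dtype: the data type of the ndarray

Example: np.linspace(1,5,11) # start, stop (inclusive), number of samples

Output: array([1. , 1.4, 1.8, 2.2, 2.6, 3. , 3.4, 3.8, 4.2, 4.6, 5. ])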
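
The effect of retstep and endpoint can be seen in a short sketch. The variable names are illustrative only.

```python
import numpy as np

# retstep=True returns the samples together with the spacing.
samples, step = np.linspace(1, 5, 11, retstep=True)
print(step)       # approximately 0.4

# endpoint=False leaves out the stop value, so the spacing becomes
# (stop - start) / num instead of (stop - start) / (num - 1).
no_end = np.linspace(0, 1, 5, endpoint=False)
print(no_end)     # approximately [0.  0.2 0.4 0.6 0.8]
```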

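Geometric sequences

The numpy.logspace function creates a geometric sequence.

Signature: np.logspace(start, stop, num=50, endpoint=True, base=10.0, dtype=None)

Parameters:

start: the sequence starts at base ** start

stop: the sequence ends at base ** stop; this value is included when endpoint is True

num: the number of samples to generate, 50 by default

endpoint: when True, the sequence includes the final value base ** stop, otherwise it does not; True by default

base: the base of the logarithm, i.e. the base that is raised to each exponent

dtype: the data type of the ndarray

Example: np.logspace(0,9,10,base=2)

Output: array([  1.,   2.,   4.,   8.,  16.,  32.,  64., 128., 256., 512.])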
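
One way to read logspace is as linspace applied to the exponents. The sketch below checks that reading, using illustrative variable names.

```python
import numpy as np

# Ten powers of 2, with exponents evenly spaced from 0 to 9.
powers = np.logspace(0, 9, 10, base=2)

# The same values built by hand: base ** (evenly spaced exponents).
by_hand = 2.0 ** np.linspace(0, 9, 10)

# With the default base of 10: 1, 10, 100, 1000.
decades = np.logspace(0, 3, 4)

print(powers)
print(np.allclose(powers, by_hand))  # True
print(decades)
```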

2.3.3 Generating special arrays

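numpy.random produces sequences of pseudo-random numbers that follow statistical distributions. According to the official documentation, it uses combinations of a BitGenerator, which creates the sequences, and a Generator, which uses those sequences to sample from different statistical distributions.

The most commonly used random functions are rand, randn, randint and choice. They produce, respectively, arrays of uniform random numbers on [0, 1), arrays of standard normal random numbers, arrays of random integers, and random samples drawn from a list. A random seed fixes the output: after calling np.random.seed(0), the sequence of random numbers generated is reproducible.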
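
The effect of seeding can be checked with a minimal sketch. The second half uses np.random.default_rng, the Generator-based interface available in NumPy 1.17 and later; it is not covered in this article and is shown only for comparison.

```python
import numpy as np

# Legacy interface: re-seeding the global state replays the same numbers.
np.random.seed(0)
first = np.random.rand(3)
np.random.seed(0)
second = np.random.rand(3)
print(np.array_equal(first, second))  # True

# Generator interface: each generator carries its own seeded state.
g1 = np.random.default_rng(0).random(3)
g2 = np.random.default_rng(0).random(3)
print(np.array_equal(g1, g2))         # True
```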

np.random.rand

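Purpose: generate an array of the given shape filled with random values uniformly distributed on [0, 1)

Signature: numpy.random.rand(d0, d1, ..., dn)

Returns: ndarray, shape (d0, d1, ..., dn), random values in [0, 1).

Example: to sample from the uniform distribution on the interval from a to b, use (b - a) * np.random.rand(3) + a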
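
The scaling formula can be verified with a short sketch. The bounds a = 5 and b = 15 are arbitrary values picked for illustration.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable

# rand takes the shape as separate arguments, not as a tuple.
grid = np.random.rand(2, 3)

# Stretch [0, 1) to [a, b): multiply by the width, then shift by a.
a, b = 5, 15
u = (b - a) * np.random.rand(1000) + a

print(grid.shape)                 # (2, 3)
print(u.min() >= a, u.max() < b)  # True True
```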

np.random.randint

Purpose: generate random integers with a given minimum, maximum (exclusive) and output shape. Return random integers from the "discrete uniform" distribution of the specified dtype in the "half-open" interval [low, high). If high is None (the default), then results are from [0, low).

Signature: numpy.random.randint(low, high=None, size=None, dtype=int)

Parameters:

low: int or array-like of ints. Lowest (signed) integers to be drawn from the distribution (unless high=None, in which case this parameter is one above the highest such integer).

high: int or array-like of ints, optional. If provided, one above the largest (signed) integer to be drawn from the distribution (see above for behavior if high=None). If array-like, must contain integer values.

size: int or tuple of ints, optional. Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

dtype: dtype, optional. Desired dtype of the result. Byteorder must be native. The default value is int.

Example:
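
A minimal sketch of the three common call patterns follows. The ranges and shapes are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable

# low and high: integers from [1, 7), i.e. the faces of a die.
dice = np.random.randint(1, 7, size=10)

# Only low given: integers from [0, 5), here in a 2 x 3 array.
below_five = np.random.randint(5, size=(2, 3))

# No size: a single integer from [0, 10).
single = np.random.randint(10)

print(dice)
print(below_five)
print(single)
```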

np.random.randn

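Purpose: generate an array drawn from the standard normal distribution. Return a sample (or samples) from the "standard normal" distribution.

Signature: numpy.random.randn(d0, d1, ..., dn)

Samples from a univariate normal distribution with mean μ and variance σ² are obtained with sigma * np.random.randn(...) + mu

When mu=0 and sigma=1, the normal distribution reduces to the standard normal distribution.

Example: np.random.randn(2, 2)

Example: a two-by-four array of samples from N(3, 6.25), i.e. mean 3 and standard deviation 2.5: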

3 + 2.5 * np.random.randn(2, 4)
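
To see that the shift-and-scale formula gives the intended distribution, a sketch can compare the sample mean and standard deviation of a large draw against mu and sigma. The sample size of 100,000 is an arbitrary choice.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable

mu, sigma = 3, 2.5   # N(3, 6.25): the variance is sigma ** 2

# The two-by-four array from the example above.
small = mu + sigma * np.random.randn(2, 4)

# A large sample, whose statistics should be close to mu and sigma.
large = mu + sigma * np.random.randn(100_000)

print(small.shape)   # (2, 4)
print(large.mean())  # close to 3
print(large.std())   # close to 2.5
```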

np.random.choice

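Purpose: choice draws results from a given list with a given probability and sampling mode. When no probabilities are specified the sampling is uniform, and the default mode is sampling with replacement.

Signature: numpy.random.choice(a, size=None, replace=True, p=None)

Parameters:

a: 1-D array-like or int. If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if a were np.arange(a).

size: int or tuple of ints, optional. Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

replace: boolean, optional. Whether the sample is with or without replacement.

p: 1-D array-like, optional. The probabilities associated with each entry in a. If not given the sample assumes a uniform distribution over all entries in a.

Note: the probabilities in p must sum to 1. replace is a boolean: True samples with replacement, so the same element can be drawn more than once; False samples without replacement.

Note: dtype='<U1' is how NumPy denotes Unicode string data of length 1 (one character); dtype='<U16' denotes a length of 16 characters.

With replace set to False, when the number of samples requested equals the length of the array (sampling without replacement), choice() is equivalent to the permutation() method, i.e. a shuffle: np.random.permutation(my_list)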
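
The sampling modes can be compared in a short sketch. The list contents and the probabilities are made up for illustration; my_list follows the name used in the text.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable

my_list = ["a", "b", "c", "d"]

# Default: uniform sampling with replacement, so repeats are possible.
with_repl = np.random.choice(my_list, size=6)

# Weighted sampling without replacement; p must sum to 1.
weighted = np.random.choice(my_list, size=2, replace=False,
                            p=[0.1, 0.7, 0.1, 0.1])

# Drawing every element without replacement is just a shuffle,
# which is what permutation does directly.
shuffled = np.random.choice(my_list, size=len(my_list), replace=False)
permuted = np.random.permutation(my_list)

print(with_repl)
print(weighted, weighted.dtype.kind)  # kind 'U' means Unicode string
print(sorted(shuffled) == sorted(my_list))  # True
print(sorted(permuted) == sorted(my_list))  # True
```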

References:

https://www.numpy.org.cn/user/quickstart.html

https://www.runoob.com/numpy/numpy-tutorial.html

https://numpy.org/doc/stable/reference/
