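Contents
36. Compute the correlation coefficient between two columns of a numpy.ndarray
37. Check whether a numpy.ndarray contains any null values
38. Replace missing values in a numpy.ndarray with a given value
39. Count the frequency of elements in a numpy.ndarray
40. Convert numeric elements of a numpy.ndarray to categories
41. Derive a new column from existing columns of a numpy.ndarray
42. Probabilistic sampling from a numpy.ndarray
43. Find the second-largest element of a numpy.ndarray within a group
44. Sort a numpy.ndarray by one column
45. Find the most frequent element in a numpy.ndarray
46. Find the position of the first element greater than a given value in a numpy.ndarray
47. Replace elements of a numpy.ndarray that satisfy a condition with given values
48. Get the positions and values of the top n elements of a numpy.ndarray
49. Compute row-wise counts of a numpy.ndarray
50. Combine several numpy.ndarrays into one
51. Compute the one-hot encodings of a numpy.ndarray
52. Create row numbers grouped by a categorical variable
53. Create group ids based on a given categorical variable
54. Rank the elements of a 1-D numpy.ndarray
55. Rank the elements of a multi-dimensional numpy.ndarray
56. Get the maximum element of each row of a numpy.ndarray
57. Get the ratio of minimum to maximum for each row of a numpy.ndarray
58. Determine whether each element of a numpy.ndarray is appearing for the first time
59. Compute the mean of each group of elements in a numpy.ndarray
60. Convert a PIL image to a numpy.ndarray
61. Drop all missing values from a numpy.ndarray
62. Compute the Euclidean distance between two numpy.ndarrays
63. Find the positions of local maxima in a numpy.ndarray
64. Subtraction with numpy.ndarrays
65. Find the position of the n-th repetition of an element in a numpy.ndarray
66. Convert numpy.ndarray data from datetime64 to datetime
67. Compute the moving average of a numpy.ndarray over a window
68. Build a numpy.ndarray from a start, length and step
69. Fill in a non-continuous time series numpy.ndarray
70. Build a numpy.ndarray of sliding windows with a given stride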
36. Compute the correlation coefficient between two columns of a numpy.ndarray
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
# Method 1
np.corrcoef(iris[:, 0], iris[:, 2])[0, 1]
# Method 2
from scipy.stats import pearsonr
corr, p_value = pearsonr(iris[:, 0], iris[:, 2])
print(corr)
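As a cross-check on what np.corrcoef returns, the sketch below computes Pearson's r by hand and compares the two. It uses synthetic data rather than the iris file, and the helper name pearson_manual is made up for this illustration.

```python
import numpy as np

# Synthetic stand-in for two numeric columns (assumed data, not the iris file)
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 0.8 * x + rng.normal(scale=0.5, size=100)

def pearson_manual(u, v):
    """Pearson r: dot product of the centered vectors over the product of their norms."""
    uc = u - u.mean()
    vc = v - v.mean()
    return float((uc @ vc) / np.sqrt((uc @ uc) * (vc @ vc)))

r_manual = pearson_manual(x, y)
r_numpy = float(np.corrcoef(x, y)[0, 1])
print(r_manual, r_numpy)
```

The two numbers should agree up to floating-point rounding.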
37. Check whether a numpy.ndarray contains any null values
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
np.isnan(iris_2d).any()
38. Replace missing values in a numpy.ndarray with a given value
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
# Set up to 20 randomly chosen entries to NaN so there is something to replace
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
iris_2d[np.isnan(iris_2d)] = 0  # Replace missing values with 0
iris_2d[:4]
39. Count the frequency of elements in a numpy.ndarray
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
species = np.array([row.tolist()[4] for row in iris])
# Get the unique values and the counts
np.unique(species, return_counts=True)
40. Convert numeric elements of a numpy.ndarray to categories
'''
Requirement:
Less than 3 --> 'small'
3 to less than 5 --> 'medium'
>=5 --> 'large'
'''
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
# Bin petallength
petal_length_bin = np.digitize(iris[:, 2].astype('float'), [0, 3, 5, 10])
# Map it to respective category
label_map = {1: 'small', 2: 'medium', 3: 'large', 4: np.nan}
petal_length_cat = [label_map[x] for x in petal_length_bin]
# View
petal_length_cat[:4]
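To make the bin boundaries concrete, here is a small sketch of how np.digitize assigns indices with its default right=False setting. The input values are made up for illustration and are not taken from the iris file.

```python
import numpy as np

# Hypothetical petal lengths, chosen to sit on and around the bin edges
values = np.array([1.4, 3.0, 4.9, 5.0, 6.7])
edges = [0, 3, 5, 10]

# With right=False (the default), index i means edges[i-1] <= x < edges[i]
bins = np.digitize(values, edges)

label_map = {1: 'small', 2: 'medium', 3: 'large'}
labels = [label_map[b] for b in bins]
print(bins.tolist())   # [1, 2, 2, 3, 3]
print(labels)          # ['small', 'medium', 'medium', 'large', 'large']
```

A value exactly on an edge (3.0 or 5.0 here) falls into the bin that starts at that edge.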
41. Derive a new column from existing columns of a numpy.ndarray
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')
# Compute the new column
sepallength = iris_2d[:, 0].astype('float')
petallength = iris_2d[:, 2].astype('float')
volume = (np.pi * petallength * (sepallength**2))/3
# Add an axis so it can be stacked as a column of iris_2d
volume = volume[:, np.newaxis]
# Append the new column
out = np.hstack([iris_2d, volume])
out[:4]
42. Probabilistic sampling from a numpy.ndarray
# Requirement: sample species so that Iris-setosa appears twice as often as versicolor and as virginica
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
# Get the species column
species = iris[:, 4]
# Method 1
np.random.seed(100)
a = np.array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])
species_out = np.random.choice(a, 150, p=[0.5, 0.25, 0.25])
# Method 2
np.random.seed(100)
probs = np.r_[np.linspace(0, 0.500, num=50), np.linspace(0.501, .750, num=50), np.linspace(.751, 1.0, num=50)]
index = np.searchsorted(probs, np.random.random(150))
species_out = species[index]
print(np.unique(species_out, return_counts=True))
43. Find the second-largest element of a numpy.ndarray within a group
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
# Get the petal length column for the setosa rows
# (the b'' literal only matches if the species column holds bytes)
petal_len_setosa = iris[iris[:, 4] == b'Iris-setosa', [2]].astype('float')
# Get the second-largest distinct value
np.unique(np.sort(petal_len_setosa))[-2]
44. Sort a numpy.ndarray by one column
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
print(iris[iris[:,0].argsort()][:20])  # Sort by the first column
45. Find the most frequent element in a numpy.ndarray
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
vals, counts = np.unique(iris[:, 2], return_counts=True)
print(vals[np.argmax(counts)])
46. Find the position of the first element greater than a given value in a numpy.ndarray
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
np.argwhere(iris[:, 3].astype(float) > 1.0)[0]
47. Replace elements of a numpy.ndarray that satisfy a condition with given values
# Requirement: replace values greater than 30 with 30 and values less than 10 with 10
np.set_printoptions(precision=2)
np.random.seed(100)
a = np.random.uniform(1,50, 20)
# Method 1
np.clip(a, a_min=10, a_max=30)
# Method 2
print(np.where(a < 10, 10, np.where(a > 30, 30, a)))
48. Get the positions and values of the top n elements of a numpy.ndarray
np.random.seed(100)
a = np.random.uniform(1,50, 20)
## Positions of the 5 largest elements
# Method 1
print(a.argsort()[-5:])
# Method 2
np.argpartition(-a, 5)[:5]
## Values of the 5 largest elements
# Method 1
a[a.argsort()][-5:]
# Method 2
np.sort(a)[-5:]
# Method 3
np.partition(a, kth=-5)[-5:]
# Method 4
a[np.argpartition(-a, 5)][:5]
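One detail worth illustrating: np.argpartition guarantees which elements land in the top-n slice but not their order inside it. The sketch below, on a small made-up array, takes the unordered top-n positions and then sorts only those n entries from largest to smallest.

```python
import numpy as np

a = np.array([5., 1., 9., 3., 7., 8., 2.])  # made-up example values
n = 3

# Positions of the n largest values, in no particular order
idx = np.argpartition(a, -n)[-n:]

# Order just those n positions from largest to smallest value
idx_sorted = idx[np.argsort(a[idx])[::-1]]

print(idx_sorted.tolist())     # [2, 5, 4]
print(a[idx_sorted].tolist())  # [9.0, 8.0, 7.0]
```

Sorting only the n selected entries avoids a full sort of the array, which may matter when the array is large and n is small.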
49. Compute row-wise counts of a numpy.ndarray
np.random.seed(100)
arr = np.random.randint(1,11,size=(6, 10))
print(arr)
def counts_of_all_values_rowwise(arr2d):
    # Unique values and their counts, row by row
    num_counts_array = [np.unique(row, return_counts=True) for row in arr2d]
    # Count of every value in each row (0 when the value is absent from the row)
    return [[int(b[a==i][0]) if i in a else 0 for i in np.unique(arr2d)] for a, b in num_counts_array]
print(np.arange(1,11))
counts_of_all_values_rowwise(arr)
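The same row-wise counts can also be obtained without Python loops by comparing every row against every distinct value through broadcasting. This is a sketch of that alternative on a freshly generated array (so the numbers differ from the ones printed above); it builds a temporary boolean array of shape (rows, values, columns), which is fine for small inputs.

```python
import numpy as np

rng = np.random.default_rng(100)
arr = rng.integers(1, 11, size=(6, 10))  # values 1..10, assumed example data

values = np.unique(arr)

# arr[:, None, :] has shape (6, 1, 10); values[None, :, None] has shape (1, n_values, 1).
# The comparison broadcasts to (6, n_values, 10); summing over the last axis counts matches.
counts = (arr[:, None, :] == values[None, :, None]).sum(axis=2)

print(values)
print(counts)
```

Each row of counts sums to the number of columns in arr.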
50. Combine several numpy.ndarrays into one
arr1 = np.arange(3)
arr2 = np.arange(3,7)
arr3 = np.arange(7,10)
# The arrays have different lengths, so dtype=object is needed to hold them in one array
array_of_arrays = np.array([arr1, arr2, arr3], dtype=object)
print('array_of_arrays: ', array_of_arrays)
# Method 1
arr_2d = np.array([a for arr in array_of_arrays for a in arr])
# Method 2
arr_2d = np.concatenate(array_of_arrays)
print(arr_2d)
51. Compute the one-hot encodings of a numpy.ndarray
np.random.seed(101)
arr = np.random.randint(1,4, size=6)
print(arr)
# Solution:
def one_hot_encodings(arr):
    uniqs = np.unique(arr)
    out = np.zeros((arr.shape[0], uniqs.shape[0]))
    for i, k in enumerate(arr):
        # Assumes the values are the consecutive integers 1..n
        out[i, k-1] = 1
    return out
one_hot_encodings(arr)
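The loop above indexes columns with k-1, which relies on the labels being 1, 2, ..., n. A possible variant that works for arbitrary labels is sketched below: np.unique with return_inverse=True maps each element to the index of its label, and rows of an identity matrix serve as the encodings. The input array is made up for illustration.

```python
import numpy as np

arr = np.array([10, 30, 10, 20])  # non-consecutive labels, assumed example data

# inverse[i] is the index of arr[i] within the sorted unique labels
uniqs, inverse = np.unique(arr, return_inverse=True)

# Row j of the identity matrix is the one-hot vector for label uniqs[j]
one_hot = np.eye(uniqs.size)[inverse]

print(uniqs)    # [10 20 30]
print(one_hot)
```
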
52. Create row numbers grouped by a categorical variable
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
np.random.seed(100)
species_small = np.sort(np.random.choice(species, size=20))
print(species_small)
print([i for val in np.unique(species_small) for i, grp in enumerate(species_small[species_small==val])])
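A loop-free way to get the same within-group row numbers is sketched below. It assumes the array is already sorted by group, as species_small is above, and uses a short hand-written array in place of the downloaded data.

```python
import numpy as np

# Assumed example data, already sorted by group
species_small = np.array(['Iris-setosa', 'Iris-setosa', 'Iris-versicolor',
                          'Iris-virginica', 'Iris-virginica'])

# first_idx[j] is where group j starts; inverse[i] is the group index of element i
_, first_idx, inverse = np.unique(species_small, return_index=True, return_inverse=True)

# Position in the array minus the start of the element's own group
row_numbers = np.arange(species_small.size) - first_idx[inverse]
print(row_numbers.tolist())  # [0, 1, 0, 0, 1]
```
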
53. Create group ids based on a given categorical variable
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
np.random.seed(100)
species_small = np.sort(np.random.choice(species, size=20))
print(species_small)
output = [np.argwhere(np.unique(species_small) == s).tolist()[0][0] for val in np.unique(species_small) for s in species_small[species_small==val]]
output
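For comparison, np.unique can return the group ids directly through return_inverse=True. The sketch below uses a short hand-written array instead of the downloaded species column.

```python
import numpy as np

# Assumed example data standing in for species_small
species_small = np.array(['Iris-setosa', 'Iris-setosa', 'Iris-versicolor',
                          'Iris-virginica', 'Iris-virginica'])

# group_ids[i] is the index of species_small[i] among the sorted unique labels
labels, group_ids = np.unique(species_small, return_inverse=True)
print(group_ids.tolist())  # [0, 0, 1, 2, 2]
```

Unlike the list comprehension above, this keeps the ids aligned with the input order even when the input is not sorted.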
54. Rank the elements of a 1-D numpy.ndarray
np.random.seed(10)
a = np.random.randint(20, size=10)
print('Array: ', a)
print(a.argsort().argsort())
55. Rank the elements of a multi-dimensional numpy.ndarray
np.random.seed(10)
a = np.random.randint(20, size=[2,5])
print(a)
print(a.ravel().argsort().argsort().reshape(a.shape))
56. Get the maximum element of each row of a numpy.ndarray
np.random.seed(100)
a = np.random.randint(1,10, [5,3])
print(a)
# Method 1
np.amax(a, axis=1)
# Method 2
np.apply_along_axis(np.max, arr=a, axis=1)
57. Get the ratio of minimum to maximum for each row of a numpy.ndarray
np.random.seed(100)
a = np.random.randint(1,10, [5,3])
print(a)
np.apply_along_axis(lambda x: np.min(x)/np.max(x), arr=a, axis=1)
58. Determine whether each element of a numpy.ndarray is appearing for the first time
np.random.seed(100)
a = np.random.randint(0, 5, 10)
# There is no direct function to do this as of 1.13.3
# Create an all True array
out = np.full(a.shape[0], True)
# Find the index positions of the first occurrence of each unique element
unique_positions = np.unique(a, return_index=True)[1]
# Mark those positions as False
out[unique_positions] = False
# out is False where an element appears for the first time and True where it is a repeat
print(out)
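Since the result above is True for repeats, it can help to see the mask in both orientations. The following sketch uses a fixed made-up array so the output is easy to verify by eye; the names is_first and is_duplicate are chosen for this illustration.

```python
import numpy as np

a = np.array([0, 0, 3, 0, 2, 4, 2, 2, 2, 2])  # made-up example values

# return_index=True gives the position of the first occurrence of each unique value
first_positions = np.unique(a, return_index=True)[1]

is_first = np.zeros(a.shape[0], dtype=bool)
is_first[first_positions] = True
is_duplicate = ~is_first

print(is_first.tolist())
# [True, False, True, False, True, True, False, False, False, False]
```
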
59. Compute the mean of each group of elements in a numpy.ndarray
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
# No direct way to implement this. Just a version of a workaround.
numeric_column = iris[:, 1].astype('float') # sepalwidth
grouping_column = iris[:, 4] # species
# List comprehension version
[[group_val, numeric_column[grouping_column==group_val].mean()] for group_val in np.unique(grouping_column)]
# For loop version
output = []
for group_val in np.unique(grouping_column):
    output.append([group_val, numeric_column[grouping_column==group_val].mean()])
output
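Another workaround, sketched here on made-up data, avoids the per-group boolean masks: np.unique maps each row to a group index, and np.bincount accumulates the per-group sums and counts in one pass each.

```python
import numpy as np

# Assumed example data: a grouping column and a numeric column
groups = np.array(['a', 'b', 'a', 'c', 'b', 'a'])
values = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 5.0])

uniq, inverse = np.unique(groups, return_inverse=True)

sums = np.bincount(inverse, weights=values)  # per-group sum of values
counts = np.bincount(inverse)                # per-group number of rows
means = sums / counts

print(dict(zip(uniq.tolist(), means.tolist())))  # {'a': 3.0, 'b': 4.0, 'c': 4.0}
```
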
60. Convert a PIL image to a numpy.ndarray
from io import BytesIO
from PIL import Image
import requests
# Import image from URL
URL = 'https://upload.wikimedia.org/wikipedia/commons/8/8b/Denali_Mt_McKinley.jpg'
response = requests.get(URL)
# Read it as Image
I = Image.open(BytesIO(response.content))
# Optionally resize
I = I.resize([150,150])
# Convert to numpy array
arr = np.asarray(I)
# Optionally convert it back to an image and show it
im = Image.fromarray(np.uint8(arr))
im.show()
61. Drop all missing values from a numpy.ndarray
a = np.array([1,2,3,np.nan,5,6,7,np.nan])
print(a)
a[~np.isnan(a)]
62. Compute the Euclidean distance between two numpy.ndarrays
a = np.array([1,2,3,4,5])
b = np.array([4,5,6,7,8])
# Solution
dist = np.linalg.norm(a-b)
dist
63. Find the positions of local maxima in a numpy.ndarray
a = np.array([1, 3, 7, 1, 2, 6, 0, 1])
doublediff = np.diff(np.sign(np.diff(a)))
peak_locations = np.where(doublediff == -2)[0] + 1
peak_locations
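The double-diff trick is easier to follow with the intermediate arrays written out. The sketch below repeats the computation on the same input one step at a time.

```python
import numpy as np

a = np.array([1, 3, 7, 1, 2, 6, 0, 1])

slopes = np.diff(a)            # [ 2  4 -6  1  4 -6  1]
directions = np.sign(slopes)   # [ 1  1 -1  1  1 -1  1]  (+1 rising, -1 falling)
turns = np.diff(directions)    # [ 0 -2  2  0 -2  2]

# A peak is a rise followed directly by a fall: +1 then -1, a change of -2.
# turns[i] compares the steps around a[i + 1], hence the +1 offset.
peaks = np.where(turns == -2)[0] + 1
print(peaks.tolist())  # [2, 5]
```

Note that a flat-topped peak (a rise, a plateau, then a fall) produces two changes of -1 rather than one of -2, so this test does not report it.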
64. Subtraction with numpy.ndarrays
# Requirement: subtract the 1-D array b_1d from the 2-D array a_2d so that each item of b_1d is subtracted from the corresponding row of a_2d.
a_2d = np.array([[3,3,3],[4,4,4],[5,5,5]])
b_1d = np.array([1,2,3])
print(a_2d - b_1d[:,None])
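The [:, None] index is what makes the subtraction row-wise. The sketch below contrasts it with plain a_2d - b_1d, which broadcasts b_1d along the columns instead.

```python
import numpy as np

a_2d = np.array([[3, 3, 3], [4, 4, 4], [5, 5, 5]])
b_1d = np.array([1, 2, 3])

# b_1d[:, None] has shape (3, 1): item i is subtracted from every entry of row i
row_wise = a_2d - b_1d[:, None]

# b_1d has shape (3,): item j is subtracted from every entry of column j
col_wise = a_2d - b_1d

print(row_wise.tolist())  # [[2, 2, 2], [2, 2, 2], [2, 2, 2]]
print(col_wise.tolist())  # [[2, 1, 0], [3, 2, 1], [4, 3, 2]]
```
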
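65. Find the position of the n-th repetition of an element in a numpy.ndarray
x = np.array([1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2])
print(x)
n = 5
# Method 1: list comprehension
[i for i, v in enumerate(x) if v == 1][n-1]  # Position of the 5th occurrence of the element 1
# Method 2
np.where(x == 1)[0][n-1]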
66. Convert numpy.ndarray data from datetime64 to datetime
dt64 = np.datetime64('2018-02-25 22:10:10')
# Method 1
from datetime import datetime
dt64.tolist()
# Method 2
dt64.astype(datetime)
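Both methods depend on the resolution of the datetime64 value. The sketch below shows the second-resolution case from above and a nanosecond-resolution value, where tolist() yields an integer count of nanoseconds because datetime.datetime only goes down to microseconds. Casting to microseconds first is one way around that.

```python
import numpy as np
from datetime import datetime

# Second resolution converts straight to datetime.datetime
dt64_s = np.datetime64('2018-02-25T22:10:10')
as_datetime = dt64_s.astype(datetime)

# Nanosecond resolution: tolist() returns an int (nanoseconds since the epoch)
dt64_ns = np.datetime64('2018-02-25T22:10:10', 'ns')
raw = dt64_ns.tolist()

# Casting to microseconds first gives a datetime.datetime again
converted = dt64_ns.astype('datetime64[us]').tolist()

print(type(as_datetime), type(raw), type(converted))
```
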
67. Compute the moving average of a numpy.ndarray over a window
def moving_average(a, n=3):
    # Running sum; subtracting the sum from n steps back leaves the sum of each window
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n
np.random.seed(100)
Z = np.random.randint(10, size=10)
print('array: ', Z)
# Method 1
moving_average(Z, n=3).round(2)
# Method 2
np.convolve(Z, np.ones(3)/3, mode='valid')
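68. Build a numpy.ndarray from a start, length and step
length = 10
start = 5
step = 3
def seq(start, length, step):
    # The end value follows from the start, the step and the number of elements
    end = start + (step*length)
    return np.arange(start, end, step)
seq(start, length, step)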
69. Fill in a non-continuous time series numpy.ndarray
dates = np.arange(np.datetime64('2018-02-01'), np.datetime64('2018-02-25'), 2)
print(dates)
# Method 1
filled_in = np.array([
    np.arange(date, (date + d)) for date, d in zip(dates, np.diff(dates))
]).reshape(-1)
output = np.hstack([filled_in, dates[-1]])
output
# Method 2
out = []
for date, d in zip(dates, np.diff(dates)):
    out.append(np.arange(date, (date + d)))
filled_in = np.array(out).reshape(-1)
output = np.hstack([filled_in, dates[-1]])
output
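When the goal is simply every day between the first and last date, a single np.arange over that range gives the same result and also copes with gaps of unequal size. This is a sketch of that shortcut; it writes the step as an explicit np.timedelta64 instead of the bare integer used above.

```python
import numpy as np

# Every other day from 2018-02-01 up to (not including) 2018-02-25
dates = np.arange(np.datetime64('2018-02-01'), np.datetime64('2018-02-25'),
                  np.timedelta64(2, 'D'))

# All days from the first date through the last date, inclusive
full = np.arange(dates[0], dates[-1] + np.timedelta64(1, 'D'))

print(full[0], full[-1], full.size)  # 2018-02-01 2018-02-23 23
```
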
70. Build a numpy.ndarray of sliding windows with a given stride
import numpy as np
def gen_strides(a, stride_len=5, window_len=5):
    n_strides = ((a.size - window_len) // stride_len) + 1
    # return np.array([a[s:(s+window_len)] for s in np.arange(0, a.size, stride_len)[:n_strides]])
    return np.array([
        a[s:(s + window_len)]
        for s in np.arange(0, n_strides * stride_len, stride_len)
    ])
print(gen_strides(np.arange(15), stride_len=2, window_len=4))
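On NumPy 1.20 or later, the same windows can be sketched with numpy.lib.stride_tricks.sliding_window_view, which produces every window of the given length; slicing with the stride then keeps every second one. This is an alternative to gen_strides, not a replacement for it, and the result is a read-only view rather than a copy.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(15)
window_len = 4
stride_len = 2

# All length-4 windows (12 of them), then keep the ones starting at 0, 2, 4, ...
windows = sliding_window_view(a, window_shape=window_len)[::stride_len]

print(windows.shape)          # (6, 4)
print(windows[-1].tolist())   # [10, 11, 12, 13]
```
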