From a deep learning algorithm engineer interview, here is a record of a fairly basic written-test question:
Input: a target vector Y (N×1) and a matrix X (N×K). Output: the logistic regression coefficients W ((K+1)×1, i.e. K weights plus one bias), obtained by stochastic gradient descent.
Analysis: the problem requires first writing out the analytic form of logistic regression, then choosing a loss function, and finally deriving the loss function's derivatives with respect to the parameters being updated; stochastic gradient descent can then proceed.
Link: https://www.zhihu.com/people/thisiszhou
01
The Logistic Regression Formula
Logistic regression passes a linear function of the input through the sigmoid:

$$\hat{y} = \sigma(Xw + b), \qquad \sigma(x) = \frac{1}{1 + e^{-x}},$$

where $w \in \mathbb{R}^{K \times 1}$ and $b$ is a scalar bias, so each prediction $\hat{y}_i \in (0, 1)$.
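As a quick sanity check on that range claim (a worked illustration of my own, not from the original):

$$\sigma(0) = \tfrac{1}{2}, \qquad \lim_{x\to-\infty}\sigma(x) = 0, \qquad \lim_{x\to+\infty}\sigma(x) = 1,$$

so the output is strictly between 0 and 1 and can be read as the probability of the positive class.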
02
The Loss Function
Since the output of logistic regression lies in (0, 1), and since logistic regression, despite the word "regression" in its name, is really a classification task, cross-entropy is used as the loss function. The cross-entropy between a prediction $y$ and a label $y_{label} \in \{0, 1\}$ is:

$$CE(y, y_{label}) = -\,y_{label}\log y - (1 - y_{label})\log(1 - y).$$
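To make the behaviour of this loss concrete (a worked example of my own, not from the original), take a positive example, $y_{label} = 1$:

$$CE(0.9, 1) = -\log 0.9 \approx 0.105, \qquad CE(0.1, 1) = -\log 0.1 \approx 2.303,$$

so confident correct predictions cost little, while confident wrong ones are penalised heavily.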
03
Derivatives with Respect to the Update Parameters
The parameters to update are $w$ and $b$. The Loss over the training set is

$$Loss = \sum_{i=1}^{N} CE(y_i, y_{label,i}), \qquad y_i = \sigma(x_i w + b),$$

where $x_i$ is the $i$-th row of $X$. By the chain rule,

$$\frac{\partial Loss}{\partial w} = \sum_{i=1}^{N} \frac{\partial CE}{\partial y_i}\,\sigma'(x_i w + b)\,x_i^{\top}, \qquad \frac{\partial Loss}{\partial b} = \sum_{i=1}^{N} \frac{\partial CE}{\partial y_i}\,\sigma'(x_i w + b),$$

with

$$\frac{\partial CE}{\partial y} = -\frac{y_{label}}{y} + \frac{1 - y_{label}}{1 - y}, \qquad \sigma'(x) = \sigma(x)\,\big(1 - \sigma(x)\big).$$
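Before coding the update, the derivation can be verified numerically with finite differences (a sketch of my own; the names loss_fn and numeric_grad_w and the choice of eps are assumptions, not from the original):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def loss_fn(X, w, b, y_label):
    # Full-batch cross-entropy loss, matching the Loss definition above.
    y = sigmoid(np.dot(X, w) + b)
    return float(np.sum(-y_label*np.log(y) - (1 - y_label)*np.log(1 - y)))

def numeric_grad_w(X, w, b, y_label, eps=1e-6):
    # Central finite difference on each coordinate of w.
    g = np.zeros_like(w)
    for j in range(w.shape[0]):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[j] += eps
        w_minus[j] -= eps
        g[j] = (loss_fn(X, w_plus, b, y_label)
                - loss_fn(X, w_minus, b, y_label)) / (2*eps)
    return g

Comparing numeric_grad_w against the analytic $\partial Loss/\partial w$ should agree to several decimal places, which is a cheap guard against sign and layout mistakes.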
04
Parameter Update (Gradient Descent)
Each step moves the parameters against the gradient:

$$w \leftarrow w - \eta\,\frac{\partial Loss}{\partial w}, \qquad b \leftarrow b - \eta\,\frac{\partial Loss}{\partial b},$$

where $\eta$ is the learning rate.
05
Code Implementation
First, the forward pass and the hand-derived gradients in NumPy:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def ce(y, y_label):
    return -y_label*np.log(y) - (1-y_label)*np.log(1-y)

def ce_grad(y, y_label):
    # dCE/dy, as derived above
    return -y_label/y + (1-y_label)/(1-y)

np.random.seed(32)
n = 6
k = 7
X = np.random.rand(n, k)
y_label = np.random.randint(0, 2, size=(n, 1)).astype(np.float32)
w = np.random.rand(k, 1)
b = np.random.rand(1, 1)

def forward(X, w, b):
    y1 = np.dot(X, w) + b   # linear part
    y2 = sigmoid(y1)        # prediction in (0, 1)
    y3 = ce(y2, y_label)    # per-sample cross-entropy
    loss = sum(y3)
    return y1, y2, y3, loss

def gradients(y1, y2, y3, X, y_label):
    # Chain rule, factored exactly as in the derivation above.
    grad1 = np.ones(len(y3))             # dLoss/dy3 (the outer sum)
    grad2 = ce_grad(y2, y_label)         # dCE/dy2
    grad3 = sigmoid(y1)*(1-sigmoid(y1))  # dy2/dy1 = sigmoid'
    grad4_w = X                          # dy1/dw
    grad4_b = 1                          # dy1/db
    return (
        np.dot(grad1, grad2*grad3*grad4_w),
        np.dot(grad1, grad2*grad3*grad4_b)
    )

The hand-computed gradient with respect to w:

array([2.34286961, 2.97101168, 1.98692618, 1.81275096, 2.52826215,
       2.42595535, 1.9706045 ])

As a check, the same gradient via TensorFlow's automatic differentiation (TF 1.x API; names are suffixed with _tf here so they do not shadow the NumPy objects still needed below):
import tensorflow as tf

def ce_tf(y, y_label):
    return -y_label*tf.log(y) - (1-y_label)*tf.log(1-y)

X_tf = tf.Variable(X)
w_tf = tf.Variable(w)
b_tf = tf.Variable(b)

y1 = tf.matmul(X_tf, w_tf) + b_tf
y2 = tf.sigmoid(y1)
y3 = ce_tf(y2, y_label.astype(np.float64))  # cast so dtypes match the float64 variables
loss = tf.reduce_sum(y3)
grad = tf.gradients(loss, w_tf)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    ret = sess.run(grad)

The result matches the manual computation:

array([[2.34286961],
       [2.97101168],
       [1.98692618],
       [1.81275096],
       [2.52826215],
       [2.42595535],
       [1.9706045 ]])

Now update the parameters with stochastic gradient descent. SGD normally samples a random batch at each step; for simplicity, all samples are backpropagated at once here:

yita = 1e-2
train_num = 10000
for i in range(train_num):
    y1, y2, y3, loss = forward(X, w, b)
    g_w, g_b = gradients(y1, y2, y3, X, y_label)
    w -= yita*g_w.reshape([-1, 1])
    b -= yita*g_b
    if i % 1000 == 0:
        print("loss:", loss)

loss: [11.6081676]
loss: [1.18844796]
loss: [0.71728752]
loss: [0.49936237]
loss: [0.37872785]
loss: [0.30340733]
loss: [0.25233963]
loss: [0.21561081]
loss: [0.18800623]
loss: [0.16654284]

After training, the predictions line up with the labels:

>>> forward(X, w, b)[1]
array([[0.01485668],
       [0.00538101],
       [0.01436137],
       [0.01684294],
       [0.0247247 ],
       [0.93002105]])
>>> y_label
array([[0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [1.]], dtype=float32)
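For completeness, the "random batch" variant mentioned above might look like the following (a minimal sketch of my own, reusing sigmoid and ce_grad from the NumPy code; batch_size is an assumed hyperparameter, not from the original):

batch_size = 2  # assumed; any value <= n works
for i in range(train_num):
    idx = np.random.choice(n, batch_size, replace=False)  # sample a random mini-batch
    X_b, y_b = X[idx], y_label[idx]
    y1 = np.dot(X_b, w) + b
    y2 = sigmoid(y1)
    grad23 = ce_grad(y2, y_b) * y2*(1 - y2)  # dCE/dy2 * sigmoid'(y1)
    g_w = np.dot(np.ones(batch_size), grad23*X_b)
    g_b = np.dot(np.ones(batch_size), grad23)
    w -= yita*g_w.reshape([-1, 1])
    b -= yita*g_b

Each step now touches only batch_size rows, which is what makes the descent "stochastic"; with only n = 6 samples here, the full-batch loop above behaves almost identically.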
06
Summary
The backpropagation here is relatively basic, but it is genuinely not trivial. For instance, the derivative of the loss with respect to w in this problem is a typical scalar-by-vector derivative. For derivatives involving vectors and matrices, the following article is recommended:
https://zhuanlan.zhihu.com/p/24709748
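As a one-line illustration of the scalar-by-vector case mentioned above (my own addition, using the same layout convention as the derivation in section 03):

$$f(w) = a^{\top} w \;\Rightarrow\; \frac{\partial f}{\partial w} = a \in \mathbb{R}^{K \times 1},$$

which is exactly the shape needed for the update $w \leftarrow w - \eta\,\partial Loss/\partial w$.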