Neural Networks (1): Computing the Jacobian Matrix of a Neural Network in Python

2021-03-02 AI算法後丹修煉爐

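In general, a neural network is a multivariate, vector-valued function of the following form:

$f : \mathbb{R}^N \to \mathbb{R}^M$

During training, a scalar loss is usually attached to the output; a typical choice for classification is the cross-entropy loss on the predicted class probabilities. With such a scalar loss, M = 1, and the parameters are learned with (stochastic) gradient descent, which repeatedly computes the gradient of the loss function with respect to those parameters.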

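At inference time, the output of the network is usually a vector (for example, class probabilities). This article explains what the Jacobian is, then explores and compares a few possible implementations in Python.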

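What is the Jacobian matrix, and why should we care?

Suppose y = f(x) is the output vector of the network for an input x. The Jacobian matrix of f is the M × N matrix holding the partial derivative of each element of y with respect to each element of x:

$J_{ij} = \partial y_i / \partial x_j$

This matrix tells us how a local perturbation of the network's input affects its output. In some settings this information can be valuable. For example, in an ML system used for creative tasks, the system could give users interactive feedback telling them how modifying each input dimension would affect each output class.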
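The "local perturbation" reading of the Jacobian can be made concrete with a first-order approximation: for a small change delta of the input, f(x + delta) is close to f(x) + J delta. The sketch below uses a small hand-written function and its analytic Jacobian; both f and jacobian_f are hypothetical examples introduced here for illustration, not part of the network discussed in this article.

```python
import numpy as np

def f(x):
    """Toy vector-valued function from R^3 to R^2 (hypothetical example)."""
    return np.array([x[0] * x[1], np.sin(x[2]) + x[0]])

def jacobian_f(x):
    """Analytic Jacobian of f, shape (2, 3): row i holds the derivatives of output i."""
    return np.array([
        [x[1], x[0], 0.0],
        [1.0, 0.0, np.cos(x[2])],
    ])

x = np.array([1.0, 2.0, 0.5])
delta = np.array([1e-3, -2e-3, 5e-4])    # small perturbation of the input

exact = f(x + delta)
linear = f(x) + jacobian_f(x) @ delta    # first-order prediction from the Jacobian

max_error = np.max(np.abs(exact - linear))
print(max_error)  # tiny compared with delta: the error is second order in delta
```

Each column of the Jacobian answers the question "what happens to every output if only this input dimension moves a little", which is the kind of feedback described above.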
Tensorflow

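Let's try Tensorflow first. We start by setting up a toy network to play with. Since the goal is to compute the Jacobian of an existing network, the weights are simply fixed to random values: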

import numpy as np
N = 500  # Input size
H = 100  # Hidden layer size
M = 10   # Output size
w1 = np.random.randn(N, H)  # first affine layer weights
b1 = np.random.randn(H)     # first affine layer bias
w2 = np.random.randn(H, M)  # second affine layer weights
b2 = np.random.randn(M)     # second affine layer bias
Implement this network with Keras:
import tensorflow as tf
from tensorflow.keras.layers import Dense
# TF 1.x graph-mode API (under TF 2.x, see tf.compat.v1)
sess = tf.InteractiveSession()
model = tf.keras.Sequential()
model.add(Dense(H, activation='relu', use_bias=True, input_dim=N))
model.add(Dense(M, activation='softmax', use_bias=True))  # M outputs
sess.run(tf.global_variables_initializer())  # initialize before overwriting the weights
model.get_layer(index=0).set_weights([w1, b1])
model.get_layer(index=1).set_weights([w2, b2])
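Now let's compute the Jacobian matrix of this model. At the time of writing, Tensorflow does not provide an out-of-the-box way to compute a Jacobian. tf.gradients(ys, xs) returns sum(dy/dx) for each x in xs, which here is an N-dimensional vector holding the sum of the Jacobian's rows; not quite what we want. We can still obtain the Jacobian by computing the gradient of each output element separately and stacking the results: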
def jacobian_tensorflow(x):    
    jacobian_matrix = []
    for m in range(M):
        # We iterate over the M elements of the output vector
        grad_func = tf.gradients(model.output[:, m], model.input)
        gradients = sess.run(grad_func, feed_dict={model.input: x.reshape((1, x.size))})
        jacobian_matrix.append(gradients[0][0,:])
        
    return np.array(jacobian_matrix)
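The difference between "gradient of the summed outputs" and "one gradient per output" can be checked without Tensorflow. The sketch below is a plain Numpy stand-in, not Tensorflow code: it uses a hypothetical one-layer function y = tanh(x W) with small made-up sizes, whose gradients are easy to write by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 3                       # small hypothetical sizes, for illustration only
W = rng.standard_normal((N, M))
x = rng.standard_normal(N)

def grad_of_output(m):
    """Gradient of the scalar y_m = tanh(x @ W)[m] with respect to x, shape (N,)."""
    y_m = np.tanh(x @ W[:, m])
    return (1.0 - y_m ** 2) * W[:, m]

# Stacking one gradient per output gives the full (M, N) Jacobian.
jac = np.stack([grad_of_output(m) for m in range(M)])

# Differentiating the sum of all outputs collapses the M rows into one vector.
y = np.tanh(x @ W)
grad_of_sum = W @ (1.0 - y ** 2)  # shape (N,)

print(jac.shape, grad_of_sum.shape)
print(np.allclose(grad_of_sum, jac.sum(axis=0)))
```

The summed gradient loses the per-output information, which is why the loop over the M outputs is needed in jacobian_tensorflow().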
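We use a numerical differentiation check to make sure the computed Jacobian is correct. The function is_jacobian_correct() below takes as arguments a function that computes the Jacobian and the feedforward function: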
def is_jacobian_correct(jacobian_fn, ffpass_fn):
    """ Check of the Jacobian using numerical differentiation
    """
    x = np.random.random((N,))
    epsilon = 1e-5
    computed_jacobian = jacobian_fn(x)  # computed once, shape (M, N)
    # Check a few columns at random
    for idx in np.random.choice(N, 5, replace=False):
        x2 = x.copy()
        x2[idx] += epsilon
        # forward-difference estimate of column idx of the Jacobian
        num_jacobian = (ffpass_fn(x2) - ffpass_fn(x)) / epsilon
        if not np.all(np.abs(computed_jacobian[:, idx] - num_jacobian) < 1e-3):
            return False
    return True
def ffpass_tf(x):
    """ The feedforward function of our neural net
    """    
    xr = x.reshape((1, x.size))
    return model.predict(xr)[0]
is_jacobian_correct(jacobian_tensorflow, ffpass_tf)
Output:
>> True
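The check above uses forward differences on a few random columns. As a variation, a full numerical Jacobian can be built with central differences, which are usually more accurate for the same step size. The sketch below is one possible way to do it; numerical_jacobian, g and g_jacobian are hypothetical names introduced here, and g is a small hand-written test function rather than the network.

```python
import numpy as np

def numerical_jacobian(fn, x, eps=1e-5):
    """Approximate the Jacobian of fn at x with central differences, shape (M, N)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        # symmetric difference quotient for input dimension i
        cols.append((fn(x + step) - fn(x - step)) / (2.0 * eps))
    return np.stack(cols, axis=1)

def g(x):
    """Toy function from R^2 to R^3 (hypothetical example)."""
    return np.array([x[0] ** 2, x[0] * x[1], np.sin(x[1])])

def g_jacobian(x):
    """Analytic Jacobian of g, shape (3, 2)."""
    return np.array([
        [2.0 * x[0], 0.0],
        [x[1], x[0]],
        [0.0, np.cos(x[1])],
    ])

x = np.array([0.7, -1.2])
numeric = numerical_jacobian(g, x)
analytic = g_jacobian(x)
print(np.max(np.abs(numeric - analytic)))  # very small
```

For the 500-input toy network this costs 1,000 forward passes, so it is only suitable as a correctness check, not as a way to compute the Jacobian in practice.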
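Let's see how long this computation takes:
import time
x0 = np.random.random((N,))  # assumed test input; x0 is not defined earlier in the text
tic = time.time()
jacobian_tf = jacobian_tensorflow(x0)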
tac = time.time()
print('It took %.3f s. to compute the Jacobian matrix' % (tac-tic))
>> It took 0.658 s. to compute the Jacobian matrix
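On a Macbook Pro with a 4-core CPU this takes about 650 ms. It may be possible to do better with Tensorflow, but at the time of writing it does not look like this can be improved substantially, because Tensorflow has to loop over the M outputs to compute the gradients (note that I did not try a GPU here). 650 ms is too slow for an example of this size, especially with interactive use at test time in mind.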
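Autograd
Autograd is a very nice library that performs automatic differentiation, in particular on top of Numpy. To use it, the feedforward function has to be written with Autograd's wrapped Numpy: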
import autograd.numpy as anp
def ffpass_anp(x):
    a1 = anp.dot(x, w1) + b1   # affine
    a1 = anp.maximum(0, a1)    # ReLU
    a2 = anp.dot(a1, w2) + b2  # affine
    
    exps = anp.exp(a2 - anp.max(a2))  # softmax
    out = exps / exps.sum()
    return out
Let's compare it with the earlier Tensorflow feedforward function ffpass_tf() to check that this function is correct.
out_anp = ffpass_anp(x0)
out_keras = ffpass_tf(x0)
np.allclose(out_anp, out_keras, 1e-4)
Output:
>> True
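A note on the comparison above: the third positional argument of np.allclose is the relative tolerance rtol, so the call is equivalent to np.allclose(out_anp, out_keras, rtol=1e-4), with the default absolute tolerance atol=1e-8 still applied. The sketch below shows the effect on two made-up probability vectors (a and b are illustrative values, not outputs of the networks above).

```python
import numpy as np

a = np.array([0.2, 0.3, 0.5])
b = a * (1 + 5e-5)                # relative difference of 5e-5 in every entry

loose = np.allclose(a, b, 1e-4)   # same as rtol=1e-4
strict = np.allclose(a, b)        # default rtol=1e-5

print(loose, strict)              # True False
```

A looser relative tolerance is a reasonable choice here because one implementation may run in single precision and the other in double precision.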

We now have the same function, so its Jacobian can be computed with Autograd:
from autograd import jacobian
def jacobian_autograd(x):
    return jacobian(ffpass_anp)(x)
is_jacobian_correct(jacobian_autograd, ffpass_anp)
Output:
>> True

So how long does it take?
%timeit jacobian_autograd(x0)

Output:
>> 3.69 ms ± 135 µs
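The %timeit magic is only available in IPython and Jupyter. In a plain Python script, a comparable measurement can be taken with the standard timeit module. The helper below is a minimal sketch; time_call is a hypothetical name, and np.tanh on a random vector is only a stand-in workload to keep the example self-contained.

```python
import timeit
import numpy as np

def time_call(fn, *args, repeat=5, number=20):
    """Return the best average time per call, in seconds."""
    timer = timeit.Timer(lambda: fn(*args))
    runs = timer.repeat(repeat=repeat, number=number)  # total seconds per batch
    return min(runs) / number

# Stand-in workload (hypothetical): replace with jacobian_autograd and x0.
x_demo = np.random.random((500,))
seconds = time_call(np.tanh, x_demo)
print('%.3f µs per call' % (seconds * 1e6))
```

Taking the minimum over several batches reduces the influence of other processes running on the machine.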

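So the Tensorflow implementation took about 650 ms, while Autograd needs 3.7 ms, a speedup of about 170 times. Of course, specifying a model in Numpy is not very convenient, since Tensorflow and Keras provide many useful ready-made functions and training tools. But now that we have taken that step and written the network in Numpy, maybe we can make it even faster? Looking at the implementation of Autograd's jacobian() function, it turns out that it still maps over the dimensions of the function's output. Perhaps we can improve on this by relying directly on Numpy's vectorization.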
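Numpy
If we want a pure Numpy implementation, we have to specify the forward and backward pass of every layer and implement backpropagation ourselves. Below I do this for the three layer types that the toy network contains: affine, ReLU and softmax. The layer implementations here are fairly general (they could be made more compact if we only cared about this one network).
The backward passes propagate a matrix (or, in the general case, a tensor) containing the gradients of every network output, using Numpy's efficient vectorized operations: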
def affine_forward(x, w, b):
    """
    Forward pass of an affine layer
    :param x: input of dimension (I, )
    :param w: weights matrix of dimension (I, O)
    :param b: bias vector of dimension (O, )
    :return output of dimension (O, ), and cache needed for backprop
    """
    out = np.dot(x, w) + b
    cache = (x, w)
    return out, cache
def affine_backward(dout, cache):
    """
    Backward pass for an affine layer.
    :param dout: Upstream Jacobian, of shape (M, O)
    :param cache: Tuple of:
      - x: Input data, of shape (I, )
      - w: Weights, of shape (I, O)
    :return the jacobian matrix containing derivatives of the M neural network outputs with respect to
            this layer's inputs, evaluated at x, of shape (M, I)
    """
    x, w = cache
    dx = np.dot(dout, w.T)
    return dx
def relu_forward(x):
    """ Forward ReLU
    """
    out = np.maximum(np.zeros(x.shape), x)
    cache = x
    return out, cache
def relu_backward(dout, cache):
    """
    Backward pass of ReLU
    :param dout: Upstream Jacobian
    :param cache: the cached input for this layer
    :return: the jacobian matrix containing derivatives of the M neural network outputs with respect to
             this layer's inputs, evaluated at x.
    """
    x = cache
    dx = dout * np.where(x > 0, np.ones(x.shape), np.zeros(x.shape))
    return dx
def softmax_forward(x):
    """ Forward softmax
    """
    exps = np.exp(x - np.max(x))
    s = exps / exps.sum()
    return s, s
    
def softmax_backward(dout, cache):
    """
    Backward pass for softmax
    :param dout: Upstream Jacobian
    :param cache: contains the cache (in this case the output) for this layer
    """
    s = cache
    ds = np.diag(s) - np.outer(s, s.T)
    dx = np.dot(dout, ds)
    return dx
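The expression used for ds in softmax_backward() follows from differentiating the softmax directly. A short derivation, writing a for the layer input and s for its output (the symbols a, k and \delta_{ij} are notation introduced here, not taken from the code):

```latex
s_i = \frac{e^{a_i}}{\sum_k e^{a_k}}
\qquad\Longrightarrow\qquad
\frac{\partial s_i}{\partial a_j}
  = \frac{\delta_{ij}\, e^{a_i} \sum_k e^{a_k} - e^{a_i} e^{a_j}}
         {\left(\sum_k e^{a_k}\right)^2}
  = s_i \left(\delta_{ij} - s_j\right)
```

In matrix form this is diag(s) - s s^T, which is what np.diag(s) - np.outer(s, s.T) computes. Subtracting np.max(x) in the forward pass does not change s, so it does not affect the derivative either.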
Now that the layers are defined, let's use them in the forward and backward passes:
def forward_backward(x):
    layer_to_cache = dict()  # for each layer, we store the cache needed for backward pass
    # Forward pass
    a1, cache_a1 = affine_forward(x, w1, b1)
    r1, cache_r1 = relu_forward(a1)
    a2, cache_a2 = affine_forward(r1, w2, b2)
    out, cache_out = softmax_forward(a2)
    # backward pass
    dout = np.diag(np.ones(out.size, ))  # the derivatives of each output w.r.t. each output.
    dout = softmax_backward(dout, cache_out)
    dout = affine_backward(dout, cache_a2)
    dout = relu_backward(dout, cache_r1)
    dx = affine_backward(dout, cache_a1)
    
    return out, dx
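The backward pass above is the chain rule written as a product of per-layer Jacobians, applied from the output back to the input. The sketch below spells that product out for a smaller network with the same affine, ReLU, affine, softmax structure and checks it against central differences. The sizes n, h, m and the names W1, c1, W2, c2 are hypothetical and chosen so the example does not depend on the variables defined earlier.

```python
import numpy as np

rng = np.random.default_rng(0)
n, h, m = 6, 5, 3                        # small hypothetical sizes
W1, c1 = rng.standard_normal((n, h)), rng.standard_normal(h)
W2, c2 = rng.standard_normal((h, m)), rng.standard_normal(m)

def forward(x):
    """Return the softmax output and the pre-ReLU activations."""
    a1 = x @ W1 + c1
    r1 = np.maximum(0.0, a1)
    a2 = r1 @ W2 + c2
    exps = np.exp(a2 - a2.max())
    return exps / exps.sum(), a1

def jacobian_product(x):
    """Jacobian of the network output w.r.t. x as an explicit matrix product, shape (m, n)."""
    s, a1 = forward(x)
    J_softmax = np.diag(s) - np.outer(s, s)    # (m, m)
    J_affine2 = W2.T                           # (m, h)
    J_relu = np.diag((a1 > 0).astype(float))   # (h, h)
    J_affine1 = W1.T                           # (h, n)
    return J_softmax @ J_affine2 @ J_relu @ J_affine1

def jacobian_numeric(x, eps=1e-6):
    """Central-difference reference Jacobian, shape (m, n)."""
    cols = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        cols.append((forward(x + step)[0] - forward(x - step)[0]) / (2.0 * eps))
    return np.stack(cols, axis=1)

x = rng.standard_normal(n)
J = jacobian_product(x)
J_num = jacobian_numeric(x)
print(J.shape)                          # (3, 6)
print(np.max(np.abs(J - J_num)))        # small
print(np.max(np.abs(J.sum(axis=0))))    # close to 0: softmax outputs always sum to 1
```

Building np.diag for the ReLU is only for readability; the element-wise product used in relu_backward() gives the same result without materializing an h × h matrix.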
Is the feedforward output correct?
out_fb = forward_backward(x0)[0]
out_tf = ffpass_tf(x0)
np.allclose(out_fb, out_tf, 1e-4)
Output:
>> True

Is the Jacobian correct?
is_jacobian_correct(lambda x: forward_backward(x)[1], ffpass_tf)

Output:
>> True

Finally: how long does it take?
%timeit forward_backward(x0)
>> 115 µs ± 2.38 µs
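Where Autograd needed 3.7 ms, we now need only 115 µs. Much better :)
Conclusion
I explored several ways to compute the Jacobian matrix on CPU, using Tensorflow, Autograd and Numpy. Each approach has its own pros and cons. If you are prepared to specify the forward and backward passes of your layers, you can gain a lot of performance by using Numpy directly: roughly 5,000x for my toy network and example implementation. Of course, the gain will vary with the network architecture. In general, the larger the output dimension M, the more there is to gain over methods that have to iterate over the M scalar outputs.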
Full code: https://github.com/hrzn/jacobianmatrix/blob/master/Jacobian-matrix-examples.ipynb
