It's Finally Here! A Getting-Started Guide to TensorFlow 2.0 (Part 1)

2021-03-06 Python之禪

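This article comes from the WeChat public account 機器學習算法工程師 (Machine Learning Algorithm Engineer), which introduces basic theory and practical skills for beginners and enthusiasts in machine learning, deep learning, data mining and other AI fields. Follow it if you are interested.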

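Although TensorFlow is the most widely used framework in deep learning, its static-graph design (Graph mode) makes it noticeably harder to use than PyTorch, which is a dynamic-graph framework. Fortunately, TensorFlow recently added eager mode as a counterpart to PyTorch's dynamic execution. Going further, Google has released a brand-new version, TensorFlow 2.0, which is not a simple update over 1.0 but a major upgrade (although only a preview version has been released so far). In short, TensorFlow 2.0 uses eager execution by default and reorganizes many messy modules. Version 2.0 will undoubtedly replace 1.0 over time, so it is worth getting started with TensorFlow 2.0 early. This article gives a concise introduction to TensorFlow 2.0 to get you up and running quickly.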

Eager execution

AutoGraph

Performance optimization: tf.function

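Eager execution

TensorFlow's eager execution is a form of imperative programming, consistent with native Python: when you execute an operation, the result is returned immediately. TensorFlow has traditionally used Graph mode instead, where you first build a computation graph, then open a Session and feed in actual data before anything is executed and a result is produced. Eager execution is clearly more concise and makes code easier to debug, which is also a large part of why PyTorch is simpler to use. A simple example: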

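    import tensorflow as tf

    x = tf.ones((2, 2), dtype=tf.dtypes.float32)
    y = tf.constant([[1, 2],
                     [3, 4]], dtype=tf.dtypes.float32)
    z = tf.matmul(x, y)
    print(z)
    # tf.Tensor(
    # [[4. 6.]
    #  [4. 6.]], shape=(2, 2), dtype=float32)
    print(z.numpy())
    # [[4. 6.]
    #  [4. 6.]]

As you can see, under eager execution the return value of each operation is a tf.Tensor that holds a concrete value, rather than just a symbolic handle to a node in the computation graph as in Graph mode. Being able to see results immediately is very helpful for debugging. In addition, calling the tf.Tensor.numpy() method returns the NumPy array corresponding to the Tensor.

Another benefit of eager execution is that native Python features can be used, such as the conditional below:

    random_value = tf.random.uniform([], 0, 1)
    x = tf.reshape(tf.range(0, 4), [2, 2])
    print(random_value)
    if random_value.numpy() > 0.5:
        y = tf.matmul(x, x)
    else:
        y = tf.add(x, x)

This dynamic control flow works because a Tensor produced by eager execution can be converted to a NumPy value, which avoids operators such as tf.cond and tf.while_loop that Graph mode requires.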
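Another important question is how gradients are computed in eager mode. In Graph mode, the gradient graph is built along with the forward graph when the model is constructed, so gradients are easy to compute once data is fed in. Eager execution is dynamic, however, so the operations have to be recorded on every run in order to compute gradients. This is done with tf.GradientTape, which tracks the executed operations. An example:

    w = tf.Variable([[1.0]])
    with tf.GradientTape() as tape:
        loss = w * w + 2. * w + 5.
    grad = tape.gradient(loss, w)
    print(grad)
    # tf.Tensor([[4.]], shape=(1, 1), dtype=float32)

With eager execution, each tape records the operations executed inside its context. The tape is valid only for that computation and is used to compute the corresponding gradients. PyTorch also uses a dynamic graph, but unlike TensorFlow, each Tensor that requires gradients carries a grad_fn that tracks the history of operations for the backward pass.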
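
To make the idea of a tape concrete, here is a toy reverse-mode tape in plain Python. It is only a sketch under simplifying assumptions (scalars only, just + and *), and the names Tape and Value are hypothetical; it is not how tf.GradientTape is implemented. It records each operation as it runs and then walks the record backwards to obtain d(loss)/dw for loss = w*w + 2w + 5, which analytically is 2w + 2, i.e. 4 at w = 1.

```python
class Tape:
    """Toy tape: remembers every operation in the order it was executed."""

    def __init__(self):
        # Each entry is (output, [(input, d_output/d_input), ...]).
        self.ops = []

    def gradient(self, target, source):
        """Walk the recorded operations backwards, applying the chain rule."""
        grads = {id(target): 1.0}
        for out, parents in reversed(self.ops):
            g_out = grads.get(id(out), 0.0)
            for parent, local_grad in parents:
                grads[id(parent)] = grads.get(id(parent), 0.0) + g_out * local_grad
        return grads.get(id(source), 0.0)


class Value:
    """Scalar that records + and * on its tape (hypothetical helper)."""

    def __init__(self, value, tape):
        self.value = value
        self.tape = tape

    def _lift(self, other):
        # Wrap plain Python numbers so they can take part in recorded ops.
        return other if isinstance(other, Value) else Value(float(other), self.tape)

    def __add__(self, other):
        other = self._lift(other)
        out = Value(self.value + other.value, self.tape)
        self.tape.ops.append((out, [(self, 1.0), (other, 1.0)]))
        return out

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        out = Value(self.value * other.value, self.tape)
        self.tape.ops.append((out, [(self, other.value), (other, self.value)]))
        return out

    __rmul__ = __mul__


tape = Tape()
w = Value(1.0, tape)
loss = w * w + 2.0 * w + 5.0
grad = tape.gradient(loss, w)
print(loss.value, grad)  # 8.0 4.0
```

The record only exists because the operations were actually executed, which is the sense in which a tape is tied to one particular computation.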
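Eager execution, the default in TensorFlow 2.0, makes code more concise and easier to debug. In terms of performance, however, eager execution incurs some loss relative to Graph mode. This is not hard to understand: native Graph mode builds the static graph first and only then executes it, which is an advantage for distributed training, performance optimization and production deployment. Fortunately, TensorFlow 2.0 introduces tf.function and AutoGraph to narrow the performance gap between eager execution and Graph mode. The core idea is to convert a subset of Python syntax into high-performance graph operations.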

AutoGraph

AutoGraph was already available in TensorFlow 1.x. Its main purpose is to convert common Python code into Graph code that TensorFlow supports. A typical case: in TensorFlow, dynamic control flow had to be written with complex operators such as tf.while_loop and tf.cond, but now we can write native Python for and if statements and let AutoGraph convert them into code TensorFlow supports, as in the example below:

    def square_if_positive(x):
        if x > 0:
            x = x * x
        else:
            x = 0.0
        return x

    # eager mode
    print('Eager results: %2.2f, %2.2f' % (square_if_positive(tf.constant(9.0)),
                                           square_if_positive(tf.constant(-9.0))))

    # graph mode
    tf_square_if_positive = tf.autograph.to_graph(square_if_positive)
    with tf.Graph().as_default():
        g_out1 = tf_square_if_positive(tf.constant( 9.0))
        g_out2 = tf_square_if_positive(tf.constant(-9.0))
        with tf.compat.v1.Session() as sess:
            print('Graph results: %2.2f, %2.2f\n' % (sess.run(g_out1), sess.run(g_out2)))

Above we define a function square_if_positive that uses Python's native if statement internally. With TensorFlow 2.0's eager execution this poses no problem. It is not supported by TensorFlow 1.x Graph mode, but AutoGraph can convert the function into a Graph function, which you can treat as a regular TensorFlow op that runs in Graph mode (TF 2 has no Session; that is a TF 1.x feature, and TF 1.x APIs are reached through tf.compat.v1). Note the difference between eager mode and Graph mode: the results are the same, but Graph mode is more efficient.
In essence, AutoGraph converts Python code into native TensorFlow code. We can inspect the converted code:

    print(tf.autograph.to_code(square_if_positive))

    #################################################
    from __future__ import print_function

    def tf__square_if_positive(x):
      try:
        with ag__.function_scope('square_if_positive'):
          do_return = False
          retval_ = None
          cond = ag__.gt(x, 0)

          def if_true():
            with ag__.function_scope('if_true'):
              x_1, = x,
              x_1 = x_1 * x_1
              return x_1

          def if_false():
            with ag__.function_scope('if_false'):
              x = 0.0
              return x
          x = ag__.if_stmt(cond, if_true, if_false)
          do_return = True
          retval_ = x
          return retval_
      except:
        ag__.rewrite_graph_construction_error(ag_source_map__)

    tf__square_if_positive.autograph_info__ = {}

The converted code defines two branch functions and then calls the if_stmt op, which presumably behaves like tf.cond.
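
The shape of that rewrite can be mimicked in plain Python. The sketch below is a toy stand-in, not AutoGraph's actual ag__.if_stmt: each branch of the if statement becomes a zero-argument function, and a dispatcher (here a hypothetical if_stmt helper) calls exactly one of them. A graph-building version would hand the two functions to a graph conditional instead of evaluating cond in Python.

```python
def if_stmt(cond, if_true, if_false):
    """Toy dispatcher: run exactly one of two zero-argument branch functions."""
    return if_true() if cond else if_false()


def square_if_positive(x):
    # Each branch of the original `if` is lifted into its own function.
    def if_true():
        return x * x

    def if_false():
        return 0.0

    return if_stmt(x > 0, if_true, if_false)


print(square_if_positive(9.0), square_if_positive(-9.0))  # 81.0 0.0
```

Writing branches as functions is what lets a framework decide later how to execute them, rather than having Python pick a branch while the code is being read.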
AutoGraph supports many Python features, such as loops:

    def sum_even(items):
        s = 0
        for c in items:
            if c % 2 > 0:
                continue
            s += c
        return s

    print('Eager result: %d' % sum_even(tf.constant([10,12,15,20])))

    tf_sum_even = tf.autograph.to_graph(sum_even)
    with tf.Graph().as_default(), tf.compat.v1.Session() as sess:
        print('Graph result: %d\n\n' % sess.run(tf_sum_even(tf.constant([10,12,15,20]))))

AutoGraph supports most Python features, but it still has limitations; see Capabilities and Limitations for details.
Link:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/autograph/LIMITATIONS.md
Also note that a function converted by AutoGraph can still be executed in eager mode, but it will not be faster than the original. You can compare:

    import timeit

    x = tf.constant([10, 12, 15, 20])
    print("Eager at original code:", timeit.timeit(lambda: sum_even(x), number=100))
    print("Eager at autograph code:", timeit.timeit(lambda: tf_sum_even(x), number=100))
    with tf.Graph().as_default(), tf.compat.v1.Session() as sess:
        graph_op = tf_sum_even(tf.constant([10, 12, 15, 20]))
        sess.run(graph_op)  # warm up
        print("Graph at autograph code:", timeit.timeit(lambda: sess.run(graph_op), number=100))

    ##########################################
    # Eager at original code: 0.05176109499999981
    # Eager at autograph code: 0.11203173799999977
    # Graph at autograph code: 0.03418808900000059

Judging from the results, execution in Graph mode is the most efficient, the original code in eager mode comes second, and the AutoGraph-converted code run in eager mode is the slowest.
For this reason, in TensorFlow 2.0 we generally do not use tf.autograph directly, because it brings no speedup under eager execution. To actually reach Graph-mode efficiency, we rely on a more powerful tool: tf.function.
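
The timings above come from a single timeit run each, so they can be noisy. A common way to get steadier numbers is to warm up once and take the best of several repeats. The sketch below shows that pattern with the standard library only; bench is a hypothetical helper, and sum_even_py is a plain-Python copy of sum_even working on a list, used here just so the snippet runs without TensorFlow.

```python
import timeit


def bench(fn, number=100, repeat=5):
    """Return the smallest total time over `repeat` runs of `number` calls each."""
    fn()  # warm-up call, excluded from the measurement
    return min(timeit.repeat(fn, number=number, repeat=repeat))


def sum_even_py(items):
    # Same logic as sum_even, applied to a Python list instead of a Tensor.
    s = 0
    for c in items:
        if c % 2 > 0:
            continue
        s += c
    return s


best = bench(lambda: sum_even_py([10, 12, 15, 20]))
print("best of 5 runs: %.6f s per 100 calls" % best)
```

The same helper could wrap the eager and Graph calls from the comparison above, for example bench(lambda: sum_even(x)).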
Performance optimization: tf.function
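Although eager execution is more concise, Graph mode delivers higher performance. To narrow this gap, TensorFlow 2.0 introduces tf.function. Here is the official description of tf.function:
function constructs a callable that executes a TensorFlow graph (tf.Graph) created by tracing the TensorFlow operations in func. This allows the TensorFlow runtime to apply optimizations and exploit parallelism in the computation defined by func.
Put simply, tf.function builds a Graph from the TensorFlow operations inside a function func, so that calling it executes that Graph, which yields better performance. For example:

    def f(x, y):
        print(x, y)
        return tf.reduce_mean(tf.multiply(x ** 2, 3) + y)

    g = tf.function(f)

    x = tf.constant([[2.0, 3.0]])
    y = tf.constant([[3.0, -2.0]])

    # `f` and `g` will return the same value, but `g` will be executed as a
    # TensorFlow graph.
    assert f(x, y).numpy() == g(x, y).numpy()

    # Printed by f (eager execution):
    # tf.Tensor([[2. 3.]], shape=(1, 2), dtype=float32) tf.Tensor([[ 3. -2.]], shape=(1, 2), dtype=float32)
    # Printed by g (while the graph is traced):
    # Tensor("x:0", shape=(1, 2), dtype=float32) Tensor("y:0", shape=(1, 2), dtype=float32)

As in the example above, a function wrapped by tf.function executes in Graph mode. You can think of it as a TF op that encapsulates a Graph: calling it directly still returns a Tensor result immediately, but internally it runs efficiently. When we print a Tensor inside the function, eager execution prints the Tensor's value, whereas Graph mode prints a Tensor handle on which numpy() cannot be called to retrieve a value, just as in TF 1.x Graph mode.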
Because a function decorated with tf.function executes as a Graph, it is generally faster than eager mode, and the gap is more pronounced when the Graph contains many small operations. Compare the performance gap for a convolution and an LSTM cell:

    import timeit

    conv_layer = tf.keras.layers.Conv2D(100, 3)

    @tf.function
    def conv_fn(image):
        return conv_layer(image)

    image = tf.zeros([1, 200, 200, 100])
    # warm up
    conv_layer(image); conv_fn(image)
    print("Eager conv:", timeit.timeit(lambda: conv_layer(image), number=10))
    print("Function conv:", timeit.timeit(lambda: conv_fn(image), number=10))
    # For a plain convolution the difference is small
    # Eager conv: 0.44013839924952197
    # Function conv: 0.3700763391782858

    lstm_cell = tf.keras.layers.LSTMCell(10)

    @tf.function
    def lstm_fn(input, state):
        return lstm_cell(input, state)

    input = tf.zeros([10, 10])
    state = [tf.zeros([10, 10])] * 2
    # warm up
    lstm_cell(input, state); lstm_fn(input, state)
    print("eager lstm:", timeit.timeit(lambda: lstm_cell(input, state), number=10))
    print("function lstm:", timeit.timeit(lambda: lstm_fn(input, state), number=10))
    # An LSTM cell consists of many small ops, so Graph execution is much faster
    # eager lstm: 0.025562446062237565
    # function lstm: 0.0035498656569271647

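To use tf.function flexibly you need to understand the mechanism behind it, so here is a brief look. In TF 1.x, you first create a static computation graph and then create a Session to actually execute different computations:

    import tensorflow as tf  # TF 1.x API

    x = tf.placeholder(tf.float32)
    y = tf.square(x)
    z = tf.add(x, y)

    sess = tf.Session()

    z0 = sess.run([z], feed_dict={x: 2.})         # [6.0]
    z1 = sess.run([z], feed_dict={x: 2., y: 2.})  # [4.0]

Although only one graph is defined above, the two sess.run calls actually execute two different programs at run time, or in other words two different subgraphs:

    def compute_z0(x):
        return tf.add(x, tf.square(x))

    def compute_z1(x, y):
        return tf.add(x, y)
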
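
Why feeding y changes which part of the graph runs can be illustrated with a toy evaluator in plain Python. This is a sketch under simplifying assumptions, not TensorFlow's Session: GRAPH and run are hypothetical names, nodes are looked up by string, and a node is computed only when its value has not been supplied in feed.

```python
# Toy graph: each node maps to (operation, names of its inputs).
# "x" has no entry because it is a placeholder and must always be fed.
GRAPH = {
    "y": (lambda x: x * x, ["x"]),          # y = square(x)
    "z": (lambda x, y: x + y, ["x", "y"]),  # z = add(x, y)
}


def run(target, feed):
    """Evaluate `target`, computing only the nodes not already present in `feed`."""
    executed = []

    def evaluate(name):
        if name in feed:
            return feed[name]
        op, inputs = GRAPH[name]
        args = [evaluate(i) for i in inputs]
        executed.append(name)
        return op(*args)

    return evaluate(target), executed


z0, ops0 = run("z", {"x": 2.0})            # y is computed, then z
z1, ops1 = run("z", {"x": 2.0, "y": 2.0})  # y is fed, so only z is computed
print(z0, ops0)  # 6.0 ['y', 'z']
print(z1, ops1)  # 4.0 ['z']
```

The two calls share one graph definition but execute different sets of nodes, which is the sense in which they are two different programs.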
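The functions compute_z0 and compute_z1 wrap the two subgraphs in two Python functions. Going one step further, we no longer need a Session: calling these functions can directly invoke the corresponding computation graph. This is what tf.function does:

    import tensorflow as tf

    @tf.function
    def compute_z1(x, y):
        return tf.add(x, y)

    @tf.function
    def compute_z0(x):
        return compute_z1(x, tf.square(x))

    z0 = compute_z0(2.)
    z1 = compute_z1(2., 2.)
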
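In other words, tf.function internally manages a set of Graphs and controls their execution. Another issue is that although the function body defines a fixed series of operations, different inputs require different computation graphs. If the shape or dtype of the input Tensor differs, the graph differs too. Fortunately, tf.function supports this kind of polymorphism:

    @tf.function
    def double(a):
        print("Tracing with", a)
        return a + a

    print(double(tf.constant(1)))
    print(double(tf.constant(1.1)))
    print(double(tf.constant([1, 2])))

Note the print inside the function: when the shape or dtype of the input tensor changes, what gets printed changes accordingly, so the (static) computation graphs are not the same. This polymorphism comes from tf.function tracing a different graph behind the scenes. Specifically, a function f decorated with tf.function accepts some Tensors and returns zero or more Tensors. When the decorated function F is executed: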
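A "trace_cache_key" is determined from the shapes and dtypes of the input Tensors;
Each "trace_cache_key" maps to a Graph. When a new "trace_cache_key" has to be created, f is traced to build a new Graph; if the "trace_cache_key" already exists, the existing Graph is simply looked up in the cache;
The input Tensors are fed into this Graph, which is then executed to produce the output Tensors.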
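
The three steps above can be sketched as a toy decorator in plain Python. This is an illustration under stated assumptions, not tf.function's implementation: toy_function and key_of are hypothetical names, the key uses the Python element type and list length as stand-ins for dtype and shape, and the cache stores the original function where a real tracer would store a built Graph.

```python
import functools


def toy_function(func):
    """Toy decorator: 'trace' once per input signature, then reuse the cached entry."""
    cache = {}

    def key_of(arg):
        # Stand-in for a Tensor's (dtype, shape): element type name and length.
        if isinstance(arg, list):
            return (type(arg[0]).__name__, (len(arg),))
        return (type(arg).__name__, ())

    @functools.wraps(func)
    def wrapper(*args):
        key = tuple(key_of(a) for a in args)       # step 1: compute the cache key
        if key not in cache:                       # step 2: trace only for new keys
            wrapper.trace_count += 1
            print("Tracing with key", key)
            cache[key] = func  # a real tracer would store a built Graph here
        return cache[key](*args)                   # step 3: run the cached entry

    wrapper.trace_count = 0
    return wrapper


@toy_function
def double(a):
    if isinstance(a, list):
        return [v + v for v in a]
    return a + a


double(1)       # new key (int, scalar): traced
double(1.1)     # new key (float, scalar): traced
double([1, 2])  # new key (int, length 2): traced
double(3)       # same key as double(1): cache hit, no new trace
print(double.trace_count)  # 3
```

Every distinct key adds one cache entry, so a function called with many different shapes or dtypes keeps accumulating entries.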
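This polymorphism is what we want, because sometimes we do need to pass Tensors of different shapes or dtypes. But as the number of "trace_cache_key" entries grows, you end up caching a large number of Graphs, which is something to watch out for. In addition, tf.function provides input_signature, a parameter that uses tf.TensorSpec to specify the shape and dtype of the Tensors passed to the function, as in this example:

    @tf.function(input_signature=[tf.TensorSpec(shape=None, dtype=tf.float32)])
    def f(x):
        return tf.add(x, 1.)

    print(f(tf.constant(1.0)))     # OK: scalar float32
    print(f(tf.constant([1.0,])))  # OK: the shape is unrestricted
    print(f(tf.constant([1])))     # error: dtype int32 does not match float32

Here the dtype of the input Tensor must be float32 while the shape is unrestricted; a dtype mismatch raises an error.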
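Another parameter of tf.function is autograph, which defaults to True. It means AutoGraph is applied automatically when the Graph is built, so you can use native Python conditionals and loops inside the function; they are converted into Graph code based on tf.cond and tf.while_loop. Note that a branch or loop is only converted when it depends on Tensors. With autograph set to False, a branch or loop that depends on Tensors raises an error, as in the example below.

    def sum_even(items):
        s = 0
        for c in items:
            if c % 2 > 0:
                continue
            s += c
        return s

    sum_even_autograph_on = tf.function(sum_even, autograph=True)
    sum_even_autograph_off = tf.function(sum_even, autograph=False)

    x = tf.constant([10, 12, 15, 20])
    sum_even(x)                # OK
    sum_even_autograph_on(x)   # OK
    sum_even_autograph_off(x)  # error: Tensors cannot be iterated over in Graph mode

This is easy to understand: once tf.function is applied the code runs in Graph mode, where Tensors cannot be iterated over, but AutoGraph converts the loop into Graph code, so the call succeeds. In most cases we leave autograph enabled, which is the default.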
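Most importantly, tf.function can be applied to class methods and can reference tf.Variable, as the following example shows:

    class ScalarModel(object):
        def __init__(self):
            self.v = tf.Variable(0)

        @tf.function
        def increment(self, amount):
            self.v.assign_add(amount)

    model1 = ScalarModel()
    model1.increment(tf.constant(3))
    assert int(model1.v) == 3
    model1.increment(tf.constant(4))
    assert int(model1.v) == 7

    model2 = ScalarModel()  # model1 and model2 each own their own variable
    model2.increment(tf.constant(5))
    assert int(model2.v) == 5

As discussed later, this feature is used when building models with tf.keras. The example also shows that operations with side effects (changing the value of a Variable), such as assign_add and the other assign operations, can be used inside a function, which matters for model training.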
As mentioned earlier, Python's native print only runs once, while the Graph is being built, and prints a Tensor handle. To print the actual value of a Tensor, use tf.print:

    @tf.function
    def print_element(items):
        for c in items:
            tf.print(c)

    x = tf.constant([1, 5, 6, 8, 3])
    print_element(x)

That concludes this introduction to tf.function. In practice there are more subtleties to using it; for details see TensorFlow 2.0: Functions, not Sessions.
Link:
https://github.com/tensorflow/community/blob/master/rfcs/20180918-functions-not-sessions-20.md
References:
Official TensorFlow site:
https://tensorflow.google.cn/versions/r2.0/api_docs/
