C++ Is TensorFlow's Underlying Language: Would You Build a Deep Neural Network with It?

2020-11-29 機器之心Pro

Source: Matrices.io

Author: Florian Courtial

Compiled by 機器之心

Contributors: 李澤南, 蔣思源

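TensorFlow, a popular deep learning framework (TensorFlow's official Chinese WeChat account launched earlier this month), is built on C++ at its core, yet most people use it from Python to develop their models. As the C++ API matures, building neural networks directly in C++ has become feasible. This article walks through a simple way to do it.

Many people know that TensorFlow's core is built in C++, but most of the framework's features are only convenient to use through the Python API.

When I wrote my previous article, my goal was to implement a basic deep neural network (DNN) using only TensorFlow's C++ API and CuDNN. In practice, I realised that many things had been overlooked along the way.

Note that training a neural network with exotic operations is not possible; the error you are most likely to hit is a missing gradient operation. I am currently trying to port the gradient operations from Python to C++.

In this article, I will show how to build a deep neural network with TensorFlow in C++ and use it to estimate the price of a BMW 1 Series from its age, mileage, and fuel type. No C++ optimizer is available yet, so the training code looks less appealing than it could, but optimizers will be added in the future.

This article follows the official TensorFlow 1.4 C++ API guide: https:///api_guides/cc/guide

Code on GitHub:

Installation

We will run TensorFlow from C++. We could try using a precompiled library, but some people would certainly run into trouble because of the particularities of their environment. Building TensorFlow from source avoids these problems and ensures that we are using the latest version of the API.

First, you need to install the bazel build tool. Installation instructions are here:

On OSX, brew is enough: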

brew install bazel

You then build from the TensorFlow sources:

mkdir /path/tensorflow

cd /path/tensorflow

git clone

Next, configure the build, for example choosing whether to use the GPU, by running the configure script:

cd /path/tensorflow

./configure

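Now we create the files that will hold the TensorFlow model code. Note that the first build takes quite a long time (10-15 minutes). The non-core C++ TF code lives in /tensorflow/cc; this is where we create our model files. We also need a BUILD file so that bazel can build the model.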

mkdir /path/tensorflow/model

cd /path/tensorflow/model

touch model.cc

touch BUILD

We add the bazel instructions to the BUILD file:

load("//tensorflow:tensorflow.bzl", "tf_cc_binary")

tf_cc_binary(
    name = "model",
    srcs = [
        "model.cc",
    ],
    deps = [
        "//tensorflow/cc:gradients",
        "//tensorflow/cc:grad_ops",
        "//tensorflow/cc:cc_ops",
        "//tensorflow/cc:client_session",
        "//tensorflow/core:tensorflow",
    ],
)

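Basically, this builds a binary from model.cc. We can now start writing our model.

Reading the data

The data was scraped from the French website leboncoin.fr, then cleaned, normalized, and stored in a CSV file. Our goal is to read this data. The metadata used for normalization is stored in the first line of the CSV file; we need it to reconstruct a price from the network's output. To keep the code clean, we create data_set.h and data_set.cc. They read the CSV file into arrays of floats that are then fed to the neural network.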

data_set.h

#pragma once

#include <initializer_list>
#include <string>
#include <vector>

using namespace std;

// Meta data used to normalize the data set. Useful to
// go back and forth between normalized data.
class DataSetMetaData {
  friend class DataSet;

 private:
  float mean_km;
  float std_km;
  float mean_age;
  float std_age;
  float min_price;
  float max_price;
};

enum class Fuel {
  DIESEL,
  GAZOLINE
};

class DataSet {
 public:
  // Construct a data set from the given csv file path.
  DataSet(string path) {
    ReadCSVFile(path);
  }

  // getters
  vector<float>& x() { return x_; }
  vector<float>& y() { return y_; }

  // read the given csv file and complete x_ and y_
  void ReadCSVFile(string path);

  // convert one csv line to a vector of float
  vector<float> ReadCSVLine(string line);

  // normalize a human input using the data set metadata
  // (caution: an initializer_list does not own its elements, so
  // returning one built from local values is fragile)
  initializer_list<float> input(float km, Fuel fuel, float age);

  // convert a price outputted by the DNN to a human price
  float output(float price);

 private:
  DataSetMetaData data_set_metadata;
  vector<float> x_;
  vector<float> y_;
};

data_set.cc

#include <vector>
#include <string>
#include <fstream>
#include <sstream>
#include <iostream>
#include "data_set.h"

using namespace std;

void DataSet::ReadCSVFile(string path) {
  // load the whole file, then split it into lines
  ifstream file(path);
  stringstream buffer;
  buffer << file.rdbuf();
  string line;
  vector<string> lines;
  while (getline(buffer, line, '\n')) {
    lines.push_back(line);
  }

  // the first line contains the metadata
  vector<float> metadata = ReadCSVLine(lines[0]);
  data_set_metadata.mean_km = metadata[0];
  data_set_metadata.std_km = metadata[1];
  data_set_metadata.mean_age = metadata[2];
  data_set_metadata.std_age = metadata[3];
  data_set_metadata.min_price = metadata[4];
  data_set_metadata.max_price = metadata[5];

  // the other lines contain the features for each car
  for (size_t i = 2; i < lines.size(); ++i) {
    vector<float> features = ReadCSVLine(lines[i]);
    // the first three columns are the inputs, the fourth is the price
    x_.insert(x_.end(), features.begin(), features.begin() + 3);
    y_.push_back(features[3]);
  }
}

vector<float> DataSet::ReadCSVLine(string line) {
  vector<float> line_data;
  std::stringstream lineStream(line);
  std::string cell;
  while (std::getline(lineStream, cell, ',')) {
    line_data.push_back(stod(cell));
  }
  return line_data;
}

initializer_list<float> DataSet::input(float km, Fuel fuel, float age) {
  km = (km - data_set_metadata.mean_km) / data_set_metadata.std_km;
  age = (age - data_set_metadata.mean_age) / data_set_metadata.std_age;
  float f = fuel == Fuel::DIESEL ? -1.f : 1.f;
  return {km, f, age};
}

float DataSet::output(float price) {
  return price * (data_set_metadata.max_price - data_set_metadata.min_price) + data_set_metadata.min_price;
}
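The two conversions performed by input and output can be sketched without TensorFlow. The following is a minimal illustration, assuming that kilometres and age are standardized with a mean and standard deviation and that prices were min-max scaled, which is what DataSet::output inverts. The struct Stats and the function names are hypothetical and are not part of the original code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical holder for one feature's statistics (illustration only).
struct Stats {
  float mean;
  float std_dev;
};

// z-score: centre on the mean, then scale by the standard deviation.
float Standardize(float value, const Stats& s) {
  assert(s.std_dev > 0.f);
  return (value - s.mean) / s.std_dev;
}

// Min-max scaling maps the range [min_price, max_price] onto [0, 1].
float ScalePrice(float price, float min_price, float max_price) {
  assert(max_price > min_price);
  return (price - min_price) / (max_price - min_price);
}

// Inverse of ScalePrice; the same formula as DataSet::output.
float UnscalePrice(float scaled, float min_price, float max_price) {
  return scaled * (max_price - min_price) + min_price;
}
```

With made-up statistics of mean 100000 km and standard deviation 50000 km, Standardize(110000.f, ...) gives 0.2, and UnscalePrice(ScalePrice(p, lo, hi), lo, hi) returns p again up to rounding.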

We must also add these two files to the bazel BUILD file.

load("//tensorflow:tensorflow.bzl", "tf_cc_binary")

tf_cc_binary(
    name = "model",
    srcs = [
        "model.cc",
        "data_set.h",
        "data_set.cc",
    ],
    deps = [
        "//tensorflow/cc:gradients",
        "//tensorflow/cc:grad_ops",
        "//tensorflow/cc:cc_ops",
        "//tensorflow/cc:client_session",
        "//tensorflow/core:tensorflow",
    ],
)

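Building the model

The first step is to read the CSV file and extract two tensors: x, the input, and y, the expected result. We use the DataSet class defined earlier.

CSV data set download link: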

DataSet data_set("/path/normalized_car_features.csv");

Tensor x_data(DataTypeToEnum<float>::v(),
              TensorShape{static_cast<int>(data_set.x().size())/3, 3});
copy_n(data_set.x().begin(), data_set.x().size(),
       x_data.flat<float>().data());

Tensor y_data(DataTypeToEnum<float>::v(),
              TensorShape{static_cast<int>(data_set.y().size()), 1});
copy_n(data_set.y().begin(), data_set.y().size(),
       y_data.flat<float>().data());

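To define a tensor, we need its type and its shape. In the data_set object, the x data is stored as a flat vector, so we divide its size by 3 to get the number of rows (each car has three features). We then use std::copy_n to copy the data from the data_set object into the underlying storage of the Tensor (an Eigen::TensorMap). Now that the data is held in TensorFlow data structures, it is time to build the model.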
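To make the shape arithmetic concrete, here is a small standalone sketch of how a flat, row-major buffer of features maps onto a conceptual matrix with three columns. It does not use TensorFlow; the helper names NumRows and At are hypothetical, and the column order (km, fuel, age) is taken from DataSet::input.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kNumFeatures = 3;  // km, fuel, age

// Number of rows represented by a flat, row-major feature buffer.
std::size_t NumRows(const std::vector<float>& flat) {
  assert(flat.size() % kNumFeatures == 0);
  return flat.size() / kNumFeatures;
}

// Element (row, col) of the conceptual [rows, 3] matrix.
float At(const std::vector<float>& flat, std::size_t row, std::size_t col) {
  assert(col < kNumFeatures && row < NumRows(flat));
  return flat[row * kNumFeatures + col];
}
```

A buffer of six floats therefore describes two cars, and At(flat, 1, 0) is the normalized mileage of the second car.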

You can easily inspect a tensor for debugging:

LOG(INFO) << x_data.DebugString();

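A particularity of the C++ API is that you need a Scope object, which holds the state of the static graph being built, and you pass this object to every operation.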

Scope scope = Scope::NewRootScope();

We need two placeholders: x holds the features and y the corresponding price of each car.

auto x = Placeholder(scope, DT_FLOAT);

auto y = Placeholder(scope, DT_FLOAT);

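Our network has two hidden layers, so we have three weight matrices and three bias vectors. In Python, this is handled under the hood; in C++ you must define a variable and then an Assign node that gives the variable its initial value. We use RandomNormal to initialize the variables, which gives us random values drawn from a normal distribution.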

// weights init

auto w1 = Variable(scope, {3, 3}, DT_FLOAT);

auto assign_w1 = Assign(scope, w1, RandomNormal(scope, {3, 3}, DT_FLOAT));

auto w2 = Variable(scope, {3, 2}, DT_FLOAT);

auto assign_w2 = Assign(scope, w2, RandomNormal(scope, {3, 2}, DT_FLOAT));

auto w3 = Variable(scope, {2, 1}, DT_FLOAT);

auto assign_w3 = Assign(scope, w3, RandomNormal(scope, {2, 1}, DT_FLOAT));

// bias init

auto b1 = Variable(scope, {1, 3}, DT_FLOAT);

auto assign_b1 = Assign(scope, b1, RandomNormal(scope, {1, 3}, DT_FLOAT));

auto b2 = Variable(scope, {1, 2}, DT_FLOAT);

auto assign_b2 = Assign(scope, b2, RandomNormal(scope, {1, 2}, DT_FLOAT));

auto b3 = Variable(scope, {1, 1}, DT_FLOAT);

auto assign_b3 = Assign(scope, b3, RandomNormal(scope, {1, 1}, DT_FLOAT));
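For intuition about what ends up in these variables, the sketch below fills a matrix with standard normal values (mean 0, standard deviation 1) using only the C++ standard library. It is an illustration rather than the TensorFlow implementation; the function name RandomNormalMatrix and the choice of std::mt19937 as the generator are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Returns rows * cols values drawn from a standard normal distribution
// (mean 0, standard deviation 1), stored row by row.
std::vector<float> RandomNormalMatrix(std::size_t rows, std::size_t cols,
                                      std::mt19937& rng) {
  assert(rows > 0 && cols > 0);
  std::normal_distribution<float> dist(0.f, 1.f);
  std::vector<float> values(rows * cols);
  for (float& v : values) {
    v = dist(rng);
  }
  return values;
}
```

Calling RandomNormalMatrix(3, 3, rng) yields nine starting values for a matrix shaped like w1. A different seed gives different starting weights, which is one reason two training runs do not end in exactly the same place.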

We then build the three layers, using Tanh as the activation function.

// layers

auto layer_1 = Tanh(scope, Add(scope, MatMul(scope, x, w1), b1));

auto layer_2 = Tanh(scope, Add(scope, MatMul(scope, layer_1, w2), b2));

auto layer_3 = Tanh(scope, Add(scope, MatMul(scope, layer_2, w3), b3));
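Each of these lines computes tanh(input * w + b). As a rough standalone sketch of that arithmetic for a single sample, without TensorFlow and with a hypothetical helper named DenseTanh:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One dense layer for a single sample: out = tanh(in * w + b).
// `w` is stored row-major with shape [in.size(), b.size()].
std::vector<float> DenseTanh(const std::vector<float>& in,
                             const std::vector<float>& w,
                             const std::vector<float>& b) {
  assert(w.size() == in.size() * b.size());
  std::vector<float> out(b.size());
  for (std::size_t j = 0; j < b.size(); ++j) {
    float sum = b[j];
    for (std::size_t i = 0; i < in.size(); ++i) {
      sum += in[i] * w[i * b.size() + j];
    }
    out[j] = std::tanh(sum);
  }
  return out;
}
```

Chaining three calls with weights of shapes 3x3, 3x2 and 2x1 reproduces the 3-3-2-1 structure of the graph above: three inputs go in and a single value comes out.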

Next, we add L2 regularization.

// regularization

auto regularization = AddN(scope,

initializer_list<Input>{L2Loss(scope, w1),

L2Loss(scope, w2),

L2Loss(scope, w3)});

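Finally, we compute the loss, that is, the difference between the predicted price and the actual price y, and add the regularization term to it.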

// loss calculation

auto loss = Add(scope,
                ReduceMean(scope, Square(scope, Sub(scope, layer_3, y)), {0, 1}),
                Mul(scope, Cast(scope, 0.01, DT_FLOAT), regularization));
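The value computed by this node is the mean squared error plus 0.01 times the sum of the L2 losses of the weights, where L2Loss is half the sum of squares of its input. A plain C++ sketch of the same arithmetic, with hypothetical function names and no TensorFlow dependency, could look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half the sum of squares, the quantity TensorFlow's L2Loss op computes.
float L2LossValue(const std::vector<float>& t) {
  float sum = 0.f;
  for (float v : t) {
    sum += v * v;
  }
  return sum / 2.f;
}

// Mean squared error between predictions and targets.
float MeanSquaredError(const std::vector<float>& pred,
                       const std::vector<float>& target) {
  assert(!pred.empty() && pred.size() == target.size());
  float sum = 0.f;
  for (std::size_t i = 0; i < pred.size(); ++i) {
    const float diff = pred[i] - target[i];
    sum += diff * diff;
  }
  return sum / static_cast<float>(pred.size());
}

// Total loss = MSE + lambda * (sum of the L2 losses of all weights).
float TotalLoss(const std::vector<float>& pred,
                const std::vector<float>& target,
                const std::vector<std::vector<float>>& weights,
                float lambda) {
  float reg = 0.f;
  for (const std::vector<float>& w : weights) {
    reg += L2LossValue(w);
  }
  return MeanSquaredError(pred, target) + lambda * reg;
}
```

In the article's graph, lambda is 0.01 and the weights are w1, w2 and w3; the biases are not regularized.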

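This completes the forward pass; now for backpropagation. The first step is to call a function that adds the gradient operations to the graph of forward operations.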

// add the gradients operations to the graph

std::vector<Output> grad_outputs;

TF_CHECK_OK(AddSymbolicGradients(scope, {loss}, {w1, w2, w3, b1, b2, b3}, &grad_outputs));

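All the operations needed to compute the derivative of the loss with respect to each variable are added to the graph. We pass in grad_outputs as an empty vector, and it is filled with nodes that yield the gradients when run in a TensorFlow session. grad_outputs[0] gives the derivative of the loss with respect to w1, grad_outputs[1] the derivative with respect to w2, and so on, following the order {w1, w2, w3, b1, b2, b3} in which the variables were passed to AddSymbolicGradients.

We now have a list of nodes in grad_outputs. When run in a TensorFlow session, each node computes the gradient of the loss with respect to one variable. We use these gradients to update the variables, one line per variable, with the simplest method available: gradient descent.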

// update the weights and bias using gradient descent

auto apply_w1 = ApplyGradientDescent(scope, w1, Cast(scope, 0.01, DT_FLOAT), {grad_outputs[0]});

auto apply_w2 = ApplyGradientDescent(scope, w2, Cast(scope, 0.01, DT_FLOAT), {grad_outputs[1]});

auto apply_w3 = ApplyGradientDescent(scope, w3, Cast(scope, 0.01, DT_FLOAT), {grad_outputs[2]});

auto apply_b1 = ApplyGradientDescent(scope, b1, Cast(scope, 0.01, DT_FLOAT), {grad_outputs[3]});

auto apply_b2 = ApplyGradientDescent(scope, b2, Cast(scope, 0.01, DT_FLOAT), {grad_outputs[4]});

auto apply_b3 = ApplyGradientDescent(scope, b3, Cast(scope, 0.01, DT_FLOAT), {grad_outputs[5]});

The Cast operation is in fact the learning-rate parameter, here 0.01.
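The update performed by each ApplyGradientDescent node is var -= learning_rate * gradient. A minimal standalone sketch of that rule follows; the toy objective f(w) = w^2 and the helper names are assumptions added for illustration and do not appear in the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place update: var[i] -= learning_rate * grad[i].
void GradientDescentStep(std::vector<float>& var,
                         const std::vector<float>& grad,
                         float learning_rate) {
  assert(var.size() == grad.size());
  for (std::size_t i = 0; i < var.size(); ++i) {
    var[i] -= learning_rate * grad[i];
  }
}

// Toy problem (illustration only): minimise f(w) = w^2, whose gradient
// is 2w, by applying the update rule a fixed number of times.
float MinimizeSquare(float start, float learning_rate, int steps) {
  std::vector<float> w = {start};
  for (int i = 0; i < steps; ++i) {
    const std::vector<float> grad = {2.f * w[0]};
    GradientDescentStep(w, grad, learning_rate);
  }
  return w[0];
}
```

With a learning rate of 0.01, each step multiplies w by 0.98, so the value shrinks steadily towards the minimum at 0. A larger rate moves faster but can overshoot.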

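The graph of our network is now built, so we can open a session and run it. In Python, the Optimizers API essentially wraps the computation and application of the gradients that minimize the loss function. Once the Optimizer API is available in C++, we will be able to use it here.

We initialize a ClientSession and a vector of Tensors named outputs, which will receive the output of the network.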

ClientSession session(scope);

std::vector<Tensor> outputs;

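In Python, calling tf.global_variables_initializer() is enough to initialize the variables, because a list of all variables is kept while the graph is built. In C++, we must list the variables ourselves. Each RandomNormal output is assigned to the variable defined in its Assign node.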

// init the weights and biases by running the assigns nodes once

TF_CHECK_OK(session.Run({assign_w1, assign_w2, assign_w3, assign_b1, assign_b2, assign_b3}, nullptr));

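At this point we can loop over the training steps, 5000 in our case. The first step of each iteration runs the forward pass through the loss node, whose output is the loss of the network. We log the loss every 100 steps; a decreasing loss is the sign that the network is learning. We then need to compute the gradient nodes and update the variables. The gradient nodes are inputs of the ApplyGradientDescent nodes, so running the apply_ nodes first computes the gradients and then applies them to the right variables.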

// training steps
for (int i = 0; i < 5000; ++i) {
  // forward pass: evaluate the loss on the whole data set
  TF_CHECK_OK(session.Run({{x, x_data}, {y, y_data}}, {loss}, &outputs));
  if (i % 100 == 0) {
    std::cout << "Loss after " << i << " steps " << outputs[0].scalar<float>() << std::endl;
  }
  // nullptr because the output from the run is useless
  TF_CHECK_OK(session.Run({{x, x_data}, {y, y_data}}, {apply_w1, apply_w2, apply_w3, apply_b1, apply_b2, apply_b3, layer_3}, nullptr));
}

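Once the network has been trained, we can try to predict the price of a car, that is, run inference. Let's predict the price of a BMW 1 Series that is 7 years old, has 110,000 km on the clock, and has a diesel engine. To do so, we run the layer_3 node and feed the car's data into x; this is a single forward pass. Because we ran 5000 training steps beforehand, the weights have been learned and the output is not random.

We cannot use the car's attributes directly, because the network learned from normalized attributes, so the data must go through the same normalization. The DataSet class has an input method that does this using the metadata loaded while reading the CSV file.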

// prediction using the trained neural net

TF_CHECK_OK(session.Run({{x, {data_set.input(110000.f, Fuel::DIESEL, 7.f)}}}, {layer_3}, &outputs));

std::cout << "DNN output: " << *outputs[0].scalar<float>().data() << std::endl;

std::cout << "Price predicted " << data_set.output(*outputs[0].scalar<float>().data()) << " euros" << std::endl;

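The network outputs a value between 0 and 1, and the output method of data_set converts it back to a human-readable price using the metadata. The model can be run with the command bazel run -c opt //tensorflow/cc/models:model. If TensorFlow has just been compiled, you will see output of this form:

Loss after 0 steps 0.317394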

Loss after 100 steps 0.0503757

Loss after 200 steps 0.0487724

Loss after 300 steps 0.047366

Loss after 400 steps 0.0460944

Loss after 500 steps 0.0449263

Loss after 600 steps 0.0438395

Loss after 700 steps 0.0428183

Loss after 800 steps 0.041851

Loss after 900 steps 0.040929

Loss after 1000 steps 0.0400459

Loss after 1100 steps 0.0391964

Loss after 1200 steps 0.0383768

Loss after 1300 steps 0.0375839

Loss after 1400 steps 0.0368152

Loss after 1500 steps 0.0360687

Loss after 1600 steps 0.0353427

Loss after 1700 steps 0.0346358

Loss after 1800 steps 0.0339468

Loss after 1900 steps 0.0332748

Loss after 2000 steps 0.0326189

Loss after 2100 steps 0.0319783

Loss after 2200 steps 0.0313524

Loss after 2300 steps 0.0307407

Loss after 2400 steps 0.0301426

Loss after 2500 steps 0.0295577

Loss after 2600 steps 0.0289855

Loss after 2700 steps 0.0284258

Loss after 2800 steps 0.0278781

Loss after 2900 steps 0.0273422

Loss after 3000 steps 0.0268178

Loss after 3100 steps 0.0263046

Loss after 3200 steps 0.0258023

Loss after 3300 steps 0.0253108

Loss after 3400 steps 0.0248298

Loss after 3500 steps 0.0243591

Loss after 3600 steps 0.0238985

Loss after 3700 steps 0.0234478

Loss after 3800 steps 0.0230068

Loss after 3900 steps 0.0225755

Loss after 4000 steps 0.0221534

Loss after 4100 steps 0.0217407

Loss after 4200 steps 0.0213369

Loss after 4300 steps 0.0209421

Loss after 4400 steps 0.020556

Loss after 4500 steps 0.0201784

Loss after 4600 steps 0.0198093

Loss after 4700 steps 0.0194484

Loss after 4800 steps 0.0190956

Loss after 4900 steps 0.0187508

DNN output: 0.0969611

Price predicted 13377.7 euros

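The predicted price here is 13377.7 euros. The predicted price differs from run to run and can fall anywhere between 8000 and 17000 euros. This is because we use only three attributes to describe a car, and the model architecture is also fairly simple.

As mentioned earlier, the C++ API is still under development, and we hope that more functionality will be added in the near future.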

Original article link:
