On GCN: Three Ways to Write It

2021-01-11 AI科技大本營

Author | 阿澤

This article implements a graph convolutional network (GCN) in three different ways with the DGL framework.

1. Introduction to DGL

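DGL (Deep Graph Library) is an open-source framework developed jointly by engineers from New York University and AWS. Its goal is to provide a tool for deep learning on graphs and to help people implement such algorithms more efficiently.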

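Implementing graph neural network models directly in existing frameworks such as TensorFlow, PyTorch, and MXNet is inconvenient, and the resulting implementations are often not fast enough.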

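DGL's core design idea is to treat a graph neural network as a message-passing process: every node sends out its own messages and receives messages from other nodes, and once all messages have arrived they are aggregated to compute the node's new representation. Conventional deep learning frameworks operate on tensors, but a graph often cannot be represented directly as one complete tensor; it has to be padded with zeros by hand, which is cumbersome and inefficient.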
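To make the message-passing view concrete, here is a minimal sketch in plain Python (not DGL's API): every node sends its feature along its out-edges, each node collects what it receives in its own inbox, and a sum aggregates the variable-length list. The edge list and scalar features are made-up illustration data. The padded variant mimics the dense-tensor alternative described above: every neighbor list is padded to the maximum in-degree, and a mask is needed so the padding slots contribute nothing.

```python
# Hypothetical toy graph: directed edges (src, dst) and one scalar feature per node.
edges = [(0, 1), (2, 1), (3, 1), (1, 0), (3, 2)]
feat = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}

def message_passing(edges, feat):
    """Each node sends its feature along its out-edges; each receiver sums its inbox."""
    inbox = {v: [] for v in feat}                # one inbox per node
    for src, dst in edges:
        inbox[dst].append(feat[src])             # send: copy the source feature
    return {v: sum(msgs) for v, msgs in inbox.items()}   # receive: aggregate by sum

def padded_aggregate(edges, feat):
    """Dense alternative: pad every neighbor list to the max in-degree and mask the pads."""
    nbrs = {v: [s for s, d in edges if d == v] for v in feat}
    max_deg = max(len(n) for n in nbrs.values())
    out = {}
    for v, n in nbrs.items():
        padded = n + [0] * (max_deg - len(n))              # pad with a dummy node id ...
        mask = [1] * len(n) + [0] * (max_deg - len(n))     # ... and mark which slots are real
        out[v] = sum(feat[u] * m for u, m in zip(padded, mask))
    return out

print(message_passing(edges, feat))
print(padded_aggregate(edges, feat))
```

Both functions compute the same sums; the padded version does extra work on the dummy slots, which is the waste the message-passing formulation avoids.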

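DGL is built on top of existing frameworks and makes GNN models easier to implement. Its core is the message-passing interface, and it also provides interfaces for graph sampling and for processing batches of graphs.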

We won't go into more detail on DGL here; interested readers can visit the official website (http://dgl.ai/).

2. Prepare

```python
import torch
import time
import math
import dgl
import numpy as np
import torch.nn as nn
from dgl.data import citation_graph as citegrh
from dgl import DGLGraph
import dgl.function as fn
import networkx as nx
import torch.nn.functional as F
from dgl.nn import GraphConv
# from dgl.nn.pytorch import GraphConv
# from dgl.nn.pytorch.conv import GraphConv
```

There are three ways to import GraphConv here. The first is recommended: DGL's developers built a mechanism that automatically detects which backend is in use and maps the import to that backend's API.

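```python
print(torch.__version__)
print(dgl.__version__)
print(nx.__version__)
```

```
1.4.0
0.4.3
2.3
```

The code in this article was run with PyTorch 1.4.0, DGL 0.4.3, and networkx 2.3. Later DGL releases changed several of the APIs used here (for example, fn.copy_src was superseded by fn.copy_u, and building a graph with DGLGraph(nx_graph) by dgl.from_networkx), so pin these versions to run the code unchanged.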

3. GCN

3.1 First version

DGL's first approach uses GraphConv, the graph convolution module predefined in DGL.

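The GCN layer is defined as:

$$h_i^{(l+1)} = \sigma\left(b^{(l)} + \sum_{j\in\mathcal{N}(i)} \frac{1}{c_{ij}}\, h_j^{(l)} W^{(l)}\right)$$

where $\mathcal{N}(i)$ is the set of neighbors of node $i$, $c_{ij} = \sqrt{|\mathcal{N}(i)|}\sqrt{|\mathcal{N}(j)|}$ is the product of the square roots of the node degrees and normalizes the aggregation, and $\sigma$ is the activation function.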
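As a worked instance of the layer formula $h_i' = \sigma\big(b + \sum_{j\in\mathcal{N}(i)} h_j W / c_{ij}\big)$ with $c_{ij} = \sqrt{|\mathcal{N}(i)|}\sqrt{|\mathcal{N}(j)|}$, the NumPy sketch below evaluates it node by node on a hypothetical 4-node undirected graph. It assumes each node's neighbor set includes the node itself (the training code later adds self-loops), and the random weights, zero bias, and ReLU are illustrative choices, not values from the article.

```python
import numpy as np

# Hypothetical undirected graph on 4 nodes; each neighbor set includes the node itself.
neighbors = {0: [0, 1, 2], 1: [0, 1], 2: [0, 2, 3], 3: [2, 3]}
deg = {i: len(n) for i, n in neighbors.items()}   # |N(i)|

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))      # h_j^{(l)}: 4 nodes, 3 input features
W = rng.normal(size=(3, 2))      # W^{(l)}: maps 3 features to 2
b = np.zeros(2)                  # b^{(l)}, zero-initialized like GraphConv's bias

def gcn_layer_per_node(H, W, b):
    """Evaluate the GCN formula literally: a normalized sum over each node's neighbors."""
    out = np.zeros((H.shape[0], W.shape[1]))
    for i, nbrs in neighbors.items():
        acc = np.zeros(W.shape[1])
        for j in nbrs:
            c_ij = np.sqrt(deg[i]) * np.sqrt(deg[j])   # normalization constant c_ij
            acc += (H[j] @ W) / c_ij
        out[i] = np.maximum(b + acc, 0.0)               # sigma = ReLU
    return out

H_next = gcn_layer_per_node(H, W, b)
print(H_next.shape)   # (4, 2)
```

Node 3, for example, only aggregates nodes 2 and 3, weighted by $1/\sqrt{2\cdot 3}$ and $1/\sqrt{2\cdot 2}$.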

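GraphConv initializes its parameters as in tkipf's original implementation: the weight is initialized with Glorot uniform initialization and the bias is initialized to zero.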

A brief introduction to the Glorot uniform distribution:

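The Glorot uniform distribution, also called the Xavier uniform distribution, comes from the 2010 paper "Understanding the difficulty of training deep feedforward neural networks". Its core idea is that, for information to flow well through the network, the variance of each layer's output should stay as close to equal as possible. To achieve this, the variance of the weight $W$ needs to satisfy $\mathrm{Var}(W) = \frac{2}{n_{in} + n_{out}}$. The variance of a uniform distribution $U[a, b]$ is $\frac{(b-a)^2}{12}$, so $U[-a, a]$ has variance $\frac{a^2}{3}$; setting this equal to $\frac{2}{n_{in} + n_{out}}$ gives the Xavier uniform initialization (see the paper for the full derivation):

$$W \sim U\left[-\frac{\sqrt{6}}{\sqrt{n_{in} + n_{out}}},\ \frac{\sqrt{6}}{\sqrt{n_{in} + n_{out}}}\right]$$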
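A quick numerical check of the Glorot bound $a = \sqrt{6/(n_{in}+n_{out})}$, as a NumPy sketch: sampling $W$ from $U[-a, a]$ should give an empirical variance close to $2/(n_{in}+n_{out})$. The layer sizes are illustrative (Cora's 1433 input features feeding 16 hidden units). For comparison, it also prints the $1/\sqrt{n_{out}}$ bound that the hand-written layers later in this article use in reset_parameters.

```python
import numpy as np

def glorot_uniform(n_in, n_out, rng):
    """Sample an (n_in, n_out) matrix from U[-a, a] with a = sqrt(6 / (n_in + n_out))."""
    a = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-a, a, size=(n_in, n_out)), a

rng = np.random.default_rng(42)
n_in, n_out = 1433, 16           # illustrative sizes: input features -> hidden units
W, a = glorot_uniform(n_in, n_out, rng)

target_var = 2.0 / (n_in + n_out)   # a**2 / 3 equals this by construction
print(f"bound a = {a:.4f}, empirical Var = {W.var():.6f}, target = {target_var:.6f}")

# Scaled-uniform scheme from reset_parameters: U(-1/sqrt(n_out), 1/sqrt(n_out)),
# whose variance is 1 / (3 * n_out).
stdv = 1.0 / np.sqrt(n_out)
print(f"1/sqrt(n_out) bound = {stdv:.4f}, its variance = {stdv**2 / 3:.6f}")
```

The two schemes give quite different scales for this layer shape, which is why it matters which one a given layer actually uses.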

```python
class GCN(nn.Module):
    def __init__(self, g, in_feats, n_hidden, n_classes, n_layers, activation, dropout):
        super(GCN, self).__init__()
        self.g = g
        self.layers = nn.ModuleList()
        # input layer
        self.layers.append(GraphConv(in_feats, n_hidden, activation=activation))
        # hidden layers
        for i in range(n_layers - 1):
            self.layers.append(GraphConv(n_hidden, n_hidden, activation=activation))
        # output layer
        self.layers.append(GraphConv(n_hidden, n_classes))
        self.dropout = nn.Dropout(p=dropout)

    def forward(self, features):
        h = features
        for i, layer in enumerate(self.layers):
            # apply dropout to the input of every layer except the first
            if i != 0:
                h = self.dropout(h)
            h = layer(self.g, h)
        return h
```

3.2 Second version

3.2.1 ndata

DGL's second approach uses user-defined Message and Reduce functions.

ndata is a special DGL syntax for assigning (or retrieving) node features:

```python
# assumes g has 10 nodes
x = torch.randn(10, 3)
g.ndata['x'] = x
```

To set the features of specific nodes, index into the feature tensor:

```python
g.ndata['x'][0] = torch.zeros(1, 3)
g.ndata['x'][[0, 1, 2]] = torch.zeros(3, 3)
g.ndata['x'][torch.tensor([0, 1, 2])] = torch.randn((3, 3))
```

Edge features can be accessed in the same way:

```python
# assumes g has 9 edges, including 1 -> 0, 2 -> 0 and 3 -> 0
g.edata['w'] = torch.randn(9, 2)
# Access edge set with IDs in integer, list, or integer tensor
g.edata['w'][1] = torch.randn(1, 2)
g.edata['w'][[0, 1, 2]] = torch.zeros(3, 2)
g.edata['w'][torch.tensor([0, 1, 2])] = torch.zeros(3, 2)
# You can get the edge ids by giving endpoints, which are useful for accessing the features.
g.edata['w'][g.edge_id(1, 0)] = torch.ones(1, 2)                  # edge 1 -> 0
g.edata['w'][g.edge_ids([1, 2, 3], [0, 0, 0])] = torch.ones(3, 2)  # edges [1, 2, 3] -> 0
# Use edge broadcasting whenever applicable.
g.edata['w'][g.edge_ids([1, 2, 3], 0)] = torch.ones(3, 2)          # edges [1, 2, 3] -> 0
```

3.2.2 UDFs

In DGL, message passing and node feature transformation are implemented with user-defined functions (UDFs).

An Edge UDF can be used to define a message function, which passes messages along edges:

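```python
def gcn_msg(edge):
    # message = source-node feature scaled by the source node's normalization factor
    msg = edge.src['h'] * edge.src['norm']
    return {'m': msg}
```

An Edge UDF takes an edge argument (in practice, a batch of edges) with three attributes, src, dst, and data, which hold the source-node features, the destination-node features, and the edge features respectively.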

Our Message function passes information from the source node to the destination node, so it only uses the source node's features.

The node feature 'norm' is used for normalization; how it is computed is explained later.

Each node may receive messages from many source nodes, so these messages are stored in the node's mailbox.
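A rough NumPy sketch of what a mailbox looks like as a tensor. It assumes DGL's batching strategy for UDF reduces, which, to our understanding, groups nodes with the same in-degree together so that each group's mailbox is a dense tensor of shape (num_nodes_in_group, in_degree, feat_dim). The graph and features are hypothetical; the point is the layout that a Reduce function sums over along dimension 1.

```python
import numpy as np
from collections import defaultdict

# Hypothetical directed graph as (src, dst) pairs; messages are copies of source features.
edges = [(1, 0), (2, 0), (3, 0), (0, 1), (2, 1), (0, 2), (1, 3)]
h = np.arange(8, dtype=float).reshape(4, 2)   # node features, shape (4, 2)

# Collect each node's incoming messages (its mailbox).
inbox = defaultdict(list)
for src, dst in edges:
    inbox[dst].append(h[src])

# Group nodes by in-degree so each group's mailboxes stack into one dense tensor.
buckets = defaultdict(list)
for node, msgs in inbox.items():
    buckets[len(msgs)].append(node)

reduced = {}
for degree, nodes in sorted(buckets.items()):
    mailbox = np.stack([np.stack(inbox[v]) for v in nodes])   # (num_nodes, degree, feat_dim)
    print(f"in-degree {degree}: mailbox shape {mailbox.shape}")
    sums = mailbox.sum(axis=1)                 # reduce over dim 1, i.e. over the messages
    for v, s in zip(nodes, sums):
        reduced[v] = s
```

Here nodes 2 and 3 (one message each) share a mailbox of shape (2, 1, 2), while node 0 (three messages) gets its own of shape (1, 3, 2).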

Next, we define an aggregation (Reduce) function.

After messages have been passed, each node needs to process its mailbox; that is the job of the Reduce function.

The Reduce function is a Node UDF.

A Node UDF takes a node argument with two attributes, data and mailbox: the node features and the mailbox holding the received messages.

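```python
def gcn_reduce(node):
    # mailbox['m'] has shape (num_nodes, num_messages, feat_dim): dim 0 indexes the nodes,
    # dim 1 indexes the messages each node received, so summing over dim=1 aggregates them
    accum = torch.sum(node.mailbox['m'], dim=1) * node.data['norm']
    return {'h': accum}
```

The message UDF operates on edges, while the Reduce UDF operates on nodes. Their relationship is as follows: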

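Reading from left to right: the source node sends its features through the message function into the destination node's mailbox, and when the Node UDF (here, the Reduce function) is triggered, the mailbox is cleared.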

The figure also shows that two functions act on nodes: the Apply function and the Reduce function.

We have already covered the Reduce function, so what is the Apply function?

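The Apply function updates nodes. It can be used for parameter initialization and for applying a nonlinear transformation to node features. Parameter initialization: as noted above, the parameters follow a Glorot uniform distribution, so a bias added to the nodes also needs a uniform initialization, as in the reset_parameters function in the code below. Strictly speaking, reset_parameters samples the bias from $U\left(-\frac{1}{\sqrt{d}}, \frac{1}{\sqrt{d}}\right)$ with $d$ = out_feats, which is a simpler scaled-uniform scheme than the exact Glorot bound.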

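Nonlinear transformation: after each GCN layer's propagation step, node features may need a nonlinear transformation, as in the forward function below.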

```python
class NodeApplyModule(nn.Module):
    def __init__(self, out_feats, activation=None, bias=True):
        super(NodeApplyModule, self).__init__()
        if bias:
            self.bias = nn.Parameter(torch.Tensor(out_feats))
        else:
            self.bias = None
        self.activation = activation
        self.reset_parameters()

    def reset_parameters(self):
        # uniform init in (-1/sqrt(out_feats), 1/sqrt(out_feats))
        if self.bias is not None:
            stdv = 1. / math.sqrt(self.bias.size(0))
            self.bias.data.uniform_(-stdv, stdv)

    def forward(self, nodes):
        h = nodes.data['h']
        if self.bias is not None:
            h = h + self.bias
        if self.activation:
            h = self.activation(h)
        return {'h': h}
```

With the Message function, the Reduce function, and the node update function in place, we chain them together:

```python
g.update_all(message_func='default', reduce_func='default', apply_node_func='default')
```

This function sends messages and updates all nodes; it is a simple combination of the send and recv functions.
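Conceptually, update_all runs three stages in order: a message on every edge, then a reduce and an apply on every node. The NumPy sketch below mimics that control flow with a hypothetical update_all_sketch helper (not a DGL function). Its stand-ins for gcn_msg, gcn_reduce, and NodeApplyModule work on plain per-edge and per-node dicts, a simplification of DGL's batched edge and node objects.

```python
import numpy as np

def update_all_sketch(edges, ndata, message_fn, reduce_fn, apply_fn):
    """Hypothetical stand-in for update_all: message per edge, then reduce and apply per node.

    Assumes every node has at least one incoming edge (self-loops guarantee this here).
    """
    num_nodes = len(ndata['h'])
    mailbox = [[] for _ in range(num_nodes)]
    for src, dst in edges:                        # send: one message per edge into dst's mailbox
        src_data = {key: val[src] for key, val in ndata.items()}
        mailbox[dst].append(message_fn(src_data)['m'])
    new_h = []
    for v in range(num_nodes):                    # recv: reduce the mailbox, then apply
        node_data = {key: val[v] for key, val in ndata.items()}
        h = reduce_fn(node_data, np.stack(mailbox[v]))['h']
        new_h.append(apply_fn(h)['h'])
    ndata['h'] = np.stack(new_h)                  # mailboxes are discarded after the update

def msg_fn(src):                                  # like gcn_msg: scale by the source norm
    return {'m': src['h'] * src['norm']}

def reduce_fn(node, msgs):                        # like gcn_reduce: sum, then scale by the dst norm
    return {'h': msgs.sum(axis=0) * node['norm']}

bias = np.array([0.1, -0.1])
def apply_fn(h):                                  # like NodeApplyModule: add bias, then ReLU
    return {'h': np.maximum(h + bias, 0.0)}

# Hypothetical path graph 0-1-2 with self-loops, stored as directed edges in both directions.
edges = [(0, 0), (1, 1), (2, 2), (0, 1), (1, 0), (1, 2), (2, 1)]
deg = np.array([2.0, 3.0, 2.0])                   # in-degrees, self-loops included
ndata = {'h': np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]), 'norm': deg ** -0.5}

update_all_sketch(edges, ndata, msg_fn, reduce_fn, apply_fn)
print(ndata['h'])
```

Keeping the three functions separate is what lets DGL swap any one of them (for example, a built-in reduce) without touching the others.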

3.2.3 GCNLayer

With these pieces in place, we can define GCNLayer:

```python
class GCNLayer(nn.Module):
    def __init__(self, g, in_feats, out_feats, activation, dropout, bias=True):
        super(GCNLayer, self).__init__()
        self.g = g
        self.weight = nn.Parameter(torch.Tensor(in_feats, out_feats))
        if dropout:
            self.dropout = nn.Dropout(p=dropout)
        else:
            self.dropout = 0.
        self.node_update = NodeApplyModule(out_feats, activation, bias)
        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1. / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)

    def forward(self, h):
        if self.dropout:
            h = self.dropout(h)
        # linear transform first, then propagate with the UDFs defined above
        self.g.ndata['h'] = torch.mm(h, self.weight)
        self.g.update_all(gcn_msg, gcn_reduce, self.node_update)
        h = self.g.ndata.pop('h')
        return h
```

We then stack GCNLayers to build the GCN network:

```python
class GCN(nn.Module):
    def __init__(self, g, in_feats, n_hidden, n_classes, n_layers, activation, dropout):
        super(GCN, self).__init__()
        self.layers = nn.ModuleList()
        # input layer
        self.layers.append(GCNLayer(g, in_feats, n_hidden, activation, dropout))
        # hidden layers
        for i in range(n_layers - 1):
            self.layers.append(GCNLayer(g, n_hidden, n_hidden, activation, dropout))
        # output layer
        self.layers.append(GCNLayer(g, n_hidden, n_classes, None, dropout))

    def forward(self, features):
        h = features
        for layer in self.layers:
            h = layer(h)
        return h
```

3.3 Third version

DGL's third approach uses DGL's built-in functions.

Because Message and Reduce functions are used so often, DGL provides built-in versions of them for convenience. Replacing our Message and Reduce functions with built-ins gives:

- dgl.function.copy_src(src, out): once h has been scaled by the source-side normalization factor, the Message function simply copies the source node's feature to the destination node, so the built-in copy_src can replace it.
- dgl.function.sum(msg, out): the Reduce function simply sums the messages in each node's mailbox, so the built-in sum can replace it; the destination-side normalization is applied afterwards.

```python
class GCNLayer(nn.Module):
    def __init__(self, g, in_feats, out_feats, activation, dropout, bias=True):
        super(GCNLayer, self).__init__()
        self.g = g
        self.weight = nn.Parameter(torch.Tensor(in_feats, out_feats))
        if bias:
            self.bias = nn.Parameter(torch.Tensor(out_feats))
        else:
            self.bias = None
        self.activation = activation
        if dropout:
            self.dropout = nn.Dropout(p=dropout)
        else:
            self.dropout = 0.
        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1. / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)
        if self.bias is not None:
            self.bias.data.uniform_(-stdv, stdv)

    def forward(self, h):
        if self.dropout:
            h = self.dropout(h)
        h = torch.mm(h, self.weight)
        # normalization by square root of src degree
        h = h * self.g.ndata['norm']
        self.g.ndata['h'] = h
        self.g.update_all(fn.copy_src(src='h', out='m'),
                          fn.sum(msg='m', out='h'))
        h = self.g.ndata.pop('h')
        # normalization by square root of dst degree
        h = h * self.g.ndata['norm']
        # bias
        if self.bias is not None:
            h = h + self.bias
        if self.activation:
            h = self.activation(h)
        return h
```

Normalization is applied twice here, once by the square root of the source degree before message passing and once by the square root of the destination degree after it; together they give the $\frac{1}{c_{ij}}$ factor in the GCN formula. The work of the node Apply function (bias and activation) is also folded into GCNLayer.

```python
class GCN(nn.Module):
    def __init__(self, g, in_feats, n_hidden, n_classes, n_layers, activation, dropout):
        super(GCN, self).__init__()
        self.layers = nn.ModuleList()
        # input layer
        self.layers.append(GCNLayer(g, in_feats, n_hidden, activation, 0.))
        # hidden layers
        for i in range(n_layers - 1):
            self.layers.append(GCNLayer(g, n_hidden, n_hidden, activation, dropout))
        # output layer
        self.layers.append(GCNLayer(g, n_hidden, n_classes, None, dropout))

    def forward(self, features):
        h = features
        for layer in self.layers:
            h = layer(h)
        return h
```
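The two-step normalization in the third version can be checked numerically. The NumPy sketch below imitates its forward pass on a hypothetical edge list: scale by the source-side factor $\deg^{-1/2}$, copy each source row onto its edge (the role of copy_src), scatter-add into the destinations (the role of sum, done here with np.add.at), then scale by the destination-side factor. It confirms the result equals applying $1/c_{ij} = 1/\sqrt{\deg_i \deg_j}$ per edge. The weight and bias are left out (equivalently $W = I$, $b = 0$).

```python
import numpy as np

# Hypothetical undirected graph with self-loops, stored as directed (src, dst) pairs.
src = np.array([0, 1, 2, 3, 0, 1, 0, 2, 2, 3])
dst = np.array([0, 1, 2, 3, 1, 0, 2, 0, 3, 2])
n = 4
deg = np.bincount(dst, minlength=n).astype(float)   # in-degrees
norm = deg ** -0.5                                   # plays the role of g.ndata['norm']

rng = np.random.default_rng(1)
h = rng.normal(size=(n, 3))                          # features after h @ W

# Third-version style: pre-normalize, copy_src + sum, post-normalize.
h_src = h * norm[:, None]                # normalization by sqrt of src degree
messages = h_src[src]                    # copy_src: one message per edge
agg = np.zeros_like(h)
np.add.at(agg, dst, messages)            # sum: scatter-add messages into their destinations
out_two_step = agg * norm[:, None]       # normalization by sqrt of dst degree

# Direct form: each edge j -> i contributes h_j / c_ij with c_ij = sqrt(deg_i * deg_j).
out_direct = np.zeros_like(h)
for j, i in zip(src, dst):
    out_direct[i] += h[j] / np.sqrt(deg[i] * deg[j])

print(np.allclose(out_two_step, out_direct))   # True
```

Splitting $1/c_{ij}$ into a per-source and a per-destination factor is what lets a plain copy-and-sum do the aggregation.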

4. Training

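```python
dropout = 0.5
gpu = -1
lr = 0.01
n_epochs = 200
n_hidden = 16          # number of units in each hidden layer
n_layers = 2           # the model builds n_layers + 1 GCN layers: input, n_layers - 1 hidden, output
weight_decay = 5e-4    # weight decay (L2 penalty)
self_loop = True       # add self-loops

# Cora dataset
data = citegrh.load_cora()
features = torch.FloatTensor(data.features)
labels = torch.LongTensor(data.labels)
train_mask = torch.BoolTensor(data.train_mask)
val_mask = torch.BoolTensor(data.val_mask)
test_mask = torch.BoolTensor(data.test_mask)
in_feats = features.shape[1]
n_classes = data.num_labels
n_edges = data.graph.number_of_edges()

# build the DGLGraph
g = data.graph
if self_loop:
    # materialize the self-loop list before removing edges from the same graph
    g.remove_edges_from(list(nx.selfloop_edges(g)))
    g.add_edges_from(zip(g.nodes(), g.nodes()))
g = DGLGraph(g)
```

You may be wondering why we first remove self-loops and then add them back.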

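This guards against self-loops that already exist in the dataset. Removing them first guarantees that every node ends up with exactly one self-loop; otherwise, in any graph type that allows parallel edges, nodes that already had a self-loop would get a second one while all other nodes get only one.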
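A small plain-Python illustration of this point, on a hypothetical edge list treated as a multigraph (duplicate edges allowed) that already contains the self-loop (0, 0). Blindly appending a self-loop per node leaves node 0 with two, which inflates its in-degree and therefore changes its $\deg^{-1/2}$ normalization factor; removing existing self-loops first gives every node exactly one. The last line also shows why the training code below sets the factor to 0 where $\deg^{-1/2}$ would be infinite: a node with no incoming edges has in-degree 0.

```python
# Hypothetical directed multigraph (duplicates allowed); (0, 0) is a pre-existing self-loop.
edges = [(0, 0), (0, 1), (1, 0), (1, 2), (2, 1)]
nodes = [0, 1, 2, 3]   # node 3 is isolated

def in_degrees(edges, nodes):
    deg = {v: 0 for v in nodes}
    for _, dst in edges:
        deg[dst] += 1
    return deg

def norm_factors(deg):
    # deg^(-1/2), using 0.0 instead of inf for in-degree 0 (mirrors norm[torch.isinf(norm)] = 0)
    return {v: (d ** -0.5 if d > 0 else 0.0) for v, d in deg.items()}

naive = edges + [(v, v) for v in nodes]                                   # just add self-loops
clean = [(s, d) for s, d in edges if s != d] + [(v, v) for v in nodes]   # remove, then add

print(in_degrees(naive, nodes))               # node 0 counts two self-loops
print(in_degrees(clean, nodes))               # every node has exactly one self-loop
print(norm_factors(in_degrees(edges, nodes))) # isolated node 3 gets factor 0.0
```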

```python
# move the data to the GPU if one is selected
if gpu < 0:
    cuda = False
else:
    cuda = True
    torch.cuda.set_device(gpu)
    features = features.cuda()
    labels = labels.cuda()
    train_mask = train_mask.cuda()
    val_mask = val_mask.cuda()
    test_mask = test_mask.cuda()

# normalization computed from in-degrees: norm = deg^(-1/2)
degs = g.in_degrees().float()
norm = torch.pow(degs, -0.5)
norm[torch.isinf(norm)] = 0    # nodes with in-degree 0 would otherwise get inf
if cuda:
    norm = norm.cuda()
g.ndata['norm'] = norm.unsqueeze(1)

# create a GCN model; any of the three GCN classes above can be used here
model = GCN(g, in_feats, n_hidden, n_classes, n_layers, F.relu, dropout)
if cuda:
    model.cuda()

# cross-entropy loss and the Adam optimizer
loss_fcn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)

# evaluation function: accuracy over the nodes selected by mask
def evaluate(model, features, labels, mask):
    model.eval()
    with torch.no_grad():
        logits = model(features)
        logits = logits[mask]
        labels = labels[mask]
        _, indices = torch.max(logits, dim=1)
        correct = torch.sum(indices == labels)
        return correct.item() * 1.0 / len(labels)

# train and evaluate
dur = []
for epoch in range(n_epochs):
    model.train()
    t0 = time.time()
    # forward
    logits = model(features)
    loss = loss_fcn(logits[train_mask], labels[train_mask])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    dur.append(time.time() - t0)
    if epoch % 10 == 0:
        acc = evaluate(model, features, labels, val_mask)
        print("Epoch {:05d} | Time(s) {:.4f} | Loss {:.4f} | Accuracy {:.4f} | "
              "ETputs(KTEPS) {:.2f}".format(epoch, np.mean(dur), loss.item(),
                                            acc, n_edges / np.mean(dur) / 1000))

print()
acc = evaluate(model, features, labels, test_mask)
print("Test accuracy {:.2%}".format(acc))
```

```
Epoch 00000 | Time(s) 0.0178 | Loss 1.9446 | Accuracy 0.2100 | ETputs(KTEPS) 594.54
Epoch 00010 | Time(s) 0.0153 | Loss 1.7609 | Accuracy 0.3533 | ETputs(KTEPS) 689.33
Epoch 00020 | Time(s) 0.0150 | Loss 1.5518 | Accuracy 0.5633 | ETputs(KTEPS) 703.47
Epoch 00030 | Time(s) 0.0146 | Loss 1.2769 | Accuracy 0.5867 | ETputs(KTEPS) 721.28
Epoch 00040 | Time(s) 0.0143 | Loss 1.0785 | Accuracy 0.6567 | ETputs(KTEPS) 740.36
Epoch 00050 | Time(s) 0.0140 | Loss 0.8881 | Accuracy 0.7067 | ETputs(KTEPS) 754.21
Epoch 00060 | Time(s) 0.0138 | Loss 0.6994 | Accuracy 0.7533 | ETputs(KTEPS) 763.21
Epoch 00070 | Time(s) 0.0137 | Loss 0.6249 | Accuracy 0.7800 | ETputs(KTEPS) 770.54
Epoch 00080 | Time(s) 0.0137 | Loss 0.5048 | Accuracy 0.7800 | ETputs(KTEPS) 772.31
Epoch 00090 | Time(s) 0.0136 | Loss 0.4457 | Accuracy 0.7867 | ETputs(KTEPS) 778.78
Epoch 00100 | Time(s) 0.0135 | Loss 0.4167 | Accuracy 0.7800 | ETputs(KTEPS) 782.25
Epoch 00110 | Time(s) 0.0134 | Loss 0.3389 | Accuracy 0.8000 | ETputs(KTEPS) 786.52
Epoch 00120 | Time(s) 0.0134 | Loss 0.3777 | Accuracy 0.8100 | ETputs(KTEPS) 789.85
Epoch 00130 | Time(s) 0.0133 | Loss 0.3307 | Accuracy 0.8133 | ETputs(KTEPS) 792.00
Epoch 00140 | Time(s) 0.0133 | Loss 0.2542 | Accuracy 0.7933 | ETputs(KTEPS) 794.13
Epoch 00150 | Time(s) 0.0133 | Loss 0.2937 | Accuracy 0.8000 | ETputs(KTEPS) 795.73
Epoch 00160 | Time(s) 0.0132 | Loss 0.2944 | Accuracy 0.8333 | ETputs(KTEPS) 797.04
Epoch 00170 | Time(s) 0.0132 | Loss 0.2161 | Accuracy 0.8167 | ETputs(KTEPS) 799.74
Epoch 00180 | Time(s) 0.0132 | Loss 0.1972 | Accuracy 0.8200 | ETputs(KTEPS) 801.31
Epoch 00190 | Time(s) 0.0131 | Loss 0.2339 | Accuracy 0.8167 | ETputs(KTEPS) 802.92

Test accuracy 80.40%
```

5. Conclusion

That's all for this tutorial. There are of course other ways to implement a GCN, for example iterating directly with matrix multiplication.
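For reference, a minimal NumPy sketch of that matrix-multiplication alternative. It assumes the usual renormalized adjacency $\hat{A} = \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$, the dense counterpart of adding self-loops and normalizing on both sides as above. Each layer is then $H \leftarrow \sigma(\hat{A} H W + b)$, so a forward pass is just a loop of dense products. The graph, sizes, and weights are random illustration values; dropout is omitted, and the last layer has no activation, as in the models above.

```python
import numpy as np

def normalized_adjacency(A):
    """A_hat = D^{-1/2} (A + I) D^{-1/2} for a symmetric 0/1 adjacency A without self-loops."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))     # every degree >= 1 thanks to the self-loop
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A_hat, X, weights, biases):
    """Stack of GCN layers as repeated dense products; ReLU on every layer except the last."""
    H = X
    for k, (W, b) in enumerate(zip(weights, biases)):
        H = A_hat @ H @ W + b
        if k < len(weights) - 1:
            H = np.maximum(H, 0.0)
    return H

rng = np.random.default_rng(0)
n, in_feats, n_hidden, n_classes = 5, 4, 8, 3
upper = np.triu(rng.integers(0, 2, size=(n, n)), k=1)    # random undirected graph
A = (upper + upper.T).astype(float)
X = rng.normal(size=(n, in_feats))
sizes = [in_feats, n_hidden, n_hidden, n_classes]         # input, one hidden, output layer
weights = [rng.normal(size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

A_hat = normalized_adjacency(A)
logits = gcn_forward(A_hat, X, weights, biases)
print(logits.shape)   # (5, 3)
```

Because $\hat{A}$ is computed once and reused by every layer, this form is compact, but it stores a dense $n \times n$ matrix, which is exactly what the message-passing versions avoid on large sparse graphs.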

