[DL] Mastering PyTorch (PyTorch 折桂) 6: torch.nn Overview & torch.nn.Module

2021-03-02 花解語NLP

1. torch.nn overview

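PyTorch collects the classes used to build deep learning models in the torch.nn submodule. Grouped by function, the commonly used ones fall into the following thirteen categories: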

Containers: container classes, e.g. torch.nn.Module
Convolution Layers: convolution layers, e.g. torch.nn.Conv2d
Pooling Layers: pooling layers, e.g. torch.nn.MaxPool2d
Non-linear activations: non-linear activation layers, e.g. torch.nn.ReLU
Normalization layers: normalization layers, e.g. torch.nn.BatchNorm2d
Recurrent layers: recurrent layers, e.g. torch.nn.LSTM
Transformer layers: transformer layers, e.g. torch.nn.TransformerEncoder
Linear layers: fully connected layers, e.g. torch.nn.Linear
Dropout layers: dropout layers, e.g. torch.nn.Dropout
Sparse layers: sparse layers, e.g. torch.nn.Embedding
Vision layers: vision layers, e.g. torch.nn.Upsample
DataParallel layers: parallel computation layers, e.g. torch.nn.DataParallel
Utilities: other utilities, e.g. torch.nn.utils.clip_grad_value_

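Under torch.nn there is also a submodule, torch.nn.functional, which for the most part provides function counterparts of the classes in torch.nn. For example, the counterpart of torch.nn.ReLU is torch.nn.functional.relu. Why have both?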

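You may wonder why two modules with such similar functionality are needed, but the design has its reasons. If we kept only the functions under nn.functional, we would have to maintain values such as weight, bias, and stride by hand during training and inference, which is clearly inconvenient for users. If we kept only the classes under nn, we would give up some flexibility, because even a simple computation would require creating a class, which does not fit PyTorch's style. (Zhihu answer [1])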
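The trade-off in the quote can be illustrated with a minimal sketch. It assumes a toy input of shape (4, 10); the names x, layer, weight, and bias are illustrative only. The class form keeps its own weight and bias, while the functional form expects the caller to supply them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 10)  # toy batch: 4 samples, 10 features (illustrative)

# Class form: nn.Linear creates and stores weight and bias itself.
layer = nn.Linear(10, 1)
out_class = layer(x)

# Functional form: the caller has to create and keep track of the tensors.
weight = layer.weight.detach().clone()  # shape (1, 10)
bias = layer.bias.detach().clone()      # shape (1,)
out_func = F.linear(x, weight, bias)

# For a stateless operation such as ReLU the two forms are interchangeable.
relu_class = nn.ReLU()(out_class)
relu_func = F.relu(out_func)

print(torch.allclose(out_class, out_func))
print(torch.allclose(relu_class, relu_func))
```

With several layers, the functional form would need one such weight and bias pair per layer, which is the bookkeeping the class form hides.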

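A torch.nn layer assigned as an attribute is registered by nn.Module and becomes part of the network; a call to torch.nn.functional is not. Compare the following two models: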

>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F
>>> class Simple(nn.Module):
...     def __init__(self):
...         super(Simple, self).__init__()
...         self.fc = nn.Linear(10, 1)
...         self.dropout = nn.Dropout(0.5)  # uses the nn.Dropout class
...
...     def forward(self, x):
...         x = self.fc(x)
...         x = self.dropout(x)
...         return x
...
>>> simple = Simple()
>>> print(simple)
Simple(
  (fc): Linear(in_features=10, out_features=1, bias=True)
  (dropout): Dropout(p=0.5, inplace=False) # registered as a layer
)

>>> class Simple2(nn.Module):
...     def __init__(self):
...         super(Simple2, self).__init__()
...         self.fc = nn.Linear(10, 1)
...
...     def forward(self, x):
...         # nn.functional.dropout is a plain function call, so it is not registered
...         x = F.dropout(self.fc(x))
...         return x
...
>>> simple2 = Simple2()
>>> print(simple2)
Simple2(
  (fc): Linear(in_features=10, out_features=1, bias=True)
)
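When should you use torch.nn, and when torch.nn.functional? A common rule of thumb is: use torch.nn.functional when there are no weights to store, and torch.nn when there are.
Activation functions, for example, can use torch.nn.functional. Dropout deserves a separate note. It has no weights, so in principle torch.nn.functional.dropout would do. However, dropout behaves differently in train and eval mode. torch.nn.Dropout follows model.train() and model.eval() automatically, whereas torch.nn.functional.dropout defaults to training=True and ignores those calls unless you pass training=self.training yourself. For convenience, torch.nn.Dropout is recommended.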
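The difference in mode handling can be checked with a short sketch. The class names WithModule and WithFunctional are hypothetical, and the all-ones input is only there to make dropped elements easy to spot.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class WithModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        # nn.Dropout reads the module's training flag on its own
        return self.dropout(x)

class WithFunctional(nn.Module):
    def forward(self, x):
        # the training flag has to be forwarded by hand
        return F.dropout(x, p=0.5, training=self.training)

x = torch.ones(1000)
m1 = WithModule().eval()
m2 = WithFunctional().eval()

eval_identity_module = torch.equal(m1(x), x)      # dropout disabled in eval mode
eval_identity_functional = torch.equal(m2(x), x)  # disabled because the flag was passed

# Called with its defaults, F.dropout keeps dropping elements regardless of eval().
always_on = F.dropout(x, p=0.5)
num_dropped = int((always_on == 0).sum())

print(eval_identity_module, eval_identity_functional, num_dropped > 0)
```

Forgetting the training argument is easy to miss because the model still runs; it only makes evaluation results noisy.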
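From here on, unless stated otherwise, the torch prefix is omitted from module names.
Creating a model takes two steps: building the model and initializing its weights. Building the model in turn has two steps: defining the individual layers, and connecting them together.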
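The two steps can be sketched as follows. This is only one possible arrangement: the class TwoLayerNet and the helper init_weights are hypothetical, and Xavier-uniform initialization is an arbitrary choice for illustration, not something the article prescribes.

```python
import torch
import torch.nn as nn

class TwoLayerNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Step 1a: define the individual layers
        self.fc1 = nn.Linear(10, 5)
        self.fc2 = nn.Linear(5, 1)

    def forward(self, x):
        # Step 1b: connect the layers
        return self.fc2(torch.relu(self.fc1(x)))

def init_weights(module):
    # Hypothetical helper: Xavier-uniform weights and zero biases for Linear layers
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)

model = TwoLayerNet()
# Step 2: initialize the weights; apply() calls the helper on every submodule
model.apply(init_weights)

out = model(torch.randn(4, 10))
print(out.shape)
```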
2. torch.nn.Module
torch.nn.Module is the base class of the layer and container classes in torch.nn. Let's look at a very simple neural network:
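class SimpleNet(nn.Module):
    def __init__(self, x):
        super(SimpleNet, self).__init__()
        # x is a sample input of shape (batch, features); its last dimension
        # gives the number of input features of the linear layer
        self.fc = nn.Linear(x.shape[-1], 1)

    def forward(self, x):
        x = self.fc(x)
        return x
We feed it an arbitrary tensor and print the network:
>>> simpleNet = SimpleNet(torch.randn(10, 2))  # 10 samples, 2 features each
>>> print(simpleNet)
SimpleNet(
  (fc): Linear(in_features=2, out_features=1, bias=True)
)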
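Every custom neural network must inherit from torch.nn.Module. The individual layers are defined in __init__, and they are connected together in forward. A network class has two important methods: parameters() iterates over the model's weights, and modules() iterates over the model itself and all of its submodules, which reflects the model's structure.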
>>> list(simpleNet.modules())
[SimpleNet(
   (fc): Linear(in_features=2, out_features=1, bias=True)
 ),
 Linear(in_features=2, out_features=1, bias=True)]

>>> list(simpleNet.parameters())
[Parameter containing:
 tensor([[ 0.1533, -0.2574]], requires_grad=True),
 Parameter containing:
 tensor([-0.1589], requires_grad=True)]
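When the bare lists are hard to read, the named variants of these methods can help. The sketch below uses a variant of SimpleNet that takes the feature count directly (an assumption made for brevity) and relies on named_parameters() and named_modules(), which pair each entry with its attribute path.

```python
import torch.nn as nn

class SimpleNet(nn.Module):
    # Variant for illustration: takes the number of input features directly
    def __init__(self, in_features):
        super().__init__()
        self.fc = nn.Linear(in_features, 1)

    def forward(self, x):
        return self.fc(x)

net = SimpleNet(2)

# Each parameter is paired with its dotted attribute path
param_shapes = {name: tuple(p.shape) for name, p in net.named_parameters()}
# The root module is listed first, under the empty name
module_names = [name for name, _ in net.named_modules()]
# Total number of scalar weights: 2 (weight) + 1 (bias)
num_params = sum(p.numel() for p in net.parameters())

print(param_shapes)  # {'fc.weight': (1, 2), 'fc.bias': (1,)}
print(module_names)  # ['', 'fc']
print(num_params)    # 3
```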
3. torch.nn.Sequential
This is a sequential container. It can be used on its own to build a complete model, or placed inside a model as one of its parts.
# Used on its own as a complete model
model1 = nn.Sequential(
    nn.Conv2d(1, 20, 5),
    nn.ReLU(),
    nn.Conv2d(20, 64, 5),
    nn.ReLU(),
)

# Used as part of a larger model
class LeNetSequential(nn.Module):
    def __init__(self, classes):
        super(LeNetSequential, self).__init__()
        # convolutional feature extractor
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, 5),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(6, 16, 5),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        # fully connected classifier
        self.classifier = nn.Sequential(
            nn.Linear(16*5*5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 16*5*5)
        x = self.classifier(x)
        return x
When it is placed inside a model, the model still needs its own __init__ and forward methods.
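The 16*5*5 input size of the classifier follows from the feature extractor's output shape. The sketch below traces the shapes layer by layer, assuming 32x32 RGB inputs; the input size is an assumption, since the article does not state it.

```python
import torch
import torch.nn as nn

# Same layer configuration as the features block of LeNetSequential
features = nn.Sequential(
    nn.Conv2d(3, 6, 5),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, 5),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

x = torch.randn(4, 3, 32, 32)  # assumed input: batch of 4 RGB images, 32x32
shapes = []
for layer in features:  # a Sequential can be iterated layer by layer
    x = layer(x)
    shapes.append((type(layer).__name__, tuple(x.shape)))

flat = x.view(x.size(0), -1)  # the same flattening as in forward

for name, shape in shapes:
    print(name, shape)
print(tuple(flat.shape))  # (4, 400), i.e. 16*5*5 features per sample
```

A different input resolution would change the flattened size, and the first nn.Linear in the classifier would have to change with it.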
Layers in a model built from positional arguments have no names, only integer indices:
>>> model2 = nn.Sequential(
...           nn.Conv2d(1,20,5),
...           nn.ReLU(),
...           nn.Conv2d(20,64,5),
...           nn.ReLU()
...         )
>>> model2
Sequential(
  (0): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
  (1): ReLU()
  (2): Conv2d(20, 64, kernel_size=(5, 5), stride=(1, 1))
  (3): ReLU()
)
To tell the layers apart more easily, we can pass an OrderedDict from the collections module:
>>> from collections import OrderedDict
>>> model3 = nn.Sequential(OrderedDict([
...           ('conv1', nn.Conv2d(1,20,5)),
...           ('relu1', nn.ReLU()),
...           ('conv2', nn.Conv2d(20,64,5)),
...           ('relu2', nn.ReLU())
...         ]))
>>> model3
Sequential(
  (conv1): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
  (relu1): ReLU()
  (conv2): Conv2d(20, 64, kernel_size=(5, 5), stride=(1, 1))
  (relu2): ReLU()
)
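Named layers are also easier to reach afterwards. As a small sketch, a Sequential built this way can be accessed by attribute as well as by index, and the names appear as prefixes in the state dict keys.

```python
from collections import OrderedDict

import torch.nn as nn

model = nn.Sequential(OrderedDict([
    ('conv1', nn.Conv2d(1, 20, 5)),
    ('relu1', nn.ReLU()),
    ('conv2', nn.Conv2d(20, 64, 5)),
    ('relu2', nn.ReLU()),
]))

by_name = model.conv1   # attribute access uses the given name
by_index = model[0]     # integer indexing still works
child_names = [name for name, _ in model.named_children()]
state_keys = list(model.state_dict().keys())

print(by_name is by_index)  # True
print(child_names)          # ['conv1', 'relu1', 'conv2', 'relu2']
print(state_keys)           # ['conv1.weight', 'conv1.bias', 'conv2.weight', 'conv2.bias']
```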
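4. torch.nn.ModuleList
This stores layers in a list. A network can be generated quickly with a list comprehension, the resulting layers can be indexed, and the container also provides the list methods append, extend, and insert.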
>>> class MyModule(nn.Module):
...     def __init__(self):
...         super(MyModule, self).__init__()
...         self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)])
...         self.linears.append(nn.Linear(10, 1))  # append
...
...     def forward(self, x):
...         # a ModuleList can be indexed and iterated like a Python list
...         for i, l in enumerate(self.linears):
...             x = self.linears[i // 2](x) + l(x)
...         return x
...
>>> my_module = MyModule()
>>> my_module
MyModule(
  (linears): ModuleList(
    (0): Linear(in_features=10, out_features=10, bias=True)
    (1): Linear(in_features=10, out_features=10, bias=True)
    (2): Linear(in_features=10, out_features=10, bias=True)
    (3): Linear(in_features=10, out_features=10, bias=True)
    (4): Linear(in_features=10, out_features=10, bias=True)
    (5): Linear(in_features=10, out_features=10, bias=True)
    (6): Linear(in_features=10, out_features=10, bias=True)
    (7): Linear(in_features=10, out_features=10, bias=True)
    (8): Linear(in_features=10, out_features=10, bias=True)
    (9): Linear(in_features=10, out_features=10, bias=True)
    (10): Linear(in_features=10, out_features=1, bias=True) # the appended layer
  )
)
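Why not use a plain Python list? A minimal sketch of the difference, with the hypothetical classes WithPlainList and WithModuleList: layers kept in an ordinary list are not registered, so their parameters are invisible to parameters() and therefore to an optimizer built from it.

```python
import torch.nn as nn

class WithPlainList(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain list: the layers exist, but nn.Module does not register them
        self.linears = [nn.Linear(10, 10) for _ in range(3)]

class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        # ModuleList: every layer is registered as a submodule
        self.linears = nn.ModuleList([nn.Linear(10, 10) for _ in range(3)])

plain_count = len(list(WithPlainList().parameters()))
registered_count = len(list(WithModuleList().parameters()))

print(plain_count)       # 0
print(registered_count)  # 6: one weight and one bias per layer
```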
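5. torch.nn.ModuleDict
This container resembles torch.nn.Sequential(OrderedDict(...)) above in that layers are stored under names, and it provides dictionary methods such as keys, values, items, pop, and update. Unlike Sequential, it has no forward of its own, so the model decides which layer to call: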
>>> class MyDictDense(nn.Module):
...     def __init__(self):
...         super(MyDictDense, self).__init__()
...         self.params = nn.ModuleDict({
...                 'linear1': nn.Linear(512, 128),
...                 'linear2': nn.Linear(128, 32)
...         })
...         self.params.update({'linear3': nn.Linear(32, 10)})  # add a layer
...
...     def forward(self, x, choice='linear1'):
...         # look up the chosen layer by key and apply it to the input
...         return self.params[choice](x)
...
>>> net = MyDictDense()
>>> print(net)
MyDictDense(
  (params): ModuleDict(
    (linear1): Linear(in_features=512, out_features=128, bias=True)
    (linear2): Linear(in_features=128, out_features=32, bias=True)
    (linear3): Linear(in_features=32, out_features=10, bias=True)
  )
)

>>> print(net.params.keys())
odict_keys(['linear1', 'linear2', 'linear3'])

>>> print(net.params.items())
odict_items([('linear1', Linear(in_features=512, out_features=128, bias=True)), ('linear2', Linear(in_features=128, out_features=32, bias=True)), ('linear3', Linear(in_features=32, out_features=10, bias=True))])
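A typical use of key lookup is choosing between alternative layers at call time. The sketch below is a hypothetical example: the class Branching and the argument act are illustrative names, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class Branching(nn.Module):
    # Hypothetical model that picks its activation by key
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 4)
        self.activations = nn.ModuleDict({
            'relu': nn.ReLU(),
            'tanh': nn.Tanh(),
        })

    def forward(self, x, act='relu'):
        # the key selects which registered module is applied
        return self.activations[act](self.fc(x))

torch.manual_seed(0)
net = Branching()
x = torch.randn(2, 8)
out_relu = net(x, act='relu')
out_tanh = net(x, act='tanh')

print(out_relu.shape, out_tanh.shape)
print(list(net.activations.keys()))
```

Both activations are registered submodules, so they show up in print(net) and move with the model on calls such as net.to(device).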
References
[1] PyTorch 中，nn 與 nn.functional 有什麼區別？ (In PyTorch, what is the difference between nn and nn.functional?): https://www.zhihu.com/question/66782101
