Preface: this series of articles will begin with the history of the classic convolutional neural networks, and then walk through each network's structure, code implementation, and directions for optimization.
(The following content is translated from the Stanford course notes: http://cs231n.github.io/convolutional-networks/)
There are several architectures in the field of Convolutional Networks that have a name. The most common are:
LeNet. The first successful applications of Convolutional Networks were developed by Yann LeCun in the 1990s. Of these, the best known is the LeNet architecture, which was used to read zip codes, digits, etc.
AlexNet. The first work that popularized Convolutional Networks in Computer Vision was AlexNet, developed by Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton. AlexNet was submitted to the ImageNet ILSVRC challenge in 2012 and significantly outperformed the runner-up (top-5 error of 16%, compared to the runner-up's 26%). The network had a very similar architecture to LeNet, but was deeper and bigger, and featured convolutional layers stacked directly on top of each other (previously it was common to have only a single CONV layer, always immediately followed by a POOL layer); the contrast is sketched in code below.
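A minimal sketch of that contrast, assuming tensorflow.keras; the layer sizes are illustrative, not the actual LeNet or AlexNet configurations:

```python
from tensorflow.keras import layers, models

# LeNet-era pattern: each convolution is immediately followed by pooling.
lenet_style = models.Sequential([
    layers.Conv2D(6, 5, activation='relu', input_shape=(32, 32, 1)),
    layers.MaxPooling2D(2),
    layers.Conv2D(16, 5, activation='relu'),
    layers.MaxPooling2D(2),
])

# AlexNet-style pattern: several convolutions stacked before one pooling step.
stacked_style = models.Sequential([
    layers.Conv2D(96, 3, padding='same', activation='relu', input_shape=(224, 224, 3)),
    layers.Conv2D(96, 3, padding='same', activation='relu'),
    layers.Conv2D(96, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(2),
])
```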
GoogLeNet. The ILSVRC 2014 winner was a Convolutional Network from Szegedy et al. at Google. Its main contribution was the development of an Inception module that dramatically reduced the number of parameters in the network (4M, compared to AlexNet's 60M). Additionally, the paper uses average pooling instead of fully connected layers at the top of the ConvNet, eliminating a large number of parameters that do not seem to matter much. There are also several follow-up versions of GoogLeNet, most recently Inception-v4.
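To make the parameter-saving idea concrete, here is a minimal sketch of an Inception-style module, assuming tensorflow.keras; the filter counts are illustrative, not the exact numbers from the paper. The 1×1 "bottleneck" convolutions shrink the channel count before the expensive 3×3 and 5×5 branches, which is where most of the savings come from.

```python
from tensorflow.keras import layers

def inception_module(x, f1=64, f3=128, f5=32, fp=32):
    # Branch 1: plain 1x1 convolution.
    b1 = layers.Conv2D(f1, 1, padding='same', activation='relu')(x)
    # Branch 2: 1x1 bottleneck, then 3x3 convolution.
    b2 = layers.Conv2D(f3 // 2, 1, padding='same', activation='relu')(x)
    b2 = layers.Conv2D(f3, 3, padding='same', activation='relu')(b2)
    # Branch 3: 1x1 bottleneck, then 5x5 convolution.
    b3 = layers.Conv2D(f5 // 2, 1, padding='same', activation='relu')(x)
    b3 = layers.Conv2D(f5, 5, padding='same', activation='relu')(b3)
    # Branch 4: 3x3 max pooling, then 1x1 projection.
    b4 = layers.MaxPooling2D(3, strides=1, padding='same')(x)
    b4 = layers.Conv2D(fp, 1, padding='same', activation='relu')(b4)
    # Concatenate all branches along the channel axis.
    return layers.concatenate([b1, b2, b3, b4])

# At the top of the network, global average pooling replaces the FC stack:
#   x = layers.GlobalAveragePooling2D()(x)
#   outputs = layers.Dense(num_classes, activation='softmax')(x)
```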
VGGNet. The runner-up in ILSVRC 2014 was the network from Karen Simonyan and Andrew Zisserman that became known as VGGNet. Its main contribution was showing that the depth of the network is a critical component of good performance. Their final best network contains 16 CONV/FC layers and, appealingly, features an extremely homogeneous architecture that performs only 3×3 convolutions and 2×2 pooling from beginning to end. Their pretrained model is available for plug-and-play use in Caffe. A downside of VGGNet is that it is more expensive to evaluate and uses a lot more memory and parameters (140M). Most of these parameters are in the first fully connected layer, and it has since been found that these FC layers can be removed with no performance downgrade, significantly reducing the number of necessary parameters.
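That homogeneity makes the 16-layer configuration short to express. A minimal sketch, assuming tensorflow.keras; it illustrates the 3×3-conv / 2×2-pool pattern rather than faithfully reproducing the released model (preprocessing, initialization, and dropout are omitted):

```python
from tensorflow.keras import layers, models

def vgg_block(x, filters, n_convs):
    # A run of 3x3 convolutions followed by a single 2x2 max pooling.
    for _ in range(n_convs):
        x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return layers.MaxPooling2D(2, strides=2)(x)

inputs = layers.Input(shape=(224, 224, 3))
x = vgg_block(inputs, 64, 2)
x = vgg_block(x, 128, 2)
x = vgg_block(x, 256, 3)
x = vgg_block(x, 512, 3)
x = vgg_block(x, 512, 3)          # 13 CONV layers so far
x = layers.Flatten()(x)
x = layers.Dense(4096, activation='relu')(x)   # the FC layers hold most of the ~140M parameters
x = layers.Dense(4096, activation='relu')(x)
outputs = layers.Dense(1000, activation='softmax')(x)  # 13 CONV + 3 FC = 16 weight layers
model = models.Model(inputs, outputs)
```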
ResNet. The Residual Network, developed by Kaiming He et al., was the winner of ILSVRC 2015. It features special skip connections and heavy use of batch normalization. The architecture also omits fully connected layers at the end of the network. The reader is also referred to Kaiming He's presentation (video, slides), and to some recent experiments that reproduce these networks in Torch. ResNets are currently by far the state-of-the-art Convolutional Neural Network models and are the default choice for using ConvNets in practice (as of May 10, 2016). In particular, also see the more recent developments that tweak the original architecture, in Kaiming He et al.'s Identity Mappings in Deep Residual Networks (published March 2016).
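The two ideas called out above, skip connections and batch normalization, combine into the basic residual block. A minimal sketch, assuming tensorflow.keras; it shows only the identity-shortcut case, where the input already has `filters` channels, and omits the projection shortcut used when the channel count changes:

```python
from tensorflow.keras import layers

def residual_block(x, filters):
    # Identity skip connection: requires x to already have `filters` channels.
    shortcut = x
    y = layers.Conv2D(filters, 3, padding='same')(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.BatchNormalization()(y)
    # Add the input back in, then apply the final nonlinearity.
    y = layers.add([y, shortcut])
    return layers.Activation('relu')(y)
```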
Downloads of the classic convolutional neural network papers:
VGGNet (arXiv 1409.1556): http://www.tensorflownews.com/wp-content/uploads/2018/04/1409.1556.pdf
ResNet (arXiv 1512.03385): http://www.tensorflownews.com/wp-content/uploads/2018/04/1512.03385.pdf
Implementations of the deep residual network in different frameworks:
Original version by the authors
KaimingHe/deep-residual-networks
Deep Residual Learning for Image Recognition
https://github.com/KaimingHe/deep-residual-networks
TensorFlow version
ry/tensorflow-resnet
ResNet model in TensorFlow
https://github.com/ry/tensorflow-resnet
Keras version
raghakot/keras-resnet
Residual networks implementation using Keras-1.0 functional API
https://github.com/raghakot/keras-resnet
Torch version
facebook/fb.resnet.torch
Torch implementation of ResNet from http://arxiv.org/abs/1512.03385 and training scripts
https://github.com/facebook/fb.resnet.torch
The next article will cover the LeNet convolutional neural network: its structure, code implementation, and directions for optimization.