This resource organizes the core deep learning concepts relevant to natural language processing (NLP), together with the latest papers (as of 2019) associated with each concept. It covers optimization algorithms (Adam, Adagrad, AMSGrad, mini-batch SGD, etc.), parameter initialization (Glorot initialization, He initialization), regularization (Dropout, word dropout, patience, weight decay, etc.), normalization, loss functions, training methods, activation functions, and core architectures such as CNNs and RNNs.
The resource covers two aspects: (1) it organizes the core technical concepts of deep learning and NLP; (2) it collects the latest papers related to each concept. Highly recommended.
The resource was compiled from the web; original source: https://github.com/neulab/nn4nlp-concepts/blob/master/concepts.md
Download link for the version with paper links:
Link: https://pan.baidu.com/s/1lC8DiPJnyzbxtvns-HXr_w
Extraction code: yv6g
Parameter Optimization / Learning
Optimizers and Optimization Strategies
•Mini-batch SGD: optim-sgd
•Adam: optim-adam (implies optim-sgd)
•Adagrad: optim-adagrad (implies optim-sgd)
•Adadelta: optim-adadelta (implies optim-sgd)
•Adam with Specialized Transformer Learning Rate ("Noam" Schedule): optim-noam (implies optim-adam)
•SGD with Momentum: optim-momentum (implies optim-sgd)
•AMSGrad: optim-amsgrad (implies optim-sgd)
•Projection / Projected Gradient Descent: optim-projection (implies optim-sgd)
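The optimizer tags above map directly onto standard deep learning library calls. As a minimal sketch, assuming PyTorch (the model, learning rates, and warmup length are illustrative placeholders, not part of the original resource), mini-batch SGD, momentum, Adam/AMSGrad, and a Noam-style warmup schedule might be wired up like this:

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 10)  # placeholder model

# optim-sgd / optim-momentum: mini-batch SGD, optionally with momentum
sgd = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# optim-adam / optim-amsgrad: Adam and its AMSGrad variant
adam = torch.optim.Adam(model.parameters(), lr=1e-3)
amsgrad = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)

# optim-noam: the "Noam" schedule from the Transformer paper,
# lr ~ d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)
# (LambdaLR multiplies Adam's base lr by this factor)
d_model, warmup = 512, 4000
noam = torch.optim.lr_scheduler.LambdaLR(
    adam,
    lr_lambda=lambda step: (d_model ** -0.5) *
        min(max(step, 1) ** -0.5, max(step, 1) * warmup ** -1.5),
)
```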
Parameter Initialization
•Glorot/Xavier Initialization: init-glorot
•He Initialization: init-he
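Both initializers are available out of the box in common frameworks. A small illustrative sketch, assuming PyTorch (layer sizes are placeholders):

```python
import torch.nn as nn
import torch.nn.init as init

layer = nn.Linear(256, 256)

# init-glorot: Glorot/Xavier initialization (a common default for tanh/sigmoid layers)
init.xavier_uniform_(layer.weight)

# init-he: He/Kaiming initialization (designed for ReLU activations)
init.kaiming_uniform_(layer.weight, nonlinearity='relu')

init.zeros_(layer.bias)
```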
Regularization Strategies
•Dropout: reg-dropout
•Word Dropout: reg-worddropout (implies reg-dropout)
•Norm (L1/L2) Regularization: reg-norm
•Early Stopping: reg-stopping
•Patience: reg-patience (implies reg-stopping)
•Weight Decay: reg-decay
•Label Smoothing: reg-labelsmooth
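Several of these regularizers are one-line options in modern frameworks; early stopping with patience is a small training-loop pattern. A hedged sketch, assuming PyTorch >= 1.10 (the model, hyperparameters, and the random "validation loss" are placeholders only):

```python
import torch
import torch.nn as nn

# reg-dropout: randomly zero hidden units during training
model = nn.Sequential(nn.Linear(300, 128), nn.ReLU(),
                      nn.Dropout(p=0.5), nn.Linear(128, 5))

# reg-decay / reg-norm: L2 regularization via the optimizer's weight_decay term
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

# reg-labelsmooth: label smoothing on the cross-entropy loss (PyTorch >= 1.10)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# reg-stopping / reg-patience: stop when the dev loss has not improved
# for `patience` consecutive epochs
best, patience, bad_epochs = float('inf'), 3, 0
for epoch in range(100):
    dev_loss = float(torch.rand(1))  # placeholder for a real validation loss
    if dev_loss < best:
        best, bad_epochs = dev_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```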
Normalization Strategies
•Layer Normalization: norm-layer
•Batch Normalization: norm-batch
•Gradient Clipping: norm-gradient
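As a minimal sketch of the three tags above, assuming PyTorch (tensor shapes are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(32, 20, 512)   # (batch, sequence length, features)

# norm-layer: normalize over the feature dimension of each token
layer_norm = nn.LayerNorm(512)
h = layer_norm(x)

# norm-batch: normalize each feature over the batch (expects channels first)
batch_norm = nn.BatchNorm1d(512)
h = batch_norm(x.transpose(1, 2)).transpose(1, 2)

# norm-gradient: clip the global gradient norm before the optimizer step
model = nn.Linear(512, 512)
loss = model(x).sum()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```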
Loss Functions
•Canonical Correlation Analysis (CCA): loss-cca
•Singular Value Decomposition (SVD): loss-svd
•Margin-based Loss Functions: loss-margin
•Contrastive Loss: loss-cons
•Noise Contrastive Estimation (NCE): loss-nce (implies loss-cons)
•Triplet Loss: loss-triplet (implies loss-cons)
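Margin-based and triplet losses are simple to write down. A small sketch, assuming PyTorch, with random vectors standing in for learned representations:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

anchor   = torch.randn(16, 128)
positive = torch.randn(16, 128)
negative = torch.randn(16, 128)

# loss-triplet: pull the anchor toward the positive and away from the negative
triplet = nn.TripletMarginLoss(margin=1.0)
loss_t = triplet(anchor, positive, negative)

# loss-margin: a generic hinge-style margin loss on similarity scores
pos_score = F.cosine_similarity(anchor, positive)
neg_score = F.cosine_similarity(anchor, negative)
loss_m = torch.clamp(1.0 - pos_score + neg_score, min=0).mean()
```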
Training Methods
•Multi-task Learning (MTL): train-mtl
•Multi-lingual Learning (MLL): train-mll (implies train-mtl)
•Transfer Learning: train-transfer
•Active Learning: train-active
•Data Augmentation: train-augment
•Curriculum Learning: train-curriculum
•Parallel Training: train-parallel
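Multi-task learning (train-mtl) is most easily pictured as one shared encoder with one output head per task. A minimal sketch, assuming PyTorch; the choice of tasks (a tagger and a classifier) and all dimensions are illustrative, not from the source:

```python
import torch
import torch.nn as nn

# train-mtl: one shared encoder, one output head per task
class MultiTaskModel(nn.Module):
    def __init__(self, vocab_size=10000, dim=256, n_tags=10, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)  # shared parameters
        self.tagger = nn.Linear(dim, n_tags)         # task A: sequence labeling head
        self.classifier = nn.Linear(dim, n_classes)  # task B: text classification head

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))
        return self.tagger(h), self.classifier(h.mean(dim=1))

model = MultiTaskModel()
tags, label = model(torch.randint(0, 10000, (8, 20)))
```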
Sequence Model Architectures
Activation Functions
•Hyperbolic Tangent (tanh): activ-tanh
•Rectified Linear Units (ReLU): activ-relu
Pooling Operations
•Max Pooling: pool-max
•Mean Pooling: pool-mean
•k-Max Pooling: pool-kmax
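The activation and pooling tags above reduce to a few tensor operations. A hedged sketch, assuming PyTorch (shapes and k are placeholders):

```python
import torch

h = torch.randn(32, 20, 256)   # (batch, sequence length, hidden size)

# activ-tanh / activ-relu: elementwise non-linearities
a = torch.tanh(h)
r = torch.relu(h)

# pool-max / pool-mean: collapse the time dimension into one vector per sentence
max_pooled  = h.max(dim=1).values     # (32, 256)
mean_pooled = h.mean(dim=1)           # (32, 256)

# pool-kmax: keep the k largest values per feature, preserving their original order
k = 3
idx = h.topk(k, dim=1).indices.sort(dim=1).values
kmax_pooled = h.gather(1, idx)        # (32, 3, 256)
```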
Recurrent Architectures
•Recurrent Neural Network (RNN): arch-rnn
•Bi-directional Recurrent Neural Network (Bi-RNN): arch-birnn (implies arch-rnn)
•Long Short-term Memory (LSTM): arch-lstm (implies arch-rnn)
•Bi-directional Long Short-term Memory (BiLSTM): arch-bilstm (implies arch-birnn, arch-lstm)
•Gated Recurrent Units (GRU): arch-gru (implies arch-rnn)
•Bi-directional Gated Recurrent Units (BiGRU): arch-bigru (implies arch-birnn, arch-gru)
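A minimal bidirectional recurrent encoder, assuming PyTorch (embedding and hidden sizes are placeholders):

```python
import torch
import torch.nn as nn

embeddings = torch.randn(8, 20, 300)   # (batch, sequence length, embedding dim)

# arch-lstm / arch-bilstm: a bidirectional LSTM encoder
bilstm = nn.LSTM(input_size=300, hidden_size=128,
                 bidirectional=True, batch_first=True)
outputs, (h_n, c_n) = bilstm(embeddings)
# outputs: (8, 20, 256) -- forward and backward states concatenated per token

# arch-gru / arch-bigru: the GRU variant exposes the same interface
bigru = nn.GRU(input_size=300, hidden_size=128,
               bidirectional=True, batch_first=True)
outputs, h_n = bigru(embeddings)
```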
Other Sequential / Structured Architectures
•Bag-of-words, Bag-of-embeddings, Continuous Bag-of-words (BOW): arch-bow
•Convolutional Neural Networks (CNN): arch-cnn
•Attention: arch-att
•Self Attention: arch-selfatt (implies arch-att)
•Recursive Neural Network (RecNN): arch-recnn
•Tree-structured Long Short-term Memory (TreeLSTM): arch-treelstm (implies arch-recnn)
•Graph Neural Network (GNN): arch-gnn
•Graph Convolutional Neural Network (GCNN): arch-gcnn (implies arch-gnn)
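Two of the most common non-recurrent encoders above, convolution and (self-)attention, fit in a few lines. A sketch assuming PyTorch >= 1.9 (shapes and head counts are placeholders):

```python
import torch
import torch.nn as nn

x = torch.randn(8, 20, 300)   # (batch, tokens, embedding dim)

# arch-cnn: 1-D convolutions over the token dimension (channels first)
conv = nn.Conv1d(in_channels=300, out_channels=100, kernel_size=3, padding=1)
features = conv(x.transpose(1, 2)).transpose(1, 2)   # (8, 20, 100)

# arch-att / arch-selfatt: multi-head self-attention (queries = keys = values)
attn = nn.MultiheadAttention(embed_dim=300, num_heads=6, batch_first=True)
context, weights = attn(x, x, x)
```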
Architectural Techniques
•Residual Connections (ResNet): arch-residual
•Gating Connections, Highway Connections: arch-gating
•Memory: arch-memo
•Copy Mechanism: arch-copy
•Bilinear, Biaffine Models: arch-bilinear
•Coverage Vectors/Penalties: arch-coverage
•Subword Units: arch-subword
•Energy-based, Globally-normalized Models: arch-energy
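Residual and gating/highway connections are small wrappers around any sub-layer. A hedged sketch, assuming PyTorch (the HighwayLayer class name and dimensions are illustrative):

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """arch-gating: h = g * f(x) + (1 - g) * x, with a learned gate g."""
    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):
        g = torch.sigmoid(self.gate(x))          # gating connection
        return g * torch.relu(self.transform(x)) + (1 - g) * x

x = torch.randn(8, 256)
residual_out = x + nn.Linear(256, 256)(x)        # arch-residual: plain skip connection
highway_out = HighwayLayer(256)(x)
```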
Standard Composite Architectures
•Transformer: arch-transformer (implies arch-selfatt, arch-residual, arch-layernorm, optim-noam)
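The Transformer block bundles several of the earlier tags (self-attention, residual connections, layer normalization). A minimal encoder sketch, assuming PyTorch >= 1.9, with already-embedded inputs and placeholder sizes:

```python
import torch
import torch.nn as nn

# arch-transformer: self-attention + residual connections + layer norm per block
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8,
                                   dim_feedforward=2048, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=6)

tokens = torch.randn(8, 20, 512)    # already-embedded input
encoded = encoder(tokens)           # (8, 20, 512)
```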
Model Combination
•Ensembling: comb-ensemble
Search Algorithms
•Greedy Search: search-greedy
•Beam Search: search-beam
•A* Search: search-astar
•Viterbi Algorithm: search-viterbi
•Ancestral Sampling: search-sampling
•Gumbel Max: search-gumbel (implies search-sampling)
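Greedy decoding and ancestral sampling differ only in how the next token is picked from the model's distribution. A toy sketch, assuming PyTorch; step_logits is a hypothetical stand-in for a real decoder step, not an API from the source:

```python
import torch

vocab_size, max_len = 100, 10

def step_logits(prefix):
    """Placeholder for a real decoder step: returns next-token logits."""
    torch.manual_seed(len(prefix))
    return torch.randn(vocab_size)

# search-greedy: always pick the single most probable next token
greedy = []
for _ in range(max_len):
    greedy.append(int(step_logits(greedy).argmax()))

# search-sampling: ancestral sampling draws each token from the model distribution
sampled = []
for _ in range(max_len):
    probs = torch.softmax(step_logits(sampled), dim=-1)
    sampled.append(int(torch.multinomial(probs, 1)))
```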
Prediction Tasks
•Text Classification (text -> label): task-textclass
•Text Pair Classification (two texts -> label): task-textpair
•Sequence Labeling (text -> one label per token): task-seqlab
•Extractive Summarization (text -> subset of text): task-extractive (implies task-seqlab)
•Span Labeling (text -> labels on spans): task-spanlab
•Language Modeling (predict probability of text): task-lm
•Conditioned Language Modeling (some input -> text): task-condlm (implies task-lm)
•Sequence-to-sequence Tasks (text -> text, including MT): task-seq2seq (implies task-condlm)
•Cloze-style Prediction, Masked Language Modeling (right and left context -> word): task-cloze
•Context Prediction (as in word2vec) (word -> right and left context): task-context
•Relation Prediction (text -> graph of relations between words, including dependency parsing): task-relation
•Tree Prediction (text -> tree, including syntactic and some semantic parsing): task-tree
•Graph Prediction (text -> graph not necessarily between nodes): task-graph
•Lexicon Induction/Embedding Alignment (text/embeddings -> bi- or multi-lingual lexicon): task-lexicon
•Word Alignment (parallel text -> alignment between words): task-alignment
Pre-trained Embedding Techniques
•word2vec: pre-word2vec (implies arch-cbow, task-cloze, task-context)
•fasttext: pre-fasttext (implies arch-cbow, arch-subword, task-cloze, task-context)
•GloVe: pre-glove
•Paragraph Vector (ParaVec): pre-paravec
•Skip-thought: pre-skipthought (implies arch-lstm, task-seq2seq)
•ELMo: pre-elmo (implies arch-bilstm, task-lm)
•BERT: pre-bert (implies arch-transformer, task-cloze, task-textpair)
•Universal Sentence Encoder (USE): pre-use (implies arch-transformer, task-seq2seq)
Structured Models / Algorithms
•Hidden Markov Models (HMM): struct-hmm
•Conditional Random Fields (CRF): struct-crf
•Context-free Grammar (CFG): struct-cfg
•Combinatory Categorial Grammar (CCG): struct-ccg
Training Methods for Non-differentiable Functions
•Complete Enumeration: nondif-enum
•Straight-through Estimator: nondif-straightthrough
•Gumbel Softmax: nondif-gumbelsoftmax
•Minimum Risk Training: nondif-minrisk
•REINFORCE: nondif-reinforce
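The Gumbel-Softmax relaxation (and its straight-through variant) is available directly in PyTorch. A small sketch with placeholder logits:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 50, requires_grad=True)   # scores over 50 discrete choices

# nondif-gumbelsoftmax: a differentiable relaxation of sampling from a categorical
soft_sample = F.gumbel_softmax(logits, tau=1.0)             # soft, fully differentiable
# nondif-straightthrough: hard=True gives a one-hot forward pass with
# straight-through gradients in the backward pass
hard_sample = F.gumbel_softmax(logits, tau=1.0, hard=True)
```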
Adversarial Methods
•Generative Adversarial Networks (GAN): adv-gan
•Adversarial Feature Learning: adv-feat
•Adversarial Examples: adv-examp
•Adversarial Training: adv-train (implies adv-examp)
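Adversarial examples and adversarial training can be sketched with the fast gradient sign method (FGSM). In NLP such perturbations are usually applied to embeddings; here a generic continuous input stands in, assuming PyTorch with placeholder shapes:

```python
import torch
import torch.nn as nn

model = nn.Linear(100, 2)
criterion = nn.CrossEntropyLoss()
x, y = torch.randn(16, 100), torch.randint(0, 2, (16,))

# adv-examp: build an adversarial example with FGSM
x_adv = x.clone().requires_grad_(True)
loss = criterion(model(x_adv), y)
loss.backward()
epsilon = 0.01
x_adv = (x_adv + epsilon * x_adv.grad.sign()).detach()

# adv-train: also train the model on the perturbed inputs
adv_loss = criterion(model(x_adv), y)
```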
Latent Variable Models
•Variational Auto-encoder (VAE): latent-vae
•Topic Model: latent-topic
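The core of a VAE is the reparameterization trick plus a Gaussian KL term. A tiny sketch, assuming PyTorch; the TinyVAE class and its dimensions are illustrative only:

```python
import torch
import torch.nn as nn

# latent-vae: encode to a Gaussian, sample with the reparameterization trick
class TinyVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=32):
        super().__init__()
        self.enc_mu = nn.Linear(x_dim, z_dim)
        self.enc_logvar = nn.Linear(x_dim, z_dim)
        self.dec = nn.Linear(z_dim, x_dim)

    def forward(self, x):
        mu, logvar = self.enc_mu(x), self.enc_logvar(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = self.dec(z)
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()
        return recon, kl

recon, kl = TinyVAE()(torch.randn(16, 784))
```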
Meta-learning
•Meta-learning Initialization: meta-init
•Meta-learning Optimizers: meta-optim
•Meta-learning Loss functions: meta-loss
•Neural Architecture Search: meta-arch