Talk Title: A Statistical Learning of Deep Learning
Speaker: Hansheng Wang (王漢生)
Speaker Bio: Professor Hansheng Wang is Chair of the Department of Business Statistics and Econometrics at the Guanghua School of Management, Peking University. He received his B.S. in Statistics from the Department of Probability and Statistics, School of Mathematical Sciences, Peking University, in 1998, and his Ph.D. in Statistics from the University of Wisconsin-Madison in 2001. He is a member of the International Statistical Institute, the American Statistical Association, the Institute of Mathematical Statistics, the Royal Statistical Society, and the International Chinese Statistical Association.
He has published more than fifty academic papers in English and nearly twenty in Chinese, co-authored one English monograph, and single-authored two Chinese textbooks. He has served as Associate Editor for several academic journals, including The Annals of Statistics (2008-2009), Computational Statistics & Data Analysis (2008-2011), Statistics and Its Interface (2010-present), Journal of the American Statistical Association (2011-present), and Statistica Sinica (2011-present). His main theoretical research interests are high-dimensional data analysis, variable selection, dimension reduction, extreme value theory, and semiparametric models. His main applied research interests are search engine marketing and social network analysis.
Abstract: Deep learning is a method of fundamental importance for many AI-related applications. From a statistical perspective, deep learning can be viewed as a regression method with complicated input X and output Y. One unique characteristic of deep learning is that the input X can be highly unstructured data (e.g., images and sentences), while the output Y is a usual response (e.g., a class label). Another unique characteristic is its highly nonlinear model structure with many layers. This provides a basic theoretical foundation for understanding deep learning from a statistical perspective. We believe that classical statistical wisdom, combined with modern deep learning techniques, can yield novel methodology with outstanding empirical performance. In this talk, I will report three recent advances made by our team in this regard. The first advance concerns the stochastic gradient descent (SGD) algorithm. SGD is the most popular optimization algorithm for deep learning model training, and its practical implementation relies on the specification of a tuning parameter: the learning rate. In current practice, the choice of learning rate is highly subjective and depends on personal experience. To solve this problem, we propose a local quadratic approximation (LQA) idea for automatic and nearly optimal determination of this tuning parameter. The second advance focuses on model compression for convolutional neural networks (CNNs). The convolutional neural network is one of the representative deep learning models and has shown excellent performance in computer vision; however, it is extremely complicated, with a huge number of parameters. We apply the classical principal component analysis (PCA) method to each convolutional layer, which leads to a new method called progressive principal component analysis (PPCA). Our PPCA method brings a significant reduction in model complexity without significantly sacrificing out-of-sample forecasting accuracy. The last advance considers factor modeling. The input feature of a deep learning model is often of ultrahigh dimension (e.g., an image). In this case, a strong factor structure of the input feature can often be detected, meaning that a significant proportion of the variability in the input feature can be explained by a low-dimensional latent factor, which can be modeled in a very simple way. We therefore develop a novel factor normalization (FN) methodology. We first decompose an input feature X into two parts: the factor part and the residual part. Next, we reconstruct the baseline DNN model into a factor-assisted DNN model. Lastly, we provide a new SGD algorithm with adaptive learning rates for training the new model. Our method leads to superior convergence speed and excellent out-of-sample performance compared with the original model trained with a series of gradient-type algorithms.
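To make the first advance more concrete, here is a minimal sketch of how a learning rate could be chosen automatically by a local quadratic fit along the negative-gradient direction, shown on a toy least-squares problem. This is only an illustration of the general idea, not the speaker's LQA implementation; the batch size, trial step sizes, and fallback rate are arbitrary choices for the example.

```python
import numpy as np

# Toy least-squares problem: L(theta) = 0.5 * mean((X @ theta - y)^2)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.normal(size=200)

def loss(theta, idx):
    r = X[idx] @ theta - y[idx]
    return 0.5 * np.mean(r ** 2)

def grad(theta, idx):
    return X[idx].T @ (X[idx] @ theta - y[idx]) / len(idx)

theta = np.zeros(5)
for step in range(100):
    idx = rng.choice(len(y), size=32, replace=False)   # mini-batch
    g = grad(theta, idx)
    # Probe the mini-batch loss at a few trial step sizes along -g
    trial = np.array([0.0, 0.05, 0.1])
    vals = [loss(theta - a * g, idx) for a in trial]
    # Fit a quadratic c2*a^2 + c1*a + c0 in the step size and use its minimizer
    c2, c1, c0 = np.polyfit(trial, vals, deg=2)
    lr = -c1 / (2 * c2) if c2 > 1e-12 else 0.1         # fall back if the fit is not convex
    theta -= lr * g

print("final full-sample loss:", loss(theta, np.arange(len(y))))
```

On a least-squares loss the objective is exactly quadratic along the search direction, so the fitted quadratic recovers the optimal step size; for a general deep network the fit is only a local approximation refreshed at every step.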
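For the second advance, the sketch below shows one way PCA can compress a single convolutional layer: the filter bank is flattened, principal filters are extracted via SVD, and the layer is approximated by a small set of principal filters followed by a 1x1 mixing layer. The layer sizes, the retained rank r, and the random weights are hypothetical, and the PPCA procedure presented in the talk may differ (for example, by proceeding progressively layer by layer on a fitted network).

```python
import numpy as np

# Hypothetical convolutional layer: 64 output filters, 32 input channels, 3x3 kernels
c_out, c_in, k = 64, 32, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(c_out, c_in, k, k))

# Flatten each filter into a row and run PCA (via SVD) on the filter bank
W_mat = W.reshape(c_out, -1)                   # shape (64, 288)
U, s, Vt = np.linalg.svd(W_mat, full_matrices=False)

r = 16                                         # number of principal filters kept
basis = Vt[:r].reshape(r, c_in, k, k)          # r "principal" conv filters
coeff = U[:, :r] * s[:r]                       # 1x1 mixing weights, r -> c_out

# The original layer is approximated by: conv with `basis`, then 1x1 conv with `coeff`
approx = coeff @ Vt[:r]
rel_err = np.linalg.norm(W_mat - approx) / np.linalg.norm(W_mat)

orig_params = c_out * c_in * k * k
new_params = r * c_in * k * k + c_out * r
print(f"relative approximation error: {rel_err:.3f}")
print(f"parameters: {orig_params} -> {new_params}")
```

With random weights the approximation error is large, but trained CNN filters typically exhibit much stronger low-rank structure, which is what makes this kind of compression attractive.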
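Finally, for the factor normalization idea, the following sketch illustrates only the decomposition step described in the abstract: the input feature X is split into a low-dimensional factor part (estimated here by plain PCA) and a residual part. The simulated data, the number of factors r, and the use of PCA as the factor estimator are assumptions for illustration; the factor-assisted DNN and the adaptive-learning-rate SGD from the abstract are not reproduced here.

```python
import numpy as np

# Hypothetical ultrahigh-dimensional input features (n samples, p features)
rng = np.random.default_rng(0)
n, p, r = 500, 1024, 10
# Simulate a strong factor structure: X = F_true @ B + noise
F_true = rng.normal(size=(n, r))
B = rng.normal(size=(r, p))
X = F_true @ B + 0.1 * rng.normal(size=(n, p))

# Estimate the latent factors by PCA on the centered features
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
V_r = Vt[:r].T                     # loading matrix, shape (p, r)
factors = Xc @ V_r                 # low-dimensional factor scores, shape (n, r)

factor_part = factors @ V_r.T      # part of X explained by the factors
residual = Xc - factor_part        # residual part left for the baseline network

explained = 1 - (residual ** 2).sum() / (Xc ** 2).sum()
print(f"variance explained by {r} factors: {explained:.3f}")
# A factor-assisted model would then take (factors, residual) as two inputs,
# e.g. a small dense branch for `factors` and the original DNN for `residual`.
```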
Language: Chinese
Time: Friday, October 9, 2020, 10:30 AM - 12:00 noon
Offline venue: Room N402, Economics Building, Xiamen University
Online link (also the registration link; scan the QR code or tap "Read the original" at the end of this post to go directly):