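I stumbled upon Dan Knights' Microbiome Discovery course on YouTube, an introduction to metagenomics. After skimming through it, I found that it builds up gradually from the basics and covers both theory and practice very well. I really wish I had found it sooner QAQ. This course alone should be entirely enough to get started with metagenomics~
Course playlist: https://www.youtube.com/playlist?list=PLOPiWVjg6aTzsA53N19YqJQeZpSCH9QPc
RMarkdown example data and practice code: https://github.com/danknights/mice8992-2016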
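Video directory
1. Intro to the Microbiome
• Introduction to the microbiome
• How studies are carried out
• Some of the challenges (microbiome data are relatively unstable; biomarker discovery)
URL: https://youtu.be/6564K4-_DBI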
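2. How microbiome data are generated
• How these data are generated
• Pros and cons of the two sequencing approaches
• Shotgun metagenomic sequencing
• Amplicon sequencing
URL: https://youtu.be/FWT1HBzlWOE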
3. 16S Variable Regions
• Why the 16S region is chosen; the structure and function of 16S rRNA
• Where OTUs come from
URL: https://youtu.be/8Aa_mnyXm70
4. QIIME
• Overview of the QIIME analysis workflow
URL: https://youtu.be/iy0JWgzmM_A
4.5. (Optional) UNIX Command Line
• Introduction to UNIX commands and using Git
URL: https://youtu.be/u2IQQUMeWy8
5. Picking OTUs
• OTU clustering methods
• closed reference
• de novo
• UCLUST
• CD-HIT
• SUMACLUST
• mothur
• SWARM
• open reference
URL: https://youtu.be/Ok5h24KZbAE
6. Assigning Taxonomy
• How to assign taxonomy to the microbial community
• The Random Forests classifier seems to work better
• Nearest neighbor using optimal gapped alignment with large reference databases will probably win eventually
URL: https://youtu.be/HkwFdzFLZ0I
7. Alpha Diversity
• Alpha diversity measures diversity within communities
• Beta diversity measures diversity between communities
• Rarefaction determines saturation
• There is room for experimental validation
• Different ways to compute alpha diversity:
  • Species count
  • Phylogenetic diversity (PD)
  • Chao1 estimator
URL: https://youtu.be/9ZvoR89HYP8
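To make two of the alpha diversity measures listed above concrete, here is a minimal Python sketch of the species count and the Chao1 estimator. It is an illustration only and is not taken from the course materials (which use R and QIIME); the bias-corrected form of Chao1 is assumed, and the OTU counts are hypothetical.

```python
def observed_species(counts):
    """Species count: number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness estimate.

    f1 is the number of singletons (taxa seen once) and
    f2 is the number of doubletons (taxa seen twice).
    """
    s_obs = observed_species(counts)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical OTU counts for a single sample.
sample = [10, 5, 2, 2, 1, 1, 1, 0]

print(observed_species(sample))  # 7
print(chao1(sample))             # 8.0
```

With 3 singletons and 2 doubletons, Chao1 adds 3 * 2 / (2 * 3) = 1 unseen taxon to the 7 observed ones.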
8. Beta Diversity
• Beta diversity measures diversity between communities
• Different ways to compute beta diversity:
  • Euclidean distance
  • Chi-square distance (Chi-square is usually best for gradients)
  • Bray-Curtis
• Most people use Bray-Curtis or UniFrac
• Visualize with PCoA
URL: https://youtu.be/lcbp6EecDg4
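As a rough illustration of two of the distances listed above, the following Python sketch computes Euclidean distance and Bray-Curtis dissimilarity between two samples. The abundance vectors are hypothetical, and the code is not taken from the course materials.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two abundance vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical counts for the same four taxa in two samples.
sample_a = [10, 0, 5, 5]
sample_b = [0, 10, 5, 5]

print(bray_curtis(sample_a, sample_b))  # 0.5
print(euclidean(sample_a, sample_b))
```

Bray-Curtis is bounded between 0 (identical) and 1 (no shared taxa), whereas Euclidean distance grows with sequencing depth, which is one reason the two can rank sample pairs differently.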
9. UniFrac
• Beta diversity using UniFrac
URL: https://youtu.be/M8ylvsS0MHg
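For readers who have not met UniFrac before, here is a toy Python sketch of the unweighted variant: the fraction of total branch length that leads to taxa from only one of the two samples. The tree encoding (a list of branch lengths paired with the tips below each branch) and all taxon names are assumptions made for illustration; real tools work on full phylogenies.

```python
def unweighted_unifrac(branches, taxa_a, taxa_b):
    """Unique branch length divided by total observed branch length.

    branches: list of (length, set_of_tip_names_below_branch).
    taxa_a, taxa_b: sets of taxa present in each sample.
    """
    unique = 0.0
    observed = 0.0
    for length, tips in branches:
        in_a = bool(tips & taxa_a)
        in_b = bool(tips & taxa_b)
        if in_a or in_b:
            observed += length
            if in_a != in_b:  # branch leads to only one sample
                unique += length
    return unique / observed if observed else 0.0


# Hypothetical tree ((A:1,B:1):1,(C:1,D:1):1) flattened into branches.
branches = [
    (1.0, {"A"}), (1.0, {"B"}), (1.0, {"A", "B"}),
    (1.0, {"C"}), (1.0, {"D"}), (1.0, {"C", "D"}),
]

distance = unweighted_unifrac(branches, {"A", "B"}, {"B", "C"})
print(distance)  # 3 unique branches out of 5 observed
```

Unlike Bray-Curtis, this distance takes phylogenetic relatedness into account: two samples with different but closely related taxa share most of their branch length and end up close together.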
10. Statistical testing part 1
• Statistics basics
• Linear models are not always appropriate
• Non-parametric tests (no distribution assumptions)
• Generalized linear models (better underlying distributions)
URL: https://youtu.be/_uDv7LRUUsY
11. Statistical testing part 2
• Statistics basics
• t-test: Compare 2 groups
• ANOVA: Compare three or more groups
• Correlation: Compare to a continuous variable (e.g. age)
• Linear regression: Similar to correlation, but you can regress on multiple variables at the same time
• NOTE: all of these assume normal distributions!
• When linear regression tests do not have normally distributed residuals, use a generalized linear model with the negative binomial distribution. This is in the edgeR package in R.
• Use false discovery rate (FDR) to correct for multiple hypothesis testing.
• If you don't need to control for confounders, non-parametric tests are very safe (although lower power than linear models or generalized linear models).
• Two-category test: Mann-Whitney U (Wilcoxon) test (like a t-test)
• Multi-category test: Kruskal-Wallis (like ANOVA)
• Continuous test: Spearman correlation (like Pearson correlation)
URL: https://youtu.be/tNxfYqa5Rtc
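The FDR correction mentioned above can be sketched in a few lines of Python. The summary does not say which FDR procedure the lecture uses; this sketch assumes the Benjamini-Hochberg procedure, and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, e.g. one test per taxon.
pvals = [0.01, 0.04, 0.03, 0.20]
qvals = benjamini_hochberg(pvals)
print([round(q, 4) for q in qvals])  # [0.04, 0.0533, 0.0533, 0.2]
```

Taxa whose adjusted value falls below the chosen FDR threshold (0.05 is a common choice) would be reported as significant; here only the first test passes.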
12. Visualizing Microbiome Diversity, Ordination
• Visualization with R or QIIME
• PCA
• PCoA
• NMDS
URL: https://youtu.be/H-u2iyiTzj0
13. Detrending and detecting gradients
• Detrending with QIIME
• Detrending does not have strong statistical foundations
• Use detrending for visualizing a primary gradient
• Use detrending to test whether your ordination recovered the primary gradient in axis 1
URL: https://youtu.be/aNLPzdfivkM
14. Constrained Ordination
• CCA does direct gradient analysis
• Never use more than 3-4 variates
• More will simply overfit the data
• Measure success by ratio of constrained variance explained to unconstrained variance explained
• Canonical Correspondence Analysis == Constrained Correspondence Analysis
• Not to be confused with canonical correlation analysis
URL: https://youtu.be/wHSECEI2tnQ
15. Clustering
• Use caution with supervised ordination - need to assess significance carefully
• Prediction strength > 0.9 or Silhouette index > 0.5
• Clusters can be useful ways to analyze high-dimensional data
• However, direct analysis is generally better when you have known gradients/groups
• Diagnostics based on direct supervised analysis generally better
URL: https://youtu.be/ORX968xJqiA
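To show what the silhouette index threshold above is measuring, here is a small Python sketch that computes the average silhouette width for a given clustering. The one-dimensional points, the labels, and the distance function are hypothetical; with real microbiome data the distances would typically come from a beta diversity matrix.

```python
def mean(values):
    return sum(values) / len(values)


def silhouette_index(points, labels, dist):
    """Average silhouette width over all points (needs >= 2 clusters)."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of the same cluster.
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        a = mean(own)
        # b: mean distance to the nearest other cluster.
        b = min(
            mean([dist(p, q) for j, q in enumerate(points) if labels[j] == c])
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


# Two well-separated hypothetical clusters on a line.
points = [0.0, 1.0, 10.0, 11.0]
labels = [0, 0, 1, 1]
score = silhouette_index(points, labels, lambda x, y: abs(x - y))
print(round(score, 3))  # 0.9
```

A score of about 0.9 is well above the 0.5 rule of thumb, as expected for two tight, distant groups; a poor labeling of the same points would score much lower.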
16. Supervised Learning Background
• Supervised learning tries to learn a model that will predict outcomes for novel samples
• Example: classify cancer patients to determine treatment path
• Models have to balance low complexity (underfitting) and high complexity (overfitting)
• Model accuracy should be assessed in separate test data that it has never seen
• 10-fold cross validation is standard
URL: https://youtu.be/-eXnrA_3xzA
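The idea of assessing a model on data it has never seen can be sketched as a 10-fold split in plain Python. This is only an illustration of how the folds are formed; the function name and the sample count are hypothetical, and the model fitting itself is left out.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i
                 for idx in fold]
        yield train, test


# Hypothetical data set of 25 samples.
splits = list(k_fold_indices(25, k=10))
for train, test in splits:
    # A model would be fit on `train` and scored on `test` here.
    assert not set(train) & set(test)

print(len(splits))  # 10
```

Every sample lands in exactly one test fold, so averaging the 10 test scores gives an accuracy estimate in which no prediction was made on a sample the model was trained on.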
17. Supervised Learning Applications
• Random forest classification with QIIME
URL: https://youtu.be/ecz5SzP6Z_U
18. Source Tracking
• How source tracking works and how to apply SourceTracker
• Microbial source tracking can be done at the community-wide level
• SourceTracker uses Bayesian methods to deconvolute mixtures of communities
• Can identify contributions of individual species from each source environment
• Does not model changes after mixing (temporal dynamics)
• SourceTracker: github.com/danknights/sourcetracker/releases
URL: https://youtu.be/sDevHMuYJ28
19. Compositionality
• Compositionality can cause spurious and even opposite conclusions
• Dominant bugs can skew the relative abundance of minor bugs
• Correlation is hard to infer
• See SparCC, SPIEC-EASI
• Best to do analysis with absolute abundances when possible
• Spike-ins of foreign bugs and/or qPCR can circumvent this
URL: https://youtu.be/X60nFYpLWRs
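A tiny numeric example helps show how a dominant bug can skew the relative abundance of a minor one. In the Python sketch below, all counts are hypothetical: the minor taxon keeps the same absolute abundance while the dominant taxon blooms.

```python
def relative_abundance(counts):
    """Convert absolute counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical absolute abundances before and after a bloom.
before = {"dominant": 800, "minor": 200}
after = {"dominant": 1800, "minor": 200}  # minor is unchanged

rel_before = relative_abundance(before)
rel_after = relative_abundance(after)

print(rel_before["minor"])  # 0.2
print(rel_after["minor"])   # 0.1
```

Judged by relative abundance alone, the minor taxon appears to have halved even though its absolute count never changed, which is why spike-ins or qPCR are useful for anchoring the data to absolute abundances.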
20. PICRUSt and predicting functions
• Shotgun metagenomics can describe the full functional repertoire of a metagenome, but it is expensive
• PICRUSt can produce 80-85% accurate metagenomes from 16S data sets
• Useful for mining published data
• Can be used to select a subset of 16S samples for shotgun sequencing
• Be sure to treat the results as "suggestive only" in publications
• Mostly useful on human gut samples
URL: https://youtu.be/mPQCl_cHCsM
21. Shotgun Taxonomy
• Shotgun metagenomics can be used for identifying species
• Far superior to 16S
• Approaches to shotgun taxonomy:
  • MetaPhlAn and MetaPhlAn2
    • Pre-identify a set of marker genes
    • Genes that are conserved within a species but not elsewhere
    • Requires alignment, but uses a small database
  • Kraken, others
    • Use all unique k-mers as markers
    • Ultrafast, but large database
URL: https://youtu.be/DlQTXdb2rhg
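The "unique k-mers as markers" idea can be illustrated with a toy Python sketch. This is not Kraken's actual algorithm or data structure; it is a simplified stand-in that keeps only k-mers found in exactly one reference and lets them vote on each read. The genome sequences, species names, and k = 5 are all hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_unique_kmer_index(genomes, k):
    """Map each k-mer that occurs in exactly one genome to that genome."""
    owner = {}
    shared = set()
    for name, seq in genomes.items():
        for km in kmers(seq, k):
            if km in shared:
                continue
            if km in owner and owner[km] != name:
                del owner[km]  # seen in two genomes: not a marker
                shared.add(km)
            else:
                owner[km] = name
    return owner


def classify(read, index, k):
    """Assign a read to the genome with the most marker k-mer hits."""
    votes = Counter(index[km] for km in kmers(read, k) if km in index)
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical toy reference genomes sharing the stretch "TTGACC".
genomes = {"species_a": "ACGTACGTTTGACC", "species_b": "TTGACCGGGATCCA"}
index = build_unique_kmer_index(genomes, k=5)

print(classify("CGGGATCC", index, k=5))  # species_b
print(classify("GTACGTT", index, k=5))   # species_a
print(classify("TTGACC", index, k=5))    # None
```

Lookups are simple dictionary hits with no alignment step, which hints at why k-mer methods are fast, while the index has to hold every marker k-mer, which hints at why their databases are large.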
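If you have read this far, congratulations on finding the hidden bonus~ I have mirrored the complete series for everyone.
Link: https://pan.baidu.com/s/194r0zs5WbcNFQKQrV0Nnkg Password: 0rjr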