HUMAnN2:人類微生物組統一代謝網絡分析2

2021-01-14 宏基因組

關於宏基因組常用的有參分析流程，主要是快速獲得物種組成和功能組成，之前分享了

今天再介紹來自同一作者的另一個軟體，可以一步完成功能和代謝組成。

HUMAnN2: The HMP Unified Metabolic Analysis Network 2，它在宏基因組研究中非常有用，通過這個分析，不僅能獲得微生物的物種豐度信息，還能準確有效地獲得微生物代謝途徑和功能模塊信息。

主頁：http://www.huttenhower.org/humann2

官方教程：https://bitbucket.org/biobakery/humann2/wiki/Home (版本2017-12-14)

中文版本翻譯日期(2018-05-01)

HUMAnN是基於宏基因組、宏轉錄組數據分析微生物通路豐度的有效工具。這一過程稱為功能譜，目的是描述群體成員的代謝潛能。可以回答微生物群體成員可能幹什麼，或在幹什麼的問題。

軟體特點：

可對已知和末知生物分析群體功能譜

可獲得基因組、基因和通路層面的結果

UniRef資料庫提供基因家族的定義

MetaCyc通路基因通路的定義

MinPath提供定義的最小通路集

簡單的使用界面(單行命令工作流)

加速序列比對

採用Bowtie2加速核酸水平搜索

採用Diamond加速翻譯蛋白水平搜索

HUMAnN2工作流程圖

安裝

如果你安裝過python，且有pip安裝工具，可以輕鬆安裝humann2

# 軟體安裝
pip install humann2

# 或可選手動下載安裝
wget //files.pythonhosted.org/packages/43/07/ec41577c3c1f9b578875ade8ed549d14fc2944c13cb7504579d542b62a69/humann2-0.11.1.tar.gz

# 前面仍不成功，推薦conda安裝更快更好用
conda install humann2

# 測試安裝
humann2_test

# 比如我使用conda安裝程序至/conda/bin目錄，且沒有添加環境變量，可以使用絕對路徑調用程序

# 下載資料庫
wd=/conda/bin
$wd/humann2_databases --available
# 5.37GB
$wd/humann2_databases --download chocophlan full /data/humann2
# 5.87GB，解壓後11G
$wd/humann2_databases --download uniref uniref90_diamond /data/humann2

依賴關係

# Diaomond http://ab.inf.uni-tuebingen.de/software/diamond/
wget http://github.com/bbuchfink/diamond/releases/download/v0.9.21/diamond-linux64.tar.gz
tar xzf diamond-linux64.tar.gz
sudo ln -fs `pwd`/diamond /usr/local/bin/

分析實戰

輸入文件為fastq，輸出文件為指定目錄中有各定量表格

cd ~/ath/jt.terpene.meta/clean_data/JT-545
# 可接受壓縮文件fastq，並自建目錄
$wd/humann2 --input 25/JT-545_25.rmhost.1.fq.gz --output humann2_25 &
$wd/humann2 --input 26/JT-545_26.rmhost.1.fq.gz --output humann2_26 &
$wd/humann2 --input 27/JT-545_27.rmhost.1.fq.gz --output humann2_27 &

輸出文件

1. 基因家族文件

# Gene Family $SAMPLENAME_Abundance-RPKs
UNMAPPED 187.0
UniRef50_unknown 150.0
UniRef50_unknown|g__Bacteroides.s__Bacteroides_fragilis 150.0
UniRef50_A6L0N6: Conserved protein found in conjugate transposon 67.0
UniRef50_A6L0N6: Conserved protein found in conjugate transposon|g__Bacteroides.s__Bacteroides_fragilis 57.0
UniRef50_A6L0N6: Conserved protein found in conjugate transposon|g__Bacteroides.s__Bacteroides_finegoldii 5.0
UniRef50_A6L0N6: Conserved protein found in conjugate transposon|g__Bacteroides.s__Bacteroides_stercoris 4.0
UniRef50_A6L0N6: Conserved protein found in conjugate transposon|unclassified 1.0
UniRef50_O83668: Fructose-bisphosphate aldolase 60.0
UniRef50_O83668: Fructose-bisphosphate aldolase|g__Bacteroides.s__Bacteroides_vulgatus 31.0
UniRef50_O83668: Fructose-bisphosphate aldolase|g__Bacteroides.s__Bacteroides_thetaiotaomicron 22.0
UniRef50_O83668: Fructose-bisphosphate aldolase|g__Bacteroides.s__Bacteroides_stercoris 7.0

文件名：SAMPLENAME_genefamilies.tsv

群體中每個基因家族的豐度。基因家族是一組進化上相關的編碼蛋白序列，通常具有相似功能。

基因家族的豐度在群體水平分級顯著，顯示已知和未知物種的貢獻度。

使用MetaPhlAn2軟體和ChocoPhlAn資料庫，檢索核酸翻譯的蛋白資料庫

基因家族的豐度採用RPK(每Kb的reads)以標準化不同的基因長度；RPK單位代表基因或轉錄本在群體中的拷貝數；RPK值可進一步求和標準化，用於不同樣品測序深度的比較。

如果輸入文件是基因表，不會創建基因家族文件

UNMAPPED是兩步核酸和蛋白搜索後，仍無法比對的reads數量。

UniRef50_unknown 代表可比對ChocoPhlAn，但沒有注釋

2. 通路豐度文件

#Pathway $SAMPLENAME_Abundance
UNMAPPED 140.0
UNINTEGRATED 87.0
UNINTEGRATED|g__Bacteroides.s__Bacteroides_caccae 23.0
UNINTEGRATED|g__Bacteroides.s__Bacteroides_finegoldii 20.0
UNINTEGRATED|unclassified 12.0
PWY0-1301: melibiose degradation 57.5
PWY0-1301: melibiose degradation|g__Bacteroides.s__Bacteroides_caccae 32.5
PWY0-1301: melibiose degradation|g__Bacteroides.s__Bacteroides_finegoldii 4.5
PWY0-1301: melibiose degradation|unclassified 3.0
PWY-5484: glycolysis II (from fructose-6P) 54.7
PWY-5484: glycolysis II (from fructose-6P)|g__Bacteroides.s__Bacteroides_caccae 16.7
PWY-5484: glycolysis II (from fructose-6P)|g__Bacteroides.s__Bacteroides_fi

文件名：OUTPUT_DIR/$SAMPLENAME_pathabundance.tsv

代表群體中通路的豐度

即有群體水平，又有物種水平豐度

通路按豐度大小排序，物種組分也按豐度大小排序，全為0的通路不輸出

通路的比例是是完整拷貝的豐度，如線性通路Gene1-4，分別為10，5，5，5。則按5計算。

與基因不同，通路的豐度並一定是群體組分的總合。A物種[5, 5, 10, 10]為5個拷貝，B物種[10, 10, 5, 5]為5個拷貝，而總體有15個拷貝；詳細的計算說明見英文幫助原文

對於單個基因必須的反應步驟為零豐度時，可進行所需最低豐度填充。

MetaCyc默認定義最簡通路解析群體觀測的代謝通路；

用戶也可以自定義通路資料庫

非線性基因拷貝數、無法比對序列處理方法請參考英文原文

3. 通路覆蓋度文件

# Pathway $SAMPLENAME_Coverage
UNMAPPED 1.0
UNINTEGRATED 1.0
UNINTEGRATED|g__Bacteroides.s__Bacteroides_caccae 1.0
UNINTEGRATED|g__Bacteroides.s__Bacteroides_finegoldii 1.0
UNINTEGRATED|unclassified 1.0
PWY0-1301: melibiose degradation 1.0
PWY0-1301: melibiose degradation|g__Bacteroides.s__Bacteroides_caccae 1.0
PWY0-1301: melibiose degradation|g__Bacteroides.s__Bacteroides_finegoldii 1.0
PWY0-1301: melibiose degradation|unclassified 1.0
PWY-5484: glycolysis II (from fructose-6P) 1.0
PWY-5484: glycolysis II (from fructose-6P)|g__Bacteroides.s__Bacteroides_caccae 0.7
PWY-5484: glycolysis II (from fructose-6P)|g__Bacteroides.s__Bacteroides_finegoldii 0.7
PWY-5484: glycolysis II (from fructose-6P)|unclassified 0.3

4. 中間臨時文件

Bowtie2比對結果($DIR/$SAMPLENAME_bowtie2_aligned.sam)

簡化Bowtie2比對結果$DIR/$SAMPLENAME_bowtie2_aligned.tsv

Bowtie2資料庫索引$DIR/$SAMPLENAME_bowtie2_index*

Bowtie2末比對的reads$DIR/$SAMPLENAME_bowtie2_unaligned.fa

自定義ChocoPhlAn資料庫$DIR/$SAMPLENAME_custom_chocophlan_database.ffn

MetaPhlAn2的bowtie2比對結果$DIR/$SAMPLENAME_metaphlan_bowtie2.txt

MetaPhlAn2 bugs list$DIR/$SAMPLENAME_metaphlan_bugs_list.tsv

翻譯後的比對結果$DIR/$SAMPLENAME_$TRANSLATEDALIGN_aligned.tsv

翻譯後仍末比對序列$DIR/$SAMPLENAME_$TRANSLATEDALIGN_unaligned.fa

日誌文件$DIR/$SAMPLENAME.log

配置軟體

# 顯示參數
$wd/humann2_config --print
# 修改參數格式
$wd/humann2_config --update $SECTION $NAME $VALUE
# 如修改線程數
$wd/humann2_config --update run_modes threads 12

HUMAnN2小工具humann2_barplot

Basic usage: $ humann2_barplot --input $TABLE.tsv --feature $FEATURE --outfile $FIGURE
$TABLE.tsv = a stratified HUMAnN2 output file
$FEATURE = Feature from the table to plot (defaults to first feature)
$FIGURE = Where to save the figure
Run with -h to see additional command line options

可選擇某個Feature進行柱狀圖可視化。—help參數可查看相關排序、標準化選項。

合併多樣品結果為表格

此步非常重要，我們無法多少個樣品，humann2結果僅為一列。多樣品需經本步合併為矩陣，方便下遊統計分析和差異比較。

Basic usage: $ humann2_join_tables --input $INPUT_DIR --output $TABLE
$INPUT_DIR = a directory containing gene/pathway tables (tsv or biom format)
$TABLE = the file to write the new single gene table (biom format if input is biom format)
Optional: --file_name $STR will only join gene tables with $STR in file name
Run with -h to see additional command line options

其它小工具

構建自定義資料庫 humann2_build_custom_database

查看屬水平基因家族與通路 humann2_gene_families_genus_level

增加物種分類(降低可信度) humann2_infer_taxonomy

合併MetaPhlAn2分類結果 humann2_reduce_table

對樣品、通路進行合併/重分組操作humann2_regroup_table

對Feature進行重命名 humann2_rename_table

表格標準化 humann2_renorm_table

宏轉錄組標準化 humann2_rna_dna_norm

將結果分為分層、末分層兩個文件 humann2_split_stratified_table

將多樣品表分類單樣品 humann2_split_table

挖掘菌株水平差異 humann2_strain_profiler

輸出組成通路的基因豐度 humann2_unpack_pathways

其它說明

選擇基因家族精度的水平

選擇UniRfe90還是50? 推薦90

選擇翻譯搜索模式

Bypass translated search: 無法比對泛基因組的保存，再比對蛋白，結果中沒有末分類的層級

Filtered translated search，是EC-filtered protein database(是UniRef中Level4層面國際生物化學聯合會酶委員分類)的默認方案

Comprehensive translated search 使用最綜合的蛋白資料庫，但速度比2慢5倍。默認為方案2

雙端序列：HUMAnN2不考慮雙端序列，全當作單端

PICRUSt輸出可繼續用本軟體分析，需要下載KEGG完整資料庫，拆分預測宏基因組為每個樣品單個文件，再運行humann2，再合併。詳見官網

使用KEEG資料庫，目前新版收費，免費版本最新為v56，可同HUMAnN1一起下載

結果可使用QIIME的core_diversity_analyses.py進行多樣性分析

HUMAnN2分析宏轉錄組

常見問題

如何輸出分析過程更多信息？添加--verbose參數

如何使用多核？--threads $CORES或修改默認設置

如何清空臨時文件？--remove-temp-output

指定ChocoPhlAn資料庫位置？--nucleotide-database $DIR

指定UniRef資料庫位置？--protein-database $DIR

使用Metaphlan2結果繼續分析？--taxonomic-profile bugs_list.tsv

修改樣本名？--output-basename $NAME

去除分層的結果？--remove-stratified-output

使用unipathways databases？--pathways unipathway

輸出biom格式結果--output-format biom

修改相似度閾值--identity-threshold <50.0>

修改metaphlan2參數--metaphlan-options="-t rel_ab"

報錯與解決

CRITICAL ERROR: Can not call software version for diamond

diamond沒有在環境變量，下載解壓並確保添加到環境變量

The database file for MetaPhlAn does not exist at /mnt/bai/yongxin/software/metaphlan2/db_v20/mpa_v20_m200.pkl . Please provide the location with —metaphlan-options

沒有找到metaphlan2的資料庫，是metaphlan2新版本目錄更改了位置，永久方法是建一個舊位置的硬鏈。

進入metaphlan2安裝目錄

mkdir db_v20
ln `pwd`/databases/mpa_v20_m200.* db_v20/

CRITICAL ERROR: Error executing: /mnt/bai/public/bin/diamond blastx —query /mnt/bai/yongxin/ath/jt.terpene.meta/cleandata/JT-545/humann2/JT-545_25.rmhost.1_humann2_temp/JT-545_25.rmhost.1_bowtie2_unaligned.fa —evalue 1.0 —threads 1 —max-target-seqs 20 —outfmt 6 —db /data/humann2/uniref/uniref90_annotated.1.1 —out /mnt/bai/yongxin/ath/jt.terpene.meta/clean_data/JT-545/humann2/JT-545_25.rmhost.1_humann2_temp/tmp2_lUDg/diamond_m8_TL5Rl —tmpdir /mnt/bai/yongxin/ath/jt.terpene.meta/clean_data/JT-545/humann2/JT-545_25.rmhost.1_humann2_temp/tmp2_lUDg

/mnt/bai/public/bin/diamond是個目錄，不知為什麼系統會找這個目錄當前程序，系統我也裝在了 /usr/local/bin/diamond 中。修改此目錄為程序

宏基因組相關學習資源

1. 基礎理論教程

2. 分析實戰有參系列：

3. 分析實戰De novo系列:

如果基礎知識體系不完善，自學存在困難的小夥伴，急時上車也是不錯的選擇。成為實驗中不可或缺的人，趕快點擊閱讀原文報名我們的培訓，加速你入行！http://www.ehbio.com/Training

HUMAnN2:人類微生物組統一代謝網絡分析2

相關焦點

HUMAnN2 --宏基因組代謝通路分析

Nature子刊:HUMAnN2實現宏基因組和宏轉錄組種水平功能組成分析

技術貼 | 宏基因組 + 宏轉錄組分析工具:HUMAnN

健康的人類微生物組

人類微生物組研究設計、樣本採集和生物信息分析指南

青島能源所等開發微生物組內部代謝互作示蹤新技術

微生物組研究:關乎人類的未來

資源分享 | 微生物組分析必備書籍《Microbiome Analysis》

青島能源所微生物組大數據分析工具開發獲進展

健康的人類微生物組|微生物群|腸道菌群|人類|健康|細菌|植物|...

9文聚焦：宏基因組學與微生物組分析方法和工具

科學家分析人類腫瘤微生物組組成

Nature靈魂拷問：微生物組數據一大堆，如何能改變人類健康？

Nature靈魂問：微生物組數據一大堆，如何能改變人類健康？

深度解讀人類微生物組研究的25個重大裡程碑事件!

可用於分析微生物與代謝產物之間相互作用的人工神經網絡

中國土壤微生物組:進展與展望*

Microbiome: 微生物組的定義重新審視：舊概念和新挑戰

mSystems:蘇曉泉、徐健等發布全球微生物組結構和功能搜索平臺2.0

Cell:深刻反思腸道菌群與人類疾病研究,是時候將微生物組研究拉回...