Scientists develop a convolutional neural network to predict 3D genome folding
Author: Xiaoke Robot
Published: 2020/10/15 15:45:23
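Katherine S. Pollard's data science and biotechnology group at the Gladstone Institutes has reported progress in its research. The group developed Akita, a model that predicts 3D genome folding from DNA sequence. The paper was published in Nature Methods on October 12, 2020.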
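Here the team presents a convolutional neural network, Akita, that accurately predicts genome folding from DNA sequence alone. The representations Akita learns underscore the importance of an orientation-specific grammar for CTCF binding sites. Akita learns predictive nucleotide-level features of genome folding, revealing the effects of nucleotides beyond the core CTCF motif. Once trained, Akita can make rapid in silico predictions. To demonstrate this predictive capability, the researchers show how Akita can be used to perform in silico saturation mutagenesis, interpret eQTLs, make predictions for structural variants, and probe species-specific genome folding. Taken together, these results make it possible to decode genome function from sequence through structure.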
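As a rough illustration of what in silico saturation mutagenesis involves, the sketch below substitutes every alternative base at every position of a sequence and scores how much a predicted contact map changes. It is a minimal sketch under stated assumptions, not Akita's code: `toy_predict` is a hypothetical stand-in for a trained model, and the names `one_hot` and `saturation_mutagenesis` and the mean-squared-difference score are chosen here for illustration only.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as an (L, 4) one-hot matrix in A, C, G, T order."""
    idx = np.array([BASES.index(b) for b in seq])
    out = np.zeros((len(seq), 4), dtype=np.float32)
    out[np.arange(len(seq)), idx] = 1.0
    return out


def toy_predict(x):
    """Hypothetical placeholder for a trained predictor (NOT Akita).

    Maps an (L, 4) one-hot sequence to an (L, L) symmetric "contact map"
    using fixed, arbitrary per-base weights.
    """
    weights = np.array([0.1, -0.3, 0.7, 0.2])
    per_position = x @ weights
    return np.outer(per_position, per_position)


def saturation_mutagenesis(seq, predict):
    """Score every single-nucleotide substitution in seq.

    Returns an (L, 4) array whose entry [pos, base] is the mean squared
    difference between the mutant and reference predicted maps.
    Entries for the reference base stay at zero.
    """
    ref_x = one_hot(seq)
    ref_map = predict(ref_x)
    scores = np.zeros((len(seq), 4))
    for pos in range(len(seq)):
        for base in range(4):
            if ref_x[pos, base] == 1.0:
                continue  # skip the reference base itself
            mut_x = ref_x.copy()
            mut_x[pos] = 0.0
            mut_x[pos, base] = 1.0
            scores[pos, base] = np.mean((predict(mut_x) - ref_map) ** 2)
    return scores


if __name__ == "__main__":
    scores = saturation_mutagenesis("ACGTAC", toy_predict)
    print(scores.shape)
```

With a real trained model in place of the placeholder, positions with large scores would be those where a single substitution is predicted to disrupt folding the most. This loop needs one forward pass per substitution, which is only practical because predictions from a trained network are fast.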
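By way of background, in interphase the human genome sequence folds in three dimensions into a rich variety of locus-specific contact patterns. Cohesin and CTCF (CCCTC-binding factor) are key regulators: perturbing the levels of either greatly disrupts genome-wide folding, as measured by chromosome conformation capture methods. How a given DNA sequence encodes a particular locus-specific folding pattern remains unknown.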
Appendix: original English text
Title: Predicting 3D genome folding from DNA sequence with Akita
Author: Geoff Fudenberg, David R. Kelley, Katherine S. Pollard
Issue&Volume: 2020-10-12
Abstract: In interphase, the human genome sequence folds in three dimensions into a rich variety of locus-specific contact patterns. Cohesin and CTCF (CCCTC-binding factor) are key regulators; perturbing the levels of either greatly disrupts genome-wide folding as assayed by chromosome conformation capture methods. Still, how a given DNA sequence encodes a particular locus-specific folding pattern remains unknown. Here we present a convolutional neural network, Akita, that accurately predicts genome folding from DNA sequence alone. Representations learned by Akita underscore the importance of an orientation-specific grammar for CTCF binding sites. Akita learns predictive nucleotide-level features of genome folding, revealing effects of nucleotides beyond the core CTCF motif. Once trained, Akita enables rapid in silico predictions. Accounting for this, we demonstrate how Akita can be used to perform in silico saturation mutagenesis, interpret eQTLs, make predictions for structural variants and probe species-specific genome folding. Collectively, these results enable decoding genome function from sequence through structure.
DOI: 10.1038/s41592-020-0958-x
Source: https://www.nature.com/articles/s41592-020-0958-x