April bioRxiv Bioinformatics Highlights | Galaxy Co-founder Passes Away

2021-03-01 生信人

Given the rapid growth in COVID-19-related preprints, bioRxiv, medRxiv, and Google Scholar have all opened dedicated channels for finding articles on COVID-19 topics.

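Note that bioRxiv will decline to post your work if your manuscript is a purely computational prediction of drug action or efficacy against COVID-19. bioRxiv co-founder Richard Sever tweeted that this was a carefully considered decision by the team: "The balance we have to strike is speeding up science but also minimising potential for harm." The policy is meant to weigh the speed and flexibility of preprints against the potentially serious clinical risks that come with their uneven quality and lack of peer review. However, Albert-László Barabási, a professor at Northeastern University whose manuscript was turned away, was not persuaded.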

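To address quality concerns about COVID-19 preprints, researchers at the Immunology Institute at the Mount Sinai School of Medicine have been carrying out what they call pre-review of preprints on bioRxiv and medRxiv, that is, reviewing preprints ahead of formal peer review. They have posted 168 reviews so far, and from a quick skim all of them are very thorough. If you are interested, you can follow the medRxiv user @sinaiimmunologyreviewproject and the Twitter account @SinaiImmunol. In short, how to balance speed and quality remains a major challenge for bioRxiv and the whole preprint community.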

 

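The past month also brought sad news: James Taylor, co-founder of Galaxy and professor at Johns Hopkins University, passed away on April 2 at the age of just 40. A standard-bearer of open science, Taylor posted 11 manuscripts on bioRxiv in total and was dedicated to making bioinformatics analysis transparent and reproducible, all of which fits squarely with the philosophy of preprints. In 2005, while a PhD student at Penn State's Center for Comparative Genomics and Bioinformatics, Taylor and his collaborators began building Galaxy [1]. Fifteen years on, Galaxy is known worldwide: its friendly yet powerful interactive analysis has taken the mystery out of bioinformatics and led countless biologists into the world of genomics. The ever-growing collection of analysis tools in Galaxy today is perhaps the best memorial to Professor Taylor.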

Source: The Galaxy Project [2]

 

1. An integrated, interactive scRNA-seq analysis environment based on Galaxy

User-friendly, scalable tools and workflows for single-cell analysis

Single-cell RNA-Seq (scRNA-Seq) data analysis requires expertise in command-line tools, programming languages and scaling on compute infrastructure. As scRNA-Seq becomes widespread, computational pipelines need to be more accessible, simpler and scalable. We introduce an interactive analysis environment for scRNA-Seq, based on Galaxy, with ~70 functions from major single-cell analysis tools, which can be run on compute clusters, cloud providers or single machines, to bring compute to the data in scRNA-Seq.

 

2. Gamete binning: chromosome-level, haplotype-resolved genome assembly

Chromosome-level and haplotype-resolved genome assembly enabled by high-throughput single-cell sequencing of gamete genomes

Generating haplotype-resolved, chromosome-level assemblies of heterozygous genomes remains challenging. To address this, we developed gamete binning, a method based on single-cell sequencing of hundreds of haploid gamete genomes, which enables the separation of conventional long sequencing reads into two haplotype-specific read sets. After independently assembling the reads of each haplotype, the contigs are scaffolded to chromosome-level using a genetic map derived from the recombination patterns within the same gamete genomes. As a proof-of-concept, we assembled the two genomes of a diploid apricot tree supported by the analysis of 445 pollen genomes. Both assemblies (N50: 25.5 and 25.8 Mb) featured a haplotyping precision of >99% and were accurately scaffolded to chromosome-level as reflected by high levels of synteny to closely-related species. These two assemblies allowed for first insights into haplotype diversity of apricot and enabled the identification of non-allelic crossover events introducing severe chromosomal anomalies in 1.6% of the pollen genomes.
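The abstract above reports assembly contiguity as N50 (25.5 and 25.8 Mb). For readers less familiar with the metric, here is a minimal sketch of how N50 is commonly computed from a list of contig lengths. The function name and the toy lengths are illustrative assumptions, not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in Mb (made-up values, not from the paper).
toy_contigs = [30, 25, 20, 10, 10, 5]
print(n50(toy_contigs))  # total is 100, so the 25 Mb contig crosses the halfway mark
```

A higher N50 means that more of the assembly sits in long contigs, which is why the two haplotype assemblies are compared on this number.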

 

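3. CUHK researchers: genomes of the horseshoe crab, a living fossil of the animal kingdom, reveal microRNA evolution after three rounds of whole genome duplication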

Horseshoe crab genomes reveal the evolutionary fates of genes and microRNAs after three rounds (3R) of whole genome duplication

Whole genome duplication (WGD) has occurred in relatively few sexually reproducing invertebrates. Consequently, the WGD that occurred in the common ancestor of horseshoe crabs ~135 million years ago provides a rare opportunity to decipher the evolutionary consequences of a duplicated invertebrate genome. Here, we present a high-quality genome assembly for the mangrove horseshoe crab Carcinoscorpius rotundicauda (1.7Gb, N50 = 90.2Mb, with 89.8% sequences anchored to 16 pseudomolecules, 2n = 32), and a resequenced genome of the tri-spine horseshoe crab Tachypleus tridentatus (1.7Gb, N50 = 109.7Mb). Analyses of gene families, microRNAs, and synteny show that horseshoe crabs have undergone three rounds (3R) of WGD, and that these WGD events are shared with spiders. Comparison of the genomes of C. rotundicauda and T. tridentatus populations from several geographic locations further elucidates the diverse fates of both coding and noncoding genes. Together, the present study represents a cornerstone for a better understanding of the consequences of invertebrate WGD events on evolutionary fates of genes and microRNAs at individual and population levels, and highlights the genetic diversity with practical values for breeding programs and conservation of horseshoe crabs.

4. Three new monkey genomes advance primate phylogenetic analysis

Primate phylogenomics uncovers multiple rapid radiations and ancient interspecific introgression

Our understanding of the evolutionary history of primates is undergoing continual revision due to ongoing genome sequencing efforts. Bolstered by growing fossil evidence, these data have led to increased acceptance of once controversial hypotheses regarding phylogenetic relationships, hybridization and introgression, and the biogeographical history of primate groups. Among these findings is a pattern of recent introgression between species within all major primate groups examined to date, though little is known about introgression deeper in time. To address this and other phylogenetic questions, here we present new reference genome assemblies for three Old World Monkey species: Colobus angolensis ssp. palliatus (the black and white colobus), Macaca nemestrina (southern pig-tailed macaque), and Mandrillus leucophaeus (the drill). We combine these data with 23 additional primate genomes to estimate both the species tree and individual gene trees using thousands of loci. While our species tree is largely consistent with previous phylogenetic hypotheses, the gene trees reveal high levels of genealogical discordance associated with multiple primate radiations. We use strongly asymmetric patterns of gene tree discordance around specific branches to identify multiple instances of introgression between ancestral primate lineages. In addition, we exploit recent fossil evidence to perform fossil-calibrated molecular dating analyses across the tree. Taken together, our genome-wide data help to resolve multiple contentious sets of relationships among primates, while also providing insight into the biological processes and technical artifacts that led to the disagreements in the first place.
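The abstract mentions using strongly asymmetric gene tree discordance around a branch as a signal of introgression. The usual reasoning (a standard expectation, not spelled out in the abstract) is that incomplete lineage sorting alone should produce the two minor discordant topologies at roughly equal frequency, so a strong imbalance points to gene flow. The sketch below illustrates that idea with a simple asymmetry score and an exact binomial test. The function name, the score, and the toy counts are assumptions for illustration and may differ from the statistic the authors actually used.

```python
from math import comb


def discordance_asymmetry(n_minor_a, n_minor_b):
    """Return (delta, p_value) for the counts of two discordant topologies.

    delta is the normalized difference between the two counts; the p-value
    is an exact two-sided binomial test against equal frequencies (p = 0.5).
    """
    total = n_minor_a + n_minor_b
    if total == 0:
        raise ValueError("no discordant gene trees to compare")
    delta = (n_minor_a - n_minor_b) / total
    # For a symmetric binomial, the two-sided p-value is twice the
    # probability of the smaller tail, capped at 1.
    k = min(n_minor_a, n_minor_b)
    tail = sum(comb(total, i) for i in range(k + 1)) / 2 ** total
    return delta, min(1.0, 2 * tail)


# Hypothetical counts of the two minor topologies around one branch.
delta, p_value = discordance_asymmetry(90, 10)
print(delta, p_value)
```

In a real analysis the test would be repeated across many branches, so some correction for multiple testing would be needed.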

 

5. Which structural variant (SV) caller performs best?

A comprehensive benchmarking of WGS-based structural variant callers

Advances in whole genome sequencing promise to enable the accurate and comprehensive structural variant (SV) discovery. Dissecting SVs from whole genome sequencing (WGS) data presents a substantial number of challenges and a plethora of SV-detection methods have been developed. Currently, there is a paucity of evidence which investigators can use to select appropriate SV-detection tools. In this paper, we evaluated the performance of SV-detection tools using a comprehensive PCR-confirmed gold standard set of SVs. In contrast to the previous benchmarking studies, our gold standard dataset included a complete set of SVs allowing us to report both precision and sensitivity rates of SV-detection methods. Our study investigates the ability of the methods to detect deletions, thus providing an optimistic estimate of SV detection performance, as the SV-detection methods that fail to detect deletions are likely to miss more complex SVs. We found that SV-detection tools varied widely in their performance, with several methods providing a good balance between sensitivity and precision. Additionally, we have determined the SV callers best suited for low and ultra-low pass sequencing data.
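Because the gold standard here is described as complete, both precision and sensitivity can be reported for each caller. The sketch below shows one way such numbers could be computed for deletion calls. The matching rule (at least 50% reciprocal overlap), the function names, and the toy coordinates are all assumptions for illustration; the paper's own matching criterion may differ.

```python
def overlaps_reciprocally(a, b, min_frac=0.5):
    """True if deletions a and b (chrom, start, end) each share >= min_frac of their length."""
    if a[0] != b[0]:
        return False
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return False
    return shared >= min_frac * (a[2] - a[1]) and shared >= min_frac * (b[2] - b[1])


def precision_sensitivity(calls, truth, min_frac=0.5):
    """Return (precision, sensitivity) of called deletions against a gold standard."""
    matched_truth = set()
    true_positive_calls = 0
    for call in calls:
        hits = [i for i, t in enumerate(truth)
                if overlaps_reciprocally(call, t, min_frac)]
        if hits:
            true_positive_calls += 1
            matched_truth.update(hits)
    # Precision: fraction of calls that match a true deletion.
    precision = true_positive_calls / len(calls) if calls else 0.0
    # Sensitivity: fraction of true deletions recovered by at least one call.
    sensitivity = len(matched_truth) / len(truth) if truth else 0.0
    return precision, sensitivity


# Toy data (made-up coordinates): two of the three calls match a true deletion.
truth = [("chr1", 100, 200), ("chr1", 500, 700), ("chr2", 50, 150)]
calls = [("chr1", 110, 210), ("chr1", 900, 1000), ("chr2", 40, 160)]
print(precision_sensitivity(calls, truth))
```

Without a complete truth set, false positives cannot be counted reliably, which is why earlier benchmarks could report only one of the two rates.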

Are we stuck at the very first step here?

 

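6. A RAD-Seq-based computational tool for studying sex chromosomes, from France's National Research Institute for Agriculture, Food and Environment (INRAE)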

RADSex: a computational workflow to study sex determination using Restriction Site-Associated DNA Sequencing data

The study of sex determination and sex chromosome organisation in non-model species has long been technically challenging, but new sequencing methodologies are now enabling precise and high-throughput identification of sex-specific genomic sequences. In particular, Restriction Site-Associated DNA Sequencing (RAD-Seq) is being extensively applied to explore sex determination systems in many plant and animal species. However, software designed to specifically search for sex-biased markers using RAD-Seq data is lacking. Here, we present RADSex, a computational analysis workflow designed to study the genetic basis of sex determination using RAD-Seq data. RADSex is simple to use, requires few computational resources, makes no prior assumptions about type of sex-determination system or structure of the sex locus, and offers convenient visualization through a dedicated R package. To demonstrate the functionality of RADSex, we re-analyzed a published dataset of Japanese medaka, Oryzias latipes, where we uncovered a previously unknown Y chromosome polymorphism. We then used RADSex to analyze new RAD-Seq datasets from 15 fish species spanning multiple systematic orders. We identified the sex determination system and sex-specific markers in six of these species, five of which had no known sex-markers prior to this study. We show that RADSex greatly facilitates the study of sex determination systems in non-model species and outperforms the commonly used RAD-Seq analysis software STACKS in speed, resource usage, ease of application, and visualization options. Furthermore, our analysis of new datasets from 15 species provides new insights on sex determination in fish.
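To make the idea of searching for sex-biased markers concrete, here is a minimal sketch of a generic association test between marker presence and sex. It is not RADSex's implementation: the choice of a Yates-corrected chi-squared test with a Bonferroni threshold, the function name, and the toy counts are all assumptions for illustration.

```python
from math import erfc, sqrt


def sex_bias_p_value(males_with, females_with, n_males, n_females):
    """Chi-squared test (1 df, Yates-corrected) of marker presence versus sex."""
    a, b = males_with, n_males - males_with
    c, d = females_with, n_females - females_with
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 1.0  # marker present in everyone or no one: nothing to test
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-squared distribution with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))


# Hypothetical counts: (males with marker, females with marker), 20 fish per sex.
markers = {"marker_A": (20, 0), "marker_B": (10, 11)}
threshold = 0.05 / len(markers)  # Bonferroni correction across markers
for name, (m, f) in markers.items():
    p = sex_bias_p_value(m, f, 20, 20)
    print(name, p, p < threshold)
```

A marker found in every male and no female, like the hypothetical marker_A, is the pattern expected for a Y-linked sequence in an XX/XY system.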

 

7. A new genome assembly for bread wheat: more than 5,700 new genes found?

Chromosome-scale assembly of the bread wheat genome, Triticum aestivum, reveals over 5700 new genes

Bread wheat (Triticum aestivum) is a major food crop and an important plant system for agricultural genetics research. However, due to the complexity and size of its allohexaploid genome, genomic resources are limited compared to other major crops. The IWGSC recently published a reference genome and associated annotation (IWGSC v1.0, Chinese Spring) that has been widely adopted and utilized by the wheat community. Although this reference assembly represents all 3 wheat subgenomes at chromosome scale, it was derived from short reads, and thus is missing a substantial portion of the expected 16 gigabases of genomic sequence. We earlier published an independent wheat assembly (Triticum 3.1, Chinese Spring) that came much closer in length to the expected genome size, although it was only a contig-level assembly lacking gene annotations. Here, we describe a reference-guided effort to scaffold those contigs into chromosome-length pseudomolecules, add in any missing sequence that was unique to the IWGSC 1.0 assembly, and annotate the resulting pseudomolecules with genes. Our updated assembly, Triticum 4.0, contains 15.07 gigabases of non-gap sequence anchored to chromosomes, which is 1.2 gigabases more than the previous reference assembly. It includes 108,639 genes unambiguously localized to chromosomes, including over 2000 genes that were previously unplaced. We also discovered more than 5700 new genes, all of them duplications in the Chinese Spring genome that are missing from the IWGSC assembly and annotation. The Triticum 4.0 assembly and annotations are freely available at www.ncbi.nlm.nih.gov/bioproject/PRJNA392179.

 

8. Detlef Weigel, Max Planck Institute for Developmental Biology: from single-genome to multi-genome reference sequences

An Algorithm to Build a Multi-genome Reference

To overcome the limits imposed by mapping sequence reads against a single reference genome, or serially mapping them against multiple reference genomes, we have developed the MGR method that allows simultaneous comparison against multiple high-quality reference genomes, in order to remove the bias that comes from using only a single-genome reference and to simplify downstream analyses. To this end, we present the MGR algorithm that creates a graph (MGR graph) as a multi-genome reference. To reduce the size and complexity of the multi-genome reference, highly similar orthologous and paralogous regions are collapsed while more substantial differences are retained. To evaluate the performance of our model, we have developed a genome compression tool, which can be used to estimate the amount of shared information between genomes.

9. [COVID-19] Dalian University of Technology: Computational analysis suggests putative intermediate animal hosts of the SARS-CoV-2

The recent emerged SARS-CoV-2 may first transmit to intermediate animal host from bats before the spread to humans. The receptor recognition of ACE2 protein by SARS-CoVs or bat-originated coronaviruses is one of the most important determinant factors for the cross-species transmission and human-to-human transmission. To explore the hypothesis of possible intermediate animal host, we employed molecular dynamics simulation and free energy calculation to examine the binding of bat coronavirus with ACE2 proteins of 47 representing animal species collected from public databases. Our results suggest that intermediate animal host may exist for the zoonotic transmission of SARS-CoV-2. Furthermore, we found that tree shrew and ferret may be two putative intermediate hosts for the zoonotic spread of SARS-CoV-2. Collectively, the continuous surveillance of pneumonia in human and suspicious animal hosts are crucial to control the zoonotic transmission events caused by SARS-CoV-2.

 

10. [COVID-19] Introductions and early spread of SARS-CoV-2 in France

Following the emergence of coronavirus disease (COVID-19) in Wuhan, China in December 2019, specific COVID-19 surveillance was launched in France on January 10, 2020. Two weeks later, the first three imported cases of COVID-19 into Europe were diagnosed in France. We sequenced 97 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes from samples collected between January 24 and March 24, 2020 from infected patients in France. Phylogenetic analysis identified several early independent SARS-CoV-2 introductions without local transmission, highlighting the efficacy of the measures taken to prevent virus spread from symptomatic cases. In parallel, our genomic data reveals the later predominant circulation of a major clade in many French regions, and implies local circulation of the virus in undocumented infections prior to the wave of COVID-19 cases. This study emphasizes the importance of continuous and geographically broad genomic sequencing and calls for further efforts with inclusion of asymptomatic infections.

 

11. [Author-submitted] CytoTalk: De novo construction of signal transduction networks using single-cell RNA-Seq data


 

References

1. Nekrutenko, A., Schatz, M.C. In memory of James Taylor: the birth of Galaxy. Genome Biol 21, 105 (2020). https://doi.org/10.1186/s13059-020-02016-0

2. https://galaxyproject.org/jxtx/
