A Review of Insect Genomes

2021-02-21 顫抖吧小蟲子

I recently read a few papers on insect genomes. These are my notes, which I will also present at next Monday's group meeting.

Insect Genomics: Methods and Protocols

Chapter 1: Arthropod Genome Sequencing and Assembly Strategies

Key words:

Genome project strategy, Genome assembly, Genome sequencing, Genomics, Insect genomes, Oxford nanopore, Pacific biosciences, 10× Genomics

1 Introduction

We have far to go for the remainder of the more than 1.5 million described species on planet Earth. As sequencing costs have decreased, researchers focusing on non-model organisms are jumping into genome projects themselves.

(The road ahead is long and far; I will search high and low.)

2 Major Steps in a Genome Project

Appropriate DNA isolation and DNA sequencing to feed the decided genome assembly choice.

Genome assembly, annotation, and submission to public databases.

Analysis and possible publication.

==Strategy Choice.==

Three factors drive the choice: genome size, which affects final cost; genome polymorphism, which affects the probability of a high quality assembly or requires methods that address it; and individual size, which determines the minimum number of individuals, and thus different haplotypes, required for DNA isolation and sequencing.

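Examples:

Mammalian genomes are typically 2.5–3.5 Gb and usually have very low polymorphism (1 SNP per 1000 bp in humans, below the sequencing error rate), and enough DNA can be obtained from the blood of a single individual. Bird genomes are typically ~1–1.5 Gb and are easy to sequence and assemble. Reptile and amphibian genomes are large, and few have been sequenced. Sea urchins and other marine organisms are highly polymorphic (sea urchin: 2% SNP), so their assemblies tend to have short contigs and scaffold gaps, which readily leads to incomplete or erroneous annotation.

Insects are small, so sequencing may require pooling multiple individuals, and large population sizes mean high genome sequence polymorphism among individuals. Traditional sequencing and assembly approaches are therefore likely to yield low contiguity (i.e., low contig N50s) and more gaps, resulting in poor gene model annotation.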

DNA isolation and sequencing

10X Genomics assembly strategy: 10-100 kb DNA fragments, 55–60× genome coverage with 150 bp paired Illumina reads, assembled using the company's Supernova assembly software.

PB long read genome assembly strategy: 20–30 kb, 50–70× genome coverage of PB data; Canu assembles by first error correcting the longest reads using shorter PB reads, and then using 15–25× of error corrected long read data for the final assembly.

Illumina De Bruijn genome assembly strategy: multiple sized insert Illumina paired end libraries (e.g., 500 bp, 2 kb, 5 kb, and 8–10 kb insert sizes), each sequenced to 20–40× genome coverage, fed to a De Bruijn short read genome assembler such as SOAPdenovo or ALLPATHS-LG. The final genome coverage might be 150×.

Polymorphism and repeat content of the species genome.

GemCode single cell platform
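The coverage figures quoted for each strategy translate directly into a sequencing budget. As a rough sketch (the function name and the 500 Mb genome size are my own illustrative assumptions, not values from the chapter), the number of paired reads needed for a target fold coverage is the required bases divided by the bases per pair:

```python
import math


def read_pairs_needed(genome_size_bp, coverage, read_length_bp=150):
    """Number of read pairs needed to reach a target fold coverage.

    Each pair contributes 2 * read_length_bp bases.
    """
    bases_needed = genome_size_bp * coverage
    bases_per_pair = 2 * read_length_bp
    return math.ceil(bases_needed / bases_per_pair)


if __name__ == "__main__":
    genome_size = 500_000_000  # hypothetical 500 Mb insect genome
    for cov in (55, 60):       # the 55-60x range quoted for the 10X strategy
        print(cov, read_pairs_needed(genome_size, cov))
```

This ignores reads lost to quality filtering and duplicates, so a real project would budget somewhat more.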

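How 10X Genomics works:

Reads derived from the same DNA fragment (10-100 kb) are tagged with the same barcode and then sequenced on the Illumina platform, which achieves long-fragment sequencing. The basic principle is that reads from the same long fragment carry the same tag and are called linked-reads; using the barcode information, short reads can be stitched into long reads. Such linked-reads can be used for structural variant detection and haplotype phasing.

In essence, 10X Genomics only makes Illumina short fragments longer and does not fully eliminate the GC bias of the Illumina platform. Ultimately, whether PCR amplification is performed is the main factor behind the difference in long-fragment coverage between 10X Genomics and PacBio.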
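The core idea of linked-reads, that reads sharing a barcode came from the same long fragment, can be illustrated with a toy grouping step. This is only a sketch: the `(barcode, sequence)` tuples are a hypothetical input format, and real pipelines such as Supernova do far more than this.

```python
from collections import defaultdict


def group_by_barcode(reads):
    """Group (barcode, sequence) pairs so reads sharing a barcode stay together."""
    groups = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    return dict(groups)


if __name__ == "__main__":
    toy_reads = [("AAAC", "ACGT"), ("GGGT", "TTTT"), ("AAAC", "CGTA")]
    for barcode, seqs in group_by_barcode(toy_reads).items():
        print(barcode, seqs)
```

Each group can then be treated as a set of short reads sampled from one 10-100 kb molecule.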

Thus it is best to simply keep interesting symbiont information. Some amount of starvation prior to DNA isolation to allow the digestion of gut contents can be a good idea. (When studying gut bacterial symbionts, take care to retain the bacterial and microbial information.)

RAM and CPU are really important. As new assemblers emerge, assembly times have dramatically decreased over time.

It doesn’t count unless it is in a public database.

The potential to be scooped on publication of a new genome is low today; instead, collaboration and data sharing help to understand the mass of data that is a genome and prevent duplication of work.

MAKER, AUGUSTUS and "Just Annotate My genome" (http://jamg.sourceforge.net) can be used to complete automated annotation.

Quality control using a set of known sequences such as BUSCO or CEGMA is key to measure progress.

Research group manual curation

The most common automated annotation errors are the erroneous merging or splitting of gene models.

The number of genes in a family of interest needs to be confirmed and can be particularly difficult in gene families with high sequence diversity such as olfactory receptors.

The focus of journals has rightly returned to biological analysis and better understanding of the life style and phenotype of the species of interest.

Time spent on incremental genome improvement freezes and new annotations is time not investigating biological questions. (Focus on the biological questions rather than the genome itself.)

Chapter 2: Genome Size Estimation and Quantitative Cytogenetics in Insects

With care, it is possible using flow cytometry to create a precise and accurate estimate of the genome size of an insect.

Chapter 3: Isolation of High Molecular Weight DNA from Insects

For insects, the least complex tissue is the best. Late stage embryos contain hundreds to thousands of nuclei, but have not completed organogenesis. Newly hatched larvae or pupae may also work. Adults and late larvae are usually too complex. If possible, remove any food substrates from the sample prior to grinding.

Chapter 4: Long Range Sequencing and Validation of Insect Genome Assemblies

1 Introduction

The prevalence of complex gene families like olfactory receptors and P450 genes, DNA polymorphisms and transposable elements in arthropod genomes further obfuscates resolution of the assembly using only short reads.

Genome assembly and validation workflow

2 Materials

Miniasm assembly

The Miniasm assembler does not correct reads before assembly unlike most other long read assemblers.

It takes as input an all-to-all alignment from Minimap and identifies all raw read overlaps. The mappings are then trimmed and modeled into an assembly graph before unitigs are determined from paths through the graph.

We use Miniasm to rapidly generate candidate assemblies from uncorrected raw reads as well as corrected reads.

Canu assembly

(a) Self-correct the raw long PacBio reads using the shorter PacBio reads.

(b) Trim the corrected reads based on the overlap with other corrected reads so that erroneous regions of reads are excised.

(c) Finally assemble the contigs from the corrected and trimmed PacBio reads.

Redundans duplication removal and scaffolding

Highly heterozygous insect genome assemblies may contain a large number of duplicates even after careful genome assembly. Redundans is a pipeline to remove duplicated contigs from an assembly, followed by iterative gap closing and scaffolding using paired-end, mate-pair or long reads.

Although the assembly process includes read correction, it is advisable to perform multiple rounds of error correction after the final assembly is created.

Pilon

Pilon uses high quality Illumina sequences to polish PacBio and Oxford Nanopore long read assemblies. Illumina reads are aligned to the genome assembly and supplied to Pilon as a BAM file. Illumina reads from unpaired and paired-end libraries can be used as input. The correct base at a position is inferred by consensus and the reference allele is corrected if required. Pilon can also fix small insertion and deletion errors, fill gaps and perform local reassembly in regions where many errors are identified.
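To make "inferred by consensus" concrete, here is a toy majority-vote polisher. It is a deliberately simplified sketch and not Pilon's actual algorithm: the input format (one string of aligned read bases per reference position) and the strict-majority rule are my own assumptions, and indels, base qualities and gap filling are ignored.

```python
from collections import Counter


def polish_by_consensus(reference, pileups):
    """Replace each reference base with the majority base among aligned reads.

    pileups[i] is a string of the read bases aligned to reference position i
    (hypothetical input format for illustration).
    """
    polished = []
    for ref_base, column in zip(reference, pileups):
        if not column:  # no read coverage: keep the reference base
            polished.append(ref_base)
            continue
        base, count = Counter(column).most_common(1)[0]
        # Only overrule the reference when a strict majority of reads agree.
        polished.append(base if count > len(column) / 2 else ref_base)
    return "".join(polished)


if __name__ == "__main__":
    print(polish_by_consensus("ACGT", ["AAA", "CCC", "TTG", "T"]))
```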

3 Notes

PacBio requires at least a few micrograms of high molecular weight DNA before size selection. It is preferable to use individuals from a colony that has been inbred for many generations. Another choice is to use haploid individuals, if available.

Memory and disk space requirements will vary according to genome size and specific tool. Minimum RAM required for most assemblers is 500 GB–1 TB.

Coverage of the genome is another important factor that determines quality of assembly. De novo assemblies require 50–80X coverage in most cases. Canu can be optimized for a high coverage genome assembly.

The high error rates have been offset by applying self-correction based on information from shorter reads for PacBio data sets.

HaploMerger2 can be used to build haploid subassemblies from a heterozygous diploid genome assembly.

Chapter 5: Using BUSCO to Assess Insect Genomic Resources

This chapter details the use of the Benchmarking Universal Single-Copy Orthologue (BUSCO) assessment tool to estimate the completeness of transcriptomes, genome assemblies, and annotated gene sets in terms of their expected gene content.

For transcriptomes the longest open reading frames are assessed, while for genome assessments, gene models are first built using ab initio gene prediction with AUGUSTUS for the potential matches identified using TBLASTN searches.

Matches that meet the BUSCO HMM score cutoffs are classified as "complete" if their lengths fall within BUSCO profile length expectations, and if found more than once they are classified as "duplicated". Those that do not meet the length requirements are considered as partial matches and are classified as "fragmented", and BUSCOs without matches that pass the thresholds are classified as "missing".

Insect genomes: progress and challenges

Abstract

In 2011, Robinson and colleagues proposed the 'i5k' initiative to sequence the genomes of 5000 insects and other arthropods with important biological significance or economic value before 2017.

At the time of preparing this paper, 1219 insect genome-sequencing projects have been registered with the National Center for Biotechnology Information (NCBI): 401 insect species have complete genome assemblies with varied quality; the genome annotations of 155 insects have been publicly released; and over 100 insect genomes have been published in peer-reviewed journals.

Insect genome assembly and annotation

Insect genome assembly

At present, genomes are generally sequenced with a whole genome shotgun (WGS) strategy, which produces large numbers of short fragments that must be assembled.

There are two assembly strategies: de novo or mapping assembly.

De novo genome assembly depends entirely on overlapping information between the reads.

Three de novo assembly algorithms:

The first category is based on overlap/layout/consensus between long sequences.

CABOG, NEWBLER, SHORTY, EDENA and CELERA are suited to assembling medium-length reads, such as Sanger reads or third-generation long reads, and are not suited to second-generation Illumina short reads.

The second category of assembler uses De Bruijn graph algorithms, which are well suited for short reads produced by second-generation sequencing techniques such as the Illumina sequencing platform.

Examples include SOAPDENOVO, EULER, VELVET and WTDBG (computing with different k-mer sizes is the time-consuming step).
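For intuition about what a De Bruijn graph assembler builds, here is a minimal sketch of graph construction only. The function name is my own, and real assemblers add error correction, graph simplification and path finding on top of this.

```python
from collections import defaultdict


def build_de_bruijn_graph(reads, k):
    """Build a De Bruijn graph from reads.

    Nodes are (k-1)-mers; every k-mer in a read adds one edge from its
    (k-1)-base prefix to its (k-1)-base suffix.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    for node, successors in build_de_bruijn_graph(["ACGT", "CGTA"], 3).items():
        print(node, "->", successors)
```

Because the graph must be rebuilt for each k, trying several k-mer sizes multiplies the run time, which is the cost noted above.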

The third category includes software implementing greedy graph algorithms.

SSAKE, SHARCGS and VCAKE. Many published insect genomes were assembled by CABOG, SOAPdenovo, ALLPATHS-LG or ABYSS.

Scaffolding method that can assist assembly: Hi-C technology. It is a sequencing-based approach for determining how a genome is folded by measuring the frequency of contact between pairs of loci.

It can assist genome assembly to the chromosome level without additional genetic map information. Though it does not generate or improve existing contigs, this technology is useful for obtaining information with chromosome-length scaffolds.

Mapping assembly first determines the position of reads relative to the reference genome and then assembles the reads into contigs or scaffolds.

Factors affecting genome assembly quality:

repetitive sequences and heterozygosity

A large number of repetitive sequences in the genome can cause substantial ambiguity in the process of assembling contigs and scaffolds.

Heterozygosity, or allelic variation in the sequenced individual(s), also greatly complicates the problem of genome assembly.

One remedy is to sequence inbred homozygous individuals (or haploid males in the case of Hymenoptera).

PLATANUS and REDUNDANS are reported to improve the assembly quality of heterozygous genomes.

Novel 'long read' sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore, are also contributing to major advances in the quality of genome assembly, together with assembly programs such as FALCON, CANU and WTDBG.

Assessing draft genome quality (several methods are needed to assess genome assembly quality)

The size of the genome assembly

whole body without intestinal tract and ovarium

estimates of genome size: flow cytometry or K-mer analysis.

If the assembly is smaller than expected, it is likely because it is incomplete or due to repeat collapse.

If the assembly is larger than expected, this often reflects the fact that independent assembly of haplotypes has resulted in redundancies.

The correctness of the genome assembly

If sequences were previously assembled from independent data (e.g., BAC sequencing), these can be used to evaluate the correctness of the assembled genome.

The congruence of paired-end or mate-pair reads when mapped to the assembly can inform the quality of genome assembly.

The distance between the mapped reads should be consistent with the insert size used when constructing the library. This congruence mapping approach has been implemented in software such as QUAST and REAPR.

N50: all scaffolds or contigs are sorted from longest to shortest, then the sequence lengths of each scaffold or contig are sequentially summed. When the accumulated value reaches half of the total assembly length, the length of the corresponding scaffold or contig is defined as the N50.

In some special cases, a high N50 might be due to aggressive misassembly, which is worth further investigation.
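The N50 definition maps directly onto a few lines of code. A minimal sketch, following the wording above (sort longest to shortest, accumulate, stop when half the total is reached):

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:  # accumulated length reaches half the assembly
            return length
    raise ValueError("lengths must not be empty")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))
```

Note how a single very long (possibly misjoined) scaffold can dominate the result, which is why a high N50 alone does not prove a good assembly.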

The Core Eukaryotic Genes Mapping Approach (CEGMA) pipeline is one widely used method for implementing this approach (Parra et al., 2007). CEGMA identifies 458 core genes that are highly conserved in eukaryotes and searches for these genes in the assembled scaffolds (as of May 18th 2015, CEGMA is no longer being supported).

Benchmarking Universal Single-Copy Orthologs (BUSCO): BUSCO uses a set of 2647 1:1 orthologous genes for arthropods. The quality of genome assembly is reflected by the percentage of these orthologous genes that can be found in the assembled scaffolds.
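The BUSCO categories described in Chapter 5 (complete, duplicated, fragmented, missing) and the resulting percentages can be sketched as below. This is an illustration of the classification logic as worded in these notes, not BUSCO's own code; the `Match` tuple and the threshold parameters are hypothetical stand-ins for values that BUSCO takes from its profiles.

```python
from collections import Counter, namedtuple

# Hypothetical record for one candidate match of a BUSCO profile.
Match = namedtuple("Match", ["score", "length"])


def classify_busco(matches, score_cutoff, min_length, max_length):
    """Classify one BUSCO from its candidate matches."""
    passing = [m for m in matches if m.score >= score_cutoff]
    if not passing:
        return "missing"
    full_length = [m for m in passing if min_length <= m.length <= max_length]
    if len(full_length) > 1:
        return "duplicated"
    if len(full_length) == 1:
        return "complete"
    return "fragmented"  # passes the score cutoff but not the length range


def summarise(statuses):
    """Percentage of BUSCOs in each category."""
    counts = Counter(statuses)
    total = len(statuses)
    categories = ("complete", "duplicated", "fragmented", "missing")
    return {c: 100.0 * counts[c] / total for c in categories}


if __name__ == "__main__":
    calls = [
        classify_busco([Match(150, 350)], 100, 300, 400),
        classify_busco([Match(150, 100)], 100, 300, 400),
        classify_busco([], 100, 300, 400),
    ]
    print(summarise(calls))
```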

Genome annotation

structural annotation and functional annotation

Structural annotation comes first, identifying which regions of the assembly correspond to specific features, such as genes (including intron–exon boundaries) and transposable elements (TEs).

Once the structural features are delineated, functional annotation aims to infer the function and identity of genes and other elements, based on sequence similarities.

Identifying repeat sequence

homology searching

Homology searching identifies homologous repeat sequences based on sequence similarity. The software REPEATMASKER is widely used for this task, in conjunction with the RepBase collection of characterized TEs.

ab initio prediction

The ab initio prediction method uses structural features of the repetitive sequence to identify novel repeat sequences. This method has great advantages in predicting repetitive sequence with distinct structural features, such as miniature inverted-repeat TEs and long terminal repeats.

Software: RECON, PILER, REPEATSCOUT (Price et al., 2005), LTR-FINDER and REPEATMODELER.

Identification of noncoding RNA

Noncoding RNA is a class of RNA genes that do not produce protein products, such as transfer RNA (tRNA), ribosomal RNA, piwi-interacting RNA, microRNA (miRNA), small nucleolar RNA, or repeat associated small interfering RNA.

Software: MIRDEEP, TRNASCAN, INFERNAL and RNASTRUCTURE.

Database: RNAdb, NONCODE, Rfam, miRBase and snoRNABase.

Prediction of protein-coding genes

(1) identifying homologues of known protein coding genes through sequence similarity.

(2) de novo predicting the protein-coding genes with software developed via machine learning of protein-coding gene structures.

(3) determining the exonic regions by direct transcriptome sequencing [e.g., RNA sequencing (RNA-seq) or expressed sequence tags (ESTs)] and aligning to the assembled scaffolds.

Strengths and weaknesses

The protein-coding genes found by homologue searching are typically robust to false positives, but only known protein genes can be found.

De novo prediction can find more candidates but may have high false-positive rates.

Expression evidence by RNA-seq data is typically the most definitive approach, but relies heavily on the quality and quantity of the transcriptome and the samples chosen for RNA-seq.

AUGUSTUS, EVIDENCEMODELER, GLEAN, EVIGAN, MAKER, JIGSAW and EVIGENE/EVIDENTIALGENE, OMIGA and NCBI’s eukaryotic genome annotation pipeline.

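A chromosome-level genome assembly reveals the genetic basis of cold tolerance in a notorious rice insect pest, Chilo suppressalis

Abstract

Chilo suppressalis is a rice pest that causes damage worldwide. By combining Illumina and PacBio sequencing, this study obtained a C. suppressalis genome of 824.35 Mb, with a contig N50 of 307 kb and a scaffold N50 of 1.75 Mb. Hi-C scaffolding assigned 99.2% of the bases to 29 chromosomes, and 15,653 protein-coding genes were annotated.

Through evolutionary analysis of specific gene families, the study elucidates the molecular mechanism of cold tolerance in C. suppressalis from a genomic perspective. The metabolic pathways involved are: glucose-originated glycerol biosynthesis, triacylglycerol-originated glycerol biosynthesis, fatty acid synthesis and trehalose transport-intermediate cold tolerance.

INTRODUCTION

Insects overwinter mainly in two ways: one is migration away from cold regions; the other is the insect's own cold-tolerance mechanisms.

The latter in turn involves two mechanisms: freeze avoidance and freeze tolerance.

Freeze avoidance involves the accumulation of extremely high levels of cryoprotectants (chemicals that prevent/minimize freezing damage at the cellular level) and the synthesis of antifreeze proteins (AFPs).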

Freeze tolerance involves specific ice‐nucleating agents or proteins (INAs or INPs) that trigger the freezing of extracellular water, and cryoprotectants that protect the cell membranes of desiccated cells by maintaining a liquid intracellular space.

Cryoprotective dehydration and vitrification (extreme dehydration) have also been proposed as insect winter survival strategies.

C. suppressalis overwintering strategy involves the accumulation of glycerol, the most common insect cryoprotectant.

Enzyme activities associated with glycerol biosynthesis in C. suppressalis have also been reported to be dependent on seasonal changes.

A C. suppressalis draft genome (sequenced solely using the Illumina Solexa platform) was reported previously, but the assembly was hampered by high heterozygosity and its contig N50 was only 5.2 kb.

RNA sequencing (RNA‐seq) analysis of larval responses to low temperatures under diapause (DA)/nondiapause (ND) and cold acclimation (CA)/noncold acclimation (NC).

MATERIALS AND METHODS

Insects

Pupae were used to obtain DNA samples for both Illumina and PacBio sequencing, and newly hatched larvae were used to obtain DNA samples for Hi‐C sequencing.

Genome sequencing

Illumina sequencing

Illumina paired-end and mate-pair sequencing was performed.

Pacific Biosciences SMRT sequencing

One library with insert sizes of approximately 5,000-bp was constructed for the PacBio RS II sequencing system.

K-mer analysis and estimation of the genome size

K-mer counts were determined from the paired-end sequencing data (insert size of the library <2 kb) with Jellyfish and a k-mer size of 23. The predicted genome size was 788,873,227 bp with 121-fold coverage and a heterozygosity of 0.75%.
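The usual back-of-the-envelope estimate behind such numbers is genome size ≈ total k-mers / k-mer depth at the main peak. The sketch below assumes a depth histogram (depth mapped to number of distinct k-mers, the kind of two-column table a k-mer counter reports) and a hypothetical `min_depth` cutoff to drop low-depth error k-mers. It ignores heterozygosity and repeats, so it is not the paper's exact procedure.

```python
def estimate_genome_size(histogram, min_depth=3):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers at that depth.
    K-mers below min_depth are treated as sequencing errors and ignored.
    Returns (estimated_size, peak_depth).
    """
    kept = {d: c for d, c in histogram.items() if d >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(kept, key=kept.get)              # depth of the main peak
    total_kmers = sum(d * c for d, c in kept.items())  # total k-mer occurrences
    return total_kmers / peak_depth, peak_depth


if __name__ == "__main__":
    toy_histogram = {1: 5000, 2: 800, 19: 100, 20: 300, 21: 100}
    print(estimate_genome_size(toy_histogram))
```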

Genome assembly

Assembly polishing and filtering

Quality assessment

de novo assembly

Hi-C scaffolding

30 μg DNA; Newly hatched larvae

Unmapped paired-end reads, singleton reads, multiple mapped reads and PCR duplicates were filtered by an optimized and flexible pipeline (HiC-Pro).

LACHESIS was used to cluster the scaffolds into 29 groups according to the agglomerative hierarchical clustering algorithm with default parameters.

Genome annotation

Protein coding genes were predicted based on three approaches: ab initio prediction, homology-based prediction, and transcript-based prediction.

(1) Ab initio prediction. Augustus and GeneMark-ES were used for ab initio gene prediction. GeneMark-ES is a self-training algorithm that has been used for gene identification of novel eukaryotic genomes. In order to improve the accuracy of de novo training, Augustus (parameters: --strand=both --species=heliconius_melpomene1 --extrinsicCfgFile=extrinsic.E.XNT.cfg --alternatives-from-evidence=true --hintsfile=hints.gff --allow_hinted_splicesites=atac) was trained using homologous protein sequences from Bombyx mori and Amyelois transitella along with RNA-Seq data.

(2) Homology-based prediction. The annotated gene sets from ten species, Drosophila melanogaster, Drosophila pseudoobscura, Aedes aegypti, Anopheles gambiae, Bombyx mori, Amyelois transitella, Danaus plexippus, Papilio machaon, Helicoverpa armigera, and Spodoptera litura, were aligned to the C. suppressalis genome utilizing exonerate protein2genome with a "--percent 50" parameter to identify exact intron/exon positions.

(3) Transcript-based prediction. Hisat2 was used to align the transcriptomic data to the genome and gene information was predicted using Stringtie. The PASA annotation pipeline, which uses GMAP and BLAT, was used to align Trinity transcript assemblies to the genome. The PASA alignment assemblies can then be used to automatically extract protein coding regions for use in TransDecoder.

The combined gene structures from the three approaches were integrated via EvidenceModeler and filtered for sequences that lacked homologs in the other species or were absent from the RNA-Seq data.

(1) First, a search of regions 2 Kb upstream and downstream of the 15,658 genes found that the vast majority of genes (14,722, 94.02%) lacked ambiguous bases (Ns), indicating that these gene models are not located near an assembly gap and thus the gene models are unlikely to be fragments.

(2) Second, we used BUSCO software to assess the completeness of genome annotations. The presence of 95.5% of the single copy BUSCO genes suggests that the predicted genes are complete.
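The first check (no Ns within 2 kb of a gene) is easy to express in code. A minimal sketch, assuming 0-based half-open gene coordinates on a scaffold held as a plain string; the function name and coordinate convention are my own and not taken from the paper:

```python
def flanks_have_gap(sequence, start, end, flank=2000):
    """True if an N occurs within `flank` bp upstream or downstream of [start, end)."""
    upstream = sequence[max(0, start - flank):start]
    downstream = sequence[end:end + flank]
    return "N" in upstream.upper() or "N" in downstream.upper()


if __name__ == "__main__":
    scaffold = "ACGT" * 3 + "NN" + "ACGT" * 3  # toy scaffold with one gap
    print(flanks_have_gap(scaffold, 16, 20, flank=5))
    print(flanks_have_gap(scaffold, 16, 20, flank=2))
```

Counting the genes for which this returns False and dividing by the total gives the kind of percentage reported above.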

Annotation of repetitive elements

To identify repeat elements in the C. suppressalis genome, a de novo self-specificity repeat library was constructed using RepeatModeler with RECON, RepeatScout, TRF, and NSEG. Then an ab initio repeat library was generated that combined RepeatMasker and the Repbase database, following the suggested parameter values.

Functional annotation

To obtain putative functions of the annotated genes, protein sequences were aligned to the NCBI NR, NT, and TrEMBL databases with an E-value cutoff of 1E-10. We also annotated motifs and domains using InterProScan with publicly available databases including Gene3D, PRINTS, Pfam, CDD, SMART, MobiDBLite and PROSITE. Descriptions of gene function, which were retrieved from the InterProScan results, were classified with Gene Ontology.

RNA‐seq for larval cold‐response

In both treatments (DA and ND), the fifth-instar larvae were again divided into two groups: (a) to simulate seasonal cooling, CA larvae were generated by gradually decreasing temperatures 5°C per day over a 5-day period from the beginning of the final instar to a final temperature of 5°C; (b) NC larvae were maintained at 25°C and collected on the second day of the final instar.

All CA larvae (DA‐CA and ND‐CA) were then exposed to cold shock at 0 or −5°C for 2 hr. NC larvae (DA‐NC and ND‐NC) were likewise exposed to cold shock at 5, 0 or −5°C for 2 hr.

DA = diapause; ND = nondiapause; CA = cold acclimation; NC = noncold acclimation

RNA‐seq analysis

Quality control and alignment: Trimmomatic was used to remove adapter and low-quality sequences, and the reads were then aligned to the C. suppressalis genome generated above using hisat2.

Quantification: The total number of genome‐mapped reads to gene was quantified using htseq2 (‐t exon, ‐i gene_id, mode = union) with the annotated C. suppressalis GFF file.

Differential expression: The mapped read counts were used to test differential expression with DESeq2. Genes with an adjusted p value < .05 and |log2 ratio| ≥ 1 in at least one temperature condition were considered to be differentially expressed.
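The "at least one temperature condition" rule is worth spelling out, since it is an OR across conditions but an AND within each condition. A small sketch, assuming each gene comes with a list of hypothetical `(adjusted_p, log2_ratio)` pairs, one per temperature condition:

```python
def is_differentially_expressed(results, alpha=0.05, min_abs_log2=1.0):
    """Apply the DE rule to one gene.

    results: iterable of (adjusted_p, log2_ratio), one per temperature condition.
    A gene qualifies if any single condition passes both thresholds.
    """
    return any(
        padj < alpha and abs(log2_ratio) >= min_abs_log2
        for padj, log2_ratio in results
    )


if __name__ == "__main__":
    print(is_differentially_expressed([(0.2, 3.0), (0.01, -1.5)]))
    print(is_differentially_expressed([(0.01, 0.5), (0.2, 2.0)]))
```

In the second call no single condition satisfies both thresholds, so the gene is not called differentially expressed even though each threshold is met somewhere.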

Orthology and evolution

Protein sequence data sets from 10 species.

Redundant alternative splicing events were filtered to generate a single transcript for each protein set, and then all-against-all protein comparisons were performed using blastp with a cutoff of E < 1e-5.

High‐scoring segment pair (HSP) segments between the same pair of proteins were processed using orthomcl, which was followed by homology identification among protein sequences based on bit‐scores to identify final orthologues, inparalogues and co‐orthologues.

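Orthologous genes arise through speciation events, whereas paralogous genes arise through gene duplication events.

If a gene duplication event occurs after a speciation event, one-to-many or many-to-many orthologous relationships arise, and such genes are called co-orthologues. Therefore, the most similar sequences in the genomes of two species are only likely to be orthologues; similarity alone does not prove that they are.

Outparalogues are paralogues produced by gene duplication events that occurred before speciation; inparalogues are paralogues produced by gene duplication events that occurred after speciation. The distinction helps separate older paralogues from recently arisen ones.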
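These definitions reduce to two questions: which event separated the two genes, and, for a duplication, whether it came before or after the speciation of interest. A toy sketch under those assumptions (the function and its arguments are hypothetical; ages are in time before present, so a larger number means an older event):

```python
def classify_pair(mrca_event, duplication_age=None, speciation_age=None):
    """Classify a gene pair by the event at their most recent common ancestor.

    mrca_event: "speciation" or "duplication".
    Ages are time before present; a larger value means an older event.
    """
    if mrca_event == "speciation":
        return "orthologues"
    if mrca_event != "duplication":
        raise ValueError("mrca_event must be 'speciation' or 'duplication'")
    if duplication_age is None or speciation_age is None:
        return "paralogues"  # cannot say whether in- or outparalogues
    if duplication_age > speciation_age:
        return "outparalogues"  # duplication happened before speciation
    return "inparalogues"       # duplication happened after speciation


if __name__ == "__main__":
    print(classify_pair("duplication", duplication_age=120, speciation_age=80))
```

Tools such as orthomcl infer these relationships from sequence similarity scores rather than from known event ages, so this is only a statement of the definitions.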

To estimate species phylogeny, protein sequences for 500 single copy conserved orthologues were selected. Multiple protein sequence alignments for each orthologous group were then performed using muscle and the conserved blocks were extracted using gblocks. Conserved protein blocks with single copies in all species were concatenated to 11 super genes of 199,150 amino acids and used to construct maximum likelihood phylogenetic trees with the JTT model in phyml. Statistical support was obtained with 100 bootstrap replicates.
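The concatenation step (one "super gene" per species) can be sketched as follows. This assumes each orthologous group has already been aligned and trimmed, and is represented as a hypothetical dict mapping species name to aligned sequence; it is not the paper's script.

```python
def concatenate_alignments(alignments):
    """Concatenate per-orthogroup alignments into one super gene per species.

    alignments: list of dicts mapping species -> aligned sequence.
    Every block must contain the same species with equal aligned lengths.
    """
    species = sorted(alignments[0])
    supergenes = {name: [] for name in species}
    for block in alignments:
        if sorted(block) != species:
            raise ValueError("every block needs exactly one sequence per species")
        if len({len(seq) for seq in block.values()}) != 1:
            raise ValueError("sequences within a block must have equal aligned length")
        for name in species:
            supergenes[name].append(block[name])
    return {name: "".join(parts) for name, parts in supergenes.items()}


if __name__ == "__main__":
    blocks = [{"a": "MK-", "b": "MKL"}, {"a": "PQ", "b": "P-"}]
    print(concatenate_alignments(blocks))
```

The resulting super genes all have the same length, which is what a maximum likelihood tree builder expects as input.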

mcmctree (paml 4.8 package) with parameters "-clock 2 -alpha 0.5 -model 3" and known divergence time data in timetree (http://www.timetree.org/) were used to estimate divergence times among species.

To examine gene family expansion and contraction among species, we used cafe with p < .05 and "-s" to automatically search for the birth and death parameter (λ) of genes.

The gene family results from the orthomcl pipeline and the estimated divergence times between species were used as inputs.

RESULTS AND DISCUSSION

Genome assembly

Karyotype and chromosome synteny

Hi-C long-range scaffolding was used to obtain chromosomal information. Hi-C linking information anchored, ordered and oriented 25,944 scaffolds to 29 chromosomes using LACHESIS with default parameters.

The map indicates that intra-chromosome interactions were strong while inter-chromosome interactions were weak.

Synteny analysis results for C. suppressalis suggest two fusion events: Chr1 arising from fusion of T. ni Chr0, Chr13 and Chr14, and Chr2 from fusion of T. ni Chr1, Chr26 and some segments of Chr29.

The phylogenetic relationship of Chilo suppressalis was estimated using a maximum likelihood analysis of a concatenation of 500 single‐copy orthologous protein sequences over 100 bootstrap replicates.

Species phylogenetic tree and gene orthology

1:1:1 indicates absolute single‐copy genes (absence or duplication in a single genome is not tolerated); N:N:N indicates multicopy gene paralogues found across all 11 species (absence of one in a single genome is not tolerated); SS indicates species‐specific genes (excluded from other species genomes); SD indicates multiple copies of species‐specific genes; ND indicates species‐specific genes in single copies.

Genomic basis for cold tolerance

Two α-amylases, which catalyse the catabolism of long-chain carbohydrates to yield maltose, glucose or "limit dextrin" from amylopectin.

A single β-1,3-glucanase that breaks down β-1,3-glucans to yield glucose or polysaccharides made of multiple glucose subunits; and a maltase that catalyses the hydrolysis of maltose to glucose.

Fructose-bisphosphate aldolase (FBA) is essential for producing glycerol biosynthetic precursors from glucose by catalysing the reversible reaction that splits fructose-1,6-bisphosphate (F1,6P2) into the triose phosphates dihydroxyacetone phosphate (DHAP) and glyceraldehyde phosphate (GAP).

Two glycerol biosynthesis-related gene homologues: glycerol-3-phosphate dehydrogenase (G3PDH), which catalyses the reversible redox conversion of DHAP to glycerol 3-phosphate (G3P); and the glycerol kinase (GK) that catalyses the reverse reaction of G3Pase to generate G3P by transferring a phosphate from ATP to glycerol.

G3PDH transcription is dependent on cold acclimation more than diapause state, and the GK gene is activated in response to early cold.

Some lipases catalyse the hydrolysis of glycerol esters to yield free fatty acids and glycerol.

Glycerol production from triacylglycerols requires cold acclimation and a state of diapause, similar to that observed for glucose-dependent glycerol biosynthesis.

One of the most highly expressed transcripts in DA-CA larvae was a putative fatty acid synthase (FAS). FAS catalyses the synthesis of the palmitoleic acid precursor, palmitic acid.

Trehalose metabolic pathway-related genes include a putative trehalose transporter (TRET) orthologue.

Trehalase (TREH) catalyses the conversion of trehalose back to glucose; energy storage is critical for survival in cold environments.

Heat shock protein (Hsp), transferrin, catalase, arylphorin and methionine-rich storage protein: all were significantly upregulated at low temperatures.

See you next time.

Reference

Brown, S.J., Pfrender, M.E. (Eds.), 2019. Insect Genomics: Methods and Protocols, Methods in Molecular Biology. Springer New York, New York, NY. https://doi.org/10.1007/978-1-4939-8775-7

Li, F., Zhao, X., Li, M., He, K., Huang, C., Zhou, Y., Li, Z., Walters, J.R., 2019. Insect genomes: progress and challenges. Insect Mol. Biol. 28, 739–758. https://doi.org/10/ggsjwv

Ma, W., Zhao, X., Yin, C., Jiang, F., Du, X., Chen, T., Zhang, Q., Qiu, L., Xu, H., Joe Hull, J., Li, G., Sung, W., Li, F., Lin, Y., 2020. A chromosome‐level genome assembly reveals the genetic basis of cold tolerance in a notorious rice insect pest, Chilo suppressalis. Mol Ecol Resour 20, 268–282. https://doi.org/10/ggn4kk
