最近看了幾篇昆蟲基因組方面的文章,作個筆記,正好下周一組會講。
Insect Genomics Methods and ProtocolsChapter 1:Arthropod Genome Sequencing and Assembly StrategiesKey words :Genome project strategy, Genome assembly, Genome sequencing, Genomics, Insectgenomes, Oxford nanopore, Pacific biosciences, 10×Genomics
1 Introductionwe have far to go for the remainder of the greater than 1.5 million described species on planet Earth. Assequencing costs have decreased, researchers focusing on non-model organisms are jumping into genome projects themselves.
(路漫漫其修遠兮,吾將上下而求索)
2 Major Steps in a Genome ProjectAppropriate DNA isolation and DNA sequencing to feed the decided genome assembly choice.Genome assembly, annotation, and submission to public databases.Analysis and possible publication.==Strategy Choice.==
genome size, which affects final cost, genome polymorphism, which affects the probability of high quality assembly or requires methods that address this, and individual size, which determines the minimum number of individuals and thus different haplotypes required for DNA isolation and sequencing.
舉例子:
一般哺乳動物基因組大小為2.5–3.5 Gb,通常有很低的多態性(1 SNP per 1000 bp in humans—below the sequencing error rate),而且從單一個體血液中可以獲取足夠量的DNA。鳥類的基因組大小一般為~1–1.5 Gb,很容易測序和組裝。爬行類和兩棲類動物的基因組很大,測序的很少。海膽或其他海洋生物有很高的多態性(海膽–2% SNP),所以它們容易產生短的contigs 以及 scaffold gaps,這就容易導致注釋出現不全或錯誤。
昆蟲的體型小,測序時可能需要多個體混合測序,種群數量大意味著個體間有很高的genome sequence polymorphism。而傳統的測序組裝方式很可能會產生 low contiguity (i.e., contig N50s) and more gaps 產生 poor gene model annotation。
DNA isolation and sequencing10X Genomics assembly strategy:10-100Kb, 55–60×fold genome coverage with 150 bp paired Illumina reads, and assembles using the company’s Supernova assembly softwarePB long read genome assembly strategy: 20–30 kb, 50–70×fold genome coverage using a PB single Canu by first error correcting the longest reads using shorter PB reads, and then using 15–25×error corrected long read data for the final assembly.Illumina De Bruijn genome assembly strategy: multiple sized insert Illumina paired end libraries (e.g., 500 bp, 2 kb, 5 kb, and 8–10 kb insert sizes) each sequenced to 20–40×genome coverage, fed to a De Bruijn short read genome assembler such as soap de-novo or allpaths LG. The final genome coverage might be 150×. polymorphism and repeat content of the species genome.GemCode single cell platform10X Genomics原理:
通過將來自相同DNA片段(10-100kb)的reads加上相同的barcode,然後在illumina平臺上進行測序,從而實現長片段的測序。其基本原理是同一長片段的reads會具有同樣的標籤,稱為linked-reads,利用這些barcode的信息,可將短reads拼接為長reads。這樣的linked-reads可進行結構變異檢測及單倍型定相的分析。
從本質上看,10X Genomics只是將Illumina短片段變長,並沒有徹底消除Illumina平臺GC偏好的問題。歸根結底,是否進行PCR擴增是影響10X Genomics和PacBio長片段覆蓋度的主要因素。
Thus it is best to simply keep interesting symbiont information. Some amount of starvation prior to DNA isolation to allow the digestion of gut contents can be a good idea.(研究腸道細菌共生體時,要注意保留細菌微生物信息)
RAM and CPU is really important. Due to the new assemblers merge, assembly times have dramatically decreased over time.
It doesn’t count unless it is in a public database.The potential to be scooped on publication of a new genome is low today, instead collaboration and data sharing help to understand the mass of data that is a genome and prevent duplication of work.
Maker, AUGUSTUS and 「Just Annotate My genome」 (http://jamg.sourceforge.net) to complete automated annotation.
Quality control using a set of known sequences such as BUSCO or CEGMA is key to measure progress.
Research group manual curationThe most common automated annotation errors are the erroneous merging or splitting of gene models.
The number of genes in a family of interest needs to be confirmed and can be particularly difficult in gene families with high sequence diversity such as olfactory receptors.
The focus of journals has rightly returned to biological analysis and better understanding of the life style and phenotype of the species of interest.
Time spent on incremental genome improvement freezes and new annotations is time not investigating biological questions.(注重生物學問題而不是基因組本身)
Chapter 2: Genome Size Estimation and Quantitative Cytogenetics in InsectsWith care, it is possible using flow cytometry to create a precise and accurate estimate of the genome size of an insect.
Chapter 3: Isolation of High Molecular Weight DNA from InsectsFor insects, the least complex tissue is the best. Late stage embryos contain hundreds-thousands of nuclei, but have not completed organogenesis. Newly hatched larvae or pupa may also work. Adults and late larva are usually too complex. If possible, remove any food substrates from sample prior to grinding.
Chapter 4: Long Range Sequencing and Validation of Insect Genome Assemblies1 IntroductionThe prevalence of complex gene families like olfactory receptors and P450 genes, DNA polymorphismsand transposable elements in arthropod genomes further obfuscates resolution of the assembly using only short reads.
Genome assembly and validation workflow2 MaterialsMiniasm assemblyThe Miniasm assembler does not correct reads before assembly unlike most other long read assemblers.
It takes as input an all-to-all alignment from Minimap and identifies all raw read overlaps. The mappings are then trimmed and modeled into an assembly graph before unitigs are determinedfrom paths through the graph.
We use Miniasm to rapidly generate candidate assemblies from uncorrected raw reads as well as corrected reads.
Canu assembly(a) Self-correct the raw long Pacbio reads using the shorter Pacbio reads.
(b) Trim the corrected reads based on the overlap with other corrected reads so that erroneous regions of reads are excised.
(c) Finally assemble the contigs from the corrected and trimmed Pacbio reads.
Redundans duplication removal and scaffoldingHighly heterozygous insect genome assemblies may contain a large number of duplicates even after careful genome assembly. Redundans is a pipeline to remove duplicated contigs from an assembly, followed by iterative gap closing and scaffolding using paired-end, mate-pair or long reads.
Although the assembly process includes read correction during the assembly process, it is advisable to perform multiple rounds of error correction after the final assembly is created.
PilonPilon uses high quality Illumina sequences to polish Pac-Bio and Oxford Nanopore long reads assemblies. Illumina reads are aligned to the genome assembly and supplied to Pilon as a BAM file . Illumina reads from unpaired and paired-end libraries can be used as input. The correct base at a position is inferred by consensus and the reference allele is corrected if required. Pilon can also fix small insertion anddeletion errors, fill gaps and perform local reassembly in regions where many errors are identified.
3 NotesPacBio requires at least a few micrograms of high molecular weight DNA before size selection. it is preferable to use individuals from a colony that has been inbred for many generations. Another choice is to use haploid individuals, if available.Memory and disk space requirements will vary according to genome size and specific tool. Minimum RAM required for most assemblers is 500 Gb–1 Tb.Coverage of the genome is another important factor that determines quality of assembly. De novo assemblies require 50–80X coverage in most cases. Canu can be optimized for a high coverage genome assembly.The high error rates have been offset by applying self-correction based on information from shorter reads for PacBio data sets.HaploMerger2 can be used to build haploid subassemblies from a heterozygous diploid genome assembly.Chapter 5 Using BUSCO to Assess Insect Genomic ResourcesThis chapter details the use of the Benchmarking Universal Single-Copy Orthologue (BUSCO) assessment tool to estimate the completeness of transcriptomes, genome assemblies, and annotated gene sets in terms of their expected gene content.
For transcriptomes the longest open reading frames are assessed, while for genome assessments, gene models are first built using ab initio gene prediction with AUGUSTUS for the potential matches identified using TBLASTN searches.Matches that meet the BUSCO HMM score cutoffs are classified as 「complete」 if their lengths fall within BUSCO profile length expectations, and if found more than once they are classified as 「duplicated」. Those that do not meet the length requirements are considered as partial matches and are classified as 「fragmented」, and BUSCOs without matches that pass the thresholds are classified as 「missing」.Insect genomes: progress and challengesmarkAbstractIn 2011, Robinson and colleagues proposed the 『i5k』 initiative to sequence the genomes of 5000 insects and other arthropods with important biological significance or economic value before 2017
At the time of preparing this paper, 1219 insect genome-sequencing projects have been registeredwith the National Center for Biotechnology Information (NCBI): 401 insect species have complete genome assemblies with varied quality; the genome annotations of 155 insects have been publicly released; and over 100 insect genomes have been published in peer-reviewed journals.
Insect genome assembly and annotationInsect genome assembly目前,基因組的測序策略一般為whole genome shotgun strategy (WGS),它會產生大量短的片段,需要組裝。
組裝有兩種策略:de novo or mapping assembly
De novo genome assembly depends entirely on overlapping information between the reads.three de novo assembly algorithms:
The first category is based on overlap/layout/-consensus between long sequencesCABOG , NEWBLER, SHORTY, EDENA , CELERA這些軟體適合組裝中等長度的reads,比如Sanger測序或者三代長reads,不適合二代Illumina短reads
The second category of assembler uses De Bruijn graph algorithms, which are well suited for short reads produced by second-generation sequencing techniques such as the Illumina sequencing platform.包括SOAPDENOVO, EULER VELVET,and WTDBG等軟體(計算不同的K-mers size是耗時步驟).
The third category includes software implementing greedy graph algorithms.
SSAKE, SHARCGS and VCAKE. Many published insect genomes were assembled by CABOG , SOAPdenovo, ALLPATH-LGor ABYSS.
Scaffolding method that can assist assembly: Hi-C technology. It is a sequencing-based approach for determining how a genome is folded by measuring the frequency of contact between pairs of loci.
It can assist genome assembly to the chromosome level without additional genetic map information.Though it does not generate or improve existing contigs, this technology is useful for obtaining information with chromosome-length scaffolds.
mapping assembly first determines the position of reads relative to the reference genome and then assembles the reads into contigs or scaffolds.影響基因組組裝質量的因素:
repetitive sequences and heterozygosity
A large number of repetitive sequences in the genome can cause substantial ambiguity in the processof assembling contigs and scaffolds.
Heterozygosity, or allelic variation in the sequenced individual(s), also greatly complicates the problem of genome assembly.
inbred homozygous individuals (or haploid males in the case of Hymenoptera).
PLATANUS and REDUNDANS, are reported to improve the assembly quality of heterozygous genomes.
novel 『long read』 sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore,are also contributing to major advances in the quality of genome assembly. some assembly programssuch as FALCON , CANU and WTDBG.
評估draft genome質量(需要多種方法評估基因組組裝質量)
The size of the genome assemblywhole body without intestinal tract and ovarium
estimates of genome size: flow cytometry or K-mer analysis.
If the assembly is smaller than expected, it is likely because it is incomplete or due to repeat collapse.
If the assembly is larger than expected, this often reflects the fact that independent assembly of haplotypes has resulted in redundancies.
The correctness of the genome assemblypreviously assembled from independent data (eg BAC sequencing), these can be used to evaluate the correctness of the assembled genome.
the congruence of paired-end or mate-pair reads when mapped to the assembly can inform the quality of genome assembly.
the distance between the mapped reads should be consistent with the insertion size when constructing the library. This congruence mapping approach has been implemented in software such as QUAST and REAPR.
all scaffolds or contigs are sorted from longest to shortest, then the sequence lengths of each scaffold or contig are sequentially summed. When the accumulated value reaches half of the total assembly length, the length of the corresponding scaffold or contig is defined as the N50.
in some special cases, high N50 might be due to aggressive misassembly, which is worthy of further research.The Core Eukaryotic Genes Mapping Approach (CEGMA) pipeline is one widely used method for implementing this approach (Parra et al., 2007). CEGMA identifies 458 core genesthat are highly conserved in eukaryotes and searches for these genes in the assembled scaffolds(May 18th 2015 - CEGMA is no longer being supported).
Benchmarking Universal Single-Copy Orthologs (BUSCO), BUSCO uses a set of 2647 1 : 1 orthologous genes for arthropods. The quality of genome assembly is reflected by the percentage of these orthologous genes that can be found in the assembled scaffolds.
Genome annotationstructural annotation and functional annotation
Structural annotation comes first, identifying which regions of the assembly correspond to specific features, such as genes (including intron–exon boundaries) and transposable elements (TEs).
Once the structural features are delineated, functional annotation aims to infer the function and identity of genes and other elements, based on sequence similarities.
Identifying repeat sequencehomology searching
Homology searching identifies homologous repeat sequences based on sequence similarity. The software REPEATMASKER is widely used for this task, in conjunction with the RepBase collection of characterized TEs.
ab initio prediction
The ab initio prediction method uses structural features of the repetitive sequence to identify novelrepeat sequences. This method has great advantages in predicting repetitive sequence with distinct structural features, such as miniature inverted-repeat TEs and long terminal repeats.
RECON, PILER , REPEATSCOUT (Price et al., 2005), LTR-FINDER and REPEATMODELER .
Identification of noncoding RNANoncoding RNA is a class of RNA genes that do not produce protein products such as transfer RNA(tRNA), ribosomal RNA, piwi-interacting RNA, micro- RNA (miRNA), small nucleolar RNA, or repeat associatedsmall interfering RNA.
Software: MIRDEEP, TRNASCAN, INFERNAL and RNASTRUCTURE.
Database: RNAdb, NONCODE, Rfam, miRBase and snoRNABase.
Prediction of protein-coding genes(1) identifying homologues of known protein coding genes through sequence similarity.
(2) de novo predicting the protein-coding genes with software developed via machine learning of protein-coding gene structures.
(3) determining the exonic regions by direct transcriptome sequencing [eg RNA sequencing (RNAseq)or expressed sequence tags (ESTs)] and aligning to the assembled scaffolds.
strengths and weaknesses
the protein-coding genes found by homologue searching are typically robust to false positives, but only known protein genes can be found.
De novo prediction can find more candidates but may have high false-positive rates.
Expression evidence by RNA-seq data is typically the most definitive approach, but relies heavily on the quality and quantity of the transcriptome and the samples chosen for RNA-seq.
AUGUSTUS, EVIDENCEMODELER, GLEAN, EVIGAN, MAKER, JIGSAW and EVIGENE/EVIDENTIALGENE, OMIGA and NCBI’s eukaryotic genome annotation pipeline.
A chromosome‐level genome assembly reveals the genetic basis of cold tolerance in a notorious rice insect pest, Chilo suppressalismarkAbstract二化螟是一種全球性危害水稻的害蟲,本研究通過結合 Illumina 以及 PacBio 測序技術.得到二化螟基因組大小為 824.35 Mb, contig N50 為 307 kb,scaffold N50 為 1.75 Mb. 通過Hi-C scaffold1ing將99.2%的鹼基分配到29條染色體上,注釋到了15,653 蛋白編碼基因。
通過特定基因家族的進化分析,從基因組的角度闡明了二化螟抗寒性(cold tolerance)的分子機制。具體包括以下代謝通路:glucose‐originated glycerol biosynthesis, triacylglycerol‐originated glycerol biosynthesis, fatty acid synthesis and trehalose transport‐intermediate cold tolerance.
INTRODUCTION昆蟲主要通過兩種方式越冬:一種是遷徙,逃離寒冷區域;另外一種是昆蟲自身的抗寒機制。
對於後者同樣有兩種機制:Freeze avoidance and Freeze tolerance
Freeze avoidance involves the accumulation of extremely high levels of cryoprotectants (chemicals thatprevent/minimize freezing damage at the cellular level) and the synthesis of antifreeze proteins (AFPs).
Freeze tolerance involves specific ice‐nucleating agents or proteins (INAs or INPs) that trigger the freezing of extracellular water, and cryoprotectants that protect the cell membranes of desiccated cells by maintaining a liquid intracellular space.
Cryoprotective dehydration and vitrification (extreme dehydration) have also been proposed as insect winter survival strategies.
C. suppressalis overwintering strategy involves the accumulation of glycerol, the most common insect cryoprotectant.
Enzyme activities associated with glycerol biosynthesis in C. suppressalis have also been reported to be dependent on seasonal changes.
C. suppressalis draft genome (sequenced solely using the Illumina Solexa platform) was reported, the assembly was hampered by high heterozygosity and its contig N50 was only 5.2 kb.
RNA sequencing (RNA‐seq) analysis of larval responses to low temperatures under diapause (DA)/nondiapause (ND) and cold acclimation (CA)/noncold acclimation (NC).
MATERIALS AND METHODSInsectsPupae were used to obtain DNA samples for both Illumina and PacBio sequencing, and newly hatched larvae were used to obtain DNA samples for Hi‐C sequencing.
Genome sequencingIllumina sequencing
Illumina pair-end and mate-paired sequencing was performed
Pacific Biosciences SMRT sequencing
One library with insert sizes of approximately 5,000-bp was constructed for the PacBio RS II sequencing system.
K-mer analysis and estimation of the genome sizeK-mer counts were determined using on the pair-end sequencing data (insert size of the library <2 kb) with Jellyfish and a K-mer setting = 23. a predicted genome size was 788,873,227 bp with 121-fold coverage and a heterozygosity of 0.75%
Genome assemblyAssembly polishing and filteringQuality assessmentde novo assemblyHi‐C scaffolding30 μg DNA; Newly hatched larvae
Unmapped paired‐end reads, singleton reads, multiple mapped reads and PCR duplication were filtered by an optimized and flexible pipeline (HiC‐Pro).
lachesis was used to cluster the scaffold into 29 groups according to the agglomerative hierarchical clustering algorithm with default parameters.
Genome annotationProtein coding genes were predicted based on three approaches: ab initio prediction, homology-based prediction, and transcript-based prediction
(1) ab initio prediction. Augustus and GeneMark-ES were used for ab initio gene prediction.GeneMark-ES is a self-training algorithm that has been used for gene identification of novel eukaryotic genomes. In order to improve the accuracy of de novo training, Augustus (parameters: --strand = both --species = heliconius_melpomene1 --extrinsicCfgFile = extrinsic.E.XNT.cfg --alternatives-from-evidence = true --hintsfile = hints.gff --allow_hinted_splicesites = atac) was trained using homologous protein sequences from Bombyx mori and Amyelois transitella along with RNA-Seq data.(2) Homology-based prediction. The annotated gene sets from ten species, Drosophila melanogaster, Drosophila pseudoobscura, Aedes aegypti, Anopheles gambiae, Bombyx mori, Amyelois transitella, Danaus plexippus, Papilio machaon, Helicoverpa armigera, and Spodoptera litura were aligned to the C.suppressalis genome utilizing exonerate protein2genome with a 「--percent 50」 parameter to identify exact intron/exon positions.(3) Transcript-based prediction. Hisat2 was used to align the transcriptomic data to the genome and gene information was predicted using Stringtie. The PASA annotation pipeline, which uses GMAP and BLAT to align Trinity transcript assemblies to the genome. The PASA alignment assemblies can then be used to automatically extract protein coding regions for use in TransDecoder.
The combined gene structures from the three approaches were integrated via EvidenceModeler and filtered for sequences that lacked homologs in the other species or were absent from the RNA-Seq data
(1) First, a search of regions 2 Kb upstream and downstream of the 15,658 genes found that the vast majority of genes (14,722, 94.02%) lacked ambiguous bases (Ns), indicating that these gene models are not located near an assembly gap and thus the gene models are unlikely to be fragments.(2) Second, we used BUSCO software to assess the completeness of genome annotations. The presence of95.5% of the single copy BUSCO genes suggests that the predicted genes are complete.
Annotation of repetitive elementsTo identify repeat elements in the C. suppressalis genome, a de novo self-specificity repeat library was constructed using RepeatModeler with RECON , RepeatScout , TRF , and NSEG. Then an ab initio repeat library was generated that combined RepeatMasker and the Repbase database, following the suggested parameter values.
Functional annotationTo obtain putative functions of the annotated genes, protein sequences were aligned to the NCBI NR , NT, and TrEMBL databases with an E-value cutoff of 1E-10. We also annotated Motifs and domains using InterProScan with publicly available databases including Gene3D, PRINTS, Pfam, CDD, SMART, MobiDBLite and PROSITE. Descriptions of gene function, which were retrieved from the InterProScan results, were classified with Gene Ontology .
RNA‐seq for larval cold‐responseIn both treatments**(DA and ND)**, the fifth‐instar larvae were again divided into two groups, respectively: (a) to simulate seasonal cooling, CA larvae were generated by gradually decreasing temperatures 5°C per day over a 5‐day period from the beginning of the final instar to a final temperature of 5°C; (b) NC larvae were maintained at 25°C and collected on the second day of the final instar.
All CA larvae (DA‐CA and ND‐CA) were then exposed to cold shock at 0 or −5°C for 2 hr. NC larvae (DA‐NC and ND‐NC) were likewise exposed to cold shock at 5, 0 or −5°C for 2 hr.
DA = diapause; ND = non-diapause; CA = cold training; NC = non-cold training
RNA‐seq analysisQuality control and alignment: Trimmomatic to remove adapter and low‐quality sequences, and then aligned to the C. suppressalis genome generated above using hisat2.
Quantification: The total number of genome‐mapped reads to gene was quantified using htseq2 (‐t exon, ‐i gene_id, mode = union) with the annotated C. suppressalis GFF file.
Differential expression: The mapped reads count was used to test differential expression with Deseq2. Genes with an adjusted p value <.05 and |log2 ratio| ≥ 1 in at least one temperature condition were considered to be differentially expressed.
Orthology and evolutionProtein sequence data sets from 10 species.
Redundant alternative splicing events were filtered to generate a single transcript for each protein set,and then all‐against‐all protein comparisons were performed using blastp with a cutoff of E < 10−5.
High‐scoring segment pair (HSP) segments between the same pair of proteins were processed using orthomcl, which was followed by homology identification among protein sequences based on bit‐scores to identify final orthologues, inparalogues and co‐orthologues.
直系同源基因是通過物種形成事件產生的; 而旁系同源基因是通過基因重複事件產生的。
如果在物種形成事件之後發生基因重複事件, 那麼, 就會出現一對多或者多對多的直系同源關係, 稱其為共同直系同源基因 (co-orthologues)。所以在兩個物種的基因組中, 最相似的序列僅僅意味著它們可能是直系同源基因, 但是並不能充分證明它們肯定就是直系同源基因。
物種外旁系同源基因(outparalogues) 指由發生在物種形成之前的基因重複事件產生的旁系同源基因;物種內旁系同源基因(inparalogues) 指由發生在物種形成之後的基因重複事件產生的旁系同源基因,有助於我們區分更古老的旁系同源基因和近期產生的旁系同源基因。
To estimate species phylogeny, protein sequences for 500 single copy conserved orthologues were selected. Multiple protein sequence alignments for each orthologous group were then performed using muscle and the conserved blocks were extracted using gblocks. Conserved protein blocks with single copies in all species were concatenated to 11 super genes of 199,150 amino acids and used to construct maximum likelihood phylogenetic trees with the JTT model in phyml. Statistical support was obtained with 100 bootstrap replicates.
mcmctree (paml 4.8 package) with parameters 「‐clock 2 ‐alpha 0.5 ‐model 3」 and known time divergencedata in timetree (http://www.timetree.org/) were used to estimate divergence times among species.
To examine gene family expansion and contraction among species, we used cafe with p < .05 and "‐s" to automatically search for the birth and death parameter (λ) of genes.
The gene family results from the orthomcl pipeline and the estimated divergence times between species were used as inputs.
RESULTS AND DISCUSSIONGenome assemblyKaryotype and chromosome syntenyHi‐C long‐range scaffolding to obtain chromosomal information. Hi‐C linking information anchored, ordered and oriented 25,944 scaffolds to 29 chromosomes by lachesis with default parameters.
The map indicates that intra-chromosome interactions were strong while inter-chromosome interactions were weak.
Synteny analysis results for C. suppressalis suggest two fusion events: Chr1 arising from fusion of T. ni Chr0, Chr13 and Chr14, and Chr2 from fusion of T. ni Chr1, Chr26 and some segments of Chr29.
The phylogenetic relationship of Chilo suppressalis was estimated using a maximum likelihood analysis of a concatenation of 500 single‐copy orthologous protein sequences over 100 bootstrap replicates.
Species phylogenetic tree and gene orthology1:1:1 indicates absolute single‐copy genes (absence or duplication in a single genome is not tolerated); N:N:N indicates multicopy gene paralogues found across all 11 species (absence of one in a single genome is not tolerated); SS indicates species‐specific genes (excluded from other species genomes); SD indicates multiple copies of species‐specific genes; ND indicates species‐specific genes in single copies.
Genomic basis for cold tolerancetwo α‐amylases(α‐澱粉酶), which catalyse the catabolism of long‐chain carbohydrates to yield maltose, glucose or 「limit dextrin」(極限糊精) from amylopectin(支鏈澱粉).
a single β‐1,3‐glucanase(葡聚糖酶) that breaks down β‐1,3‐glucans(葡聚糖) to yield glucose or polysaccharides(多糖) made of multiple glucose subunits;and a maltase that catalyses the hydrolysis of maltose to glucose.
fructose‐bisphosphate aldolase(FBA) (二磷酸果糖酶) is essential for producing glycerol biosynthetic precursors from glucose by catalysing the reversible reaction that splits fructose‐1,6‐bisphosphate(F1,6P2) into the triose phosphates dihydroxyacetone phosphate (DHAP)(磷酸二羥基丙酮) and glyceraldehyde phosphate (GAP)(磷酸甘油醛).
two glycerol biosynthesis‐related gene homologues: glycerol‐3‐phosphate dehydrogenase (G3PDH)(三磷酸甘油醛脫氫酶), which catalyses the reversible redox conversion(可逆氧化還原反應) of DHAP to glycerol 3‐phosphate (G3P)(甘油三磷酸); and the glycerol kinase (GK)(甘油激酶) that catalyses the reverse reaction of G3Pase to generate G3P by transferring a phosphate from ATP to glycerol.
G3PDH transcription is dependent on cold acclimation more than diapause state, and that the GK gene is activated in response to early cold
Some lipases catalyse the hydrolysis of glycerol esters to yield free fatty acids and glycerol.glycerol from triacylglycerols requires cold acclimation and a state of diapause similar to that observed for glucose‐dependent glycerol biosynthesis.
one of the most highly expressed transcripts in DA‐CA larvae was a putative fatty acid synthase (FAS). FAS catalyses the synthesis of the palmitoleic acid precursor, palmitic acid(軟脂酸).
trehalose metabolic pathway‐related include a putative trehalose transporter (TRET) orthologue.
Trehalase(TREH) catalyses the conversion of trehalose back to glucose ; given that energy storage is critical for survival in cold environments.
heat shock protein (Hsp), transferrin(轉鐵蛋白), catalase(過氧化氫酶), arylphorin(芳基貯存蛋白) and methionine‐rich storage protein(蛋氨酸豐富貯存蛋白). All were significantly upregulated at low temperatures.
下期再見
ReferenceBrown, S.J., Pfrender, M.E. (Eds.), 2019. Insect Genomics: Methods and Protocols, Methods in Molecular Biology. Springer New York, New York, NY. https://doi.org/10.1007/978-1-4939-8775-7
Li, F., Zhao, X., Li, M., He, K., Huang, C., Zhou, Y., Li, Z., Walters, J.R., 2019. Insect genomes: progress and challenges. Insect Mol. Biol. 28, 739–758. https://doi.org/10/ggsjwv
Ma, W., Zhao, X., Yin, C., Jiang, F., Du, X., Chen, T., Zhang, Q., Qiu, L., Xu, H., Joe Hull, J., Li, G., Sung, W., Li, F., Lin, Y., 2020. A chromosome‐level genome assembly reveals the genetic basis of cold tolerance in a notorious rice insect pest, Chilo suppressalis. Mol Ecol Resour 20, 268–282. https://doi.org/10/ggn4kk