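Resistance gene analogs (RGAs) are genes associated with plant disease resistance. They mainly include genes containing the following domains or motifs: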
nucleotide binding site (NB-ARC)
leucine rich repeat (LRR)
transmembrane (TM)
serine/threonine and tyrosine kinase (STTK)
lysin motif (LysM)
coiled-coil (CC)
Toll/Interleukin-1 receptor (TIR)
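PRGdb (http://www.prgdb.org/prgdb/) collects known plant RGAs. Here we introduce a pipeline, RGAugury, for predicting plant disease resistance genes at the whole-genome level.
1. Installing the pipeline
Download link for the pipeline: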
https://bitbucket.org/yaanlpc/rgaugury/src/master/
2. Software requirements
On my server I built a conda environment, which makes the later steps much more convenient.
BLAST+ package: download the file ending with the "x64-linux.tar.gz" extension.
HMMER3: install HMMER before the pfam_scan package.
pfam_scan package: make sure pfam_scan.pl can be run from anywhere without a path prefix. Check this link for easier dependency installation.
phobius1.01 package: this is a 32-bit program, so make sure your 64-bit Linux operating system has the 32-bit runtime (libstdc++6:i386) installed to load it. Refer to this thread for further help.
ncoils package: already embedded in this package with a minor source-code modification that adapts it to the pipeline, so please do not use the original version.
git: optional, for cloning the repository directly. We strongly suggest using git to clone the repository so that file permissions are preserved correctly.
JDK: JDK 1.8 is required when using InterProScan later than v57.
InterProScan: an HMM-based domain/motif identification package.
CViT: a Perl-based package for visualizing genomic linkage features. Be sure all required Perl modules are installed and that CViT runs without errors independently of RGAugury.
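Since the pipeline expects every tool above to be callable without a path prefix, a quick check can save debugging time later. The following is a minimal sketch, not part of RGAugury: the helper name missing_tools is hypothetical, and the executable names in the example call (blastp, hmmscan, interproscan.sh and so on) are assumptions about how each package names its main program on your system.

```shell
# Hypothetical helper: print every given executable that is NOT found on PATH.
# Returns 0 when all are found, 1 otherwise.
missing_tools() {
    local rc=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example call (executable names are assumptions; adjust to your installation):
# missing_tools blastp hmmscan pfam_scan.pl phobius.pl scoils-ht interproscan.sh cvit.pl java git
```

An empty output means everything in the list resolved; any name that is printed still needs to be installed or added to PATH.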
3. Downloading and building libraries
Before installing the GD modules, you may need to install their prerequisite libraries first.
4. Downloading and installing modules
RGAugury dependency
CViT needs below modules:
Config::IniFiles
GD::SVG
GD::Arrow
GD::Text
pfam_scan.pl needs the modules below:
Moose: an essential module for the pfam_scan package; see the pfam_scan README for installation. Follow this guide for an easier install, or use the command "cpan install Moose".
BioPerl: install BioPerl core via CPAN or from its official website.
Check the software and programs installed above and make sure the owner and file permissions of all of them are set correctly.
This step determines whether the installation succeeds, so follow the procedure above strictly.
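To confirm that the Perl modules listed above can actually be loaded, one option is to ask perl to import each of them. This is a minimal sketch under stated assumptions: the helper name missing_perl_modules is hypothetical, and Bio::SeqIO is used here only as an assumed probe for a working BioPerl core install.

```shell
# Hypothetical helper: print every given Perl module that perl cannot load.
# Returns 0 when all modules load, 1 otherwise.
missing_perl_modules() {
    local rc=0 mod
    for mod in "$@"; do
        # -M<module> imports the module; -e 1 runs a trivial program.
        if ! perl -M"$mod" -e 1 >/dev/null 2>&1; then
            echo "$mod"
            rc=1
        fi
    done
    return "$rc"
}

# Example call with the modules named above (Bio::SeqIO is an assumed BioPerl probe):
# missing_perl_modules Config::IniFiles GD::SVG GD::Arrow GD::Text Moose Bio::SeqIO
```

Run it as the same user, and with the same PERL5LIB, that will later run the pipeline, because module search paths can differ between accounts.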
Below is an example of how I set up my environment variables from scratch on a clean Ubuntu 14.04/16.04 LTS; change the paths to match your own system.
export PATH=$PATH:/home/lipch/bin/phobius1.01 # to specify the path of phobius.pl script and binary.
export PATH=$PATH:/home/lipch/bin/hmmer3/bin # binary path
export PATH=$PATH:/home/lipch/bin/blast/bin # binary path of blast+ package
export PATH=$PATH:/home/lipch/RGAugury_pipeline # this package scripts path
export PATH=$PATH:/home/lipch/RGAugury_pipeline/coils #the path to scoils-ht, which is a modified version of coils to adapt to RGAugury pipeline.
export PATH=$PATH:/home/lipch/database/interproscan-x.xx-xx.0 #download latest one as your wish. Do not add the path of "bin" under interproscan directory.
export PATH=$PATH:/home/lipch/Downloads/PfamScan #to specify the path for script of pfam_scan.pl
export PATH=$PATH:/home/lipch/bin/cvit.1.2.1 #to specify the path of cvit.pl in CViT package, make sure cvit.pl can be found by 'which' command.
export COILSDIR=/home/lipch/RGAugury_pipeline/coils # alternatively, put only this command in a plain file, point it at a directory all users can access, drop the file into /etc/profile.d/ and change its permission to 755; otherwise export it in the user's profile and point it at a directory that user is authorized to access
export PERL5LIB=/home/lipch/Downloads/PfamScan:$PERL5LIB #perl module for pfam_scan.pl
export PFAMDB=/home/lipch/database/pfamdb #to specify the hmm pfam-A/B DB path
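After sourcing the exports above, it can help to verify that the directory-valued variables really point to existing directories. The sketch below is illustrative only: the helper name unset_or_missing_dirs is hypothetical, and it only sees variables that have been exported.

```shell
# Hypothetical helper: print each named environment variable that is unset,
# empty, or does not point to an existing directory.
# Returns 0 when every variable is fine, 1 otherwise.
unset_or_missing_dirs() {
    local rc=0 name value
    for name in "$@"; do
        value="$(printenv "$name" || true)"
        if [ -z "$value" ] || [ ! -d "$value" ]; then
            echo "$name"
            rc=1
        fi
    done
    return "$rc"
}

# Example call for the two directory variables exported above:
# unset_or_missing_dirs COILSDIR PFAMDB
```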
Because Tools.pm has been modified to run in parallel, the number of InterProScan workers needs to be set to 1 to avoid running out of RAM. Note that the pipeline is only optimized for a regular workstation with multithreading support; if you want to take advantage of a grid, refer to the corresponding InterProScan manual.
number.of.embedded.workers=1
maxnumber.of.embedded.workers=1
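The two settings can be edited by hand, or rewritten with sed as in the sketch below. It assumes that both keys live in the interproscan.properties file of your InterProScan installation and already appear there at the start of a line; the helper name set_embedded_workers is hypothetical. Back up the file before trying it.

```shell
# Hypothetical helper: set both embedded-worker keys in a properties file.
# Usage: set_embedded_workers <properties-file> <count>
set_embedded_workers() {
    local file="$1" n="$2" tmp
    tmp="$(mktemp)"
    # The ^ anchor keeps the first rule from matching the "maxnumber..." key.
    sed -e "s/^number\.of\.embedded\.workers=.*/number.of.embedded.workers=$n/" \
        -e "s/^maxnumber\.of\.embedded\.workers=.*/maxnumber.of.embedded.workers=$n/" \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example call (the path is an assumption; use your own InterProScan directory):
# set_embedded_workers /home/lipch/database/interproscan-x.xx-xx.0/interproscan.properties 1
```

The result is written back with cat rather than mv so that the original file keeps its owner and permissions.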
If git is installed, download the pipeline on a Linux system with the command below.
git clone https://bitbucket.org/yaanlpc/rgaugury.git
Before running the pipeline, make sure the permissions of all Perl scripts are set to 755. In the RGAugury directory, run:
chmod 755 *.pl
In the coils directory, run:
chmod 755 scoils-ht
Pfam: follow the installation guide of the pfam_scan package ("Download Pfam data files" section) to prepare the binary files from 'Pfam-A.hmm'. Put all files under /home/user_ID_to_be_replaced_by_yours/database/pfam/, because this path is hard-coded in the scripts. Alternatively, make sure the pfam folder is consistent with the setting of $pfam_index_folder in RGAugury.pl.
RGADB: already embedded in this package. Keep its location unchanged.
PANTHER: if the PANTHER database will be used from either the command line or the web UI, install it according to the instructions of the InterProScan package; the InterProScan configuration file may also need to be modified accordingly.
8. Using the pipeline
The main script RGAugury.pl has six options, but only the input file is mandatory on the command line. Make sure each sequence title in the FASTA file contains only a gene ID with no spaces. Export the RGAugury directory to the PATH environment variable.
Scripts: Resistance Gene Analogs (RGAs) prediction pipeline
Programmed by Pingchuan Li @ AAFC - Dr. Frank You Lab
Usage :perl RGAugury.pl <options>
arguments:
-p protein fasta file
-n corresponding cDNA/CDS nucleotide for -p (optional)
-g genome file in fasta format (optional)
-gff a modified gff3-like file, see below format (optional)
-c cpu or threads number, default = 2
-pfx prefix for filename, useful for multiple speices input in same folder (optional)
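Because the sequence titles must contain only a gene ID with no spaces, a protein FASTA file that has descriptions after the ID needs cleaning first. The sketch below is one possible approach, not part of RGAugury: the helper name strip_fasta_titles and the file names in the example are hypothetical, while the -p, -c and -pfx options come from the usage text above.

```shell
# Hypothetical helper: keep only the first whitespace-delimited token of each
# FASTA title line (">geneID description" becomes ">geneID") and print the
# result to stdout. Sequence lines are passed through unchanged.
strip_fasta_titles() {
    awk '/^>/ { print $1; next } { print }' "$1"
}

# Example run (file names and prefix are placeholders):
# strip_fasta_titles proteins.fasta > proteins.clean.fasta
# perl RGAugury.pl -p proteins.clean.fasta -c 4 -pfx sample
```

This assumes the gene ID directly follows the ">" character; titles written as "> geneID" with a space after ">" would need a different rule.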
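1. RGAs identified from the Brassica reference genome
2. Cluster analysis of the identified RGAs, which fall into 4 clades
3. Distribution of genes carrying the identified domains
The rest is standard bioinformatics analysis; if you are interested, see the original papers listed in the references.
References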
Li P, Quan X, Jia G, Xiao J, Cloutier S, You FM. RGAugury: a pipeline for genome-wide prediction of resistance gene analogs (RGAs) in plants. BMC Genomics. 2016;17(1):852. Published 2016 Nov 2. doi:10.1186/s12864-016-3197-x
Osuna-Cruz CM, Paytuvi-Gallart A, Di Donato A, Sundesha V, Andolfo G, Aiese Cigliano R, Sanseverino W, Ercolano MR. PRGdb 3.0: a comprehensive platform for prediction and analysis of plant disease resistance genes. Nucleic Acids Research 2017. doi: 10.1093/nar/gkx1119
Dolatabadian, A., Bayer, P.E., Tirnaz, S., Hurgobin, B., Edwards, D. and Batley, J. (2020), Characterization of disease resistance genes in the Brassica napus pangenome reveals significant structural variation. Plant Biotechnol J, 18: 969-982. doi:10.1111/pbi.13262
Soodeh Tirnaz, Philipp Bayer, Fabian Inturrisi, Fangning Zhang, Hua Yang, Aria Dolatabadian, Ting X Neik, Anita Severn-Ellis, Dhwani Patel, Muhammad I Ibrahim, Aneeta Pradhan, David Edwards, Jacqueline Batley. Plant Physiology Aug 2020, pp.00835.2020; DOI: 10.1104/pp.20.00835