GO analysis using clusterProfiler

2021-02-21 YuLabSMU

clusterProfiler supports over-representation tests and gene set enrichment analysis of Gene Ontology. It accepts GO annotation from an OrgDb object, a GMT file, or the user's own data.

Support for many species

In the GitHub version of clusterProfiler, the enrichGO and gseGO functions drop the organism parameter and add an OrgDb parameter, so any species with an available OrgDb object can be analyzed. Bioconductor already provides OrgDb packages for about 20 species (see http://bioconductor.org/packages/release/BiocViews.html#___OrgDb), and users can obtain an OrgDb for other species via AnnotationHub.

library(AnnotationHub)
hub <- AnnotationHub()

# Search the hub for Cricetulus (Chinese hamster) records
query(hub, "Cricetulus")

# Retrieve the OrgDb record by its AnnotationHub ID and inspect it
Cgriseus <- hub[["AH48061"]]
Cgriseus

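# Draw 100 random gene IDs from the OrgDb as a toy input
sample_gene <- sample(keys(Cgriseus), 100)
str(sample_gene)

# Run the GO over-representation test; cutoffs of 1 disable p-value and q-value filtering
library(clusterProfiler)
sample_test <- enrichGO(sample_gene, OrgDb=Cgriseus, pvalueCutoff=1, qvalueCutoff=1)
head(summary(sample_test), 2)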

Support for many ID types

The input ID type can be any key type supported by the OrgDb object.

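library(org.Hs.eg.db)
# geneList is an example dataset shipped with the DOSE package
data(geneList, package = "DOSE")
# Select genes whose absolute value in geneList exceeds 2
gene <- names(geneList)[abs(geneList) > 2]
# Map Entrez IDs to Ensembl IDs and gene symbols
gene.df <- bitr(gene, fromType = "ENTREZID",
                toType = c("ENSEMBL", "SYMBOL"),
                OrgDb = org.Hs.eg.db)
head(gene.df, 3)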

# Enrichment with Entrez IDs as input (no key type argument is given)
ego <- enrichGO(gene          = gene,
                universe      = names(geneList),
                OrgDb         = org.Hs.eg.db,
                ont           = "CC",
                pAdjustMethod = "BH",
                pvalueCutoff  = 0.01,
                qvalueCutoff  = 0.05)
head(summary(ego), 2)

# Same analysis with Ensembl IDs as input
ego2 <- enrichGO(gene          = gene.df$ENSEMBL,
                 OrgDb         = org.Hs.eg.db,
                 keytype       = 'ENSEMBL',
                 ont           = "CC",
                 pAdjustMethod = "BH",
                 pvalueCutoff  = 0.01,
                 qvalueCutoff  = 0.05)
head(summary(ego2), 2)

# Same analysis with gene symbols as input
ego3 <- enrichGO(gene          = gene.df$SYMBOL,
                 OrgDb         = org.Hs.eg.db,
                 keytype       = 'SYMBOL',
                 ont           = "CC",
                 pAdjustMethod = "BH",
                 pvalueCutoff  = 0.01,
                 qvalueCutoff  = 0.05)
head(summary(ego3), 2)

Using SYMBOL directly is not recommended. Users can call the setReadable function to translate gene IDs to gene symbols.

ego <- setReadable(ego, OrgDb = org.Hs.eg.db)
ego2 <- setReadable(ego2, OrgDb = org.Hs.eg.db)
head(summary(ego), n=3)

head(summary(ego2), n=3)
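To make the translation step concrete, here is a minimal base R sketch of the idea behind setReadable: the geneID column of an enrichment result holds IDs joined by "/", and each ID is swapped for its symbol. The helper translate_ids and the small lookup vector are illustrative assumptions, not part of clusterProfiler, and keeping unmapped IDs unchanged is a choice made for this sketch only.

```r
# Illustrative Entrez-to-symbol lookup; a real one would come from an OrgDb.
id2symbol <- c("4312" = "MMP1", "8318" = "CDC45", "10874" = "NMU")

# geneID strings as they appear in an enrichment result: IDs joined by "/".
# "9999" is a made-up ID with no entry in the lookup.
gene_id_col <- c("4312/8318", "10874/4312/9999")

# translate_ids() is a hypothetical helper, not a clusterProfiler function.
translate_ids <- function(x, lookup) {
  vapply(strsplit(x, "/", fixed = TRUE), function(ids) {
    sym <- unname(lookup[ids])
    unmapped <- is.na(sym)
    sym[unmapped] <- ids[unmapped]   # keep the original ID when unmapped
    paste(sym, collapse = "/")
  }, character(1))
}

readable <- translate_ids(gene_id_col, id2symbol)
print(readable)
```

In practice setReadable does this against the OrgDb, so the sketch is only useful for understanding the shape of the geneID column before and after translation.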

enrichGO tests the whole GO corpus, so the enriched result may contain very general terms. Users can call the dropGO function to remove specific GO terms or a whole GO level. To restrict the result to a specific GO level, use the gofilter function. We also provide a simplify method to reduce redundancy among enriched GO terms; see the post.

Visualization functions

dotplot(ego, showCategory=30)

enrichMap(ego, vertex.label.cex=1.2, layout=igraph::layout.kamada.kawai)

cnetplot(ego, foldChange=geneList)

plotGOgraph(ego)

Gene Set Enrichment Analysis

gsecc <- gseGO(geneList=geneList, ont="CC", OrgDb=org.Hs.eg.db, verbose=FALSE)
head(summary(gsecc))

gseaplot(gsecc, geneSetID="GO:0000779")
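The geneList passed to gseGO above is used as a named numeric vector (its names are gene IDs and its values are compared against a threshold earlier in this post). For your own data, a ranked list of that shape can be prepared roughly as follows. This is a sketch under assumptions: the data frame de, its column names, and its values are made up, and keeping the first value for a duplicated gene is an arbitrary choice.

```r
# Hypothetical differential-expression table; IDs and values are made up.
de <- data.frame(
  entrez = c("1001", "1002", "1003", "1004", "1002"),
  log2FC = c(0.8, -2.1, 3.4, NA, -1.5),
  stringsAsFactors = FALSE
)

# Drop missing statistics, then keep one value per gene
# (the first occurrence, an arbitrary choice for this sketch).
de <- de[!is.na(de$log2FC), ]
de <- de[!duplicated(de$entrez), ]

# Build a named numeric vector sorted in decreasing order.
ranked <- de$log2FC
names(ranked) <- de$entrez
ranked <- sort(ranked, decreasing = TRUE)
print(ranked)
```

A vector built this way could then stand in for geneList in the gseGO call, provided its names use an ID type that the chosen OrgDb supports.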

GO analysis using user's own data

clusterProfiler provides the enricher function for the hypergeometric test and the GSEA function for gene set enrichment analysis; both are designed to accept user-defined annotation. They take two additional parameters, TERM2GENE and TERM2NAME. As the names indicate, TERM2GENE is a data.frame whose first column is the term ID and whose second column is the mapped gene, and TERM2NAME is a data.frame whose first column is the term ID and whose second column is the term name. TERM2NAME is optional.
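The two tables and the test they feed can be sketched in base R. The example below builds toy TERM2GENE and TERM2NAME data frames and runs a one-sided hypergeometric test per term by hand. Everything in it is an assumption made for illustration: the term IDs, gene IDs, query, universe, and the helper over_rep are invented, and the code is not clusterProfiler's implementation.

```r
# Toy annotation tables: first column term ID, second column gene or name.
term2gene <- data.frame(
  term = c("T1", "T1", "T1", "T2", "T2", "T2", "T2"),
  gene = c("g1", "g2", "g3", "g3", "g4", "g5", "g6"),
  stringsAsFactors = FALSE
)
term2name <- data.frame(
  term = c("T1", "T2"),
  name = c("toy term one", "toy term two"),
  stringsAsFactors = FALSE
)

# Hypothetical query genes and background (universe).
query    <- c("g1", "g2", "g3")
universe <- paste0("g", 1:10)

# One-sided hypergeometric test, P(X >= k), where
#   k = query genes annotated to the term
#   M = universe genes annotated to the term
#   N = universe size, n = query size
over_rep <- function(term_genes, query, universe) {
  term_genes <- intersect(term_genes, universe)
  query      <- intersect(query, universe)
  k <- length(intersect(query, term_genes))
  M <- length(term_genes)
  N <- length(universe)
  n <- length(query)
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

genes_by_term <- split(term2gene$gene, term2gene$term)
pvalue <- vapply(genes_by_term, over_rep, numeric(1),
                 query = query, universe = universe)

result <- data.frame(
  ID          = names(pvalue),
  Description = term2name$name[match(names(pvalue), term2name$term)],
  pvalue      = unname(pvalue),
  p.adjust    = p.adjust(unname(pvalue), method = "BH"),
  stringsAsFactors = FALSE
)
print(result)
```

With real data, the same term2gene and term2name tables would be passed as TERM2GENE and TERM2NAME, and enricher would handle the testing and multiple-testing correction.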

An example of using enricher and GSEA to analyze DisGeNet annotation is presented in the post "use clusterProfiler as an universal enrichment analysis tool".

GMT files

We provide a function, read.gmt, that parses a GMT file into a TERM2GENE data.frame ready for both the enricher and GSEA functions.

gmtfile <- system.file("extdata", "c5.cc.v5.0.entrez.gmt", package="clusterProfiler")
c5 <- read.gmt(gmtfile)
egmt <- enricher(gene, TERM2GENE=c5)
head(summary(egmt))

egmt <- setReadable(egmt, OrgDb=org.Hs.eg.db, keytype="ENTREZID")
head(summary(egmt))

gsegmt <- GSEA(geneList, TERM2GENE=c5, verbose=FALSE)
head(summary(gsegmt))
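For readers unfamiliar with the GMT format parsed by read.gmt above: each line holds a gene set name, a description field, and then the member genes, all separated by tabs. The base R sketch below shows how such lines flatten into a two-column term/gene table. The helper parse_gmt_lines, the column names, and the two records are hypothetical and only illustrate the reshaping; read.gmt remains the function to use with clusterProfiler.

```r
# Two made-up GMT records: name <TAB> description <TAB> gene1 <TAB> gene2 ...
gmt_lines <- c(
  "SET_A\thttp://example.org/a\t101\t102\t103",
  "SET_B\thttp://example.org/b\t102\t104"
)

# parse_gmt_lines() is a hypothetical helper, not part of clusterProfiler.
parse_gmt_lines <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  per_set <- lapply(fields, function(f) {
    genes <- f[-(1:2)]   # drop the set name and the description field
    data.frame(term = rep(f[1], length(genes)),
               gene = genes,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, per_set)
}

term2gene <- parse_gmt_lines(gmt_lines)
print(term2gene)
```

The result has one row per (gene set, gene) pair, which is the long layout that TERM2GENE expects.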

Citation

Yu G, Wang L, Han Y and He Q*. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 2012, 16(5):284-287.
