clusterProfiler supports over-representation tests and gene set enrichment analysis of Gene Ontology terms. It accepts GO annotation from an OrgDb object, a GMT file, or the user's own data.
Support for many species
In the GitHub version of clusterProfiler, the enrichGO and gseGO functions drop the organism parameter and add an OrgDb parameter, so any species with an available OrgDb object can be analyzed. Bioconductor already provides OrgDb packages for about 20 species (see http://bioconductor.org/packages/release/BiocViews.html#___OrgDb), and users can obtain OrgDb objects for other species via AnnotationHub.
library(AnnotationHub)
hub <- AnnotationHub()
# Search the hub for Cricetulus (Chinese hamster) resources
query(hub, "Cricetulus")
Cgriseus <- hub[["AH48061"]]
Cgriseus
sample_gene <- sample(keys(Cgriseus), 100)
str(sample_gene)
library(clusterProfiler)
sample_test <- enrichGO(sample_gene, OrgDb=Cgriseus, pvalueCutoff=1, qvalueCutoff=1)
head(summary(sample_test), 2)
The input ID type can be any key type supported by the OrgDb object.
library(org.Hs.eg.db)
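data(geneList, package = "DOSE")  # example gene list shipped with DOSE
gene <- names(geneList)[abs(geneList) > 2]
# Map Entrez IDs to Ensembl IDs and gene symbols
gene.df <- bitr(gene, fromType = "ENTREZID",
                toType = c("ENSEMBL", "SYMBOL"),
                OrgDb = org.Hs.eg.db)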
head(gene.df, 3)
ego <- enrichGO(gene          = gene,
                universe      = names(geneList),
                OrgDb         = org.Hs.eg.db,
                ont           = "CC",
                pAdjustMethod = "BH",
                pvalueCutoff  = 0.01,
                qvalueCutoff  = 0.05)
head(summary(ego), 2)
ego2 <- enrichGO(gene          = gene.df$ENSEMBL,
                 OrgDb         = org.Hs.eg.db,
                 keytype       = 'ENSEMBL',
                 ont           = "CC",
                 pAdjustMethod = "BH",
                 pvalueCutoff  = 0.01,
                 qvalueCutoff  = 0.05)
head(summary(ego2), 2)
ego3 <- enrichGO(gene          = gene.df$SYMBOL,
                 OrgDb         = org.Hs.eg.db,
                 keytype       = 'SYMBOL',
                 ont           = "CC",
                 pAdjustMethod = "BH",
                 pvalueCutoff  = 0.01,
                 qvalueCutoff  = 0.05)
head(summary(ego3), 2)
Using SYMBOL directly as the input ID type is not recommended. Instead, use the setReadable function to translate gene IDs to gene symbols.
ego <- setReadable(ego, OrgDb = org.Hs.eg.db)
ego2 <- setReadable(ego2, OrgDb = org.Hs.eg.db)
head(summary(ego), n=3)
head(summary(ego2), n=3)
enrichGO tests the whole GO corpus, so the enriched result may contain very general terms. Use the dropGO function to remove specific GO terms or an entire GO level. To restrict the result to a specific GO level, use the gofilter function. We also provide a simplify method to reduce redundancy among enriched GO terms; see the post.
Visualization functions
dotplot(ego, showCategory=30)
enrichMap(ego, vertex.label.cex=1.2, layout=igraph::layout.kamada.kawai)
cnetplot(ego, foldChange=geneList)
plotGOgraph(ego)
gsecc <- gseGO(geneList=geneList, ont="CC", OrgDb=org.Hs.eg.db, verbose=FALSE)
head(summary(gsecc))
gseaplot(gsecc, geneSetID="GO:0000779")
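Unlike enrichGO, which takes a subset of genes, gseGO and GSEA take a ranked gene list. As a minimal sketch of how such an input can be prepared (the table, gene IDs and fold-change values below are made up for illustration), the list is a named numeric vector sorted in decreasing order:

```r
# Hypothetical differential-expression table; IDs and values are made up.
de <- data.frame(
  id     = c("1001", "1002", "1003", "1004"),
  log2fc = c(-1.2, 3.4, 0.1, 2.0),
  stringsAsFactors = FALSE
)

# Numeric statistic as values, gene IDs as names.
ranked <- de$log2fc
names(ranked) <- de$id

# Sort from the most up-regulated to the most down-regulated gene.
ranked <- sort(ranked, decreasing = TRUE)
```

A vector built this way plays the role of geneList in the gseGO call above, provided the names use an ID type that the OrgDb object supports.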
clusterProfiler provides the enricher function for hypergeometric tests and the GSEA function for gene set enrichment analysis; both are designed to accept user-defined annotation. They take two additional parameters, TERM2GENE and TERM2NAME. As the parameter names indicate, TERM2GENE is a data.frame whose first column is the term ID and whose second column is the mapped gene, and TERM2NAME is a data.frame whose first column is the term ID and whose second column is the term name. TERM2NAME is optional.
An example of using enricher and GSEA to analyze DisGeNET annotation is presented in the post "use clusterProfiler as an universal enrichment analysis tool".
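To make the two-column layout concrete, here is a minimal base R sketch. All term IDs, term names and gene IDs are made up, and the helper over_rep is hypothetical: it illustrates a one-sided hypergeometric test on a TERM2GENE table and is not clusterProfiler's own implementation.

```r
# Toy annotation; every ID and name here is made up for illustration.
term2gene <- data.frame(
  term = c("T1", "T1", "T1", "T2", "T2"),
  gene = c("g1", "g2", "g3", "g3", "g4"),
  stringsAsFactors = FALSE
)
term2name <- data.frame(
  term = c("T1", "T2"),
  name = c("toy term one", "toy term two"),
  stringsAsFactors = FALSE
)

universe <- paste0("g", 1:10)    # background genes
query    <- c("g1", "g2", "g5")  # genes of interest

# Hypothetical helper: P(X >= k) under the hypergeometric distribution,
# where k is the number of query genes annotated to the term.
over_rep <- function(term_genes, query, universe) {
  term_genes <- intersect(term_genes, universe)
  query <- intersect(query, universe)
  k <- length(intersect(query, term_genes))  # query genes in the term
  M <- length(term_genes)                    # term size
  N <- length(universe)                      # background size
  n <- length(query)                         # query size
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

# One p-value per term, then Benjamini-Hochberg adjustment.
pvals <- vapply(split(term2gene$gene, term2gene$term), over_rep,
                numeric(1), query = query, universe = universe)
padj <- p.adjust(pvals, method = "BH")
```

Data frames shaped like term2gene and term2name are what the TERM2GENE and TERM2NAME parameters expect. Terms as small as these toy ones would likely be filtered out by enricher unless its minGSSize parameter is lowered.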
GMT files
We provide a function, read.gmt, that parses a GMT file into a TERM2GENE data.frame ready for both the enricher and GSEA functions.
gmtfile <- system.file("extdata", "c5.cc.v5.0.entrez.gmt", package="clusterProfiler")
c5 <- read.gmt(gmtfile)
egmt <- enricher(gene, TERM2GENE=c5)
head(summary(egmt))
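For readers unfamiliar with the GMT format: each line holds a gene set name, a description, and then the member genes, all separated by tabs. The base R sketch below shows how such lines map onto the two-column TERM2GENE layout. The records are made up, and parse_gmt is a hypothetical helper written for illustration, not the code behind read.gmt.

```r
# Two made-up GMT records: set name, description, then member genes.
gmt_lines <- c(
  "SET_A\tdescription of A\t101\t102\t103",
  "SET_B\tdescription of B\t102\t104"
)

# Hypothetical helper: one output row per (gene set, gene) pair.
parse_gmt <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  rows <- lapply(fields, function(f) {
    # f[1] is the set name, f[2] the description, the rest are genes.
    data.frame(term = f[1], gene = f[-(1:2)], stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

term2gene <- parse_gmt(gmt_lines)
```

The description field is dropped because TERM2GENE only needs the term ID and the gene.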
egmt <- setReadable(egmt, OrgDb=org.Hs.eg.db, keytype="ENTREZID")
head(summary(egmt))
gsegmt <- GSEA(geneList, TERM2GENE=c5, verbose=FALSE)
head(summary(gsegmt))
Yu G, Wang L, Han Y and He Q*. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 2012, 16(5):284-287.