GO analysis using clusterProfiler

2021-02-21 YuLabSMU

clusterProfiler supports both over-representation tests and gene set enrichment analysis (GSEA) of Gene Ontology (GO) terms. The GO annotation can come from an OrgDb object, a GMT file, or the user's own data.

Support for many species

In the GitHub version of clusterProfiler, the enrichGO and gseGO functions drop the organism parameter and take an OrgDb parameter instead, so any species with an available OrgDb object can be analyzed in clusterProfiler. Bioconductor already provides OrgDb packages for about 20 species (see http://bioconductor.org/packages/release/BiocViews.html#___OrgDb), and users can retrieve an OrgDb for other species via AnnotationHub.

library(AnnotationHub)
hub <- AnnotationHub()

query(hub, "Cricetulus")

Cgriseus <- hub[["AH48061"]]

Cgriseus

sample_gene <- sample(keys(Cgriseus), 100)
str(sample_gene)

library(clusterProfiler)
sample_test <- enrichGO(sample_gene, OrgDb=Cgriseus, pvalueCutoff=1, qvalueCutoff=1)
head(summary(sample_test), 2)

Support for many ID types

The input can use any ID type supported by the OrgDb object.

library(org.Hs.eg.db)
# geneList is an example dataset shipped with the DOSE package
data(geneList, package = "DOSE")
gene <- names(geneList)[abs(geneList) > 2]
gene.df <- bitr(gene, fromType = "ENTREZID",
       toType = c("ENSEMBL", "SYMBOL"),
       OrgDb = org.Hs.eg.db)
head(gene.df, 3)

ego <- enrichGO(gene          = gene,
               universe      = names(geneList),
               OrgDb         = org.Hs.eg.db,
               ont           = "CC",
               pAdjustMethod = "BH",
               pvalueCutoff  = 0.01,
               qvalueCutoff  = 0.05)
head(summary(ego), 2)

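# Same test with ENSEMBL IDs as input. Current releases name this
# argument keyType; older versions spelled it keytype.
ego2 <- enrichGO(gene          = gene.df$ENSEMBL,
                 OrgDb         = org.Hs.eg.db,
                 keyType       = 'ENSEMBL',
                 ont           = "CC",
                 pAdjustMethod = "BH",
                 pvalueCutoff  = 0.01,
                 qvalueCutoff  = 0.05)
head(summary(ego2), 2)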

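# Same test with gene symbols as input. Current releases name this
# argument keyType; older versions spelled it keytype.
ego3 <- enrichGO(gene          = gene.df$SYMBOL,
                 OrgDb         = org.Hs.eg.db,
                 keyType       = 'SYMBOL',
                 ont           = "CC",
                 pAdjustMethod = "BH",
                 pvalueCutoff  = 0.01,
                 qvalueCutoff  = 0.05)
head(summary(ego3), 2)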

Using SYMBOL directly as input is not recommended. Instead, use the setReadable function to translate gene IDs into gene symbols after the analysis.

ego <- setReadable(ego, OrgDb = org.Hs.eg.db)
ego2 <- setReadable(ego2, OrgDb = org.Hs.eg.db)
head(summary(ego), n=3)

head(summary(ego2), n=3)

enrichGO tests the whole GO corpus, so the enriched result may contain very general terms. Use the dropGO function to remove specific GO terms or a whole GO level. To restrict the result to a specific GO level, use the gofilter function. We also provide a simplify method to reduce redundancy among enriched GO terms; see the post.
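As a rough sketch of how dropGO, gofilter and simplify might be applied, the script below rebuilds the ego object from the earlier example and runs each function on it. The level values and the simplify arguments (which are its documented defaults) are illustrative assumptions, not recommendations, and the requireNamespace guard is only there so the script does nothing when the Bioconductor packages are missing.

```r
# Sketch only: requires the Bioconductor packages listed below.
pkgs <- c("clusterProfiler", "org.Hs.eg.db", "DOSE")
have_pkgs <- all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

if (have_pkgs) {
  library(clusterProfiler)
  library(org.Hs.eg.db)

  # Rebuild the enrichment result used in the text above
  data(geneList, package = "DOSE")
  gene <- names(geneList)[abs(geneList) > 2]
  ego <- enrichGO(gene          = gene,
                  universe      = names(geneList),
                  OrgDb         = org.Hs.eg.db,
                  ont           = "CC",
                  pAdjustMethod = "BH",
                  pvalueCutoff  = 0.01,
                  qvalueCutoff  = 0.05)

  # Remove every enriched term that sits at GO level 2 (illustrative level)
  ego_dropped <- dropGO(ego, level = 2)

  # Keep only the enriched terms at GO level 4 (illustrative level)
  ego_level4 <- gofilter(ego, level = 4)

  # Collapse semantically similar terms, keeping the one with the
  # smallest adjusted p-value in each group of similar terms
  ego_simple <- simplify(ego, cutoff = 0.7, by = "p.adjust", select_fun = min)

  # Compare how many terms survive each step
  print(sapply(list(all = ego, dropped = ego_dropped,
                    level4 = ego_level4, simplified = ego_simple),
               function(x) nrow(as.data.frame(x))))
}
```

Comparing the row counts is a quick way to see how aggressively each step prunes the result before plotting it.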

Visualization functions

dotplot(ego, showCategory=30)

enrichMap(ego, vertex.label.cex=1.2, layout=igraph::layout.kamada.kawai)

cnetplot(ego, foldChange=geneList)

plotGOgraph(ego)

Gene Set Enrichment Analysis

gsecc <- gseGO(geneList=geneList, ont="CC", OrgDb=org.Hs.eg.db, verbose=FALSE)
head(summary(gsecc))

gseaplot(gsecc, geneSetID="GO:0000779")

GO analysis using user's own data

clusterProfiler provides the enricher function for the hypergeometric test and the GSEA function for gene set enrichment analysis; both are designed to accept user-defined annotation. They take two additional parameters, TERM2GENE and TERM2NAME. As the names indicate, TERM2GENE is a data.frame whose first column is the term ID and whose second column is the mapped gene, and TERM2NAME is a data.frame whose first column is the term ID and whose second column is the term name. TERM2NAME is optional.
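To make the two-column layout concrete, here is a minimal base R sketch that builds a toy TERM2GENE and TERM2NAME table and runs a plain hypergeometric over-representation test on them with phyper. All term IDs, gene IDs and names are hypothetical, and the test is a textbook version for illustration only; it is not enricher's implementation, which also applies gene set size filters and its own background handling.

```r
# Hypothetical toy annotation: column 1 = term ID, column 2 = gene ID
term2gene <- data.frame(
  term = c("T1", "T1", "T1", "T1", "T2", "T2", "T2", "T3", "T3"),
  gene = c("g1", "g2", "g3", "g4", "g3", "g5", "g6", "g7", "g8"),
  stringsAsFactors = FALSE
)

# Optional lookup table: column 1 = term ID, column 2 = term name
term2name <- data.frame(
  term = c("T1", "T2", "T3"),
  name = c("toy term one", "toy term two", "toy term three"),
  stringsAsFactors = FALSE
)

# Assumption: the background is every annotated gene
universe <- unique(term2gene$gene)
query    <- c("g1", "g2", "g3")   # hypothetical genes of interest

gene_sets <- split(term2gene$gene, term2gene$term)
N <- length(universe)                      # background size
n <- length(intersect(query, universe))    # query genes in the background
M <- lengths(gene_sets)                    # size of each gene set
k <- vapply(gene_sets, function(g) length(intersect(g, query)), integer(1))

# P(X >= k): chance of drawing at least k set members in n draws
pvalue <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

ora <- data.frame(
  ID          = names(gene_sets),
  Description = term2name$name[match(names(gene_sets), term2name$term)],
  Count       = k,
  setSize     = M,
  pvalue      = pvalue,
  p.adjust    = p.adjust(pvalue, method = "BH"),
  row.names   = NULL
)
print(ora[order(ora$pvalue), ])
```

Tables shaped like term2gene and term2name are what would be passed as the TERM2GENE and TERM2NAME arguments; for sets this small, enricher's minGSSize argument would also need lowering from its default of 10.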

An example of using enricher and GSEA to analyze DisGeNET annotation is presented in the post "use clusterProfiler as an universal enrichment analysis tool".

GMT files

We provide a function, read.gmt, that parses a GMT file into a TERM2GENE data.frame ready for both the enricher and GSEA functions.

gmtfile <- system.file("extdata", "c5.cc.v5.0.entrez.gmt", package="clusterProfiler")
c5 <- read.gmt(gmtfile)
egmt <- enricher(gene, TERM2GENE=c5)
head(summary(egmt))
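For orientation, a GMT file stores one gene set per line: the set name, a description field, then the member genes, all separated by tabs. The base R sketch below writes a tiny hypothetical GMT file and flattens it into the two-column term/gene shape described above. parse_gmt is a made-up helper for illustration, not read.gmt's actual code.

```r
# Two hypothetical gene sets in GMT layout: name, description, then genes
gmt_lines <- c(
  "SET_A\thttp://example.org/a\t1001\t1002\t1003",
  "SET_B\thttp://example.org/b\t1002\t1004"
)
gmt_path <- tempfile(fileext = ".gmt")
writeLines(gmt_lines, gmt_path)

# Hypothetical helper: one output row per (term, gene) pair
parse_gmt <- function(path) {
  fields <- strsplit(readLines(path), "\t", fixed = TRUE)
  terms  <- vapply(fields, `[`, character(1), 1)   # first field = set name
  genes  <- lapply(fields, function(f) f[-(1:2)])  # drop name and description
  data.frame(
    term = rep(terms, lengths(genes)),
    gene = unlist(genes),
    stringsAsFactors = FALSE
  )
}

term2gene <- parse_gmt(gmt_path)
print(term2gene)
```

The long format repeats the set name once per member gene, which is why a single GMT line with many genes expands into many rows.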

# Current releases name this argument keyType; older versions spelled it keytype.
egmt <- setReadable(egmt, OrgDb=org.Hs.eg.db, keyType="ENTREZID")
head(summary(egmt))

gsegmt <- GSEA(geneList, TERM2GENE=c5, verbose=FALSE)
head(summary(gsegmt))

Citation

Yu G, Wang L, Han Y and He Q*. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 2012, 16(5):284-287.
