前面我們已經完成了cytof數據處理的主要步驟,讀入文件,質量控制,降維聚類分群,生物學注釋和細胞亞群比例差異分析。目錄如下:
其實跟純粹的單細胞轉錄組就非常類似了,不過單細胞轉錄組數據分析的細節以及背景我就不贅述了,看我在《單細胞天地》的單細胞基礎10講:
以及各式各樣的個性化匯總教程,差不多就明白了。
基本流程走完了,僅僅是萬裡長徵第一步而已。我們可以開始嘗試分析一些文獻的公共數據集啦,不過在處理那些數據的過程中,我們還需要傳授給大家幾個小技巧。
首先取出亞群前面的細胞亞群生物學命名的時候,我們就講解過,根據指定抗體信號強度,可以取子集,代碼如下:
rm(list = ls())
suppressPackageStartupMessages(library(flowCore))
suppressPackageStartupMessages(library(diffcyt))
suppressPackageStartupMessages(library(HDCytoData))
require(cytofWorkflow)
load(file = 'K20_output_of_cytofWorkflow.Rdata')
sce@metadata
p=plotExprHeatmap(sce, features = "type",
by = "cluster_id", k = "meta20",
row_clust = F,
bars = TRUE, perc = TRUE)
p
p@matrix
kp=p@matrix > 0.5
colnames(kp)
k1=which(kp[,"CD45"] & kp[,"CD3"] & kp[,"CD4"] );k1 # CD4
sce
som=sce@metadata$cluster_codes
table(colData(sce)$cluster_id)
choose_som=som[som$meta20 %in% k1,1]
choose_som這樣取出來了就是可能的CD4的T細胞啦,有了這個choose_som後面就可以獲取。
不知道大家是否還記得Seurat分析流程裡面的單細胞對象取子集呢?其實本質上都是考驗大家對R裡面的S4對象的理解而已。
然後重新組合成為新的SingleCellExperiment對象SingleCellExperiment對象的認識非常重要,重點就是:
different quantifications (e.g., counts, CPMs, log-expression) can be stored simultaneously in the assays slot,row and column metadata can be attached using rowData and colData, respectively.全部的代碼如下;
sce
ct=assay(sce,i = 1)[,colData(sce)$cluster_id %in% choose_som]
ex=assay(sce,i = 2)[,colData(sce)$cluster_id %in% choose_som]
phe=colData(sce)[colData(sce)$cluster_id %in% choose_som,]
geneinfor=rowData(sce)
sce <- SingleCellExperiment(list(counts=ct,exprs=ex),
colData=phe,
rowData=geneinfor
)
sce新組建的SingleCellExperiment對象就可以繼續走聚類分群的步驟啦。而且很容易建議,代碼如下:
p <- plotExprs(sce, color_by = "condition")
p$facet$params$ncol <- 3
p可以看到,CD4都是有表達量的:
同樣的聚類分群代跟之前一樣,示例代碼如下:
sce
pro='Cd4-Tcells_sub'
p <- plotExprs(sce, color_by = "condition")
p$facet$params$ncol <- 3
p
ggsave2(filename = paste0(pro,'_all_markers_density.pdf'),height = 11)
n_cells(sce)
plotCounts(sce, group_by = "sample_id", color_by = "condition")
ggsave2(filename = paste0(pro,'_plotCounts.pdf'))
# 10 lineage markers and 14 functional markers across all cells measured for each sample in the PBMC dataset
pdf( paste0(pro,'_plotExprHeatmap.pdf'))
plotExprHeatmap(sce, scale = "last",
hm_pal = rev(hcl.colors(10, "YlGnBu")))
dev.off()
# PCA-based non-redundancy score (NRS)
plotNRS(sce, features = "type", color_by = "condition")
ggsave2(filename = paste0(pro,'_plotNRS.pdf'))
set.seed(1234)
# 通常建議第一次分群,細一點要好,後面可以人工合併
# We call ConsensusClusterPlus() with maximum number of clusters maxK = 20.
sce <- cluster(sce, features = "type",
xdim = 10, ydim = 10, maxK = 10, seed = 1234)
pdf(paste0(pro,'_cluster_plotExprHeatmap_row_clust_F.pdf'))
plotExprHeatmap(sce, features = "type",
by = "cluster_id", k = "meta10",
row_clust = F,
bars = TRUE, perc = TRUE)
dev.off()
pdf(paste0(pro,'_plotClusterExprs.pdf'))
plotClusterExprs(sce, k = "meta10", features = "type")
dev.off()
pdf(paste0(pro,'_cluster_plotMultiHeatmap.pdf'))
plotMultiHeatmap(sce,
hm1 = "type", k = "meta10",
row_anno = FALSE, bars = TRUE, perc = TRUE)
dev.off()
# 節約計算資源
# run t-SNE/UMAP on at most 500/1000 cells per sample
set.seed(1234)
sce <- runDR(sce, "TSNE", cells = 1e4, features = "type")
sce <- runDR(sce, "UMAP", cells = 1e4, features = "type")
# plotDR(sce, "UMAP", color_by = "CD4")
library(ggplot2)
p1 <- plotDR(sce, "TSNE", color_by = "meta10") +
theme(legend.position = "none")
p2 <- plotDR(sce, "UMAP", color_by = "meta10")
lgd <- get_legend(p2)
p2 <- p2 + theme(legend.position = "none")
plot_grid(p1, p2, lgd, nrow = 1, rel_widths = c(5, 5, 2))
ggsave2(filename = paste0(pro,'_umap_vs_tSNE.pdf'))
# facet by sample
plotDR(sce, "TSNE", color_by = "meta10", facet_by = "sample_id")
ggsave2(filename = paste0(pro,'_TSNE_by_samples.pdf'))
# facet by condition
plotDR(sce, "TSNE", color_by = "meta10", facet_by = "condition")
ggsave2(filename = paste0(pro,'_TSNE_by_condition.pdf'))
# facet by sample
plotDR(sce, "UMAP", color_by = "meta10", facet_by = "sample_id")
ggsave2(filename = paste0(pro,'_umap_by_samples.pdf'))
# facet by condition
plotDR(sce, "UMAP", color_by = "meta10", facet_by = "condition")
ggsave2(filename = paste0(pro,'_umap_by_condition.pdf'))
plotCodes(sce, k = "meta10")
pdf(paste0(pro,'_cluster_som100_plotMultiHeatmap.pdf'))
plotMultiHeatmap(sce,
hm1 = "type", k = "som100", m = "meta10",
row_anno = FALSE, col_anno = FALSE, bars = TRUE, perc = TRUE)
dev.off()
sce@metadata
plot_grid(labels = c("A", "B"),
plotDR(sce, "UMAP", color_by = "meta10"),
plotDR(sce, "UMAP", color_by = "meta8"))
plotAbundances(sce, k = "meta10", by = "sample_id")
ggsave2(filename = paste0(pro,'_plotAbundances_barplot.pdf'))
plotAbundances(sce, k = "meta10", by = "cluster_id", shape_by = "patient_id")
ggsave2(filename = paste0(pro,'_plotAbundances_boxplot.pdf'))
save(sce,file = paste0(pro,'k10_output_of_cytofWorkflow.Rdata'))可以看到初步分成的10個群,確實都有不一樣的地方,如果生物學背景足夠,就可以給出解釋。