10x轉錄組數據進行RNA velocity分析

2021-02-20 東林的扯淡小屋

首先看看需要什麼樣的命令才能有loom文件：

Usage: velocyto run10x [OPTIONS] SAMPLEFOLDER GTFFILE

Runs the velocity analysis for a Chromium 10X Sample

10XSAMPLEFOLDER specifies the cellranger sample folder

GTFFILE genome annotation file

Options:

-s, --metadatatable FILE Table containing metadata of the various samples (csv fortmated rows are samples and cols are entries)

-m, --mask FILE .gtf file containing intervals to mask

-l, --logic TEXT The logic to use for the filtering (default: Default)

-M, --multimap Consider not unique mappings (not reccomended)

-@, --samtools-threads INTEGER The number of threads to use to sort the bam by cellID file using samtools

--samtools-memory INTEGER The number of MB used for every thread by samtools to sort the bam file

-t, --dtype TEXT The dtype of the loom file layers - if more than 6000 molecules/reads per gene per cell are expected set uint32 to avoid truncation (default run_10x: uint16)

-d, --dump TEXT For debugging purposes only: it will dump a molecular mapping report to hdf5. --dump N, saves a cell every N cells. If p is prepended a more complete (but huge) pickle report is printed (default: 0)

-v, --verbose Set the vebosity level: -v (only warinings) -vv (warinings and info) -vvv (warinings, info and debug)

--help Show this message and exit.

CLI Usage Guide — velocyto 0.17.16 documentation

http://velocyto.org/velocyto.py/tutorial/cli.html

我一開始還以為自己成功跑上了，但是很快就出錯了：

通過搜索，發現可能是自己下載文件出了問題：

Need preparation data to run velocyto: samtools version, bam file, repeat_msk.gtf · Issue #112 · velocyto-team/velocyto.py

https://github.com/velocyto-team/velocyto.py/issues/112

Table Browser

https://genome.ucsc.edu/cgi-bin/hgTables?hgsid=611454127_NtvlaW6xBSIRYJEBI0iRDEWisITa&clade=mammal&org=Mouse&db=mm10&hgta_group=allTracks&hgta_track=xenoRefGene&hgta_table=0&hgta_regionType=genome&position=chr12%3A56%2C694%2C976-56%2C714%2C605&hgta_outputType=gff&hgta_outFileName=hg38_repeats_repeatMasker_allTracks.gtf

下載之後解壓到伺服器，重新跑，就成功了喲！

幾個小時之後還是沒有報錯！

這是個好兆頭~。

velocyto run10x -m /data/yudonglin/velocity/mm10_rmsk.gtf /data/yudonglin/singlecell/ABFC20190816-04-分析結果/NoPro /data/yudonglin/velocity/refdata-gex-mm10-2020-A/genes/genes.gtf
最終得到loom文件！
然後又遇到問題：
export OPENBLAS_NUM_THREADS=1
「生信Debug」OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable_xuzhougeng blog-CSDN博客
https://blog.csdn.net/u012110870/article/details/106115577
最終成功！
> library(velocyto.R)Loading required package: Matrix> library(pagoda2)Loading required package: igraph
Attaching package: 『igraph』
The following objects are masked from 『package:stats』:
    decompose, spectrum
The following object is masked from 『package:base』:
    union

Attaching package: 『pagoda2』
The following object is masked from 『package:velocyto.R』:
    armaCor
> ldat <- read.loom.matrices("/data/yudonglin/singlecell/ABFC20190816-04-分析結果/NoPro/velocyto/NoPro.loom")reading loom file via hdf5r...
> emat <- ldat$spliced> hist(log10(colSums(emat)),col='wheat',xlab='cell size')> > > emat <- emat[,colSums(emat)>=1e3]   > emat <- emat[!duplicated(rownames(emat)),]> > r <- Pagoda2$new(emat, modelType = "plain", trim = 10, log.scale = TRUE)8158cells,32245genes; normalizing ... using plain model winsorizing ... log scale ... done.
> r$adjustVariance(plot = TRUE, do.par = TRUE, gam.k = 10)calculating variance fit ... using gam 2271overdispersed genes ...2271persisting ... done.
> r$calculatePcaReduction(nPcs = 100, n.odgenes = 3e3, maxit = 300)running PCA using 3000 OD genes .... done
> r$makeKnnGraph(k = 30, type = "PCA", center = TRUE,distance = "cosine")creating space of type angular doneadding data ... donebuilding index ... donequerying ... done> r$getKnnClusters(method = multilevel.community, type = "PCA",name = "multilevel")w.legend = FALSE, mark.clusters = TRUE,                min.group.size = 10, shuffle.colors = FALSE,                mark.cluster.cex = 1, alpha = 0.3, main = "cell clusters")r$plotEmbedding(type = "PCA", embeddingType = "tSNE", colors = r$depth, main = "depth")> r$getEmbedding(type = "PCA", embeddingType = "tSNE",+                perplexity = 50,verbose = FALSE)calculating distance ... pearson ...
running tSNE using56cores:
> r$plotEmbedding(type = "PCA", embeddingType = "tSNE",+                 show.legend = FALSE, mark.clusters = TRUE,+                 min.group.size = 10, shuffle.colors = FALSE,+                 mark.cluster.cex = 1, alpha = 0.3, main = "cell clusters")Warning message:Ignoring unknown parameters: mark.clusters, min.group.size, mark.cluster.cex, main > r$plotEmbedding(type = "PCA", embeddingType = "tSNE", colors = r$depth, main = "depth")> emat <- ldat$spliced> nmat <- ldat$unspliced> emat <- emat[,rownames(r$counts)]nes.by.cluster.expression(emat, cluster.label,                                           min.max.cluster.average = 0.2)nmat <- filter.genes.by.cluster.expression(nmat, cluster.label,                                           min.max.cluster.average = 0.05)length(intersect(rownames(emat), rownames(nmat)))> nmat <- nmat[,rownames(r$counts)]> > cluster.label <- r$clusters$PCA$multilevel > emb <- r$embeddings$PCA$tSNE> cell.dist <- as.dist(1-armaCor(t(r$reductions$PCA)))> emat <- filter.genes.by.cluster.expression(emat, cluster.label,+                                            min.max.cluster.average = 0.2)> nmat <- filter.genes.by.cluster.expression(nmat, cluster.label,+                                            min.max.cluster.average = 0.05)> length(intersect(rownames(emat), rownames(nmat)))[1] 7526> fit.quantile <- 0.02> rvel.cd <- gene.relative.velocity.estimates(emat, nmat,deltaT = 1,+                                             kCells = 25, cell.dist = cell.dist,+                                             fit.quantile = fit.quantile)calculating cell knn ... donecalculating convolved matrices ... donefitting gamma coefficients ... done. succesfful fit for 7526 genesfiltered out 15 out of 7526 genes due to low nmat-emat correlationfiltered out 562 out of 7511 genes due to low nmat-emat slopecalculating RNA velocity shift ... donecalculating extrapolated cell state ... done> show.velocity.on.embedding.cor(emb,rvel.cd,n=300,scale='sqrt',cell.colors=ac(cell.colors,alpha=0.5),cex=0.8,arrow.scale=5,show.grid.flow=TRUE,min.grid.cell.mass=0.5,grid.n=40,arrow.lwd=1,do.par=F,cell.border.alpha = 0.1)Error in as.character(col) %in% "0" : object 'cell.colors' not found> cell.colors <- sccore::fac2col(cluster.label)> show.velocity.on.embedding.cor(emb, rvel.cd, n = 200, scale = 'sqrt',+                                cell.colors = ac(cell.colors, alpha = 0.5),+                                cex = 0.8, arrow.scale = 3, show.grid.flow = TRUE,+                                min.grid.cell.mass = 0.5, grid.n = 40,+                                arrow.lwd = 1,do.par = TRUE, cell.border.alpha = 0.1)delta projections ... sqrt knn ... transition probs ... donecalculating arrows ... donegrid estimates ... grid.sd= 1.327798  min.arrow.size= 0.02655596  max.grid.arrow.length= 0.0536128  done> > gene.relative.velocity.estimates(emat, nmat, deltaT = 1, kCells = 25,+                                  kGenes = 1, fit.quantile = fit.quantile,+                                  cell.emb = emb, cell.colors = cell.colors,+                                  cell.dist = cell.dist, show.gene = "GAPDH",+                                  old.fit = rvel.cd, do.par = TRUE)Error in gene.relative.velocity.estimates(emat, nmat, deltaT = 1, kCells = 25,  :   gene GAPDH is not present in the filtered expression matrices> dev.off()null device           1 > gene.relative.velocity.estimates(emat, nmat, deltaT = 1, kCells = 25,+                                  kGenes = 1, fit.quantile = fit.quantile,+                                  cell.emb = emb, cell.colors = cell.colors,+                                  cell.dist = cell.dist, show.gene = "Gapdh",+                                  old.fit = rvel.cd, do.par = TRUE)calculating convolved matrices ... done[1] 1> dev.off()null device           1 > 
成功！歐耶！
重複一下結果還不一樣耶，這個有點坑爹哇！
接下來就是如何把這個結果展示在我們自己的tsne圖或者umap圖上。
在原有的UMAP上標註RNA velocity - 雲+社區 - 騰訊雲
https://cloud.tencent.com/developer/article/1620345
當然，還是不出所料有問題，然後就去解決。
（重點是讓行名即barcode一致）
library(velocyto.R)library(pagoda2)YDL<- readRDS("YDL_NONPRO.rds")ldat <- read.loom.matrices(file = "/data/yudonglin/singlecell/ABFC20190816-04-分析結果/NoPro/velocyto/NoPro.loom")emat <- ldat$splicednmat <- ldat$unsplicedseurat.object<-YDLemb <- seurat.object@reductions$umap@cell.embeddingscell.dist <- as.dist(1-armaCor(t(seurat.object@reductions$umap@cell.embeddings)))head(colnames(emat))head(colnames(nmat))colnames(emat) <- paste(substring(colnames(emat),7,22),"-1",sep="")colnames(nmat) <- paste(substring(colnames(nmat),7,22),"-1",sep="")head(colnames(emat))head(colnames(nmat))fit.quantile <- 0.02
rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2,                                            kCells=10,                                            cell.dist=cell.dist,                                            fit.quantile=fit.quantile,                                            n.cores=24)library("Seurat")library("ggplot2")
gg <- UMAPPlot(seurat.object)ggplot_build(gg)$datacolors <- as.list(ggplot_build(gg)$data[[1]]$colour)names(colors) <- rownames(emb)p1 <- show.velocity.on.embedding.cor(emb,rvel.cd,n=30,scale='sqrt',                                      cell.colors=ac(colors,alpha=0.5),                                      cex=0.8,arrow.scale=2,show.grid.flow=T,                                      min.grid.cell.mass=1.0,grid.n=50,arrow.lwd=1,                                      do.par=F,cell.border.alpha = 0.1,                                      n.cores=24,main="RNA Velocity")
原來的umap圖如下：