首先看看需要什麼樣的命令才能有loom文件:
Usage: velocyto run10x [OPTIONS] SAMPLEFOLDER GTFFILE
Runs the velocity analysis for a Chromium 10X Sample
10XSAMPLEFOLDER specifies the cellranger sample folder
GTFFILE genome annotation file
Options:
-s, --metadatatable FILE Table containing metadata of the various samples (csv fortmated rows are samples and cols are entries)
-m, --mask FILE .gtf file containing intervals to mask
-l, --logic TEXT The logic to use for the filtering (default: Default)
-M, --multimap Consider not unique mappings (not reccomended)
-@, --samtools-threads INTEGER The number of threads to use to sort the bam by cellID file using samtools
--samtools-memory INTEGER The number of MB used for every thread by samtools to sort the bam file
-t, --dtype TEXT The dtype of the loom file layers - if more than 6000 molecules/reads per gene per cell are expected set uint32 to avoid truncation (default run_10x: uint16)
-d, --dump TEXT For debugging purposes only: it will dump a molecular mapping report to hdf5. --dump N, saves a cell every N cells. If p is prepended a more complete (but huge) pickle report is printed (default: 0)
-v, --verbose Set the vebosity level: -v (only warinings) -vv (warinings and info) -vvv (warinings, info and debug)
--help Show this message and exit.
CLI Usage Guide — velocyto 0.17.16 documentation
http://velocyto.org/velocyto.py/tutorial/cli.html
我一開始還以為自己成功跑上了,但是很快就出錯了:
通過搜索,發現可能是自己下載文件出了問題:
Need preparation data to run velocyto: samtools version, bam file, repeat_msk.gtf · Issue #112 · velocyto-team/velocyto.py
https://github.com/velocyto-team/velocyto.py/issues/112
Table Browser
https://genome.ucsc.edu/cgi-bin/hgTables?hgsid=611454127_NtvlaW6xBSIRYJEBI0iRDEWisITa&clade=mammal&org=Mouse&db=mm10&hgta_group=allTracks&hgta_track=xenoRefGene&hgta_table=0&hgta_regionType=genome&position=chr12%3A56%2C694%2C976-56%2C714%2C605&hgta_outputType=gff&hgta_outFileName=hg38_repeats_repeatMasker_allTracks.gtf
下載之後解壓到伺服器,重新跑,就成功了喲!
幾個小時之後還是沒有報錯!
這是個好兆頭~。
velocyto run10x -m /data/yudonglin/velocity/mm10_rmsk.gtf /data/yudonglin/singlecell/ABFC20190816-04-分析結果/NoPro /data/yudonglin/velocity/refdata-gex-mm10-2020-A/genes/genes.gtf
最終得到loom文件!
然後又遇到問題:
export OPENBLAS_NUM_THREADS=1「生信Debug」OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable_xuzhougeng blog-CSDN博客
https://blog.csdn.net/u012110870/article/details/106115577
最終成功!
> library(velocyto.R)Loading required package: Matrix> library(pagoda2)Loading required package: igraph
Attaching package: 『igraph』
The following objects are masked from 『package:stats』:
decompose, spectrum
The following object is masked from 『package:base』:
union
Attaching package: 『pagoda2』
The following object is masked from 『package:velocyto.R』:
armaCor
> ldat <- read.loom.matrices("/data/yudonglin/singlecell/ABFC20190816-04-分析結果/NoPro/velocyto/NoPro.loom")reading loom file via hdf5r...
> emat <- ldat$spliced> hist(log10(colSums(emat)),col='wheat',xlab='cell size')> > > emat <- emat[,colSums(emat)>=1e3] > emat <- emat[!duplicated(rownames(emat)),]> > r <- Pagoda2$new(emat, modelType = "plain", trim = 10, log.scale = TRUE)8158cells,32245genes; normalizing ... using plain model winsorizing ... log scale ... done.
> r$adjustVariance(plot = TRUE, do.par = TRUE, gam.k = 10)calculating variance fit ... using gam 2271overdispersed genes ...2271persisting ... done.
> r$calculatePcaReduction(nPcs = 100, n.odgenes = 3e3, maxit = 300)running PCA using 3000 OD genes .... done
> r$makeKnnGraph(k = 30, type = "PCA", center = TRUE,distance = "cosine")creating space of type angular doneadding data ... donebuilding index ... donequerying ... done> r$getKnnClusters(method = multilevel.community, type = "PCA",name = "multilevel")w.legend = FALSE, mark.clusters = TRUE, min.group.size = 10, shuffle.colors = FALSE, mark.cluster.cex = 1, alpha = 0.3, main = "cell clusters")r$plotEmbedding(type = "PCA", embeddingType = "tSNE", colors = r$depth, main = "depth")> r$getEmbedding(type = "PCA", embeddingType = "tSNE",+ perplexity = 50,verbose = FALSE)calculating distance ... pearson ...
running tSNE using56cores:
> r$plotEmbedding(type = "PCA", embeddingType = "tSNE",+ show.legend = FALSE, mark.clusters = TRUE,+ min.group.size = 10, shuffle.colors = FALSE,+ mark.cluster.cex = 1, alpha = 0.3, main = "cell clusters")Warning message:Ignoring unknown parameters: mark.clusters, min.group.size, mark.cluster.cex, main > r$plotEmbedding(type = "PCA", embeddingType = "tSNE", colors = r$depth, main = "depth")> emat <- ldat$spliced> nmat <- ldat$unspliced> emat <- emat[,rownames(r$counts)]nes.by.cluster.expression(emat, cluster.label, min.max.cluster.average = 0.2)nmat <- filter.genes.by.cluster.expression(nmat, cluster.label, min.max.cluster.average = 0.05)length(intersect(rownames(emat), rownames(nmat)))> nmat <- nmat[,rownames(r$counts)]> > cluster.label <- r$clusters$PCA$multilevel > emb <- r$embeddings$PCA$tSNE> cell.dist <- as.dist(1-armaCor(t(r$reductions$PCA)))> emat <- filter.genes.by.cluster.expression(emat, cluster.label,+ min.max.cluster.average = 0.2)> nmat <- filter.genes.by.cluster.expression(nmat, cluster.label,+ min.max.cluster.average = 0.05)> length(intersect(rownames(emat), rownames(nmat)))[1] 7526> fit.quantile <- 0.02> rvel.cd <- gene.relative.velocity.estimates(emat, nmat,deltaT = 1,+ kCells = 25, cell.dist = cell.dist,+ fit.quantile = fit.quantile)calculating cell knn ... donecalculating convolved matrices ... donefitting gamma coefficients ... done. succesfful fit for 7526 genesfiltered out 15 out of 7526 genes due to low nmat-emat correlationfiltered out 562 out of 7511 genes due to low nmat-emat slopecalculating RNA velocity shift ... donecalculating extrapolated cell state ... done> show.velocity.on.embedding.cor(emb,rvel.cd,n=300,scale='sqrt',cell.colors=ac(cell.colors,alpha=0.5),cex=0.8,arrow.scale=5,show.grid.flow=TRUE,min.grid.cell.mass=0.5,grid.n=40,arrow.lwd=1,do.par=F,cell.border.alpha = 0.1)Error in as.character(col) %in% "0" : object 'cell.colors' not found> cell.colors <- sccore::fac2col(cluster.label)> show.velocity.on.embedding.cor(emb, rvel.cd, n = 200, scale = 'sqrt',+ cell.colors = ac(cell.colors, alpha = 0.5),+ cex = 0.8, arrow.scale = 3, show.grid.flow = TRUE,+ min.grid.cell.mass = 0.5, grid.n = 40,+ arrow.lwd = 1,do.par = TRUE, cell.border.alpha = 0.1)delta projections ... sqrt knn ... transition probs ... donecalculating arrows ... donegrid estimates ... grid.sd= 1.327798 min.arrow.size= 0.02655596 max.grid.arrow.length= 0.0536128 done> > gene.relative.velocity.estimates(emat, nmat, deltaT = 1, kCells = 25,+ kGenes = 1, fit.quantile = fit.quantile,+ cell.emb = emb, cell.colors = cell.colors,+ cell.dist = cell.dist, show.gene = "GAPDH",+ old.fit = rvel.cd, do.par = TRUE)Error in gene.relative.velocity.estimates(emat, nmat, deltaT = 1, kCells = 25, : gene GAPDH is not present in the filtered expression matrices> dev.off()null device 1 > gene.relative.velocity.estimates(emat, nmat, deltaT = 1, kCells = 25,+ kGenes = 1, fit.quantile = fit.quantile,+ cell.emb = emb, cell.colors = cell.colors,+ cell.dist = cell.dist, show.gene = "Gapdh",+ old.fit = rvel.cd, do.par = TRUE)calculating convolved matrices ... done[1] 1> dev.off()null device 1 >
成功!歐耶!
重複一下結果還不一樣耶,這個有點坑爹哇!
接下來就是如何把這個結果展示在我們自己的tsne圖或者umap圖上。
在原有的UMAP上標註RNA velocity - 雲+社區 - 騰訊雲
https://cloud.tencent.com/developer/article/1620345
當然,還是不出所料有問題,然後就去解決。
(重點是讓行名即barcode一致)
library(velocyto.R)library(pagoda2)YDL<- readRDS("YDL_NONPRO.rds")ldat <- read.loom.matrices(file = "/data/yudonglin/singlecell/ABFC20190816-04-分析結果/NoPro/velocyto/NoPro.loom")emat <- ldat$splicednmat <- ldat$unsplicedseurat.object<-YDLemb <- seurat.object@reductions$umap@cell.embeddingscell.dist <- as.dist(1-armaCor(t(seurat.object@reductions$umap@cell.embeddings)))head(colnames(emat))head(colnames(nmat))colnames(emat) <- paste(substring(colnames(emat),7,22),"-1",sep="")colnames(nmat) <- paste(substring(colnames(nmat),7,22),"-1",sep="")head(colnames(emat))head(colnames(nmat))fit.quantile <- 0.02
rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2, kCells=10, cell.dist=cell.dist, fit.quantile=fit.quantile, n.cores=24)library("Seurat")library("ggplot2")
gg <- UMAPPlot(seurat.object)ggplot_build(gg)$datacolors <- as.list(ggplot_build(gg)$data[[1]]$colour)names(colors) <- rownames(emb)p1 <- show.velocity.on.embedding.cor(emb,rvel.cd,n=30,scale='sqrt', cell.colors=ac(colors,alpha=0.5), cex=0.8,arrow.scale=2,show.grid.flow=T, min.grid.cell.mass=1.0,grid.n=50,arrow.lwd=1, do.par=F,cell.border.alpha = 0.1, n.cores=24,main="RNA Velocity")原來的umap圖如下: