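A follower asked me for help. While processing 850K methylation array data with the ChAMP pipeline, she hit a very strange error involving the Failed CpG Fraction, and no amount of searching solved it. I asked her to copy and paste the key part of the error output, shown here: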
[ Section 3: Use Annotation Start ]
Reading 850K Annotation >>
Fetching NEGATIVE ControlProbe.
Totally, there are 613 control probes in Annotation.
Your data set contains 556 control probes.
Generating Meth and UnMeth Matrix
Extracting Meth Matrix...
Totally there are 485512 Meth probes in 850K Annotation.
Your data set contains 485512 Meth probes.
Extracting UnMeth Matrix...
Totally there are 485512 UnMeth probes in 850K Annotation.
Your data set contains 485512 UnMeth probes.
Generating beta Matrix
Generating M Matrix
Generating intensity Matrix
Calculating Detect P value
Counting Beads
[ Section 3: Use Annotation Done ]
---
(middle part omitted)
---
[ Section 2: Filtering Start >>
Filtering Detect P value Start
The fraction of failed positions per sample
You may need to delete samples with high proportion of failed probes:
Failed CpG Fraction.
sample1 NaN
sample2 NaN
--- (some samples omitted)
Error in if (any(numfail >= SampleCutoff)) { :
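missing value where TRUE/FALSE needed
I searched around as well but found no good solution, so I asked her to send over the 2 GB of raw data and her code. After checking carefully for quite a while, I found that her code looked exactly like the code in my tutorial.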
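The error message itself can be reproduced in base R without ChAMP, which helps explain why NaN fractions stop the filtering step. The sketch below is only an illustration: `numfail` and `SampleCutoff` are stand-ins named after the variables in the error message, and the values are made up to mirror the NaN entries in the log. I have not traced how ChAMP computes the fraction internally, so the `mean(numeric(0))` line is just one common way a NaN can arise, not a claim about the package.

```r
# Stand-in values mirroring the "Failed CpG Fraction" lines in the log (assumption).
numfail <- c(sample1 = NaN, sample2 = NaN)
SampleCutoff <- 0.1  # default listed in the champ.load usage

# Comparing NaN with a number yields NA, and any() over only NA values is NA.
flag <- any(numfail >= SampleCutoff)
print(flag)

# if (NA) is an error in R; capture the message instead of halting the script.
msg <- tryCatch(
  {
    if (flag) "some samples fail" else "all samples pass"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# One generic way to obtain NaN: the mean of an empty vector is 0/0.
empty_fraction <- mean(numeric(0))
print(is.nan(empty_fraction))
```

So the `if` error is a symptom: the real question is why every sample's failed fraction is NaN in the first place.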
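Her loading call was:
myLoad <- champ.load("raw/",arraytype="850K")
I also looked at the idat files in her "raw/" folder and the 'raw/sample_sheet.csv' file she had prepared, and both were fine. With nothing else to go on, I turned to the help page for the champ.load function: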
champ.load(directory = getwd(),
method="ChAMP",
methValue="B",
autoimpute=TRUE,
filterDetP=TRUE,
ProbeCutoff=0,
SampleCutoff=0.1,
detPcut=0.01,
filterBeads=TRUE,
beadCutoff=0.05,
filterNoCG=TRUE,
filterSNPs=TRUE,
population=NULL,
filterMultiHit=TRUE,
filterXY=TRUE,
force=FALSE,
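arraytype="450K")
At first I could not see the problem, but eventually I noticed the documented choices for the arraytype parameter:
Choose microarray type is "450K" or "EPIC".(default = "450K")
In other words, there is no "850K" option. Interesting. So I changed the code as follows: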
#myLoad <- champ.load("raw/",arraytype="850K")
myLoad <- champ.load("raw/",arraytype="EPIC")
That did resolve the error, and the ChAMP pipeline ran successfully. The log after loading the idat files:
Filtering probes with a detection p-value above 0.01.
Removing 3813 probes.
If a large number of probes have been removed, ChAMP suggests you to identify potentially bad samples
Filtering BeadCount Start
Filtering probes with a beadcount <3 in at least 5% of samples.
Removing 22027 probes
Filtering NoCG Start
Only Keep CpGs, removing 2889 probes from the analysis.
Filtering SNPs Start
Using general EPIC SNP list for filtering.
Filtering probes with SNPs as identified in Zhou's Nucleic Acids Research Paper 2016.
Removing 95451 probes from the analysis.
Filtering MultiHit Start
Filtering probes that align to multiple locations as identified in Nordlund et al
Removing 11 probes from the analysis.
Filtering XY Start
Filtering probes located on X,Y chromosome, removing 16655 probes from the analysis.
Updating PD file
Fixing Outliers Start
Replacing all value smaller/equal to 0 with smallest positive value.
Replacing all value greater/equal to 1 with largest value below 1..
[ Section 2: Filtering Done ]
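All filterings are Done, now you have 725072 probes and 24 samples.
Interesting. So what is the difference between the 850K methylation array and EPIC? In every tutorial and document I have read, "850K" and "EPIC" are simply two names for the same array:
Illumina offers a more powerful methylation analysis platform: the Illumina Infinium MethylationEPIC BeadChip (the 850K DNA methylation array). It contains more than 90% of the sites on the original 450K array and adds 350,000 sites in enhancer regions, allowing quantitative methylation measurement at individual CpG sites in both normal and FFPE samples. It is currently the whole-genome DNA methylation array best suited to methylation profiling studies. The 850K array covers 853,307 CpG sites genome-wide, with comprehensive coverage of CpG islands, promoters, coding regions, and enhancers. It covers CpG islands, RefSeq genes, ENCODE open chromatin, ENCODE transcription factor binding sites, and FANTOM5 enhancer regions.
So the behavior is puzzling, but I am not the vendor's customer support, and I did not dig any further.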
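One hint is visible in the first log: with arraytype="850K" the annotation reported 485512 Meth probes, which is the probe count of the 450K array, not of an array with roughly 850,000 sites. That suggests, though I have not checked ChAMP's source, that an unrecognized arraytype value is treated like the "450K" default instead of being rejected. A small guard in front of the call would have surfaced the typo immediately. The sketch below uses a hypothetical helper, `check_arraytype`, which is not part of ChAMP; the allowed values are the two listed in the champ.load help text.

```r
# Hypothetical helper (not part of ChAMP): fail fast on unsupported arraytype values.
check_arraytype <- function(arraytype, allowed = c("450K", "EPIC")) {
  valid <- is.character(arraytype) && length(arraytype) == 1 && arraytype %in% allowed
  if (!valid) {
    stop(sprintf("arraytype must be one of %s, got '%s'",
                 paste(allowed, collapse = ", "),
                 as.character(arraytype)[1]))
  }
  arraytype
}

# A documented value passes through unchanged.
ok <- check_arraytype("EPIC")
print(ok)

# The value from the failing script is rejected with a readable message.
bad <- tryCatch(check_arraytype("850K"), error = function(e) conditionMessage(e))
print(bad)

# Intended use (needs ChAMP and the idat files, so it is left commented out):
# myLoad <- champ.load("raw/", arraytype = check_arraytype("EPIC"))
```

The guard only checks the string before ChAMP sees it, so it stays valid as long as the help page keeps listing "450K" and "EPIC" as the choices.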
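Methylation tutorial index
The methylation array video course is free on Bilibili: 《甲基化晶片(450K或者850K)數據處理》 (Methylation array (450K or 850K) data processing)
Free teaching videos: https://www.bilibili.com/video/BV177411U7oj
Mind map accompanying the course: https://mubu.com/doc/1cwlFgcXMg
Methylation array materials: https://share.weiyun.com/42a9e78c2dd5367f3427e86c5c99baa1 (download what you need; do not download the whole folder)
All epigenetics materials: https://share.weiyun.com/5tg6pIn (likewise, download what you need; do not download the whole folder)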