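July 21 R Learning Notes: tidyr

Contents
I. Data tidying
    Wide to long
    Long to wide
II. Separating and uniting
    Original data
    Separating
    Uniting
III. Handling NA
    Original data
    1. Drop rows containing NA, optionally based on a single column
    2. Replace NA
    3. Fill NA with the value from the previous row
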
I. Data tidying
rm(list = ls())
options(stringsAsFactors = FALSE)
if(!require(tidyr))install.packages("tidyr")
## Loading required package: tidyr
library(tidyr)
test <- data.frame(geneid = paste0("gene",1:4),
                   sample1 = c(1,4,7,10),
                   sample2 = c(2,5,0.8,11),
                   sample3 = c(0.3,6,9,12))
test
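##   geneid sample1 sample2 sample3
## 1  gene1       1     2.0     0.3
## 2  gene2       4     5.0     6.0
## 3  gene3       7     0.8     9.0
## 4  gene4      10    11.0    12.0
Wide to long
test_gather <- gather(data = test,
                      key = sample_nm,
                      value = exp,
                      - geneid)
# key: name of the new column that stores the original column names (sample1, sample2, sample3)
# value: name of the new column that stores the cell values
# - geneid: gather every column except geneid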
head(test_gather)
##   geneid sample_nm exp
## 1  gene1   sample1   1
## 2  gene2   sample1   4
## 3  gene3   sample1   7
## 4  gene4   sample1  10
## 5  gene1   sample2   2
## 6  gene2   sample2   5
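In tidyr 1.0.0 and later, gather() is marked as superseded and pivot_longer() is the recommended replacement. The following is a minimal sketch of the same wide-to-long reshaping, assuming tidyr >= 1.0.0 is installed; the name test_long is only for illustration and is not part of the original notes.

```r
library(tidyr)

test <- data.frame(geneid = paste0("gene", 1:4),
                   sample1 = c(1, 4, 7, 10),
                   sample2 = c(2, 5, 0.8, 11),
                   sample3 = c(0.3, 6, 9, 12))

# Stack every column except geneid into name/value pairs
test_long <- pivot_longer(test,
                          cols = -geneid,
                          names_to = "sample_nm",   # plays the role of key
                          values_to = "exp")        # plays the role of value
head(test_long)
```

Note that pivot_longer() returns a tibble and orders the result gene by gene (gene1/sample1, gene1/sample2, ...), whereas gather() stacks one sample column after another.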
Long to wide
test_re <- spread(data = test_gather,
                  key = sample_nm,
                  value = exp)
head(test_re)
##   geneid sample1 sample2 sample3
## 1  gene1       1     2.0     0.3
## 2  gene2       4     5.0     6.0
## 3  gene3       7     0.8     9.0
## 4  gene4      10    11.0    12.0
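spread() is likewise superseded by pivot_wider() in tidyr 1.0.0 and later. A small sketch of the long-to-wide step, assuming tidyr >= 1.0.0; the two-gene, two-sample table below is a made-up subset built by hand so the example runs on its own.

```r
library(tidyr)

# Hand-built long table (illustrative subset of the data above)
test_long <- data.frame(geneid = rep(paste0("gene", 1:2), times = 2),
                        sample_nm = rep(c("sample1", "sample2"), each = 2),
                        exp = c(1, 4, 2, 5))

# names_from supplies the new column names, values_from supplies the cells
test_wide <- pivot_wider(test_long,
                         names_from = sample_nm,
                         values_from = exp)
test_wide
```

The result has one row per geneid and one column per distinct value of sample_nm.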
II. Separating and uniting
Original data
test <- data.frame(x = c("a,b", "a,d", "b,c"));test
##     x
## 1 a,b
## 2 a,d
## 3 b,c
Separating
test_seprate <- separate(test,x, c("X", "Y"),sep = ",");test_seprate
##   X Y
## 1 a b
## 2 a d
## 3 b c
# x: name of the column to split
# c("X", "Y"): names of the two resulting columns
# sep = ",": the separator to split on
Uniting
test_re <- unite(test_seprate,"x",X,Y,sep = ",");test_re
##     x
## 1 a,b
## 2 a,d
## 3 b,c
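separate() and unite() undo each other, and both accept a remove argument that controls whether the input columns are dropped. A short sketch to check this; the names parts, round_trip and kept are illustrative and not from the original notes.

```r
library(tidyr)

test <- data.frame(x = c("a,b", "a,d", "b,c"), stringsAsFactors = FALSE)

# Split x into X and Y, then glue them back together
parts <- separate(test, x, c("X", "Y"), sep = ",")
round_trip <- unite(parts, "x", X, Y, sep = ",")
identical(round_trip$x, test$x)

# remove = FALSE keeps the original column next to the new ones
kept <- separate(test, x, c("X", "Y"), sep = ",", remove = FALSE)
names(kept)
```

Keeping the original column can be handy for checking that the split did what was intended before dropping it.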
III. Handling NA
Original data
X <- data.frame(X1 = LETTERS[1:5],X2 = 1:5)
X[2,2] <- NA
X[4,1] <- NA;X
##     X1 X2
## 1    A  1
## 2    B NA
## 3    C  3
## 4 <NA>  4
## 5    E  5
1. Drop rows containing NA, optionally based on a single column
na.omit(X) # drop every row that contains a missing value
##   X1 X2
## 1  A  1
## 3  C  3
## 5  E  5
drop_na(X)
##   X1 X2
## 1  A  1
## 2  C  3
## 3  E  5
drop_na(X,X1)
##   X1 X2
## 1  A  1
## 2  B NA
## 3  C  3
## 4  E  5
drop_na(X,X2)
##     X1 X2
## 1    A  1
## 2    C  3
## 3 <NA>  4
## 4    E  5
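For comparison, dropping rows based on a single column can also be written in base R with is.na(). A minimal sketch showing that it keeps the same rows as drop_na(X, X1); base_res and tidy_res are illustrative names.

```r
library(tidyr)

X <- data.frame(X1 = LETTERS[1:5], X2 = 1:5, stringsAsFactors = FALSE)
X[2, 2] <- NA
X[4, 1] <- NA

# Base R: keep the rows where X1 is not missing
base_res <- X[!is.na(X$X1), ]

# tidyr: same filter, expressed by naming the column to check
tidy_res <- drop_na(X, X1)

identical(base_res$X1, tidy_res$X1)
identical(base_res$X2, tidy_res$X2)
```

In both cases the row with NA in X2 survives, because only X1 is inspected.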
2. Replace NA
replace_na(X$X2,0) # replace missing values in the second column with 0
## [1] 1 0 3 4 5
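replace_na() also works on a whole data frame when given a named list with one replacement per column. A sketch under the assumption that "unknown" is an acceptable placeholder for X1 (that value is made up for illustration); note that the result has to be assigned, since X itself is not modified.

```r
library(tidyr)

X <- data.frame(X1 = LETTERS[1:5], X2 = 1:5, stringsAsFactors = FALSE)
X[2, 2] <- NA
X[4, 1] <- NA

# One replacement value per column; 0L matches the integer type of X2
X_filled <- replace_na(X, list(X1 = "unknown", X2 = 0L))
X_filled

# The original data frame still contains its NAs
anyNA(X)
```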
3. Fill NA with the value from the previous row
X
##     X1 X2
## 1    A  1
## 2    B NA
## 3    C  3
## 4 <NA>  4
## 5    E  5
fill(X,X2)
##     X1 X2
## 1    A  1
## 2    B  1
## 3    C  3
## 4 <NA>  4
## 5    E  5
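fill() fills downward by default; its .direction argument switches the direction, and several columns can be filled in one call. A short sketch on the same data; filled_up and filled_both are illustrative names.

```r
library(tidyr)

X <- data.frame(X1 = LETTERS[1:5], X2 = 1:5, stringsAsFactors = FALSE)
X[2, 2] <- NA
X[4, 1] <- NA

# Fill X2 with the value from the next row instead of the previous one
filled_up <- fill(X, X2, .direction = "up")
filled_up$X2

# Fill both columns downward (the default direction)
filled_both <- fill(X, X1, X2)
filled_both
```

With .direction = "up" the missing X2 in row 2 takes the value 3 from row 3, while the downward fill copies "C" into the missing X1 in row 4.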