ggplot2 takes its input as a data.frame, in which each column is one variable (one measured quantity).
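Data that starts in a "wide" layout, with one column per group instead of one column per variable, has to be reshaped first. Below is a minimal base-R sketch using `stack()`. The table `wide` is hypothetical, and its values are copied from the first rows of ToothGrowth only for illustration.

```r
# Hypothetical wide table: one column per supplement type (illustrative values)
wide <- data.frame(OJ = c(15.2, 21.5), VC = c(4.2, 11.5))

# stack() converts it to long format: one column of values, one column of group labels
long <- stack(wide)
names(long) <- c("len", "supp")  # rename to match the ToothGrowth column names
long
```

After this step each column holds one variable, so `ggplot(long, aes(x = supp, y = len))` can map the columns directly.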
We use the ToothGrowth dataset as an example.
data("ToothGrowth")
tg <- ToothGrowth
head(tg)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
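Later steps depend on the column types: supp is a factor and dose is numeric, and the bar graphs below need dose converted to a factor. It is worth checking the types up front with base R:

```r
tg <- ToothGrowth

# Column types: len and dose are numeric, supp is a factor with levels OJ and VC
str(tg)

# Number of observations in each supp x dose cell
table(tg$supp, tg$dose)
```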
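Data preprocessing
Preprocess the data with summarySE() (the function is defined at the end of this article), which computes the standard error (SE) of each group.
The standard error is not the same as the standard deviation (SD). SE here means the standard error of the mean: the SD divided by the square root of the sample size.
For more background, see: http://blog.csdn.net/tanzuozhev/article/details/50830928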
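Take a group of N observations with sample standard deviation sd. The standard error of the mean is sd / sqrt(N). The half-width of a 95% confidence interval is that SE multiplied by the 0.975 quantile of a t distribution with N - 1 degrees of freedom. The sketch below works this out by hand for the OJ, 0.5 mg group; the results agree with the first row of the summarySE() output in the table below.

```r
# Tooth lengths for the OJ, 0.5 mg group of ToothGrowth
x <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 0.5]

n  <- length(x)                   # sample size N
s  <- sd(x)                       # standard deviation: spread of the individual values
se <- s / sqrt(n)                 # standard error of the mean: uncertainty of the mean itself
ci <- se * qt(0.975, df = n - 1)  # half-width of the 95% confidence interval

c(N = n, mean = mean(x), sd = s, se = se, ci = ci)
```

SD describes how widely the individual teeth vary. SE shrinks as N grows, because it measures how precisely the mean is estimated.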
# summarySE computes the standard deviation, the standard error and the 95% confidence interval.
library(ggplot2)
tgc <- summarySE(tg, measurevar="len", groupvars=c("supp","dose"))
tgc
##   supp dose  N   len       sd        se       ci
## 1   OJ  0.5 10 13.23 4.459709 1.4102837 3.190283
## 2   OJ  1.0 10 22.70 3.910953 1.2367520 2.797727
## 3   OJ  2.0 10 26.06 2.655058 0.8396031 1.899314
## 4   VC  0.5 10  7.98 2.746634 0.8685620 1.964824
## 5   VC  1.0 10 16.77 2.515309 0.7954104 1.799343
## 6   VC  2.0 10 26.14 4.797731 1.5171757 3.432090
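The summarySE() definition appears at the end of the article. If you want to run this step before reaching that point, here is a minimal base-R stand-in. `summary_se_sketch()` is a hypothetical name and is not the author's function. It assumes the measured column has no missing values, uses sd/sqrt(N) for the standard error and a t-based multiplier for the confidence interval, and returns its columns in the same layout as the table above.

```r
# Hypothetical stand-in for summarySE(): per-group N, mean, sd, se and CI half-width.
# Assumes no missing values in the measured column.
summary_se_sketch <- function(data, measurevar, groupvars, conf.interval = 0.95) {
  # One data frame per combination of the grouping variables
  pieces <- split(data, data[groupvars], drop = TRUE)

  rows <- lapply(pieces, function(d) {
    x  <- d[[measurevar]]
    n  <- length(x)
    s  <- sd(x)
    se <- s / sqrt(n)
    out <- d[1, groupvars, drop = FALSE]               # keep the group labels
    out$N <- n
    out[[measurevar]] <- mean(x)                       # mean stored under the original name
    out$sd <- s
    out$se <- se
    out$ci <- se * qt(conf.interval / 2 + 0.5, n - 1)  # t-based CI half-width
    out
  })

  res <- do.call(rbind, rows)
  res <- res[do.call(order, unname(as.list(res[groupvars]))), ]  # sort by the groups
  rownames(res) <- NULL
  res
}

tgc_sketch <- summary_se_sketch(ToothGrowth, measurevar = "len",
                                groupvars = c("supp", "dose"))
tgc_sketch
```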
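Line graphs and point plots with standard-error bars and 95% confidence intervals
# Line graph with standard-error bars
# Error bars represent the standard error of the mean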
ggplot(tgc, aes(x=dose, y=len, colour=supp)) +
geom_errorbar(aes(ymin=len-se, ymax=len+se), width=.1) +
geom_line() +
geom_point()
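# The error bars overlap, so use position_dodge to shift the groups horizontally
# (this makes them easier to read, although personally I don't consider it rigorous,
# because the points no longer sit exactly at their true dose values)
pd <- position_dodge(0.1) # total dodge width 0.1: the two groups move 0.025 to the left and right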
ggplot(tgc, aes(x=dose, y=len, colour=supp)) +
geom_errorbar(aes(ymin=len-se, ymax=len+se), width=.1, position=pd) +
geom_line(position=pd) +
geom_point(position=pd)
## ymax not defined: adjusting position using y instead
## ymax not defined: adjusting position using y instead
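You can check the size of the shift directly. This sketch assumes ggplot2 is installed. It uses a small hypothetical data frame `means` that holds the group means from the summarySE() table above. `layer_data()` returns the data ggplot2 actually draws, so comparing its x column with the original doses shows how far `position_dodge(0.1)` moves each of the two groups.

```r
library(ggplot2)

# Group means copied from the summarySE() output above
means <- data.frame(
  supp = factor(rep(c("OJ", "VC"), each = 3)),
  dose = rep(c(0.5, 1, 2), times = 2),
  len  = c(13.23, 22.70, 26.06, 7.98, 16.77, 26.14)
)

p <- ggplot(means, aes(x = dose, y = len, colour = supp)) +
  geom_point(position = position_dodge(0.1))

# x positions after dodging, compared with the nearest dose value
drawn <- layer_data(p)
shift <- drawn$x - round(drawn$x * 2) / 2
table(round(shift, 3))
```

With two groups, the dodge width of 0.1 is split in two, so each group's centre moves a quarter of the width (0.025) away from the true dose.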
# Line graph with 95% confidence intervals
ggplot(tgc, aes(x=dose, y=len, colour=supp)) +
geom_errorbar(aes(ymin=len-ci, ymax=len+ci), width=.1, position=pd) +
geom_line(position=pd) +
geom_point(position=pd)
## ymax not defined: adjusting position using y instead
## ymax not defined: adjusting position using y instead
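# Black error bars. Note the group=supp mapping: setting colour="black" replaces the colour=supp
# mapping in the error-bar layer, so without group=supp the overlapping error bars would not be dodged.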
ggplot(tgc, aes(x=dose, y=len, colour=supp, group=supp)) +
geom_errorbar(aes(ymin=len-ci, ymax=len+ci), colour="black", width=.1, position=pd) +
geom_line(position=pd) +
geom_point(position=pd, size=3)
## ymax not defined: adjusting position using y instead
## ymax not defined: adjusting position using y instead
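Why group=supp matters: setting colour="black" as a fixed value in a layer makes ggplot2 drop the inherited colour=supp mapping for that layer. The grouping by supp disappears with it, so the dodge has nothing to separate. The sketch below assumes ggplot2 is installed and uses the same kind of hypothetical `means` data frame, with the CI column copied from the summarySE() table. It compares where the error bars are drawn with and without the explicit group.

```r
library(ggplot2)

# Group means and CI half-widths copied from the summarySE() output above
means <- data.frame(
  supp = factor(rep(c("OJ", "VC"), each = 3)),
  dose = rep(c(0.5, 1, 2), times = 2),
  len  = c(13.23, 22.70, 26.06, 7.98, 16.77, 26.14),
  ci   = c(3.190283, 2.797727, 1.899314, 1.964824, 1.799343, 3.432090)
)
pd <- position_dodge(0.1)

# Without group=supp: the black error bars have no grouping left, so they stay overlapped
p_plain <- ggplot(means, aes(x = dose, y = len, colour = supp)) +
  geom_errorbar(aes(ymin = len - ci, ymax = len + ci), colour = "black", width = .1, position = pd)

# With group=supp: the two groups are dodged apart
p_group <- ggplot(means, aes(x = dose, y = len, colour = supp, group = supp)) +
  geom_errorbar(aes(ymin = len - ci, ymax = len + ci), colour = "black", width = .1, position = pd)

x_plain <- layer_data(p_plain)$x
x_group <- layer_data(p_group)$x
```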
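Below is a finished graph with standard-error bars. geom_point comes after geom_line so the points are drawn on top of the lines, and the points are filled with white.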
ggplot(tgc, aes(x=dose, y=len, colour=supp, group=supp)) +
geom_errorbar(aes(ymin=len-se, ymax=len+se), colour="black", width=.1, position=pd) +
geom_line(position=pd) +
geom_point(position=pd, size=3, shape=21, fill="white") + # shape 21 is a circle with a separate fill colour
xlab("Dose (mg)") +
ylab("Tooth length") +
scale_colour_hue(name="Supplement type", # Legend label, use darker colors
breaks=c("OJ", "VC"),
labels=c("Orange juice", "Ascorbic acid"),l=40) + # Use darker colors, lightness=40
ggtitle("The Effect of Vitamin C on\nTooth Growth in Guinea Pigs") +
expand_limits(y=0) + # Expand y range
scale_y_continuous(breaks=0:20*4) + # Set tick every 4
theme_bw() +
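theme(legend.justification=c(1,0), # Key setting: anchors the legend's bottom-right corner at legend.position; without it the legend is centred on that point and partly cut off (try removing it)
legend.position=c(1,0)) # Position legend in bottom right (ggplot2 >= 3.5.0 prefers legend.position="inside", legend.position.inside=c(1,0))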
## ymax not defined: adjusting position using y instead
## ymax not defined: adjusting position using y instead
Bar graphs
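Bar graphs are drawn much like line graphs, but the dose column must first be converted to a factor; if it stays a numeric vector, the graph will not come out as intended. The reason is that a numeric dose is treated as continuous data, while a factor is treated as discrete data.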
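To see the continuous-versus-discrete difference described above, this sketch builds the same bar graph twice: once with a numeric dose and once with dose as a factor. It then asks ggplot2 which kind of x scale it chose through `layer_scales()`. It assumes ggplot2 is installed, and `means` is a hypothetical table of group means copied from the summarySE() output.

```r
library(ggplot2)

# Group means copied from the summarySE() output above
means <- data.frame(
  supp = factor(rep(c("OJ", "VC"), each = 3)),
  dose = rep(c(0.5, 1, 2), times = 2),
  len  = c(13.23, 22.70, 26.06, 7.98, 16.77, 26.14)
)
means_f <- means
means_f$dose <- factor(means_f$dose)  # levels "0.5", "1", "2"

p_num <- ggplot(means, aes(x = dose, y = len, fill = supp)) +
  geom_bar(position = position_dodge(), stat = "identity")
p_fac <- ggplot(means_f, aes(x = dose, y = len, fill = supp)) +
  geom_bar(position = position_dodge(), stat = "identity")

# Numeric dose gives a continuous x axis (unevenly spaced bars at 0.5, 1 and 2);
# factor dose gives a discrete axis with one evenly spaced slot per level
inherits(layer_scales(p_num)$x, "ScaleContinuous")
inherits(layer_scales(p_fac)$x, "ScaleDiscrete")
```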
# Convert dose to a factor
tgc2 <- tgc
tgc2$dose <- factor(tgc2$dose)
# Error bars represent standard error of the mean
ggplot(tgc2, aes(x=dose, y=len, fill=supp)) +
geom_bar(position=position_dodge(), stat="identity") +
geom_errorbar(aes(ymin=len-se, ymax=len+se),
width=.2, # Width of the error bars
position=position_dodge(.9))
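Why position_dodge(.9)? On a discrete axis, geom_bar() bars are 0.9 units wide by default, and position_dodge() with no width dodges them by that bar width. Giving the error bars the same dodge width of 0.9 centres each one over its bar, even though the error bars themselves are only 0.2 wide. The sketch below checks this by comparing the drawn x positions of the two layers. It assumes ggplot2 is installed, and `means2` is a hypothetical table copied from the summarySE() output with dose already converted to a factor.

```r
library(ggplot2)

# Group means and standard errors copied from the summarySE() output above
means2 <- data.frame(
  supp = factor(rep(c("OJ", "VC"), each = 3)),
  dose = factor(rep(c(0.5, 1, 2), times = 2)),
  len  = c(13.23, 22.70, 26.06, 7.98, 16.77, 26.14),
  se   = c(1.4102837, 1.2367520, 0.8396031, 0.8685620, 0.7954104, 1.5171757)
)

p <- ggplot(means2, aes(x = dose, y = len, fill = supp)) +
  geom_bar(position = position_dodge(), stat = "identity") +
  geom_errorbar(aes(ymin = len - se, ymax = len + se),
                width = .2, position = position_dodge(.9))

bar_x <- sort(layer_data(p, 1)$x)  # centres of the dodged bars
err_x <- sort(layer_data(p, 2)$x)  # centres of the dodged error bars
all.equal(bar_x, err_x)
```

If the error bars used a different dodge width, such as the 0.1 from the line graphs, they would be pulled toward the middle of each dose and no longer sit on their bars.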
# Use 95% confidence intervals instead
ggplot(tgc2, aes(x=dose, y=len, fill=supp)) +
geom_bar(position=position_dodge(), stat="identity") +
geom_errorbar(aes(ymin=len-ci, ymax=len+ci),
width=.2, # Width of the error bars
position=position_dodge(.9))
A finished bar graph
ggplot(tgc2, aes(x=dose, y=len, fill=supp)) +
geom_bar(position=position_dodge(), stat="identity",
colour="black", # Use black outlines
linewidth=.3) + # Thinner lines (linewidth needs ggplot2 >= 3.4.0; use size= in older versions)
geom_errorbar(aes(ymin=len-se, ymax=len+se),
linewidth=.3, # Thinner lines
width=.2,
position=position_dodge(.9)) +
xlab("Dose (mg)") +
ylab("Tooth length") +
scale_fill_hue(name="Supplement type", # Legend label
breaks=c("OJ", "VC"),
labels=c("Orange juice", "Ascorbic acid")) +
ggtitle("The Effect of Vitamin C on\nTooth Growth in Guinea Pigs") +
scale_y_continuous(breaks=0:20*4) +
theme_bw()