Hypothesis Testing
- t-test
- F test
- Chi-square test
- Normality tests
Two-sample t-tests
- Independent two-sample t-test on raw data
- Paired t-test on raw data
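Example:
In the Wage data, do college graduates earn the same as high-school graduates?
College graduates: the education level 4. College Grad
High-school graduates: the education level 2. HS Grad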
Independent two-sample t-test
# load the package that contains the data set
> library(ISLR)
> data(Wage)
# wages of college graduates (a numeric vector)
> x <- Wage$wage[Wage$education == "4. College Grad"]
# wages of high-school graduates
> y <- Wage$wage[Wage$education == "2. HS Grad"]
# paired = FALSE (the default) treats the two samples as independent
# var.equal = FALSE (the default) gives Welch's t-test
# conf.level defaults to 0.95; alternative defaults to "two.sided"
> t.test(x, y, paired = FALSE)
        Welch Two Sample t-test
data:  x and y
t = 15.727, df = 1134.9, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 25.07104 32.21808
sample estimates:
mean of x mean of y 
124.42791  95.78335
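The p-value is far below 0.05, so we reject H0 and accept H1: the mean wage of college graduates differs from that of high-school graduates. The 95% confidence interval for the difference in means (25.07 to 32.22) lies entirely above 0, so college graduates earn more on average.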
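To see what t.test() computes in the Welch case, here is a minimal sketch that recomputes the t statistic, the Welch-Satterthwaite degrees of freedom and the two-sided p-value by hand, then compares them with t.test(). It uses simulated samples a and b (hypothetical data, not the Wage set, so it runs without ISLR), and the helper names welch_t and welch_df are made up for illustration.

```r
# Hypothetical data: two independent samples with unequal variances
set.seed(42)
a <- rnorm(40, mean = 120, sd = 40)
b <- rnorm(60, mean = 95, sd = 30)

# Welch t statistic: difference in means over the unpooled standard error
welch_t <- function(x, y) {
  se <- sqrt(var(x) / length(x) + var(y) / length(y))
  (mean(x) - mean(y)) / se
}

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- function(x, y) {
  vx <- var(x) / length(x)
  vy <- var(y) / length(y)
  (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
}

t_manual <- welch_t(a, b)
df_manual <- welch_df(a, b)
# two-sided p-value from the t distribution
p_manual <- 2 * pt(abs(t_manual), df = df_manual, lower.tail = FALSE)

# built-in Welch test (var.equal = FALSE is the default)
res <- t.test(a, b)
print(c(t = t_manual, df = df_manual, p = p_manual))
```

The fractional df reported in the Wage example (1134.9) comes from the same Welch-Satterthwaite formula.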
Paired t-test
Note: the two samples must have the same length, because observations are matched pair by pair.
> x <- c(2.41, 2.90, 2.75, 2.23, 3.67, 4.49, 5.16, 5.45, 2.06, 1.64, 1.06, 0.77)
> y <- c(2.80, 3.04, 1.88, 3.43, 3.81, 4.00, 4.44, 5.41, 1.24, 1.83, 1.45, 0.92)
> t.test(x, y, paired = TRUE)
        Paired t-test
data:  x and y
t = 0.16232, df = 11, p-value = 0.874
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3558501  0.4125167
sample estimates:
mean of the differences 
             0.02833333
The p-value 0.874 > 0.05, so we cannot reject H0: there is no evidence that the two groups differ.
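A paired t-test is simply a one-sample t-test on the within-pair differences. The sketch below, using the same x and y as the example, computes t = mean(d) / (sd(d) / sqrt(n)) by hand and checks it against both t.test(x, y, paired = TRUE) and t.test(d, mu = 0).

```r
# Data from the paired example above
x <- c(2.41, 2.90, 2.75, 2.23, 3.67, 4.49, 5.16, 5.45, 2.06, 1.64, 1.06, 0.77)
y <- c(2.80, 3.04, 1.88, 3.43, 3.81, 4.00, 4.44, 5.41, 1.24, 1.83, 1.45, 0.92)

# within-pair differences
d <- x - y
n <- length(d)

# one-sample t statistic on the differences, with n - 1 degrees of freedom
t_manual <- mean(d) / (sd(d) / sqrt(n))
p_manual <- 2 * pt(abs(t_manual), df = n - 1, lower.tail = FALSE)

res_paired <- t.test(x, y, paired = TRUE)
res_one <- t.test(d, mu = 0)
print(c(mean_diff = mean(d), t = t_manual, p = p_manual))
```

This is also why the two samples must have equal length: the test only ever looks at the vector of differences.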
For more on t-tests, see http://blog.csdn.net/tiaaaaa/article/details/58130363, which explains them in detail.
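The F test checks whether two populations have equal variances.
Hypotheses:
- H0: the variances are equal
- H1: the variances are not equal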
> x <- c(2.41, 2.90, 2.75, 2.23, 3.67, 4.49, 5.16, 5.45, 2.06, 1.64, 1.06, 0.77)
> y <- c(2.80, 3.04, 1.88, 3.43, 3.81, 4.00, 4.44, 5.41, 1.24, 1.83, 1.45, 0.92)
> var.test(x, y)
        F test to compare two variances
data:  x and y
F = 1.1683, num df = 11, denom df = 11, p-value = 0.801
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.336328 4.058331
sample estimates:
ratio of variances 
          1.168302
The p-value 0.801 is greater than 0.05, so we cannot reject H0: there is no evidence that the variances of x and y differ.
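As a minimal sketch of what var.test() computes, the F statistic is the ratio of the two sample variances, and the two-sided p-value doubles the smaller tail of the F(n1 - 1, n2 - 1) distribution. The code reuses the example's x and y and compares the result with var.test().

```r
x <- c(2.41, 2.90, 2.75, 2.23, 3.67, 4.49, 5.16, 5.45, 2.06, 1.64, 1.06, 0.77)
y <- c(2.80, 3.04, 1.88, 3.43, 3.81, 4.00, 4.44, 5.41, 1.24, 1.83, 1.45, 0.92)

# F statistic: ratio of the sample variances
f_manual <- var(x) / var(y)
df1 <- length(x) - 1
df2 <- length(y) - 1

# two-sided p-value: twice the smaller tail probability of F(df1, df2)
p_lower <- pf(f_manual, df1, df2)
p_manual <- 2 * min(p_lower, 1 - p_lower)

res <- var.test(x, y)
print(c(F = f_manual, p = p_manual))
```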
Goodness-of-fit test: the chi-square test
Example:
Do consumers prefer the five brands equally?
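> x <- c(210, 312, 170, 85, 223)
# under H0 every brand gets the same share, so each expected count is sum(x)/5
> expect <- rep(sum(x) / 5, times = 5)
# Pearson statistic: sum of (observed - expected)^2 / expected
> chi <- sum((x - expect)^2 / expect)
# upper-tail probability with 5 - 1 = 4 degrees of freedom; the true value is
# tiny but positive, and 1 - pchisq() rounds it to exactly 0
> 1 - pchisq(chi, df = 4)
[1] 0
# the same test with the built-in function
> chisq.test(x)
        Chi-squared test for given probabilities
data:  x
X-squared = 136.49, df = 4, p-value < 2.2e-16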
The p-value is below 0.05, so we reject H0: consumers do not prefer the five brands equally.
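The manual calculation above generalizes to any hypothesized probabilities. As a hedged sketch, the hypothetical helper gof_chisq() below computes the Pearson statistic and uses pchisq(..., lower.tail = FALSE), which returns the tiny upper-tail probability instead of the 0 printed by 1 - pchisq(); the result is checked against chisq.test().

```r
# Hypothetical helper: Pearson chi-square goodness-of-fit test.
# probs defaults to equal shares, i.e. H0 "all categories are equally popular".
gof_chisq <- function(observed, probs = rep(1 / length(observed), length(observed))) {
  expected <- sum(observed) * probs
  stat <- sum((observed - expected)^2 / expected)
  df <- length(observed) - 1
  # upper tail computed directly, so a tiny p-value does not round to 0
  p <- pchisq(stat, df = df, lower.tail = FALSE)
  list(statistic = stat, df = df, p.value = p)
}

x <- c(210, 312, 170, 85, 223)
mine <- gof_chisq(x)
ref <- chisq.test(x)
print(unlist(mine))
```

For unequal hypothesized shares, pass them as probs (they must sum to 1), just as chisq.test() takes them through its p argument.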
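The chi-square test can also check whether a sample looks normal: bin the data, then compare the observed count in each bin with the probability a normal distribution assigns to that bin. The Kolmogorov-Smirnov test (ks.test) does a similar job without binning.
> set.seed(1)
> x <- rnorm(100)
# bin the sample into unit-width intervals
> y <- table(cut(x, breaks = c(-3, -2, -1, 0, 1, 2, 3)))
> y
(-3,-2] (-2,-1]  (-1,0]   (0,1]   (1,2]   (2,3] 
      1      10      35      39      13       2 
# normal CDF at the break points, using the sample mean and sd
> p <- pnorm(c(-3, -2, -1, 0, 1, 2, 3), mean(x), sd(x))
> p
[1] 0.0002688424 0.0094396510 0.1084958323 0.4517550229
[5] 0.8394280836 0.9823738638 0.9993563302
# diff() returns the differences between adjacent elements, i.e. the
# probability of each bin (multiplied by 100 to read as percentages)
> p <- diff(p) * 100
> p
[1]  0.9170809  9.9056181 34.3259191 38.7673061 14.2945780
[6]  1.6982466
# pass the bin probabilities as p =; rescale.p = TRUE makes them sum to 1
# (chisq.test(y, p) would instead cross-tabulate y against p as two factors)
> chisq.test(y, p = p, rescale.p = TRUE)
# Kolmogorov-Smirnov test against the standard normal N(0, 1)
> ks.test(x, "pnorm", 0, 1)
        One-sample Kolmogorov-Smirnov test
data:  x
D = 0.094659, p-value = 0.3317
alternative hypothesis: two-sided
Because some expected bin counts are below 5, chisq.test() warns that the chi-squared approximation may be incorrect. Also, the mean and sd were estimated from the data, and the usual rule subtracts one degree of freedom per estimated parameter (6 - 1 - 2 = 3), whereas chisq.test() reports 5. The KS p-value 0.3317 > 0.05, so we cannot reject the hypothesis that x comes from N(0, 1).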
For more on the KS test, see https://www.cnblogs.com/arkenstone/p/5496761.html, which covers it in detail.
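The KS statistic D is the largest vertical gap between the empirical CDF of the sample and the hypothesized CDF. The sketch below (ks_stat is a hypothetical helper name) recomputes D for the same simulated sample and checks it against ks.test(). Because the empirical CDF jumps from (i - 1)/n to i/n at the i-th sorted value, the gap is checked on both sides of every jump.

```r
# Same simulated sample as in the example above
set.seed(1)
x <- rnorm(100)

# One-sample KS statistic against a CDF (standard normal by default);
# extra arguments in ... are passed on to cdf
ks_stat <- function(x, cdf = pnorm, ...) {
  n <- length(x)
  f <- cdf(sort(x), ...)
  i <- seq_len(n)
  # gap just before each jump and just after it
  max(f - (i - 1) / n, i / n - f)
}

d_manual <- ks_stat(x)
d_ref <- unname(ks.test(x, "pnorm", 0, 1)$statistic)
print(c(manual = d_manual, ks.test = d_ref))
```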
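Normality test (Shapiro-Wilk)
H0: the data come from a normal distribution
H1: the data do not come from a normal distribution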
> shapiro.test(x)
        Shapiro-Wilk normality test
data:  x
W = 0.9956, p-value = 0.9876
The p-value 0.9876 > 0.05, so we cannot reject H0: the data are consistent with a normal distribution.
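Q-Q plot: if the points lie close to a straight line, the data are approximately normal.
> qqnorm(x)
# qqline() adds a reference line through the first and third quartiles
> qqline(x)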
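The "close to a straight line" judgement can be put into a number. As an informal sketch (not a formal test here), qqnorm(..., plot.it = FALSE) returns the plotted coordinates without drawing, and the correlation between theoretical and sample quantiles is near 1 when the points lie near a line; this is the idea behind the probability-plot correlation coefficient.

```r
set.seed(1)
x <- rnorm(100)

# coordinates of the Q-Q plot: $x = theoretical normal quantiles, $y = the data
q <- qqnorm(x, plot.it = FALSE)

# the theoretical quantiles are qnorm() of the plotting positions ppoints(n)
theo <- qnorm(ppoints(length(x)))

# correlation of the Q-Q points: close to 1 for approximately normal data
r_qq <- cor(q$x, q$y)
print(r_qq)
```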
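Nonparametric test (Wilcoxon rank-sum test, which does not require normality):
H0: the two distributions have the same location (no location shift)
H1: the locations differ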
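> wilcox.test(x, y)
        Wilcoxon rank sum test with continuity correction
data:  x and y
W = 17, p-value = 0.0001123
alternative hypothesis: true location shift is not equal to 0
The p-value is very small (< 0.05), so we reject H0. Note that at this point in the session x is still the rnorm(100) sample and y the bin-count table from the chi-square example, so this run only illustrates the syntax; in practice, pass the two samples you want to compare.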
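The W that wilcox.test() reports is the rank sum of the first sample minus its smallest possible value, n1(n1 + 1)/2. Equivalently, it counts the pairs (a_i, b_j) with a_i > b_j, with ties counting 1/2 (the Mann-Whitney U). The sketch below checks both forms against wilcox.test() on simulated samples a and b (hypothetical data); wilcox_w is a made-up helper name.

```r
# Hypothetical data: two independent samples
set.seed(7)
a <- rnorm(15)
b <- rnorm(20, mean = 1)

# rank-sum form: ranks of a within the pooled sample, minus n_a * (n_a + 1) / 2
wilcox_w <- function(x, y) {
  r <- rank(c(x, y))
  nx <- length(x)
  sum(r[seq_len(nx)]) - nx * (nx + 1) / 2
}

w_rank <- wilcox_w(a, b)
# pair-count form: pairs with a_i > b_j, ties counting one half
w_pairs <- sum(outer(a, b, ">")) + 0.5 * sum(outer(a, b, "=="))
w_ref <- unname(wilcox.test(a, b)$statistic)
print(c(rank_sum = w_rank, pair_count = w_pairs, wilcox.test = w_ref))
```

A small W relative to n1 * n2 means the first sample tends to be smaller than the second.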
That's all for today. The next post starts a new chapter. What will it cover? I'll keep that a surprise for now, haha~