Statistics | Analysis of Covariance in R

2021-03-02 52Psychology

ANALYSIS OF COVARIANCE

Some people view ANCOVA as being a multiple regression technique. Others see it as an offshoot of ANOVA. In R, ANCOVA can be done with equal success using either the lm() function or the aov() function. I find it easier to treat ANCOVA as regression, so that's what I'll do.

Everything is regression, after all. The previous tutorial was called Multiple Regression, and we treated only cases where all the explanatory, or predictor, variables were numeric. It's possible to include categorical predictors (IVs) if they are coded correctly. In fact, if all we have are categorical IVs, then we are essentially doing ANOVA. If there is a mix of numeric and categorical IVs, then we are in the realm of ANCOVA.

Gas Mileage

A car with an eight cylinder engine... What teenaged boy hasn't dreamed of owning one? Well, get one now before they become extinct, because they are hard on gasoline! Of course, cars with big engines also tend to weigh more than cars with smaller engines, so maybe it's not the number of cylinders in the engine so much as it is the weight of the car that costs us at the gas pump. I wonder how we might tease those two variables apart. 

> names(mtcars)
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"
> cars = mtcars[,c(1,2,6)]
> cars$cyl = factor(cars$cyl)
> summary(cars)
      mpg        cyl          wt       
 Min.   :10.40   4:11   Min.   :1.513  
 1st Qu.:15.43   6: 7   1st Qu.:2.581  
 Median :19.20   8:14   Median :3.325  
 Mean   :20.09          Mean   :3.217  
 3rd Qu.:22.80          3rd Qu.:3.610  
 Max.   :33.90          Max.   :5.424  

The "mtcars" data set contains information on cars that were road-tested by Motor Trend magazine in 1974 (or road tests of which were published in 1974). That's a long time ago, unless you're me, and you're still dreaming of that 1967 Mustang! It's the data we have, so we'll go with it. You can update the data on your own time if you want to see if relationships we discover from 1974 still hold today.

We have extracted three columns from that data frame, namely, mpg (gas mileage in miles per U.S. gallon), cyl (the number of cylinders in the engine), and wt (weight in 1000s of pounds). The "cyl" variable has sensibly been declared a factor. Let's begin by looking at some graphs. 

> par(mfrow=c(1,3))
> boxplot(mpg ~ cyl, data=cars, xlab="cylinders", ylab="mpg")
> boxplot(wt ~ cyl, data=cars, xlab="cylinders", ylab="weight")
> plot(mpg ~ wt, data=cars)

That doesn't leave much to the imagination, does it?! The more cylinders the car has (had!), the harder it is on gas. The more cylinders the car has, the more it weighs. And the more the car weighs, the lower its gas mileage. All three of those relationships appear impressively strong in the graphs.

An analysis of variance will no doubt show us in p-values what we can already plainly see from looking at the boxplots. 

> lm.out1 = lm(mpg ~ cyl, data=cars)   # aov() works as lm() internally
> anova(lm.out1)                       # same as summary.aov(lm.out1)
Analysis of Variance Table

Response: mpg
          Df Sum Sq Mean Sq F value    Pr(>F)    
cyl        2 824.78  412.39  39.697 4.979e-09 ***
Residuals 29 301.26   10.39                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> summary(lm.out1)

Call:
lm(formula = mpg ~ cyl, data = cars)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.2636 -1.8357  0.0286  1.3893  7.2364 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  26.6636     0.9718  27.437  < 2e-16 ***
cyl6         -6.9208     1.5583  -4.441 0.000119 ***
cyl8        -11.5636     1.2986  -8.905 8.57e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.223 on 29 degrees of freedom
Multiple R-squared:  0.7325,	Adjusted R-squared:  0.714 
F-statistic:  39.7 on 2 and 29 DF,  p-value: 4.979e-09

No surprises in the ANOVA output. The interesting stuff is in the regression output. The (estimated) coefficients relate to the means of "cyl". The base, or reference, level of "cyl" is the one not listed with a coefficient in the regression output, or cyl4. The intercept (26.66) is the mean gas mileage for that level of "cyl". The coefficient for cyl6 is -6.92, which means that six cylinder cars get (got) 6.92 mpg less than four cylinder cars, on the average. Eight cylinder cars got 11.56 mpg less than four cylinder cars. The significance tests on these coefficients are equivalent to Fisher LSD tests on the differences in means. These figures are easily enough confirmed. (Important note: This interpretation of the coefficients depends upon having contrasts for unordered factors set to "treatment". Do options("contrasts") to check.) 

> with(cars, tapply(mpg, cyl, mean))
       4        6        8 
26.66364 19.74286 15.10000 
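While we're confirming things: to check the contrast setting mentioned in that important note, and to see the dummy (treatment) coding R actually built for lm.out1, something like the following should work. The model.matrix() call is my own addition, not part of the tutorial's workflow; the output shown is R's factory default.

> options("contrasts")                # "contr.treatment" is what we want for unordered factors
$contrasts
        unordered           ordered 
"contr.treatment"      "contr.poly" 

> head(model.matrix(lm.out1), 3)      # dummy codes for the first three cars
              (Intercept) cyl6 cyl8
Mazda RX4               1    1    0
Mazda RX4 Wag           1    1    0
Datsun 710              1    0    0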

The effect size is R² = .7325. In the language of ANOVA, that statistic would be called eta-squared. It is the "proportion of explained variation" in mpg accounted for by the explanatory variable. I don't think there's much doubt of a cause-and-effect relationship in this case, but that does not mean the number of cylinders by itself explains the whole 73% of the variability in gas mileage. A more accurate statement would be: cylinders, plus any other variables that may be confounded with cylinders, account for 73% of the variability in gas mileage. The last line of the regression output duplicates the ANOVA output, because there is only one explanatory variable at this point.
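If you'd like to see that R² and eta-squared really are the same number here, divide the SS for "cyl" by the total SS from the ANOVA table above. (The tiny discrepancy in the fourth decimal place is just rounding in the table.)

> 824.78 / (824.78 + 301.26)          # SS.cyl / SS.total = eta-squared
[1] 0.7324607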

We think one of the variables confounded with cylinders is weight of the vehicle. In fact, we think once those gas mileages are adjusted for the weight of the vehicle, the differences in the means by number of cylinders will be a lot less impressive. Dare we imagine that the differences might vanish altogether? I don't think so. But, let's see. Now for the analysis of covariance part. 

> lm.out2 = lm(mpg ~ wt + cyl, data=cars)
> anova(lm.out2)
Analysis of Variance Table

Response: mpg
          Df Sum Sq Mean Sq  F value    Pr(>F)    
wt         1 847.73  847.73 129.6650 5.079e-12 ***
cyl        2  95.26   47.63   7.2856  0.002835 ** 
Residuals 28 183.06    6.54                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> summary(lm.out2)

Call:
lm(formula = mpg ~ wt + cyl, data = cars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.5890 -1.2357 -0.5159  1.3845  5.7915 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  33.9908     1.8878  18.006  < 2e-16 ***
wt           -3.2056     0.7539  -4.252 0.000213 ***
cyl6         -4.2556     1.3861  -3.070 0.004718 ** 
cyl8         -6.0709     1.6523  -3.674 0.000999 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.557 on 28 degrees of freedom
Multiple R-squared:  0.8374,	Adjusted R-squared:   0.82 
F-statistic: 48.08 on 3 and 28 DF,  p-value: 3.594e-11

In the ANOVA, we have added "wt" (weight), a numeric variable that we think (i.e., know perfectly well in this case) is related to the DV as a covariate. Ordinarily, at least in the social sciences, we would do such a thing because we think the covariate will help discriminate among levels of the categorical IV, once the confound with the covariate is removed. Here we believe just the opposite is going to occur. We're not so terribly interested in the relationship between weight and gas mileage. We just want to get it out of the mix, so that we can see a "purer" relationship between cylinders and gas mileage.

Two very important points need to be made at this juncture. First, notice we didn't ask to see any interactions between "wt" and "cyl". That means we're assuming there aren't any, a common assumption in ANCOVA, but also an assumption that is commonly incorrect. It is worth checking. Second, in R, the ANOVA tests are sequential. That means if we want weight taken out before we look at the effect of cylinders, we MUST enter it first into the model formula. This won't make any difference in the regression output, but it makes a BIG difference in the ANOVA output!
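A sketch worth running to see just how big that difference is: reverse the order of the terms and compare the two ANOVA tables. (The name lm.out2a is mine.) With "cyl" entered first, it keeps its one-way SS of 824.78, and "wt" gets only the leftover 301.26 - 183.06 = 118.20, rather than the 847.73 it gets when entered first.

> lm.out2a = lm(mpg ~ cyl + wt, data=cars)   # terms reversed
> anova(lm.out2a)                            # expect cyl: SS = 824.78; wt: SS = 118.20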

Overall, we are now accounting for R² = .8374, or 84% of the total variability in gas mileage. However, notice that the part of the explained variability that we can attribute to "cyl" (eta-squared for "cyl") is now only... 

> 95.26 / (847.73 + 95.26 + 183.06)
[1] 0.0845966

...or about 8%. The mighty have, indeed, fallen, but not so far that "cyl" is no longer a statistically significant effect. Are there other confounds with "cyl" that might remove that last 8%? I'll leave it up to you to look for them.
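If you want a head start on that homework, here is a minimal sketch, nothing more: pull other suspect variables from mtcars into the model, say displacement and horsepower, entering them before "cyl", and see how much SS is left on the "cyl" line. The names cars2 and lm.out2b are mine, and I haven't shown the output; the choice of variables is yours.

> cars2 = mtcars[, c("mpg", "cyl", "wt", "disp", "hp")]
> cars2$cyl = factor(cars2$cyl)
> lm.out2b = lm(mpg ~ wt + disp + hp + cyl, data=cars2)
> anova(lm.out2b)                     # sequential: covariates first, cyl last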

We can now calculate adjusted means of "cyl" by level. This is done using the regression equation, and we have a different regression equation for each of the three levels of "cyl".

mpg-hat = 33.9908 - 3.2056 * wt for cyl4
mpg-hat = 33.9908 - 3.2056 * wt - 4.2556 for cyl6
mpg-hat = 33.9908 - 3.2056 * wt - 6.0709 for cyl8


The terms for the specific levels of "cyl" can be folded into the intercept of the regression line (and usually they are), but I have left them separate here to illustrate what's happening. To get the adjusted means for the levels of "cyl", we fill in the overall average of "wt" and solve each of these regression equations. 

> mean(cars$wt)
[1] 3.21725
> 33.9908 - 3.2056 * mean(cars$wt)            # adjusted mean for cyl4
[1] 23.67758
> 33.9908 - 3.2056 * mean(cars$wt) - 4.2556   # adjusted mean for cyl6
[1] 19.42198
> 33.9908 - 3.2056 * mean(cars$wt) - 6.0709   # adjusted mean for cyl8
[1] 17.60668
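You can also let predict() do that arithmetic for you. A small sketch, where the data frame nd is my own name: the printed values should match the hand calculations above, about 23.68, 19.42, and 17.61, give or take rounding of the coefficients.

> nd = data.frame(wt = mean(cars$wt), cyl = factor(c(4, 6, 8)))
> predict(lm.out2, newdata = nd)      # adjusted means for cyl4, cyl6, cyl8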

Notice that the means are still different--and significantly different as we can tell from the ANOVA--but not so different as they were before we took out the effect of the weight of the car on gas mileage.

Here's a nice way to graph these results. 

> plot(mpg ~ wt, data=cars, pch=as.numeric(cars$cyl), xlab="weight of car in 1000s of pounds")
> abline(a=33.99, b=-3.21, lty="solid")
> abline(a=33.99-4.26, b=-3.21, lty="dashed")
> abline(a=33.99-6.07, b=-3.21, lty="dotted")

The circles and solid line are for the four-cylinder cars, the triangles and dashed line for the six-cylinder cars, and the plus symbols and dotted line for the eight-cylinder cars. I'll refer you to the More Graphics tutorial for instructions on how to add all that information to the graph.
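If you don't want to wait for that tutorial, a legend() call along these lines should do it; "topright" is my guess at an uncluttered spot on the plot.

> legend("topright", legend=c("4 cyl", "6 cyl", "8 cyl"),
+        pch=1:3, lty=c("solid", "dashed", "dotted"))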

Two things distress me here that I feel compelled to comment on. First, notice there is very little overlap on the scatterplot in the weight of 4- vs. 6- vs. 8-cylinder cars. This makes the calculation of adjusted means just a little bit dicey. Also, while I don't detect anything obvious on the graph, I'm still uncomfortable that we haven't checked for the possibility of interactions between "wt" and "cyl". 

> lm.out3 = lm(mpg ~ wt * cyl, data=cars)
> anova(lm.out2, lm.out3)
Analysis of Variance Table

Model 1: mpg ~ wt + cyl
Model 2: mpg ~ wt * cyl
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     28 183.06                           
2     26 155.89  2     27.17 2.2658 0.1239

Interactions make no significant contribution to the model, but I'd be happier if that F-value were closer to 1.

An Aside

Above I noted that, in R, ANOVA is done sequentially, and that makes it very important to get the covariate into the model formula first. If you don't care for sequential sums of squares, you can try this, provided you have installed the optional "car" package from CRAN. 

> library("car")                       # this will give an error if "car" is not installed> Anova(lm.out2)                       # notice the upper case A in AnovaAnova Table (Type II tests)Response: mpg           Sum Sq Df F value   Pr(>F)    wt        118.204  1 18.0801 0.000213 ***cyl        95.263  2  7.2856 0.002835 ** Residuals 183.059 28                     ---Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

These are so-called Type II sums of squares, and they are not calculated sequentially. I would hesitate to calculate eta-squared values from this table unless I got the total SS by some method other than adding up the SSs in this table. (See the tutorial on Unbalanced Designs for a discussion.)
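Getting the total SS by "some other method" is easy enough with the definitional formula, and then an eta-squared for "cyl" follows from the Type II table. A sketch, where SS.total is my own name:

> SS.total = sum((cars$mpg - mean(cars$mpg))^2)
> SS.total
[1] 1126.047
> 95.263 / SS.total                   # eta-squared for cyl, about 0.0846 again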


Childhood Abuse and Adult Post-Traumatic Stress Disorder

The data for this section of the tutorial are discussed in Faraway (2005). To get the data, you can either install the "faraway" package, or you can go to Dr. Faraway's website and download the data files that accompany his book. The files download as a zip archive called "jfdata.zip", which unzips into a folder called "ascdata". Rename this folder "Faraway" and drop it into your working directory (Rspace).

If you have installed the "faraway" package, the commands to get the data are: 

> library("faraway")> data(sexab)

If you have downloaded the Faraway folder to your working directory, then do this: 

> sexab = read.table("Faraway/sexab.txt", header=T, row.names=1)

Either way... 

> summary(sexab)
      cpa               ptsd              csa    
 Min.   :-3.1204   Min.   :-3.349   Abused   :45  
 1st Qu.: 0.8321   1st Qu.: 6.173   NotAbused:31  
 Median : 2.0707   Median : 8.909                 
 Mean   : 2.3547   Mean   : 8.986                 
 3rd Qu.: 3.7387   3rd Qu.:12.240                 
 Max.   : 8.6469   Max.   :18.993                 

The data are from a study by Rodriguez et al. (1997). The subjects were adult women who were being treated at a clinic, some of whom self-reported childhood sexual abuse (csa=Abused), and some of whom did not (csa=NotAbused). All of the women filled out two standardized scales, one assessing the severity of childhood physical abuse (cpa), and one assessing the severity of adult post-traumatic stress disorder (ptsd). On both of these scales, a higher score indicates a worse outcome. The question we will attempt to answer is: How are sexual and physical abuse that occur in childhood related to the severity of PTSD as an adult, and how does the effect of physical abuse differ for women who were or were not sexually abused as children?

When I discuss this example in my class, I always start off by having students examine the following scatterplot. (DON'T close this graphics window. We will be drawing regression lines on this graph eventually.) 

> plot(ptsd ~ cpa, data=sexab, pch=as.numeric(sexab$csa), col=as.numeric(sexab$csa))

The red triangles are for women who were not sexually abused as children, and the black open circles are for women who were. I ask my students to tell me what they can see in this graph, and I hope against hope for the following answers.

There is a clear positive relationship between the severity of childhood physical abuse and the severity of adult PTSD.

Overall, women who were not sexually abused as children have lower severity of adult PTSD than do women who were sexually abused as children.

Overall, women who were not sexually abused as children also had lower severity of physical abuse as children than did women who were sexually abused as children.

Then I ask them the hard question. Does the effect of physical abuse add to the effect of sexual abuse, or not? When we summarize the relationships we see on this graph, are we going to have to do so with two regression lines, or will one be sufficient? Let's find out.

First, I propose we check for the importance of interactions. 

> lm.sexab1 = lm(ptsd ~ cpa + csa, data=sexab)
> lm.sexab2 = lm(ptsd ~ cpa * csa, data=sexab)
> anova(lm.sexab1, lm.sexab2)
Analysis of Variance Table

Model 1: ptsd ~ cpa + csa
Model 2: ptsd ~ cpa * csa
  Res.Df    RSS Df Sum of Sq     F Pr(>F)
1     73 782.08                          
2     72 774.28  1    7.8069 0.726  0.397

I like that. The F-value is near 1, so we can assume the absence of important interactions. That will make our analysis easier. 

> anova(lm.sexab1)                    # sequential tests
Analysis of Variance Table

Response: ptsd
          Df Sum Sq Mean Sq F value    Pr(>F)    
cpa        1 449.80  449.80  41.984 9.462e-09 ***
csa        1 624.03  624.03  58.247 6.907e-11 ***
Residuals 73 782.08   10.71                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> summary(lm.sexab1)

Call:
lm(formula = ptsd ~ cpa + csa, data = sexab)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.1567 -2.3643 -0.1533  2.1466  7.1417 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   10.2480     0.7187  14.260  < 2e-16 ***
cpa            0.5506     0.1716   3.209  0.00198 ** 
csaNotAbused  -6.2728     0.8219  -7.632 6.91e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.273 on 73 degrees of freedom
Multiple R-squared:  0.5786,	Adjusted R-squared:  0.5671 
F-statistic: 50.12 on 2 and 73 DF,  p-value: 2.002e-14

We have an answer. The estimated coefficient for csaNotAbused is statistically significant. When we include it in the regression equation(s), it will result in different intercepts for the csaAbused and csaNotAbused lines. Furthermore, the difference in intercepts, 6.3 points on a scale with a range of scores of a little more than 22 points, is quite large. The regression equations are:

ptsd-hat(Abused) = 10.2480 + 0.5506 * cpa
ptsd-hat(NotAbused) = 10.2480 + 0.5506 * cpa - 6.2728 = 3.9752 + 0.5506 * cpa
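The adjusted-means trick from the gas mileage example works here, too. Plugging the overall mean of cpa (2.3547, from the summary above) into the two equations gives about 11.54 for the Abused group and 5.27 for the NotAbused group. A predict() sketch, with nd again being my own name:

> nd = data.frame(cpa = mean(sexab$cpa), csa = factor(c("Abused", "NotAbused")))
> predict(lm.sexab1, newdata = nd)    # adjusted means, about 11.54 and 5.27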

To draw the regression lines on the pre-existing scatterplot... 

> abline(a=10.25, b=0.55, lty="dashed", col="black")
> abline(a=3.98, b=0.55, lty="dotted", col="red")

It appears that the effects of physical and sexual abuse are additive. One does not make the effects of the other more severe (no interaction), which is in no way to say that they aren't each bad enough to begin with!

Same Aside

Here are the Type II (nonsequential) tests from the "car" package. 

> Anova(lm.sexab1)
Anova Table (Type II tests)

Response: ptsd
          Sum Sq Df F value    Pr(>F)    
cpa       110.30  1  10.295  0.001982 ** 
csa       624.03  1  58.247 6.907e-11 ***
Residuals 782.08 73                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This changes nothing in the regression analysis, as regression effects are not calculated sequentially.
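You can verify that the same way we did with the cars: reverse the terms, and the sequential ANOVA table changes while the regression output does not. With csa entered first, the cpa line should drop to the 110.30 shown in the Type II table above, and the csa line becomes 1855.91 - 110.30 - 782.08 = 963.53, where 1855.91 is the total SS (449.80 + 624.03 + 782.08).

> anova(lm(ptsd ~ csa + cpa, data=sexab))   # expect csa: SS = 963.53; cpa: SS = 110.30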

Rodriguez, N., Ryan, S. W., Vande Kemp, H., & Foy, D. W. (1997). Posttraumatic stress disorder in adult female survivors of childhood sexual abuse: A comparison study. Journal of Consulting and Clinical Psychology, 65(1), 53-59.

 
