時,Level 1因變量的均值第二個方程的解釋與此類似,但此方程擬合的是Level 2變量對 的斜率。
聯立後的形式如下:
第一部分為模型的固定效應部分( 部分) ,第二部分為模型的隨機效應部分(u和r) ,其中 是個體層次的誤差項,多層次模型還有兩個額外的誤差項, 是各個層級單位因變量水平的隨機差異, 是各個層級單位自變量X與Y關係的隨機差異。因此,多層次模型的隨機效應依賴於各個層級單位。
使用層次模型時需要注意的是:
3層以上的模型並非不可以,但社會科學中常用到的層次模型為2~3層。使用隨機截距,還是隨機截距和隨機斜率共同作為層級單位的自變量與因變量的關係的反應,應綜合考慮理論和數據作出決定。用到的包# load libraries library(nlme) library(lme4) library(lattice) library(ggplot2) library(dplyr)我們使用nlme包的一份重複測量的數據集Orthodont,該數據集有4個變量
data1 <- as.data.frame(nlme::Orthodont) data1$Subject <- factor(data1$Subject, ordered = FALSE) head(data1) str(data1) 'data.frame': 108 obs. of 4 variables: $ distance: num 26 25 29 31 21.5 22.5 23 26.5 23 22.5 ... $ age : num 8 10 12 14 8 10 12 14 8 10 ... $ Subject : Factor w/ 27 levels "M16","M05","M02",..: 15 15 15 15 3 3 3 3 7 7 ... $ Sex : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...每個個體age與distance的關係
data1 %>% ggplot(aes(x=age, y=distance, group=Subject, color=Subject, linetype=Sex)) + geom_line(size=1) + theme_classic() 零模型mod1 <- lmer(distance ~ (1|Subject), data=data1, REML=F) summary(mod1) ## Linear mixed model fit by maximum likelihood ['lmerMod'] ## Formula: distance ~ (1 | Subject) ## Data: data1 ## ## AIC BIC logLik deviance df.resid ## 521.5 529.5 -257.7 515.5 105 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -3.2391 -0.5248 -0.1103 0.4827 2.7734 ## ## Random effects: ## Groups Name Variance Std.Dev. ## Subject (Intercept) 3.567 1.889 ## Residual 4.930 2.220 ## Number of obs: 108, groups: Subject, 27 ## ## Fixed effects: ## Estimate Std. Error t value ## (Intercept) 24.0231 0.4216 56.98 # 檢驗隨機截距是否顯著 rand(mod1) ANOVA-like table for random-effects: Single term deletions Model: distance ~ (1 | Subject) npar logLik AIC LRT Df Pr(>Chisq) <none> 3 -257.75 521.49 (1 | Subject) 2 -268.79 541.58 22.09 1 2.601e-06 *** --- Signif. codes: 0 『***』 0.001 『**』 0.01 『*』 0.05 『.』 0.1 『 』 1可視化模型結果:
data1 %>% # save predicted values mutate(pred_dist = fitted(mod1)) %>% # graph ggplot(aes(x=age, y=pred_dist, group=Subject, color=Subject)) + theme_classic() + geom_line(size=1) 隨機截距+固定斜率模型在零模型上加一個一級的自變量age
mod2 <- lmer(distance ~ age + (1|Subject), data=data1, REML=F) summary(mod2) Linear mixed model fit by maximum likelihood ['lmerMod'] Formula: distance ~ age + (1 | Subject) Data: data1 AIC BIC logLik deviance df.resid 451.4 462.1 -221.7 443.4 104 Scaled residuals: Min 1Q Median 3Q Max -3.6870 -0.5386 -0.0123 0.4910 3.7470 Random effects: Groups Name Variance Std.Dev. Subject (Intercept) 4.294 2.072 Residual 2.024 1.423 Number of obs: 108, groups: Subject, 27 Fixed effects: Estimate Std. Error t value (Intercept) 16.76111 0.79456 21.09 age 0.66019 0.06122 10.78 Correlation of Fixed Effects: (Intr) age -0.848可視化模型結果:
data1 %>% # save predicted values mutate(pred_dist = fitted(mod2)) %>% # graph ggplot(aes(x=age, y=pred_dist, group=Subject, color=Subject)) + theme_classic() + geom_line(size=1) 隨機截距和隨機斜率模型允許年齡在每個人的身上對於因變量的關係是變化的
mod3 <- lmer(distance ~ age + (1|Subject) + (0+age|Subject), data=data1, REML=F) summary(mod3) Linear mixed model fit by maximum likelihood ['lmerMod'] Formula: distance ~ age + (1 | Subject) + (0 + age | Subject) Data: data1 AIC BIC logLik deviance df.resid 449.7 463.1 -219.9 439.7 103 Scaled residuals: Min 1Q Median 3Q Max -3.7542 -0.5056 0.0181 0.5216 3.8017 Random effects: Groups Name Variance Std.Dev. Subject (Intercept) 1.82570 1.3512 Subject.1 age 0.02141 0.1463 Residual 1.85944 1.3636 Number of obs: 108, groups: Subject, 27 Fixed effects: Estimate Std. Error t value (Intercept) 16.76111 0.70816 23.67 age 0.66019 0.06509 10.14 Correlation of Fixed Effects: (Intr) age -0.822可視化模型結果
data1 %>% # save predicted values mutate(pred_dist = fitted(mod3)) %>% # graph ggplot(aes(x=age, y=pred_dist, group=Subject, color=Subject)) + theme_classic() + geom_line(size=1) 隨機截距和隨機斜率(允許兩者相關)允許隨機截距和隨機斜率相關
mod4 <- lmer(distance ~ age + (1+age|Subject), data=data1, REML=F) summary(mod4) ## Linear mixed model fit by maximum likelihood ['lmerMod'] ## Formula: distance ~ age + (1 + age | Subject) ## Data: data1 ## ## AIC BIC logLik deviance df.resid ## 451.2 467.3 -219.6 439.2 102 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -3.3060 -0.4874 0.0076 0.4822 3.9228 ## ## Random effects: ## Groups Name Variance Std.Dev. Corr ## Subject (Intercept) 4.81409 2.1941 ## age 0.04619 0.2149 -0.58 ## Residual 1.71620 1.3100 ## Number of obs: 108, groups: Subject, 27 ## ## Fixed effects: ## Estimate Std. Error t value ## (Intercept) 16.76111 0.76075 22.032 ## age 0.66019 0.06992 9.442 ## ## Correlation of Fixed Effects: ## (Intr) ## age -0.848可視化模型結果
data1 %>% # save predicted values mutate(pred_dist = fitted(mod4)) %>% # graph ggplot(aes(x=age, y=pred_dist, group=Subject, color=Subject)) + theme_classic() + geom_line(size=1) 模型診斷殘差診斷
# 殘差與擬合值散點圖 檢查非線性關係、離群值、異方差 data1 %>% # data mutate(pred_dist = fitted(mod4)) %>% mutate(L1_resid = residuals(mod4, type = "response")) %>% # graph ggplot(aes(x=pred_dist, y=L1_resid)) + theme_classic() + geom_point(size=3, shape=1) + geom_hline(yintercept=0)隨機截距 每個個體的差異還挺大的
dotplot(ranef(mod4, condVar = T), strip = T, scales=list(relation='free'))$Subject 模型比較選擇mod2
anova(mod1,mod2,mod3,mod4) Data: data1 Models: mod1: distance ~ (1 | Subject) mod2: distance ~ age + (1 | Subject) mod3: distance ~ age + (1 | Subject) + (0 + age | Subject) mod4: distance ~ age + (1 + age | Subject) npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) mod1 3 521.49 529.54 -257.75 515.49 mod2 4 451.39 462.12 -221.69 443.39 72.1016 1 < 2e-16 *** mod3 5 449.74 463.15 -219.87 439.74 3.6513 1 0.05603 . mod4 6 451.21 467.30 -219.61 439.21 0.5267 1 0.46801 --- Signif. codes: 0 『***』 0.001 『**』 0.01 『*』 0.05 『.』 0.1 『 』 1 - END -