醫學統計與R語言:Probit回歸模型及邊際效應(Marginal effects)

2021-12-23 醫學統計與R語言

微信公眾號：醫學統計與R語言
如果你覺得對你有幫助，歡迎轉發

A probit regression is a version of the generalized linear model used to model dichotomous outcome variables. It uses the inverse standard normal distribution as a linear combination of the predictors. The binary outcome variable Y is assumed to have a Bernoulli distribution with parameter p (where the success probability is p∈(0,1)p∈(0,1)). Hence, the probit link function is

Remember back to intro stats when you had to look up in Z tables the area under the normal curve for a specific Z value? That area represents a cumulative probability: the probability that Z is less than or equal to the specified Z value.

Coefficients for probit models can be interpreted as the difference in Z score associated with each one-unit difference in the predictor variable.

Although the effect on Z of a change in X is linear, the link between z and the dependent variable Y is nonlinear since Φ is a nonlinear function of X.

輸入1：

rm(list=ls())
setwd("C:/Users/mooshaa/Desktop")
library(rio)
bilog <- import("log.sav")
head(bilog)
bilog[,c(1,2,6)] <- lapply(bilog[,c(1,2,6)] ,as.factor)
xtabs(~Gender + Smoking_status, data = bilog)

結果1：

Smoking_status
Gender  0  1
     0 94 27
     1 17 80

輸入2：

logitmodel <- glm(Smoking_status~., data = bilog,
family = binomial(link="logit"))
probitmodel <- glm(Smoking_status~., data = bilog,
family = binomial(link="probit"))
summary(probitmodel)

結果2：

Call:
glm(formula = Smoking_status ~ ., family = binomial(link = "probit"),
    data = bilog)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1019  -0.5894  -0.1353   0.5619   3.2125

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.103e+00  8.498e-01  -2.475  0.01333 *
Gender1      1.783e+00  2.624e-01   6.793 1.10e-11 ***
Race1        5.124e-01  4.927e-01   1.040  0.29838
Race2        1.914e-01  3.307e-01   0.579  0.56263
Race3        3.181e-01  3.209e-01   0.991  0.32164
Race4        1.204e+00  3.955e-01   3.044  0.00233 **
Age          5.499e-02  1.223e-02   4.495 6.97e-06 ***
Income      -6.602e-06  5.719e-06  -1.154  0.24839
Cigarettes  -2.856e-02  1.267e-02  -2.254  0.02420 *
---
Signif. codes:  0 『***』 0.001 『**』 0.01 『*』 0.05 『.』 0.1 『』 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 302.14  on 217  degrees of freedom
Residual deviance: 174.24  on 209  degrees of freedom
AIC: 192.24

Number of Fisher Scoring iterations: 6

輸入3：

library(stargazer)
stargazer(logitmodel, probitmodel, type = "text", style = "demography", ci = T)

結果3：

----
                           Smoking_status
                    logistic             probit
                     Model 1             Model 2
----
Gender1             3.101***            1.783***
                 (2.155, 4.047)      (1.268, 2.297)
Race1                 0.913               0.512
                 (-0.885, 2.711)     (-0.453, 1.478)
Race2                 0.218               0.191
                 (-0.988, 1.423)     (-0.457, 0.840)
Race3                 0.504               0.318
                 (-0.674, 1.682)     (-0.311, 0.947)
Race4                2.082**             1.204**
                 (0.669, 3.496)      (0.429, 1.979)
Age                 0.101***            0.055***
                 (0.056, 0.146)      (0.031, 0.079)
Income              -0.00001            -0.00001
               (-0.00003, 0.00001) (-0.00002, 0.00000)
Cigarettes           -0.056*             -0.029*
                (-0.101, -0.011)    (-0.053, -0.004)
Constant             -3.720*             -2.103*
                (-6.714, -0.725)    (-3.769, -0.438)
N                      218                 218
Log Likelihood       -86.473             -87.122
AIC                  190.947             192.245
----
*p < .05; **p < .01; ***p < .001

輸入4：

cbind(coef= coef(probitmodel),confint(probitmodel))

結果4：

               coef         2.5 %        97.5 %
(Intercept) -2.103163e+00 -3.799258e+00 -4.546255e-01
Gender1      1.782577e+00  1.278533e+00  2.310248e+00
Race1        5.124061e-01 -5.049788e-01  1.550530e+00
Race2        1.914387e-01 -4.965334e-01  8.801854e-01
Race3        3.180726e-01 -3.212090e-01  9.825868e-01
Race4        1.203820e+00  4.092421e-01  2.031129e+00
Age          5.498523e-02  3.202985e-02  7.950719e-02
Income      -6.601737e-06 -1.783996e-05  4.584988e-06
Cigarettes  -2.855577e-02 -5.333771e-02 -4.533804e-03

輸入5：

predict(newdata=bilog[1,],probitmodel,"response")

pnorm(predict(newdata=bilog[1,],probitmodel,"link"))

結果5：

0.5040487

輸入6：

probdata <- data.frame(Gender=c(0,1),Race=c(1,1),
                       Age=c(rep(mean(bilog$Age),2)),
                       Income=c(rep(mean(bilog$Income),2)),
                       Cigarettes=c(rep(mean(bilog$Cigarettes),2)))
probdata [,c(1,2)] <- lapply(probdata [,c(1,2)] ,factor)
prob<- predict(newdata=probdata,probitmodel,"response")
prob

結果6：

1 2
0.2725377 0.8804864

輸入7：

diff(prob)

結果7：

0.6079486

輸入8：Marginal effects

install.packages("mfx")
library(mfx)
probitmfx(formula=Smoking_status~., data=bilog,atmean = F)

This function estimates a probit regression model and calculates the corresponding marginal effects.

結果8：

Call:
probitmfx(formula = Smoking_status ~ ., data = bilog, atmean = F)

Marginal Effects:
                 dF/dx   Std. Err.       z     P>|z|
Gender1     5.2417e-01  6.8890e-02  7.6088 2.767e-14 ***
Race1       1.1523e-01  1.0894e-01  1.0577 0.2901849
Race2       4.3800e-02  7.6774e-02  0.5705 0.5683383
Race3       6.9941e-02  6.8452e-02  1.0217 0.3069004
Race4       2.5896e-01  7.4118e-02  3.4938 0.0004761 ***
Age         1.2361e-02  2.4540e-03  5.0371 4.727e-07 ***
Income     -1.4841e-06  1.2775e-06 -1.1617 0.2453464
Cigarettes -6.4196e-03  2.7760e-03 -2.3125 0.0207510 *
---
Signif. codes:  0 『***』 0.001 『**』 0.01 『*』 0.05 『.』 0.1 『』 1

dF/dx is for discrete change for the following variables:

[1] "Gender1" "Race1"   "Race2"   "Race3"   "Race4"

輸入9：Marginal effects

prob.marg<-coef(probitmodel)*mean(dnorm(predict(probitmodel)), na.rm=T)
prob.marg

結果9：

(Intercept)       Gender1         Race1         Race2         Race3         Race4           Age        Income
-4.728079e-01  4.007377e-01  1.151930e-01  4.303696e-02  7.150525e-02  2.706284e-01  1.236112e-02 -1.484123e-06
   Cigarettes
-6.419566e-03

相關焦點

Logit-Probit:非線性模型中交互項的邊際效應解讀

問題背景在非線性計量模型（例如 logit,probit 等），如何解釋交互項的效應，很多學者在應用上仍存在混淆之處。本文通過介紹 Buis (2010 和 Dow et al. (2019) 來說明這一問題。 2.
互助問答第68期:分組檢驗和邊際效應問題、ivprobit交乘項設計

（1）在分組回歸中，其實每個變量都有各自的係數，比如在OLS回歸中可以根據係數影響大小（若係數表示的是邊際效應）。那為什麼又需要進行假設檢驗？這裡可以這樣想，這個結果僅僅是根據一組樣本做出來的，是否會因為隨機誤差而引起的這個差異呢？所以不能僅根據這一組樣本數據就得出結果，需要進行更為嚴謹的檢驗，即這裡的假設檢驗。
Stata:邊際效應知多少?f_able命令(下)

我還描述了 nl 能夠估計除交互作用之外的變量變換的邊際效應，使用數值導數來近似解析導數。事實上，在估計非線性模型時，已經使用數值導數來估計邊際效應。換句話說，當模型中使用交互和多項式以外的變換時，Stata 已經具有估計邊際效應的能力。
醫學統計與R語言:Tobit回歸模型

醫學統計與R語言：多重比較P值的可視化醫學統計與R語言：爬蟲抓取國自然基金信息醫學統計與R語言：多元方差分析與非參數多元方差分析醫學統計與R語言：使用R語言實現Johnson-Neyman分析醫學統計與R語言：多層線性模型圖示醫學統計與R語言：多層線性模型(混合線性模型醫學統計與R語言：best
醫學統計與R語言:標準Z值一定服從標準正態分布?

Model）醫學統計與R語言：Probit回歸模型及邊際效應（Marginal effects）醫學統計與R語言：Lord’s Paradox醫學統計與R語言：協方差分析（ANCOVA）+plus醫學統計與R語言：Kendall是誰？
「離散選擇模型」Probit模型、Logit模型

我們主要介紹三個模型，即線性概率模型（LPM模型），logit模型，probit模型，其中LPM模型最簡單，並且可以用普通OLS進行估計，而probit和logit模型都較為複雜，只能用極大似然估計法進行估計。
r語言檢驗p值 - CSDN

醫學統計與R語言：對數正態分布與卡方分布醫學統計與R語言：qvalue醫學統計與R語言：Meta 回歸作圖(Meta regression Plot)醫學統計與R語言：aggregate.plot了解一下醫學統計與R語言：有序Probit回歸(Ordered Probit Model)醫學統計與R語言：Probit回歸模型及邊際效應
二值選擇模型(2):Probit與Logit

在滿足獨立同分布（i.i.d.）假設和已知分布函數的情況下，最大似然估計是最佳無偏估計（BUE，比線性模型的BLUE少個L）。通過MLE得到模型參數的估計值後，接下來的重要工作便是對參數的經濟解釋。在線性模型的情況下，回歸係數
SPSS統計分析案例:probit概率回歸分析

SPSS統計訓練營是一個自學平臺，以詳實統計案例教程為基礎，配套練習使用的原始數據，方便讀者自己實踐，致力於讓數據科學學習簡單有趣高效。
醫學統計R語言:分面畫boxplot

R語言：多列分組正態性檢驗醫學統計與R語言：調節效應分析（Moderation Analysis）醫學統計與R語言：結構方程模型（structural equation model）醫學統計與R語言：中介效應分析（mediation effect analysis）醫學統計與R語言：生存曲線（survival curves
Stata:一文讀懂 Tobit 模型

Tobit 模型的介紹1.1 受限數據：截斷和截堵1.2 Tobit 模型設定1.3 Tobit 模型的估計1.4 Tobit 模型的假設檢驗1.5 邊際效應及其推導過程2. Stata 範例3.文獻中，後者的別名還包括：「歸併回歸模型」和「審查回歸模型」。
醫學統計與R語言:圓形樹狀圖(circular dendrogram)

醫學統計與R語言：雙因素重複測量方差分析（Two-way repeated measures ANOVA）醫學統計R語言：分面畫boxplot醫學統計與R語言：調節效應分析（Moderation Analysis）醫學統計與R語言：結構方程模型（structural equation model）
醫學統計與R語言:點圖(dotplot)

醫學統計與R語言：這裡的坑你踩過幾回，有序多分類Logistic回歸（Ordinal Logistic Regression）醫學統計與R語言：logsitc回歸校準曲線 Calibration curve醫學統計與R語言：多分格相關係數（polychoric）多序列相關係數（polyserial）Coefficient Omega醫學統計與
醫學統計與R語言:隨機森林與Logistic預測(randomForest vs Logistic regression)

R語言：腫瘤研究中的waterfall plot(瀑布圖) 醫學統計與R語言：多列分組正態性檢驗醫學統計與R語言：Tobit回歸中的Marginal effect醫學統計與R語言：配對均值檢驗可視化加label醫學統計與R語言：線性固定效應模型（Linear fix effect model ）醫學統計與R
R語言 | 回歸分析(四)

如何更方便地考慮到所有因素，並進行統計檢驗呢？我們採用的分析方法叫做線性混合效應模型(linear mixed effects model，簡稱LME)，它本質上是一種多水平模型(multilevel model)。
關於交互項的那些事(二):畫交互效應圖原來如此簡單

比如，考慮如下包含交互項的模型：其中，為虛擬變量，而為連續變量（如也為離散變量，可類似討論），則此模型可寫為這意味著，如果根據虛擬變量取值為0或1將全樣本分為兩個子樣本，則這兩個子樣本所對應的回歸方程並不相同（截距項與斜率均可不同）。
【R語言與高級醫學統計學】

而醫學統計學作為統計學中不可分割的一部分，在我們的醫學科研中的地位越來越重要。「醫學方」在前期已經推出了兩套關於R語言的視頻教程，其一是《R語言快速入門與數據清洗》，其二《R語言可視化作圖》，中間似乎還缺點什麼。
線性混合效應模型R語言應用+1

線性混合效應模型R語言應用+1數據的層次結構模型出發點是個體受環境的影響，從而在同一個環境中的個體，具有一定的相似性。其他類型的層次結構例如在縱向研究中的個體也一樣，每個個體為第二層層，個體的每次觀測為第一層。
基礎方法 | Logit回歸和Probit回歸有區別嗎?

因變量的類型決定了回歸模型的使用！常見的針對分類變量的回歸模型其實主要有三類：第一類，Logit回歸（包括：二分類、多分類和序次Logit回歸）；第二類，Probit回歸；第三類，泊松回歸（又稱為普哇松回歸）。
醫學統計與R語言:Welch's ANOVA and Games-Howell post-hoc test

醫學統計與R語言：多列分組正態性檢驗醫學統計與R語言：Tobit回歸中的Marginal effect醫學統計與R語言：配對均值檢驗可視化加label醫學統計與R語言：線性固定效應模型（Linear fix effect model ）醫學統計與R語言：Tobit回歸模型函數醫學統計與R語言：隨機森林與Logistic預測（randomForest vs Logistic regression）醫學統計與R語言：多重比較P值的可視化醫學統計與R語言：腫瘤研究中的waterfall plot(瀑布圖)

醫學統計與R語言:Probit回歸模型及邊際效應(Marginal effects)

相關焦點

Logit-Probit:非線性模型中交互項的邊際效應解讀

互助問答第68期:分組檢驗和邊際效應問題、ivprobit交乘項設計

Stata:邊際效應知多少?f_able命令(下)

醫學統計與R語言:Tobit回歸模型

醫學統計與R語言:標準Z值一定服從標準正態分布?

「離散選擇模型」Probit模型、Logit模型

r語言 檢驗p值 - CSDN

二值選擇模型(2):Probit與Logit

SPSS統計分析案例:probit概率回歸分析

醫學統計R語言:分面畫boxplot

Stata:一文讀懂 Tobit 模型

醫學統計與R語言:圓形樹狀圖(circular dendrogram)

醫學統計與R語言:點圖(dotplot)

醫學統計與R語言:隨機森林與Logistic預測(randomForest vs Logistic regression)

R語言 | 回歸分析(四)

關於交互項的那些事(二):畫交互效應圖原來如此簡單

【R語言與高級醫學統計學】

線性混合效應模型R語言應用+1

基礎方法 | Logit回歸和Probit回歸有區別嗎?

醫學統計與R語言:Welch's ANOVA and Games-Howell post-hoc test

r語言檢驗p值 - CSDN