微信公眾號:醫學統計與R語言
如果你覺得對你有幫助,歡迎轉發
A probit regression is a version of the generalized linear model used to model dichotomous outcome variables. It uses the inverse standard normal distribution as a linear combination of the predictors. The binary outcome variable Y is assumed to have a Bernoulli distribution with parameter p (where the success probability is
p∈(0,1)p∈(0,1)). Hence, the probit link function isRemember back to intro stats when you had to look up in Z tables the area under the normal curve for a specific Z value? That area represents a cumulative probability: the probability that Z is less than or equal to the specified Z value.
Coefficients for probit models can be interpreted as the difference in Z score associated with each one-unit difference in the predictor variable.
Although the effect on Z of a change in X is linear, the link between z and the dependent variable Y is nonlinear since Φ is a nonlinear function of X.
輸入1:rm(list=ls())
setwd("C:/Users/mooshaa/Desktop")
library(rio)
bilog <- import("log.sav")
head(bilog)
bilog[,c(1,2,6)] <- lapply(bilog[,c(1,2,6)] ,as.factor)
xtabs(~Gender + Smoking_status, data = bilog)
Smoking_status
Gender 0 1
0 94 27
1 17 80
logitmodel <- glm(Smoking_status~., data = bilog,
family = binomial(link="logit"))
probitmodel <- glm(Smoking_status~., data = bilog,
family = binomial(link="probit"))
summary(probitmodel)
Call:
glm(formula = Smoking_status ~ ., family = binomial(link = "probit"),
data = bilog)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.1019 -0.5894 -0.1353 0.5619 3.2125
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.103e+00 8.498e-01 -2.475 0.01333 *
Gender1 1.783e+00 2.624e-01 6.793 1.10e-11 ***
Race1 5.124e-01 4.927e-01 1.040 0.29838
Race2 1.914e-01 3.307e-01 0.579 0.56263
Race3 3.181e-01 3.209e-01 0.991 0.32164
Race4 1.204e+00 3.955e-01 3.044 0.00233 **
Age 5.499e-02 1.223e-02 4.495 6.97e-06 ***
Income -6.602e-06 5.719e-06 -1.154 0.24839
Cigarettes -2.856e-02 1.267e-02 -2.254 0.02420 *
---
Signif. codes: 0 『***』 0.001 『**』 0.01 『*』 0.05 『.』 0.1 『 』 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 302.14 on 217 degrees of freedom
Residual deviance: 174.24 on 209 degrees of freedom
AIC: 192.24
Number of Fisher Scoring iterations: 6
library(stargazer)
stargazer(logitmodel, probitmodel, type = "text", style = "demography", ci = T)
----
Smoking_status
logistic probit
Model 1 Model 2
----
Gender1 3.101*** 1.783***
(2.155, 4.047) (1.268, 2.297)
Race1 0.913 0.512
(-0.885, 2.711) (-0.453, 1.478)
Race2 0.218 0.191
(-0.988, 1.423) (-0.457, 0.840)
Race3 0.504 0.318
(-0.674, 1.682) (-0.311, 0.947)
Race4 2.082** 1.204**
(0.669, 3.496) (0.429, 1.979)
Age 0.101*** 0.055***
(0.056, 0.146) (0.031, 0.079)
Income -0.00001 -0.00001
(-0.00003, 0.00001) (-0.00002, 0.00000)
Cigarettes -0.056* -0.029*
(-0.101, -0.011) (-0.053, -0.004)
Constant -3.720* -2.103*
(-6.714, -0.725) (-3.769, -0.438)
N 218 218
Log Likelihood -86.473 -87.122
AIC 190.947 192.245
----
*p < .05; **p < .01; ***p < .001
cbind(coef= coef(probitmodel),confint(probitmodel))
coef 2.5 % 97.5 %
(Intercept) -2.103163e+00 -3.799258e+00 -4.546255e-01
Gender1 1.782577e+00 1.278533e+00 2.310248e+00
Race1 5.124061e-01 -5.049788e-01 1.550530e+00
Race2 1.914387e-01 -4.965334e-01 8.801854e-01
Race3 3.180726e-01 -3.212090e-01 9.825868e-01
Race4 1.203820e+00 4.092421e-01 2.031129e+00
Age 5.498523e-02 3.202985e-02 7.950719e-02
Income -6.601737e-06 -1.783996e-05 4.584988e-06
Cigarettes -2.855577e-02 -5.333771e-02 -4.533804e-03
predict(newdata=bilog[1,],probitmodel,"response")
pnorm(predict(newdata=bilog[1,],probitmodel,"link"))
結果5:0.5040487
probdata <- data.frame(Gender=c(0,1),Race=c(1,1),
Age=c(rep(mean(bilog$Age),2)),
Income=c(rep(mean(bilog$Income),2)),
Cigarettes=c(rep(mean(bilog$Cigarettes),2)))
probdata [,c(1,2)] <- lapply(probdata [,c(1,2)] ,factor)
prob<- predict(newdata=probdata,probitmodel,"response")
prob
1 2
0.2725377 0.8804864
diff(prob)
0.6079486
install.packages("mfx")
library(mfx)
probitmfx(formula=Smoking_status~., data=bilog,atmean = F)
This function estimates a probit regression model and calculates the corresponding marginal effects.
結果8:Call:
probitmfx(formula = Smoking_status ~ ., data = bilog, atmean = F)
Marginal Effects:
dF/dx Std. Err. z P>|z|
Gender1 5.2417e-01 6.8890e-02 7.6088 2.767e-14 ***
Race1 1.1523e-01 1.0894e-01 1.0577 0.2901849
Race2 4.3800e-02 7.6774e-02 0.5705 0.5683383
Race3 6.9941e-02 6.8452e-02 1.0217 0.3069004
Race4 2.5896e-01 7.4118e-02 3.4938 0.0004761 ***
Age 1.2361e-02 2.4540e-03 5.0371 4.727e-07 ***
Income -1.4841e-06 1.2775e-06 -1.1617 0.2453464
Cigarettes -6.4196e-03 2.7760e-03 -2.3125 0.0207510 *
---
Signif. codes: 0 『***』 0.001 『**』 0.01 『*』 0.05 『.』 0.1 『 』 1
dF/dx is for discrete change for the following variables:
[1] "Gender1" "Race1" "Race2" "Race3" "Race4"
prob.marg<-coef(probitmodel)*mean(dnorm(predict(probitmodel)), na.rm=T)
prob.marg
(Intercept) Gender1 Race1 Race2 Race3 Race4 Age Income
-4.728079e-01 4.007377e-01 1.151930e-01 4.303696e-02 7.150525e-02 2.706284e-01 1.236112e-02 -1.484123e-06
Cigarettes
-6.419566e-03