計數數據回歸操作程序和數據公開, 認真學起來

2021-02-21 計量經濟圈

凡是搞計量經濟的，都關注這個號了

郵箱：econometrics666@sina.cn

所有計量經濟圈方法論叢的code程序, 宏微觀資料庫和各種軟件都放在社群裡.歡迎到計量經濟圈社群交流訪問.

之前，咱們圈子引薦了很多學術神器，受到海內外學者歡迎和認可。

1.神器! SSCI分區及影響因子查詢, 還有國人發表比例

2.一數學神器誕生! 手寫公式和符號, 竟免費轉成LaTex

3.學術神器Endnote的最詳盡使用方法

4.面板數據模型操作指南, 不得不看的16篇文章

5.免費知雲文獻翻譯最新版下載, 英文文獻閱讀利器

6.Sci-hub最牛逼的英文文獻下載網站,可以實時監測最新可用域名

7.這40個微觀資料庫夠你博士畢業了, 反正憑著這些庫成了教授

8.高效使用Stata的115頁Tips, PDF版本可列印使用

9.Stata16版本可以下載使用了！！！

在閱讀下面這份操作程序之前，各位學者務必先看看「泊松回歸與負二元回歸什麼鬼？」。下面的內容需要學者們有一定計量基礎，還要對計量經濟圈公眾號裡的內容有一定熟悉度，比如「有限混合模型」、「控制函數法」、「雙欄模型」、「截斷數據」、「歸併數據」等。

泊松回歸（英語：Poisson regression）是用來為計數資料和列聯表建模的一種回歸分析。泊松回歸假設反應變量Y是泊松分布，並假設它期望值的對數可被未知參數的線性組合建模。泊松回歸模型有時（特別是當用作列聯表模型時）又被稱作對數-線性模型。

1.微觀計量經濟學在理論和應用上走過的30年

2.機器學習在微觀計量的應用最新趨勢: 大數據和因果推斷

3.機器學習在微觀計量的應用最新趨勢: 回歸模型

4.微觀計量經濟學：政策效應評估

今天，咱們引薦的是計數數據回歸操作代碼，建議各位學者參考學習裡面的操作步驟。咱們會把完整版程序和數據放到社群裡供各位群友研習。在研習下面這份代碼之前，強烈建議各位學者參閱以下20多篇Stata相關文章。

26.多重中介效應的估計與檢驗, Stata MP15可下載

Start

* Stata user-written commands
* hnblogit
* chi2gof
* qcount
* are used

********** SETUP **********

clear all
set more off
version 14
set linesize 82
set scheme s1mono

*********** 1: COUNT REGRESSION: BASICS (POISSON, GLM, NLS, DIAGNOSTICS, NB)

*** 1. BASICS: ANALYSIS WITHOUT REGRESSORS

* Summarize doctor visits data (MEPS 2003 data 65-90 years old: see MUS p.557)
use mus17data.dta
histogram docvis if docvis < 40, discrete
xtitle("# Doctor visits (for < 40 visits)")
graph export countfig01histraw.wmf, replace

* Tabulate docvis after recoding values > 10 to ranges 11-40 and 41-60
generate dvrange = docvis
recode dvrange (11/40 = 40) (41/60 = 60)
label define dvcounts 40 "11-40" 60 "41-60"
label values dvrange dvcounts
tabulate dvrange

*** 1. BASICS: POISSON REGRESSION

drop _all

* Summary statistics for doctor visits data
use mus17data.dta
global xlist private medicaid age age2 educyr actlim totchr
describe docvis $xlist
summarize docvis $xlist, sep(10)

* Poisson with preffered robust standard errors
poisson docvis $xlist, vce(robust) nolog

* Poisson with default ML standard errors
poisson docvis $xlist

* Poisson coefficients as incidence ratios
poisson docvis $xlist, irr vce(robust) nolog

* AME and MEM for Poisson
quietly poisson docvis $xlist, vce(robust)
margins, dydx(*)
margins, dydx(*) atmean

* Older code uses add-ons mfx and margeff now superceded by margins
* margeff
* mfx

* Also factor variables and noncaculus methods
* The i. are discrete and will calculate ME of one unit change (not derivative)
* The c.age##age means age and age-sauared appear and ME is w.r.t. age
quietly poisson docvis i.private i.medicaid c.age##c.age educyr i.actlim totchr, vce(robust)
margins, dydx(*)

*** 1. BASICS: EXAMPLES OF ML CODING

* Poisson providing the log-density and using mlexp for simple ML model
mlexp (-exp({xb:$xlist _cons}) + docvis*({xb:}) - lnfactorial(docvis)),
vce(robust)

* Poisson ML program lfpois to be called by command ml method lf
program lfpois
version 15
args lnf theta1
tempvar lnyfact mu
local y "$ML_y1"
generate double `lnyfact' = lnfactorial(`y')
generate double `mu' = exp(`theta1')
quietly replace `lnf' = -`mu' + `y'*`theta1' - `lnyfact'
end

* Command ml model including defining y and x, plus ml check
ml model lf lfpois (docvis = $xlist), vce(robust)
* ml check
* ml search
ml maximize, nolog

global y docvis
generate cons = 1
global xlist2 private medicaid age age2 educyr actlim totchr cons
sum $y
sum $xlist2

*** 1. BASICS: POISSON MLE AND NEWTON RAPHSON ITERATIONS IN MATA

* Complete Mata code for Poisson MLE NR iterations
mata:
st_view(y=., ., "$y") // read in stata data to y and X
st_view(X=., ., tokens("$xlist"))
b = J(cols(X),1,0) // compute starting values
n = rows(X)
iter = 1 // initialize number of iterations
cha = 1 // initialize stopping criterion
do {
mu = exp(X*b)
grad = X'(y-mu) // k x 1 gradient vector
hes = cross(X, mu, X) // negative of the k x k Hessian matrix
bold = b
b = bold + cholinv(hes)*grad
cha = (bold-b)'(bold-b)/(bold'bold)
iter = iter + 1
} while (cha > 1e-16) // end of iteration loops
mu = exp(X*b)
hes = cross(X, mu, X)
vgrad = cross(X, (y-mu):^2, X)
vb = cholinv(hes)*vgrad*cholinv(hes)*n/(n-cols(X))
iter // number of iterations
cha // stopping criterion
st_matrix("b",b') // pass results from Mata to Stata
st_matrix("V",vb) // pass results from Mata to Stata
end

* Present results, nicely formatted using Stata command ereturn
matrix colnames b = $xlist
matrix colnames V = $xlist
matrix rownames V = $xlist
ereturn post b V
ereturn display

*** 1. BASICS: GLM REGRESSION

* GLM for Poisson with robust standard errors
glm docvis $xlist, family(poisson) link(log) vce(robust) nolog

* GLM for Poisson with GLM corrected standard errors
* These are default ML standard errors multiplied by square root Pearson statistic
glm docvis $xlist, family(poisson) link(log) scale(x2) nolog

*** 1. BASICS: NONLINEAR LEAST SQUARES

* Nonlinear least squares with exponential conditional mean
nl (docvis = exp({xb: $xlist}+{b0})), vce(robust) nolog

*** 1. BASICS: DIAGNOSTICS

* Residuals
quietly glm docvis $xlist, family(poisson) link(log)
predict rraw, response
predict rpearson, pearson
predict rdeviance, deviance
predict ranscombe, anscombe
predict hat, hat
predict cooksd, cooksd
summarize rraw rpearson rdeviance ranscombe hat cooksd, sep(10)
correlate rraw rpearson rdeviance ranscombe

* An R-squared measure
capture drop yphat
quietly poisson docvis $xlist, vce(robust)
predict yphat, n
quietly correlate docvis yphat
display "Squared correlation between y and yhat = " r(rho)^2

* Overdispersion test against V[y|x] = E[y|x] + a*(E[y|x]^2)
quietly poisson docvis $xlist, vce(robust)
predict muhat, n
quietly generate ystar = ((docvis-muhat)^2 - docvis)/muhat
regress ystar muhat, noconstant noheader

* Newer command - use rather than countfit below
quietly poisson docvis $xlist
chi2gof, cells(11) table

* Poisson: Sample vs avg predicted probabilities of y = 0, 1, ..., 10
countfit docvis $xlist, maxcount(10) prm nograph noestimates nofit

*** 1. BASICS: NEGATIVE BINOMIAL REGRESSION MOTIVATION

* Generate Poisson data with sample mean of docvis
quietly summarize docvis
scalar dvmean = r(mean)
set seed 10101 // set the seed !
generate ypoisson= rpoisson(dvmean) // Draw from Poisson(mu=dvmean)
summarize docvis ypoisson
tabulate ypoisson
histogram docvis if docvis < 40, discrete addplot(hist ypoisson, fcolor(white) discrete) ///
xtitle("# Doctor visits versus Poisson") legend(off)
graph export countfig02histpoisson.wmf, replace

* Generate negative binomial (mu=ymean alpha=fromnbreg) generated data
quietly nbreg docvis // Negative binomial on sclar givesalpha
scalar alpha = e(alpha)
display alpha
set seed 10101 // set the seed !
generate mugamma = rgamma(1/alpha,alpha)
generate ynegbin = rpoisson(dvmean*mugamma) // NB generated as a Poisson-gamma mixture
summarize docvis ynegbin mugamma
tabulate ynegbin
histogram docvis if docvis < 40, discrete addplot(hist ynegbin if ynegbin < 40, fcolor(white) discrete) ///
xtitle("# Doctor visits versus negative binomial") legend(off)
graph export countfig03histnegbin.wmf, replace

* Standard negative binomial (NB2) with default SEs
nbreg docvis $xlist, nolog

* Newer command - use rather than countfit below
chi2gof, cells(11) table

* NB2: Sample vs average predicted probabilities of y = 0, 1, ..., 10
countfit docvis $xlist, maxcount(10) nbreg nograph noestimates nofit

* Generalized negative binomial with alpha parameterized
gnbreg docvis $xlist, lnalpha($xlist) nolog

* Comparison of Poisson and NB and various standard error estimates
quietly poisson docvis $xlist
estimates store PDEFAULT
quietly poisson docvis $xlist, vce(robust)
estimates store PROBUST
quietly glm docvis $xlist, family(poisson) link(log) scale(x2)
estimates store PPEARSON
quietly nbreg docvis $xlist
estimates store NBDEFAULT
quietly nbreg docvis $xlist, vce(robust)
estimates store NBROBUST
esttab PDEFAULT PROBUST PPEARSON NBDEFAULT NBROBUST, b(%10.4f) se mtitles pr2

*** 1. ADDITONAL: TRUNCATED AND CENSORED

use mus17data.dta, clear
global xlist private medicaid age age2 educyr actlim totchr

* Zero-truncated negative binomial
ztnb docvis $xlist if docvis>0, nolog
estimates store ZTNB

* Censored negative binomial at y <= 10
generate docviscens10 = docvis
replace docviscens10 = 10 if docvis > 10

* Censored Negbin ML program lfnbcens10 with censoring from right at 10
program lfnbcens10
version 10.1
args lnf theta1 a // theta1=x'b, a=alpha, lnf=lnf(y)
tempvar mu sumgammaratios probyle10 cens
local y $ML_y1 // Define y so program more readable
generate double `mu' = exp(`theta1')
generate double `sumgammaratios' = exp(lngamma(0+(1/`a')))/1
+ exp(lngamma(1+(1/`a')))/1 + exp(lngamma(2+(1/`a')))/2
+ exp(lngamma(3+(1/`a')))/6 + exp(lngamma(4+(1/`a')))/24
+ exp(lngamma(5+(1/`a')))/120 + exp(lngamma(6+(1/`a')))/720
+ exp(lngamma(7+(1/`a')))/5040 + exp(lngamma(8+(1/`a')))/40320
+ exp(lngamma(9+(1/`a')))/362880 + exp(lngamma(10+(1/`a')))/3628800
generate double `probyle10' = (`sumgammaratios'/exp(lngamma((1/`a'))))
*((1/`a')^(1/`a'))*(`mu'^`y')/((1/`a'+`mu')^(1/`a'+`y'))
generate double `cens' = `y' >= 10
quietly replace `lnf' = (1-`cens') * (lngamma(`y'+(1/`a')) - lngamma(1/`a')
- lnfactorial(`y') - (`y'+(1/`a'))*ln(1+`a'*`mu')
+ `y'*ln(`a') + `y'*ln(`mu') ) + `cens'*ln(`probyle10')
end
* Command lfnbcens10 implemented for negative binomial MLE
ml model lf lfnbcens10 (docviscens10 = $xlist) ()
ml maximize, nolog
estimates store NBCENS10

* Comparison with regular negative binomial
quietly nbreg docvis $xlist, nolog
estimates store NBREG
estimates table NBREG ZTNB NBCENS10, equation(1) b(%10.4f) se stats(N ll)

* Ordered logit
* Do for docviscens10 which has 10 categories
ologit docviscens10 $xlist, nolog

*** 1. ADDITONAL: HURDLE MODEL

* Hurdle logit-nb model manually
logit docvis $xlist, nolog
predict dvplogit, p
* Second step uses positives only
ztnb docvis $xlist if docvis>0, nolog
predict dvztnbcm, cm

generate dvhurdle = dvplogit*dvztnbcm

* Same hurdle logit-nb model using the user-written hnblogit command
hnblogit docvis $xlist, nolog
estimates store HURDLENB

*** 1. ADDITONAL: ZERO-INFLATED

* Zero-inflated negative binomial
zinb docvis $xlist, inflate($xlist) vuong nolog
estimates store ZINB
predict dvzinb, n

quietly nbreg docvis $xlist
estimates store NBREG
predict dvnbreg, n

*** 1. ADDITIONAL: MODEL COMPARISON

* Comparison of NB and ZINB using countfit
countfit docvis $xlist, nbreg zinb nograph noestimates

* Compare LL, AIC, BIC across models
estimates table NBREG HURDLENB ZINB, equation(1) b(%12.4f) se
stats(N ll aic bic) stfmt(%12.1f) newpanel

* Compare predicted means across models
summarize docvis dvnbreg dvhurdle dvzinb
correlate docvis dvnbreg dvhurdle dvzinb

*** 2. ADDITIONAL: FINITE MIXTURE MODEL

version 15

* Finite-mixture model using fmm command with constant probabilities
use mus17data.dta, clear
fmm 2, vce(robust): poisson docvis $xlist

* Obtain latent class probabilities
estat lcprob

* Obtain predicted mean for each individual by component
predict yfit*
summarize yfit1 yfit2

* Obtain marginal effects in each component
margins, dydx(*)
margins, dydx(*) atmean noatlegend

* Create histograms of fitted values
quietly histogram yfit1, name(class_1, replace)
quietly histogram yfit2, name(class_2, replace)
quietly graph combine class_1 class_2, iscale(1.1) ycommon xcommon
ysize(2.5) xsize(5)
graph export countfig21finitemixture.wmf, replace

* 2-component mixture of negative binomial
fmm 2, nolog: nbreg docvis $xlist
estat lcprob

*** 2. ADDITONAL: ENDOGENOUS REGRESSORS

use mus17data.dta, clear
global xlist private medicaid age age2 educyr actlim totchr

* GMM (Nonlinear IV) for Poisson: using command gmm
global zlist income ssiratio medicaid age age2 educyr actlim totchr
gmm (docvis - exp({xb:$xlist}+{b0})), instruments($zlist) onestep nolog

* GMM (Nonlinear IV) for Poisson: computation using command optimize
generate cons = 1
local y docvis
local xlist private medicaid age age2 educyr actlim totchr cons
local zlist income ssiratio medicaid age age2 educyr actlim totchr cons
mata
void pgmm(todo, b, y, X, Z, Qb, g, H)
{
Xb = X*b'
mu = exp(Xb)
h = Z'(y-mu)
W = cholinv(cross(Z,Z))
Qb = h'W*h
if (todo == 0) return
G = -(mu:*Z)'X
g = (G'W*h)'
if (todo == 1) return
H = G'W*G
_makesymmetric(H)
}
st_view(y=., ., "`y'")
st_view(X=., ., tokens("`xlist'"))
st_view(Z=., ., tokens("`zlist'"))
S = optimize_init()
optimize_init_which(S,"min")
optimize_init_evaluator(S, &pgmm())
optimize_init_evaluatortype(S, "d2")
optimize_init_argument(S, 1, y)
optimize_init_argument(S, 2, X)
optimize_init_argument(S, 3, Z)
optimize_init_params(S, J(1,cols(X),0))
optimize_init_technique(S,"nr")
b = optimize(S)

Xb = X*b'
mu = exp(Xb)
h = Z'(y-mu)
W = cholinv(cross(Z,Z))
G = -(mu:*Z)'X
Shat = ((y-mu):*Z)'((y-mu):*Z)*rows(X)/(rows(X)-cols(X))
Vb = luinv(G'W*G)*G'W*Shat*W*G*luinv(G'W*G)
st_matrix("b",b)
st_matrix("Vb",Vb)
end

* Nonlinear IV estimator for Poisson: formatted results
matrix colnames b = `xlist'
matrix colnames Vb = `xlist'
matrix rownames Vb = `xlist'
ereturn post b Vb
ereturn display

* Control function approach to endogeneity
* First stage is reduced form to get predicted residuals
global xlist2 medicaid age age2 educyr actlim totchr
regress private $xlist2 income ssiratio, vce(robust)
predict lpuhat, residual

* Second-stage Poisson with robust SEs
poisson docvis private $xlist2 lpuhat, vce(robust) nolog

* Program and bootstrap for Poisson two-step estimator
program endogtwostep, eclass
version 10.1
tempname b
capture drop lpuhat2
regress private $xlist2 income ssiratio
predict lpuhat2, residual
poisson docvis private $xlist2 lpuhat2
matrix `b' = e(b)
ereturn post `b'
end
bootstrap _b, reps(400) seed(10101) nodots nowarn: endogtwostep
estimates store twostep

*** 2. ADDITIONAL: PANEL COUNT REGRESSION

* Describe dependent variables and regressors
use mus18data.dta, clear
describe mdu lcoins ndisease female age lfam child id year

* Summarize dependent variables and regressors
summarize mdu lcoins ndisease female age lfam child id year

* Panel description of dataset
xtset id year
xtdescribe

* Panel summary of dependent variable
xtsum mdu

* Year-to-year transitions in doctor visits
generate mdushort = mdu
replace mdushort = 4 if mdu >= 4
xttrans mdushort
corr mdu L.mdu

* Simple time-series plot for first 20 individuals (= first 85 obs)
quietly xtline mdu if _n<=85, overlay legend(off)
graph export countfig31panelplot.wmf, replace

*** PANEL POISSON: POOLED AND POPULATION AVERAGED

* Pooled Poisson estimator with cluster-robust standard errors
poisson mdu lcoins ndisease female age lfam child, vce(cluster id)

* Without cluster-robust
poisson mdu lcoins ndisease female age lfam child

* Poisson PA estimator with unstructured error correlation and robust VCE
xtpoisson mdu lcoins ndisease female age lfam child, pa corr(unstr) vce(robust)

* Print out the correlation matrix
matrix list e(R)

*** PANEL POISSON: RANDOM EFFECTS

* Poisson random-effects estimator with default standard errors
xtpoisson mdu lcoins ndisease female age lfam child, re

* Poisson random-effects estimator (gamma) with cluster-robust standard errors
xtpoisson mdu lcoins ndisease female age lfam child, re vce(cluster id)
* Preceding same as xtpoisson , re vce(robust)

* Poisson random-effects estimator with normal intercept and default standard errors
xtpoisson mdu lcoins ndisease female age lfam child, re normal

* Poisson random-effects estimator with normal intercept and normal slope for one parameter
* xtmepoisson mdu lcoins ndisease female age lfam child || id: NDISEASE
* Previous command commented out as takes a long time

*** PANEL POISSON: FIXED EFFECTS

* Poisson fixed-effects estimator with WRONG default standard errors
xtpoisson mdu lcoins ndisease female age lfam child, fe

* Poisson fixed-effects estimator with panel-robust standard errors
xtpoisson mdu lcoins ndisease female age lfam child, fe vce(robust)
* This supplants earlier add-on xtpqml
* xtpqml mdu lcoins ndisease female age lfam child, fe i(id) cluster(id)

*** PANEL POISSON: ESTIMATOR COMPARISON

* Comparison of Poisson panel estimators using panel-robust standard errors
global xlist lcoins ndisease female age lfam child
quietly poisson mdu $xlist, vce(cluster id)
estimates store POOLED
quietly xtpoisson mdu $xlist, pa corr(unstr) vce(robust)
estimates store POPAVE
quietly xtpoisson mdu $xlist, re vce(cluster id)
estimates store RE_GAMMA
quietly xtpoisson mdu $xlist, re normal vce(cluster id)
estimates store RE_NORMAL
quietly xtpoisson mdu $xlist, fe vce(robust)
estimates store FIXED
estimates table POOLED POPAVE RE_GAMMA RE_NORMAL FIXED, equations(1) b(%8.4f) se stats(N ll) stfmt(%8.0f)

********** CLOSE OUTPUT

* log close
* clear
* exit

以下是25篇關於面板（動態或靜態）數據的文章，裡面附上了程序和相關文獻，基本上可以解決大部分面板運用中的問題。

1.動態面板門檻回歸程序公布, 使用方法介紹

2.動態面板分位數估計怎麼做？

3.計量大牛白聚山教授, 是這樣講解動態面板分析的

4.動態面板模型的王冠—系統GMM什麼鬼？

5.面板協整與誤差修正模型的操作程序和講解

6.GMM和工具變量在面板數據中的運用

7.HCW面板數據政策評估方法, panel數據構造對照組

8.截面, 時間和面板的門檻回歸模型, threshold

9.面板數據聚類, 因子分析和主成分分析咋做?

10.偽面板回歸是什麼, 諾貝爾經濟學家推薦使用

11.面板數據中介效應的計算程序, 打開面板這扇門

12.中國工企資料庫各年份指標解釋, 面板數據構建地基

13.面板數據中去中心化的交互項回歸什麼情況

14.空間面板回歸模型: SAR, SDM, SAC和SEM

15.面板交互固定效應是什麼, 白聚山教授推動了最前沿的研究

16.面板數據密度圖和時間趨勢圖韓城攻略和常見操作

17.面板數據計量方法全局脈絡和程序使用指南篇

18.面板數據裡處理多重高維固定效應的神器

19.向量自回歸VAR模型操作指南針,為微觀面板VAR鋪基石

20.非線性面板模型中內生性解決方案以及Stata命令

21.面板門檻回歸Stata程序xthreg和其編寫者

22.面板數據、工具變量選擇和HAUSMAN檢驗的若干問題

23.把動態面板命令講清楚了，對Stata的ado詳盡解釋

24.動態面板回歸和軟體操作，單位根和協整檢驗

25.SVAR模型的起源、識別、估計與應用, 系統講述

下面是一些空間計量相關文章，各位學者可以參考閱讀：

1.空間計量經濟學最新進展和理論框架

2.空間和時間的計量，關注二位國人

3.空間計量模型選擇、估計、權重、檢驗

4.空間計量百科全書式的使用指南的do file公開

5.空間計量百科全書式的使用指南, 只此一份掌握此獨門秘籍

6.空間計量的46頁Notes, 區經相關學者可參閱

7.空間計量軟體代碼資源集錦(Matlab/R/Python/SAS/Stata)

8.用R語言做空間計量, 絕不容錯過的簡明教程

9.R軟體中的空間計量經濟學程序包縱覽

10.空間計量的研究領域模型, 發展階段與最新進展

12.中介和調節效應操作指南, 經典書籍和PPT珍藏版

下面這些短連結文章屬於合集，可以收藏起來閱讀，不然以後都找不到了。

計數數據回歸操作程序和數據公開, 認真學起來

相關焦點

Excel如何對數據進行分段計數？

Excel小技巧|多列數據不重複計數

從零開始學Python數據分析【25】--嶺回歸及LASSO回歸(實戰部分)

數據分析神操作,第一個操作就跪了!(建議收藏)

怎樣用Spark機器學習使用戶數據可視化?

教你學Python32-預測數值型數據:數據回歸

數據分析常用五大軟體介紹

Excel數據分析篇:線性回歸

數據科學&機器學習的筆記1:分類、線性回歸和邏輯回歸

數據統計分析和圖形表達 | 直方圖的方方面面

Mita, 2020小諾獎RDD女王最具影響力的成名作, 附數據和計量程序

數據收集及數據分析工具介紹

從零開始學Python數據分析【22】--線性回歸診斷(第一部分)

Excel怎麼使用COUNT函數進行計數操作

從數據到美化,一文講解Excel數據透視表基本操作

數據分析5大軟體Excel、SAS、R、SPSS、Python優勢分析

從零開始學Python數據分析【21】--線性回歸(實戰部分)

缺失值插補方法程序和數據release, 總結加註解

數據分析軟體介紹(一):Python和Excel

數據科學家要先學邏輯回歸?圖樣圖森破!