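Upcoming Stata frontier training, in depth: two-way fixed effects estimation with heterogeneous treatment effects | sharp and fuzzy regression discontinuity in practice | weak-instrument-robust inference

2021-02-14 Economic Theory and Empirical Modeling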

American Economic Review 2020, 110(9): 2964–2996

https://doi.org/10.1257/aer.20181169

Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects

By Clément de Chaisemartin and Xavier D'Haultfœuille

Linear regressions with period and group fixed effects are widely used to estimate treatment effects. We show that they estimate weighted sums of the average treatment effects (ATE) in each group and period, with weights that may be negative. Due to the negative weights, the linear regression coefficient may for instance be negative while all the ATEs are positive. We propose another estimator that solves this issue. In the two applications we revisit, it is significantly different from the linear regression estimator. (JEL C21, C23, D72, J31, J51, L82)

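Use the "Read the original" link at the end of the post to download the paper, data, and code, and work through them alongside this post. For more detail, see the course outline below:

Opening soon | Winter workshop on Python data mining, Stata skills, and frontier empirical methods

· Dates: January 25-26, 2021 (Deng Xudong)

January 27-28, 2021 (Jiang Ting)

January 29-30, 2021 (Wang Fei)

January 31 - February 1, 2021 (Si Jichun)

Registration is reportedly still open.

Registration inquiries

13967800957 (also WeChat) (Teacher Chen)

19817128496 (also WeChat) (Teacher Ren)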

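Abstract (summary):

Linear regressions with period and group fixed effects are widely used to estimate treatment effects. The paper shows that they estimate a weighted sum of the average treatment effects (ATEs) in each group and period, and that the weights may be negative. Because of the negative weights, the regression coefficient can be negative even though every ATE is positive. The authors propose another estimator that solves this problem. In the two applications they revisit, it differs significantly from the linear regression estimator.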
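To make the negative-weights result concrete, here is a minimal numerical sketch. It is not taken from the post or from the replication files. It assumes a hypothetical balanced design with two groups and three periods and one observation per cell, where group 1 is treated only in period 3 and group 2 is treated in periods 2 and 3. The effect sizes and the group and period effects are made up. In such a design the two-way FE coefficient equals a weighted sum of the cell-level effects, with weights proportional to the residual from regressing treatment on group and period fixed effects.

```python
def double_demean(M):
    """Remove group (row) and period (column) means from a balanced panel."""
    G, T = len(M), len(M[0])
    row = [sum(r) / T for r in M]
    col = [sum(M[g][t] for g in range(G)) / G for t in range(T)]
    grand = sum(row) / G
    return [[M[g][t] - row[g] - col[t] + grand for t in range(T)] for g in range(G)]


def twfe_coefficient(Y, D):
    """Two-way FE coefficient via the within transformation (balanced panel)."""
    Yr, Dr = double_demean(Y), double_demean(D)
    num = sum(y * d for ry, rd in zip(Yr, Dr) for y, d in zip(ry, rd))
    den = sum(d * d for rd in Dr for d in rd)
    return num / den


def twfe_weights(D):
    """Treatment residuals on treated cells, scaled to sum to one over those cells."""
    G, T = len(D), len(D[0])
    Dr = double_demean(D)
    total = sum(Dr[g][t] for g in range(G) for t in range(T) if D[g][t] == 1)
    return [[Dr[g][t] / total if D[g][t] == 1 else 0.0 for t in range(T)]
            for g in range(G)]


# Hypothetical design (assumption): rows are groups, columns are periods.
D = [[0, 0, 1],
     [0, 1, 1]]
# Made-up treatment effects, all positive on treated cells.
delta = [[0.0, 0.0, 1.0],
         [0.0, 1.0, 4.0]]
alpha = [1.0, 2.0]        # made-up group effects
gamma = [0.0, 0.5, 1.5]   # made-up period effects (common trends holds exactly)

Y = [[alpha[g] + gamma[t] + D[g][t] * delta[g][t] for t in range(3)] for g in range(2)]

weights = twfe_weights(D)
beta_fe = twfe_coefficient(Y, D)
weighted_sum = sum(weights[g][t] * delta[g][t] for g in range(2) for t in range(3))

print(weights)                 # weights on the three treated cells
print(beta_fe, weighted_sum)   # the two numbers coincide
```

In this toy design the cell where group 2 is treated in period 3 receives a negative weight, so the coefficient is negative although every effect is positive. In the post's replication code the weights are computed with twowayfeweights, not by hand.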

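Introduction:

A popular way to estimate the effect of a treatment on an outcome is to compare, over time, groups whose exposure to the treatment evolves differently. In practice, this idea is implemented by estimating regressions that control for group and time fixed effects, referred to hereafter as two-way fixed effects (FE) regressions. The authors conducted a survey and found that 19% of all empirical articles published in the American Economic Review (AER) between 2010 and 2012 used a two-way FE regression to estimate the effect of a treatment on an outcome. When the treatment effect is constant across groups and over time, such regressions estimate that effect under the standard common trends assumption. However, it is often implausible that the treatment effect is constant. For example, the effect of the minimum wage on employment may differ across US counties and may change over time. The paper studies the properties of two-way FE regressions when the constant effect assumption is violated.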

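Conclusion:

Nearly 20% of the empirical articles published in the AER between 2010 and 2012 use regressions with group and period fixed effects to estimate treatment effects. Under a common trends assumption, these regressions estimate a weighted sum of the treatment effects in each group and period. The weights may be negative: in one application, more than 40% of them are. Negative weights are a problem when the treatment effect is heterogeneous across groups or over time. In that case, the treatment coefficient in these regressions can be negative even though the treatment effect is positive in every group and period. The paper therefore proposes a new estimator that addresses this problem. It estimates the treatment effect in the groups that switch treatment, at the time when they switch, and it does not rely on any treatment effect homogeneity condition. It is computed by the fuzzydid and did_multiplegt Stata packages. In the two applications revisited, this estimator differs from the two-way fixed effects estimator by a statistically significant and economically meaningful margin.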
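The idea behind the proposed estimator can be sketched in a few lines. The function below is a simplified illustration, not the fuzzydid or did_multiplegt implementation: it assumes a binary treatment, a balanced panel with one observation per group and period, and no controls. For each pair of consecutive periods it compares the outcome change of groups that switch treatment with the outcome change of groups whose treatment stays at the switchers' previous value, then averages those comparisons with weights given by the number of switchers. The toy data are made up.

```python
def did_m(Y, D):
    """DID_M-style estimate for a binary treatment with one observation per cell."""
    G, T = len(D), len(D[0])
    total, n_switchers = 0.0, 0
    for t in range(1, T):
        dY = [Y[g][t] - Y[g][t - 1] for g in range(G)]

        def mean_change(prev, curr):
            # Mean outcome change among groups moving from `prev` to `curr`.
            vals = [dY[g] for g in range(G)
                    if D[g][t - 1] == prev and D[g][t] == curr]
            return (sum(vals) / len(vals), len(vals)) if vals else (0.0, 0)

        join, n_join = mean_change(0, 1)     # groups switching into treatment
        leave, n_leave = mean_change(1, 0)   # groups switching out of treatment
        stay0, n_stay0 = mean_change(0, 0)   # stable untreated groups
        stay1, n_stay1 = mean_change(1, 1)   # stable treated groups

        if n_join and n_stay0:
            total += n_join * (join - stay0)
            n_switchers += n_join
        if n_leave and n_stay1:
            total += n_leave * (stay1 - leave)
            n_switchers += n_leave
    if n_switchers == 0:
        raise ValueError("no switchers with a valid comparison group")
    return total / n_switchers


# Made-up example: three groups, three periods; the third group is never treated.
D = [[0, 0, 1],
     [0, 1, 1],
     [0, 0, 0]]
delta = [[0.0, 0.0, 3.0],
         [0.0, 1.0, 4.0],
         [0.0, 0.0, 0.0]]
alpha = [1.0, 2.0, 3.0]   # made-up group effects
gamma = [0.0, 0.5, 1.5]   # made-up period effects (common trends holds exactly)
Y = [[alpha[g] + gamma[t] + D[g][t] * delta[g][t] for t in range(3)] for g in range(3)]

estimate = did_m(Y, D)
print(estimate)
```

Here the estimate equals the average of the effects in the two cells where a group switches into treatment (1 and 3), regardless of how large the effect is in the cell where treatment merely continues.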

Going from the first-difference sample to the panel sample in Gentzkow et al., and creating variables

set matsize 10000
* Indicate below the folder where you have put all the .ado files
adopath ++ "H:\stata_aer.9Two-Way.Fixed.Effects.Estimators.with.Heterogeneous.Treatment.Effects/Ado files"
cd H:\stata_aer.9Two-Way.Fixed.Effects.Estimators.with.Heterogeneous.Treatment.Effects
* Indicate below the folder where you have put the .dta files
use voting_cnty_clean.dta, clear
set more off
/*a. Going from the first difference to the panel sample in Gentzkow, and creating variables */
qui{
    gen sample=0
    gen tminus1sample=0
    forvalue i=1872(4)1928 {
        replace sample=1 if (year==`i')&mainsample==1
        sort cnty90 year
        replace tminus1sample=1 if sample==0&sample[_n+1]==1&cnty90==cnty90[_n+1]&year==`i'-4
        replace sample=1 if sample[_n+1]==1&cnty90==cnty90[_n+1]&year==`i'-4
    }
    tab sample mainsample
    keep if sample==1
    drop sample
    *First differences, lags and leads of treatment and outcome variable
    xtset cnty90 year
    gen numdailies_l1=l4.numdailies
    gen prestout_l1=l4.prestout
    gen changedailies=numdailies-numdailies_l1
    gen changeprestout=prestout-prestout_l1
    gen changedailies_l1=l4.changedailies
    gen changeprestout_l1=l4.changeprestout
    gen changedailies_l2=l8.changedailies
    gen changedailies_for=f4.changedailies
    * Creating the state dummies, to be used as controls
    tab st, gen(st)
    keep if year>=1868
    keep if year<=1928
    /* b. First difference and fixed effects regression (the results are displayed at the end of the program) */
    areg changeprestout changedailies if mainsample, absorb(styr) cluster(cnty90)
    scalar betafd=_b[changedailies]
    scalar se_fd=_se[changedailies]
    scalar N_fd=e(N)
    tab styr, gen(styr)
    areg prestout i.year numdailies styr1-styr666, absorb(cnty90) cluster(cnty90)
    scalar betafe=_b[numdailies]
    scalar se_fe=_se[numdailies]
    scalar N_fe=e(N)
    * FE and FD give significantly different results?
    set seed 1
    scalar diff_fe_fd=betafd-betafe
    matrix A=0
    forvalue i=1/100{
        preserve
        bsample, cluster(cnty90)
        areg changeprestout changedailies if mainsample, absorb(styr)
        scalar beta2=_b[changedailies]
        areg prestout i.year numdailies styr1-styr666, absorb(cnty90)
        matrix A=A\beta2-_b[numdailies]
        restore
    }
    preserve
    drop _all
    svmat A
    drop if _n==1
    sum A1
    scalar se_diff_fe_fd = r(sd)
    scalar t_st_diff_fe_fd = diff_fe_fd/r(sd)
    restore
}
Preparing the variables before computing the weights attached to the first-difference and fixed effects regressions
qui{
    keep prestout numdailies changeprestout changedailies cnty90 styr styr1-styr666 ///
        year numdailies tminus1sample mainsample st st1-st48
    /* Replacing outcome and treatment by missings for 1868.
       For weights computation, important to replace changeprestout and changedailies
       by missings for observations in the sample in levels but not in the first-diff
       sample. On the other hand, important to keep those variables as such for
       placebos => stored in vars changeprestout_placebo and changedailies_placebo. */
    gen changeprestout_placebo=changeprestout
    gen changedailies_placebo=changedailies
    replace changeprestout=. if tminus1sample==1
    replace changedailies=. if tminus1sample==1
}
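Computing the weights attached to the FE regression (see the 2nd paragraph on p. 21). According to the post, the results indicate that the regression estimates are not robust to heterogeneous treatment effects.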
twowayfeweights prestout cnty90 year numdailies, type(feTR) controls(styr1-styr666)
/* e. Computing weights attached to FD regression, see 2nd paragraph p.21  Also: link b/w weights and year, see 3rd paragraph p.21*/
twowayfeweights changeprestout cnty90 year changedailies numdailies, type(fdTR) ///
    controls(styr1-styr666) test_random_weights(year)
Computation of the DID_M
/* f. Computation of the DID_M */
/* Defining the "super groups" for each year by creating 2 dummies for increase    & decrease, and defining the final sample as observations for which change in    number of newspapers can be computed */
qui{
    xtset cnty90 year
    gen G_T=.
    gen sample=.
    forvalue i=1872(4)1928 {
        gen group`i'=(changedailies>0)-(changedailies<0) if (year==`i')& ///
            changedailies !=. & mainsample==1 & changeprestout !=.
        replace G_T=group`i' if (year==`i')&changedailies!=.&mainsample==1&changeprestout!=.
        replace sample=1 if (year==`i')&changedailies!=.&mainsample==1&changeprestout!=.
        gen group`i'_increase=(group`i'>0) if (year==`i')&group`i'!=.
        gen group`i'_decrease=(group`i'<0) if (year==`i')&group`i'!=.
        sort cnty90 year
        replace group`i'=group`i'[_n+1] if cnty90==cnty90[_n+1]&year==`i'-4
        replace sample=1 if cnty90==cnty90[_n+1]&year==`i'-4&sample[_n+1]==1
        replace group`i'_increase=group`i'_increase[_n+1] if cnty90==cnty90[_n+1]&year==`i'-4
        replace group`i'_decrease=group`i'_decrease[_n+1] if cnty90==cnty90[_n+1]&year==`i'-4
        gen year`i'=(year==`i')
    }
    keep if sample==1
    tab year
    gen G_T_for=f4.G_T
    sum sample G_T G_T_for
    tab G_T
}
Point estimate of the DID_M (again, the results are displayed at the end of the program), computed with fuzzy DID
* Point estimate of the DID_M (again, results are displayed at the end of the program)
discard
set seed 1
fuzzydid prestout G_T G_T_for year numdailies, tc newcateg(0 1 2 1000) ///
    qualitative(st1-st48) nose
scalar did_m=el(e(b_LATE),1,1)
* scalar se_did_m=el(e(se_LATE),1,1)
scalar N_did_m=e(N)
* Testing difference between DID_M and fd estimator in Gentzkow et al. 
set seed 1
scalar diff_did_m_fd = did_m - betafd
matrix A=0,0
forvalue i=1/100{
    preserve
    bsample, cluster(cnty90)
    fuzzydid prestout G_T G_T_for year numdailies, tc newcateg(0 1 2 1000) ///
        qualitative(st1-st48) nose
    scalar DIDM_bs=el(e(b_LATE),1,1)
    areg changeprestout changedailies if mainsample, absorb(styr)
    scalar betafd_bs=_b[changedailies]
    matrix A=A\DIDM_bs,betafd_bs
    restore
}
preserve
drop _all
svmat A
drop if _n==1
sum A1
scalar se_did_m=r(sd)
gen A3=A1-A2
sum A3
scalar se_diff_did_m_fd = r(sd)
scalar t_st_diff_did_m_fd = diff_did_m_fd/r(sd)
restore
Computing the placebo DID_M
// g. Computation of placebo DID_M in Gentzkow
/* Restricting sample to (g,t)s for which treatment stable between t-2 and t-1,    and adding the t-1 obs of those (g,t)s. */
xtset cnty90 year
gen fd_numdailies_l1=l4.changedailies_placebo
gen prestout_l1=l4.prestout
gen Gb_placebo=.
forvalue i=1872(4)1928 {
    replace Gb_placebo=(changedailies_placebo>0)-(changedailies_placebo<0) if ///
        (year==`i') & changedailies_placebo!=. & mainsample==1 & fd_numdailies_l1==0 & ///
        changeprestout_placebo!=.
}
gen Gf_placebo=f4.Gb_placebo
/* Estimating the DID_M again, on the subsamples of groups whose treatment is    stable between T-2 and T-1. */
set seed 1
fuzzydid prestout Gb_placebo Gf_placebo year numdailies, tc ///
    newcateg(0 1 2 1000) qualitative(st1-st48) breps(100) cluster(cnty90)
scalar did_m_sub=el(e(b_LATE),1,1)
scalar se_did_m_sub=el(e(se_LATE),1,1)
scalar N_did_m_sub=e(N)

/* Placebo DID_M, on the subsamples of groups whose treatment is stable
   between T-2 and T-1. */
set seed 1
fuzzydid prestout_l1 Gb_placebo Gf_placebo year numdailies, tc ///
    newcateg(0 1 2 1000) qualitative(st1-st48) breps(100) cluster(cnty90)
scalar did_m_pl=el(e(b_LATE),1,1)
scalar se_did_m_pl=el(e(se_LATE),1,1)
scalar N_did_m_pl=e(N)
matrix res = (betafe, se_fe, N_fe\betafd, se_fd, N_fd\ did_m, se_did_m, ///
    N_did_m\did_m_pl, se_did_m_pl, N_did_m_pl\did_m_sub, se_did_m_sub, ///
    N_did_m_sub)
matrix rownames res=betaFE betaFD didM didM_pl didM_sub
matrix colnames res=estimate se N
matrix res_diff = (diff_fe_fd, se_diff_fe_fd, t_st_diff_fe_fd \ diff_did_m_fd, ///
    se_diff_did_m_fd, t_st_diff_did_m_fd)
matrix rownames res_diff = fe_vs_fd did_m_vs_fd
matrix colnames res_diff = value se t-stat
For an analysis of the underlying theory, see:
https://www.stata.com/meeting/chicago19/
https://www.stata.com/meeting/chicago19/slides/chicago19_Goodman-Bacon.pdf
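What is the two-way fixed effects estimator estimating under a DID strategy?
Latest AER paper digest (with Stata walkthrough): two-way fixed effects estimators with heterogeneous treatment effects
What is fuzzy DID:
As is well known, difference-in-differences (DID) is a method for estimating the effect of an intervention. In its basic setup, the control group is untreated in both periods, while the treatment group is untreated in the first period and treated in the second. Actual applications often depart from this sharp DID design. In a fuzzy DID, there may be no group whose treatment status changes sharply (no "clean" treatment group), and there may be no group that remains fully untreated (no "clean" control group). The study discussed here addresses estimation in such fuzzy DID designs.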
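As a rough illustration of the idea (a sketch, not the fuzzydid implementation), the simplest fuzzy DID estimand in this literature, the Wald-DID, divides the DID of the outcome by the DID of the treatment rate. The group labels and cell means below are made up for the example, and the ratio has a causal interpretation only under additional assumptions set out in the paper. The replication code in this post calls fuzzydid with the tc option, a time-corrected variant that this sketch does not implement.

```python
def wald_did(y, d):
    """Wald-DID: DID of mean outcomes divided by DID of mean treatment rates.

    y and d map (group, period) -> cell mean, with group in {"treat", "control"}
    and period in {0, 1}. These labels are hypothetical, chosen for this sketch.
    """
    did_y = (y["treat", 1] - y["treat", 0]) - (y["control", 1] - y["control", 0])
    did_d = (d["treat", 1] - d["treat", 0]) - (d["control", 1] - d["control", 0])
    if did_d == 0:
        raise ValueError("the treatment rate evolves identically in both groups")
    return did_y / did_d


# Made-up cell means: the treatment rate rises from 0.25 to 0.75 in the
# "treat" group and stays at 0.25 in the "control" group, so neither group
# is fully treated or fully untreated.
d = {("treat", 0): 0.25, ("treat", 1): 0.75,
     ("control", 0): 0.25, ("control", 1): 0.25}
y = {("treat", 0): 10.0, ("treat", 1): 13.0,
     ("control", 0): 8.0, ("control", 1): 9.0}

estimate = wald_did(y, d)
print(estimate)
```

With these numbers the outcome DID is 2 and the treatment-rate DID is 0.5, so the ratio is 4.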
A walkthrough of the paper "Fuzzy Differences-in-Differences," which was published in The Review of Economic Studies (RES) in April 2018. The paper's authors are C. de Chaisemartin and X. D'Haultfœuille.
RES paper walkthrough: fuzzy differences-in-differences
http://www.jijitang.com/article/5c4c23e7a92bdaba7c0ea503/RES-lun-wen-jie-xi-mo-hu-di-shuang-zhong-cha-fen-fa
Supplementary data (available from the download page of the paper above: data and Stata programs, with parts implemented in Matlab)
https://academic.oup.com/restud/article/doi/10.1093/restud/rdx049/4096388/Fuzzy-DifferencesinDifferences
https://academic.oup.com/restud/article-abstract/85/2/999/4096388?redirectedFrom=fulltext
It can be installed online:
Run help fuzzydid to learn the fuzzy DID syntax:
Title
    fuzzydid -- Estimation with fuzzy differences-in-differences designs
Syntax
    fuzzydid Y G T D [if] [in] [, did tc cic lqte newcateg(numlist) numerator partial nose cluster(varname) breps(#) eqtest tagobs continuous(varlist) qualitative(varlist) modelx(reg1 reg2 reg3) sieves sieveorder(#)]
    Y is the outcome variable.
    G is the group variable or variables. We refer to section 4.2 of Chaisemartin, D'Haultfœuille, and Guyonvarch (2019) for more details on how to construct this variable or these variables.
    T is the time-period variable.
    D is the treatment variable. It can be any ordered variable.
Related figures from Andrew Goodman-Bacon

https://t.co/S4uEJl3sm6?amp=1
Weights attached to the 2x2 DID estimators:
