*- Pre-treatment data
gen byte period = 0          // pre-treatment
label var period "Post-treatment indicator"
gen long id = _n
label var id "Individual id"
gen byte gender = uniform() > 0.5
label var gender "Gender"
gen age = uniform()
label var age "Age"
gen fitness = normal(gender*0.25 - age + invnorm(uniform())*0.1)
label var fitness "Fitness level"
gen weight = normal(-gender*0.25 + age*0.25 - fitness*0.25 + invnorm(uniform())*0.1)
label var weight "Body weight"
gen treated = normal(fitness + invnorm(uniform())*0.25) > 0.73
label var treated "Treatment group indicator"
save `tmp'

*- Post-treatment data
replace period = 1           // after treatment
replace weight = weight + weight*(uniform()-0.5)*0.2 - weight*(fitness-0.5)*0.25

*- Combine pre- and post-treatment data
append using `tmp'
sort id period
replace weight = int(30.5+100*weight)
replace age = int(18.5+50*age)
gen effect = treated*period  // treatment effect (interaction term for DiD)
label var effect "Treatment effect interaction"
order id treated period effect gender age weight fitness

*- Propensity score
probit treated age gender weight if period == 0  // omitting "unobserved" selection
predict score                // propensity score
label var score "Propensity score"

des

This gives us a dataset on health status. Its basic description is as follows:

  obs:         4,000
 vars:             9                          6 Nov 2020 18:24
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
id              long    %12.0g                Individual id
treated         float   %9.0g                 Treatment group indicator
period          byte    %8.0g                 Post-treatment indicator
effect          float   %9.0g                 Treatment effect interaction
gender          byte    %8.0g                 Gender
age             float   %9.0g                 Age
weight          float   %9.0g                 Body weight
fitness         float   %9.0g                 Fitness level
score           float   %9.0g                 Propensity score
-------------------------------------------------------------------------------
Sorted by: id period
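As a reading aid, the two quantities set up by the code above can be written out. This is only a sketch in notation that does not appear in the original: $\Phi$ is the standard normal CDF, the $\hat\beta$ are the probit coefficients estimated on the period == 0 observations, $i$ indexes individuals and $t$ the period. The second line is the conventional difference-in-differences regression that the interaction term effect is built for; this section does not show that regression being run, so treat it as an assumption about the intended use.

```latex
\begin{align*}
\text{score}_{it} &= \Phi\left(\hat\beta_0 + \hat\beta_1\,\text{age}_{i}
                     + \hat\beta_2\,\text{gender}_{i}
                     + \hat\beta_3\,\text{weight}_{it}\right) \\
\text{weight}_{it} &= \alpha_0 + \alpha_1\,\text{treated}_i + \alpha_2\,\text{period}_t
                     + \delta\,\underbrace{\text{treated}_i \times \text{period}_t}_{\texttt{effect}}
                     + \varepsilon_{it}
\end{align*}
```

Because predict is called without an if qualifier, score is filled in for both periods, although the coefficients come from the pre-treatment observations only.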
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
_match          long    %12.0g                match id
_weight         double  %10.0g                pweight
_distance       double  %10.0g                neighbor distance
     Support |    Treated    Control
-------------+----------------------
       Total |        387       1613
     Without |          0          0
        With |        387       1613
-------------+----------------------
     Matched |        387        448
   Clustered |          0          0
    Clusters |        387        448
Using the report(varlist) option
*- Drop the variables left by a previous ultimatch run (_match, _weight, _distance)
cap drop _*
ultimatch score if period == 0, treated(treated) report(gender age)

cap drop _*
ultimatch score if period == 0, treated(treated) report(gender age) unmatched
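The report(varlist) option reports, for each variable in varlist, the difference between the treated and control groups after matching, together with the corresponding t-test. Its sub-option unmatched additionally reports the same differences and t-tests before matching.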
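. ultimatch score if period == 0, treated(treated) report(gender age)

Nearest Neighbor
     Support |    Treated     Control
-------------+------------------------
       Total |        387        1613
     Without |          0           0
        With |        387        1613
-------------+------------------------
     Matched |        387         448
   Clustered |          0           0
    Clusters |        387         448
-------------+------------------------+-----------------------------
     Matched |    Treated     Control |    StdErr       t    p>|t|
-------------+------------------------+-----------------------------
      gender | .599483204  .589147287 |  .0390341    0.26    0.791
         age | 34.8165375  34.6925065 |   .967815    0.13    0.898
--------------------------------------------------------------------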
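. ultimatch score if period == 0, treated(treated) report(gender age) unmatched

Nearest Neighbor
     Support |    Treated     Control
-------------+------------------------
       Total |        387        1613
     Without |          0           0
        With |        387        1613
-------------+------------------------
     Matched |        387         448
   Clustered |          0           0
    Clusters |        387         448
-------------+------------------------+-----------------------------
   Unmatched |    Treated     Control |    StdErr       t    p>|t|
-------------+------------------------+-----------------------------
      gender | .599483204  .458152511 |  .0281268    5.02    0.000
         age | 34.8165375  45.2461252 |   .783574  -13.31    0.000
-------------+------------------------+-----------------------------
     Matched |    Treated     Control |    StdErr       t    p>|t|
-------------+------------------------+-----------------------------
      gender | .599483204  .589147287 |  .0390341    0.26    0.791
         age | 34.8165375  34.6925065 |   .967815    0.13    0.898
--------------------------------------------------------------------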
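To make the balance tables easier to read, the t column can be checked by hand. The numbers in the output are consistent with the difference in group means divided by the reported StdErr. How ultimatch computes StdErr internally is not stated in this section, so the relation below is inferred from the printed figures rather than taken from the command's documentation.

```latex
\begin{align*}
t &= \frac{\bar{x}_{\text{Treated}} - \bar{x}_{\text{Control}}}{\text{StdErr}} \\[4pt]
t_{\text{gender, matched}} &= \frac{0.599483204 - 0.589147287}{0.0390341}
   = \frac{0.010335917}{0.0390341} \approx 0.26 \\[4pt]
t_{\text{age, unmatched}} &= \frac{34.8165375 - 45.2461252}{0.783574}
   = \frac{-10.4295877}{0.783574} \approx -13.31
\end{align*}
```

Read this way, matching shrinks the age gap between the groups from about 10.4 years to about 0.12 years, and neither gender nor age differs significantly after matching.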
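By default, ultimatch treats observations at the same distance as a single observation. The single option instead draws one observation at random from the nearest neighbors to serve as the counterfactual. The greedy option additionally requests matching without replacement.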
*- single: draw one of the nearest neighbors at random
cap drop _*
ultimatch score if period == 0, treated(treated) report(gender age weight) single
sort period score treated
list id treated _match _distance if _match <= 3

*- single greedy: as above, but matching without replacement
cap drop _*
ultimatch score if period == 0, treated(treated) report(gender age weight) single greedy
sort period score treated
list id treated _match _distance if _match <= 3
cap drop if _copy == 1
cap drop _*
ultimatch score if period == 0, treated(treated) report(gender age weight) copy
tab _copy
tab _weight if treated == 1

cap drop if _copy == 1
cap drop _*
ultimatch score if period == 0, treated(treated) report(gender age weight) copy full
tab _copy
tab _weight if treated == 1
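Method 2: Radius matching
Radius matching fixes a radius in advance and takes as matches all control observations that lie within that radius of a treated observation. The radius must be positive. The smaller the radius, the stricter the matching requirement.

Method 5: Mahalanobis distance
Because the propensity score estimated in the first stage of propensity score matching is itself uncertain, the work of Abadie and Imbens returns to the simpler Mahalanobis distance and performs k-nearest-neighbor matching with replacement, allowing ties. Since inexact matching is generally biased, they propose a bias correction: the bias is estimated by regression and then used to obtain a bias-corrected matching estimator.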
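The two methods above can be stated compactly. The notation here is an assumption introduced for illustration and is not taken from the original: $D_j$ is the treatment indicator, $p$ the propensity score, $r$ the radius, $x$ the covariate vector, $S$ the sample covariance matrix of the covariates, $J_k(i)$ the set of $k$ nearest controls of treated unit $i$ (possibly larger than $k$ when ties are kept), and $\hat\mu_0$ a regression of the outcome on the covariates fitted on control observations. Whether the radius bound is inclusive is a convention and may differ between implementations.

```latex
\begin{align*}
\text{Radius matching:} \quad
  C_r(i) &= \{\, j : D_j = 0,\ |p_i - p_j| \le r \,\}, \qquad r > 0 \\[4pt]
\text{Mahalanobis distance:} \quad
  d_M(x_i, x_j) &= \sqrt{(x_i - x_j)^{\top} S^{-1} (x_i - x_j)} \\[4pt]
\text{Bias-corrected counterfactual:} \quad
  \hat{Y}_i(0) &= \frac{1}{|J_k(i)|} \sum_{j \in J_k(i)}
     \left( Y_j + \hat{\mu}_0(x_i) - \hat{\mu}_0(x_j) \right)
\end{align*}
```

Shrinking $r$ can only remove controls from $C_r(i)$, which is why a smaller radius is a stricter requirement. In the last line, the term $\hat{\mu}_0(x_i) - \hat{\mu}_0(x_j)$ is the regression-based adjustment for the remaining covariate gap between a treated unit and its inexact match.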