I won't go into the theory of PSM here. This post discusses only one reason why `teffects psmatch` and `psmatch2` can produce different estimates in Stata.
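Stata 13 and later ship with the built-in `teffects` suite, which can also be used for PSM. In practice the user-written `psmatch2` (installed from SSC) is used more often for PSM, but its standard errors are problematic. Every time it reports results it prints `Note: S.E. does not take into account that the propensity score is estimated.` The standard errors from `teffects psmatch` (labeled "AI Robust" in its output) do take the estimated propensity score into account, so you can rely on them.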
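The article Propensity Score Matching in Stata using teffects (link: https://www.ssc.wisc.edu/sscc/pubs/stata_psmatch.htm) explains `psmatch2` and `teffects psmatch` in considerable detail. It also points out that, with suitably adjusted options, the two commands should produce the same coefficient: `psmatch2 t x1 x2, out(y) logit ate` and `teffects psmatch (y) (t x1 x2), atet` should yield the same ATT. Yet some researchers who run `psmatch2` and `teffects psmatch` on the same data find that the estimates differ, sometimes enough to reverse the conclusion.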
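The reason lies in how ties are handled in nearest-neighbor matching. Suppose several control observations are exactly the same (minimum) distance from a treated observation. Without the `ties` option, `psmatch2` uses the first such control it encounters as the match, so the sort order of the data determines which controls are matched and therefore affects the estimate. With `ties`, `psmatch2` instead uses the average outcome of all tied controls as the match for that treated observation. This second approach is the one `teffects psmatch` takes.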
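The two tie-handling rules can be sketched in a few lines of Python. This is a simplified model written for illustration only, assuming 1:1 nearest-neighbor matching on a scalar score with exact ties. The function names and toy numbers are hypothetical, and this is not the actual code of `psmatch2` or `teffects psmatch`. In particular, "first encountered" is modelled as "first in list order", a stand-in for however `psmatch2` scans the sorted data.

```python
# Simplified model of 1-nearest-neighbor matching on the propensity score,
# contrasting the two tie-handling rules. Hypothetical names and data.

def nearest_controls(p, controls, ties):
    """Indices of the control(s) matched to a treated unit with score p.

    controls: list of (pscore, outcome) pairs in the current sort order.
    ties=False keeps only the first control at the minimum distance, so the
    result depends on the order of `controls` (psmatch2 without `ties`).
    ties=True keeps every control at the minimum distance (psmatch2 with
    `ties`, and teffects psmatch).
    """
    dist = [abs(p - pc) for pc, _ in controls]
    best = min(dist)
    tied = [j for j, d in enumerate(dist) if d == best]
    return tied if ties else tied[:1]


def att(treated, controls, ties):
    """Mean over treated units of y_i minus the mean outcome of its matches."""
    effects = []
    for p, y in treated:
        idx = nearest_controls(p, controls, ties)
        y_match = sum(controls[j][1] for j in idx) / len(idx)
        effects.append(y - y_match)
    return sum(effects) / len(effects)


treated = [(0.25, 10.0), (0.75, 103.0)]
controls_a = [(0.25, 4.0), (0.25, 8.0), (0.75, 100.0)]
controls_b = [(0.25, 8.0), (0.25, 4.0), (0.75, 100.0)]  # same data, re-sorted

print(att(treated, controls_a, ties=False))  # 4.5
print(att(treated, controls_b, ties=False))  # 2.5
print(att(treated, controls_a, ties=True))   # 3.5
print(att(treated, controls_b, ties=True))   # 3.5
```

Re-sorting the same controls moves the no-ties ATT from 4.5 to 2.5. The ties rule gives 3.5 in both orders, which mirrors the sort-order sensitivity described above.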
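So if the two commands give you different results, check whether your `psmatch2` call includes the `ties` option. For example, the ATT from `psmatch2 treat xlist, out(y) logit ate` may differ substantially from that of `teffects psmatch (y) (treat xlist), atet`. A missing `ties` option is one possible reason for such a discrepancy.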
```
. clear all
. frames create data
. frames change data
. webuse cattaneo2
. frames copy data frames1
. frames change frames1
. sum bweight mbsmoke mmarried c.mage##c.mage fbaby medu

    Variable |        Obs        Mean   Std. Dev.        Min        Max
-------------+---------------------------------------------------------
     bweight |      4,642     3361.68    578.8196        340       5500
     mbsmoke |      4,642    .1861267    .3892508          0          1
    mmarried |      4,642    .6996984    .4584385          0          1
        mage |      4,642    26.50452    5.619026         13         45
             |
     c.mage#|
      c.mage |      4,642    734.0564    305.2242        169       2025
-------------+---------------------------------------------------------
             |
       fbaby |      4,642    .4379578    .4961893          0          1
        medu |      4,642    12.68957    2.520661          0         17
```
1. Logit

```
. logit mbsmoke mmarried c.mage##c.mage fbaby medu

Iteration 0:   log likelihood = -2230.7484
Iteration 1:   log likelihood =  -2053.769
Iteration 2:   log likelihood = -2043.2897
Iteration 3:   log likelihood = -2043.2504
Iteration 4:   log likelihood = -2043.2504

Logistic regression                             Number of obs     =      4,642
                                                LR chi2(5)        =     375.00
                                                Prob > chi2       =     0.0000
Log likelihood = -2043.2504                     Pseudo R2         =     0.0841

-------------------------------------------------------------------------------
      mbsmoke |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
     mmarried |  -1.145706   .0918962   -12.47   0.000     -1.32582    -.965593
         mage |    .321518   .0638472     5.04   0.000     .1963798    .4466563
              |
c.mage#c.mage |  -.0060368   .0011849    -5.09   0.000    -.0083592   -.0037144
              |
        fbaby |  -.3864258   .0880445    -4.39   0.000    -.5589898   -.2138618
         medu |  -.1420833   .0173215    -8.20   0.000    -.1760328   -.1081338
        _cons |  -2.950915   .8102504    -3.64   0.000    -4.538976   -1.362853
-------------------------------------------------------------------------------

. predict pr_
(option pr assumed; Pr(mbsmoke))
```
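After `logit`, `predict pr_` returns the inverse logit of the linear index. The small Python sketch below evaluates that formula with the coefficients from the table above. It uses a hypothetical mother (married, aged 30, not a first baby, 12 years of education); these values are made up for illustration. The sketch also shows why ties are common in this example. The covariates in this model appear to be binary or integer-valued (see the Min/Max columns of the summary), so mothers with the same covariate pattern get exactly the same propensity score.

```python
import math

# Coefficients copied from the logit output above (model for mbsmoke).
COEF = {
    "mmarried": -1.145706,
    "mage": 0.321518,
    "mage_sq": -0.0060368,  # the c.mage#c.mage term
    "fbaby": -0.3864258,
    "medu": -0.1420833,
    "_cons": -2.950915,
}


def pscore(mmarried, mage, fbaby, medu):
    """Pr(mbsmoke = 1 | x) = 1 / (1 + exp(-xb)), the logit propensity score."""
    xb = (COEF["_cons"]
          + COEF["mmarried"] * mmarried
          + COEF["mage"] * mage
          + COEF["mage_sq"] * mage * mage
          + COEF["fbaby"] * fbaby
          + COEF["medu"] * medu)
    return 1.0 / (1.0 + math.exp(-xb))


# Two hypothetical mothers with an identical covariate pattern.
p1 = pscore(mmarried=1, mage=30, fbaby=0, medu=12)
p2 = pscore(mmarried=1, mage=30, fbaby=0, medu=12)
print(round(p1, 4))  # 0.1695
print(p1 == p2)      # True: identical scores, hence tied nearest neighbors
```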
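2. psmatch2

2.1 Using pscore()

```
. psmatch2 mbsmoke, out(bweight) pscore(pr_) ate logit

----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
         bweight  Unmatched | 3137.65972   3412.91159  -275.251871   21.4528037   -12.83
                        ATT | 3137.65972   3334.84259   -197.18287   55.6185293    -3.55
                        ATU | 3412.91159   3164.00185  -248.909741            .        .
                        ATE |                          -239.281991            .        .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.

           | psmatch2:
 psmatch2: |   Common
 Treatment |  support
assignment | On suppor |     Total
-----------+-----------+----------
 Untreated |     3,778 |     3,778
   Treated |       864 |       864
-----------+-----------+----------
     Total |     4,642 |     4,642
```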
2.2 General syntax

```
. psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, out(bweight) ate logit n(1)

(logistic regression output omitted; identical to the logit in section 1)

----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
         bweight  Unmatched | 3137.65972   3412.91159  -275.251871   21.4528037   -12.83
                        ATT | 3137.65972   3334.84259   -197.18287   55.6185293    -3.55
                        ATU | 3412.91159   3164.00185  -248.909741            .        .
                        ATE |                          -239.281991            .        .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.

           | psmatch2:
 psmatch2: |   Common
 Treatment |  support
assignment | On suppor |     Total
-----------+-----------+----------
 Untreated |     3,778 |     3,778
   Treated |       864 |       864
-----------+-----------+----------
     Total |     4,642 |     4,642
```
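3. teffects psmatch

```
. teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), atet nn(1)

Treatment-effects estimation                   Number of obs      =      4,642
Estimator      : propensity-score matching     Matches: requested =          1
Outcome model  : matching                                     min =          1
Treatment model: logit                                        max =         74
---------------------------------------------------------------------------------------
                      |              AI Robust
              bweight |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
----------------------+----------------------------------------------------------------
ATET                  |
              mbsmoke |
(smoker vs nonsmoker) |  -236.7848   26.57789    -8.91   0.000    -288.8765    -184.693
---------------------------------------------------------------------------------------
```

So the two commands give different estimates: `psmatch2` reports an ATT of -197.18287, while `teffects psmatch` reports -236.7848.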
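4. Do the results always differ?

```
. frames create frame2
. frames change frame2
. use http://ssc.wisc.edu/sscc/pubs/files/psm
. sum

    Variable |        Obs        Mean   Std. Dev.        Min        Max
-------------+---------------------------------------------------------
          x1 |      1,000    -.012963    1.000053    -3.6593   3.084742
          x2 |      1,000   -.0246025    1.034555  -3.363018   3.399474
           t |      1,000        .333    .4715224          0          1
           y |      1,000    .3474242    1.957462  -5.494524   6.873514

. psmatch2 t x1 x2, out(y) logit ate

Logistic regression                             Number of obs     =      1,000
                                                LR chi2(2)        =     222.78
                                                Prob > chi2       =     0.0000
Log likelihood = -524.89072                     Pseudo R2         =     0.1751

------------------------------------------------------------------------------
           t |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .9068298   .0885341    10.24   0.000     .7333062    1.080353
          x2 |   .8100408   .0816962     9.92   0.000     .6499192    .9701624
       _cons |  -.8528442   .0788823   -10.81   0.000    -1.007451   -.6982378
------------------------------------------------------------------------------

----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
               y  Unmatched |  1.8910736  -.423243358   2.31431696   .109094342    21.21
                        ATT |  1.8910736   .930722886   .960350715   .168252917     5.71
                        ATU |-.423243358   .625587554   1.04883091            .        .
                        ATE |                           1.01936701            .        .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.

           | psmatch2:
 psmatch2: |   Common
 Treatment |  support
assignment | On suppor |     Total
-----------+-----------+----------
 Untreated |       667 |       667
   Treated |       333 |       333
-----------+-----------+----------
     Total |     1,000 |     1,000

. teffects psmatch (y) (t x1 x2), atet

Treatment-effects estimation                   Number of obs      =      1,000
Estimator      : propensity-score matching     Matches: requested =          1
Outcome model  : matching                                     min =          1
Treatment model: logit                                        max =          1
------------------------------------------------------------------------------
             |              AI Robust
           y |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATET         |
           t |
    (1 vs 0) |   .9603507   .1204748     7.97   0.000     .7242245    1.196477
------------------------------------------------------------------------------
```

This time the two commands give the same estimate: the ATT is .960350715 from `psmatch2` and .9603507 from `teffects psmatch`.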
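The `Matches: min/max` line in the `teffects psmatch` header points to the difference between the two data sets. With one requested match, the maximum number of matches is 74 for cattaneo2 but only 1 for the psm data, where x1 and x2 look continuous. The Python sketch below counts, for each treated unit, how many controls sit at the minimum distance. The score values are hypothetical. Exact equality is used as a simplification, and how `teffects` handles near-ties is not modelled.

```python
def tie_counts(treated_scores, control_scores):
    """For each treated score, count the controls at the minimum distance.

    A count above 1 means the nearest neighbor is not unique, so the
    "first encountered" and "average over ties" rules can disagree.
    """
    counts = []
    for p in treated_scores:
        dist = [abs(p - c) for c in control_scores]
        best = min(dist)
        counts.append(sum(1 for d in dist if d == best))
    return counts


# Hypothetical scores from discrete covariates: values repeat, ties appear.
discrete = tie_counts([0.2, 0.5], [0.2, 0.2, 0.2, 0.5])
# Hypothetical scores from continuous covariates: all distinct, no ties.
continuous = tie_counts([0.21, 0.48], [0.19, 0.25, 0.5, 0.7])

print(min(discrete), max(discrete))      # 1 3
print(min(continuous), max(continuous))  # 1 1
```

When every count is 1 (max = 1), both tie rules pick the same control. This is consistent with the two commands agreeing on the psm data.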
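5. Differences and possible causes

Why do the two commands sometimes agree and sometimes disagree?

One visible difference is in the `teffects psmatch` header. Only one match was requested in both runs, yet the maximum number of matches is 74 for cattaneo2 and 1 for the psm data.

Difference and possible cause (a conjecture):

- `psmatch2` simply takes the first match, even when other individuals lie at the same distance;
- `teffects psmatch` instead uses all individuals at the nearest distance. In other words, even with 1:1 matching, one observation can be equally distant from several others.

6. Verification

```
. frame copy data simu, replace
(note: frame simu not found)
. frame change simu
```

6.1 Get the propensity score and sort the data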
```
. logit mbsmoke mmarried c.mage##c.mage fbaby medu

(output omitted; identical to the logit in section 1)

. cap drop pr_
. predict pr_
(option pr assumed; Pr(mbsmoke))
. sort pr_ mbsmoke
. gen id = _n
```

Sorting by `pr_` and `mbsmoke` puts the data in a fixed, reproducible order, and `id` records each observation's position in that order. This matters later. The match variables that `teffects psmatch` creates with `gen()` in 6.3 store observation numbers, and `id` is what links them back to the outcomes in 6.4.
6.2 ATT from the two commands

```
. teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), nn(1) atet
. //teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), nn(1) ate

Treatment-effects estimation                   Number of obs      =      4,642
Estimator      : propensity-score matching     Matches: requested =          1
Outcome model  : matching                                     min =          1
Treatment model: logit                                        max =         74
---------------------------------------------------------------------------------------
                      |              AI Robust
              bweight |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
----------------------+----------------------------------------------------------------
ATET                  |
              mbsmoke |
(smoker vs nonsmoker) |  -236.7848   26.57789    -8.91   0.000    -288.8765    -184.693
---------------------------------------------------------------------------------------

. psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, out(bweight) ate logit n(1) //ATT = -248.515046

(logistic regression output omitted; identical to the logit in section 1)

----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
         bweight  Unmatched | 3137.65972   3412.91159  -275.251871   21.4528037   -12.83
                        ATT | 3137.65972   3386.17477  -248.515046   54.7442913    -4.54
                        ATU | 3412.91159   3166.47327  -246.438327            .        .
                        ATE |                           -246.82486            .        .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.

           | psmatch2:
 psmatch2: |   Common
 Treatment |  support
assignment | On suppor |     Total
-----------+-----------+----------
 Untreated |     3,778 |     3,778
   Treated |       864 |       864
-----------+-----------+----------
     Total |     4,642 |     4,642
```

This is the same `psmatch2` command as in 2.2, run on the same observations. Only the sort order differs, yet the ATT changes from -197.18287 to -248.515046.
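6.3 Get the match information from teffects psmatch

```
. teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), nn(1) atet gen(match)
. // -236.78475

Treatment-effects estimation                   Number of obs      =      4,642
Estimator      : propensity-score matching     Matches: requested =          1
Outcome model  : matching                                     min =          1
Treatment model: logit                                        max =         74
---------------------------------------------------------------------------------------
                      |              AI Robust
              bweight |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
----------------------+----------------------------------------------------------------
ATET                  |
              mbsmoke |
(smoker vs nonsmoker) |  -236.7848   26.57789    -8.91   0.000    -288.8765    -184.693
---------------------------------------------------------------------------------------
```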
6.4 Keep the treated group and merge in the data of the matched controls

6.4.1 Keep the treated observations

```
. frame copy simu simu2
. frame change simu2
. keep id bweight mbsmoke match*
. keep if mbsmoke == 1
(3,778 observations deleted)
```
6.4.2 Build the table of matched pairs

```
. reshape long match, i(id) j(match_id)
(note: j = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
> 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                      864   ->   63936
Number of variables                  77   ->   5
j variable (74 values)                    ->   match_id
xij variables:
              match1 match2 ... match74   ->   match
-----------------------------------------------------------------------------

. keep if match != .
```
6.4.3 Link the frames

```
. frlink m:1 match, frame(simu id) gen(simu_21)
(all observations in frame simu2 matched)
```
6.4.4 Fetch the outcome of the matched controls

```
. frget bweight, from(simu_21) pre(_0)
(1 variable copied from linked frame)
```
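Steps 6.4.2 to 6.4.4 turn one row per treated mother (with columns match1 to match74) into one row per (treated, matched control) pair. They then look up each control's birth weight through the `id` link. The Python sketch below mimics that on a hypothetical miniature. The ids and birth weights are made up, `None` plays the role of Stata's missing value, and a dictionary lookup stands in for `frlink m:1` plus `frget`.

```python
# Hypothetical wide table: treated id -> its match1..match3 columns.
wide = {
    101: [7, None, None],  # one nearest neighbor
    102: [8, 9, 10],       # three tied nearest neighbors
}
# Outcome of every observation keyed by id (the role of frame simu).
bweight = {101: 3000, 102: 2900, 7: 3300, 8: 3100, 9: 3200, 10: 3000}

# reshape long match, i(id) j(match_id), then keep if match != .
pairs = [
    (tid, k, m)
    for tid, matches in wide.items()
    for k, m in enumerate(matches, start=1)
    if m is not None
]

# frlink m:1 match + frget bweight: fetch the matched control's outcome.
long_rows = [
    {"id": tid, "match_id": k, "match": m,
     "bweight": bweight[tid], "_0bweight": bweight[m]}
    for tid, k, m in pairs
]
print(len(long_rows))  # 4
```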
6.5 Compute the treatment effect for each matched pair

```
. gen att = bweight - _0bweight
. sum att

    Variable |        Obs        Mean   Std. Dev.        Min        Max
-------------+---------------------------------------------------------
         att |     15,721   -238.8935    796.7148      -4082       3884

. return list

scalars:
                  r(N) =  15721
              r(sum_w) =  15721
               r(mean) =  -238.8935182240315
                r(Var) =  634754.4716125642
                 r(sd) =  796.7147994185649
                r(min) =  -4082
                r(max) =  3884
                r(sum) =  -3755645
```
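The plain mean of `att` above (-238.8935) is not the ATT that `teffects psmatch` reports (-236.7848). It averages over the 15,721 pairs, so a treated mother with many tied matches is counted many times. Weighting each pair by 1/num, where num is the number of matches of its treated mother, gives every treated mother one unit of weight. The Python sketch below uses hypothetical pair-level effects to show the difference.

```python
def pair_mean(pairs):
    """Unweighted mean over (treated_id, effect) pairs, like `sum att`."""
    return sum(e for _, e in pairs) / len(pairs)


def per_treated_mean(pairs):
    """Mean with weight 1/num per pair, num = matches of that treated unit.

    This equals the average, over treated units, of each unit's mean effect,
    which is what `sum att [aweight = 1/num]` computes.
    """
    num = {}
    for i, _ in pairs:
        num[i] = num.get(i, 0) + 1
    total_weight = sum(1 / num[i] for i, _ in pairs)  # = number of treated units
    return sum(e / num[i] for i, e in pairs) / total_weight


# Treated unit 1 has one match; unit 2 has four tied matches.
pairs = [(1, -100.0), (2, 20.0), (2, 20.0), (2, 20.0), (2, 20.0)]
print(pair_mean(pairs))         # -4.0
print(per_treated_mean(pairs))  # -40.0
```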
6.6 Compute the weighted mean over treated observations

```
. bysort id: gen num = _N
. sum att [aweight = 1/num]

    Variable |     Obs      Weight        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------------------
         att |  15,721         864   -236.7848    808.3941      -4082       3884

. return list

scalars:
                  r(N) =  15721
              r(sum_w) =  864
               r(mean) =  -236.7847508257834
                r(Var) =  653501.0708042918
                 r(sd) =  808.394130857153
                r(min) =  -4082
                r(max) =  3884
                r(sum) =  -204582.0247134768
```

The pairs of each treated mother now carry a total weight of 1 (r(sum_w) = 864, the number of treated observations). The weighted mean, -236.7848, equals the `teffects psmatch` ATET. For comparison, rerun `teffects psmatch`, and `psmatch2` with the `ties` option, in frame simu:

```
. frames simu: teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), nn(1) atet

Treatment-effects estimation                   Number of obs      =      4,642
Estimator      : propensity-score matching     Matches: requested =          1
Outcome model  : matching                                     min =          1
Treatment model: logit                                        max =         74
---------------------------------------------------------------------------------------
                      |              AI Robust
              bweight |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
----------------------+----------------------------------------------------------------
ATET                  |
              mbsmoke |
(smoker vs nonsmoker) |  -236.7848   26.57789    -8.91   0.000    -288.8765    -184.693
---------------------------------------------------------------------------------------

. frames simu: psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, out(bweight) ate logit n(1) ties

(logistic regression output omitted; identical to the logit in section 1)

----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
         bweight  Unmatched | 3137.65972   3412.91159  -275.251871   21.4528037   -12.83
                        ATT | 3137.65972   3374.44447  -236.784751   26.0535546    -9.09
                        ATU | 3412.91159   3207.84728  -205.064318            .        .
                        ATE |                          -210.968337            .        .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.

           | psmatch2:
 psmatch2: |   Common
 Treatment |  support
assignment | On suppor |     Total
-----------+-----------+----------
 Untreated |     3,778 |     3,778
   Treated |       864 |       864
-----------+-----------+----------
     Total |     4,642 |     4,642
```

With `ties`, `psmatch2` gives an ATT of -236.784751, matching `teffects psmatch`.
6.7 Compute the mean using only the first match

```
. sum att if match_id == 1

    Variable |        Obs        Mean   Std. Dev.        Min        Max
-------------+---------------------------------------------------------
         att |        864    -248.515    817.1815      -3403       2750

. return list

scalars:
                r(sum) =  -214717
                r(max) =  2750
                r(min) =  -3403
                 r(sd) =  817.1815138978078
                r(Var) =  667785.6266563131
               r(mean) =  -248.5150462962963
              r(sum_w) =  864
                  r(N) =  864

. frames simu: psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, out(bweight) ate logit n(1)

(logistic regression output omitted; identical to the logit in section 1)

----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
         bweight  Unmatched | 3137.65972   3412.91159  -275.251871   21.4528037   -12.83
                        ATT | 3137.65972   3386.17477  -248.515046   54.7442913    -4.54
                        ATU | 3412.91159   3166.47327  -246.438327            .        .
                        ATE |                           -246.82486            .        .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.

           | psmatch2:
 psmatch2: |   Common
 Treatment |  support
assignment | On suppor |     Total
-----------+-----------+----------
 Untreated |     3,778 |     3,778
   Treated |       864 |       864
-----------+-----------+----------
     Total |     4,642 |     4,642
```

Keeping only the first match of each treated observation gives -248.515, the same as `psmatch2` without `ties` (-248.515046).
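The two estimators compared in section 6 can be written side by side. The notation here is introduced for illustration only:

- `N_1` is the number of treated observations (864).
- `M_i` is the set of controls at the minimum propensity-score distance from treated observation `i`.
- `j_1(i)` is the element of `M_i` encountered first in the current sort order.

```latex
\widehat{\mathrm{ATT}}_{\text{ties}}
  = \frac{1}{N_1}\sum_{i:\,T_i=1}\Bigl(Y_i - \frac{1}{|M_i|}\sum_{j\in M_i} Y_j\Bigr)
  = \frac{\sum_{i:\,T_i=1}\sum_{j\in M_i}\frac{1}{|M_i|}\,(Y_i - Y_j)}
         {\sum_{i:\,T_i=1}\sum_{j\in M_i}\frac{1}{|M_i|}},
\qquad
\widehat{\mathrm{ATT}}_{\text{first}}
  = \frac{1}{N_1}\sum_{i:\,T_i=1}\bigl(Y_i - Y_{j_1(i)}\bigr).
```

The middle expression is the weighted mean from 6.6 (weights 1/num, total weight 864). Its denominator reduces to `N_1`, and it reproduces -236.7848 from `teffects psmatch` and from `psmatch2 ..., ties`. The last expression corresponds to 6.7 (-248.515) and to `psmatch2` without `ties`. The two coincide whenever `|M_i| = 1` for every treated observation, that is, when `teffects psmatch` reports max = 1 matches.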