One reason psmatch2 and teffects psmatch produce different estimates

2021-03-02 小熊的胡說八道

I will not go into the theory behind PSM here. This post covers only one reason why teffects psmatch and psmatch2 can produce different estimates in Stata.

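Stata has shipped a teffects suite since version 13, and PSM can be done with it. The psmatch2 command is still the more common choice for PSM, but its standard errors are problematic: every time it reports results it warns Note: S.E. does not take into account that the propensity score is estimated. teffects psmatch, by contrast, reports Abadie-Imbens (AI) robust standard errors that do account for the estimated propensity score, so its standard errors can be trusted.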

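The article Propensity Score Matching in Stata using teffects (link: https://www.ssc.wisc.edu/sscc/pubs/stata_psmatch.htm) explains psmatch2 and teffects psmatch in some detail. It also points out that, with the options set appropriately, the two commands should return the same coefficient: psmatch2 t x1 x2, out(y) logit ate and teffects psmatch (y) (t x1 x2), atet should give the same ATT. Yet researchers who run psmatch2 and teffects psmatch on the same data often find that the two estimates differ, sometimes to the point of opposite conclusions.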

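The reason lies in how psmatch2 handles ties in nearest-neighbor matching. If several control observations are at the same minimum distance from a treated observation, psmatch2 without the ties option picks whichever of them it encounters first. The sort order of the data therefore changes the matched sample, and with it the estimate. With the ties option, psmatch2 uses the average outcome of all controls at that minimum distance as the match for the treated observation. teffects psmatch always takes the latter approach.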
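The tie-handling rule described above can be illustrated with a toy example. This is a minimal sketch in Python, not the code of either Stata command: the function names `nearest_controls` and `att`, the `ties` flag, and the data values are all hypothetical, chosen so that one treated unit has two controls at exactly the same propensity score.

```python
def nearest_controls(p_treated, controls, tol=1e-12):
    """Outcomes of all controls at the minimum score distance, in data order."""
    best = min(abs(p - p_treated) for p, _ in controls)
    return [y for p, y in controls if abs(abs(p - p_treated) - best) <= tol]


def att(treated, controls, ties):
    """Average treated-minus-matched outcome under 1-nearest-neighbor matching.

    ties=False: use the first tied control encountered (order-dependent).
    ties=True:  use the mean outcome of all tied controls (order-independent).
    """
    effects = []
    for p, y in treated:
        tied = nearest_controls(p, controls)
        matched = sum(tied) / len(tied) if ties else tied[0]
        effects.append(y - matched)
    return sum(effects) / len(effects)


# (propensity score, outcome); hypothetical values
treated = [(0.30, 10.0), (0.60, 12.0)]
controls = [(0.30, 4.0), (0.30, 8.0), (0.61, 9.0)]

print(att(treated, controls, ties=False))                  # 4.5
print(att(treated, list(reversed(controls)), ties=False))  # 2.5, same data reordered
print(att(treated, controls, ties=True))                   # 3.5
print(att(treated, list(reversed(controls)), ties=True))   # 3.5
```

Without tie averaging, reordering the same three controls moves the estimate from 4.5 to 2.5. With tie averaging the estimate is 3.5 in either order.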

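So when the two commands give different results, for example when the ATT from psmatch2 is far from the one reported by teffects psmatch (y) (treat xlist), atet, check whether your psmatch2 treat xlist, out(y) logit ate call includes the ties option. A missing ties option is one possible cause of the gap between the coefficients.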

clear all

frames create data
frames change data

webuse cattaneo2
frames copy data frames1
frames change frames1

sum bweight mbsmoke mmarried c.mage##c.mage fbaby medu
    Variable |        Obs        Mean    Std. Dev.       Min        Max
---+--
     bweight |      4,642     3361.68    578.8196        340       5500
     mbsmoke |      4,642    .1861267    .3892508          0          1
    mmarried |      4,642    .6996984    .4584385          0          1
        mage |      4,642    26.50452    5.619026         13         45
             |
      c.mage#|
      c.mage |      4,642    734.0564    305.2242        169       2025
---+--
             |
       fbaby |      4,642    .4379578    .4961893          0          1
        medu |      4,642    12.68957    2.520661          0         17
1. Logit
logit mbsmoke mmarried c.mage##c.mage fbaby medu
Iteration 0:   log likelihood = -2230.7484  
Iteration 1:   log likelihood =  -2053.769  
Iteration 2:   log likelihood = -2043.2897  
Iteration 3:   log likelihood = -2043.2504  
Iteration 4:   log likelihood = -2043.2504  

Logistic regression                             Number of obs     =      4,642
                                                LR chi2(5)        =     375.00
                                                Prob > chi2       =     0.0000
Log likelihood = -2043.2504                     Pseudo R2         =     0.0841

----
      mbsmoke |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----+----
     mmarried |  -1.145706   .0918962   -12.47   0.000     -1.32582    -.965593
         mage |    .321518   .0638472     5.04   0.000     .1963798    .4466563
              |
c.mage#c.mage |  -.0060368   .0011849    -5.09   0.000    -.0083592   -.0037144
              |
        fbaby |  -.3864258   .0880445    -4.39   0.000    -.5589898   -.2138618
         medu |  -.1420833   .0173215    -8.20   0.000    -.1760328   -.1081338
        _cons |  -2.950915   .8102504    -3.64   0.000    -4.538976   -1.362853
----
predict pr_
(option pr assumed; Pr(mbsmoke))
2. psmatch2
2.1 using pscore()
psmatch2 mbsmoke, out(bweight) pscore(pr_) ate logit
---
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
---+----
         bweight  Unmatched | 3137.65972   3412.91159  -275.251871   21.4528037   -12.83
                        ATT | 3137.65972   3334.84259   -197.18287   55.6185293    -3.55
                        ATU | 3412.91159   3164.00185  -248.909741            .        .
                        ATE |                          -239.281991            .        .
---+----
Note: S.E. does not take into account that the propensity score is estimated.

           | psmatch2:
 psmatch2: |   Common
 Treatment |  support
assignment | On suppor |     Total
-+-+
 Untreated |     3,778 |     3,778 
   Treated |       864 |       864 
-+-+
     Total |     4,642 |     4,642
2.2 general
psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, out(bweight) ate logit n(1)
Logistic regression                             Number of obs     =      4,642
                                                LR chi2(5)        =     375.00
                                                Prob > chi2       =     0.0000
Log likelihood = -2043.2504                     Pseudo R2         =     0.0841

----
      mbsmoke |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----+----
     mmarried |  -1.145706   .0918962   -12.47   0.000     -1.32582    -.965593
         mage |    .321518   .0638472     5.04   0.000     .1963798    .4466563
              |
c.mage#c.mage |  -.0060368   .0011849    -5.09   0.000    -.0083592   -.0037144
              |
        fbaby |  -.3864258   .0880445    -4.39   0.000    -.5589898   -.2138618
         medu |  -.1420833   .0173215    -8.20   0.000    -.1760328   -.1081338
        _cons |  -2.950915   .8102504    -3.64   0.000    -4.538976   -1.362853
----
---
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
---+----
         bweight  Unmatched | 3137.65972   3412.91159  -275.251871   21.4528037   -12.83
                        ATT | 3137.65972   3334.84259   -197.18287   55.6185293    -3.55
                        ATU | 3412.91159   3164.00185  -248.909741            .        .
                        ATE |                          -239.281991            .        .
---+----
Note: S.E. does not take into account that the propensity score is estimated.

           | psmatch2:
 psmatch2: |   Common
 Treatment |  support
assignment | On suppor |     Total
-+-+
 Untreated |     3,778 |     3,778 
   Treated |       864 |       864 
-+-+
     Total |     4,642 |     4,642
3. teffects psmatch
teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), atet nn(1)
Treatment-effects estimation                   Number of obs      =      4,642
Estimator      : propensity-score matching     Matches: requested =          1
Outcome model  : matching                                     min =          1
Treatment model: logit                                        max =         74
---
                       |              AI Robust
               bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---+----
ATET                   |
               mbsmoke |
(smoker vs nonsmoker)  |  -236.7848   26.57789    -8.91   0.000    -288.8765    -184.693
---
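The two commands give different estimates? (psmatch2 reports an ATT of -197.18, teffects psmatch an ATET of -236.78.)
4. Different results?
frames create frame2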
frames change frame2
use http://ssc.wisc.edu/sscc/pubs/files/psm
sum
    Variable |        Obs        Mean    Std. Dev.       Min        Max
---+--
          x1 |      1,000    -.012963    1.000053    -3.6593   3.084742
          x2 |      1,000   -.0246025    1.034555  -3.363018   3.399474
           t |      1,000        .333    .4715224          0          1
           y |      1,000    .3474242    1.957462  -5.494524   6.873514
psmatch2 t x1 x2, out(y) logit ate
Logistic regression                             Number of obs     =      1,000
                                                LR chi2(2)        =     222.78
                                                Prob > chi2       =     0.0000
Log likelihood = -524.89072                     Pseudo R2         =     0.1751

---
           t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---+----
          x1 |   .9068298   .0885341    10.24   0.000     .7333062    1.080353
          x2 |   .8100408   .0816962     9.92   0.000     .6499192    .9701624
       _cons |  -.8528442   .0788823   -10.81   0.000    -1.007451   -.6982378
---
---
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
---+----
               y  Unmatched |  1.8910736  -.423243358   2.31431696   .109094342    21.21
                        ATT |  1.8910736   .930722886   .960350715   .168252917     5.71
                        ATU |-.423243358   .625587554   1.04883091            .        .
                        ATE |                           1.01936701            .        .
---+----
Note: S.E. does not take into account that the propensity score is estimated.

           | psmatch2:
 psmatch2: |   Common
 Treatment |  support
assignment | On suppor |     Total
-+-+
 Untreated |       667 |       667 
   Treated |       333 |       333 
-+-+
     Total |     1,000 |     1,000
teffects psmatch (y) (t x1 x2), atet
Treatment-effects estimation                   Number of obs      =      1,000
Estimator      : propensity-score matching     Matches: requested =          1
Outcome model  : matching                                     min =          1
Treatment model: logit                                        max =          1
---
             |              AI Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---+----
ATET         |
           t |
   (1 vs 0)  |   .9603507   .1204748     7.97   0.000     .7242245    1.196477
---
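And now the two commands agree again? (Both report an ATT of .9603507.)
5. Differences and potential causes
Why do the two commands sometimes agree and sometimes disagree?
Difference: teffects psmatch reports max = 74 matches for cattaneo2 but max = 1 for the psm data, so only the first dataset has ties in the propensity score.
Possible cause (a conjecture):
psmatch2 matches only the first control it encounters, even when other controls lie at the same distance;
teffects psmatch uses every control at the nearest distance. That is, even under 1:1 matching, one treated observation can be equidistant from several controls.
6. Test
frame copy data simu, replace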
frame change simu
(note: frame simu not found)

6.1 Get the propensity scores and sort
logit mbsmoke mmarried c.mage##c.mage fbaby medu
Iteration 0:   log likelihood = -2230.7484  
Iteration 1:   log likelihood =  -2053.769  
Iteration 2:   log likelihood = -2043.2897  
Iteration 3:   log likelihood = -2043.2504  
Iteration 4:   log likelihood = -2043.2504  

Logistic regression                             Number of obs     =      4,642
                                                LR chi2(5)        =     375.00
                                                Prob > chi2       =     0.0000
Log likelihood = -2043.2504                     Pseudo R2         =     0.0841

----
      mbsmoke |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----+----
     mmarried |  -1.145706   .0918962   -12.47   0.000     -1.32582    -.965593
         mage |    .321518   .0638472     5.04   0.000     .1963798    .4466563
              |
c.mage#c.mage |  -.0060368   .0011849    -5.09   0.000    -.0083592   -.0037144
              |
        fbaby |  -.3864258   .0880445    -4.39   0.000    -.5589898   -.2138618
         medu |  -.1420833   .0173215    -8.20   0.000    -.1760328   -.1081338
        _cons |  -2.950915   .8102504    -3.64   0.000    -4.538976   -1.362853
----
cap drop pr_
predict pr_
(option pr assumed; Pr(mbsmoke))
sort pr_ mbsmoke
gen id = _n
6.2 ATT from the two commands
teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), nn(1) atet
//teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu),nn(1) ate
Treatment-effects estimation                   Number of obs      =      4,642
Estimator      : propensity-score matching     Matches: requested =          1
Outcome model  : matching                                     min =          1
Treatment model: logit                                        max =         74
---
                       |              AI Robust
               bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---+----
ATET                   |
               mbsmoke |
(smoker vs nonsmoker)  |  -236.7848   26.57789    -8.91   0.000    -288.8765    -184.693
---
psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, out(bweight) ate logit n(1) //ATT = -248.515046
Logistic regression                             Number of obs     =      4,642
                                                LR chi2(5)        =     375.00
                                                Prob > chi2       =     0.0000
Log likelihood = -2043.2504                     Pseudo R2         =     0.0841

----
      mbsmoke |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----+----
     mmarried |  -1.145706   .0918962   -12.47   0.000     -1.32582    -.965593
         mage |    .321518   .0638472     5.04   0.000     .1963798    .4466563
              |
c.mage#c.mage |  -.0060368   .0011849    -5.09   0.000    -.0083592   -.0037144
              |
        fbaby |  -.3864258   .0880445    -4.39   0.000    -.5589898   -.2138618
         medu |  -.1420833   .0173215    -8.20   0.000    -.1760328   -.1081338
        _cons |  -2.950915   .8102504    -3.64   0.000    -4.538976   -1.362853
----
---
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
---+----
         bweight  Unmatched | 3137.65972   3412.91159  -275.251871   21.4528037   -12.83
                        ATT | 3137.65972   3386.17477  -248.515046   54.7442913    -4.54
                        ATU | 3412.91159   3166.47327  -246.438327            .        .
                        ATE |                           -246.82486            .        .
---+----
Note: S.E. does not take into account that the propensity score is estimated.

           | psmatch2:
 psmatch2: |   Common
 Treatment |  support
assignment | On suppor |     Total
-+-+
 Untreated |     3,778 |     3,778 
   Treated |       864 |       864 
-+-+
     Total |     4,642 |     4,642
6.3 Get the match information from teffects psmatch
teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), nn(1) atet gen(match)
// -236.78475
Treatment-effects estimation                   Number of obs      =      4,642
Estimator      : propensity-score matching     Matches: requested =          1
Outcome model  : matching                                     min =          1
Treatment model: logit                                        max =         74
---
                       |              AI Robust
               bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---+----
ATET                   |
               mbsmoke |
(smoker vs nonsmoker)  |  -236.7848   26.57789    -8.91   0.000    -288.8765    -184.693
---
6.4 Keep the treated group and merge in the matched controls' data
6.4.1 Keep the treated observations
frame copy simu simu2
frame change simu2
keep id bweight mbsmoke match*
keep if mbsmoke == 1
(3,778 observations deleted)
6.4.2 Build the match lookup table
reshape long match, i(id) j(match_id)
keep if match != .
(note: j = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 
> 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74)

Data                               wide   ->   long
--
Number of obs.                      864   ->   63936
Number of variables                  77   ->       5
j variable (74 values)                    ->   match_id
xij variables:
              match1 match2 ... match74   ->   match
--
6.4.3 Link the frames
frlink m:1 match, frame(simu id) gen(simu_21)
  (all observations in frame simu2 matched)
6.4.4 Get the matched controls' outcome variable
frget bweight, from(simu_21) pre(_0)
  (1 variable copied from linked frame)
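Steps 6.4.1 to 6.4.4 amount to a wide-to-long reshape followed by a lookup join. The sketch below mirrors that data flow in plain Python on a few made-up rows. It assumes that the match1, match2, ... variables hold the observation numbers of the matched controls, which is why id was generated as _n after sorting. The dictionary `outcome`, the list `wide`, and the helper `to_long` are hypothetical names for illustration only.

```python
# id -> outcome (bweight) for every observation; hypothetical toy values
outcome = {1: 3400.0, 2: 3100.0, 3: 3300.0, 4: 2900.0}

# Wide layout for treated rows: id plus match1..match3, None where unused
wide = [
    {"id": 2, "match1": 1, "match2": 3, "match3": None},
    {"id": 4, "match1": 3, "match2": None, "match3": None},
]


def to_long(rows, prefix="match"):
    """One row per (treated id, matched control), dropping empty match slots."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key.startswith(prefix) and value is not None:
                long_rows.append({
                    "id": row["id"],
                    "match_id": int(key[len(prefix):]),
                    "match": value,
                })
    return long_rows


pairs = to_long(wide)
for pair in pairs:
    pair["bweight"] = outcome[pair["id"]]        # treated unit's own outcome
    pair["_0bweight"] = outcome[pair["match"]]   # looked up from the matched control
    pair["att"] = pair["bweight"] - pair["_0bweight"]

for pair in pairs:
    print(pair["id"], pair["match_id"], pair["match"], pair["att"])
```

Each resulting row is one treated-control pair, the same unit of analysis as the 15,721 rows left in frame simu2 after the reshape and the keep if match != . step.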
6.5 Compute the treatment effect for each matched pair
gen att = bweight - _0bweight
sum att
return list
    Variable |        Obs        Mean    Std. Dev.       Min        Max
---+--
         att |     15,721   -238.8935    796.7148      -4082       3884



scalars:
                  r(N) =  15721
              r(sum_w) =  15721
               r(mean) =  -238.8935182240315
                r(Var) =  634754.4716125642
                 r(sd) =  796.7147994185649
                r(min) =  -4082
                r(max) =  3884
                r(sum) =  -3755645
6.6 Compute the weighted sample mean
bysort id: gen num = _N
sum att [aweight = 1/num]
return list
    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
---+
         att |  15,721         864   -236.7848   808.3941      -4082       3884



scalars:
                  r(N) =  15721
              r(sum_w) =  864
               r(mean) =  -236.7847508257834
                r(Var) =  653501.0708042918
                 r(sd) =  808.394130857153
                r(min) =  -4082
                r(max) =  3884
                r(sum) =  -204582.0247134768
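The weighted mean above (-236.7848) reproduces the teffects psmatch ATET, while the unweighted mean over all pairs in 6.5 (-238.8935) does not. The difference is that the unweighted mean lets a treated observation with many tied controls count many times, whereas the weight 1/num gives every treated observation a total weight of one. A minimal Python sketch of the two calculations follows. The data and the function names `pooled_mean` and `per_unit_weighted_mean` are hypothetical.

```python
from collections import defaultdict

# (treated id, treated outcome, matched control outcome); hypothetical values.
# Treated unit 1 is tied with four controls, treated unit 2 has a single match.
pairs = [
    (1, 10.0, 4.0),
    (1, 10.0, 8.0),
    (1, 10.0, 9.0),
    (1, 10.0, 7.0),
    (2, 12.0, 7.0),
]


def pooled_mean(pairs):
    """Plain mean over all pairs: units with many ties dominate."""
    diffs = [yt - yc for _, yt, yc in pairs]
    return sum(diffs) / len(diffs)


def per_unit_weighted_mean(pairs):
    """Mean with weight 1/num per pair, so each treated unit sums to weight 1."""
    num = defaultdict(int)
    for tid, _, _ in pairs:
        num[tid] += 1
    weighted_sum = sum((yt - yc) / num[tid] for tid, yt, yc in pairs)
    return weighted_sum / len(num)


print(pooled_mean(pairs))             # 3.4
print(per_unit_weighted_mean(pairs))  # 4.0
```

In the toy data the pooled mean is pulled toward unit 1 because it contributes four of the five pairs. The weighted version first averages within each treated unit and then across units.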
frames simu : teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu),nn(1) atet
Treatment-effects estimation                   Number of obs      =      4,642
Estimator      : propensity-score matching     Matches: requested =          1
Outcome model  : matching                                     min =          1
Treatment model: logit                                        max =         74
---
                       |              AI Robust
               bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---+----
ATET                   |
               mbsmoke |
(smoker vs nonsmoker)  |  -236.7848   26.57789    -8.91   0.000    -288.8765    -184.693
---
frames simu : psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, out(bweight) ate logit n(1)  ties
Logistic regression                             Number of obs     =      4,642
                                                LR chi2(5)        =     375.00
                                                Prob > chi2       =     0.0000
Log likelihood = -2043.2504                     Pseudo R2         =     0.0841

----
      mbsmoke |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----+----
     mmarried |  -1.145706   .0918962   -12.47   0.000     -1.32582    -.965593
         mage |    .321518   .0638472     5.04   0.000     .1963798    .4466563
              |
c.mage#c.mage |  -.0060368   .0011849    -5.09   0.000    -.0083592   -.0037144
              |
        fbaby |  -.3864258   .0880445    -4.39   0.000    -.5589898   -.2138618
         medu |  -.1420833   .0173215    -8.20   0.000    -.1760328   -.1081338
        _cons |  -2.950915   .8102504    -3.64   0.000    -4.538976   -1.362853
----
---
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
---+----
         bweight  Unmatched | 3137.65972   3412.91159  -275.251871   21.4528037   -12.83
                        ATT | 3137.65972   3374.44447  -236.784751   26.0535546    -9.09
                        ATU | 3412.91159   3207.84728  -205.064318            .        .
                        ATE |                          -210.968337            .        .
---+----
Note: S.E. does not take into account that the propensity score is estimated.

           | psmatch2:
 psmatch2: |   Common
 Treatment |  support
assignment | On suppor |     Total
-+-+
 Untreated |     3,778 |     3,778 
   Treated |       864 |       864 
-+-+
     Total |     4,642 |     4,642
6.7 Compute the mean using only the first match
sum att if match_id == 1
    Variable |        Obs        Mean    Std. Dev.       Min        Max
---+--
         att |        864    -248.515    817.1815      -3403       2750
return list
scalars:
                r(sum) =  -214717
                r(max) =  2750
                r(min) =  -3403
                 r(sd) =  817.1815138978078
                r(Var) =  667785.6266563131
               r(mean) =  -248.5150462962963
              r(sum_w) =  864
                  r(N) =  864
frames simu : psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, out(bweight) ate logit n(1)
Logistic regression                             Number of obs     =      4,642
                                                LR chi2(5)        =     375.00
                                                Prob > chi2       =     0.0000
Log likelihood = -2043.2504                     Pseudo R2         =     0.0841

----
      mbsmoke |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----+----
     mmarried |  -1.145706   .0918962   -12.47   0.000     -1.32582    -.965593
         mage |    .321518   .0638472     5.04   0.000     .1963798    .4466563
              |
c.mage#c.mage |  -.0060368   .0011849    -5.09   0.000    -.0083592   -.0037144
              |
        fbaby |  -.3864258   .0880445    -4.39   0.000    -.5589898   -.2138618
         medu |  -.1420833   .0173215    -8.20   0.000    -.1760328   -.1081338
        _cons |  -2.950915   .8102504    -3.64   0.000    -4.538976   -1.362853
----
---
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
---+----
         bweight  Unmatched | 3137.65972   3412.91159  -275.251871   21.4528037   -12.83
                        ATT | 3137.65972   3386.17477  -248.515046   54.7442913    -4.54
                        ATU | 3412.91159   3166.47327  -246.438327            .        .
                        ATE |                           -246.82486            .        .
---+----
Note: S.E. does not take into account that the propensity score is estimated.

           | psmatch2:
 psmatch2: |   Common
 Treatment |  support
assignment | On suppor |     Total
-+-+
 Untreated |     3,778 |     3,778 
   Treated |       864 |       864 
-+-+
     Total |     4,642 |     4,642
