Mastering Econometrics: Essential Stata Study Notes

2020-12-04, Academic World (學術世界)

Source: 計量經濟學服務中心 (Econometrics Service Center)

Basic Stata operations

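set memory 500m, permanently (needed only in Stata 11 and earlier; later versions manage memory automatically)

Display a value or a string

display 1

display "clive"

Show the dataset structure: describe

describe (abbreviation: d)

Edit the data: edit

edit

Rename a variable

rename var1 var2

Show the dataset contents: list / browse

list in 1

list in 2/10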

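Importing data from a text file (.csv)

insheet using "C:\Documents and Settings\Administrator\桌面\ST9007\dataset\Fees1.csv", clear

(insheet still runs in current versions of Stata, where import delimited is the documented replacement.)

A dataset can be imported only when memory is empty; otherwise Stata reports "you must start with an empty dataset". There are two ways around this:

Clear all variables from memory first: drop _all

Or add the clear option to the import command, as in the line above.

Open a saved data file, replacing the data in memory: use

use path_and_filename, clear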

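Recording commands and output: log

1. Start a log file: log using "J:\phd\output.log", replace

2. Suspend logging: log off

3. Resume logging: log on

4. Close the log file: log close

Creating and saving a do-file: doedit, do

1. Open the do-file editor: doedit

2. Type in the commands.

3. Save the file with the .do extension.

4. Run the commands: do path_and_filename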

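Stacking several datasets with the same variables and structure into one (vertical merge): append

insheet using "J:\phd\Fees1.csv", clear

save "J:\phd\Fees1.dta", replace

insheet using "J:\phd\Fees2.csv", clear

append using "J:\phd\Fees1.dta"

save "J:\phd\Fees1.dta", replace

The last line overwrites Fees1.dta with the combined data.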
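The append step can be tried without the Fees files. The following is a minimal sketch, assuming Stata's bundled auto dataset (loaded with sysuse auto) as a stand-in; the split by foreign and the tempfile name part1 are illustrative choices, not part of the original notes. Run it as a do-file.

```stata
* Assumption: the bundled auto dataset stands in for Fees1.csv / Fees2.csv.
sysuse auto, clear
tempfile part1                      // hypothetical temporary file name

* First "file": domestic cars only
preserve
keep if foreign == 0
save `part1'
restore

* Second "file": foreign cars only, then stack the first file underneath
keep if foreign == 1
append using `part1'

* The stacked data again contain the rows of both parts
count
```

Because both parts carry the same variables, append simply adds rows; variables present in only one part would be filled with missing values in the other.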

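Horizontal merge: adding further variables to the existing dataset with merge

1. Sort both files on the key variables, then merge. The merge line below uses the syntax of Stata 10 and earlier; from Stata 11 on it is written merge 1:1 companyid yearend using "J:\phd\Fees1.dta".

insheet using "J:\phd\Fees1.csv", clear

sort companyid yearend

save "J:\phd\Fees1.dta", replace

describe

insheet using "J:\phd\Fees6.csv", clear

sort companyid yearend

merge companyid yearend using "J:\phd\Fees1.dta"

save "J:\phd\Fees1.dta", replace

describe

2. The _merge variable created by merge records where each observation came from:

_merge==1: observation from the master data only

_merge==2: observation from the using data only

_merge==3: observation from both master and using data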
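As a hedged illustration of the same idea, the sketch below merges two pieces of Stata's bundled auto dataset with the merge 1:1 form. The dataset, the key variable make, and the tempfile name prices are assumptions made for the example; the notes themselves use the Fees files.

```stata
* Assumption: the bundled auto dataset replaces the Fees files; make is its unique key.
sysuse auto, clear
tempfile prices                     // hypothetical temporary file name

* "Using" data: the key plus one extra variable
preserve
keep make price
save `prices'
restore

* "Master" data: the key plus a different variable
keep make mpg

* One-to-one merge on the key (Stata 11 and later; no prior sort is required)
merge 1:1 make using `prices'

* Inspect where each observation came from
tabulate _merge
```

Tabulating _merge right after every merge is a cheap way to catch keys that failed to match.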

Help files: help

help describe

Descriptive statistics

summarize incorporationyear (a single variable)

summarize incorporationyear-big6 (a run of consecutive variables)

summarize _all, or simply summarize (all variables)

More detailed statistics

summarize incorporationyear, detail

Percentiles: centile

centile auditfees, centile(0(10)100)

centile auditfees, centile(0(5)100)

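tabulate: frequencies and proportions for categorical variables

tabulate companytype

tabulate companytype big6, column (column percentages)

tabulate companytype big6, row (row percentages)

tab companytype big6 if companytype<=3, row col (row and column percentages together, restricted by a condition)

Counting the observations that satisfy a condition

count if big6==1

count if big6==0 | big6==1

Descriptive statistics of a continuous variable within each level of a discrete variable:

by companytype, sort: summarize auditfees, detail

Or sort first and then use by:

sort companytype

by companytype: summarize auditfees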

Transforming variables

Code companies with publicly issued shares as 1 and all others as 0, based on company type:

gen listed=0

replace listed=1 if companytype==2

replace listed=1 if companytype==3

replace listed=1 if companytype==5

replace listed=. if companytype==.
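The five lines above can also be collapsed into one. This is a sketch of one possible shortcut, assuming the bundled auto dataset, with rep78 (values 1 to 5, some missing) standing in for companytype; the variable name flag is hypothetical.

```stata
* Assumption: auto's rep78 stands in for companytype.
sysuse auto, clear

* 1 if rep78 is 2, 3 or 5; 0 otherwise; missing where rep78 is missing
gen byte flag = inlist(rep78, 2, 3, 5) if !missing(rep78)

* Cross-check against the source variable, showing missing values too
tabulate rep78 flag, missing
```

The if !missing() qualifier does the job of the final replace ... =. line: without it, observations with a missing type would silently be coded 0.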

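Creating a new variable: generate

generate newvar = expression

Data management

The recode command and related functions

1. Create a dummy variable from a variable with many values: recode

recode year (min/1999 = 0) (2000/max = 1), gen(yeardum)

min/1999 assigns 0 to every value less than or equal to 1999.

2000/max assigns 1 to every value greater than or equal to 2000.

2. Split a continuous variable into groups of unequal width at chosen cutoffs: recode()

gen assets_categ=recode(totalassets, 100, 500, 1000, 5000, 20000, 100000, 1000000)

Each cutoff is the upper bound of its group, and the bound itself belongs to that group.

sort assets_categ

by assets_categ: sum totalassets assets_categ

3. Split a continuous variable into groups of equal width: autocode()

autocode(variablename, # of intervals, min value, max value)

For example: gen assets_categ=autocode(totalassets, 10, 0, 10000)

4. Split a continuous variable into groups with the same number of observations: xtile

xtile assets_categ=totalassets, nquantiles(10)

The groups need not be exactly the same size, for example when values are tied.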
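The four grouping tools can be compared side by side. The sketch below assumes the bundled auto dataset and its integer-valued price variable; the cutoffs and new variable names are arbitrary choices for illustration.

```stata
* Assumption: auto's price (whole dollars) stands in for year / totalassets.
sysuse auto, clear

* 1. Dummy from a many-valued variable: price up to 5000 -> 0, above -> 1
recode price (min/5000 = 0) (5001/max = 1), gen(expensive)

* 2. Unequal-width groups: each cutoff is an inclusive upper bound
gen price_cut = recode(price, 4000, 6000, 10000, 16000)

* 3. Ten equal-width intervals between 0 and 16000 (coded by upper bound)
gen price_auto = autocode(price, 10, 0, 16000)

* 4. Four groups with roughly equal numbers of observations
xtile price_q = price, nquantiles(4)

* Compare how the observations fall into the groups
tabulate price_cut
tabulate price_q
```

Note that recode the command (item 1) and recode() the function (item 2) are different tools that happen to share a name.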

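Computing group means of a variable in one step: egen

Sort by company type, compute the mean audit fee within each type, and store it in a new variable:

by companytype, sort: egen meanaf2=mean(auditfees)

Other egen functions work the same way:

count()

mean()

median()

sum() (called total() in current versions of Stata)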
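A short sketch of these egen functions, assuming the bundled auto dataset with foreign as the grouping variable and price as the continuous variable; the new variable names are hypothetical.

```stata
* Assumption: auto's foreign / price stand in for companytype / auditfees.
sysuse auto, clear

* Each statistic is computed per group and attached to every row of that group
bysort foreign: egen mean_price = mean(price)
bysort foreign: egen med_price  = median(price)
bysort foreign: egen n_price    = count(price)
bysort foreign: egen tot_price  = total(price)   // total() is the current name of sum()

* One row per group is enough to read the results
by foreign: list foreign mean_price med_price n_price tot_price if _n == 1
```

bysort foreign: is shorthand for by foreign, sort:, the form used in the notes.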

The system variables _n and _N

1. Number each observation and record the total number of observations

sort companyid fye

capture drop x

gen x=_n

capture drop y

gen y=_N

list companyid fye x y in 1/30

2. Number the observations within each group and record each group's size

capture drop x y

sort companyid fye

by companyid: gen x=_n

by companyid: gen y=_N

list companyid fye x y in 1/30

3. Create a variable equal to the first or last value within each group

sort companyid fye

by companyid: gen auditfees_first=auditfees[1]

by companyid: gen auditfees_last=auditfees[_N]

list companyid fye auditfees auditfees_first auditfees_last in 1/30

4. Create a variable equal to the value lagged one or two periods

sort companyid fye

by companyid: gen auditfees_lag1= auditfees[_n-1]

by companyid: gen auditfees_lag2= auditfees[_n-2]

list companyid fye auditfees auditfees_lag1 auditfees_lag2 in 1/30
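The subscripting tricks above can be checked on a tiny hand-made panel. The five rows below are hypothetical toy values, not data from the notes; company 2 deliberately skips the year 2001 to show that [_n-1] lags by position rather than by calendar year. The xtset / L. lines are an added variant, not something the notes use.

```stata
* Hypothetical toy panel: company 2 has no observation for 2001.
clear
input companyid fye auditfees
1 2000 10
1 2001 12
1 2002 15
2 2000 30
2 2002 33
end

sort companyid fye
by companyid: gen seq   = _n                 // position within the company
by companyid: gen nobs  = _N                 // observations per company
by companyid: gen first = auditfees[1]
by companyid: gen last  = auditfees[_N]
by companyid: gen lag1  = auditfees[_n-1]    // previous row, missing for the first row

* Variant: declare the panel and use the lag operator, which respects gaps in time
xtset companyid fye
gen lag1_ts = L.auditfees

list, sepby(companyid)
```

For company 2 in 2002, lag1 picks up the 2000 value while lag1_ts is missing, so the two approaches agree only when every company has consecutive years.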

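Winsorizing: winsor

winsor is a user-written command (install it with ssc install winsor). Extreme values are replaced by the nearest retained value; no observation is dropped.

winsor rev, gen(wrev) p(0.01)

Here 0.01 is the fraction of observations winsorized in each tail.

winsor rev, gen(wrev) h(5)

Here 5 is the number of observations winsorized in each tail.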
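Where the user-written command is not installed, the idea can be sketched by hand. The example below is not the winsor command: it takes cutoffs from summarize, detail, so the result may differ slightly from winsor's. It assumes the bundled auto dataset and uses the 5th and 95th percentiles (rather than 1% tails) only because that dataset is small.

```stata
* Assumption: auto's price stands in for rev; 5% tails chosen for a 74-row dataset.
sysuse auto, clear

* Cutoffs: 5th and 95th percentiles of price
summarize price, detail
local lo = r(p5)
local hi = r(p95)

* Pull values beyond the cutoffs back to the cutoffs; nothing is dropped
gen wprice = price
replace wprice = `lo' if price < `lo'
replace wprice = `hi' if price > `hi' & !missing(price)

* Same number of observations, narrower range
summarize price wprice
```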

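Contingency tables and correlations:

1. Create a contingency table:

tabulate companytype big6, row

The first variable defines the rows (the leftmost column of the table) and the second defines the columns (the top row).

2. Test of association between the two variables: chi2

tabulate companytype big6, chi2 row

3. Correlation matrix:

pwcorr lnaf big6 year listed

4. Correlation matrix with significance levels:

pwcorr lnaf big6 year listed, sig

5. Also report the number of observations behind each correlation:

pwcorr lnaf big6 listed if year==2000, sig obs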

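Models

format x1 %10.3f (display x1 with a width of 10 and three decimal places)

Simple (one-regressor) regression

regress y x

Stored regression results

After estimation, the coefficient on each regressor is available as _b[varname] and the constant as _b[_cons].

Predicted values and residuals

predict yhat

predict yres, resid

yres is the difference between the actual value and the predicted value.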
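To see how the stored coefficients and predict fit together, the following sketch rebuilds the fitted values and residuals by hand. It assumes the bundled auto dataset, with price as y and weight as x; the _manual variable names are hypothetical.

```stata
* Assumption: auto's price / weight stand in for y / x.
sysuse auto, clear
regress price weight

* Stored coefficients
display "slope     = " _b[weight]
display "intercept = " _b[_cons]

* Fitted values and residuals from predict
predict yhat
predict yres, resid

* The same quantities rebuilt from the stored coefficients
gen yhat_manual = _b[_cons] + _b[weight] * weight
gen yres_manual = price - yhat_manual

* Each pair should agree up to rounding
summarize yhat yhat_manual yres yres_manual
```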

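Scatter plot of the residuals against x

twoway (scatter yres x) (lfit yres x)

Measuring the precision of the estimated coefficients: standard errors

Precision is judged by the t-value, which is the coefficient divided by its standard error. The p-value is computed from the distribution of the t-value: it is the probability of a t-value at least this far from zero if the true coefficient were zero. Both rest on the residuals satisfying the following assumptions:

Homoscedasticity (i.e., the residuals have a constant variance)

Non-correlation (i.e., the residuals are not correlated with each other)

Normality (i.e., the residuals are normally distributed)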

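Reading the regression output

Degrees of freedom of each sum of squares (ESS = explained, RSS = residual, TSS = total):

For the ESS, df = k - 1, where k = number of regression coefficients (df = 2 - 1)

For the RSS, df = n - k, where n = number of observations (df = 11 - 2)

For the TSS, df = n - 1 (df = 11 - 1)

MS (mean square): the last column reports the ESS, RSS and TSS divided by their respective degrees of freedom.

R-squared = ESS / TSS

Adjusted R-squared = 1 - (1 - R2)(n - 1)/(n - k). It corrects for the fact that R-squared never decreases when a regressor is added, even one with little explanatory power.

Root MSE = square root of RSS/(n - k): the typical size of a residual, in the units of y.

F-statistic = (ESS/(k - 1))/(RSS/(n - k)): tests the joint explanatory power of all the regressors.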
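These formulas can be verified against what regress itself stores. The sketch below assumes the bundled auto dataset and an arbitrary two-regressor model; the scalar names are hypothetical. In Stata's stored results, e(mss) is the ESS of the notes and e(rss) the RSS.

```stata
* Assumption: auto dataset and an arbitrary model, used only to check the formulas.
sysuse auto, clear
regress price weight length

* Ingredients saved by regress
scalar ESS    = e(mss)           // explained (model) sum of squares
scalar RSS    = e(rss)           // residual sum of squares
scalar TSS    = ESS + RSS
scalar n_obs  = e(N)
scalar k_coef = e(df_m) + 1      // coefficients including the constant

* The statistics rebuilt from the formulas in the text
scalar r2_hand   = ESS/TSS
scalar r2a_hand  = 1 - (1 - r2_hand)*(n_obs - 1)/(n_obs - k_coef)
scalar rmse_hand = sqrt(RSS/(n_obs - k_coef))
scalar F_hand    = (ESS/(k_coef - 1))/(RSS/(n_obs - k_coef))

* Side by side with the values regress reports
display "R-squared:     " r2_hand   "   reported: " e(r2)
display "Adj R-squared: " r2a_hand  "   reported: " e(r2_a)
display "Root MSE:      " rmse_hand "   reported: " e(rmse)
display "F:             " F_hand    "   reported: " e(F)
```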

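Heteroscedasticity (hettest)

Testing for constant error variance:

Run hettest after the regression (estat hettest in current versions of Stata):

reg auditfees nonauditfees totalassets big6 listed

hettest

Heteroscedasticity does not bias the coefficients, but it does bias their standard errors. One possible cause is that the data are bounded, which produces high skewness. Some heteroscedasticity can be removed by taking logarithms. When heteroscedasticity is found, adjust the standard errors with the Huber/White/sandwich estimator, which Stata applies when the robust option is added to the regression.

reg auditfees nonauditfees totalassets big6 listed, robust

With robust the coefficients and R2 are unchanged, but the standard errors differ; in this example the t-values and the F-value fall and the p-values rise.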
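A minimal sketch of the test-then-adjust sequence, assuming the bundled auto dataset and an arbitrary model in place of the audit-fee regression; the scalar names are hypothetical. vce(robust) is the current spelling of the robust option.

```stata
* Assumption: auto dataset and an arbitrary model, for illustration only.
sysuse auto, clear

* Ordinary standard errors
regress price weight mpg
scalar b_ols  = _b[weight]
scalar se_ols = _se[weight]

* Breusch-Pagan / Cook-Weisberg test, run directly after regress
estat hettest

* Same model with Huber/White/sandwich standard errors
regress price weight mpg, vce(robust)
scalar b_rob  = _b[weight]
scalar se_rob = _se[weight]

* Coefficient is identical; only the standard error changes
display "coefficient: " b_ols  "  vs  " b_rob
display "std. error:  " se_ols "  vs  " se_rob
```

Robust standard errors can be larger or smaller than the ordinary ones, so the direction of the change in t-values depends on the data.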

Correlated errors

In panel data, the residuals of a given firm are correlated across years ("time-series dependence"): there are likely to be unobserved company-specific characteristics that are relatively constant over time and that affect every year's observation, so the observations are not independent.

The standard errors are then biased downward. This problem can be avoided by adjusting the standard errors for the clustering of yearly observations across a given company.

Correcting for correlated errors:

Add robust cluster() to the regression:

reg lnaf lnta big6 listed, robust cluster(companyid)

How to check that the residuals of the same company are correlated across years

reg lnaf lnta

predict res, resid

keep companyid year res

sort companyid year

drop if companyid==companyid[_n-1] & year==year[_n-1]

reshape wide res, i(companyid) j(year)

browse

pwcorr res1998-res2002

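Points to note when using panel data:

If robust is used to control for heteroscedasticity but cluster() is not used to control for time-series dependence, the t-statistics are still biased upward.

If heteroscedasticity is not controlled for either, the upward bias in the t-statistics is even more severe.

Panel regressions should therefore include the robust cluster() option; otherwise your "significant" results from pooled regressions may be spurious.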
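The effect of clustering can be explored on simulated data. Everything in the sketch below is an assumption made for illustration: the seed, the 100 firms over 1998 to 2002, and the data-generating process in which both the regressor and the error contain a firm-specific component that is constant over time. With data built this way one would expect the clustered standard error to come out larger than the merely robust one, and the yearly residuals to be positively correlated.

```stata
* Simulated panel (all values hypothetical): 100 firms, 5 years each.
clear
set seed 12345
set obs 100
gen companyid = _n
gen firm_x = rnormal()                  // time-invariant part of the regressor
gen firm_u = rnormal()                  // unobserved firm effect in the error
expand 5
bysort companyid: gen year = 1997 + _n
gen lnta = firm_x + rnormal()
gen lnaf = 1 + 0.5*lnta + firm_u + rnormal()

* Heteroscedasticity-robust only
regress lnaf lnta, vce(robust)
scalar se_robust = _se[lnta]

* Robust and clustered by company (same as: robust cluster(companyid))
regress lnaf lnta, vce(cluster companyid)
scalar se_cluster = _se[lnta]

display "robust SE:    " se_robust
display "clustered SE: " se_cluster

* Residual correlation across years, following the steps in the text
predict res, resid
keep companyid year res
reshape wide res, i(companyid) j(year)
pwcorr res1998-res2002
```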

When does multicollinearity arise?

We have seen that when there is perfect collinearity between independent variables, Stata has to exclude one of them. For example, year_1 + year_2 + year_3 + year_4 + year_5 = 1, which duplicates the constant term:

reg lnaf year_1 year_2 year_3 year_4 year_5

Stata automatically throws away one of the year dummies so that the model can be estimated. (With the nocons option there is no constant, so all five dummies can be kept.)

Even if the independent variables are not perfectly collinear, there can still be a problem if they are highly correlated.

Consequences:

The standard errors of the coefficients become large (i.e., the coefficients are not estimated precisely).

The coefficient estimates can be highly unstable.

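How to measure it:

Variance-inflation factors (VIF) can be used to gauge whether multicollinearity is present. Run vif after the regression (estat vif in current versions of Stata):

reg lnaf lnta big6 lnta1

vif

reg lnaf lnta big6

vif

Severity: a VIF of 10 can be judged high, and a VIF of 20 very high.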
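What a VIF measures can be made concrete by computing one by hand: the VIF of a regressor is 1/(1 - R2) from regressing it on the other regressors. The sketch below assumes the bundled auto dataset, where weight and length are strongly related; the model is an arbitrary illustration.

```stata
* Assumption: auto dataset; weight and length serve as the correlated regressors.
sysuse auto, clear

* VIFs as reported by Stata
regress price weight length mpg
estat vif

* VIF of weight by hand: regress it on the other regressors
quietly regress weight length mpg
display "VIF(weight) = " 1/(1-e(r2))
```

The hand-computed value should match the weight row of the estat vif table, which shows that a high VIF simply means the regressor is well predicted by the others.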
