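Panel data can be estimated in several ways, including pooled OLS, fixed effects (FE), random effects (RE), and least squares dummy variables (LSDV). In practice, the fixed-effects (within) estimator is by far the most common choice. Stata's official command for it is xtreg, but that command is not always convenient (it requires the data to be declared as a panel, and it can be slow), so reg, areg, and reghdfe are also widely used to estimate fixed-effects models.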
xtreg
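xtreg, fe is the official command for the fixed-effects model, and the coefficients it reports are the textbook fixed-effects (within) estimates. xtreg is strict about data structure: the data must be a panel, so before running it we first declare the panel with xtset, which defines the cross-sectional (individual) dimension and the time dimension. Adding the fe option tells xtreg to use the within estimator, with the individual fixed effects defined on the cross-sectional variable set by xtset. Time fixed effects are not added automatically; to control for them, include the dummies i.year.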
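As a sketch of what the fe option does, the two-way model and its within transformation can be written as follows. The notation ($y_{it}$, $x_{it}$, $\alpha_i$, $\lambda_t$, $\varepsilon_{it}$) is assumed here for illustration and does not come from the original post; the second line assumes a balanced panel, so that the time average $\bar{\lambda}$ is the same for every individual.

```latex
\begin{align}
y_{it} &= x_{it}'\beta + \alpha_i + \lambda_t + \varepsilon_{it}, \\
y_{it} - \bar{y}_i &= (x_{it} - \bar{x}_i)'\beta + (\lambda_t - \bar{\lambda}) + (\varepsilon_{it} - \bar{\varepsilon}_i),
\qquad \bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}.
\end{align}
```

Subtracting the individual means removes $\alpha_i$, which is why xtreg, fe never reports the individual effects, while the $\lambda_t$ terms remain and are picked up by i.year.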
The example below uses lin_1992.dta, the data from Justin Yifu Lin's 1992 AER paper "Rural Reforms and Agricultural Growth in China", to demonstrate the command and its output.
. xtset province year
       panel variable:  province (strongly balanced)
        time variable:  year, 70 to 87
                delta:  1 unit

. xtreg ltvfo ltlan ltwlab ltpow ltfer hrs mci ngca i.year, fe vce(cluster province)

Fixed-effects (within) regression               Number of obs     =        476
Group variable: province                        Number of groups  =         28

R-sq:                                           Obs per group:
     within  = 0.8932                                         min =         17
     between = 0.6596                                         avg =       17.0
     overall = 0.7156                                         max =         17

                                                F(23,27)          =     949.82
corr(u_i, Xb)  = -0.3425                        Prob > F          =     0.0000

                              (Std. Err. adjusted for 28 clusters in province)
------------------------------------------------------------------------------
             |               Robust
       ltvfo |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ltlan |   .5833594   .1745834     3.34   0.002     .2251439    .9415749
      ltwlab |   .1514909   .0585107     2.59   0.015     .0314368     .271545
       ltpow |   .0971114    .090911     1.07   0.295    -.0894225    .2836453
       ltfer |   .1693346   .0438098     3.87   0.001     .0794444    .2592248
         hrs |   .1503752   .0587581     2.56   0.016     .0298136    .2709368
         mci |   .1978373   .0810587     2.44   0.022     .0315186     .364156
        ngca |   .7784081   .4016301     1.94   0.063    -.0456688    1.602485
             |
        year |
          71 |  -.0240404    .023366    -1.03   0.313    -.0719836    .0239027
          72 |  -.1323624   .0404832    -3.27   0.003    -.2154272   -.0492977
          73 |  -.0377336   .0357883    -1.05   0.301     -.111165    .0356979
          74 |   .0058554   .0500774     0.12   0.908     -.096895    .1086058
          75 |   .0096731   .0566898     0.17   0.866    -.1066448    .1259911
          76 |  -.0476465    .061423    -0.78   0.445    -.1736761    .0783832
          77 |  -.0869336   .0680579    -1.28   0.212    -.2265767    .0527096
          78 |  -.0325205   .0766428    -0.42   0.675    -.1897785    .1247376
          79 |  -.0076332   .0833462    -0.09   0.928    -.1786454     .163379
          81 |   -.093479   .1093614    -0.85   0.400    -.3178701    .1309121
          82 |  -.0447862   .1207405    -0.37   0.714    -.2925251    .2029528
          83 |  -.0309435   .1377207    -0.22   0.824     -.313523    .2516361
          84 |   .0442535   .1428764     0.31   0.759    -.2489048    .3374117
          85 |  -.0033372   .1561209    -0.02   0.983    -.3236709    .3169965
          86 |     .00484    .157992     0.03   0.976    -.3193329    .3290129
          87 |   .0386475   .1639608     0.24   0.815    -.2977723    .3750674
             |
       _cons |   2.651286   .7738994     3.43   0.002     1.063376    4.239196
-------------+----------------------------------------------------------------
     sigma_u |  .29344594
     sigma_e |  .09930555
         rho |  .89724523   (fraction of variance due to u_i)
------------------------------------------------------------------------------
reg
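Including a dummy variable for each individual in the regression yields the same coefficient estimates as the fixed-effects within estimator (FE); this equivalence is a proven result. The approach is called the least squares dummy variable (LSDV) method, and some textbooks and papers also refer to it as fixed-effects estimation. Its advantage is that it produces estimates of the individual effects themselves, whereas FE removes them through the within transformation. Its drawback is that when the number of individuals is large, many dummies are needed, which uses up degrees of freedom and may exceed the number of regressors Stata allows.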
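In Stata, LSDV is run as reg y x i.id i.year, where y and x stand for the dependent and explanatory variables, id is the individual identifier, and year is the time variable. reg imposes no requirement on the data structure, so it is more flexible to use, but it prints a long list of dummy-variable estimates.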
. reg ltvfo ltlan ltwlab ltpow ltfer hrs mci ngca i.province i.year, vce(cluster province)

Linear regression                               Number of obs     =        476
                                                F(22, 27)         =          .
                                                Prob > F          =          .
                                                R-squared         =     0.9695
                                                Root MSE          =     .09931

                               (Std. Err. adjusted for 28 clusters in province)
-------------------------------------------------------------------------------
              |               Robust
        ltvfo |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
        ltlan |   .5833594   .1800436     3.24   0.003     .2139404    .9527783
       ltwlab |   .1514909   .0603407     2.51   0.018      .027682    .2752998
        ltpow |   .0971114   .0937543     1.04   0.309    -.0952565    .2894792
        ltfer |   .1693346   .0451799     3.75   0.001     .0766331    .2620362
          hrs |   .1503752   .0605958     2.48   0.020      .026043    .2747075
          mci |   .1978373   .0835939     2.37   0.025     .0263169    .3693578
         ngca |   .7784081   .4141914     1.88   0.071    -.0714423    1.628259
              |
     province |
      beijing |  -.1865095   .1172887    -1.59   0.123     -.427166     .054147
       fujian |   .0434646   .0473107     0.92   0.366    -.0536089    .1405381
        gansu |  -.7945197   .1228202    -6.47   0.000    -1.046526   -.5425134
    guangdong |  -.0278664   .0609608    -0.46   0.651    -.1529476    .0972149
      guangxi |  -.2539549   .0614801    -4.13   0.000    -.3801015   -.1278082
      guizhou |  -.2526439   .0598147    -4.22   0.000    -.3753736   -.1299142
        hebei |   -.270106   .0948694    -2.85   0.008    -.4647619     -.07545
 heilongjiang |  -.0926732     .26542    -0.35   0.730      -.63727    .4519237
        henan |  -.0920743   .0396983    -2.32   0.028    -.1735284   -.0106201
        hubei |   .1024438   .0368811     2.78   0.010     .0267701    .1781176
        hunan |  -.0434275   .0581142    -0.75   0.461    -.1626679    .0758129
      jiangsu |   .1153335   .0352061     3.28   0.003     .0430965    .1875705
      jiangxi |  -.1401737   .0596644    -2.35   0.026    -.2625949   -.0177525
        jilin |  -.1783839   .2109985    -0.85   0.405    -.6113171    .2545493
     liaoning |  -.2517315   .1563399    -1.61   0.119    -.5725145    .0690515
      neimong |  -.8860432   .2325209    -3.81   0.001    -1.363137   -.4089498
      ningxia |  -.8489859   .1732579    -4.90   0.000    -1.204482     -.49349
      qinghai |  -.6982553   .1268849    -5.50   0.000    -.9586017   -.4379089
      shaanxi |   -.320607   .0887091    -3.61   0.001     -.502623   -.1385911
    shangdong |   .0040812   .0547494     0.07   0.941    -.1082554    .1164177
     shanghai |   .0864336   .0982642     0.88   0.387    -.1151878     .288055
       shanxi |  -.5005347   .1388718    -3.60   0.001     -.785476   -.2155934
      sichuan |   .0335563   .0392453     0.86   0.400    -.0469685    .1140811
      tianjin |     -.3011   .1049208    -2.87   0.008    -.5163796   -.0858203
     xinjiang |  -.3740561   .2053926    -1.82   0.080    -.7954869    .0473746
       yunnan |  -.2854833   .0590488    -4.83   0.000    -.4066415   -.1643251
     zhejiang |   .1615248   .0760427     2.12   0.043     .0054981    .3175515
              |
         year |
           71 |  -.0240404   .0240968    -1.00   0.327     -.073483    .0254022
           72 |  -.1323624   .0417494    -3.17   0.004    -.2180251   -.0466998
           73 |  -.0377336   .0369076    -1.02   0.316    -.1134616    .0379945
           74 |   .0058554   .0516436     0.11   0.911    -.1001086    .1118193
           75 |   .0096731   .0584628     0.17   0.870    -.1102827     .129629
           76 |  -.0476465   .0633441    -0.75   0.458    -.1776178    .0823249
           77 |  -.0869336   .0701864    -1.24   0.226    -.2309442     .057077
           78 |  -.0325205   .0790398    -0.41   0.684    -.1946968    .1296559
           79 |  -.0076332   .0859529    -0.09   0.930    -.1839939    .1687275
           81 |   -.093479   .1127818    -0.83   0.414     -.324888    .1379301
           82 |  -.0447862   .1245167    -0.36   0.722    -.3002733     .210701
           83 |  -.0309435    .142028    -0.22   0.829    -.3223608    .2604739
           84 |   .0442535    .147345     0.30   0.766    -.2580735    .3465804
           85 |  -.0033372   .1610037    -0.02   0.984    -.3336895    .3270151
           86 |     .00484   .1629333     0.03   0.977    -.3294716    .3391516
           87 |   .0386475   .1690888     0.23   0.821    -.3082941    .3855891
              |
        _cons |   2.874582   .7510459     3.83   0.001     1.333563    4.415601
-------------------------------------------------------------------------------
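The equivalence between the within estimator and LSDV can also be checked on simulated data. The following is a minimal sketch, not part of the original post: the panel (20 individuals, 10 periods), the variable names id, t, a, x, y, and the true slope of 2 are all made up for illustration. It compares the slope from xtreg, fe, from reg with individual dummies, and from a by-hand within transformation.

```stata
* Simulated panel: all names and parameter values here are hypothetical
clear
set seed 12345
set obs 200
generate id = ceil(_n/10)                 // 20 individuals
bysort id: generate t = _n                // 10 periods each
generate double a = rnormal() if t == 1   // individual effect, drawn once
bysort id (t): replace a = a[1]           // copy it to every period
generate double x = rnormal() + a         // regressor correlated with a
generate double y = 1 + 2*x + a + rnormal()

* (1) Official within estimator
xtset id t
quietly xtreg y x, fe
scalar b_fe = _b[x]

* (2) LSDV: one dummy per individual
quietly regress y x i.id
scalar b_lsdv = _b[x]

* (3) Within transformation by hand: subtract individual means
foreach v in y x {
    bysort id: egen double m_`v' = mean(`v')
    generate double w_`v' = `v' - m_`v'
}
quietly regress w_y w_x, noconstant
scalar b_within = _b[w_x]

display "xtreg, fe: " b_fe "   LSDV: " b_lsdv "   by hand: " b_within
```

The three slopes should agree up to numerical precision. Only the coefficients are compared here; the standard errors from the by-hand regression are not adjusted for the absorbed individual means and should not be used.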
areg
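areg is a refinement of reg and likewise imposes no requirement on the data structure. When we want to control for a large set of dummies (such as i.id) without generating them or reporting their coefficients, we can use areg and put the categorical variable to be controlled for in absorb(). areg can therefore also be used for fixed-effects estimation, because the within estimator and LSDV give the same coefficients.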
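However, absorb() accepts only one variable in the areg version used here, so for two-way or higher-dimensional fixed effects the remaining dimensions still have to be added as dummies with i.var.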
. areg ltvfo ltlan ltwlab ltpow ltfer hrs mci ngca i.year, absorb(province) vce(cluster province)

Linear regression, absorbing indicators         Number of obs     =        476
Absorbed variable: province                     No. of categories =         28
                                                F(  23,     27)   =     893.08
                                                Prob > F          =     0.0000
                                                R-squared         =     0.9695
                                                Adj R-squared     =     0.9659
                                                Root MSE          =     0.0993

                              (Std. Err. adjusted for 28 clusters in province)
------------------------------------------------------------------------------
             |               Robust
       ltvfo |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ltlan |   .5833594   .1800436     3.24   0.003     .2139404    .9527783
      ltwlab |   .1514909   .0603407     2.51   0.018      .027682    .2752998
       ltpow |   .0971114   .0937543     1.04   0.309    -.0952565    .2894792
       ltfer |   .1693346   .0451799     3.75   0.001     .0766331    .2620362
         hrs |   .1503752   .0605958     2.48   0.020      .026043    .2747075
         mci |   .1978373   .0835939     2.37   0.025     .0263169    .3693578
        ngca |   .7784081   .4141914     1.88   0.071    -.0714423    1.628259
             |
        year |
          71 |  -.0240404   .0240968    -1.00   0.327     -.073483    .0254022
          72 |  -.1323624   .0417494    -3.17   0.004    -.2180251   -.0466998
          73 |  -.0377336   .0369076    -1.02   0.316    -.1134616    .0379945
          74 |   .0058554   .0516436     0.11   0.911    -.1001086    .1118193
          75 |   .0096731   .0584628     0.17   0.870    -.1102827     .129629
          76 |  -.0476465   .0633441    -0.75   0.458    -.1776178    .0823249
          77 |  -.0869336   .0701864    -1.24   0.226    -.2309442     .057077
          78 |  -.0325205   .0790398    -0.41   0.684    -.1946968    .1296559
          79 |  -.0076332   .0859529    -0.09   0.930    -.1839939    .1687275
          81 |   -.093479   .1127818    -0.83   0.414     -.324888    .1379301
          82 |  -.0447862   .1245167    -0.36   0.722    -.3002733     .210701
          83 |  -.0309435    .142028    -0.22   0.829    -.3223608    .2604739
          84 |   .0442535    .147345     0.30   0.766    -.2580735    .3465804
          85 |  -.0033372   .1610037    -0.02   0.984    -.3336895    .3270151
          86 |     .00484   .1629333     0.03   0.977    -.3294716    .3391516
          87 |   .0386475   .1690888     0.23   0.821    -.3082941    .3855891
             |
       _cons |   2.651286   .7981036     3.32   0.003     1.013713    4.288859
------------------------------------------------------------------------------
Note: if Stata reports "matsize too small", raise the limit with set matsize 5000.
reghdfe
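reghdfe is designed for linear regression with multiple levels of fixed effects. When several dimensions must be controlled for (for example city, industry, and year), xtreg and similar commands still work but run very slowly. reghdfe addresses exactly this pain point and is far faster than xtreg and its relatives. reghdfe is a user-written command by Sergio Correia; see the author's GitHub page (https://github.com/sergiocorreia/reghdfe) for more details. It must be installed before use (ssc install reghdfe).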
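reghdfe handles multiple fixed effects directly: simply list them in absorb(var1 var2 var3 ...), separated by spaces. There is no need to add dummies with i.var, which is much more convenient than xtreg and similar commands, and the output omits the long list of dummy-variable coefficients.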
. reghdfe ltvfo ltlan ltwlab ltpow ltfer hrs mci ngca, absorb(year province) vce(cluster province)
(MWFE estimator converged in 2 iterations)

HDFE Linear regression                            Number of obs   =        476
Absorbing 2 HDFE groups                           F(   7,     27) =     229.56
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.9695
                                                  Adj R-squared   =     0.9658
                                                  Within R-sq.    =     0.6751
Number of clusters (province) =         28        Root MSE        =     0.0994

                              (Std. Err. adjusted for 28 clusters in province)
------------------------------------------------------------------------------
             |               Robust
       ltvfo |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ltlan |   .5833594   .1745834     3.34   0.002     .2251439    .9415749
      ltwlab |   .1514909   .0585107     2.59   0.015     .0314368     .271545
       ltpow |   .0971114    .090911     1.07   0.295    -.0894225    .2836453
       ltfer |   .1693346   .0438098     3.87   0.001     .0794444    .2592248
         hrs |   .1503752   .0587581     2.56   0.016     .0298136    .2709368
         mci |   .1978373   .0810587     2.44   0.022     .0315186     .364156
        ngca |   .7784081   .4016301     1.94   0.063    -.0456688    1.602485
       _cons |   2.625513   .7307092     3.59   0.001     1.126221    4.124804
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
        year |        17           0          17     |
    province |        28          28           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
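reghdfe y x, absorb(id year industry) controls for several dimensions of fixed effects.
reghdfe y x, absorb(year#industry) controls for interacted (year-by-industry) fixed effects.
reghdfe can also cluster the standard errors in the same command, as vce(cluster province) does in the example above.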
Summary
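The results above show that xtreg, reg, areg, and reghdfe produce identical coefficient estimates; only the standard errors differ slightly. xtreg and reghdfe report the same standard errors, and reg and areg report the same standard errors. The gap does not come from the point estimator, since the within estimator and LSDV are numerically equivalent. It comes from how the degrees of freedom are counted when the cluster-robust standard errors are computed: reg and areg treat the model as a special pooled OLS with dummies (LSDV) and count the province dummies as estimated parameters, whereas xtreg and reghdfe treat it as a fixed-effects model and do not count them, because the province effects are nested within the clusters (see the note at the bottom of the reghdfe output).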
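One way to see where the small gap in standard errors comes from is the finite-sample factor (N-1)/(N-K) in the cluster-robust variance, where K is the number of parameters the command counts. The sketch below is only an arithmetic check on numbers copied from the outputs above. The parameter counts (24 for xtreg and reghdfe, 51 for reg and areg) are an assumption based on reading those outputs, not something stated in the original post.

```stata
* Arithmetic check: rescale the xtreg/reghdfe SE of ltlan to the reg/areg SE
scalar N      = 476        // number of observations
scalar k_fe   = 24         // assumed: 7 regressors + 16 year dummies + constant
scalar k_lsdv = 51         // assumed: k_fe + 27 province dummies
scalar se_fe  = .1745834   // ltlan Std. Err. from xtreg and reghdfe

* (N-1) cancels in the ratio of the two variance factors,
* and the cluster factor G/(G-1) is the same in both
scalar ratio   = sqrt((N - k_fe)/(N - k_lsdv))
scalar se_lsdv = se_fe*ratio

display "implied reg/areg Std. Err. for ltlan: " se_lsdv
```

The result can be compared with the value .1800436 that reg and areg report for ltlan; the same ratio applies to every other coefficient in the table.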