Introduction
A question I have seen online many…many…many times is how to compare the coefficients of models that have been estimated with a high-dimensional set of fixed effects.
The standard answer has always been… to use suest on both equations, or to stack both equations and compare the effects. However, suest does not work with reghdfe or xtreg, and stacking equations is even less intuitive.
Today I will show you an easy way to do this, using a small command of my own creation, and also an alternative that relies only on simple syntax.
To use the strategies presented here, you will need reghdfe (from ssc) and cre (from fra, my own repository). You will also need frause (from ssc) to load the example data.
suest: the problem
Let's start with a simple wage regression model, where we aim to compare the coefficients for men and women. For this, we will use the oaxaca data set and a simple Mincerian regression model.
First, let's estimate both models:
qui:frause oaxaca, clear
qui:reg lnwage educ exper tenure age if female==0
est sto male
qui:reg lnwage educ exper tenure age if female==1
est sto female
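For reference, the two regressions above fit the same Mincerian wage equation separately for men (female==0, group m) and women (female==1, group f). The notation below is my own shorthand for the variables in the code:

```latex
\ln w_i = \beta_0^{g} + \beta_1^{g}\,\mathit{educ}_i + \beta_2^{g}\,\mathit{exper}_i + \beta_3^{g}\,\mathit{tenure}_i + \beta_4^{g}\,\mathit{age}_i + \varepsilon_i^{g}, \qquad g \in \{m, f\}
```

Comparing coefficients across the two models means testing hypotheses such as H0: beta_1^m = beta_1^f (the return to education is the same for men and women), either one at a time or jointly.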
Next, use suest to combine them, and test whether the coefficients differ from each other using the lincom and test commands:
qui: suest male female
lincom [male_mean]:educ-[female_mean]:educ
test [male_mean=female_mean]:educ
test [male_mean=female_mean], common
( 1) [male_mean]educ - [female_mean]educ = 0
------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
(1) | -.0300371 .0119579 -2.51 0.012 -.0534741 -.0066001
------------------------------------------------------------------------------
( 1) [male_mean]educ - [female_mean]educ = 0
chi2( 1) = 6.31
Prob > chi2 = 0.0120
( 1) [male_mean]educ - [female_mean]educ = 0
( 2) [male_mean]exper - [female_mean]exper = 0
( 3) [male_mean]tenure - [female_mean]tenure = 0
( 4) [male_mean]age - [female_mean]age = 0
chi2( 4) = 26.63
Prob > chi2 = 0.0000
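To see what lincom is doing, the same number can be computed by hand from the suest results: the difference in coefficients divided by its standard error, where the variance of the difference is Var(b_m) + Var(b_f) - 2*Cov(b_m, b_f). This is a sketch; the scalar names (sc_cov, sc_diff, sc_se) are my own, and it assumes the equation names male_mean and female_mean that suest reports above.

```stata
* Sketch: reproduce the lincom result by hand from the suest estimates
qui: frause oaxaca, clear
qui: regress lnwage educ exper tenure age if female==0
est sto male
qui: regress lnwage educ exper tenure age if female==1
est sto female
qui: suest male female
* full variance-covariance matrix of the combined coefficients
matrix V = e(V)
* covariance between the male and female educ coefficients
scalar sc_cov  = V[rownumb(V, "male_mean:educ"), colnumb(V, "female_mean:educ")]
* difference and its standard error: Var(b_m) + Var(b_f) - 2*Cov(b_m, b_f)
scalar sc_diff = _b[male_mean:educ] - _b[female_mean:educ]
scalar sc_se   = sqrt(_se[male_mean:educ]^2 + _se[female_mean:educ]^2 - 2*scalar(sc_cov))
display "diff = " scalar(sc_diff) "   se = " scalar(sc_se) "   z = " scalar(sc_diff)/scalar(sc_se)
```

Squaring z gives the chi2(1) statistic reported by test.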
I could also use more involved methods, such as writing my own ml or gmm estimator, but there is no need for that in this simple case.
Stacking
The next option is stacking. This sounds difficult, but it is nothing more than the old trick of interactions: we simply estimate a single model where all covariates are interacted with our sample indicator (gender):
qui:reg lnwage i.female##c.(educ exper tenure age), robust
lincom 1.female#c.educ
test 1.female#c.educ
test 1.female#c.educ 1.female#c.exper 1.female#c.tenure 1.female#c.age
( 1) 1.female#c.educ = 0
------------------------------------------------------------------------------
lnwage | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
(1) | .0300371 .0119956 2.50 0.012 .0065062 .0535681
------------------------------------------------------------------------------
( 1) 1.female#c.educ = 0
F( 1, 1424) = 6.27
Prob > F = 0.0124
( 1) 1.female#c.educ = 0
( 2) 1.female#c.exper = 0
( 3) 1.female#c.tenure = 0
( 4) 1.female#c.age = 0
F( 4, 1424) = 6.62
Prob > F = 0.0000
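Again, we obtain the same results as before. The point estimate has the opposite sign because the interaction measures the female minus the male coefficient, and the standard errors differ only slightly because regress and suest apply different small-sample corrections.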
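Why does stacking work? In the fully interacted model, the coefficient on educ is the slope for men (female==0 is the base category), and the slope for women is educ plus 1.female#c.educ. The interaction is therefore exactly the difference between the two separate regressions. A quick check, as a sketch with scalar names of my own (b_male, b_female):

```stata
* Sketch: the fully interacted (stacked) model reproduces both separate regressions
qui: frause oaxaca, clear
qui: regress lnwage educ exper tenure age if female==0
scalar b_male = _b[educ]
qui: regress lnwage educ exper tenure age if female==1
scalar b_female = _b[educ]
qui: regress lnwage i.female##c.(educ exper tenure age), robust
* slope for men = coefficient on educ (base category)
display "men:   " _b[educ] "   separate model: " scalar(b_male)
* slope for women = base slope + interaction
display "women: " _b[educ] + _b[1.female#c.educ] "   separate model: " scalar(b_female)
```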
But now the hard question: what if the model includes high-dimensional fixed effects (HDFE)?
Stacking FE
To simulate a high-dimensional FE, I will use age. This still allows me to obtain point estimates using simple regression (and thus suest), so I can compare them with the alternative:
qui:reg lnwage educ exper tenure i.age if female==0
est sto male
qui:reg lnwage educ exper tenure i.age if female==1
est sto female
qui:suest male female, cluster(age)
lincom [male_mean]:educ-[female_mean]:educ
test [male_mean=female_mean]:educ exper tenure
( 1) [male_mean]educ - [female_mean]educ = 0
------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
(1) | -.0354668 .0095251 -3.72 0.000 -.0541357 -.0167978
------------------------------------------------------------------------------
( 1) [male_mean]educ - [female_mean]educ = 0
( 2) [male_mean]exper - [female_mean]exper = 0
( 3) [male_mean]tenure - [female_mean]tenure = 0
chi2( 3) = 30.72
Prob > chi2 = 0.0000
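Now the second method, using reghdfe. Note that the fixed effects are interacted with gender (female#age), and the clusters are defined by the same age-by-gender cells:
egen age_fem = group(age female)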
qui:reghdfe lnwage i.female##c.(educ exper tenure), abs(female#age) cluster(age_fem)
lincom 1.female#c.educ
test 1.female#c.educ 1.female#c.exper 1.female#c.tenure
( 1) 1.female#c.educ = 0
------------------------------------------------------------------------------
lnwage | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
(1) | .0354668 .0104804 3.38 0.001 .0146424 .0562911
------------------------------------------------------------------------------
( 1) 1.female#c.educ = 0
( 2) 1.female#c.exper = 0
( 3) 1.female#c.tenure = 0
F( 3, 89) = 6.49
Prob > F = 0.0005
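The key detail is abs(female#age): it absorbs a separate set of age effects for each gender, which is what the two separate regressions with i.age do. Because age takes only a few values, this can be checked with plain dummies in regress. This is a sketch that compares point estimates only; the standard errors are not meant to match, and the scalar names are my own:

```stata
* Sketch: gender-specific age dummies reproduce the abs(female#age) point estimates
qui: frause oaxaca, clear
qui: regress lnwage educ exper tenure i.age if female==0
scalar b_male = _b[educ]
qui: regress lnwage educ exper tenure i.age if female==1
scalar b_female = _b[educ]
* stacked model: interacted slopes plus one set of age dummies per gender
qui: regress lnwage i.female##c.(educ exper tenure) i.female#i.age
display "separate models: " scalar(b_female) - scalar(b_male)
display "stacked model:   " _b[1.female#c.educ]
```

Absorbing only age (instead of female#age) would force both genders to share the same age effects, and the interaction would no longer equal the difference between the separate models.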
Hard Example
Let's try a harder example using the nlswork dataset, comparing the coefficients of a wage regression between the South and the rest of the country (which I call north):
webuse nlswork, clear
egen cl = group(idcode south)
qui: reghdfe ln_wage i.south##c.(age msp not_smsa c_city union tenure hours) , abs(idcode#south) cluster(cl)
test 1.south#c.age 1.south#c.msp 1.south#c.union
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)
(8 missing values generated)
( 1) 1.south#c.age = 0
( 2) 1.south#c.msp = 0
( 3) 1.south#c.union = 0
F( 3, 3586) = 1.24
Prob > F = 0.2941
We can also do this with cre, estimating each model with regress and combining them with suest:
webuse nlswork, clear
qui:cre, abs(idcode) keep prefix(m1): regress ln_wage age msp not_smsa c_city union tenure hours if south==0
est sto north
qui:cre, abs(idcode) keep prefix(m2): regress ln_wage age msp not_smsa c_city union tenure hours if south==1
est sto south
qui:suest north south
test [north_mean=south_mean]: age msp union
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)
( 1) [north_mean]age - [south_mean]age = 0
( 2) [north_mean]msp - [south_mean]msp = 0
( 3) [north_mean]union - [south_mean]union = 0
chi2( 3) = 3.82
Prob > chi2 = 0.2821
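Why can cre be combined with suest? My understanding (an assumption here, not a description of cre's internal code) is that it follows the correlated random effects, or Mundlak, approach: adding the individual means of the covariates to a pooled OLS regression makes the slopes equal to the fixed-effects (within) estimates, and because the model is then a plain regress, suest can handle it. Below is a hand-rolled sketch for the north sample, using the built-in areg as the FE benchmark; the variable names touse and mn_* are hypothetical:

```stata
* Sketch of the Mundlak (CRE) idea by hand; not cre's actual implementation
webuse nlswork, clear
local xs age msp not_smsa c_city union tenure hours
* estimation sample: north (south==0) with no missing model variables
mark touse if south==0
markout touse ln_wage `xs'
* individual means of each covariate, computed on the estimation sample only
foreach v of local xs {
    bysort idcode touse: egen double mn_`v' = mean(`v') if touse
}
* pooled OLS with the means: slopes on the covariates equal the within (FE) slopes
qui: regress ln_wage `xs' mn_* if touse
scalar b_cre = _b[union]
* fixed-effects benchmark on the same sample
qui: areg ln_wage `xs' if touse, absorb(idcode)
display "CRE: " scalar(b_cre) "   FE: " _b[union]
```

Computing the means on the exact estimation sample matters: if they include observations that the regression drops, the equivalence with the FE slopes no longer holds exactly.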
Conclusions
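There you have it: two ways to compare coefficients across models with high-dimensional fixed effects, using either interactions (with reghdfe) or suest (with cre).
Both provide the same point estimates and very similar inference, as long as you cluster standard errors at the level of the absorbed fixed effect. The standard errors need not match exactly: in the age example, for instance, suest clusters on age, whereas reghdfe clusters on age-by-gender cells.
Hope you find it useful!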