How to suest a HDFE

I propose a feasible strategy to compare coefficients across models with high-dimensional fixed effects.
Stata
Programming
Fixed Effects
Author

Fernando Rios-Avila

Published

July 14, 2023

Introduction

A question I have seen online many…many…many times is how to compare coefficients across models that have been estimated with a high-dimensional set of fixed effects.

The standard answer has always been to suest both equations, or to stack both equations and compare the effects. However, suest does not work with reghdfe or xtreg, and stacking equations is even less intuitive.

Today, however, I will show you an easy way to do this with a little command of my own creation, and also with some simple syntax.

Setup

To use the strategies I present here, you will need reghdfe (from ssc) and cre (from fra, my own repository). You will also need frause (from ssc) to load the example data.
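For convenience, the SSC part of the setup can be sketched as below. This is only a sketch: ftools is not mentioned above and is added here on the assumption that reghdfe needs it as a dependency, and the install command for cre is left out because it is not shown in this post.

```stata
* Install the community-contributed packages from SSC (run once).
ssc install reghdfe, replace
* Assumption: ftools is a dependency of reghdfe, so install it as well.
ssc install ftools, replace
ssc install frause, replace
* cre is distributed through the author's fra repository; follow that
* repository's own instructions to install it.
```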

suest the problem

Let's start with a simple wage regression model, where we aim to compare the coefficients of men and women. For this, we will use the oaxaca data set and a simple Mincerian regression model.

First, let's estimate both models:

qui:frause oaxaca, clear
qui:reg lnwage educ exper tenure age if female==0
est sto male
qui:reg lnwage educ exper tenure age if female==1
est sto female

Then use suest to put them together, and test whether the coefficients differ from each other. For this I will use the lincom and test commands:

qui: suest male female
lincom [male_mean]:educ-[female_mean]:educ
test [male_mean=female_mean]:educ
test [male_mean=female_mean], common

 ( 1)  [male_mean]educ - [female_mean]educ = 0

------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         (1) |  -.0300371   .0119579    -2.51   0.012    -.0534741   -.0066001
------------------------------------------------------------------------------

 ( 1)  [male_mean]educ - [female_mean]educ = 0

           chi2(  1) =    6.31
         Prob > chi2 =    0.0120

 ( 1)  [male_mean]educ - [female_mean]educ = 0
 ( 2)  [male_mean]exper - [female_mean]exper = 0
 ( 3)  [male_mean]tenure - [female_mean]tenure = 0
 ( 4)  [male_mean]age - [female_mean]age = 0

           chi2(  4) =   26.63
         Prob > chi2 =    0.0000
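As a quick sanity check on how the lincom and test outputs relate, the single-coefficient chi2 statistic is just the square of the lincom z statistic. A minimal sketch, using only the numbers printed above:

```stata
* Square of (coefficient / std. err.) from the lincom output above.
* It should round to the chi2(1) value reported by test (6.31).
display %6.2f (-.0300371/.0119579)^2
```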

I could also use more involved methods, like writing my own ml or gmm estimator, but there is no need for that in this simple case.

Stacking

The next option is stacking. This sounds difficult, but it is nothing more than the old trick of interactions. We simply need to estimate a model where all covariates are interacted with our sampling indicator (gender):

qui:reg lnwage i.female##c.(educ exper tenure age), robust
lincom 1.female#c.educ
test 1.female#c.educ
test 1.female#c.educ 1.female#c.exper 1.female#c.tenure 1.female#c.age

 ( 1)  1.female#c.educ = 0

------------------------------------------------------------------------------
      lnwage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         (1) |   .0300371   .0119956     2.50   0.012     .0065062    .0535681
------------------------------------------------------------------------------

 ( 1)  1.female#c.educ = 0

       F(  1,  1424) =    6.27
            Prob > F =    0.0124

 ( 1)  1.female#c.educ = 0
 ( 2)  1.female#c.exper = 0
 ( 3)  1.female#c.tenure = 0
 ( 4)  1.female#c.age = 0

       F(  4,  1424) =    6.62
            Prob > F =    0.0000

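Again, we obtain the same point estimate as before, with the sign flipped because the interaction measures the female-minus-male difference. The standard errors and test statistics differ only slightly (.0119956 versus .0119579), which is consistent with regress and suest applying different small-sample corrections.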
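The small gap between the suest and regress standard errors can be checked by hand. The sketch below rests on two assumptions that are not stated in the post: that N = 1434 (the 1424 denominator degrees of freedom plus the 10 estimated parameters), and that suest scales its robust variance by N/(N-1) while regress, robust uses N/(N-k).

```stata
* Rescale the suest standard error by the ratio of the two corrections.
* N = 1434 and k = 10 are inferred from the F-test output, not reported directly.
display %9.7f .0119579*sqrt((1434-1)/(1434-10))
```

The result should land very close to .0119956, the standard error reported by regress.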

But now the hard question. What if we have a HDFE?

Stacking FE

To simulate the situation of a high-dimensional FE, I will use age. This allows me to still obtain point estimates using simple regression (and, say, suest), and compare them to the alternative:

qui:reg lnwage educ exper tenure i.age if female==0
est sto male
qui:reg lnwage educ exper tenure i.age if female==1
est sto female
qui:suest male female, cluster(age)
lincom [male_mean]:educ-[female_mean]:educ
test [male_mean=female_mean]:educ exper tenure

 ( 1)  [male_mean]educ - [female_mean]educ = 0

------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         (1) |  -.0354668   .0095251    -3.72   0.000    -.0541357   -.0167978
------------------------------------------------------------------------------

 ( 1)  [male_mean]educ - [female_mean]educ = 0
 ( 2)  [male_mean]exper - [female_mean]exper = 0
 ( 3)  [male_mean]tenure - [female_mean]tenure = 0

           chi2(  3) =   30.72
         Prob > chi2 =    0.0000

Now the second method, using reghdfe:

egen age_fem = group(age female)
qui:reghdfe lnwage i.female##c.(educ exper tenure), abs(female#age) cluster(age_fem)
lincom 1.female#c.educ
test 1.female#c.educ 1.female#c.exper 1.female#c.tenure

 ( 1)  1.female#c.educ = 0

------------------------------------------------------------------------------
      lnwage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         (1) |   .0354668   .0104804     3.38   0.001     .0146424    .0562911
------------------------------------------------------------------------------

 ( 1)  1.female#c.educ = 0
 ( 2)  1.female#c.exper = 0
 ( 3)  1.female#c.tenure = 0

       F(  3,    89) =    6.49
            Prob > F =    0.0005
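The key detail in the reghdfe call is abs(female#age): each gender gets its own set of age effects, just as in the two separate regressions. A minimal sketch of the same specification with explicit dummies follows. It uses regress with vce(cluster), which is my own illustration rather than part of the example above. The interaction coefficient should match the reghdfe point estimate, although the standard error may differ somewhat because the two commands handle degrees of freedom differently.

```stata
* Stacked model with gender-specific age dummies entered explicitly
* instead of being absorbed (illustration only).
frause oaxaca, clear
egen age_fem = group(age female)
quietly regress lnwage i.female##c.(educ exper tenure) i.female#i.age, vce(cluster age_fem)
* Female-minus-male difference in the return to education
lincom 1.female#c.educ
```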

Using suest and correlated random effects model cre

Now we use a correlated random effects (CRE) model to estimate the FE models:

qui:cre, keep prefix(m1) abs(age):reg lnwage educ exper tenure if female==0
est sto male
qui:cre, keep prefix(m2) abs(age):reg lnwage educ exper tenure if female==1
est sto female
qui:suest male female, cluster(age)
lincom [male_mean]:educ-[female_mean]:educ
test [male_mean=female_mean]:educ exper tenure

 ( 1)  [male_mean]educ - [female_mean]educ = 0

------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         (1) |  -.0354668   .0095251    -3.72   0.000    -.0541357   -.0167978
------------------------------------------------------------------------------

 ( 1)  [male_mean]educ - [female_mean]educ = 0
 ( 2)  [male_mean]exper - [female_mean]exper = 0
 ( 3)  [male_mean]tenure - [female_mean]tenure = 0

           chi2(  3) =   30.72
         Prob > chi2 =    0.0000

This gives exactly the same result as suest on the dummy-variable regressions!
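To see why a correlated random effects regression can reproduce the FE coefficients, here is a hand-rolled Mundlak-style version for the male subsample. This is only a sketch of the general idea, under the assumption that the CRE approach amounts to adding group means of the covariates; it is not the source code of cre, and the mean_* variable names are made up for the illustration.

```stata
* Mundlak-style CRE by hand (illustration only, not the cre command).
frause oaxaca, clear
keep if female==0 & !missing(lnwage, educ, exper, tenure, age)
* Group means of each covariate within the "absorbed" variable
foreach v of varlist educ exper tenure {
    bysort age: egen double mean_`v' = mean(`v')
}
* CRE regression: covariates plus their group means
quietly regress lnwage educ exper tenure mean_educ mean_exper mean_tenure
scalar b_cre = _b[educ]
* Dummy-variable FE regression for comparison
quietly regress lnwage educ exper tenure i.age
scalar b_fe = _b[educ]
display "CRE: " b_cre "   FE: " b_fe
```

Because the group means are computed on the estimation sample, the two educ coefficients should coincide up to numerical precision.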

Hard Example:

Let's do this with a harder example, using the nlswork dataset to compare wage regression coefficients between north and south:

webuse nlswork, clear
egen cl = group(idcode south)
qui: reghdfe ln_wage i.south##c.(age msp  not_smsa c_city union tenure hours) , abs(idcode#south) cluster(cl)
test 1.south#c.age 1.south#c.msp 1.south#c.union
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)
(8 missing values generated)

 ( 1)  1.south#c.age = 0
 ( 2)  1.south#c.msp = 0
 ( 3)  1.south#c.union = 0

       F(  3,  3586) =    1.24
            Prob > F =    0.2941

And now the same comparison using CRE:

webuse nlswork, clear
qui:cre, abs(idcode) keep prefix(m1): regress ln_wage age msp not_smsa c_city union tenure hours if south==0  
est sto north
qui:cre, abs(idcode) keep prefix(m2): regress ln_wage age msp not_smsa c_city union tenure hours if south==1  
est sto south
qui:suest north south
test [north_mean=south_mean]: age msp union
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

 ( 1)  [north_mean]age - [south_mean]age = 0
 ( 2)  [north_mean]msp - [south_mean]msp = 0
 ( 3)  [north_mean]union - [south_mean]union = 0

           chi2(  3) =    3.82
         Prob > chi2 =    0.2821
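The suest call above does not cluster, while the reghdfe version clusters on cl. A possible variant that mirrors the reghdfe clustering is sketched below. It reuses the cre syntax from this post unchanged; the only addition is the vce(cluster cl) option of suest, and I have not run it, so no output is shown.

```stata
* CRE + suest, clustering on the same idcode-by-south groups as reghdfe.
webuse nlswork, clear
egen cl = group(idcode south)
qui:cre, abs(idcode) keep prefix(m1): regress ln_wage age msp not_smsa c_city union tenure hours if south==0
est sto north
qui:cre, abs(idcode) keep prefix(m2): regress ln_wage age msp not_smsa c_city union tenure hours if south==1
est sto south
* vce(cluster cl) is the addition relative to the call shown above
qui:suest north south, vce(cluster cl)
test [north_mean=south_mean]: age msp union
```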

Conclusions

There you have it: two ways to compare coefficients across two models, using interactions or suest.

Both provide the same results, as long as you cluster the standard errors on the absorbed variable.

Hope you find it useful.