Code
** to get the data from my repository
ssc install frause, replace
frause mpdta, clearUsing CSDID and JWDID
If you are reading this, you probably know quite well all the problems associated with the infamous TWFE-DID especification. Nevertheless, its worth a quick recap, in case you are not aware of.
Before the new literature on DID, whenever people wanted to analyze treatment effects using a difference in differences approach, and had access to data with many periods and many individuals, they would tend to use a model specification that look like this:
\[y_{it} = \delta_i + \delta_t + \gamma \times trt + e_{it} \tag{1}\]
Where \(trt\) was a dummy variable that would take the value of 1 for treated units after the treatment was implemented, and zero otherwise. The idea was that all units with \(trt=0\) (those not yet treated) would be used used as controls. In this specification, one is also controlling for time fixed effects \(\delta_t\) as well as individual (or group) fixed effects \(\delta_i\).
This specification was considered a generalization of the simple 2x2 DID design:
\[ y = \delta_0 + \delta_1 post+ \delta_2 treat + \gamma \ (post\times treat) + e_{it} \]
Where \(\delta_i\) was the equivalent to \(\delta_1\), and \(\delta_2\) the equivalent for \(\delta_t\).
What many didn’t know at the time is that Equation 1 would provide correct identification of the Average treatment effect \(\gamma\), under strict assumptions:
This assumptions almost never hold. For example, the original treatment may become less effective few periods after they are implemented, and units treated later may experience stronger treatment effects than those treated earlier. When this happen, Equation 1 will provide incorrect results for three related reasons:
One of the most interesting consequences of this problem is that one may estimate negative treatment effects, even if all units in the sample had a theoreticaly possitive treatment effect.
Many paper published in 2021 including Goodman-Bacon (2021), Callaway and Sant’Anna (2021), Sun and Abraham (2021), Wooldridge (2021) and Borusyak, Jaravel, and Spiess (2022), to name a few, identified this problem, and proposed similar solutions: Estimate ATTs allowing for cohort and timing heterogeneity, avoiding the use of already treated units as controls.
While these papers also provide their own estimators (in Stata, R, and in some cases python), I will focus only on two solutions. The ones proposed by Callaway and Sant’Anna (2021) and Wooldridge (2021), which I programmed in Stata using csdid[2], and jwdid. Both approaches suggest to estimate heterogenous treatment effects based on when a unit is treated (its cohort \(G\)), and at what point in time one is interested in estimating that effect \(T\). We will call them the \(ATT(G,T)\). 1
Lets assume you have access to Balanced Panel Data. In other words, you observe the same group of individuals across the same window of time. Different individuals are treated at different points in time, and some are not treated at all (never treated). This is the best case scenario, since you do not need to worry about identifying cohorts by treat timing.
Let’s see how to estimate this type of models using csdid and jwdid. They estimate the treatment effects using different strategies, but under specific cases, they will provide the same point estimates. To show how this is done, I will employ the toy-dataset used in Callaway and Sant’Anna (2021).
First let’s set things up, loading the data:
This dataset contains information at the county level on population size (lpop), employment (lemp), and a variable that indicates when a county instuted a change in minimum wage (first_treat). To estimate a DID model with heterogenous treatment effects, could use either jwdid or csdid, which are available from SSC.2
Just one more step (not required but important if you did not identified the cohort variable). Lets create a dummy that takes the value of 1 only after the treated unit is treated:
This is what Equation 1 is using. So how do I create the “treatment cohort” variable?. That can be easily done using one of csdid subprograms. You just need to provide the group or panel identifier, and the time variable:
Its worth noting that the official commands xthdidregress and hdidregress do not require you to do this, because it automatically creates the gvar internally, based on the information provided. You can now create a small tabulation, as a sanity check, to verity the variable gvar is correctly created:
| year | 0 | 2004 | 2006 | 2007 | Total | 
|---|---|---|---|---|---|
| 2003 | 309 | 20 | 40 | 131 | 500 | 
| 2004 | 309 | 20 | 40 | 131 | 500 | 
| 2005 | 309 | 20 | 40 | 131 | 500 | 
| 2006 | 309 | 20 | 40 | 131 | 500 | 
| 2007 | 309 | 20 | 40 | 131 | 500 | 
| Total | 1,545 | 100 | 200 | 655 | 2,500 | 
If all goes as expected, you a tabulation similar to the one you see here. You have a total of 2500 observations, but only 500 counties. 309 were not treated (had no changes in minimum wage between 2003 and 2007), 20 were treated in 2004, 40 in 2006 and 131 in 2007.
These numbers are what I call “effective sample size”, because csdid (and implicitly jwdid) use only this ammount of information when estimating the treatment effects. In other words, we should assume the sample size is 20 (for the purposes of model specification and controls).
By default, jwdid uses the not yet treated as comparison group. So, to make the results comparable to csdid, I will request using those “never-treated” as comparison group. For csdid, the default is using short gaps for pre-treatment ATT’s, along with the never treated as comaprison group. To change that I will use long2 option. This will make the following comparable to the outcomes from standard Event-studies estimations. The next lines should produce identical point estimates, with very similar standard errors.
............
Difference-in-difference with Multiple Time Periods
                                                         Number of obs = 2,500
Outcome model  : regression adjustment
Treatment model: none
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
g2004        |
 t_2003_2004 |  -.0105032    .023251    -0.45   0.651    -.0560744    .0350679
 t_2003_2005 |  -.0704232   .0309848    -2.27   0.023    -.1311522   -.0096941
 t_2003_2006 |  -.1372587   .0364357    -3.77   0.000    -.2086713   -.0658461
 t_2003_2007 |  -.1008114   .0343592    -2.93   0.003    -.1681542   -.0334685
-------------+----------------------------------------------------------------
g2006        |
 t_2003_2005 |  -.0037693    .031342    -0.12   0.904    -.0651985      .05766
 t_2004_2005 |   .0027508   .0195586     0.14   0.888    -.0355833    .0410849
 t_2005_2006 |  -.0045946   .0177552    -0.26   0.796    -.0393942    .0302049
 t_2005_2007 |  -.0412245   .0202292    -2.04   0.042    -.0808729    -.001576
-------------+----------------------------------------------------------------
g2007        |
 t_2003_2006 |   .0033064   .0244519     0.14   0.892    -.0446184    .0512311
 t_2004_2006 |    .033813   .0211292     1.60   0.110    -.0075994    .0752254
 t_2005_2006 |   .0310871   .0178775     1.74   0.082    -.0039522    .0661264
 t_2006_2007 |  -.0260544   .0166554    -1.56   0.118    -.0586985    .0065896
------------------------------------------------------------------------------
Control: Never Treated
See Callaway and Sant'Anna (2021) for detailsWARNING: Singleton observations not dropped; statistical significance is biased
>  (link)
(MWFE estimator converged in 2 iterations)
HDFE Linear regression                            Number of obs   =      2,500
Absorbing 2 HDFE groups                           F(  12,    499) =       2.88
Statistics robust to heteroskedasticity           Prob > F        =     0.0008
                                                  R-squared       =     0.9933
                                                  Adj R-squared   =     0.9915
                                                  Within R-sq.    =     0.0124
Number of clusters (countyreal) =        500      Root MSE        =     0.1389
                           (Std. err. adjusted for 500 clusters in countyreal)
------------------------------------------------------------------------------
             |               Robust
        lemp | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
   gvar#year#|
    c.__tr__ |
  2004 2004  |  -.0105032   .0233492    -0.45   0.653    -.0563781    .0353716
  2004 2005  |  -.0704232   .0311156    -2.26   0.024    -.1315568   -.0092895
  2004 2006  |  -.1372587   .0365895    -3.75   0.000    -.2091472   -.0653703
  2004 2007  |  -.1008114   .0345043    -2.92   0.004    -.1686029   -.0330198
  2006 2003  |  -.0037693   .0314743    -0.12   0.905    -.0656078    .0580693
  2006 2004  |   .0027508   .0196411     0.14   0.889    -.0358387    .0413403
  2006 2006  |  -.0045946   .0178301    -0.26   0.797     -.039626    .0304368
  2006 2007  |  -.0412245   .0203146    -2.03   0.043    -.0811371   -.0013118
  2007 2003  |   .0033064   .0245551     0.13   0.893    -.0449378    .0515505
  2007 2004  |    .033813   .0212184     1.59   0.112    -.0078753    .0755014
  2007 2005  |   .0310871    .017953     1.73   0.084    -.0041856    .0663599
  2007 2007  |  -.0260544   .0167257    -1.56   0.120     -.058916    .0068072
             |
       _cons |   5.773609   .0033784  1708.97   0.000     5.766971    5.780247
------------------------------------------------------------------------------
Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
  countyreal |       500         500           0    *|
        year |         5           0           5     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computationYou can of course add controls. In this case lpop is a time constant variable, so it can be added to both commands. For us to obtain the same results, however, you need to add the method(reg) to csdid. Otherwise it will implement the double robust estimator dripw (default)
............
Difference-in-difference with Multiple Time Periods
                                                         Number of obs = 2,500
Outcome model  : regression adjustment
Treatment model: none
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
g2004        |
 t_2003_2004 |  -.0149112   .0220557    -0.68   0.499    -.0581396    .0283171
 t_2003_2005 |  -.0769963   .0283597    -2.71   0.007    -.1325804   -.0214122
 t_2003_2006 |  -.1410801   .0348363    -4.05   0.000     -.209358   -.0728022
 t_2003_2007 |  -.1075443   .0327377    -3.29   0.001     -.171709   -.0433796
-------------+----------------------------------------------------------------
g2006        |
 t_2003_2005 |   .0090343   .0300861     0.30   0.764    -.0499333     .068002
 t_2004_2005 |   .0069683   .0183458     0.38   0.704    -.0289888    .0429254
 t_2005_2006 |   .0007655   .0191959     0.04   0.968    -.0368578    .0383888
 t_2005_2007 |  -.0415356   .0197169    -2.11   0.035      -.08018   -.0028913
-------------+----------------------------------------------------------------
g2007        |
 t_2003_2006 |   .0068961   .0244888     0.28   0.778    -.0411011    .0548933
 t_2004_2006 |   .0332619   .0211607     1.57   0.116    -.0082123    .0747362
 t_2005_2006 |   .0285021   .0181321     1.57   0.116    -.0070361    .0640403
 t_2006_2007 |  -.0287895   .0161679    -1.78   0.075    -.0604779    .0028989
------------------------------------------------------------------------------
Control: Never Treated
See Callaway and Sant'Anna (2021) for detailsWARNING: Singleton observations not dropped; statistical significance is biased
>  (link)
(MWFE estimator converged in 2 iterations)
note: 2007.year#c.lpop omitted because of collinearity
HDFE Linear regression                            Number of obs   =      2,500
Absorbing 2 HDFE groups                           F(  28,    499) =       3.15
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.9934
                                                  Adj R-squared   =     0.9916
                                                  Within R-sq.    =     0.0240
Number of clusters (countyreal) =        500      Root MSE        =     0.1386
                           (Std. err. adjusted for 500 clusters in countyreal)
------------------------------------------------------------------------------
             |               Robust
        lemp | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
   gvar#year#|
    c.__tr__ |
  2004 2004  |  -.0149112   .0222198    -0.67   0.502    -.0585672    .0287448
  2004 2005  |  -.0769963   .0277681    -2.77   0.006    -.1315531   -.0224395
  2004 2006  |  -.1410801   .0322433    -4.38   0.000    -.2044295   -.0777307
  2004 2007  |  -.1075443   .0328764    -3.27   0.001    -.1721375    -.042951
  2006 2003  |   .0090343   .0302224     0.30   0.765    -.0503444    .0684131
  2006 2004  |   .0069683   .0181921     0.38   0.702    -.0287742    .0427108
  2006 2006  |   .0007655   .0186329     0.04   0.967    -.0358432    .0373742
  2006 2007  |  -.0415356   .0191982    -2.16   0.031    -.0792549   -.0038164
  2007 2003  |   .0068961   .0246543     0.28   0.780    -.0415429    .0553351
  2007 2004  |   .0332619   .0213008     1.56   0.119    -.0085883    .0751122
  2007 2005  |   .0285021   .0182653     1.56   0.119    -.0073843    .0643885
  2007 2007  |  -.0287895   .0161312    -1.78   0.075     -.060483     .002904
             |
   gvar#year#|
    c.__tr__#|
   c._x_lpop |
  2004 2004  |   .0005953   .0183556     0.03   0.974    -.0354686    .0366592
  2004 2005  |   .0234096   .0183749     1.27   0.203     -.012692    .0595113
  2004 2006  |   .0482261   .0224194     2.15   0.032      .004178    .0922742
  2004 2007  |   .0091886   .0271423     0.34   0.735    -.0441386    .0625158
  2006 2003  |  -.0126074   .0243335    -0.52   0.605    -.0604162    .0352014
  2006 2004  |  -.0177865   .0161892    -1.10   0.272    -.0495939    .0140208
  2006 2006  |   .0282074   .0141213     2.00   0.046     .0004628     .055952
  2006 2007  |   .0277793   .0180844     1.54   0.125    -.0077516    .0633102
  2007 2003  |   .0083787   .0254037     0.33   0.742    -.0415327    .0582902
  2007 2004  |  -.0079105   .0188673    -0.42   0.675    -.0449797    .0291587
  2007 2005  |  -.0025825   .0178299    -0.14   0.885    -.0376135    .0324484
  2007 2007  |  -.0203637   .0162117    -1.26   0.210    -.0522153    .0114878
             |
 year#c.lpop |
       2003  |  -.0229821   .0129329    -1.78   0.076    -.0483918    .0024277
       2004  |  -.0079359   .0107757    -0.74   0.462    -.0291072    .0132355
       2005  |  -.0005453   .0097177    -0.06   0.955     -.019638    .0185474
       2006  |  -.0099382   .0090048    -1.10   0.270    -.0276302    .0077537
       2007  |          0  (omitted)
             |
       _cons |   5.800979   .0225656   257.07   0.000     5.756644    5.845315
------------------------------------------------------------------------------
Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
  countyreal |       500         500           0    *|
        year |         5           0           5     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computationIn general, however, you may not be interested in analyzing each individual ATTGT, but rather some aggregated results. For example, overall average, or dynamic effects.3. Prof. Wooldrige, however, also suggest to ajust standard errors to account for sampling variation on the covariates. To do so, I need to modify the syntax of jwdid slighly, as well as the post-estimation commands.
Average Treatment Effect on Treated
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         ATT |  -.0419686   .0114448    -3.67   0.000    -.0644001   -.0195372
------------------------------------------------------------------------------
ATT by Periods Before and After treatment
Event Study:Dynamic effects
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     Pre_avg |   .0193186   .0171138     1.13   0.259    -.0142239     .052861
    Post_avg |  -.0807817   .0187459    -4.31   0.000    -.1175229   -.0440405
         Tm4 |   .0068961   .0244888     0.28   0.778    -.0411011    .0548933
         Tm3 |   .0275947   .0180257     1.53   0.126    -.0077351    .0629244
         Tm2 |    .023465   .0144416     1.62   0.104    -.0048402    .0517701
         Tp0 |  -.0211467    .011481    -1.84   0.065    -.0436492    .0013557
         Tp1 |  -.0533559   .0162931    -3.27   0.001    -.0852897    -.021422
         Tp2 |  -.1410801   .0348363    -4.05   0.000     -.209358   -.0728022
         Tp3 |  -.1075443   .0327377    -3.29   0.001     -.171709   -.0433796
------------------------------------------------------------------------------
ATT by Calendar Period
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    CAverage |  -.0445323    .014885    -2.99   0.003    -.0737064   -.0153582
       T2004 |  -.0149112   .0220557    -0.68   0.499    -.0581396    .0283171
       T2005 |  -.0769963   .0283597    -2.71   0.007    -.1325804   -.0214122
       T2006 |  -.0465164   .0209984    -2.22   0.027    -.0876724   -.0053603
       T2007 |  -.0397054   .0128659    -3.09   0.002     -.064922   -.0144888
------------------------------------------------------------------------------
ATT by group
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    GAverage |  -.0329292   .0117872    -2.79   0.005    -.0560318   -.0098267
       G2004 |   -.085133   .0242512    -3.51   0.000    -.1326645   -.0376015
       G2006 |  -.0203851   .0174025    -1.17   0.241    -.0544933    .0137232
       G2007 |  -.0287895   .0161679    -1.78   0.075    -.0604779    .0028989
------------------------------------------------------------------------------WARNING: Singleton observations not dropped; statistical significance is biased
>  (link)
------------------------------------------------------------------------------
             |            Delta-method
             | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         _at |
   (2 vs 1)  |  -.0419686   .0109096    -3.85   0.000     -.063351   -.0205863
------------------------------------------------------------------------------
------------------------------------------------------------------------------
             |            Delta-method
             | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         _at@|
   __event__ |
(2 vs 1) -4  |   .0068961   .0246543     0.28   0.780    -.0414254    .0552177
(2 vs 1) -3  |   .0275947   .0181227     1.52   0.128    -.0079251    .0631145
(2 vs 1) -2  |    .023465   .0145109     1.62   0.106    -.0049758    .0519057
(2 vs 1) -1  |          0  (omitted)
 (2 vs 1) 0  |  -.0211467   .0113774    -1.86   0.063     -.043446    .0011525
 (2 vs 1) 1  |  -.0533559    .015752    -3.39   0.001    -.0842293   -.0224824
 (2 vs 1) 2  |  -.1410801   .0322433    -4.38   0.000    -.2042759   -.0778843
 (2 vs 1) 3  |  -.1075443   .0328764    -3.27   0.001    -.1719809   -.0431077
------------------------------------------------------------------------------
------------------------------------------------------------------------------
             |            Delta-method
             | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         _at@|
__calendar__ |
   (2 vs 1)  |
       2004  |  -.0149112   .0222198    -0.67   0.502    -.0584613    .0286389
   (2 vs 1)  |
       2005  |  -.0769963   .0277681    -2.77   0.006    -.1314208   -.0225719
   (2 vs 1)  |
       2006  |  -.0465164   .0184021    -2.53   0.011    -.0825839   -.0104488
   (2 vs 1)  |
       2007  |  -.0397054   .0127042    -3.13   0.002    -.0646051   -.0148057
------------------------------------------------------------------------------
------------------------------------------------------------------------------
             |            Delta-method
             | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         _at@|
   __group__ |
   (2 vs 1)  |
       2004  |   -.085133   .0237216    -3.59   0.000    -.1316265   -.0386395
   (2 vs 1)  |
       2006  |  -.0203851   .0167622    -1.22   0.224    -.0532384    .0124683
   (2 vs 1)  |
       2007  |  -.0287895   .0161312    -1.78   0.074    -.0604061    .0028271
------------------------------------------------------------------------------                           (Std. err. adjusted for 500 clusters in countyreal)
------------------------------------------------------------------------------
             |            Unconditional
             | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         _at |
   (2 vs 1)  |  -.0419686   .0115467    -3.63   0.000    -.0646548   -.0192824
------------------------------------------------------------------------------
                           (Std. err. adjusted for 500 clusters in countyreal)
------------------------------------------------------------------------------
             |            Unconditional
             | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         _at@|
   __event__ |
(2 vs 1) -4  |   .0068961   .0247069     0.28   0.780    -.0416463    .0554385
(2 vs 1) -3  |   .0275947   .0181862     1.52   0.130    -.0081364    .0633257
(2 vs 1) -2  |    .023465   .0145703     1.61   0.108    -.0051617    .0520916
(2 vs 1) -1  |          0  (omitted)
 (2 vs 1) 0  |  -.0211467   .0115833    -1.83   0.069    -.0439048    .0016113
 (2 vs 1) 1  |  -.0533559   .0164382    -3.25   0.001    -.0856524   -.0210593
 (2 vs 1) 2  |  -.1410801   .0351465    -4.01   0.000    -.2101335   -.0720267
 (2 vs 1) 3  |  -.1075443   .0330292    -3.26   0.001    -.1724378   -.0426508
------------------------------------------------------------------------------
                           (Std. err. adjusted for 500 clusters in countyreal)
------------------------------------------------------------------------------
             |            Unconditional
             | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         _at@|
__calendar__ |
   (2 vs 1)  |
       2004  |  -.0149112   .0222521    -0.67   0.503    -.0586306    .0288081
   (2 vs 1)  |
       2005  |  -.0769963   .0286123    -2.69   0.007    -.1332117   -.0207809
   (2 vs 1)  |
       2006  |  -.0465164   .0211854    -2.20   0.029    -.0881399   -.0048928
   (2 vs 1)  |
       2007  |  -.0397054   .0129804    -3.06   0.002    -.0652084   -.0142024
------------------------------------------------------------------------------
                           (Std. err. adjusted for 500 clusters in countyreal)
------------------------------------------------------------------------------
             |            Unconditional
             | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         _at@|
   __group__ |
   (2 vs 1)  |
       2004  |   -.085133   .0244672    -3.48   0.001    -.1332044   -.0370616
   (2 vs 1)  |
       2006  |  -.0203851   .0175575    -1.16   0.246    -.0548807    .0141106
   (2 vs 1)  |
       2007  |  -.0287895   .0163118    -1.76   0.078    -.0608378    .0032589
------------------------------------------------------------------------------Again, both showing identical point estimates with slighly different standard errors.
Repeated crossection operates slighly different than panel data. On the one hand, you do not observe the same units across time. Thus, there is no unit specific fixed-effect. Instead one would be trying to account for cohort specific effect, or treatment-level fixed effects. This may become clearer with an example.
Lets us first consider the same dataset as before. To simulate the situation of repeated crossection, lets assume that each round, data was collected for a random sample of counties within each State, and that, for some unknown reason, we cannot identify counties across time. Lets us also assume that the treatment was implemented at the State level (for clustering purposes). First some data preparation.
To simulate repeated crossection structure, I will drop 10% of the data, create a State id, and drop the county identifier.
(Written by R.              )
(250 observations deleted)Next is to create the gvar. This will follow the same syntax as before, with the main difference that instead of using countyreal as ivar, we will use state. Its important that this variable represents the identifier of the level at which the treatment was implemented, and that we can follow up across time.
And as always, its a good idea to crosstabulate year with the new gvar:
| year | 0 | 2004 | 2006 | 2007 | Total | 
|---|---|---|---|---|---|
| 2003 | 279 | 19 | 35 | 114 | 447 | 
| 2004 | 286 | 19 | 34 | 117 | 456 | 
| 2005 | 279 | 18 | 38 | 121 | 456 | 
| 2006 | 275 | 18 | 40 | 115 | 448 | 
| 2007 | 275 | 18 | 36 | 114 | 443 | 
| Total | 1,474 | 93 | 193 | 615 | 2,375 | 
What you should observe, as shown here, is that you still have roughtly the same number of observations per gvar across time. This is important because you need to have a groups that are comparable across time. Think of this as having a pseudo panel.
In terms of estimation, the syntax needs to be adjusted slighly:
If you are using csdid, the adjustment simply requires to drop ivar option. To make sure we are clustering at the correct level, one should use cluster() instead, using the pseudo panel identifier state as clustering variable:
............
Difference-in-difference with Multiple Time Periods
                                                         Number of obs = 2,250
Outcome model  : regression adjustment
Treatment model: none
                                 (Std. err. adjusted for 29 clusters in state)
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
g2004        |
 t_2003_2004 |   .0671177   .0420718     1.60   0.111    -.0153416     .149577
 t_2003_2005 |  -.2982137   .0481335    -6.20   0.000    -.3925536   -.2038737
 t_2003_2006 |  -.2056332   .0352891    -5.83   0.000    -.2747986   -.1364679
 t_2003_2007 |  -.0657414   .0394828    -1.67   0.096    -.1431262    .0116435
-------------+----------------------------------------------------------------
g2006        |
 t_2003_2005 |   .0718644   .0669577     1.07   0.283    -.0593703     .203099
 t_2004_2005 |   .2762365   .1143842     2.41   0.016     .0520476    .5004253
 t_2005_2006 |   .1252212   .0839351     1.49   0.136    -.0392885    .2897309
 t_2005_2007 |   .1471258   .0853201     1.72   0.085    -.0200985    .3143501
-------------+----------------------------------------------------------------
g2007        |
 t_2003_2006 |   .0932773   .0789284     1.18   0.237    -.0614195    .2479742
 t_2004_2006 |    .124871   .0761301     1.64   0.101    -.0243412    .2740832
 t_2005_2006 |  -.0352257   .0628217    -0.56   0.575    -.1583539    .0879025
 t_2006_2007 |  -.0136172   .0952995    -0.14   0.886    -.2004007    .1731663
------------------------------------------------------------------------------
Control: Never Treated
See Callaway and Sant'Anna (2021) for detailsAggregations are obtained using estat post estimation commnads.
If you are using jwdid, there are two alternatives. The first one is to use ivar() with the new pseudo panel identifier (state).
WARNING: Singleton observations not dropped; statistical significance is biased
>  (link)
(MWFE estimator converged in 4 iterations)
warning: missing F statistic; dropped variables due to collinearity or too few 
> clusters
HDFE Linear regression                            Number of obs   =      2,250
Absorbing 2 HDFE groups                           F(  12,     28) =          .
Statistics robust to heteroskedasticity           Prob > F        =          .
                                                  R-squared       =     0.1797
                                                  Adj R-squared   =     0.1630
                                                  Within R-sq.    =     0.0010
Number of clusters (state)   =         29         Root MSE        =     1.3796
                                 (Std. err. adjusted for 29 clusters in state)
------------------------------------------------------------------------------
             |               Robust
        lemp | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
   gvar#year#|
    c.__tr__ |
  2004 2004  |   .0799107   .0377474     2.12   0.043     .0025886    .1572328
  2004 2005  |  -.2926293   .0448166    -6.53   0.000    -.3844319   -.2008267
  2004 2006  |  -.1982852     .03159    -6.28   0.000    -.2629944   -.1335761
  2004 2007  |  -.0582612   .0431331    -1.35   0.188    -.1466153    .0300929
  2006 2003  |   .0498869    .068996     0.72   0.476    -.0914451    .1912188
  2006 2004  |   .2964946   .1219394     2.43   0.022      .046713    .5462761
  2006 2006  |   .1220182   .0809493     1.51   0.143    -.0437989    .2878353
  2006 2007  |   .1401132   .0801401     1.75   0.091    -.0240464    .3042727
  2007 2003  |   .0118143   .0676839     0.17   0.863      -.12683    .1504585
  2007 2004  |   .0855261   .0903276     0.95   0.352    -.0995017    .2705538
  2007 2005  |  -.0974496   .0709482    -1.37   0.180    -.2427804    .0478812
  2007 2007  |  -.0729594   .0980111    -0.74   0.463    -.2737261    .1278073
             |
       _cons |   5.762268   .0141152   408.23   0.000     5.733355    5.791182
------------------------------------------------------------------------------
Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
       state |        29          29           0    *|
        year |         5           0           5     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computationData will be clustered at the state level, explicitly accounting for State fixed effects.
Alternatively, one could change ivar() with cluster(), to ensure clustering is done at the correct level, and add the option group, so that cohort fixed effects are added to the specification.4
WARNING: Singleton observations not dropped; statistical significance is biased
>  (link)
(MWFE estimator converged in 3 iterations)
warning: missing F statistic; dropped variables due to collinearity or too few 
> clusters
HDFE Linear regression                            Number of obs   =      2,250
Absorbing 2 HDFE groups                           F(  12,     28) =          .
Statistics robust to heteroskedasticity           Prob > F        =          .
                                                  R-squared       =     0.0298
                                                  Adj R-squared   =     0.0215
                                                  Within R-sq.    =     0.0008
Number of clusters (state)   =         29         Root MSE        =     1.4916
                                 (Std. err. adjusted for 29 clusters in state)
------------------------------------------------------------------------------
             |               Robust
        lemp | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
   gvar#year#|
    c.__tr__ |
  2004 2004  |   .0671177   .0429985     1.56   0.130    -.0209608    .1551962
  2004 2005  |  -.2982137   .0491937    -6.06   0.000    -.3989825   -.1974449
  2004 2006  |  -.2056332   .0360664    -5.70   0.000    -.2795119   -.1317546
  2004 2007  |  -.0657414   .0403525    -1.63   0.114    -.1483996    .0169169
  2006 2003  |   .0718644   .0684326     1.05   0.303    -.0683133    .2120421
  2006 2004  |   .2762365   .1169037     2.36   0.025     .0367701    .5157028
  2006 2006  |   .1252212   .0857839     1.46   0.155    -.0504991    .3009415
  2006 2007  |   .1471258   .0871994     1.69   0.103    -.0314941    .3257457
  2007 2003  |   .0932773    .080667     1.16   0.257    -.0719614    .2585161
  2007 2004  |    .124871    .077807     1.60   0.120    -.0345093    .2842514
  2007 2005  |  -.0352257   .0642054    -0.55   0.588    -.1667446    .0962931
  2007 2007  |  -.0136172   .0973986    -0.14   0.890    -.2131292    .1858948
             |
       _cons |   5.749808   .1159179    49.60   0.000     5.512361    5.987255
------------------------------------------------------------------------------
Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
        gvar |         4           0           4     |
        year |         5           1           4     |
-----------------------------------------------------+The results across the different approaches will be different because of how the data is used. But, as you can see here, point estimates are broadly comparable.
As far as I know, either option is correct.
I hope this reference helps people who want to use jwdid and csdid for the estimation of DID models in the presence of repeated crossection.
If you have comments, or questions, do not hesitate to contact me.
I will not provide details of the model specification in this post, but leave it for some future page on this site.↩︎
If you want to use csdid2, please check the blog-post on using “own installer”, and try fra install csdid2.↩︎
also available group and calendar↩︎
In my repository, the newer version of jwdid does this automatically if ivar() is not provided.↩︎
---
title: "DID: Panel Data & Repeated Crossection"
subtitle: "Using `CSDID` and `JWDID`"
format: html
bibliography: references.bib
---
## First Things First
### DID with Multiple Periods and Time Heterogeneity
If you are reading this, you probably know quite well all the problems associated with the infamous TWFE-DID especification. Nevertheless, its worth a quick recap, in case you are not aware of.
Before the new literature on DID, whenever people wanted to analyze treatment effects using a difference in differences approach, and had access to data with many periods and many individuals, they would tend to use a model specification that look like this:
 
$$y_{it} = \delta_i + \delta_t + \gamma \times trt + e_{it}
$${#eq-1}
Where $trt$ was a dummy variable that would take the value of 1 for treated units after the treatment was implemented, and zero otherwise. The idea was that all units with $trt=0$ (those not yet treated) would be used used as controls. In this specification, one is also controlling for time fixed effects $\delta_t$ as well as individual (or group) fixed effects $\delta_i$.
This specification was considered a generalization of the simple 2x2 DID design:
$$
y = \delta_0 + \delta_1 post+ \delta_2 treat + \gamma \ (post\times treat) + e_{it}
$$
Where $\delta_i$ was the equivalent to $\delta_1$, and $\delta_2$ the equivalent for $\delta_t$.
What many didn't know at the time is that @eq-1 would provide correct identification of the Average treatment effect $\gamma$, under strict assumptions:
- There is no treatment timing heterogeneity (All units are treated at the same time)
- If there is timing heterogeneity, the treatment effect is homogenous across time and across groups. ($\gamma$ is the same across time or across groups)
This assumptions almost never hold. For example, the original treatment may become less effective few periods after they are implemented, and units treated later may experience stronger treatment effects than those treated earlier. When this happen, @eq-1 will provide incorrect results for three related reasons:
1. Linear regressions do not discriminate between good or bad controls when identifying coefficients. They simply exploit all possible varation in the data. 
2. Already treated units would be implicitly used as "control units", when analyzing units treated at a later point in time.
3. Because of this, some treated units will receive "negative" weights, when estimating Average treatemnt effects.
One of the most interesting consequences of this problem is that one may estimate negative treatment effects, even if all units in the sample had a theoreticaly possitive treatment effect.
Many paper published in 2021 including @goodman_bacon_2021, @callaway_santanna_2021, @sun_estimating_2021, @wooldridge_twoway_2021 and @borusyak_revisiting_2022, to name a few, identified this problem, and proposed similar solutions: Estimate ATTs allowing for cohort and timing heterogeneity, avoiding the use of already treated units as controls. 
While these papers also provide their own estimators (in `Stata`, `R`, and in some cases `python`), I will focus only on two solutions. The ones proposed by @callaway_santanna_2021 and @wooldridge_twoway_2021, which I programmed in `Stata` using `csdid[2]`, and `jwdid`. Both approaches suggest to estimate heterogenous treatment effects based on when a unit is treated (its cohort $G$), and at what point in time one is interested in estimating that effect $T$. We will call them the $ATT(G,T)$. ^[I will not provide details of the model specification in this post, but leave it for some future page on this site.]
##  GxT DID (instead  of 2x2 DID)
### Panel Data
Lets assume you have access to Balanced Panel Data. In other words, you observe the same group of individuals across the same window of time. Different individuals are treated at different points in time, and some are not treated at all (never treated). This is the best case scenario, since you do not need to worry about identifying cohorts by treat timing. 
Let's see how to estimate this type of models using `csdid` and `jwdid`. They estimate the treatment effects using different strategies, but under specific cases, they will provide the same point estimates. To show how this is done, I will employ the toy-dataset used in @callaway_santanna_2021.
First let's set things up, loading the data:
```{stata}
*| output: false
** to get the data from my repository
ssc install frause, replace
frause mpdta, clear
```
This dataset contains information at the county level on population size (lpop), employment (lemp), and a variable that indicates when a county instuted a change in minimum wage (first_treat). To estimate a DID model with heterogenous treatment effects, could use either `jwdid` or `csdid`, which are available from SSC.^[If you want to use `csdid2`, please check the blog-post on using "own installer", and try `fra install csdid2`.]
```{stata}
*| output: false
** This is a dependency for csdid
ssc install drdid, replace
ssc install csdid, replace
ssc install jwdid, replace
```
Just one more step (not required but important if you did not identified the `cohort` variable). Lets create a dummy that takes the value of 1 only after the treated unit is treated:
```{stata}
gen trt = (first_treat<=year)*(first_treat>0)
```
This is what @eq-1 is using. So how do I create the "*treatment cohort*" variable?. That can be easily done using one of `csdid` subprograms. You just need to provide the group or panel identifier, and the time variable:
```{stata}
egen gvar = csgvar(trt), ivar(countyreal) tvar(year)
```
Its worth noting that the official commands `xthdidregress` and `hdidregress` do not require you to do this, because it automatically creates the gvar internally, based on the information provided. You can now create a small tabulation, as a sanity check, to verity the variable `gvar` is correctly created:
```stata
tab year gvar
```
|year\gvar |         0    |   2004   |    2006   |    2007 |     Total|
|-----------|--------------|----------|-----------|---------|----------|
|      2003 |       309    |     20   |      40   |     131 |       500 |
|      2004 |       309    |     20   |      40   |     131 |       500 |
|      2005 |       309    |     20   |      40   |     131 |       500 |
|      2006 |       309    |     20   |      40   |     131 |       500 |
|      2007 |       309    |     20   |      40   |     131 |       500 |
|     Total |     1,545    |    100   |     200   |     655 |     2,500 |
If all goes as expected, you a tabulation similar to the one you see here. You have a total of 2500 observations, but only 500 counties. 309 were not treated (had no changes in minimum wage between 2003 and 2007), 20 were treated in 2004, 40 in 2006 and 131 in 2007.
These numbers are what I call "effective sample size", because  `csdid` (and implicitly `jwdid`) use only this ammount of information when estimating the treatment effects. In other words, we should assume the sample size is 20 (for the purposes of model specification and controls).
By default, `jwdid` uses the not yet treated as comparison group. So, to make the results comparable to `csdid`, I will request using those "never-treated" as comparison group. For `csdid`, the default is using short gaps for pre-treatment ATT's, along with the never treated as comaprison group. To change that I will use `long2` option. This will make the following comparable to the outcomes from standard Event-studies estimations. The next lines should produce identical point estimates, with very similar standard errors. 
::: {.panel-tabset}
## CSDID
```{stata}
csdid lemp , ivar(county) time(year) gvar(gvar) long2
```
## JWDID
```{stata}
jwdid lemp , ivar(county) time(year) gvar(gvar) never
```
:::
You can of course add controls. In this case `lpop` is a time constant variable, so it can be added to both commands. For us to obtain the same results, however, you need to add the `method(reg)` to `csdid`. Otherwise it will implement the double robust estimator `dripw` (default)
::: {.panel-tabset}
## CSDID
```{stata}
csdid lemp lpop , ivar(county) time(year) gvar(gvar) long2 method(reg)
```
## JWDID
```{stata}
jwdid lemp lpop, ivar(county) time(year) gvar(gvar) never 
```
:::
In general, however, you may not be interested in analyzing each individual ATTGT, but rather some aggregated results. For example, overall average, or dynamic effects.^[also available `group` and `calendar`]. Prof. Wooldrige, however, also suggest to ajust standard errors to account for sampling variation on the covariates. To do so, I need to modify the syntax of `jwdid` slighly, as well as the post-estimation commands.
::: {.panel-tabset}
## CSDID
```{stata}
qui:csdid lemp lpop , ivar(county) time(year) gvar(gvar) long2 method(reg)
estat simple
estat event
estat calendar
estat group
```
## JWDID
```{stata}
qui:jwdid lemp lpop, ivar(county) time(year) gvar(gvar) never 
estat simple
estat event
estat calendar
estat group
```
## JWDID: vce(unconditional)
```{stata}
qui:jwdid lemp lpop, ivar(county) time(year) gvar(gvar) never ///
    method(regress) group
estat simple, vce(unconditional)
estat event, vce(unconditional)
estat calendar, vce(unconditional)
estat group, vce(unconditional)
```
:::
Again, both showing identical point estimates with slighly different standard errors. 
## Repeated Crossection
Repeated crossection operates slighly different than panel data. On the one hand, you do not observe the same units across time. Thus, there is no *unit specific fixed-effect*. Instead one would be trying to account for cohort specific effect, or treatment-level fixed effects. This may become clearer with an example. 
Lets us first consider the same dataset as before. To simulate the situation of repeated crossection, lets assume that each round, data was collected for a random sample of counties within each State, and that, for some unknown reason, we cannot identify counties across time. Lets us also assume that the treatment was implemented at the State level (for clustering purposes). First some data preparation.
To simulate repeated crossection structure, I will drop 10% of the data, create a ***State id***, and drop the county identifier.  
```{stata}
** load and obtain trt
frause mpdta, clear
gen trt = (first_treat<=year)*(first_treat>0)
gen state = int(countyreal/1000)
** Randomly keep 90% of the data
set seed 1
sample 90
drop countyreal
```
Next is to create the `gvar`. This will follow the same syntax as before, with the main difference that instead of using **countyreal** as `ivar`, we will use `state`. Its important that this variable represents the identifier of the level at which the treatment was implemented, and that we can follow up across time. 
```{stata}
egen gvar = csgvar(trt), ivar(state) tvar(year)
```
And as always, its a good idea to crosstabulate year with the new gvar:
```stata
tab year gvar
```
|year\gvar |         0    |   2004   |    2006   |    2007 |     Total|
|-----------|--------------|----------|-----------|---------|----------|
|      2003 |       279    |     19    |     35    |    114 |       447 |
|      2004 |       286    |     19    |     34    |    117 |       456 |
|      2005 |       279    |     18    |     38    |    121 |       456 |
|      2006 |       275    |     18    |     40     |   115 |       448 |
|      2007 |       275    |     18    |     36      |  114 |       443 |
|     Total |     1,474     |    93   |     193     |   615 |     2,375 |
What you should observe, as shown here, is that you still have roughtly the same number of observations per `gvar` across time. This is important because you need to have a groups that are comparable across time. Think of this as having a pseudo panel. 
In terms of estimation, the syntax needs to be adjusted slighly:
::: {.panel-tabset}
## CSDID
If you are using `csdid`, the adjustment simply requires to drop `ivar` option. To make sure we are clustering at the correct level, one should use `cluster()` instead, using the pseudo panel identifier `state` as clustering variable:
```{stata}
csdid lemp , cluster(state) time(year) gvar(gvar) long2
```
Aggregations are obtained using `estat` post estimation commnads.
## JWDID-I
If you are using `jwdid`, there are two alternatives. The first one is to use `ivar()` with the new pseudo panel identifier (`state`).
```{stata}
jwdid lemp , ivar(state) time(year) gvar(gvar) never
```
Data will be clustered at the state level, explicitly accounting for State fixed effects.
## JWDID-II
Alternatively, one could change `ivar()` with `cluster()`, to ensure clustering is done at the correct level, and add the option `group`, so that cohort fixed effects are added to the specification.^[In my repository, the newer version of `jwdid` does this automatically if `ivar()` is not provided.]
```{stata}
jwdid lemp , cluster(state) time(year) gvar(gvar) never group
```
:::
The results across the different approaches will be different because of how the data is used. But, as you can see here, point estimates are broadly comparable. 
As far as I know, either option is correct.
## Conclusions
I hope this reference helps people who want to use `jwdid` and `csdid` for the estimation of DID models in the presence of repeated crossection.
If you have comments, or questions, do not hesitate to contact me.