Code
get the data from my repository
** to ssc install frause, replace
clear frause mpdta,
Using CSDID
and JWDID
If you are reading this, you probably know quite well all the problems associated with the infamous TWFEDID especification. Nevertheless, its worth a quick recap, in case you are not aware of.
Before the new literature on DID, whenever people wanted to analyze treatment effects using a difference in differences approach, and had access to data with many periods and many individuals, they would tend to use a model specification that look like this:
\[y_{it} = \delta_i + \delta_t + \gamma \times trt + e_{it} \tag{1}\]
Where \(trt\) was a dummy variable that would take the value of 1 for treated units after the treatment was implemented, and zero otherwise. The idea was that all units with \(trt=0\) (those not yet treated) would be used used as controls. In this specification, one is also controlling for time fixed effects \(\delta_t\) as well as individual (or group) fixed effects \(\delta_i\).
This specification was considered a generalization of the simple 2x2 DID design:
\[ y = \delta_0 + \delta_1 post+ \delta_2 treat + \gamma \ (post\times treat) + e_{it} \]
Where \(\delta_i\) was the equivalent to \(\delta_1\), and \(\delta_2\) the equivalent for \(\delta_t\).
What many didn’t know at the time is that Equation 1 would provide correct identification of the Average treatment effect \(\gamma\), under strict assumptions:
This assumptions almost never hold. For example, the original treatment may become less effective few periods after they are implemented, and units treated later may experience stronger treatment effects than those treated earlier. When this happen, Equation 1 will provide incorrect results for three related reasons:
One of the most interesting consequences of this problem is that one may estimate negative treatment effects, even if all units in the sample had a theoreticaly possitive treatment effect.
Many paper published in 2021 including GoodmanBacon (2021), Callaway and Sant’Anna (2021), Sun and Abraham (2021), Wooldridge (2021) and Borusyak, Jaravel, and Spiess (2022), to name a few, identified this problem, and proposed similar solutions: Estimate ATTs allowing for cohort and timing heterogeneity, avoiding the use of already treated units as controls.
While these papers also provide their own estimators (in Stata
, R
, and in some cases python
), I will focus only on two solutions. The ones proposed by Callaway and Sant’Anna (2021) and Wooldridge (2021), which I programmed in Stata
using csdid[2]
, and jwdid
. Both approaches suggest to estimate heterogenous treatment effects based on when a unit is treated (its cohort \(G\)), and at what point in time one is interested in estimating that effect \(T\). We will call them the \(ATT(G,T)\). ^{1}
Lets assume you have access to Balanced Panel Data. In other words, you observe the same group of individuals across the same window of time. Different individuals are treated at different points in time, and some are not treated at all (never treated). This is the best case scenario, since you do not need to worry about identifying cohorts by treat timing.
Let’s see how to estimate this type of models using csdid
and jwdid
. They estimate the treatment effects using different strategies, but under specific cases, they will provide the same point estimates. To show how this is done, I will employ the toydataset used in Callaway and Sant’Anna (2021).
First let’s set things up, loading the data:
This dataset contains information at the county level on population size (lpop), employment (lemp), and a variable that indicates when a county instuted a change in minimum wage (first_treat). To estimate a DID model with heterogenous treatment effects, could use either jwdid
or csdid
, which are available from SSC.^{2}
Just one more step (not required but important if you did not identified the cohort
variable). Lets create a dummy that takes the value of 1 only after the treated unit is treated:
This is what Equation 1 is using. So how do I create the “treatment cohort” variable?. That can be easily done using one of csdid
subprograms. You just need to provide the group or panel identifier, and the time variable:
Its worth noting that the official commands xthdidregress
and hdidregress
do not require you to do this, because it automatically creates the gvar internally, based on the information provided. You can now create a small tabulation, as a sanity check, to verity the variable gvar
is correctly created:
year  0  2004  2006  2007  Total 

2003  309  20  40  131  500 
2004  309  20  40  131  500 
2005  309  20  40  131  500 
2006  309  20  40  131  500 
2007  309  20  40  131  500 
Total  1,545  100  200  655  2,500 
If all goes as expected, you a tabulation similar to the one you see here. You have a total of 2500 observations, but only 500 counties. 309 were not treated (had no changes in minimum wage between 2003 and 2007), 20 were treated in 2004, 40 in 2006 and 131 in 2007.
These numbers are what I call “effective sample size”, because csdid
(and implicitly jwdid
) use only this ammount of information when estimating the treatment effects. In other words, we should assume the sample size is 20 (for the purposes of model specification and controls).
By default, jwdid
uses the not yet treated as comparison group. So, to make the results comparable to csdid
, I will request using those “nevertreated” as comparison group. For csdid
, the default is using short gaps for pretreatment ATT’s, along with the never treated as comaprison group. To change that I will use long2
option. This will make the following comparable to the outcomes from standard Eventstudies estimations. The next lines should produce identical point estimates, with very similar standard errors.
............
Differenceindifference with Multiple Time Periods
Number of obs = 2,500
Outcome model : regression adjustment
Treatment model: none

 Coefficient Std. err. z P>z [95% conf. interval]
+
g2004 
t_2003_2004  .0105032 .023251 0.45 0.651 .0560744 .0350679
t_2003_2005  .0704232 .0309848 2.27 0.023 .1311522 .0096941
t_2003_2006  .1372587 .0364357 3.77 0.000 .2086713 .0658461
t_2003_2007  .1008114 .0343592 2.93 0.003 .1681542 .0334685
+
g2006 
t_2003_2005  .0037693 .031342 0.12 0.904 .0651985 .05766
t_2004_2005  .0027508 .0195586 0.14 0.888 .0355833 .0410849
t_2005_2006  .0045946 .0177552 0.26 0.796 .0393942 .0302049
t_2005_2007  .0412245 .0202292 2.04 0.042 .0808729 .001576
+
g2007 
t_2003_2006  .0033064 .0244519 0.14 0.892 .0446184 .0512311
t_2004_2006  .033813 .0211292 1.60 0.110 .0075994 .0752254
t_2005_2006  .0310871 .0178775 1.74 0.082 .0039522 .0661264
t_2006_2007  .0260544 .0166554 1.56 0.118 .0586985 .0065896

Control: Never Treated
See Callaway and Sant'Anna (2021) for details
WARNING: Singleton observations not dropped; statistical significance is biased
> (link)
(MWFE estimator converged in 2 iterations)
HDFE Linear regression Number of obs = 2,500
Absorbing 2 HDFE groups F( 12, 499) = 2.88
Statistics robust to heteroskedasticity Prob > F = 0.0008
Rsquared = 0.9933
Adj Rsquared = 0.9915
Within Rsq. = 0.0124
Number of clusters (countyreal) = 500 Root MSE = 0.1389
(Std. err. adjusted for 500 clusters in countyreal)

 Robust
lemp  Coefficient std. err. t P>t [95% conf. interval]
+
gvar#year#
c.__tr__ 
2004 2004  .0105032 .0233492 0.45 0.653 .0563781 .0353716
2004 2005  .0704232 .0311156 2.26 0.024 .1315568 .0092895
2004 2006  .1372587 .0365895 3.75 0.000 .2091472 .0653703
2004 2007  .1008114 .0345043 2.92 0.004 .1686029 .0330198
2006 2003  .0037693 .0314743 0.12 0.905 .0656078 .0580693
2006 2004  .0027508 .0196411 0.14 0.889 .0358387 .0413403
2006 2006  .0045946 .0178301 0.26 0.797 .039626 .0304368
2006 2007  .0412245 .0203146 2.03 0.043 .0811371 .0013118
2007 2003  .0033064 .0245551 0.13 0.893 .0449378 .0515505
2007 2004  .033813 .0212184 1.59 0.112 .0078753 .0755014
2007 2005  .0310871 .017953 1.73 0.084 .0041856 .0663599
2007 2007  .0260544 .0167257 1.56 0.120 .058916 .0068072

_cons  5.773609 .0033784 1708.97 0.000 5.766971 5.780247

Absorbed degrees of freedom:
+
Absorbed FE  Categories  Redundant = Num. Coefs 
+
countyreal  500 500 0 *
year  5 0 5 
+
* = FE nested within cluster; treated as redundant for DoF computation
You can of course add controls. In this case lpop
is a time constant variable, so it can be added to both commands. For us to obtain the same results, however, you need to add the method(reg)
to csdid
. Otherwise it will implement the double robust estimator dripw
(default)
............
Differenceindifference with Multiple Time Periods
Number of obs = 2,500
Outcome model : regression adjustment
Treatment model: none

 Coefficient Std. err. z P>z [95% conf. interval]
+
g2004 
t_2003_2004  .0149112 .0220557 0.68 0.499 .0581396 .0283171
t_2003_2005  .0769963 .0283597 2.71 0.007 .1325804 .0214122
t_2003_2006  .1410801 .0348363 4.05 0.000 .209358 .0728022
t_2003_2007  .1075443 .0327377 3.29 0.001 .171709 .0433796
+
g2006 
t_2003_2005  .0090343 .0300861 0.30 0.764 .0499333 .068002
t_2004_2005  .0069683 .0183458 0.38 0.704 .0289888 .0429254
t_2005_2006  .0007655 .0191959 0.04 0.968 .0368578 .0383888
t_2005_2007  .0415356 .0197169 2.11 0.035 .08018 .0028913
+
g2007 
t_2003_2006  .0068961 .0244888 0.28 0.778 .0411011 .0548933
t_2004_2006  .0332619 .0211607 1.57 0.116 .0082123 .0747362
t_2005_2006  .0285021 .0181321 1.57 0.116 .0070361 .0640403
t_2006_2007  .0287895 .0161679 1.78 0.075 .0604779 .0028989

Control: Never Treated
See Callaway and Sant'Anna (2021) for details
WARNING: Singleton observations not dropped; statistical significance is biased
> (link)
(MWFE estimator converged in 2 iterations)
note: 2007.year#c.lpop omitted because of collinearity
HDFE Linear regression Number of obs = 2,500
Absorbing 2 HDFE groups F( 28, 499) = 3.15
Statistics robust to heteroskedasticity Prob > F = 0.0000
Rsquared = 0.9934
Adj Rsquared = 0.9916
Within Rsq. = 0.0240
Number of clusters (countyreal) = 500 Root MSE = 0.1386
(Std. err. adjusted for 500 clusters in countyreal)

 Robust
lemp  Coefficient std. err. t P>t [95% conf. interval]
+
gvar#year#
c.__tr__ 
2004 2004  .0149112 .0222198 0.67 0.502 .0585672 .0287448
2004 2005  .0769963 .0277681 2.77 0.006 .1315531 .0224395
2004 2006  .1410801 .0322433 4.38 0.000 .2044295 .0777307
2004 2007  .1075443 .0328764 3.27 0.001 .1721375 .042951
2006 2003  .0090343 .0302224 0.30 0.765 .0503444 .0684131
2006 2004  .0069683 .0181921 0.38 0.702 .0287742 .0427108
2006 2006  .0007655 .0186329 0.04 0.967 .0358432 .0373742
2006 2007  .0415356 .0191982 2.16 0.031 .0792549 .0038164
2007 2003  .0068961 .0246543 0.28 0.780 .0415429 .0553351
2007 2004  .0332619 .0213008 1.56 0.119 .0085883 .0751122
2007 2005  .0285021 .0182653 1.56 0.119 .0073843 .0643885
2007 2007  .0287895 .0161312 1.78 0.075 .060483 .002904

gvar#year#
c.__tr__#
c._x_lpop 
2004 2004  .0005953 .0183556 0.03 0.974 .0354686 .0366592
2004 2005  .0234096 .0183749 1.27 0.203 .012692 .0595113
2004 2006  .0482261 .0224194 2.15 0.032 .004178 .0922742
2004 2007  .0091886 .0271423 0.34 0.735 .0441386 .0625158
2006 2003  .0126074 .0243335 0.52 0.605 .0604162 .0352014
2006 2004  .0177865 .0161892 1.10 0.272 .0495939 .0140208
2006 2006  .0282074 .0141213 2.00 0.046 .0004628 .055952
2006 2007  .0277793 .0180844 1.54 0.125 .0077516 .0633102
2007 2003  .0083787 .0254037 0.33 0.742 .0415327 .0582902
2007 2004  .0079105 .0188673 0.42 0.675 .0449797 .0291587
2007 2005  .0025825 .0178299 0.14 0.885 .0376135 .0324484
2007 2007  .0203637 .0162117 1.26 0.210 .0522153 .0114878

year#c.lpop 
2003  .0229821 .0129329 1.78 0.076 .0483918 .0024277
2004  .0079359 .0107757 0.74 0.462 .0291072 .0132355
2005  .0005453 .0097177 0.06 0.955 .019638 .0185474
2006  .0099382 .0090048 1.10 0.270 .0276302 .0077537
2007  0 (omitted)

_cons  5.800979 .0225656 257.07 0.000 5.756644 5.845315

Absorbed degrees of freedom:
+
Absorbed FE  Categories  Redundant = Num. Coefs 
+
countyreal  500 500 0 *
year  5 0 5 
+
* = FE nested within cluster; treated as redundant for DoF computation
In general, however, you may not be interested in analyzing each individual ATTGT, but rather some aggregated results. For example, overall average, or dynamic effects.^{3}. Prof. Wooldrige, however, also suggest to ajust standard errors to account for sampling variation on the covariates. To do so, I need to modify the syntax of jwdid
slighly, as well as the postestimation commands.
Average Treatment Effect on Treated

 Coefficient Std. err. z P>z [95% conf. interval]
+
ATT  .0419686 .0114448 3.67 0.000 .0644001 .0195372

ATT by Periods Before and After treatment
Event Study:Dynamic effects

 Coefficient Std. err. z P>z [95% conf. interval]
+
Pre_avg  .0193186 .0171138 1.13 0.259 .0142239 .052861
Post_avg  .0807817 .0187459 4.31 0.000 .1175229 .0440405
Tm4  .0068961 .0244888 0.28 0.778 .0411011 .0548933
Tm3  .0275947 .0180257 1.53 0.126 .0077351 .0629244
Tm2  .023465 .0144416 1.62 0.104 .0048402 .0517701
Tp0  .0211467 .011481 1.84 0.065 .0436492 .0013557
Tp1  .0533559 .0162931 3.27 0.001 .0852897 .021422
Tp2  .1410801 .0348363 4.05 0.000 .209358 .0728022
Tp3  .1075443 .0327377 3.29 0.001 .171709 .0433796

ATT by Calendar Period

 Coefficient Std. err. z P>z [95% conf. interval]
+
CAverage  .0445323 .014885 2.99 0.003 .0737064 .0153582
T2004  .0149112 .0220557 0.68 0.499 .0581396 .0283171
T2005  .0769963 .0283597 2.71 0.007 .1325804 .0214122
T2006  .0465164 .0209984 2.22 0.027 .0876724 .0053603
T2007  .0397054 .0128659 3.09 0.002 .064922 .0144888

ATT by group

 Coefficient Std. err. z P>z [95% conf. interval]
+
GAverage  .0329292 .0117872 2.79 0.005 .0560318 .0098267
G2004  .085133 .0242512 3.51 0.000 .1326645 .0376015
G2006  .0203851 .0174025 1.17 0.241 .0544933 .0137232
G2007  .0287895 .0161679 1.78 0.075 .0604779 .0028989

WARNING: Singleton observations not dropped; statistical significance is biased
> (link)

 Deltamethod
 Coefficient std. err. z P>z [95% conf. interval]
+
_at 
(2 vs 1)  .0419686 .0109096 3.85 0.000 .063351 .0205863


 Deltamethod
 Coefficient std. err. z P>z [95% conf. interval]
+
_at@
__event__ 
(2 vs 1) 4  .0068961 .0246543 0.28 0.780 .0414254 .0552177
(2 vs 1) 3  .0275947 .0181227 1.52 0.128 .0079251 .0631145
(2 vs 1) 2  .023465 .0145109 1.62 0.106 .0049758 .0519057
(2 vs 1) 1  0 (omitted)
(2 vs 1) 0  .0211467 .0113774 1.86 0.063 .043446 .0011525
(2 vs 1) 1  .0533559 .015752 3.39 0.001 .0842293 .0224824
(2 vs 1) 2  .1410801 .0322433 4.38 0.000 .2042759 .0778843
(2 vs 1) 3  .1075443 .0328764 3.27 0.001 .1719809 .0431077


 Deltamethod
 Coefficient std. err. z P>z [95% conf. interval]
+
_at@
__calendar__ 
(2 vs 1) 
2004  .0149112 .0222198 0.67 0.502 .0584613 .0286389
(2 vs 1) 
2005  .0769963 .0277681 2.77 0.006 .1314208 .0225719
(2 vs 1) 
2006  .0465164 .0184021 2.53 0.011 .0825839 .0104488
(2 vs 1) 
2007  .0397054 .0127042 3.13 0.002 .0646051 .0148057


 Deltamethod
 Coefficient std. err. z P>z [95% conf. interval]
+
_at@
__group__ 
(2 vs 1) 
2004  .085133 .0237216 3.59 0.000 .1316265 .0386395
(2 vs 1) 
2006  .0203851 .0167622 1.22 0.224 .0532384 .0124683
(2 vs 1) 
2007  .0287895 .0161312 1.78 0.074 .0604061 .0028271

(Std. err. adjusted for 500 clusters in countyreal)

 Unconditional
 Coefficient std. err. t P>t [95% conf. interval]
+
_at 
(2 vs 1)  .0419686 .0115467 3.63 0.000 .0646548 .0192824

(Std. err. adjusted for 500 clusters in countyreal)

 Unconditional
 Coefficient std. err. t P>t [95% conf. interval]
+
_at@
__event__ 
(2 vs 1) 4  .0068961 .0247069 0.28 0.780 .0416463 .0554385
(2 vs 1) 3  .0275947 .0181862 1.52 0.130 .0081364 .0633257
(2 vs 1) 2  .023465 .0145703 1.61 0.108 .0051617 .0520916
(2 vs 1) 1  0 (omitted)
(2 vs 1) 0  .0211467 .0115833 1.83 0.069 .0439048 .0016113
(2 vs 1) 1  .0533559 .0164382 3.25 0.001 .0856524 .0210593
(2 vs 1) 2  .1410801 .0351465 4.01 0.000 .2101335 .0720267
(2 vs 1) 3  .1075443 .0330292 3.26 0.001 .1724378 .0426508

(Std. err. adjusted for 500 clusters in countyreal)

 Unconditional
 Coefficient std. err. t P>t [95% conf. interval]
+
_at@
__calendar__ 
(2 vs 1) 
2004  .0149112 .0222521 0.67 0.503 .0586306 .0288081
(2 vs 1) 
2005  .0769963 .0286123 2.69 0.007 .1332117 .0207809
(2 vs 1) 
2006  .0465164 .0211854 2.20 0.029 .0881399 .0048928
(2 vs 1) 
2007  .0397054 .0129804 3.06 0.002 .0652084 .0142024

(Std. err. adjusted for 500 clusters in countyreal)

 Unconditional
 Coefficient std. err. t P>t [95% conf. interval]
+
_at@
__group__ 
(2 vs 1) 
2004  .085133 .0244672 3.48 0.001 .1332044 .0370616
(2 vs 1) 
2006  .0203851 .0175575 1.16 0.246 .0548807 .0141106
(2 vs 1) 
2007  .0287895 .0163118 1.76 0.078 .0608378 .0032589

Again, both showing identical point estimates with slighly different standard errors.
Repeated crossection operates slighly different than panel data. On the one hand, you do not observe the same units across time. Thus, there is no unit specific fixedeffect. Instead one would be trying to account for cohort specific effect, or treatmentlevel fixed effects. This may become clearer with an example.
Lets us first consider the same dataset as before. To simulate the situation of repeated crossection, lets assume that each round, data was collected for a random sample of counties within each State, and that, for some unknown reason, we cannot identify counties across time. Lets us also assume that the treatment was implemented at the State level (for clustering purposes). First some data preparation.
To simulate repeated crossection structure, I will drop 10% of the data, create a State id, and drop the county identifier.
(Written by R. )
(250 observations deleted)
Next is to create the gvar
. This will follow the same syntax as before, with the main difference that instead of using countyreal as ivar
, we will use state
. Its important that this variable represents the identifier of the level at which the treatment was implemented, and that we can follow up across time.
And as always, its a good idea to crosstabulate year with the new gvar:
year  0  2004  2006  2007  Total 

2003  279  19  35  114  447 
2004  286  19  34  117  456 
2005  279  18  38  121  456 
2006  275  18  40  115  448 
2007  275  18  36  114  443 
Total  1,474  93  193  615  2,375 
What you should observe, as shown here, is that you still have roughtly the same number of observations per gvar
across time. This is important because you need to have a groups that are comparable across time. Think of this as having a pseudo panel.
In terms of estimation, the syntax needs to be adjusted slighly:
If you are using csdid
, the adjustment simply requires to drop ivar
option. To make sure we are clustering at the correct level, one should use cluster()
instead, using the pseudo panel identifier state
as clustering variable:
............
Differenceindifference with Multiple Time Periods
Number of obs = 2,250
Outcome model : regression adjustment
Treatment model: none
(Std. err. adjusted for 29 clusters in state)

 Coefficient Std. err. z P>z [95% conf. interval]
+
g2004 
t_2003_2004  .0671177 .0420718 1.60 0.111 .0153416 .149577
t_2003_2005  .2982137 .0481335 6.20 0.000 .3925536 .2038737
t_2003_2006  .2056332 .0352891 5.83 0.000 .2747986 .1364679
t_2003_2007  .0657414 .0394828 1.67 0.096 .1431262 .0116435
+
g2006 
t_2003_2005  .0718644 .0669577 1.07 0.283 .0593703 .203099
t_2004_2005  .2762365 .1143842 2.41 0.016 .0520476 .5004253
t_2005_2006  .1252212 .0839351 1.49 0.136 .0392885 .2897309
t_2005_2007  .1471258 .0853201 1.72 0.085 .0200985 .3143501
+
g2007 
t_2003_2006  .0932773 .0789284 1.18 0.237 .0614195 .2479742
t_2004_2006  .124871 .0761301 1.64 0.101 .0243412 .2740832
t_2005_2006  .0352257 .0628217 0.56 0.575 .1583539 .0879025
t_2006_2007  .0136172 .0952995 0.14 0.886 .2004007 .1731663

Control: Never Treated
See Callaway and Sant'Anna (2021) for details
Aggregations are obtained using estat
post estimation commnads.
If you are using jwdid
, there are two alternatives. The first one is to use ivar()
with the new pseudo panel identifier (state
).
WARNING: Singleton observations not dropped; statistical significance is biased
> (link)
(MWFE estimator converged in 4 iterations)
warning: missing F statistic; dropped variables due to collinearity or too few
> clusters
HDFE Linear regression Number of obs = 2,250
Absorbing 2 HDFE groups F( 12, 28) = .
Statistics robust to heteroskedasticity Prob > F = .
Rsquared = 0.1797
Adj Rsquared = 0.1630
Within Rsq. = 0.0010
Number of clusters (state) = 29 Root MSE = 1.3796
(Std. err. adjusted for 29 clusters in state)

 Robust
lemp  Coefficient std. err. t P>t [95% conf. interval]
+
gvar#year#
c.__tr__ 
2004 2004  .0799107 .0377474 2.12 0.043 .0025886 .1572328
2004 2005  .2926293 .0448166 6.53 0.000 .3844319 .2008267
2004 2006  .1982852 .03159 6.28 0.000 .2629944 .1335761
2004 2007  .0582612 .0431331 1.35 0.188 .1466153 .0300929
2006 2003  .0498869 .068996 0.72 0.476 .0914451 .1912188
2006 2004  .2964946 .1219394 2.43 0.022 .046713 .5462761
2006 2006  .1220182 .0809493 1.51 0.143 .0437989 .2878353
2006 2007  .1401132 .0801401 1.75 0.091 .0240464 .3042727
2007 2003  .0118143 .0676839 0.17 0.863 .12683 .1504585
2007 2004  .0855261 .0903276 0.95 0.352 .0995017 .2705538
2007 2005  .0974496 .0709482 1.37 0.180 .2427804 .0478812
2007 2007  .0729594 .0980111 0.74 0.463 .2737261 .1278073

_cons  5.762268 .0141152 408.23 0.000 5.733355 5.791182

Absorbed degrees of freedom:
+
Absorbed FE  Categories  Redundant = Num. Coefs 
+
state  29 29 0 *
year  5 0 5 
+
* = FE nested within cluster; treated as redundant for DoF computation
Data will be clustered at the state level, explicitly accounting for State fixed effects.
Alternatively, one could change ivar()
with cluster()
, to ensure clustering is done at the correct level, and add the option group
, so that cohort fixed effects are added to the specification.^{4}
WARNING: Singleton observations not dropped; statistical significance is biased
> (link)
(MWFE estimator converged in 3 iterations)
warning: missing F statistic; dropped variables due to collinearity or too few
> clusters
HDFE Linear regression Number of obs = 2,250
Absorbing 2 HDFE groups F( 12, 28) = .
Statistics robust to heteroskedasticity Prob > F = .
Rsquared = 0.0298
Adj Rsquared = 0.0215
Within Rsq. = 0.0008
Number of clusters (state) = 29 Root MSE = 1.4916
(Std. err. adjusted for 29 clusters in state)

 Robust
lemp  Coefficient std. err. t P>t [95% conf. interval]
+
gvar#year#
c.__tr__ 
2004 2004  .0671177 .0429985 1.56 0.130 .0209608 .1551962
2004 2005  .2982137 .0491937 6.06 0.000 .3989825 .1974449
2004 2006  .2056332 .0360664 5.70 0.000 .2795119 .1317546
2004 2007  .0657414 .0403525 1.63 0.114 .1483996 .0169169
2006 2003  .0718644 .0684326 1.05 0.303 .0683133 .2120421
2006 2004  .2762365 .1169037 2.36 0.025 .0367701 .5157028
2006 2006  .1252212 .0857839 1.46 0.155 .0504991 .3009415
2006 2007  .1471258 .0871994 1.69 0.103 .0314941 .3257457
2007 2003  .0932773 .080667 1.16 0.257 .0719614 .2585161
2007 2004  .124871 .077807 1.60 0.120 .0345093 .2842514
2007 2005  .0352257 .0642054 0.55 0.588 .1667446 .0962931
2007 2007  .0136172 .0973986 0.14 0.890 .2131292 .1858948

_cons  5.749808 .1159179 49.60 0.000 5.512361 5.987255

Absorbed degrees of freedom:
+
Absorbed FE  Categories  Redundant = Num. Coefs 
+
gvar  4 0 4 
year  5 1 4 
+
The results across the different approaches will be different because of how the data is used. But, as you can see here, point estimates are broadly comparable.
As far as I know, either option is correct.
I hope this reference helps people who want to use jwdid
and csdid
for the estimation of DID models in the presence of repeated crossection.
If you have comments, or questions, do not hesitate to contact me.

title: "DID: Panel Data & Repeated Crossection"
subtitle: "Using `CSDID` and `JWDID`"
format: html
bibliography: references.bib

## First Things First
### DID with Multiple Periods and Time Heterogeneity
If you are reading this, you probably know quite well all the problems associated with the infamous TWFEDID especification. Nevertheless, its worth a quick recap, in case you are not aware of.
Before the new literature on DID, whenever people wanted to analyze treatment effects using a difference in differences approach, and had access to data with many periods and many individuals, they would tend to use a model specification that look like this:
$$y_{it} = \delta_i + \delta_t + \gamma \times trt + e_{it}
$${#eq1}
Where $trt$ was a dummy variable that would take the value of 1 for treated units after the treatment was implemented, and zero otherwise. The idea was that all units with $trt=0$ (those not yet treated) would be used used as controls. In this specification, one is also controlling for time fixed effects $\delta_t$ as well as individual (or group) fixed effects $\delta_i$.
This specification was considered a generalization of the simple 2x2 DID design:
$$
y = \delta_0 + \delta_1 post+ \delta_2 treat + \gamma \ (post\times treat) + e_{it}
$$
Where $\delta_i$ was the equivalent to $\delta_1$, and $\delta_2$ the equivalent for $\delta_t$.
What many didn't know at the time is that @eq1 would provide correct identification of the Average treatment effect $\gamma$, under strict assumptions:
 There is no treatment timing heterogeneity (All units are treated at the same time)
 If there is timing heterogeneity, the treatment effect is homogenous across time and across groups. ($\gamma$ is the same across time or across groups)
This assumptions almost never hold. For example, the original treatment may become less effective few periods after they are implemented, and units treated later may experience stronger treatment effects than those treated earlier. When this happen, @eq1 will provide incorrect results for three related reasons:
1. Linear regressions do not discriminate between good or bad controls when identifying coefficients. They simply exploit all possible varation in the data.
2. Already treated units would be implicitly used as "control units", when analyzing units treated at a later point in time.
3. Because of this, some treated units will receive "negative" weights, when estimating Average treatemnt effects.
One of the most interesting consequences of this problem is that one may estimate negative treatment effects, even if all units in the sample had a theoreticaly possitive treatment effect.
Many paper published in 2021 including @goodman_bacon_2021, @callaway_santanna_2021, @sun_estimating_2021, @wooldridge_twoway_2021 and @borusyak_revisiting_2022, to name a few, identified this problem, and proposed similar solutions: Estimate ATTs allowing for cohort and timing heterogeneity, avoiding the use of already treated units as controls.
While these papers also provide their own estimators (in `Stata`, `R`, and in some cases `python`), I will focus only on two solutions. The ones proposed by @callaway_santanna_2021 and @wooldridge_twoway_2021, which I programmed in `Stata` using `csdid[2]`, and `jwdid`. Both approaches suggest to estimate heterogenous treatment effects based on when a unit is treated (its cohort $G$), and at what point in time one is interested in estimating that effect $T$. We will call them the $ATT(G,T)$. ^[I will not provide details of the model specification in this post, but leave it for some future page on this site.]
## GxT DID (instead of 2x2 DID)
### Panel Data
Lets assume you have access to Balanced Panel Data. In other words, you observe the same group of individuals across the same window of time. Different individuals are treated at different points in time, and some are not treated at all (never treated). This is the best case scenario, since you do not need to worry about identifying cohorts by treat timing.
Let's see how to estimate this type of models using `csdid` and `jwdid`. They estimate the treatment effects using different strategies, but under specific cases, they will provide the same point estimates. To show how this is done, I will employ the toydataset used in @callaway_santanna_2021.
First let's set things up, loading the data:
```{stata}
* output: false
** to get the data from my repository
ssc install frause, replace
frause mpdta, clear
```
This dataset contains information at the county level on population size (lpop), employment (lemp), and a variable that indicates when a county instuted a change in minimum wage (first_treat). To estimate a DID model with heterogenous treatment effects, could use either `jwdid` or `csdid`, which are available from SSC.^[If you want to use `csdid2`, please check the blogpost on using "own installer", and try `fra install csdid2`.]
```{stata}
* output: false
** This is a dependency for csdid
ssc install drdid, replace
ssc install csdid, replace
ssc install jwdid, replace
```
Just one more step (not required but important if you did not identified the `cohort` variable). Lets create a dummy that takes the value of 1 only after the treated unit is treated:
```{stata}
gen trt = (first_treat<=year)*(first_treat>0)
```
This is what @eq1 is using. So how do I create the "*treatment cohort*" variable?. That can be easily done using one of `csdid` subprograms. You just need to provide the group or panel identifier, and the time variable:
```{stata}
egen gvar = csgvar(trt), ivar(countyreal) tvar(year)
```
Its worth noting that the official commands `xthdidregress` and `hdidregress` do not require you to do this, because it automatically creates the gvar internally, based on the information provided. You can now create a small tabulation, as a sanity check, to verity the variable `gvar` is correctly created:
```stata
tab year gvar
```
year\gvar  0  2004  2006  2007  Total

 2003  309  20  40  131  500 
 2004  309  20  40  131  500 
 2005  309  20  40  131  500 
 2006  309  20  40  131  500 
 2007  309  20  40  131  500 
 Total  1,545  100  200  655  2,500 
If all goes as expected, you a tabulation similar to the one you see here. You have a total of 2500 observations, but only 500 counties. 309 were not treated (had no changes in minimum wage between 2003 and 2007), 20 were treated in 2004, 40 in 2006 and 131 in 2007.
These numbers are what I call "effective sample size", because `csdid` (and implicitly `jwdid`) use only this ammount of information when estimating the treatment effects. In other words, we should assume the sample size is 20 (for the purposes of model specification and controls).
By default, `jwdid` uses the not yet treated as comparison group. So, to make the results comparable to `csdid`, I will request using those "nevertreated" as comparison group. For `csdid`, the default is using short gaps for pretreatment ATT's, along with the never treated as comaprison group. To change that I will use `long2` option. This will make the following comparable to the outcomes from standard Eventstudies estimations. The next lines should produce identical point estimates, with very similar standard errors.
::: {.paneltabset}
## CSDID
```{stata}
csdid lemp , ivar(county) time(year) gvar(gvar) long2
```
## JWDID
```{stata}
jwdid lemp , ivar(county) time(year) gvar(gvar) never
```
:::
You can of course add controls. In this case `lpop` is a time constant variable, so it can be added to both commands. For us to obtain the same results, however, you need to add the `method(reg)` to `csdid`. Otherwise it will implement the double robust estimator `dripw` (default)
::: {.paneltabset}
## CSDID
```{stata}
csdid lemp lpop , ivar(county) time(year) gvar(gvar) long2 method(reg)
```
## JWDID
```{stata}
jwdid lemp lpop, ivar(county) time(year) gvar(gvar) never
```
:::
In general, however, you may not be interested in analyzing each individual ATTGT, but rather some aggregated results. For example, overall average, or dynamic effects.^[also available `group` and `calendar`]. Prof. Wooldrige, however, also suggest to ajust standard errors to account for sampling variation on the covariates. To do so, I need to modify the syntax of `jwdid` slighly, as well as the postestimation commands.
::: {.paneltabset}
## CSDID
```{stata}
qui:csdid lemp lpop , ivar(county) time(year) gvar(gvar) long2 method(reg)
estat simple
estat event
estat calendar
estat group
```
## JWDID
```{stata}
qui:jwdid lemp lpop, ivar(county) time(year) gvar(gvar) never
estat simple
estat event
estat calendar
estat group
```
## JWDID: vce(unconditional)
```{stata}
qui:jwdid lemp lpop, ivar(county) time(year) gvar(gvar) never ///
method(regress) group
estat simple, vce(unconditional)
estat event, vce(unconditional)
estat calendar, vce(unconditional)
estat group, vce(unconditional)
```
:::
Again, both showing identical point estimates with slighly different standard errors.
## Repeated Crossection
Repeated crossection operates slighly different than panel data. On the one hand, you do not observe the same units across time. Thus, there is no *unit specific fixedeffect*. Instead one would be trying to account for cohort specific effect, or treatmentlevel fixed effects. This may become clearer with an example.
Lets us first consider the same dataset as before. To simulate the situation of repeated crossection, lets assume that each round, data was collected for a random sample of counties within each State, and that, for some unknown reason, we cannot identify counties across time. Lets us also assume that the treatment was implemented at the State level (for clustering purposes). First some data preparation.
To simulate repeated crossection structure, I will drop 10% of the data, create a ***State id***, and drop the county identifier.
```{stata}
** load and obtain trt
frause mpdta, clear
gen trt = (first_treat<=year)*(first_treat>0)
gen state = int(countyreal/1000)
** Randomly keep 90% of the data
set seed 1
sample 90
drop countyreal
```
Next is to create the `gvar`. This will follow the same syntax as before, with the main difference that instead of using **countyreal** as `ivar`, we will use `state`. Its important that this variable represents the identifier of the level at which the treatment was implemented, and that we can follow up across time.
```{stata}
egen gvar = csgvar(trt), ivar(state) tvar(year)
```
And as always, its a good idea to crosstabulate year with the new gvar:
```stata
tab year gvar
```
year\gvar  0  2004  2006  2007  Total

 2003  279  19  35  114  447 
 2004  286  19  34  117  456 
 2005  279  18  38  121  456 
 2006  275  18  40  115  448 
 2007  275  18  36  114  443 
 Total  1,474  93  193  615  2,375 
What you should observe, as shown here, is that you still have roughtly the same number of observations per `gvar` across time. This is important because you need to have a groups that are comparable across time. Think of this as having a pseudo panel.
In terms of estimation, the syntax needs to be adjusted slighly:
::: {.paneltabset}
## CSDID
If you are using `csdid`, the adjustment simply requires to drop `ivar` option. To make sure we are clustering at the correct level, one should use `cluster()` instead, using the pseudo panel identifier `state` as clustering variable:
```{stata}
csdid lemp , cluster(state) time(year) gvar(gvar) long2
```
Aggregations are obtained using `estat` post estimation commnads.
## JWDIDI
If you are using `jwdid`, there are two alternatives. The first one is to use `ivar()` with the new pseudo panel identifier (`state`).
```{stata}
jwdid lemp , ivar(state) time(year) gvar(gvar) never
```
Data will be clustered at the state level, explicitly accounting for State fixed effects.
## JWDIDII
Alternatively, one could change `ivar()` with `cluster()`, to ensure clustering is done at the correct level, and add the option `group`, so that cohort fixed effects are added to the specification.^[In my repository, the newer version of `jwdid` does this automatically if `ivar()` is not provided.]
```{stata}
jwdid lemp , cluster(state) time(year) gvar(gvar) never group
```
:::
The results across the different approaches will be different because of how the data is used. But, as you can see here, point estimates are broadly comparable.
As far as I know, either option is correct.
## Conclusions
I hope this reference helps people who want to use `jwdid` and `csdid` for the estimation of DID models in the presence of repeated crossection.
If you have comments, or questions, do not hesitate to contact me.