Hello everyone. Version 1.38 is out!
Yes, you may notice that this version number differs from the version "0.1" that is available through net install. The reason is the rhythm at which I have been programming and adding options to the program.
As always, this should be a quick note on the current version of the drdid command for Stata. A few changes have occurred that are not yet documented. You probably already saw the post by Miklos, which provides instructions to download the package. You can either install it using those instructions or just copy the files I provide here.
Since I'll be maintaining the package, both sites will have the same information, except that here I add a bit more of my own flair (artistic freedom).
So, after one week of rest (and vacation), I came back to add a couple of extra options to drdid. The command now produces both the panel and repeated cross-section estimators proposed in Sant'Anna and Zhao (2020), plus one obtained using teffects: the inverse probability weighting regression augmented estimator (for panel data). While I have not included this in the help file yet (I still need to fix some of its features), the command now allows you to use weights (which are treated as pweights) and can also produce the 1-step wild bootstrap. For the wild bootstrap, however, the command only produces confidence intervals based on the normal distribution, not on the symmetric t-statistic used in the original DRDID package.
I have also added the option to produce clustered standard errors, although only for asymptotic standard errors. Wild-bootstrap clustered standard errors are yet to be programmed.
You may also notice that the output from drdid has changed slightly. EP (who wishes to remain anonymous) has helped by providing a new display for the command that aligns better with the style of official Stata commands. He is also working on providing the option to obtain DRDID estimates via gmm. This approach is not yet implemented, but once it's done, it will provide a richer set of options for estimation and standard errors, and may even allow you to use different explanatory variables in the outcome and probability models.
I have also added a better set of data checks to make sure you have the data set up correctly. For instance, the command will give you better error messages if you do not have a 2x2 DID structure, and it will check that you have a correct panel structure when you use the option ivar. Before, it would not do that, and it would give you incorrect results if you had, say, multiple observations per ivar and time.
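If you want to run similar checks by hand before calling the command, something like the sketch below may help. It is only an illustration using standard Stata commands, not the code drdid runs internally, and it assumes the lalonde example data and the variable names (id, year, experimental, treated, sample) that appear in the example further down.

```stata
* Load the example data and keep the working sample used in this post
use https://friosavila.github.io/playingwithstata/drdid/lalonde.dta, clear
keep if treated==0 | sample==2

* A 2x2 design needs exactly two periods and two treatment values
tabulate year experimental

* With ivar(), each id should appear at most once per period;
* any surplus copies reported here would signal a problem
duplicates report id year
```

If the cross-tabulation shows more than two rows or columns, or the duplicates report lists surplus observations, the sample is not yet in the shape described above.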
Finally, to protect data integrity, the influence functions are no longer "saved" or "stored" in the dataset, and neither is __dy__. They are now created as temporary variables, and the IF is only saved when requested using the options stub() and/or replace.
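As a rough sketch of how saving the influence function might look, the lines below request a single estimator and pass stub(). The prefix rif_ is an arbitrary name chosen for illustration, and the resulting variable name (presumably the prefix followed by att) is an assumption based on the option description rather than something verified here.

```stata
* Example data and sample restriction as used elsewhere in this post
use https://friosavila.github.io/playingwithstata/drdid/lalonde.dta, clear

* Panel DR IPW estimator; rif_ is a hypothetical prefix for the saved IF
drdid re age educ black married nodegree hisp re74 if treated==0 | sample==2, ivar(id) time(year) tr(experimental) dripw stub(rif_)

* List whatever was created with that prefix (errors if nothing matches)
describe rif_*

* Running it again should need replace to overwrite the existing variable
drdid re age educ black married nodegree hisp re74 if treated==0 | sample==2, ivar(id) time(year) tr(experimental) dripw stub(rif_) replace
```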
Alright, I hope this guide helps in using this command, at least until it is properly released through SSC. If you find any bugs or difficulties, please do not hesitate to contact me at friosa@gmail.com, or through Twitter!
For the replication exercise, I'll use the same dataset used in the example for the drdid command in R. Let's call it lalonde.dta.
use https://friosavila.github.io/playingwithstata/drdid/lalonde.dta, clear
Then, make sure to copy the file drdid.ado into your personal ado folder. On my PC, it's C:\ado\personal. You could also install the files by typing:
net install drdid, from("https://raw.githubusercontent.com/friosavila/csdid_drdid/v0.1/code")
The latter, however, may not have an up-to-date help file.
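To confirm which copy of the command Stata will actually run after either route, a quick check along these lines may be useful. These are standard Stata commands; what they print depends on your own installation.

```stata
* Show Stata's system directories; PERSONAL is where a hand-copied drdid.ado goes
sysdir

* Report the location of the drdid.ado that Stata finds first,
* along with any version header lines stored in the file
which drdid

* Open the help file that ships with the installed copy
help drdid
```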
I tried to keep the syntax of drdid relatively standard, or at least consistent with other commands I have worked on before. Some of the options may change slightly as other features are integrated.
The general syntax of the command is as follows:
drdid depvar [indepvar] [if] [in] [iw], [ivar(varname)]
time(varname) tr(varname) [estimator] [wboot cluster(varname) stub(string) replace]
Here is an explanation of all the pieces:
depvar: is your dependent variable or outcome of interest.
indepvar: are your independent variables; you may or may not have variables here. These variables will be included in the outcome regression specification and the propensity score estimation.
ivar: is a variable that identifies the panel ID. If you drop this, the command will use the repeated cross-section estimators instead. If included, it will estimate the panel estimators.
time: identifies the time variable (for example, year). It does not matter whether the periods are contiguous or not. However, it's important that you have only two values in time for the working sample. The earlier period will be used as pre, whereas the later period will be used as post.
tr: is the treatment variable. It does not matter what values you use, as long as there are only two values in the sample used. Observations with the lower value are the control group, whereas observations with the higher value are the treated group.
estimator: is used to indicate which estimator you want to use. Below is the list of all that are available:
  drimp: Estimates the DR improved estimator. If you add rc1, it provides the alternative estimator (that is not locally efficient).
  dripw: Estimates the DR IPW estimator. You can also use rc1 to obtain the alternative (not locally efficient) estimator.
  reg: Estimates the outcome regression estimator.
  stdipw: Estimates the standardized IPW estimator.
  ipw: Estimates an estimator similar to Abadie's (2005).
  ipwra: Estimates the IPW regression augmented estimator. This is estimated through Stata's teffects ipwra command. Not available for repeated cross-sections.
  all: Provides all estimators. You may find it useful for robustness checks (I used it for replication). These results cannot be used to make comparisons across estimators.
wboot: When used, the command will estimate the 1-step wild bootstrap (or multiplier bootstrap) with 999 repetitions. It currently implements the Mammen (1993) approach. This does not work with all or with ipwra.
cluster: When used, the command will estimate clustered standard errors. This option does not yet work in combination with wboot.
stub: When used, the recentered influence function will be stored in the dataset with the name stub followed by att. If the variable already exists, you can request to overwrite it using replace. This option does not work when one requests all estimators or when the ipwra estimator is requested.
So, how does this work? Let's start with the lalonde dataset and the panel estimators. For a quick application, I'll use the all option:
drdid re age educ black married nodegree hisp re74 if treated==0 | sample==2 , ivar(id) time(year) tr( experimental ) all
and if everything went well, you should be seeing this:
Doubly robust difference-in-differences estimator summary
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
       dripw |  -871.3271   396.0211    -2.20   0.028    -1647.514   -95.14007
       drimp |  -901.2703   393.6127    -2.29   0.022    -1672.737   -129.8037
         reg |  -1300.645   349.8259    -3.72   0.000    -1986.291   -614.9985
         ipw |  -1107.872   408.6127    -2.71   0.007    -1908.738   -307.0058
      stdipw |  -1021.609   397.5201    -2.57   0.010    -1800.734   -242.4845
      sipwra |  -908.2912   393.8673    -2.31   0.021    -1680.257   -136.3255
------------------------------------------------------------------------------
Note: This table is provided for comparison across estimations only. You cannot use them to compare across estimates across different estimators
dripw :Doubly Robust IPW
drimp :Doubly Robust Improved estimator
reg :Outcome regression or Regression augmented estimator
ipw :Abadie(2005) IPW estimator
stdipw:Standardized IPW estimator
sipwra:IPW and Regression adjustment estimator.
We could also apply the repeated cross-section estimators to this data by simply excluding the ivar() option. This could also be useful, for example, if you have access to unbalanced panel data.
Perhaps the main problem with this option is that the estimates will be less efficient, with larger standard errors. The loss of efficiency arises because we are now ignoring an important piece of information: the panel id.
. drdid re age educ black married nodegree hisp re74 if treated==0 | sample==2 , time(year) tr( experimental ) all
Doubly robust difference-in-differences estimator summary
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
       dripw |  -871.3271   410.2751    -2.12   0.034    -1675.452   -67.20268
   dripw_rc1 |  -871.3271   435.0661    -2.00   0.045    -1724.041   -18.61332
       drimp |  -901.2703   408.3107    -2.21   0.027    -1701.545    -100.996
   drimp_rc1 |  -901.2703   434.3043    -2.08   0.038    -1752.491   -50.04954
         reg |  -1300.645   418.5023    -3.11   0.002    -2120.894   -480.3951
         ipw |  -1107.872   619.4393    -1.79   0.074    -2321.951    106.2068
      stdipw |  -1021.609    495.464    -2.06   0.039    -1992.701   -50.51783
------------------------------------------------------------------------------
Note: This table is provided for comparison across estimations only. You cannot use it to compare estimates across different estimators
dripw :Doubly Robust IPW
drimp :Doubly Robust Improved estimator
reg :Outcome regression or Regression augmented estimator
ipw :Abadie(2005) IPW estimator
stdipw:Standardized IPW estimator
sipwra:IPW and Regression adjustment estimator.
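The table above reports both dripw and dripw_rc1. If you only need one of them, you can request a single estimator instead of all. The lines below are a sketch of how that might look; placing rc1 right after the estimator name is my reading of the option description and is an assumption, so check the help file if the second call is rejected.

```stata
* Example data, repeated cross-section setup (no ivar())
use https://friosavila.github.io/playingwithstata/drdid/lalonde.dta, clear

* DR IPW estimator for repeated cross-sections
drdid re age educ black married nodegree hisp re74 if treated==0 | sample==2, time(year) tr(experimental) dripw

* Alternative (not locally efficient) version; rc1 placement is assumed
drdid re age educ black married nodegree hisp re74 if treated==0 | sample==2, time(year) tr(experimental) dripw rc1
```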
It is possible, however, to apply clustered standard errors using the panel id. This is useful if you have unbalanced data and want to keep all observations in the sample. In this case, it is practically equivalent to using ivar, because the data is fully balanced (only the dripw standard error differs slightly, 396.16 versus 396.02).
. drdid re age educ black married nodegree hisp re74 if treated==0 | sample==2 , time(year) tr( experimental ) all cluster(id)
Doubly robust difference-in-differences estimator summary
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
       dripw |  -871.3271   396.1635    -2.20   0.028    -1647.793   -94.86092
   dripw_rc1 |  -871.3271   396.1635    -2.20   0.028    -1647.793   -94.86092
       drimp |  -901.2703   393.6127    -2.29   0.022    -1672.737   -129.8037
   drimp_rc1 |  -901.2703   393.6127    -2.29   0.022    -1672.737   -129.8037
         reg |  -1300.645   349.8259    -3.72   0.000    -1986.291   -614.9985
         ipw |  -1107.872   408.6127    -2.71   0.007    -1908.738   -307.0058
      stdipw |  -1021.609   397.5201    -2.57   0.010    -1800.734   -242.4845
------------------------------------------------------------------------------
Note: This table is provided for comparison across estimations only. You cannot use it to compare estimates across different estimators
dripw :Doubly Robust IPW
drimp :Doubly Robust Improved estimator
reg :Outcome regression or Regression augmented estimator
ipw :Abadie(2005) IPW estimator
stdipw:Standardized IPW estimator
sipwra:IPW and Regression adjustment estimator.
We can do the same using the simulated dataset also provided in R's DRDID. This shows how the command estimates the model assuming repeated cross-sections.
use https://friosavila.github.io/playingwithstata/drdid/sim_rc.dta, clear
Let me use the same syntax as before to obtain all estimates at the same time. Notice that I'm not including the ivar option, so the command uses the repeated cross-section estimators:
. drdid y x1 x2 x3 x4, time(post) tr( d) all
Doubly robust difference-in-differences estimator summary
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
       dripw |  -.1677954   .2007987    -0.84   0.403    -.5613537    .2257628
   dripw_rc1 |  -3.633433   3.105569    -1.17   0.242    -9.720237     2.45337
       drimp |  -.2097772   .1978384    -1.06   0.289    -.5975333    .1779789
   drimp_rc1 |  -3.683518   3.112235    -1.18   0.237    -9.783387    2.416351
         reg |  -8.790978   7.774585    -1.13   0.258    -24.02888    6.446929
         ipw |   -19.8933   53.84128    -0.37   0.712    -125.4203    85.63367
      stdipw |  -15.80331   9.083384    -1.74   0.082    -33.60641      1.9998
------------------------------------------------------------------------------
Note: This table is provided for comparison across estimations only. You cannot use it to compare estimates across different estimators
dripw :Doubly Robust IPW
drimp :Doubly Robust Improved estimator
reg :Outcome regression or Regression augmented estimator
ipw :Abadie(2005) IPW estimator
stdipw:Standardized IPW estimator
sipwra:IPW and Regression adjustment estimator.
What about the wild bootstrap? You can request wild-bootstrap standard errors, but if you do, you need to request a specific estimator. Below is an example for dripw:
. drdid y x1 x2 x3 x4, time(post) tr( d) dripw
Doubly robust difference-in-differences                  Number of obs = 1,000
Outcome model  : least squares
Treatment model: inverse probability
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
           d |
   (1 vs 0)  |  -.1677954   .2007987    -0.84   0.403    -.5613537    .2257628
------------------------------------------------------------------------------
. drdid y x1 x2 x3 x4, time(post) tr( d) dripw boot
Doubly robust difference-in-differences                  Number of obs = 1,000
Outcome model  : least squares
Treatment model: inverse probability
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
           d |
   (1 vs 0)  |  -.1677954   .2060326    -0.81   0.415     -.571612    .2360211
------------------------------------------------------------------------------
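For intuition on what a multiplier (wild) bootstrap with Mammen weights does, here is a small self-contained sketch in Stata and Mata. It is not the code inside drdid: the variable rif is simulated as a stand-in for a recentered influence function, and the way the draws are summarized (standard deviation of the perturbed means, then a normal-based interval) is my own simplification.

```stata
clear
set seed 12345
set obs 500
* Simulated stand-in for a recentered influence function (hypothetical data)
generate double rif = 2 + rnormal()

mata:
rif = st_data(., "rif")
n   = rows(rif)
att = mean(rif)                      // point estimate = mean of the RIF
inf = rif :- att                     // centered influence function
k1  = 0.5*(1 - sqrt(5))              // Mammen (1993) two-point weights
k2  = 0.5*(1 + sqrt(5))
p   = 0.5*(1 + sqrt(5))/sqrt(5)      // probability of drawing k1
reps  = 999
draws = J(reps, 1, .)
for (b = 1; b <= reps; b++) {
    v = k1 :+ (k2 - k1) :* (runiform(n, 1) :> p)   // k1 w.p. p, else k2
    draws[b] = mean(v :* inf)                      // perturbed mean of the IF
}
se = sqrt(variance(draws))
// point estimate, bootstrap SE, and a normal-based 95% interval
(att, se, att - 1.96*se, att + 1.96*se)
end
```

Because the weights have mean zero and variance one, the spread of the perturbed means approximates the sampling variability of the estimate without re-estimating the model in each repetition, which is why this is called a 1-step procedure.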
And that is it! If you find any bugs or encounter any problems, please let me know. So, what is next?
Yes, CSDID is here!
This took longer, just because I was a bit burned out after the first week of DRDID coding. Even with the base code (thank you, Pedro!), it was hard to understand how each piece moved. And if you do not understand how it all moves, you cannot move forward.
In any case... yes, it clicked again, and we have a beta version!
I used to add information about CSDID here, but since it has just been completed, I'll create its own page!
Please use the menu on the right and go to the new corresponding page.
DRDID?
drdid is basically done. The last things missing are the wild bootstrap with clustered standard errors and a check that clusters are nested within the panel id. That will be next.
EP is working on adding gmm estimators to drdid. This will just provide you with more options for analysis and reporting. He also provided a new display function.
If you have comments or questions, please let me know!