I have seen this problem come up quite a few times on Statalist, as well as on other forums like Stack Overflow. You try to do the right thing: you write and document a do-file so that it produces your results the same way every time, but for some reason they keep changing. The question is always: why?
Of course, there could be many reasons why this happens, but I will narrow them down to the most common causes:
- Randomness: The first one is the most obvious. Your program may include a procedure that relies on some random process; bootstrapped standard errors or the creation of simulated data are the most common causes. Solution: `set seed` (a short sketch follows this list).
- Sort: The second most common problem is a statement in your do-file that relies on sorting data with ties. When ties occur, Stata's default behavior is to assign a random order to observations with the same value, which may affect procedures that are order dependent (matching, for example). This is not a problem when you are sorting on variables that fully identify the data, but if ties exist, it may kick in. Solution: `set sortseed` (sketched after this list), although one should check carefully whether this is the true cause of the non-replicability, or rather the next option.
- Merging: Most merging procedures in Stata are well defined and will not cause any problem: `1:m`, `m:1`, and `1:1` are fine. However, there is the infamous `m:m`. As many others have said before: you NEVER, EVER do `merge m:m`. If you think you need it, think of `joinby` (sketched after this list), because `merge m:m` is always wrong. In fact, it can trigger a problem that "seems" to be caused and solved by the sorting issue above, so be mindful.
- Multicollinearity: Something that occurs less often are problems related to multicollinearity. When two variables are collinear and you are trying to estimate a linear or linear-like model,
Stata will try to fix the problem by omitting one of the collinear variables before the estimation occurs. Which variable gets omitted is essentially arbitrary from your point of view, and it may differ across runs if anything upstream changes (the order of the variables, the order of the data), so your estimates can look different even though nothing substantive changed. Solution: revise your model specification so the collinearity is not there in the first place (a sketch follows this list).
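Here is a minimal sketch of the `set seed` fix, using Stata's built-in auto dataset; the seed value itself is arbitrary, what matters is that it is set before the random procedure runs:

```stata
* Without a seed, bootstrapped standard errors differ from run to run
sysuse auto, clear
set seed 20240915                          // fix the RNG state so the draws replicate
bootstrap, reps(100): regress price mpg weight
```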
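For sorting with ties, a sketch again using the auto dataset; `rep78` has many tied values, so the order of tied observations is otherwise arbitrary:

```stata
* Two ways to make a sort with ties reproducible
sysuse auto, clear
set sortseed 20240915        // makes Stata's random tie-breaking reproducible
sort rep78
* Better: sort on enough variables to uniquely identify observations
sort rep78 make              // make uniquely identifies observations in auto.dta
```

The second approach is preferable when possible, since it removes the randomness instead of merely fixing it.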
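For the merging case, a sketch with hypothetical datasets (`persons.dta` and `visits.dta` are made-up file names, and `id` is an assumed key that is repeated in both files):

```stata
* joinby forms all pairwise combinations within id, which is usually what
* people actually want from a "many-to-many" merge
use persons, clear
joinby id using visits
* merge m:m id using visits   // do NOT do this: it pairs rows within id by
*                             // their current sort order, not by any logic
```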
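And a sketch of the collinearity issue, once more with the auto dataset; `weight_kg` is a made-up variable constructed to be an exact linear function of `weight`:

```stata
* With perfect collinearity, Stata omits one of the offending variables
sysuse auto, clear
generate weight_kg = weight * 0.4536     // exact linear function of weight
regress price weight weight_kg mpg       // one of weight / weight_kg gets omitted
```

Rather than relying on which variable Stata happens to drop, revise the specification so the redundant variable is not included at all.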
If you have found other reasons that cause problems replicating your results, let me know and we can add them to the list.