Loading Large Datasets

use in vs use if
Stata
tips
Author

Fernando Rios-Avila

Published

March 27, 2023

If you have ever tried to work with very large datasets, you know that it can be difficult, if not impossible, because of memory limitations, or at the very least time-consuming.

One option worth trying in these cases is to load only part of the data, by adding the if or in qualifiers and/or a variable list to the use command.

For example, say that you want to work with the dataset oaxaca, which is located in your working directory. You can load only a subset of the observations by typing:

c:\ado\personal
use oaxaca if female ==1, clear
sum female age
(Excerpt from the Swiss Labor Market Survey 1998)

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      female |        888           1           0          1          1
         age |        888    39.88401    10.72665         18         62

Alternatively, you can load only a few of the variables in the dataset:

use female age educ using oaxaca , clear
sum *
(Excerpt from the Swiss Labor Market Survey 1998)

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        educ |      1,647    11.40134    2.374952          5       17.5
      female |      1,647    .5391621    .4986154          0          1
         age |      1,647    39.25379    11.03187         18         62
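
Choosing a variable list requires knowing the variable names beforehand. One way to check them, sketched here under the assumption that oaxaca.dta sits in the working directory, is describe using, which reports the contents of a file without loading it into memory:

```stata
* Report the number of observations and variables in the file on disk
describe using oaxaca, short

* List variable names, storage types, and labels, still without loading the data
describe using oaxaca
```

The number of observations reported this way can also help when deciding which range of observations to request.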

You can also combine an if condition with a variable list in a single use command.

Now, if your dataset is really large, I suggest using in, which will do the task much faster than if. Why? Because with if, Stata still needs to go over every observation in the dataset to check the condition. With in, however, it only goes over the prespecified range of observations.

use female age educ using oaxaca in 1/10, clear
list , sep(0)
(Excerpt from the Swiss Labor Market Survey 1998)

     +---------------------+
     | educ   female   age |
     |---------------------|
  1. |    9        1    37 |
  2. |    9        0    62 |
  3. | 10.5        1    40 |
  4. |   12        0    55 |
  5. |   12        0    36 |
  6. | 10.5        0    31 |
  7. | 10.5        0    50 |
  8. | 17.5        0    42 |
  9. | 17.5        0    36 |
 10. | 10.5        0    30 |
     +---------------------+
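
If you want to check the speed difference on your own data, a rough sketch using Stata's timer commands could look like the following. The timer numbers 1 and 2 and the range 1/888 are arbitrary choices for illustration, and the file is assumed to be in the working directory. With a file as small as oaxaca the difference will likely be negligible, so the comparison is only informative on a genuinely large file.

```stata
* Rough timing comparison; assumes oaxaca.dta is in the working directory
timer clear

* Timer 1: if has to evaluate the condition for every observation in the file
timer on 1
use female age educ if female == 1 using oaxaca, clear
timer off 1

* Timer 2: in reads only the requested range of observations
timer on 2
use female age educ in 1/888 using oaxaca, clear
timer off 2

* Show the elapsed time, in seconds, recorded by each timer
timer list
```

Here the qualifiers are written before using, which is the order given in the help file for use; the order shown in the examples above works as well.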