If you have ever worked with very large datasets, you will know that loading them can be difficult, if not impossible, because of memory limitations, or at the very least time-consuming.
One option worth trying in that situation is to load only part of the data, using the if and in qualifiers and/or a varlist.
For example, say that you want to work with the dataset oaxaca, which is located in your working directory. You can load only a subset of the observations by typing:
use oaxaca if female ==1, clear
sum female age
(Excerpt from the Swiss Labor Market Survey 1998)
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      female |        888           1           0          1          1
         age |        888    39.88401    10.72665         18         62
Alternatively, you could load only a few of the variables in the dataset:
use female age educ using oaxaca, clear
sum *
(Excerpt from the Swiss Labor Market Survey 1998)
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        educ |      1,647    11.40134    2.374952          5       17.5
      female |      1,647    .5391621    .4986154          0          1
         age |      1,647    39.25379    11.03187         18         62
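Both restrictions can go into a single use command. As a minimal sketch, still assuming oaxaca.dta sits in the working directory, this keeps only the three variables above and only the female respondents. Based on the two summaries above, it should leave 888 observations and 3 variables in memory.

```stata
* Restrict columns (varlist) and rows (if) in one step
use female age educ using oaxaca if female == 1, clear
sum *
```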
You can also use a combination of both. Now, if your dataset is really large, I suggest using the in qualifier, which will do the task much faster than if. Why? Because with if, Stata still needs to go over every observation in your dataset to evaluate the condition, whereas in only reads the pre-specified range of observations:
use female age educ using oaxaca in 1/10, clear
list, sep(0)
(Excerpt from the Swiss Labor Market Survey 1998)
     +---------------------+
     | educ   female   age |
     |---------------------|
  1. |    9        1    37 |
  2. |    9        0    62 |
  3. | 10.5        1    40 |
  4. |   12        0    55 |
  5. |   12        0    36 |
  6. | 10.5        0    31 |
  7. | 10.5        0    50 |
  8. | 17.5        0    42 |
  9. | 17.5        0    36 |
 10. | 10.5        0    30 |
     +---------------------+
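To check the speed difference on your own files, one option is to time both approaches with Stata's timer command. The sketch below reads 888 observations each way: all women via if, and the first 888 rows via in. The two commands return different observations, so this only compares reading cost, and the timer numbers 1 and 2 are arbitrary labels. On a file as small as oaxaca both will likely finish almost instantly; the gap should only become noticeable on much larger datasets.

```stata
* Reset all timers before measuring
timer clear

* Timer 1: if has to check every observation in the file
timer on 1
use female age educ using oaxaca if female == 1, clear
timer off 1

* Timer 2: in reads only the requested range of observations
timer on 2
use female age educ using oaxaca in 1/888, clear
timer off 2

* Report elapsed seconds for each timer
timer list
```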