// Load sample data
sysuse auto, clear
// Create summary statistics table
estpost summarize price mpg weight length
est sto summary_stats
esttab summary_stats using summary_stats.txt, ///
cells("mean(fmt(2)) sd(fmt(2)) min max") ///
replace label nomtitle nonumber md
Continuing this quick series of guides with Stata, here is something we all need: tables! Tables are a fundamental part of presenting research results, and there is a community-contributed set of commands that makes it easy to create publication-ready tables in Stata: estout/esttab/estpost. These commands come to you thanks to Ben Jann. They allow you to generate tables from estimation results and matrices in a format suitable for publication, including HTML, LaTeX, and Markdown.
Of course, for this you need a quick setup:
// Install estout package
ssc install estout
Summary Statistics
One of the most common tasks in data analysis is to generate summary statistics. We all know that the easiest way to do this is to use the summarize command. Turning its output into a statistics table, however, is not straightforward. With estpost, you can easily generate summary statistics tables.
Let’s start by creating a table of summary statistics. For this example, we will use the auto dataset that comes with Stata, and output the tables in Markdown format. The output you will see is based on that format.
| | mean | sd | min | max |
|---|---|---|---|---|
| Price | 6165.26 | 2949.50 | 3291.00 | 15906.00 |
| Mileage (mpg) | 21.30 | 5.79 | 12.00 | 41.00 |
| Weight (lbs.) | 3019.46 | 777.19 | 1760.00 | 4840.00 |
| Length (in.) | 187.93 | 22.27 | 142.00 | 233.00 |
| Observations | 74 | | | |
Data source: Auto dataset
So how does this work?
estpost captures the output of the summarize command and posts it in several e() matrices; est sto then saves them under the name “summary_stats”.
Then esttab formats the content to create a nice table. The cells() option specifies the contents of each cell, using the names of the “matrices” saved by estpost. The nomtitle option suppresses the model titles in the table header. The nonumber option suppresses the model numbers in the header. The md option specifies that the output format is Markdown, but other formats are possible. The replace option overwrites the file if it already exists. The label option uses variable labels instead of names. Within the cells() option, name(fmt(???)) specifies the display format of the cell content; fmt(2), for instance, shows two decimal places.
Notice that if the statistics are specified within quotes, they are placed side by side. If they are not, they are placed one below the other.
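To make the quoted versus unquoted behavior concrete, here is a minimal sketch that prints both layouts to the Results window. It assumes estout is already installed and reuses the summary_stats results from above; the par suboption (which wraps the standard deviation in parentheses) is my addition for readability, not part of the original example.

```stata
// Assumes estout is installed (ssc install estout)
sysuse auto, clear
estpost summarize price mpg weight length
est sto summary_stats

// Quoted: mean and sd are placed side by side, in separate columns
esttab summary_stats, cells("mean(fmt(2)) sd(fmt(2))") label nomtitle nonumber

// Unquoted: sd is placed on its own row below the mean, in parentheses
esttab summary_stats, cells(mean(fmt(2)) sd(fmt(2) par)) label nomtitle nonumber
```

The stacked layout is the one esttab uses by default for regression tables (coefficient on top, second statistic below), so the unquoted form is handy when you want summary statistics to look like a regression table.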
Advanced Summary Statistics
You can also generate more advanced summary statistics tables, including grouped statistics and custom formatting. Here’s an example:
// Load sample data
webuse nlsw88, clear
// Calculate summary statistics by race
estpost tabstat wage age tenure, by(race) statistics(mean sd min max n) columns(statistics)
est sto advanced_summary
// Create advanced summary statistics table
esttab advanced_summary using advanced_summary.txt, ///
cells("mean(fmt(2)) sd(fmt(2)) min(fmt(1)) max(fmt(1)) count(fmt(0))") ///
noobs nonumber nomtitle ///
collabels("Mean" "Std. Dev." "Min" "Max" "N") ///
eqlabels("White" "Black" "Other") ///
varlabels(wage " Hourly Wage" age " Age" tenure " Job Tenure") ///
width(20) alignment(r) ///
replace noline md
| | Mean | Std. Dev. | Min | Max | N |
|---|---|---|---|---|---|
| White | | | | | |
| Hourly Wage | 8.08 | 5.96 | 1.0 | 40.2 | 1637 |
| Age | 39.27 | 3.08 | 34.0 | 46.0 | 1637 |
| Job Tenure | 5.81 | 5.46 | 0.0 | 25.9 | 1627 |
| Black | | | | | |
| Hourly Wage | 6.84 | 5.08 | 1.2 | 40.7 | 583 |
| Age | 38.81 | 2.98 | 34.0 | 45.0 | 583 |
| Job Tenure | 6.50 | 5.62 | 0.0 | 24.8 | 578 |
| Other | | | | | |
| Hourly Wage | 8.55 | 5.21 | 1.8 | 25.8 | 26 |
| Age | 39.31 | 3.25 | 34.0 | 44.0 | 26 |
| Job Tenure | 4.95 | 5.24 | 0.2 | 21.2 | 26 |
| Total | | | | | |
| Hourly Wage | 7.77 | 5.76 | 1.0 | 40.7 | 2246 |
| Age | 39.15 | 3.06 | 34.0 | 46.0 | 2246 |
| Job Tenure | 5.98 | 5.51 | 0.0 | 25.9 | 2231 |
Data source: NLSW 1988
In this example, notice that I use varlabels to provide custom labels for the variables. Because I want Markdown output, I prefix each label with a leading space, such as the HTML entity &nbsp;, to indent it and add some hierarchy to the table. I also use eqlabels to provide custom labels for the equation names (in this case, race categories). The alignment(r) option right-aligns all cells, although it has no effect on md output. The noline option suppresses the horizontal lines in the table.
Regression Table
Next is the most common exercise we do in Stata: regression analysis. Here’s how you can create a table with regression results:
// Load sample data
sysuse auto, clear
regress price weight mpg
estimates store model1
regress price weight mpg foreign
estimates store model2
// Create regression table
esttab model1 model2 using regression_results.txt, ///
b(3) se(3) star(* 0.10 ** 0.05 *** 0.01) ///
r2 ar2 nonumber mtitle("Model 1" "Model 2") ///
replace note("") noline md
| | Model 1 | Model 2 |
|---|---|---|
| weight | 1.747*** | 3.465*** |
| | (0.641) | (0.631) |
| mpg | -49.512 | 21.854 |
| | (86.156) | (74.221) |
| foreign | | 3673.060*** |
| | | (683.978) |
| _cons | 1946.069 | -5853.696* |
| | (3597.050) | (3376.987) |
| N | 74 | 74 |
| R2 | 0.293 | 0.500 |
| adj. R2 | 0.273 | 0.478 |
* p < 0.10, ** p < 0.05, *** p < 0.01
Data source: Auto, Standard errors in parentheses, * p<0.10, ** p<0.05, *** p<0.01
This code is more straightforward. After the regressions are estimated and stored with estimates store (est sto for short), the esttab command is used to create the table. The b(3) and se(3) options specify that the coefficients and standard errors should be displayed with 3 decimal places. The star() option specifies the significance levels for the stars. The r2 ar2 options include the R-squared and adjusted R-squared statistics. The nonumber option removes the model numbers, (1), (2), and so on, from the table header. This is necessary for Markdown tables, because Markdown only allows for a single title row. The mtitle() option sets the column title of each model.
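Since esttab supports several output formats, the same stored models can be written out as LaTeX instead of Markdown. The sketch below assumes your LaTeX document loads the booktabs package; the file name regression_results.tex is only an illustrative choice.

```stata
// Assumes estout is installed (ssc install estout)
sysuse auto, clear
regress price weight mpg
estimates store model1
regress price weight mpg foreign
estimates store model2

// Same table as before, written as LaTeX.
// The booktabs format needs \usepackage{booktabs} in the LaTeX preamble.
esttab model1 model2 using regression_results.tex, ///
    b(3) se(3) star(* 0.10 ** 0.05 *** 0.01) ///
    r2 ar2 nonumber mtitle("Model 1" "Model 2") ///
    booktabs replace
```

Only the file name and the format option change; everything that controls the content of the table stays the same, which makes it easy to keep a Markdown version for quick viewing and a LaTeX version for the paper.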
Correlation Matrix
Correlation matrices are also common in data analysis. They are easy enough to create with correlate. Making them into a table can also be easy:
// Generate correlation matrix
estpost correlate price mpg weight length, matrix
est sto corr_matrix
// Create correlation table
esttab corr_matrix using correlation_matrix.txt, ///
cell("rho(fmt(3))") replace nonumber collabels(none) ///
label md nomtitle noline unstack
| | Price | Mileage (mpg) | Weight (lbs.) | Length (in.) |
|---|---|---|---|---|
| Price | 1.000 | | | |
| Mileage (mpg) | -0.469 | 1.000 | | |
| Weight (lbs.) | 0.539 | -0.807 | 1.000 | |
| Length (in.) | 0.432 | -0.796 | 0.946 | 1.000 |
| Observations | 74 | | | |
Data source: Auto
This creates a correlation matrix with formatted coefficients. In contrast with plain correlate, when using estpost it is necessary to also specify the matrix option to obtain the full set of pairwise correlations. In the case of esttab, it is necessary to use unstack so that the results are arranged as a matrix rather than one long column. Other than that, obtaining the tables is quite straightforward.
Regressions with Fixed Effects
Now something that is very common in regression analysis: fixed effects models. A usual question is how to create tables that signal the inclusion of fixed effects. Here’s an example:
// Load panel data and set as panel
webuse nlswork, clear
xtset idcode year
// ssc install reghdfe
// Run regressions
reghdfe ln_wage tenure c.age##c.age
est sto no_fe
reghdfe ln_wage tenure c.age##c.age , absorb(idcode)
estadd local id_fe "X"
est sto id_fe
reghdfe ln_wage tenure c.age##c.age , absorb(idcode year)
estadd local id_fe "X"
estadd local yr_fe "X"
est sto idyr_fe
// Create table
esttab no_fe id_fe idyr_fe using mreg.txt, ///
scalar("id_fe Individual FE" "yr_fe Yr FE") ///
nonotes nomtitle replace noline md
| | (1) | (2) | (3) |
|---|---|---|---|
| tenure | 0.0391*** | 0.0217*** | 0.0214*** |
| | (50.48) | (27.21) | (26.88) |
| age | 0.0752*** | 0.0523*** | 0.0750*** |
| | (21.65) | (18.78) | (7.03) |
| c.age#c.age | -0.00109*** | -0.000672*** | -0.00107*** |
| | (-18.86) | (-14.56) | (-17.74) |
| _cons | 0.334*** | 0.688*** | 0.384 |
| | (6.62) | (16.95) | (1.29) |
| N | 28101 | 27549 | 27549 |
| Individual FE | | X | X |
| Yr FE | | | X |
Data source: NLSW 1988, * p < 0.05, ** p < 0.01, *** p < 0.001
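This example shows how to use estadd to attach additional information to the estimation results of a regression. Because estadd modifies the results currently in memory, it has to run before they are stored with est sto. This information can later be used in the table. In this case, it marks which fixed effects each model includes. The scalar option of esttab is used to display the extra information, with each entry giving the stored name followed by the label to print.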
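The same estadd pattern works with any estimation command, not only reghdfe. Here is a minimal sketch on the auto dataset that flags whether a control is included; the macro name origin, its "Yes"/"No" values, and the row label are hypothetical choices for illustration.

```stata
// Assumes estout is installed (ssc install estout)
sysuse auto, clear

// Model without the origin control
regress price weight mpg
estadd local origin "No"      // add the flag before storing the results
est sto base

// Model with the origin control
regress price weight mpg i.foreign
estadd local origin "Yes"
est sto full

// Show only the main coefficients and report the control as a Yes/No row
esttab base full, keep(weight mpg _cons) ///
    scalars("origin Origin control") nomtitle
```

Hiding the control's coefficients with keep() and reporting a single indicator row keeps the table compact when a specification has many controls or absorbed effects.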
Conclusion
I think these are the most basic tables you can create with esttab. There are many more options available, and you can customize the tables to your needs. With Quarto, something I am doing more often is creating the tables in Stata, modifying the LaTeX code, and then including them in the final document. This is a great way to create reproducible research.