Using Quarto for Stata dynamic documents

Introduction

This article is a simple example of how to use Quarto to create dynamic documents with Stata. This is possible thanks to nbstata, a Jupyter kernel for Stata written by Tim Huegerich, who put this incredible tool together.

To use this, you will need:

  • Stata 17 or higher
  • Python
  • nbstata

Installation

For now, let's assume that you have Stata and Python installed and working together, so that you can use Stata from Python. If that is the case, installing nbstata is as simple as typing the following at the command line:

pip install nbstata
python -m nbstata.install 

# or to update 
pip install nbstata --upgrade

You probably do not need the --conf-file option of the install command, but if you want to apply other options to Stata output, you may need to create the configuration file and modify it as needed.

Quarto Setup

While this page is rendered as part of a Quarto website project, you can also use the following YAML header in a standalone document, once you have installed Python and nbstata on your computer. You will also need Quarto itself (if using, say, VS Code) or a recent version of RStudio.

---
title: Using Quarto for Stata dynamic documents
format: html
jupyter: nbstata
---

This requests nbstata as the Jupyter kernel, allowing Quarto to use Python to call Stata.

Let's do this:

Let us consider an example where we study the mpg and weight variables in auto.dta. In this example, all code used to construct the desired output is displayed as a fenced code block, followed by the output it produces. This is done using the option *| echo: fenced.

We first use the sysuse command to load the dataset and then describe the data using the describe command.

```{stata}
sysuse auto, clear
describe
```
(1978 automobile data)

Contains data from C:\Program Files\Stata17/ado\base/a/auto.dta
 Observations:            74                  1978 automobile data
    Variables:            12                  13 Apr 2020 17:45
                                              (_dta has notes)
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
make            str18   %-18s                 Make and model
price           int     %8.0gc                Price
mpg             int     %8.0g                 Mileage (mpg)
rep78           int     %8.0g                 Repair record 1978
headroom        float   %6.1f                 Headroom (in.)
trunk           int     %8.0g                 Trunk space (cu. ft.)
weight          int     %8.0gc                Weight (lbs.)
length          int     %8.0g                 Length (in.)
turn            int     %8.0g                 Turn circle (ft.)
displacement    int     %8.0g                 Displacement (cu. in.)
gear_ratio      float   %6.2f                 Gear ratio
foreign         byte    %8.0g      origin     Car origin
-------------------------------------------------------------------------------
Sorted by: foreign

Now, we want to check whether mpg is always greater than 0 and less than 100. We use the assert command to perform the check. In this case, we do not want to include any output in the target HTML file. With dyndoc, this would require the quietly attribute of the dd_do dynamic tag; here nothing extra is needed, because assert produces no output when the condition holds.

```{stata}
 assert mpg > 0 & mpg < 100
```

If the data do not satisfy the conditions, Quarto will fail with an error message, just as would happen if we ran the same assert command in a do-file.
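To see what a failed check looks like without stopping the render, one option is to wrap assert in capture and inspect the return code. This is only a sketch: the condition weight < 4000 is an illustrative check that is false for auto.dta, not part of the original example.

```{stata}
sysuse auto, clear
* This condition is false in auto.dta (some cars weigh more than 4,000 lbs.),
* so capture stores a nonzero return code in _rc instead of raising an error.
capture assert weight < 4000
display "assert return code: " _rc
```

A return code of 0 would mean the assertion holds. Without capture, the same failure stops execution, which is usually what you want for a real data check.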

Next, we want to summarize the weight variable:

```{stata}
summarize weight
```

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      weight |         74    3019.459    777.1936       1760       4840

We want to use the minimum and maximum values of weight in a sentence. Instead of copying and pasting the numbers from the summarize output, we can use Stata's display command to show the r(min) and r(max) stored results. We will also use the option *| output: asis to obtain text that follows markdown formatting.

```{stata}
*| output: asis
display "The variable weight has minimum value "  %4.2f `r(min)' " and " ///
         "has maximum value "   %4.2f `r(max)' "."
```

The variable weight has minimum value 1760.00 and has maximum value 4840.00.

In other words, where dyndoc would use dynamic tags, here it's possible to simply use display with the corresponding macro references, so Stata evaluates the expressions as usual. The option *| output: asis is used so that the output is still interpreted as markdown.
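Because the output is passed through as markdown, the displayed string can carry markdown markup itself. A minimal sketch, assuming we want the number of observations in bold (the sentence wording is illustrative):

```{stata}
*| output: asis
* Re-run summarize so that r(N) and r(mean) refer to weight
quietly summarize weight
* The ** markers are markdown emphasis, rendered as bold by Quarto
display "The variable weight has **" r(N) "** observations and mean " %6.1f r(mean) "."
```

Here r(N) and r(mean) are passed to display directly as expressions, which works the same way as expanding them with macro quotes.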

As with dyndoc, display can also be used as a calculator. For example, if we want to include the \(range = max - min\) in a sentence, instead of calculating the number and then copying and pasting it, we can use

```{stata}
*| output: asis
display "The variable weight has range "  %4.2f `r(max)'-`r(min)' "."
```

The variable weight has range 3080.00.
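One caveat with this approach is that stored results in r() are replaced by the next r-class command. If the numbers are needed later in the document, they can be copied into local macros first. A short sketch, where wmin and wmax are hypothetical macro names chosen for illustration:

```{stata}
*| output: asis
sysuse auto, clear
quietly summarize weight
* Copy the stored results into locals (wmin and wmax are illustrative names)
local wmin = r(min)
local wmax = r(max)
* Another r-class command replaces the contents of r() ...
quietly count if foreign == 1
* ... but the locals still hold the values from summarize
display "Weight runs from " %4.2f `wmin' " to " %4.2f `wmax' "."
```

The locals are defined and used inside the same cell here, so the example does not depend on how macros are carried between cells.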

Now, we want to graph mpg and weight using a scatterplot. There are at least two ways to do this.

First, one can simply create the scatterplot using the same procedure as before:

```{stata}
scatter mpg weight, mcolor(blue%50)
```

which generates a scatterplot of mpg and weight with 50% opacity color markers.

Now, we want to export the graph to a file and include an image link to the file.

```{stata}
qui:graph export fig1.png, width(1600) replace
```

This produces an image that is 1600 pixels wide.

![scatter](fig1.png)
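When a document contains several graphs, naming each one makes it explicit which graph is being exported. A minimal sketch, assuming the hypothetical graph name g_mpg and the illustrative file name fig1_named.png:

```{stata}
sysuse auto, clear
* name() keeps this graph in memory under the name g_mpg (an illustrative name)
scatter mpg weight, mcolor(blue%50) name(g_mpg, replace)
* Export that specific graph rather than whichever graph was drawn last
quietly graph export fig1_named.png, name(g_mpg) width(1600) replace
```

Without name(), graph export writes the topmost graph window, which is fine for a single figure but easy to get wrong once more graphs are open.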

It is also possible to combine figure creation with Quarto tags and directives. In that case, you need to make sure all figures are named:

```{stata}
*| label: fig-cost
*| fig-cap: Price vs MPG
*| fig-subcap:
*|   - Foreign Cars
*|   - Domestic Cars
*| layout-ncol: 2
*| column: page

scatter price mpg if foreign==1, name(m1, replace) ylabel(0(4000)16000)
qui:graph export fig2a.png, width(1600) replace
scatter price mpg if foreign==0, name(m2, replace) ylabel(0(4000)16000)
qui:graph export fig2b.png, width(1600) replace
```
(a) Foreign Cars
(b) Domestic Cars
Figure 1: Price vs MPG

And of course, we can now use the figure labels to reference it in the text:

Figure 1 provides a simple scatter between prices and MPG for foreign and domestic cars. While there seems to be a strong negative relationship between these variables among foreign cars (see Figure 1 (a)), the relationship among domestic cars is much weaker, when looking at cars with a fuel efficiency larger than 15mpg (see Figure 1 (b)).

This last approach, however, may not work with the PDF or jupyter-notebook formats unless the figures are saved to files, although it does seem to work with HTML and docx.

Nevertheless, one could also do the following:

::: {#fig-mpgprice layout-ncol=2 .column-page }
![foreign](fig2a.png){#fig-mpgprice-1}
![domestic](fig2b.png){#fig-mpgprice-2}
Price vs MPG
:::

to produce

(a) foreign
(b) domestic
Figure 2: Price vs MPG

Figure 2 provides a simple scatter between prices and MPG for foreign and domestic cars. While there seems to be a strong negative relationship between these variables among foreign cars (see Figure 2 (a)), the relationship among domestic cars is much weaker, when looking at cars with a fuel efficiency larger than 15mpg (see Figure 2 (b)).

If you are using VS Code, you can render all formats at once by typing quarto render filename.qmd in the terminal.