Renaming variables in Bulk

I describe an extended option of rename that makes it easy to rename your variables in bulk
Stata
Tips
Author

Fernando Rios-Avila and Fahad Mirza

Published

June 2, 2023

Acknowledgements

This tip was brought to you by Fahad Mirza. It's one of those little things I have found useful but usually forget, and then have to look up all over again.

Luckily, I now have my own site, where I can save and store this information! I give, however, total credit to Fahad.

The problem

The problem is simple. Sometimes you may have a series of variables with somewhat unappealing names. I particularly dislike names that are too long. While some people like having descriptive variable names, I find long names distracting.

My preference is to have variables with good labels and/or good value labels whenever necessary. For the variables themselves, I like short, descriptive names, and I also like to have them named sequentially!

How do we do that?

Obviously, the first approach is to go one by one. In fact, not too many Stata versions ago, that was the only option. That particular task could be done with a loop, as follows:

sysuse auto, clear
* Start the counter at zero so the first variable becomes var_1
local j = 0
* This loop iterates over all variable names in the dataset
foreach i of varlist * {
    local ++j
    ren `i' var_`j'
}
describe *
(1978 automobile data)

Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
var_1           str18   %-18s                 Make and model
var_2           int     %8.0gc                Price
var_3           int     %8.0g                 Mileage (mpg)
var_4           int     %8.0g                 Repair record 1978
var_5           float   %6.1f                 Headroom (in.)
var_6           int     %8.0g                 Trunk space (cu. ft.)
var_7           int     %8.0gc                Weight (lbs.)
var_8           int     %8.0g                 Length (in.)
var_9           int     %8.0g                 Turn circle (ft.)
var_10          int     %8.0g                 Displacement (cu. in.)
var_11          float   %6.2f                 Gear ratio
var_12          byte    %8.0g      origin     Car origin

There is a better way

While the process above is rather simple, there is a better way of doing this, as Fahad suggests: using some of the extended options of rename.

Let's first replicate the result above, using the code that Fahad suggested.

sysuse auto, clear
ren (*) (var_#), addnumber
describe
(1978 automobile data)

Contains data from C:\Program Files\Stata17/ado\base/a/auto.dta
 Observations:            74                  1978 automobile data
    Variables:            12                  13 Apr 2020 17:45
                                              (_dta has notes)
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
var_1           str18   %-18s                 Make and model
var_2           int     %8.0gc                Price
var_3           int     %8.0g                 Mileage (mpg)
var_4           int     %8.0g                 Repair record 1978
var_5           float   %6.1f                 Headroom (in.)
var_6           int     %8.0g                 Trunk space (cu. ft.)
var_7           int     %8.0gc                Weight (lbs.)
var_8           int     %8.0g                 Length (in.)
var_9           int     %8.0g                 Turn circle (ft.)
var_10          int     %8.0g                 Displacement (cu. in.)
var_11          float   %6.2f                 Gear ratio
var_12          byte    %8.0g      origin     Car origin
-------------------------------------------------------------------------------
Sorted by: var_12
     Note: Dataset has changed since last saved.

This is much shorter and cleaner code. It takes all the variables matched inside the first set of parentheses and renames them following the pattern in the second set, where # is replaced by the sequence number that the addnumber option supplies. Of course, rename has quite a few other options that you may find useful. Just type help rename group to see all the extended options.
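As a rough sketch of what else is in there, the lines below show three other forms as I understand them from help rename group: explicit old and new lists, a wildcard suffix, and a case change. The new names (cost, miles, and the _78 suffix) are made up for illustration.

```stata
sysuse auto, clear
* Rename an explicit list of variables, pairing old and new names by position
rename (price mpg) (cost miles)
* Add a suffix to every variable: the * in the new name stands for the old name
rename * *_78
* Change every variable name to uppercase
rename *, upper
describe
```

Each command acts on the names left by the previous one, so the order matters if you chain them like this.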

Before ending this tip, here is something else you may find useful: the dryrun option. With it, none of the variable names change; instead, you see a report of how the names would change if the command were executed.

sysuse auto, clear
ren (*) (var_#), addnumber dryrun
(1978 automobile data)

       oldname | newname
  -------------+-------------
          make | var_1
         price | var_2
           mpg | var_3
         rep78 | var_4
      headroom | var_5
         trunk | var_6
        weight | var_7
        length | var_8
          turn | var_9
  displacement | var_10
    gear_ratio | var_11
       foreign | var_12
  ---------------------------
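
The same preview works for a partial rename. The sketch below assumes, based on my reading of help rename group, that addnumber() accepts a starting value; the prefix x and the starting value of 10 are arbitrary choices for illustration. Once the report looks right, dropping dryrun applies the change.

```stata
sysuse auto, clear
* Preview only: report how three variables would be renamed, numbering from 10
ren (price mpg rep78) (x#), addnumber(10) dryrun
* Same command without dryrun: this time the names actually change
ren (price mpg rep78) (x#), addnumber(10)
describe x*
```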