Export results to Tableau or Excel from Stata

How to arrange Stata results in long (tidy) format using matrices or postfiles so they can be directly imported by Tableau or Excel
Author

R. Andrés Castañeda

Published

August 1, 2018

This post is part of an updated version of the chapters of the third part of the handbook Poverty and inequality measures in practice internally available at the World Bank intranet.

Even though you might be able to do most of your work in Stata, it is sometimes faster and easier to plot some charts in Tableau or Excel. In contrast to other programs like R, Stata does not have an Integrated Development Environment (IDE) like RStudio or a big community to create amazing packages like shinyR, diagramR, or ggplot2 that may suffice for any kind of graphic representation of results.

Thus, the objective of this post is to show you how to arrange your data in order to have it ready to be imported by Tableau or Excel and then make your pivot charts or dashboards for graphical representation of your data. If you apply these techniques correctly, you won’t have to change anything on the file that you export.

Long-shape formatting vs wide-shape formatting

Before you learn the “how-to” of exporting results, you need to understand the “what” of data formatting. That is, before you start programming, you need to think about the final output.

First, you need to know what you want to calculate in order to plan your process. Let’s look at the following example. You want to calculate the trends of poverty headcount over three years (2001, 2005, and 2010), for two different countries (Paraguay and Panama), using two poverty lines (2.5 USD and 4 USD a day, 2005 PPP). How should you organize your results?

You could organize your data in wide form:

country p_25usd_2001 p_40_2001 p_25_2005 p_40_2005 p_25_2010 p_40_2010
pry 21.884 37.384 19.212 37.248 16.232 30.472
pan 28.364 42.880 22.246 37.081 13.028 23.718

Or in long form:

country year line poverty
pry 2001 $2.5 usd 21.884
pry 2001 $4 usd 37.384
pry 2005 $2.5 usd 19.212

As you can see, both tables have the same information but the data are arranged differently. The first table, in wide form, is not useful to work with in Tableau or Excel for two reasons. First, you do not want a variable called “p_25_2005” because only you know what that means. Second, Tableau or Excel would not know how many levels your data has, so you cannot construct a hierarchy.

In contrast, with long-form formatting your data will be organized and easy to manage. Long formatting refers to allocating all qualitative information in independent variables and creating as few quantitative variables as possible. This is what Wickham calls Tidy Data: each variable is a column, each observation is a row, and each type of observational unit is a table.

The matrix method

The use of matrices might be scary for the beginner but there is no reason to fear. In fact, once you have grasped the use of matrices to store your results, you will rarely deviate from using them.

The easiest way to think of matrices is to understand which values are calculated through the process and which information must be added in order to distinguish quantitative values from each other. In the example above, the poverty rates are the outputs of the process, and the value of the years or the poverty lines are the inputs. Finally, you have to understand the use of the matrix function nullmat. This function allows you to incorporate information in a matrix that you have not created yet, as well as allowing you to add information to an existing matrix.

/*Part 1: calculate and create matrices*/
local countries "pry pan"    // set of countries
local years "2001 2005 2010" // set of years
mat Y = 2001 \ 2005 \ 2010   // matrix with the information of years
cap mat drop Poverty         // drop matrix in case we need to re-run the code

local c = 0                  // counter for countries id
foreach country of local countries {
  local ++c
  mat C = J(`: word count `years'',1,`c') // Matrix with the country id
  datalib, country(`country') year(`years') clear
  gen byte poor4 = ipcf < lp_4usd // identification of poor
  gen byte poor2 = ipcf < lp_2usd
  tabstat poor* [w = pondera], by(ano) nototal save // poverty rates
  tabstatmat P, not                                 // store in a matrix the poverty rates
  mat Poverty = nullmat(Poverty) \ C, Y, P          /* put together country id, year,
                                                       and poverty rate*/
}

/*Part 2: Reshape data and add value labels*/
mat colnames Poverty = country year poor4 poor2 // name columns
drop _all                                       // clear current data
svmat Poverty, n(col)                           // transform matrix to data
reshape long poor, i(country year) j(line)      // reshape to long format

* assign values
local countries "pry pan"
local i = 0
foreach country of local countries {
  local ++i
  label define country `i' `country', modify
}
label value country country
label define line 2 "$2.5 usd" 4 "$4 usd", modify
label value line line
ren poor poverty
replace poverty = poverty*100
format poverty %5.3f
list, clean

The Postfile Method

If you do not want to deal with identifiers in your program, you might be willing to exchange the flexibility of the matrix method for the ease of the postfile method. You will notice that even though the results are the same, programming with postfile requires less lines of code. However, consider that even though you type more lines with the matrix method, your code is faster, more efficient, more flexible, and more adaptable than the postfile method. In this example, the matrix code takes 8’’ to complete whereas the postfile code takes almost 11’’.

Postfiles are an excellent tool when you have too many qualitative variables since you do not need to keep track of all the counters or identifiers for the string variables as you do with matrices. Memory, computing resources, and efficiency, however, are highly compromised when you use postfiles.

/*Part 1: calculate and create matrices*/
local countries "pry pan"    // set of countries
local years "2001 2005 2010" // set of years

tempname f                   // define handle for postfile
tempfile tfile               // define name for temporal file

postfile `f' str3 country str4 year str15 line /// define variable names and format
            double poverty using `tfile', replace

foreach country of local countries { // loop for countries
 foreach year of local years { // loop for years
  datalib, country(`country') year(`year') clear
  foreach pl in 2 4 { // loop for poverty lines
    apoverty ipcf [w=pondera], varpl(lp_`pl'usd) // poverty
    post `f' ("`country'") ("`year'") /// store information
        ("$`pl' usd") (`r(head_1)')
  }
 }
}
postclose `f' // close postfile

/*Part 2: formatting */
use `tfile', clear // load file
replace line = "$2.5 usd" if line == "$2 usd"
format poverty %5.3f
list, clean

Exporting your results

The best way to export results is by exporting your tables in an Excel file and assigning an independent spreadsheet for your data only. You can do anything you want with your data once it is in the Excel file, but do NOT use the spreadsheet where your exported data lives for anything other than hosting your data. If you are working in Excel, create your graphs and tables in a different sheet and link them to your exported data. This way, if you need to make any changes to your code, you will be able to export your results again from Stata and just update the data sheet instead of deleting your graphs and tables.

There are several Stata commands to export results in Excel but we recommend the export excel functionality added by Stata in version 12.

export excel using "C:\Mydata\Pov_rates.xlsx", ///
  sheet("raw") sheetreplace first(varlabel)

The option sheet allows you to declare the name of the spreadsheet. The option sheetreplace ensures that, in case you need to rerun your code, Stata will change only the spreadsheet with the exported data and will leave the rest of your sheets untouched. Finally, by using the option first(varlabel), you will be able to export the labels of your variables as the name of the variables in the exported file. Furthermore, labeling your variables is very useful when working in Tableau since it automatically uses the name of the variables as the name of the axis in a graph or the title in a table.

References