Working with Loops
This post is part of an updated version of the chapters in the third part of the handbook Poverty and inequality measures in practice, available internally on the World Bank intranet.
In this post, I offer a few tips for using loops in Stata and try to clarify how they work. I assume that you already know what a loop is and the main differences between foreach, forvalues, and while loops. Since while is perhaps the least familiar of the three, I discuss it briefly below.
The basic difference between foreach and forvalues is that the former loops over any kind of list (e.g., varlist, numlist, local, global…), whereas the latter loops over numeric values in a range. According to the Stata manual, forvalues is the most efficient way to loop over numeric ranges. That holds in most cases but not in all, as explained below.
Let’s first look at while loops and then at the differences between foreach and forvalues loops and what each can do.
while Loops
The while command is useful when you do not know in advance how many times your procedure should loop. For example, the following code uses Newton's method to solve the equation \(x^2+x-6 = 0\).
qui {
local xnew 1
local xold 0
local iteration 1
while abs(`xnew'-`xold')>.000001 & `iteration'<100 {
local xold `xnew'
local xnew=`xold'-(`xold'^2+`xold'-6)/(2*`xold'+1)
noi display "Iteration: `iteration++', x = `xnew'"
}
}
Solving the equation for \(x\) by hand, we know that one root is \(2\) (the other is \(-3\)). However, if we need Stata to solve it numerically, we have to iterate until the latest value is practically the same as the previous one, or until the loop has run so many times that the problem probably has no numerical solution. In this case, the while loop stops when either the absolute difference between the last two iterations falls below .000001 or the iteration counter reaches 100. Make sure you define the local macro xold before the loop, with a value different from xnew; otherwise the loop will not start.
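As a small variation on the loop above, the following sketch starts Newton's method from a negative value, which should lead toward the other root, \(-3\). The starting value of -1 is my own choice for illustration, not part of the original example. Each `xold' is wrapped in parentheses because Stata evaluates -7^2 as -(7^2), so an unparenthesized negative value would be squared incorrectly.

```stata
* Sketch: same Newton iteration, started from an assumed value of -1
local xnew -1
local xold 0
local iteration 1
while abs(`xnew' - `xold') > .000001 & `iteration' < 100 {
    local xold `xnew'
    * parentheses protect negative values inside the power
    local xnew = (`xold') - ((`xold')^2 + (`xold') - 6) / (2*(`xold') + 1)
    local ++iteration
}
display "x = " %9.6f `xnew'
```

The original code works without the parentheses only because every iterate stays positive when the loop starts at 1.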
forvalues and foreach Loops
When you need to repeat pieces of code over a specific set of values, you can use the forvalues or the foreach loops. Say you have a dataset that comes from an Excel sheet where each column (or variable) holds GDP per capita, PPP (constant 2011 international $) for a specific year from 1960 to 2017, and each row corresponds to a specific country. Let’s assume that you need to measure GDP growth by normalizing each year to 2010. We will use the command wbopendata by Joao Pedro Azevedo, available on GitHub.
qui {
wbopendata, indicator(ny.gdp.pcap.pp.kd) clear
ds yr* // variables starting with yr
local years = "`r(varlist)'"
local years: subinstr local years "yr" "", all // remove yr
local years: subinstr local years " " ", ", all // separate by comma
local miny = min(`years')
local maxy = max(`years')
forvalues y = `miny'(1)`maxy' {
gen gdp_gr`y' = yr`y'/yr2010 // growth normalized to 2010
}
noi ds gdp_gr*
}
The same procedure can be done with a foreach loop. Note that the line foreach y of local years could be replaced by foreach y in `years'.
qui {
wbopendata, indicator(ny.gdp.pcap.pp.kd) clear
ds yr* // variables starting with yr
local years = "`r(varlist)'"
local years: subinstr local years "yr" "", all // remove yr
foreach y of local years {
gen gdp_gr`y' = yr`y'/yr2010 // growth normalized to 2010
}
noi ds gdp_gr*
}
The foreach command
The foreach command is the most flexible and efficient command for looping over a list of elements. That list could be anything: strings, numbers, or a combination of both. The basic syntax is
foreach lname {in|of list-type} list {
*lines of code...
}
where lname is the name of the local macro that holds the current element of the list on each pass of the loop. If you define the list manually (i.e., you type each element), you must use the in variant. In contrast, if the list is defined somewhere else, such as in a local macro, varlist, or numlist, you should use the of variant. For example, if you have GDP per capita, PPP (constant 2011 international $) for each year, yr2005, yr2006, …, yr2011, and you want to calculate the natural logarithm of each, there are several ways:
Option 1
qui {
wbopendata, indicator(ny.gdp.pcap.pp.kd) clear
foreach var in yr2005 yr2006 yr2007 yr2008 yr2009 yr2010 yr2011 {
local y: subinstr local var "yr" "", all
gen ln_gdp`y' = ln(`var')
}
noi ds ln*
}
Option 2
qui {
wbopendata, indicator(ny.gdp.pcap.pp.kd) clear
foreach y of numlist 2005/2011 {
gen ln_gdp`y' = ln(yr`y')
}
noi ds ln*
}
Option 3
qui {
wbopendata, indicator(ny.gdp.pcap.pp.kd) clear
foreach var of varlist yr2005-yr2011 {
local y: subinstr local var "yr" "", all
gen ln_gdp`y' = ln(`var')
}
noi ds ln*
}
You may see that options 2 and 3 are preferable to option 1, since you do not have to type each variable name. Yet, my favorite option is the following:
Option 4
qui {
wbopendata, indicator(ny.gdp.pcap.pp.kd) clear
ds yr2005-yr2011 // get varlist as I want it
local vars = "`r(varlist)'" // store it in a local
foreach var of local vars { // change varlist for local with respect to option 3
local y: subinstr local var "yr" "", all
gen ln_gdp`y' = ln(`var')
}
noi ds ln*
}
Option 4 requires more lines of code, but it is more convenient when programming because the list of elements over which the loop runs is saved in a local macro. This is very useful for subsequent work with the same list.
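To illustrate why keeping the list in a local macro pays off, here is a minimal sketch that reuses one list in two separate loops. It uses the auto dataset that ships with Stata instead of wbopendata so that it runs offline; the variables price, mpg, and weight and the macro name nvars are choices made for this illustration only.

```stata
* Sketch: build the list once, then reuse it (auto data assumed for illustration)
sysuse auto, clear
ds price mpg weight
* store the list in a local macro
local vars `r(varlist)'
foreach var of local vars {
    gen ln_`var' = ln(`var')
}
* later in the do-file, the same list is still available
foreach var of local vars {
    label variable ln_`var' "Natural log of `var'"
}
local nvars : word count `vars'
display "Processed `nvars' variables: `vars'"
```

Had the loops been written with of varlist, the list would have to be typed again in every loop that needs it.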
The forvalues command
This command loops over a list of consecutive numbers in a range.
qui {
wbopendata, indicator(ny.gdp.pcap.pp.kd) clear
forvalues y = 2005(1)2010 {
gen ln_gdp`y' = ln(yr`y')
}
noi ds ln*
}
Problems with foreach and forvalues
In general, the foreach loop can replace any forvalues loop, but the forvalues loop is more efficient for long sequential lists. For instance, the foreach loop fails to execute the following code because the numlist has more elements than Stata allows in a numlist.
foreach x of numlist 0(.01)100 {
disp "`x'"
}
Yet, forvalues also has its difficulties. In some cases, when the list of numbers includes decimals, forvalues produces elements that differ slightly from the intended values.
qui {
forvalues x = 0(.01).3 {
noi disp "`x'"
}
}
In contrast, the foreach loop over a numlist works fine.
qui {
foreach x of numlist 0(.01).3 {
noi disp "`x'"
}
}
The reason for this odd behavior is that forvalues calculates the next item of the list each time the loop starts over, so rounding error accumulates because a step such as .01 has no exact binary representation, whereas the foreach loop uses a predefined numlist.
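One possible way around both problems, offered here as a sketch rather than as the recommended solution, is to let forvalues step through integers, which it represents exactly, and derive the decimal value inside the loop. The macro names i, x, and count are illustrative.

```stata
* Sketch: integer steps in forvalues, decimal value computed inside the loop
local count 0
forvalues i = 0/30 {
    * each x comes from a single division, so errors do not accumulate
    local x = `i'/100
    display %4.2f `x'
    local ++count
}
display "iterations: `count'"
```

The same idea should extend to the long list above by looping over 0/10000 and dividing by 100, which avoids building a numlist altogether.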
Considerations
Unless you have to loop over a very long list of consecutive numbers, I suggest using the foreach loop whenever you can. This is not only because coding foreach loops will become natural to you, but also because the forvalues loop can behave incorrectly when the list of numbers includes decimals.
As shown in option 4, name the list of elements with a plural noun that clearly characterizes your list and name the lname of the loop with the matching singular noun, so that your code makes more sense. For example:
foreach year of local years {
foreach country of local countries {
foreach child of local children {
Always indent the lines of code inside each loop. If you have a two-level loop, the lines of code of the inner loop should be indented twice.
When the number of lines of code inside the loop is greater than, say, 5 or 6 lines, consider making a comment on the final line of each loop so it is easy to see where it ends.
Ugly code
foreach year of numlist 1991/2000 {
use "${dataout}\poorland_`year'.dta", clear
gen ipcf = thi/members
label var ipcf "per capita household income"
local line = 0
foreach lp in lp_1usd lp_2usd lp_4usd {
local ++line
apoverty ipcf [fw = weight], varpl(`lp') h pgr fgt3
mat P = nullmat(P) \ `year', `line', 1 , `r(head_1)'
}
}
Neat code
foreach year of numlist 1991/2000 { // loop for each database
    use "${dataout}\poorland_`year'.dta", clear
    gen ipcf = thi/members
    label var ipcf "per capita household income"
    local line = 0 // counter for poverty line
    foreach lp in lp_1usd lp_2usd lp_4usd { // loop for poverty lines
        local ++line
        apoverty ipcf [fw = weight], varpl(`lp') h pgr fgt3
        mat P = nullmat(P) \ `year', `line', 1 , `r(head_1)'
    } // end of poverty lines loop
} // end of years loop
Stata tip: very often, using the egen command is much faster than using loops.
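As a rough illustration of the tip, the sketch below computes a group mean twice, first with a loop and then with egen. It assumes the auto dataset that ships with Stata; the variable names mean_price_loop and mean_price_egen are made up for this example, and no timing is measured here.

```stata
* Sketch: group means with a loop versus egen (auto data assumed)
sysuse auto, clear
* loop version: one pass per value of rep78
gen mean_price_loop = .
levelsof rep78, local(levels)
foreach level of local levels {
    summarize price if rep78 == `level', meanonly
    replace mean_price_loop = r(mean) if rep78 == `level'
}
* egen version: a single command
egen mean_price_egen = mean(price), by(rep78)
* the two should agree wherever rep78 is observed
assert reldif(mean_price_loop, mean_price_egen) < 1e-6 if !missing(rep78)
```

Note that egen with by() also fills in a mean for observations where rep78 is missing, which the loop leaves untouched, so the comparison is restricted to nonmissing values.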