# Working with Loops

This post is part of an updated version of the chapters in the third part of the handbook *Poverty and Inequality Measures in Practice* (2014), available internally on the World Bank intranet.

In this post, I provide a few tips and a better understanding of how to use loops in Stata. I assume that you already know what a loop is and the main differences between `foreach`, `forvalues`, and `while` loops. The latter is perhaps less well known, so I talk a little about it below.

The basic difference between `foreach` and `forvalues` is that the former loops over any kind of list (e.g., `varlist`, `numlist`, `local`, `global`, …), whereas the latter loops over the numeric values in a range. According to the Stata manual, `forvalues` is the most efficient way to loop over numeric ranges. That is true in most cases, but not in all of them, as explained below.

Let’s first see how `while` loops are used, and then look at the differences between and the functionality of `foreach` and `forvalues` loops.

## `while` Loops

The `while` command is used when you do not know in advance how many times your procedure should loop. For example, the following code solves the equation \(x^2+x-6 =0\) with Newton's method.

```
qui {
    local xnew 1
    local xold 0
    local iteration 1
    while abs(`xnew'-`xold')>.000001 & `iteration'<100 {
        local xold `xnew'
        local xnew=`xold'-(`xold'^2+`xold'-6)/(2*`xold'+1)
        noi display "Iteration: `iteration++', x = `xnew'"
    }
}

Iteration: 1, x = 2.333333333333333
Iteration: 2, x = 2.019607843137255
Iteration: 3, x = 2.000076295109483
Iteration: 4, x = 2.000000001164153
Iteration: 5, x = 2
```

Solving the equation for \(x\) by hand, we know that one solution is \(2\). However, if we need Stata to solve it numerically, we have to iterate until the latest value is practically the same as the previous one, or until the loop has run so many times that the problem probably has no numerical solution. In this case, the `while` loop stops when either the difference between the last two iterations is no larger than 0.000001 or the iteration counter reaches 100. Because of the way the solution is coded, we do not know in advance how many iterations it takes to converge to 2; that depends on the starting value of the local macro `xnew`. For instance, if `xnew` starts at a negative value, the iteration may never reach 2. Make sure you also define the local macro `xold` before the loop, with a value different from `xnew`; otherwise the loop will not start.
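The dependence on the starting value can be checked with a small variation of the loop. This is only a sketch: the starting value of -4 is an arbitrary choice for illustration, and every macro reference is wrapped in parentheses because Stata evaluates `^` before the unary minus, so a negative value substituted directly in front of `^2` would be squared with the wrong sign. Under these assumptions the iteration settles on the other root of the equation, \(-3\), instead of \(2\).

```stata
* Newton's method for x^2 + x - 6 = 0 from a negative starting value
* (the starting value -4 is hypothetical, chosen only for illustration)
local xnew -4
* xold only needs to differ from xnew so that the loop starts
local xold = (`xnew') + 1
local iteration 0
while abs((`xnew') - (`xold')) > 1e-6 & `iteration' < 100 {
    local xold `xnew'
    * parentheses keep the sign of a negative macro value when it is squared
    local xnew = (`xold') - ((`xold')^2 + (`xold') - 6) / (2*(`xold') + 1)
    local ++iteration
}
display "x = `xnew' after `iteration' iterations"
```

Keeping the iteration cap in the condition is what protects the loop from running forever when a starting value does not lead to convergence.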

## `forvalues` and `foreach` Loops

When you need to repeat pieces of code over a **specific** set of values, you can use the `forvalues` or the `foreach` loops. Say you have a dataset that comes from an Excel sheet where each column (or variable) holds GDP per capita, PPP (constant 2011 international $) for a specific year from 1960 to 2017, and each row corresponds to a specific country. Let’s assume that you need to find GDP growth by normalizing each year to 2010. We will use the command `wbopendata` by Joao Pedro Azevedo, which is available on GitHub.

```
qui {
    wbopendata, indicator(ny.gdp.pcap.pp.kd) clear
    ds yr* // variables starting with yr
    local years = "`r(varlist)'"
    local years: subinstr local years "yr" "", all // remove yr
    local years: subinstr local years " " ", ", all // separate by comma
    local miny = min(`years')
    local maxy = max(`years')
    forvalues y = `miny'(1)`maxy' {
        gen gdp_gr`y' = yr`y'/yr2010 // growth normalized to 2010
    }
    noi ds gdp_gr*
}

gdp_gr1960 gdp_gr1970 gdp_gr1980 gdp_gr1990 gdp_gr2000 gdp_gr2010
gdp_gr1961 gdp_gr1971 gdp_gr1981 gdp_gr1991 gdp_gr2001 gdp_gr2011
gdp_gr1962 gdp_gr1972 gdp_gr1982 gdp_gr1992 gdp_gr2002 gdp_gr2012
gdp_gr1963 gdp_gr1973 gdp_gr1983 gdp_gr1993 gdp_gr2003 gdp_gr2013
gdp_gr1964 gdp_gr1974 gdp_gr1984 gdp_gr1994 gdp_gr2004 gdp_gr2014
gdp_gr1965 gdp_gr1975 gdp_gr1985 gdp_gr1995 gdp_gr2005 gdp_gr2015
gdp_gr1966 gdp_gr1976 gdp_gr1986 gdp_gr1996 gdp_gr2006 gdp_gr2016
gdp_gr1967 gdp_gr1977 gdp_gr1987 gdp_gr1997 gdp_gr2007 gdp_gr2017
gdp_gr1968 gdp_gr1978 gdp_gr1988 gdp_gr1998 gdp_gr2008 gdp_gr2018
gdp_gr1969 gdp_gr1979 gdp_gr1989 gdp_gr1999 gdp_gr2009
```

The same procedure can be done using the `foreach` loop. Notice that the line `foreach y of local years` could be replaced by ``foreach y in `years' ``.

```
qui {
    wbopendata, indicator(ny.gdp.pcap.pp.kd) clear
    ds yr* // variables starting with yr
    local years = "`r(varlist)'"
    local years: subinstr local years "yr" "", all // remove yr
    foreach y of local years {
        gen gdp_gr`y' = yr`y'/yr2010 // growth normalized to 2010
    }
    noi ds gdp_gr*
}

gdp_gr1960 gdp_gr1970 gdp_gr1980 gdp_gr1990 gdp_gr2000 gdp_gr2010
gdp_gr1961 gdp_gr1971 gdp_gr1981 gdp_gr1991 gdp_gr2001 gdp_gr2011
gdp_gr1962 gdp_gr1972 gdp_gr1982 gdp_gr1992 gdp_gr2002 gdp_gr2012
gdp_gr1963 gdp_gr1973 gdp_gr1983 gdp_gr1993 gdp_gr2003 gdp_gr2013
gdp_gr1964 gdp_gr1974 gdp_gr1984 gdp_gr1994 gdp_gr2004 gdp_gr2014
gdp_gr1965 gdp_gr1975 gdp_gr1985 gdp_gr1995 gdp_gr2005 gdp_gr2015
gdp_gr1966 gdp_gr1976 gdp_gr1986 gdp_gr1996 gdp_gr2006 gdp_gr2016
gdp_gr1967 gdp_gr1977 gdp_gr1987 gdp_gr1997 gdp_gr2007 gdp_gr2017
gdp_gr1968 gdp_gr1978 gdp_gr1988 gdp_gr1998 gdp_gr2008 gdp_gr2018
gdp_gr1969 gdp_gr1979 gdp_gr1989 gdp_gr1999 gdp_gr2009
```

We will now see in detail how to use the `foreach` and `forvalues` commands. Some general considerations about loops come at the end of the post.

### The `foreach` command

The `foreach` command is the most flexible and efficient command for looping over a list of elements. That list could be anything: strings, numbers, or a combination of both. The basic syntax is

```
foreach lname {in|of list-type} list {
*lines of code...
}
```

where `lname` is the name of the local macro that holds each element of the list in turn. If you define the list manually (i.e., you type each value of the list), you must use the `in` variant. In contrast, if the list is defined somewhere else, such as in a local macro, varlist, or numlist, you should use the `of` variant because you are referring to a **list** that has been created somewhere else. For example, if you have GDP per capita, PPP (constant 2011 international $) for each year, `yr2005`, `yr2006`, …, `yr2011`, and you want to calculate the natural logarithm of each, there are several ways to do it with loops. The following are just three possible ways. The third one is, of course, the easiest to type.

#### Option 1

```
qui {
    wbopendata, indicator(ny.gdp.pcap.pp.kd) clear
    foreach var in yr2005 yr2006 yr2007 yr2008 yr2009 yr2010 yr2011 {
        local y: subinstr local var "yr" "", all
        gen ln_gdp`y' = ln(`var')
    }
    noi ds ln*
}

ln_gdp2005 ln_gdp2007 ln_gdp2009 ln_gdp2011
ln_gdp2006 ln_gdp2008 ln_gdp2010
```

#### Option 2

```
qui {
    wbopendata, indicator(ny.gdp.pcap.pp.kd) clear
    foreach y of numlist 2005/2011 {
        gen ln_gdp`y' = ln(yr`y')
    }
    noi ds ln*
}

ln_gdp2005 ln_gdp2007 ln_gdp2009 ln_gdp2011
ln_gdp2006 ln_gdp2008 ln_gdp2010
```

#### Option 3

```
qui {
    wbopendata, indicator(ny.gdp.pcap.pp.kd) clear
    foreach var of varlist yr2005-yr2011 {
        local y: subinstr local var "yr" "", all
        gen ln_gdp`y' = ln(`var')
    }
    noi ds ln*
}

ln_gdp2005 ln_gdp2007 ln_gdp2009 ln_gdp2011
ln_gdp2006 ln_gdp2008 ln_gdp2010
```

You may see that options 2 and 3 are preferable to option 1. Yet, my favorite option is the following:

#### Option 4

```
qui {
    wbopendata, indicator(ny.gdp.pcap.pp.kd) clear
    ds yr2005-yr2011 // get varlist as I want it
    local vars = "`r(varlist)'" // store it in a local
    foreach var of local vars { // change varlist for local with respect to option 3
        local y: subinstr local var "yr" "", all
        gen ln_gdp`y' = ln(`var')
    }
    noi ds ln*
}

ln_gdp2005 ln_gdp2007 ln_gdp2009 ln_gdp2011
ln_gdp2006 ln_gdp2008 ln_gdp2010
```

Option 4 requires more lines of code, but it is more convenient when programming because the list of elements over which the loop runs is saved in a local macro. This is very useful for subsequent work with the same list, so I think it is better to always define the list of elements first in a local macro and then loop over the local (i.e., `foreach x of local...`).
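To illustrate why keeping the list in a local pays off, here is a minimal sketch that reuses the same list after the first loop. It relies on the `auto` dataset shipped with Stata instead of `wbopendata` so that it runs offline; the choice of variables and the `ln_` prefix are assumptions made only for this illustration.

```stata
sysuse auto, clear
* store the list of variables once
ds price-weight
local vars `r(varlist)'
* first use: create the natural logarithm of each variable
foreach var of local vars {
    gen ln_`var' = ln(`var')
}
* second use: the same local drives later steps without retyping the list
local n_vars : word count `vars'
display "Created `n_vars' log variables from: `vars'"
foreach var of local vars {
    summarize ln_`var', meanonly
    display "ln_`var': mean = " %6.3f r(mean)
}
```

If the set of variables changes later, only the `ds` line needs to be edited; both loops pick up the new list automatically.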

### The `forvalues` command

This command loops over a list of consecutive numbers in a range. The example above can be executed using the `forvalues` command:

```
qui {
    wbopendata, indicator(ny.gdp.pcap.pp.kd) clear
    forvalues y = 2005(1)2010 {
        gen ln_gdp`y' = ln(yr`y')
    }
    noi ds ln*
}

ln_gdp2005 ln_gdp2006 ln_gdp2007 ln_gdp2008 ln_gdp2009 ln_gdp2010
```

Notice that option 2 of the `foreach` examples is equivalent to the `forvalues` example above, except that option 2 runs through 2011. However, according to the Stata manual, `forvalues` is more efficient.
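For reference, `forvalues` accepts more than one range notation. The sketch below shows three of them; the counter names and ranges are arbitrary and only meant to show the syntax.

```stata
* first/last: steps of 1
forvalues i = 1/3 {
    display "i = `i'"
}
* first(step)last: custom step
forvalues year = 2005(5)2015 {
    display "year = `year'"
}
* a negative step counts down
forvalues k = 3(-1)1 {
    display "k = `k'"
}
```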

## Problems with `foreach` and `forvalues`

In general, the `foreach` loop can replace any `forvalues` loop, but the `forvalues` loop is more efficient for long sequential lists since it was designed with that intention. For instance, the `foreach` loop fails to execute the following code because the `numlist`, with its 10,001 elements, is too long for Stata to expand.

```
. foreach x of numlist 0(.01)100 {
2. disp "`x'"
3. }
invalid numlist has too many elements
r(123);
```

Yet, `forvalues` also has some difficulties. In some cases, when the list of numbers includes decimals, `forvalues` uses wrong elements of the list.

```
qui {
    forvalues x = 0(.01).3 {
        noi disp "`x'"
    }
}

0
.01
.02
.03
.04
.05
.06
.07
.08
.09
.1
.11
.12
.13
.14
.15
.16
.17
.18
.19
.2
.21
.2200000000000001
.2300000000000001
.2400000000000001
.2500000000000001
.2600000000000001
.2700000000000001
.2800000000000001
.2900000000000001
```

In contrast, the `foreach` loop over a numlist works fine.

```
qui {
    foreach x of numlist 0(.01).3 {
        noi disp "`x'"
    }
}

0
.01
.02
.03
.04
.05
.06
.07
.08
.09
.1
.11
.12
.13
.14
.15
.16
.17
.18
.19
.2
.21
.22
.23
.24
.25
.26
.27
.28
.29
.3
```

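The reason for this weird behavior is that `forvalues` calculates the next element each time the loop starts over by adding the step to the current value. Since .01 has no exact binary representation, the rounding error accumulates, which is also why the `forvalues` loop above never reaches .3. In contrast, `foreach` loops over a numlist that is fully expanded before the first iteration, even when the numlist is written in the loop statement itself.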
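A common way around both problems is to loop over integers with `forvalues` and compute the decimal value inside the loop, so that no rounding error accumulates and no long numlist has to be expanded. This is a sketch of that idea rather than something prescribed by the Stata manual; the counter name `i`, the divisor, and the local `n` are my own choices.

```stata
* loop over the integers 0 to 30 and derive the decimal from the counter
local n 0
forvalues i = 0/30 {
    * each value is computed from scratch, so errors do not build up
    local x = `i'/100
    display "`x'"
    local ++n
}
display "iterations: `n', last value: `x'"
```

The same pattern with `forvalues i = 0/10000` covers the range from 0 to 100 in steps of .01, which was too long for a `numlist`.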

## Considerations

In my view, unless you have to loop over a very long list of consecutive numbers, you should use the `foreach` loop whenever you can. This is so not only because coding `foreach` loops as explained above will become natural to you, but also because the `forvalues` loop can behave incorrectly when the list of numbers includes decimals.

As shown in the example in option 4, I defined the list of elements over which the loop runs in the local `vars` and then defined the *lname* of the loop as `var`. This is very intuitive and you should consider doing it as well. Usually, people define the *lname* of the loop as a random letter (e.g., x, y, z), but most of the time it does not add any meaning to the code.¹ If you name the list of elements with a plural noun that clearly characterizes your list and the *lname* of the loop with the singular noun, your code will make more sense. For example, your loops could read something like this:

`foreach year of local years {`

`foreach country of local countries {`

`foreach child of local children {`

`foreach person of local people {`

`foreach woman of local women {`

Yes, it is true that you may have to type more, but the cost of doing so is minimal compared with the great gain in readability and clarity.

Always indent the lines of code inside each loop. If you have a two-level loop, the lines of code of the inner loop should be indented twice, and so on and so forth for nested loops of higher levels.

When the number of lines of code inside the loop is greater than, say, 5 or 6, consider adding a comment on the final line of each loop so it is easy to see where it ends.

By taking these two considerations into account, the readability of your code will be much better. For example, the lines of code below are not indented and the ends of the loops are not commented.

**Ugly code**

```
foreach year of numlist 1991/2000 {
use "${dataout}/poorland_`year'.dta", clear
gen ipcf = thi/members
label var ipcf "per capita household income"
local line = 0
foreach lp in lp_1usd lp_2usd lp_4usd {
local ++line
apoverty ipcf [fw = weight], varpl(`lp') h pgr fgt3
mat P = nullmat(P) \ `year', `line', 1 , `r(head_1)'
mat P = nullmat(P) \ `year', `line', 2 , `r(pogapr_1)'
mat P = nullmat(P) \ `year', `line', 3 , `r(fogto3_1)'
}
}
```

In contrast, these lines are indented and the terminating lines are commented.

**Neat code**

```
foreach year of numlist 1991/2000 { // loop for each database
    use "${dataout}/poorland_`year'.dta", clear
    gen ipcf = thi/members
    label var ipcf "per capita household income"
    local line = 0 // counter for poverty line
    foreach lp in lp_1usd lp_2usd lp_4usd { // loop for poverty lines
        local ++line
        apoverty ipcf [fw = weight], varpl(`lp') h pgr fgt3
        mat P = nullmat(P) \ `year', `line', 1 , `r(head_1)'
        mat P = nullmat(P) \ `year', `line', 2 , `r(pogapr_1)'
        mat P = nullmat(P) \ `year', `line', 3 , `r(fogto3_1)'
    } // end of poverty lines loop
} // end of years loop
```

Stata tip: very often the use of the `egen` command is much faster than using loops.
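As a rough illustration of that tip, the sketch below computes group means twice, first with a loop and then with `egen`, using the `auto` dataset shipped with Stata. The variable names ending in `_loop` and `_egen` are made up for this example, and it makes no claim about timing on any particular dataset.

```stata
sysuse auto, clear
* loop version: one pass over the data per group
gen double mean_price_loop = .
levelsof foreign, local(groups)
foreach group of local groups {
    summarize price if foreign == `group', meanonly
    replace mean_price_loop = r(mean) if foreign == `group'
}
* egen version: a single command does the same job
bysort foreign: egen double mean_price_egen = mean(price)
* both approaches should agree up to rounding
assert reldif(mean_price_loop, mean_price_egen) < 1e-8
```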

R. Andres Castañeda, ed. 2014. *Poverty and Inequality Measures in Practice: A Basic Reference Guide with Stata Examples*. Washington, D.C.: World Bank.

¹ I usually use random letters in a loop when the letter refers to a counter.