Econometrics By Simulation: The Power of the Evaluate, Parse, Paste Combination

Monday, May 13, 2013

# See comments below.

# One of the powerful features of Stata that I have missed most when working with R is Stata's macros, which allow the user to construct bits of Stata code from anything and combine them into commands or variable names.

# I know many a programmer who has made modest or little use of Stata might sneer at this ability, since it seems to imply some kind of laziness or lack of precision. However, I have found myself on many an occasion forced to use inefficient structures when coding in R that could easily have been simplified in Stata.

# At last I have stumbled upon a solution!

# By combining the commands eval, parse, and paste, as in eval(parse(text=paste("String Command"))), I am able to do exactly what I want.

# For example, let's say I wanted to construct a data frame A with columns a through z, each populated by 10 random normal draws.

# First I need to construct a data frame.
A = data.frame(id=1:10)
# Now to populate it.
for (i in letters) eval(parse(text=paste("A$",i,"=rnorm(10)",sep="")))
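# A minimal sketch of what the loop above does at each step, shown for a single column: paste() builds the command text, parse() turns it into an unevaluated expression, and eval() runs it. The names cmd and expr are illustrative only.

```r
A <- data.frame(id = 1:10)

# Build the command text for one column, here "b".
cmd <- paste("A$", "b", "=rnorm(10)", sep = "")
print(cmd)

# parse() converts the text into an unevaluated R expression.
expr <- parse(text = cmd)

# eval() runs the expression in the current environment, adding column b to A.
eval(expr)
print(names(A))
```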

# This could be simplified a bit.
teval = function(...) eval(parse(text=paste(...,sep="")))

A = data.frame(id=1:10)
# Now to populate it using almost an identical command to that above.
# Except now I need to use the dreaded double-arrow assignment operator (<<-), because the command is evaluated inside a function and a plain assignment would only change a local copy of A.
for (i in letters) teval("A$",i,"<<-rnorm(10)")

# Overwrite every other column (a, c, e, and so on) with uniform draws.
for (i in letters[seq(1,26,2)]) teval("A$",i,"<<-runif(10)")

# Unfortunately, this command is still not as powerful as using Stata's macros.
# However, it is a lot closer.

20 comments:

AndresT, May 13, 2013 at 3:55 PM
What about using apply and its variants instead?

A = matrix(NA, nrow=10, ncol=length(letters))
colnames(A) = letters
# apply() returns a new matrix, so the result must be assigned back to A.
A = apply(A, 2, function(x) rnorm(10))

Anonymous, May 13, 2013 at 4:21 PM
http://www.r-project.org/doc/Rnews/Rnews_2001-3.pdf

Rory Turnbull, May 14, 2013 at 1:34 AM
m <- matrix(1, nrow=10, ncol=length(letters))
A <- cbind(id=1:10, data.frame(m))
names(A) <- c("id", letters)
for(i in letters) A[, i] <- rnorm(10)

This code is similar to that in Andres T's comment, but using the `[` operator allows us a fair bit of generality. Suppose we wished to grab some arbitrary columns and set them all to zero:

columns <- c("a", "g", "k", "p", "q")
for(i in columns) A[, i] <- 0

Anyway, if you're not already aware of them, I'd also look into get() and assign(), which might help you in getting Stata-like macros.
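A rough sketch of how get() and assign() can stand in for macros when the variable name itself is built from a string. The names x1 to x3 are made up for illustration.

```r
# Create variables x1, x2, x3 whose names are built with paste().
for (i in 1:3) assign(paste("x", i, sep = ""), i^2)

# Retrieve each variable by a name held in a string.
for (i in 1:3) print(get(paste("x", i, sep = "")))
```

This covers whole variables only; for columns of a data frame or elements of a list, indexing by name is usually simpler.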

Unknown, May 14, 2013 at 2:44 AM
It's a very rare case that constructing a string in R and evalling it is the right thing to do. It's a big red flag that says there's a better way to do it.

In this case, you are trying to assign a data frame column by a name in a variable using the $-notation. Don't. Use [[x]] notation instead:

for(i in letters){A[[i]]=rnorm(10)}

That's the correct way to access a list element or data frame column by value. The $-notation is a convenient way to access it by 'name', which is unevaluated. As you have seen, in order to use $-notation with the value of a variable (or more generally, an expression) you have to evaluate the expression and then feed it unevaluated to the $-expression. Ick!
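A small sketch of the difference, using a toy data frame (the column values are arbitrary): `$` takes what follows it literally, while `[[` evaluates its argument first.

```r
A <- data.frame(id = 1:3)
i <- "b"

# `$` uses the name as written, so this creates a column called "i".
A$i <- c(10, 20, 30)

# `[[` evaluates i first, so this creates a column called "b".
A[[i]] <- c(1, 2, 3)

print(names(A))
```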

Seriously, if you think eval() is the solution to your problem, you now have two problems.

Unknown, May 14, 2013 at 3:35 AM
Sorry, can't see what you are trying to do ... "to change an element of a list individually not produce a matrix of the specified form" ... could you provide an example of what you cannot do e.g. with the "apply" suggestion?

Kent Johnson, May 14, 2013 at 6:07 AM
You can use variables to index lists and data frames. I would write your example as
A = data.frame(id=1:10)
for (i in letters) A[[i]] = rnorm(10)

I'm not sure what you mean by "targeting a subset of a list"; can you give an example?

Zach Deane-Mayer, May 14, 2013 at 11:50 AM
If you want to programmatically target a subset of a list, or a column of a data.frame, why not use [[ or [?

e.g.
mylist <- list(
A=runif(10),
B=runif(20),
C=matrix(runif(100), ncol=10)
)

for (i in names(mylist)) {print(i); print(summary(mylist[[i]]))}

or:
for (i in letters) A[,i] <- rnorm(10)

Francis, May 14, 2013 at 5:28 PM
Thanks so much, everybody, for your comments. I knew I was doing something wrong. All of your helpful advice greatly clarifies my handling of lists and data.frames. Let me ask one final question though.

Imagine I have three identical lists:
a=b=c=list(1:10, 1:100)

Is there any way to add a new element equal to letters to each of the lists? What I am trying to do would look something like:

for (i in letters[1:3]) assign(paste(i,"[[3]]",sep=""), letters)

Thanks for all of your help.
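For reference, a hedged sketch of why the assign() attempt above does not do what is hoped: assign() treats its first argument as a plain variable name, so it creates a new object literally called a[[3]] instead of modifying a. One possible workaround, assuming the lists must stay as separate variables, is to fetch each list, modify the copy, and assign it back; nm and tmp are illustrative names.

```r
a = b = c = list(1:10, 1:100)

# assign() does not parse "a[[3]]"; it creates a variable with that odd name.
assign("a[[3]]", letters)
print(length(a))          # a itself is unchanged
print(exists("a[[3]]"))   # the oddly named variable now exists

# Workaround: get the list, modify the copy, assign it back under the same name.
for (nm in letters[1:3]) {
  tmp <- get(nm)
  tmp[[3]] <- letters
  assign(nm, tmp)
}
print(length(a))
```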

bequw, May 15, 2013 at 4:45 PM
To get rid of the "dreaded double arrow assign" and make it more intuitive use eval.parent() so that

teval = function(...) eval.parent(parse(text=paste(...,sep="")))
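As a quick check of this version, the sketch below repeats the definition and populates three columns with an ordinary assignment. It works because eval.parent() evaluates the command in the environment that called teval() instead of inside teval() itself.

```r
teval = function(...) eval.parent(parse(text = paste(..., sep = "")))

A = data.frame(id = 1:10)
# No <<- needed: the assignment runs where teval() was called from.
for (i in letters[1:3]) teval("A$", i, "<-rnorm(10)")
print(names(A))
```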

It would be nice if there were a library of functions designed for Stata users (and teval could be an element). Other functions could mimic Stata commands, but take a data= parameter and assume everything referenced was from that data set (as Stata only has one dataset in memory at a time). Making a new data frame column, for example, could then be done by:

gen(data=mydf, newvar, oldvar1+oldvar2+oldvar3)
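A hypothetical gen() along these lines could be sketched as below. Nothing here is an existing package function: gen, mydf, and the column names are illustrative, and the sketch assumes newvar is passed as a bare name. Base R's transform() and within() already cover much of the same ground.

```r
# Hypothetical Stata-style helper; not part of any package.
gen <- function(data, newvar, expr) {
  # Capture the bare name given for the new column.
  name <- deparse(substitute(newvar))
  # Evaluate the expression with the columns of data visible as variables.
  data[[name]] <- eval(substitute(expr), data, parent.frame())
  data
}

mydf <- data.frame(oldvar1 = 1:3, oldvar2 = 4:6, oldvar3 = 7:9)
# R copies on modification, so the result has to be assigned back.
mydf <- gen(data = mydf, newvar, oldvar1 + oldvar2 + oldvar3)
print(mydf$newvar)
```

Unlike Stata's generate, this returns a modified copy, so the caller decides whether to overwrite the original data frame.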

Kent Johnson, May 15, 2013 at 6:34 PM
If you have several things that you want to treat the same, keep them in a list. Here is one way:
> lists = list(a=list(1:10, 1:100), b=list(1:10, 1:100), c=list(1:10, 1:100))
> str(lists)
List of 3
$ a:List of 2
..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
..$ : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
$ b:List of 2
..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
..$ : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
$ c:List of 2
..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
..$ : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
> for (name in names(lists)) lists[[name]][[3]] = letters
> str(lists)
List of 3
$ a:List of 3
..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
..$ : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
..$ : chr [1:26] "a" "b" "c" "d" ...
$ b:List of 3
..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
..$ : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
..$ : chr [1:26] "a" "b" "c" "d" ...
$ c:List of 3
..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
..$ : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
..$ : chr [1:26] "a" "b" "c" "d" ...

An alternative is
lists = lapply(lists, function(l) { l[[3]] = letters; l})

Unfortunately you can't use the simpler
for (l in lists) l[[3]] = letters
because the sub-lists are copied to the variable l.
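A short sketch of that copy behaviour, using a two-element version of the list above; before and after are illustrative names.

```r
lists = list(a = list(1:10, 1:100), b = list(1:10, 1:100))

# l is a copy of each sub-list, so this changes only the copy.
for (l in lists) l[[3]] = letters
before <- sapply(lists, length)
print(before)

# Indexing into lists by name modifies the original.
for (name in names(lists)) lists[[name]][[3]] = letters
after <- sapply(lists, length)
print(after)
```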

Francis, May 16, 2013 at 2:41 AM
Thanks again everybody!