Monday, May 13, 2013

The Power of the Evaluate, Parse, Paste Combination

# See comments below.

# One of the powerful features of Stata that I have missed most when working with R is Stata's macros, which allow the user to construct bits of Stata code from anything and combine them into commands or variable names.

# I know that many programmers with little or no Stata experience might sneer at this ability, since it seems to imply some kind of laziness or lack of precision.  However, I have on many occasions been forced to use inefficient coding structures in R that could easily have been simplified in Stata.

# At last I have stumbled upon a solution!

# By combining commands as eval(parse(text=paste("String Command"))) I am able to do exactly what I want.

# For example, let's say I wanted to construct a data frame A with columns a through z, each populated by 10 random normal draws.

# First I need to construct a data frame.
A = data.frame(id=1:10)
# Now to populate it.
for (i in letters) eval(parse(text=paste("A$",i,"=rnorm(10)",sep="")))

# This could be simplified a bit.
teval = function(...) eval(parse(text=paste(...,sep="")))

A = data.frame(id=1:10)
# Now to populate it using an almost identical command to that above.
# Except now I need to use the dreaded double arrow assign command because we are attempting to assign from within a function.
for (i in letters) teval("A$",i,"<<-rnorm(10)")

for (i in letters[seq(1,26,2)]) teval("A$",i,"<<-runif(10)")
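
# A minimal sketch of why the double arrow is needed, assuming the teval definition above.
# eval() called inside teval evaluates the pasted string in teval's own frame, so a plain "="
# only changes a local copy of A. The name names_after_local is introduced here purely for illustration.

```r
# teval as defined in the post: the pasted string is evaluated inside teval's frame.
teval = function(...) eval(parse(text = paste(..., sep = "")))

A = data.frame(id = 1:10)

# With "=", the assignment builds a modified copy of A local to teval,
# which is discarded when the call returns.
teval("A$", "a", "=rnorm(10)")
names_after_local = names(A)   # A still has only the id column

# With "<<-", the assignment reaches the A defined outside the function.
teval("A$", "a", "<<-rnorm(10)")
print(names_after_local)
print(names(A))
```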

# Unfortunately, this command is still not as powerful as using Stata's macros.
# However, it is a lot closer.

20 comments:

  1. What about using apply and its variants instead?

    A = matrix(NA, ncol=length(letters))
    colnames(A) = letters
    A = apply(A, 2, function(x) rnorm(10))

    1. Definitely a good point, and more efficient coding. The following command would do the same thing:
      matrix(data = rnorm(26*10), nrow = 10, ncol = 26,
      dimnames = list(1:10,letters))

      However, my example was meant to show the need to change an element of a list individually, not to produce a matrix of the specified form.

    2. # First I need to construct a data frame.
      A = data.frame(id=1:10)
      # Now to populate it.
      for (i in letters) A[,i]=rnorm(10)

  2. http://www.r-project.org/doc/Rnews/Rnews_2001-3.pdf

    1. Interesting article, but unfortunately, as far as I can tell, it fails to address the basic example I presented above. A Stata-style macro could easily accomplish the task of targeting a subset of a list. Playing around with the substitute command, I could not. I am really not happy with my solution. If anybody could present a better one, I would be very happy. In general I am unconvinced of the gains from not providing macros, and I find the example in the article both restrictive and not very useful.

      I greatly appreciate the R community and everything they have accomplished with R. I also believe that R is far more flexible than Stata in many ways, except that R does not have the ability to implement macros. The strongest argument against using macros is, as far as I can tell, processing efficiency. However, in an interpreted language, processing efficiency is often much less important than programming efficiency. If I have to spend hours looking for the perfect command when I could easily have modified a standard command with a macro, then 99% of the time R is not working efficiently, as far as I can tell.

  3. m <- matrix(1, nrow=10, ncol=length(letters))
    A <- cbind(id=1:10, data.frame(m))
    names(A) <- c("id", letters)
    for(i in letters) A[, i] <- rnorm(10)

    This code is similar to that in Andres T's comment, but using the `[` operator allows us a fair bit of generality. Suppose we wished to grab some arbitrary columns and set them all to zero:

    columns <- c("a", "g", "k", "p", "q")
    for(i in columns) A[, i] <- 0

    Anyway, if you're not already aware of them, I'd also look into get() and assign(), which might help you in getting Stata-like macros.
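
A small sketch of how get() and assign() can build and look up variable names from strings, which is roughly what a Stata macro is often used for. The variable names x1, x2, x3 and squares are made up for this illustration.

```r
# Create x1, x2 and x3 by pasting the name together, without eval(parse()).
for (i in 1:3) assign(paste("x", i, sep = ""), i^2)

# Retrieve the objects again by constructed name.
squares = sapply(1:3, function(i) get(paste("x", i, sep = "")))
print(squares)
```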

    1. Thanks so much :)

      I knew I was doing something wrong.

  4. It's a very rare case that constructing a string in R and evalling it is the right thing to do. It's a big red flag that says there's a better way to do it.

    In this case, you are trying to assign a data frame column by a name in a variable using the $-notation. Don't. Use [[x]] notation instead:

    for(i in letters){A[[i]]=rnorm(10)}

    That's the correct way to access a list element or data frame column by value. The $-notation is a convenient way to access it by 'name', which is unevaluated. As you have seen, in order to use $-notation with the value of a variable (or more generally, an expression) you have to evaluate the expression and then feed it unevaluated to the $-expression. Ick!

    Seriously, if you think eval() is the solution to your problem, you now have two problems.
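
The difference between the two notations can be seen in a short sketch, assuming a data frame A like the one in the post and a variable i that holds a column name:

```r
A = data.frame(id = 1:10)
i = "b"

A$i = 0      # `$` takes the name literally: this creates a column called "i"
A[[i]] = 1   # `[[` evaluates i first: this creates a column called "b"

print(names(A))
```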

  5. Sorry, can't see what you are trying to do ... "to change an element of a list individually not produce a matrix of the specified form" ... could you provide an example of what you cannot do e.g. with the "apply" suggestion?

  6. You can use variables to index lists and data frames. I would write your example as
    A = data.frame(id=1:10)
    for (i in letters) A[[i]] = rnorm(10)

    I'm not sure what you mean by "targeting a subset of a list"; can you give an example?

  7. If you want to programmatically target a subset of a list, or a column of a data.frame, why not use [[ or [?

    e.g.
    mylist <- list(
    A=runif(10),
    B=runif(20),
    C=matrix(runif(100), ncol=10)
    )

    for (i in names(mylist)) {print(i); print(summary(mylist[[i]]))}

    or:
    for (i in letters) A[,i] <- rnorm(10)

  8. Thanks so much, everybody, for your comments. I knew I was doing something wrong. All of your helpful advice greatly clarifies my handling of lists and data.frames. Let me ask one final question though.

    Imagine I have three identical lists:
    a=b=c=list(1:10, 1:100)

    Is there any way to add a new element equal to letters to each of the lists? What I am trying to do would look something like:

    for (i in letters[1:3]) assign(paste(i,"[[3]]",sep=""), letters)

    Thanks for all of your help.

    1. Hi Francis,

      x <- list(1:10, 10:100)
      z <- y <- x

      I've changed the list names to 'x', 'y', and 'z', since c() is already a function in R and bad things™ may happen if we redefine it. The easiest way to do this, since the lists are all equal to each other, is just to change one list, and then set all the other lists to be equal to it.

      x[[3]] <- letters
      z <- y <- x

      However, I imagine you want a more generalizable solution; what if you had 100 lists? Typing z <- y <- x <- w <- v and so on would get tedious. It's probably possible to do this a "smarter" way by playing around with the `[[`() function and using get() and paste(), but let's keep it simple. All we need to do is make a vector of the variable names, and then loop over them, like so:

      u <- list(1:10, 10:100)
      z <- y <- x <- w <- v <- u
      ourlists <- c("v", "w", "x", "y", "z")
      u[[3]] <- letters
      for(i in ourlists) assign(i, u)
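
For lists that are not identical, one possible get()-based variant is sketched below. It relies on the fact that assign() treats its first argument as a plain name, so a string such as "a[[3]]" would create a variable literally called `a[[3]]` instead of modifying a. The temporary name tmp is an assumption of this sketch.

```r
x <- list(1:10, 1:100)
z <- y <- x
ourlists <- c("x", "y", "z")

for (name in ourlists) {
  tmp <- get(name)        # fetch the list whose name is stored in `name`
  tmp[[3]] <- letters     # modify the copy
  assign(name, tmp)       # write it back under the same name
}
```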

    2. This is a good response, though not quite what I was looking for. The use of the assign command is really nice, though.

      Kent, in the response below, hit the nail on the head.

  9. To get rid of the "dreaded double arrow assign" and make it more intuitive use eval.parent() so that

    teval = function(...) eval.parent(parse(text=paste(...,sep="")))
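
A quick check of this version, assuming the same data frame A as in the post. Because eval.parent() evaluates the parsed string in the frame that called teval, a plain "=" is enough:

```r
teval = function(...) eval.parent(parse(text = paste(..., sep = "")))

A = data.frame(id = 1:10)
# No "<<-" needed: the assignment happens where teval was called.
for (i in letters) teval("A$", i, "=rnorm(10)")

print(names(A))
```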

    It would be nice if there was a library of functions designed for Stata users (and teval could be an element). Other functions could mimic Stata commands, but take a data= parameter and assume everything referenced was from that data set (as Stata only has one dataset in memory at a time). Making a new dataframe column, for example, could then be done by:

    gen(data=mydf, newvar, oldvar1+oldvar2+oldvar3)
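
One way such a gen() could be written is sketched below. This is a hypothetical helper, not an existing function, and the example data frame mydf is made up. It returns the modified data frame, so the result has to be assigned back. Base R's transform() covers similar ground.

```r
# Hypothetical Stata-style gen(): evaluate `expr` using the columns of `data`
# and store the result in a new column named by the unquoted `newvar`.
gen = function(data, newvar, expr) {
  name = deparse(substitute(newvar))                        # column name as a string
  data[[name]] = eval(substitute(expr), data, parent.frame())
  data
}

mydf = data.frame(oldvar1 = 1:3, oldvar2 = 4:6, oldvar3 = 7:9)
mydf = gen(data = mydf, newvar, oldvar1 + oldvar2 + oldvar3)
print(mydf)
```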

  10. If you have several things that you want to treat the same, keep them in a list. Here is one way:
    > lists = list(a=list(1:10, 1:100), b=list(1:10, 1:100), c=list(1:10, 1:100))
    > str(lists)
    List of 3
    $ a:List of 2
    ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
    ..$ : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
    $ b:List of 2
    ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
    ..$ : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
    $ c:List of 2
    ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
    ..$ : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
    > for (name in names(lists)) lists[[name]][[3]] = letters
    > str(lists)
    List of 3
    $ a:List of 3
    ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
    ..$ : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
    ..$ : chr [1:26] "a" "b" "c" "d" ...
    $ b:List of 3
    ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
    ..$ : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
    ..$ : chr [1:26] "a" "b" "c" "d" ...
    $ c:List of 3
    ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
    ..$ : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
    ..$ : chr [1:26] "a" "b" "c" "d" ...

    An alternative is
    lists = lapply(lists, function(l) { l[[3]] = letters; l})

    Unfortunately you can't use the simpler
    for (l in lists) l[[3]] = letters
    because the sub-lists are copied to the variable l.
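
The copying behaviour can be checked with a short sketch. It uses a two-element version of the list above; the names lengths_after_for and lengths_after_lapply exist only for this illustration.

```r
lists = list(a = list(1:10, 1:100), b = list(1:10, 1:100))

# Modifying the loop variable changes only a copy; `lists` is untouched.
for (l in lists) l[[3]] = letters
lengths_after_for = sapply(lists, length)

# lapply returns the modified copies, which are then stored back.
lists = lapply(lists, function(l) { l[[3]] = letters; l })
lengths_after_lapply = sapply(lists, length)

print(lengths_after_for)
print(lengths_after_lapply)
```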

    1. Thanks Kent, it is a great idea to bury everything within a single list. It certainly makes things cleaner: I can then reference elements however I want, and since lists can take anything, it is pretty close to having my own environment.

      Francis
