Econometrics By Simulation: More Readable Code with Pipes in R

Wednesday, July 30, 2014

More Readable Code with Pipes in R

Several blog posts have mentioned the 'magrittr' package, which lets you pass a value into a function call in a pipe-style fashion (David Smith).

This stylistic option has several advantages:

1. Fewer nested parentheses are required
2. The order of functional operations now reads from left to right
3. The organizational style of the code may be improved

The library provides a new operator, %>%, which tells R to take the value on its left and pass it to the function on its right as the first argument. Let us see this in action with some text functions.
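As a quick check of that rule before the string examples, the sketch below compares a piped call with its nested equivalent. The vector x and the result names are made up for illustration; the only functions used are sqrt, sum, round and identical from base R and %>% from magrittr.

```r
library(magrittr)

# Hypothetical input, chosen only for illustration
x <- c(2, 3, 5)

# The left-hand side becomes the first argument of the right-hand call
piped  <- x %>% sqrt()
nested <- sqrt(x)

# Extra arguments on the right stay in place; the piped value goes in first,
# so round(2) here means round(<piped value>, 2)
piped_chain  <- x %>% sqrt() %>% sum() %>% round(2)
nested_chain <- round(sum(sqrt(x)), 2)

identical(piped, nested)
identical(piped_chain, nested_chain)
```

Both comparisons should agree, since the pipe only changes how the call is written, not what is computed.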

library(magrittr)  # unlike require(), this stops with an error if the package is missing
 
# Let's play with some strings
 
str1 = "A scratch? Your arm's off."
str2 = "I've had worse."
 
str1 %>% substr(3,9)   
# [1] "scratch"
 
str1 %>% strsplit('?',fixed=TRUE)
#[[1]]
#[1] "A scratch"        " Your arm's off."
 
# Pipes can be chained as well
str1 %>% paste(str2) %>% toupper()
# [1] "A SCRATCH? YOUR ARM'S OFF. I'VE HAD WORSE."
 
# Let's see how pipes might work with drawing random variables
 
# I like to define a function that takes an element-by-element maximum
# of a vector and a fixed value (0 by default)
 
vmax <- function(x, maximum=0) x %>% cbind(maximum) %>% apply(1, max)
-5:5 %>% vmax
# [1] 0 0 0 0 0 0 1 2 3 4 5
 
# This is identical to defining the function as:
vmax <- function(x, maximum=0) apply(cbind(x,maximum), 1, max)
vmax(-5:5)
 
# Notice that the latter formulation uses the same number of parentheses
# and may be more readable.
 
# However recently I was drawing data for a simulation in which I wanted to 
# draw Nitem values from the quantiles of the normal distribution, censor the
# values at 0 and then randomize their order.
 
Nitem  <- 100
ctmean <- 1
ctsd   <- .5
 
draws <- seq(0, 1, length.out = Nitem+2)[-c(1,Nitem+2)] %>% 
         qnorm(ctmean,ctsd) %>% vmax %>% sample(Nitem)
 
# While this looks ugly, let's see how much worse it would have been without pipes
draws <- sample(vmax(qnorm(seq(0, 1, length.out = Nitem+2)[-c(1,Nitem+2)]
                  ,ctmean,ctsd)),Nitem)
 
# Both functional sequences are ugly, though I think I prefer the first, which
# I can easily read as: seq is passed to qnorm, then to vmax, then to sample
 
# A few things to note with the %>% operator. If you want to send the value to
# an argument which is not the first, or to a named argument, use the '.'
# placeholder
 
mydata <- seq(0, 1, length.out = Nitem+2)[-c(1,Nitem+2)] %>% 
          qnorm(ctmean,ctsd) %>% vmax %>% sample(Nitem) %>%
          data.frame(index = 1:Nitem , theta = .)
 
# Also note that the operator has higher precedence than arithmetic operators
# such as +, so it is applied earlier than you might expect. Thus:
 
1 + 8 %>% sqrt
# [1] 3.828427   (that is, 1 + sqrt(8))
 
# Rather than
(1 + 8) %>% sqrt
# [1] 3
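The precedence point can be checked directly. In R's operator table (see ?Syntax), %>% sits with the other %any% operators: above * / + - but below ':' and unary minus, which is why -5:5 %>% vmax worked earlier without parentheses. The sketch below is only a check of that ordering; all variable names are illustrative.

```r
library(magrittr)

# %>% is evaluated before binary + and *
plus_first  <- 1 + 8 %>% sqrt()     # parsed as 1 + sqrt(8)
times_first <- 2 * 8 %>% sqrt()     # parsed as 2 * sqrt(8)

# Parentheses force the arithmetic to happen before the pipe
grouped <- (1 + 8) %>% sqrt()       # parsed as sqrt(1 + 8)

# ':' and unary minus bind more tightly than %>%
seq_piped <- -2:2 %>% abs()         # parsed as abs((-2):2)

all.equal(plus_first, 1 + sqrt(8))
all.equal(times_first, 2 * sqrt(8))
all.equal(grouped, 3)
identical(seq_piped, abs(-2:2))
```

When in doubt, wrapping the left-hand side in parentheses costs little and makes the intended grouping explicit.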

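The '.' placeholder used in the data.frame step of the listing can also be tried on a smaller case. This is a minimal sketch with made-up values (theta here is a short hypothetical vector, not the simulated draws); it relies on magrittr's documented behaviour that when '.' appears directly as an argument, the piped value is not also inserted as the first argument.

```r
library(magrittr)

# Hypothetical values standing in for the simulated draws
theta <- c(0.2, 1.5, 0.9)
str1  <- "A scratch? Your arm's off."

# Default: the piped value is the first argument, i.e. round(theta, 1)
first_arg <- theta %>% round(1)

# Named argument: the piped value goes where the dot is
as_column <- theta %>% data.frame(index = seq_along(.), theta = .)

# Positional argument other than the first: same as strsplit(str1, "?", fixed = TRUE)
split_on <- "?" %>% strsplit(str1, ., fixed = TRUE)

names(as_column)
identical(as_column$theta, theta)
identical(split_on, strsplit(str1, "?", fixed = TRUE))
```

One caveat worth keeping in mind: if the dot appears only inside a nested call (for example seq_along(.) with no direct '.' argument), magrittr still inserts the piped value as the first argument as well.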

5 comments:

Bob Muenchen, August 1, 2014 at 9:01 AM
Hi Francis,

Thanks for yet another excellent post! The exact role of the " . " pronoun had not been clear to me until I read this.

Cheers,
Bob
Anonymous, August 1, 2014 at 9:29 AM
Are there any performance advantages with pipes? Readability is nice but it would only be a matter of preference if there were no other advantages.

-Marcus
www.phillipburger.net/wordpress, October 23, 2014 at 9:56 AM
In his useR! dplyr tutorial, Hadley Wickham discussed the pipe operator and gave an example of it in the context of dplyr, at about 44 minutes into this video from the event:

https://www.youtube.com/watch?v=8SGif63VW6E