Wednesday, July 30, 2014

More Readable Code with Pipes in R

Several blog posts have made mention of the 'magrittr' package which allows functional arguments to be passed to functions in a pipes style fashion (David Smith ).

This stylistic option has several advantages:
 
1. Reduced requirements of nested parenthesizes
2. Order of functional operations now read from left to right
3. Organizational style of the code may be improved

The library uses a new operator %>% which basically tells R to take the value of that which is to the left and pass it to the right as an argument. Let us see this in action with some text functions.




require('magrittr')
 
# Let's play with some strings
 
str1 = "A scratch? Your arm's off."
str2 = "I've had worse."
 
str1 %>% substr(3,9)   
#[1]Evaluates to "scratch"
 
str1 %>% strsplit('?',fixed=TRUE)
#[[1]]
#[1] "A scratch"        " Your arm's off."
 
# Pipes can be chained as well
str1 %>% paste(str2) %>% toupper()
# [1] "A SCRATCH? YOUR ARM'S OFF. I'VE HAD WORSE."
 
# Let's see how pipes might work with drawing random variables
 
# I like to define a function that allows an element by element maximization
 
vmax <- function(x, maximum=0) x %>% cbind(0) %>% apply(1, max)
-5:5 %>% vmax
# [1] 0 0 0 0 0 0 1 2 3 4 5
 
# This is identical to defining the function as:
vmax <- function(x, maximum=0) apply(cbind(x,0), 1, max)
vmax(-5:5)
 
# Notice that the latter formation uses the same number of parenthsize
# and be more readable.
 
# However recently I was drawing data for a simulation in which I wanted to 
# draw Nitem values from the quantiles of the normal distribution, censor the
# values at 0 and then randomize their order.
 
Nitem  <- 100
ctmean <- 1
ctsd   <- .5
 
draws <- seq(0, 1, length.out = Nitem+2)[-c(1,Nitem+2)] %>% 
         qnorm(ctmean,ctsd) %>% vmax %>% sample(Nitem)
 
# While this looks ugly, let's see how worse it would have been without pipes
draws <- sample(vmax(qnorm(seq(0, 1, length.out = Nitem+2)[-c(1,Nitem+2)]
                  ,ctmean,ctsd)),Nitem)
 
# Both functional sequences are ugly though I think I prefer the first which
# I can easily read as seq is passed to qnorm passed to vmax passed to sample
 
# A few things to note with the %>% operator. If you want to send the value to
# an argument which is not the first or is a named value, use the '.'
 
mydata <- seq(0, 1, length.out = Nitem+2)[-c(1,Nitem+2)] %>% 
          qnorm(ctmean,ctsd) %>% vmax %>% sample(Nitem) %>%
          data.frame(index = 1:Nitem , theta = .)
 
# Also not that the operator is not as slow as you might think it should be.
# Thus:
 
1 + 8 %>% sqrt
# Returns 3.828427
 
# Rather than
(1 + 8) %>% sqrt
# [1] 3
Created by Pretty R at inside-R.org

4 comments :

  1. Hi Francis,

    Thanks for yet another excellent post! The exact role of the " . " pronoun had not been clear to me until I read this.

    Cheers,
    Bob

    ReplyDelete
    Replies
    1. Thanks Bob,

      I had some difficulty with the limited examples available as well which was part of my motivation for writing the post. You can see me seeking help on stackoverflow:

      http://stackoverflow.com/questions/24956640/passing-named-arguments-through-magrittr

      Francis

      Delete
  2. Are there any performance advantages with pipes? Readability is nice but it would only be a matter of preference if there were no other advantages.

    -Marcus

    ReplyDelete
    Replies
    1. As far as I know there are no performance advantages though if performance is all we care about then we probably should not be using R at all since Julia seems to be an order of magnitude faster.

      Francis

      Delete