Wednesday, July 30, 2014

More Readable Code with Pipes in R

Several blog posts have made mention of the 'magrittr' package which allows functional arguments to be passed to functions in a pipes style fashion (David Smith ).

This stylistic option has several advantages:
 
1. Reduced requirements of nested parenthesizes
2. Order of functional operations now read from left to right
3. Organizational style of the code may be improved

The library uses a new operator %>% which basically tells R to take the value of that which is to the left and pass it to the right as an argument. Let us see this in action with some text functions.




require('magrittr')
 
# Let's play with some strings
 
str1 = "A scratch? Your arm's off."
str2 = "I've had worse."
 
str1 %>% substr(3,9)   
#[1]Evaluates to "scratch"
 
str1 %>% strsplit('?',fixed=TRUE)
#[[1]]
#[1] "A scratch"        " Your arm's off."
 
# Pipes can be chained as well
str1 %>% paste(str2) %>% toupper()
# [1] "A SCRATCH? YOUR ARM'S OFF. I'VE HAD WORSE."
 
# Let's see how pipes might work with drawing random variables
 
# I like to define a function that allows an element by element maximization
 
vmax <- function(x, maximum=0) x %>% cbind(0) %>% apply(1, max)
-5:5 %>% vmax
# [1] 0 0 0 0 0 0 1 2 3 4 5
 
# This is identical to defining the function as:
vmax <- function(x, maximum=0) apply(cbind(x,0), 1, max)
vmax(-5:5)
 
# Notice that the latter formation uses the same number of parenthsize
# and be more readable.
 
# However recently I was drawing data for a simulation in which I wanted to 
# draw Nitem values from the quantiles of the normal distribution, censor the
# values at 0 and then randomize their order.
 
Nitem  <- 100
ctmean <- 1
ctsd   <- .5
 
draws <- seq(0, 1, length.out = Nitem+2)[-c(1,Nitem+2)] %>% 
         qnorm(ctmean,ctsd) %>% vmax %>% sample(Nitem)
 
# While this looks ugly, let's see how worse it would have been without pipes
draws <- sample(vmax(qnorm(seq(0, 1, length.out = Nitem+2)[-c(1,Nitem+2)]
                  ,ctmean,ctsd)),Nitem)
 
# Both functional sequences are ugly though I think I prefer the first which
# I can easily read as seq is passed to qnorm passed to vmax passed to sample
 
# A few things to note with the %>% operator. If you want to send the value to
# an argument which is not the first or is a named value, use the '.'
 
mydata <- seq(0, 1, length.out = Nitem+2)[-c(1,Nitem+2)] %>% 
          qnorm(ctmean,ctsd) %>% vmax %>% sample(Nitem) %>%
          data.frame(index = 1:Nitem , theta = .)
 
# Also not that the operator is not as slow as you might think it should be.
# Thus:
 
1 + 8 %>% sqrt
# Returns 3.828427
 
# Rather than
(1 + 8) %>% sqrt
# [1] 3
Created by Pretty R at inside-R.org

5 comments:

  1. Hi Francis,

    Thanks for yet another excellent post! The exact role of the " . " pronoun had not been clear to me until I read this.

    Cheers,
    Bob

    ReplyDelete
    Replies
    1. Thanks Bob,

      I had some difficulty with the limited examples available as well which was part of my motivation for writing the post. You can see me seeking help on stackoverflow:

      http://stackoverflow.com/questions/24956640/passing-named-arguments-through-magrittr

      Francis

      Delete
  2. Are there any performance advantages with pipes? Readability is nice but it would only be a matter of preference if there were no other advantages.

    -Marcus

    ReplyDelete
    Replies
    1. As far as I know there are no performance advantages though if performance is all we care about then we probably should not be using R at all since Julia seems to be an order of magnitude faster.

      Francis

      Delete
  3. Hadley Wickham at his userR! dplyr tutorial discussed and gave an example of the pipe operator in the context of dplyr. At about minute 44 minutes in this video from the event:

    https://www.youtube.com/watch?v=8SGif63VW6E

    ReplyDelete