Wednesday, July 30, 2014

More Readable Code with Pipes in R

Several blog posts have mentioned the 'magrittr' package, which allows values to be passed to functions in a pipe-style fashion (David Smith).

This stylistic option has several advantages:
1. Reduced need for nested parentheses
2. The order of functional operations now reads from left to right
3. Organizational style of the code may be improved

The library introduces a new operator, %>%, which tells R to take the value on its left and pass it to the function on its right as the first argument. Let us see this in action with some text functions.
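As a minimal sketch of that rule, assuming the magrittr package is installed: `x %>% f(y)` is evaluated as `f(x, y)`, so the piped and nested forms below should give the same result. The vector `x` is made up for illustration and does not come from the post.

```r
library(magrittr)

# Illustrative input (an assumption, not from the post)
x <- c(4, 9, 16)

# Piped form: x becomes the first argument of sqrt(), whose result
# becomes the first argument of sum()
piped <- x %>% sqrt() %>% sum()

# Equivalent nested form, which has to be read inside-out
nested <- sum(sqrt(x))

identical(piped, nested)  # TRUE
piped                     # 9
```

The piped version lists the steps in the order they happen, which is the readability gain described above.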

# Let's play with some strings
str1 = "A scratch? Your arm's off."
str2 = "I've had worse."
str1 %>% substr(3,9)   
# [1] "scratch"
str1 %>% strsplit('?',fixed=TRUE)
# [[1]]
# [1] "A scratch"        " Your arm's off."
# Pipes can be chained as well
str1 %>% paste(str2) %>% toupper()
# Let's see how pipes might work with drawing random variables
# I like to define a function that takes an element-by-element maximum
vmax <- function(x, maximum=0) x %>% cbind(maximum) %>% apply(1, max)
-5:5 %>% vmax
# [1] 0 0 0 0 0 0 1 2 3 4 5
# This is identical to defining the function as:
vmax <- function(x, maximum=0) apply(cbind(x,maximum), 1, max)
# Notice that the latter form uses the same number of parentheses
# and may be more readable.
# However recently I was drawing data for a simulation in which I wanted to 
# draw Nitem values from the quantiles of the normal distribution, censor the
# values at 0 and then randomize their order.
Nitem  <- 100
ctmean <- 1
ctsd   <- .5
draws <- seq(0, 1, length.out = Nitem+2)[-c(1,Nitem+2)] %>% 
         qnorm(ctmean,ctsd) %>% vmax %>% sample(Nitem)
# While this looks ugly, let's see how much worse it would have been without pipes
draws <- sample(vmax(qnorm(seq(0, 1, length.out = Nitem+2)[-c(1,Nitem+2)],
                           ctmean, ctsd)), Nitem)
# Both functional sequences are ugly, though I think I prefer the first, which
# I can easily read as seq is passed to qnorm, passed to vmax, passed to sample.
# A few things to note about the %>% operator. If you want to send the value to
# an argument which is not the first, or to a named argument, use the '.'
mydata <- seq(0, 1, length.out = Nitem+2)[-c(1,Nitem+2)] %>% 
          qnorm(ctmean,ctsd) %>% vmax %>% sample(Nitem) %>%
          data.frame(index = 1:Nitem , theta = .)
# Also note that the operator binds more tightly than you might expect:
# %>% has higher precedence than arithmetic operators such as +.
# Thus:
1 + 8 %>% sqrt
# Returns 3.828427
# Rather than
(1 + 8) %>% sqrt
# [1] 3
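The two notes above, on the `.` placeholder and on operator precedence, can be checked with a small sketch. The input values are illustrative only, and the sketch relies on magrittr's documented behaviour that a `.` used directly as an argument replaces the default first-argument insertion.

```r
library(magrittr)

# '.' as a named argument: the left-hand side is placed where the dot is,
# not in the first position
named <- c(0.2, 0.5, 0.9) %>% data.frame(index = 1:3, theta = .)
names(named)   # "index" "theta"

# '.' as a positional argument that is not the first
pasted <- "b" %>% paste("a", .)   # evaluated as paste("a", "b")

# %>% binds more tightly than +, so parentheses change the result
a <- 1 + 8 %>% sqrt      # 1 + sqrt(8)
b <- (1 + 8) %>% sqrt    # sqrt(9)
```

When in doubt about precedence, wrapping the left-hand side in parentheses makes the intent explicit.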


  1. Hi Francis,

    Thanks for yet another excellent post! The exact role of the " . " pronoun had not been clear to me until I read this.


    1. Thanks Bob,

      I had some difficulty with the limited examples available as well, which was part of my motivation for writing the post. You can see me seeking help on stackoverflow:


  2. Are there any performance advantages with pipes? Readability is nice but it would only be a matter of preference if there were no other advantages.


    1. As far as I know, there are no performance advantages, though if performance is all we care about then we probably should not be using R at all, since Julia seems to be an order of magnitude faster.