## Wednesday, July 30, 2014

### More Readable Code with Pipes in R

Several blog posts have mentioned the 'magrittr' package, which allows arguments to be passed to functions in a pipe-style fashion (David Smith).

This stylistic option has several advantages:

1. Fewer nested parentheses are required
2. Functional operations now read from left to right
3. The organization of the code may be improved

The library introduces a new operator, %>%, which tells R to take the value on its left and pass it as an argument (by default, the first argument) to the function on its right. Let us see this in action with some text functions.

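```r
library(magrittr)

# Let's play with some strings

str1 <- "A scratch? Your arm's off."

str1 %>% substr(3, 9)
# [1] "scratch"

str1 %>% strsplit('?', fixed = TRUE)
# [[1]]
# [1] "A scratch"        " Your arm's off."

# Pipes can be chained as well
# (str2 is not defined in the original post; this value is an assumed example)
str2 <- "It's just a flesh wound."
str1 %>% paste(str2) %>% toupper()
# [1] "A SCRATCH? YOUR ARM'S OFF. IT'S JUST A FLESH WOUND."
```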
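All of the string examples rest on one rule: the value on the left becomes the first argument of the call on the right. As a rough sketch of that idea only (not magrittr's actual implementation, which also handles the '.' placeholder and much more), a toy operator, given the hypothetical name %then% here, can splice the left-hand value into the right-hand call before evaluating it:

```r
# Toy pipe for illustration only: %then% is a hypothetical name, and this is
# NOT how magrittr implements %>% (there is no '.' placeholder support, for example).
`%then%` <- function(lhs, rhs) {
  rhs <- substitute(rhs)            # capture the right-hand side unevaluated
  if (is.name(rhs)) {
    rhs <- as.call(list(rhs))       # allow a bare function name: x %then% sqrt
  }
  args <- as.list(rhs)[-1]          # arguments already written in the call
  call <- as.call(c(list(rhs[[1]], lhs), args))  # build f(lhs, <original args>)
  eval(call, parent.frame())        # evaluate where the pipe was written
}

str1 <- "A scratch? Your arm's off."
str1 %then% substr(3, 9)
# [1] "scratch"

str1 %then% strsplit("?", fixed = TRUE)

# chaining works because %any% operators group from left to right
c(1, 4, 9) %then% sqrt %then% sum
# [1] 6
```

One known shortcut in this sketch: because the value is spliced directly into the call, a left-hand value that is itself an unevaluated expression would be evaluated again, which is one of several cases a real implementation has to handle.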

```r
# Let's see how pipes might work with drawing random variables

# I like to define a function that allows an element-by-element maximization
# ('maximum' is the floor each element is compared against)
vmax <- function(x, maximum = 0) x %>% cbind(maximum) %>% apply(1, max)
-5:5 %>% vmax
# [1] 0 0 0 0 0 0 1 2 3 4 5

# This is identical to defining the function as:
vmax <- function(x, maximum = 0) apply(cbind(x, maximum), 1, max)
vmax(-5:5)

# Notice that the latter form uses the same number of parentheses
```
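For what it's worth, base R already provides an element-wise maximum, pmax(), which recycles a scalar against a vector. The small check below defines vmax so that it passes its maximum argument to cbind(), making the floor adjustable, and shows that pmax() gives the same results and fits into the same pipe:

```r
library(magrittr)

# vmax as in the post, using its 'maximum' argument as the floor
vmax <- function(x, maximum = 0) x %>% cbind(maximum) %>% apply(1, max)

-5:5 %>% vmax
# [1] 0 0 0 0 0 0 1 2 3 4 5

# base R's parallel maximum: compares element by element, recycling the 0
-5:5 %>% pmax(0)
# [1] 0 0 0 0 0 0 1 2 3 4 5

# a non-default floor works the same way in both
all.equal(vmax(-5:5, maximum = 2), pmax(-5:5, 2))
# [1] TRUE
```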

```r
# However, I was recently drawing data for a simulation in which I wanted to
# draw Nitem values from the quantiles of the normal distribution, censor the
# values at 0 and then randomize their order.

Nitem  <- 100
ctmean <- 1
ctsd   <- .5

# seq() gives Nitem + 2 evenly spaced probabilities; dropping the endpoints
# 0 and 1 avoids their infinite normal quantiles
draws <- seq(0, 1, length.out = Nitem + 2)[-c(1, Nitem + 2)] %>%
  qnorm(ctmean, ctsd) %>% vmax %>% sample(Nitem)

# While this looks ugly, let's see how much worse it would have been without pipes
draws <- sample(vmax(qnorm(seq(0, 1, length.out = Nitem + 2)[-c(1, Nitem + 2)],
                           ctmean, ctsd)), Nitem)

# Both functional sequences are ugly, though I think I prefer the first, which
# I can easily read as: seq is passed to qnorm, passed to vmax, passed to sample
```
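Because sample() only permutes its input, one way to confirm that the pipeline does what the comments describe is to compare the sorted draws with the censored quantiles. The sketch below assumes the post's parameter values; the seed (2014) is arbitrary and only makes the shuffle reproducible. With a mean of 1 and a standard deviation of .5, pnorm(0, 1, .5) is about 0.023, so only the two smallest probabilities (1/101 and 2/101) fall below it and are censored to 0:

```r
library(magrittr)

# vmax as in the post: element-wise maximum against a floor (default 0)
vmax <- function(x, maximum = 0) x %>% cbind(maximum) %>% apply(1, max)

Nitem  <- 100
ctmean <- 1
ctsd   <- .5

set.seed(2014)  # arbitrary seed, only to make the shuffle reproducible

# interior probabilities 1/101, ..., 100/101 (0 and 1 would give -Inf and Inf)
probs    <- seq(0, 1, length.out = Nitem + 2)[-c(1, Nitem + 2)]
censored <- probs %>% qnorm(ctmean, ctsd) %>% vmax
draws    <- censored %>% sample(Nitem)

# sample() only reorders, and censored is already increasing,
# so sorting the draws recovers it exactly
all.equal(sort(draws), censored)
# [1] TRUE

# how many values were censored to 0
sum(draws == 0)
# [1] 2
```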

```r
# A few things to note with the %>% operator. If you want to send the value to
# an argument that is not the first, or to a named argument, use the '.' placeholder

mydata <- seq(0, 1, length.out = Nitem + 2)[-c(1, Nitem + 2)] %>%
  qnorm(ctmean, ctsd) %>% vmax %>% sample(Nitem) %>%
  data.frame(index = 1:Nitem, theta = .)
```
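A few more small cases may help pin down where '.' lands. The first two follow the rule described above; the third reflects magrittr's documented behaviour that when '.' appears only inside a nested call, such as length(.), the left-hand value is still passed as the first argument as well:

```r
library(magrittr)

# '.' as a named argument: the piped value fills theta, not the first slot
df <- c(0.5, 1.5, 2.5) %>% data.frame(index = 1:3, theta = .)
df$theta
# [1] 0.5 1.5 2.5

# '.' in a non-first position
"scratch" %>% paste("A", .)
# [1] "A scratch"

# '.' only inside a nested call: the value is ALSO passed first,
# so this evaluates paste(1:3, length(1:3))
1:3 %>% paste(length(.))
# [1] "1 3" "2 3" "3 3"
```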

```r
# Also note that %>% has higher precedence (binds more tightly) than you
# might expect, so only 8 is piped to sqrt. Thus:

1 + 8 %>% sqrt
# [1] 3.828427   (i.e. 1 + sqrt(8))

# Rather than
(1 + 8) %>% sqrt
# [1] 3
```
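One way to see the precedence directly is to quote the expression and inspect how R parsed it, without evaluating anything. Per R's operator precedence table (?Syntax), %any% operators such as %>% bind more tightly than binary +, -, * and /, but less tightly than the sequence operator ':':

```r
library(magrittr)

# quote() returns the parsed expression unevaluated
expr <- quote(1 + 8 %>% sqrt)
expr[[1]]   # the top-level function is `+`
expr[[3]]   # its second operand is the whole pipe: 8 %>% sqrt

# ':' binds tighter than %>%, so the full sequence is piped
1:4 %>% sum
# [1] 10

# '*' binds looser than %>%, so only 8 is piped: 2 * sqrt(8)
2 * 8 %>% sqrt
# [1] 5.656854
```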

1. Hi Francis,

   Thanks for yet another excellent post! The exact role of the " . " pronoun had not been clear to me until I read this.

   Cheers,
   Bob

   1. Thanks Bob,

      I had some difficulty with the limited examples available as well, which was part of my motivation for writing the post. You can see me seeking help on Stack Overflow:

      http://stackoverflow.com/questions/24956640/passing-named-arguments-through-magrittr

      Francis

2. Are there any performance advantages with pipes? Readability is nice, but it would only be a matter of preference if there were no other advantages.

   -Marcus

   1. As far as I know, there are no performance advantages. Though if performance is all we care about, then we probably should not be using R at all, since Julia seems to be an order of magnitude faster.

      Francis

