Monday, September 15, 2014

How do you say π^π^π?

Well, not that you probably really want to know how to say such an absurdly large number. But for those of you who are interested, it is (allowing for rounding):

one quintillion, three hundred forty quadrillion, one hundred sixty-four trillion, one hundred eighty-three billion, six million, three hundred thirty-nine thousand, eight hundred forty
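For the curious, the magnitude is easy to check in base R. Exponentiation is right-associative, so this evaluates pi^(pi^pi):

pi^pi^pi
# [1] 1.340164e+18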

And yes, you can find out how to write your very own absurdly long numbers in R! I have adapted John Fox's numbers2words function to allow for both absurdly long numbers and decimals. We can see some examples below:
 
> number2words(.1)
[1] "one tenths"
> number2words(94.488)
[1] "ninety-four and four hundred eighty-eight thousandths"
> number2words(-12187.1, proper=T)
[1] "minus twelve thousand, one hundred eighty-seven and one tenths"
[2] "12,187.1"                                                      
> number2words(123)
[1] "one hundred twenty-three"
 
You can find the code on GitHub.
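If you are wondering how the decimal handling might work, here is a minimal sketch of the idea. It is not the actual adaptation: it leans on the 'english' package (John Fox and Bill Venables) for the integer-to-words step, the denominator list is abbreviated, and the exact wording differs slightly from the output above.

library('english')

decimal2words <- function(x) {
  parts  <- strsplit(format(abs(x), scientific=FALSE), '.', fixed=TRUE)[[1]]
  denoms <- c('tenths', 'hundredths', 'thousandths', 'ten-thousandths')
  # Convert the integer part, then name the fractional part by its denominator
  out <- as.character(english(as.numeric(parts[1])))
  if (length(parts) > 1)
    out <- paste(out, 'and',
                 as.character(english(as.numeric(parts[2]))),
                 denoms[nchar(parts[2])])
  if (x < 0) out <- paste('minus', out)
  out
}

decimal2words(94.488)
# Something like "ninety-four and four hundred and eighty-eight thousandths"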

Tuesday, September 9, 2014

Fun with Bordered Cubes

I am interested in generating 3D reasoning items in R. To this end I have adapted some of the awesome functions built into the rgl library. My new function, 'cube', takes a position and automatically sizes itself as a 1x1x1 cube, though this can be adjusted through the scale argument.

Find the code on GitHub.
 
# Easy Bordered Cubes in R
 
library('rgl'); library('magrittr')
 
cube <- function(x=0,y=0,z=0, bordered=TRUE, 
                 filled = TRUE, lwd=2, scale=1,
                 fillcol = gray(.95),
                 bordercol ='black', ...) {
 
  mycube <- cube3d()
 
  # cube3d() is 2x2x2, so doubling the homogeneous coordinate gives a
  # unit cube; dividing by scale then applies the user's size
  mycube$vb[4,] <- mycube$vb[4,]/scale*2
 
  for (i in seq_along(x)) {
    # Add cube border
    if (bordered) {
      bcube <- mycube
      bcube$material$lwd <- lwd
      bcube$material$front <- 'line'
      bcube$material$back <- 'line'
      bcube %>% translate3d(x[i], y[i], z[i]) %>% shade3d
    }
    # Add cube fill (shrunk ~1% so the fill sits just inside the
    # border lines, avoiding z-fighting)
    if (filled) {
      fcube <- mycube
      fcube$vb[4,] <- fcube$vb[4,]*1.01
      fcube$material$col <- fillcol
      fcube %>% translate3d(x[i], y[i], z[i]) %>% shade3d
    }
  }
}
 
clear3d()
cube(0,0,0)
cube(1,0,0, filled=F)
cube(-1,0,0, bordered=F)
movie3d(spin3d(axis=c(0,0,1), rpm=20), duration=2.95) 
  
# I mapped out the letter R using an Excel spreadsheet which
# translated Xs into 2D location points
clear3d()
y <- c(1,1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4,5,5,
       5,5,6,6,6,6,7,7,7,7,7,8,8,8,8,9,9,9,9,10,
       10,10,10,11,11,11,11,11,12,12,12,12,12)
 
x <- c(8,7,6,3,2,1,7,6,3,2,6,5,3,2,6,5,4,3,2,5,4,
       3,2,5,4,3,2,6,5,4,3,2,7,6,3,2,7,6,3,2,7,6,
       3,2,6,5,4,3,2,5,4,3,2,1)
 
z <- cummax(y)*.5
 
length(x)==length(y) # sanity check: coordinate vectors must match
cube(x,y,z)
movie3d(spin3d(axis=c(0,0,1), rpm=20), duration=2.95) 
 
 # Let's see how sin and cos can work together
z <- seq(0,6,.1)
x <- sin(pi*z)*z
y <- cos(pi*z)*z
 
clear3d()
cube(x,y,z*2, scale=.75)
movie3d(spin3d(axis=c(0,0,1), rpm=20), duration=2.95) 
 
 
# Now let's look at some of the prototyping for
# my 3D reasoning items.
clear3d()
vlist <- c(0,0,0)
for (i in 1:15) {
  cube(vlist[1],vlist[2],vlist[3])
  # Random walk: pick an axis and step +1 (75%) or -1 (25%)
  step <- sample(1:3, 1)
  vlist[step] <- vlist[step]+(-1)^rbinom(1,1,.25)
}
rgl.material(shininess=1)
bg3d("white")
clear3d(type = "lights")
 
rgl.viewpoint(theta = 90, phi = 0, fov = 0)
rgl.snapshot("2014-09-09angle1.png") 
 
 
rgl.viewpoint(theta = 0, phi = 0, fov = 0)
rgl.snapshot("2014-09-09angle2.png") 
 
 
rgl.viewpoint(theta = 0, phi = 90, fov = 0)
rgl.snapshot("2014-09-09angle3.png")
 
rgl.light()
rgl.viewpoint(theta = 180, phi = 20, fov = 60)
rgl.snapshot("2014-09-09cubes3d1.png")
 
 
rgl.viewpoint(theta = -20, phi = -20, fov = 60)
rgl.snapshot("2014-09-09cubes3d2.png")
 
 

Friday, September 5, 2014

1.2 Million Deaths from Ebola Projected within Six Months?

The World Health Organization, Samaritan's Purse, Doctors Without Borders, and other international medical emergency relief programs are desperately calling for additional resources in the international fight against Ebola, which has already killed thousands and is likely to kill thousands more even if a full arsenal of aid were made available.
Figure 1: The lines are projected values while the points are data points.

The World Health Organization released a statement on August 28th that the epidemic could afflict more than 20,000 people before it is brought under control. This, however, assumes full international backing for an intervention to control the deadly outbreak. Failure to fully support the WHO's plan would presumably allow the disease to continue to spread much as it already has.

At first a figure as high as 20,000 seems exaggerated, especially next to the roughly 3,000 cases reported the same day as the announcement. However, I believe that this estimate is vastly too small, and even it assumes an effective and well-funded international relief mission. Using a projection from all of the WHO reports to date, I calculate that if the disease continues to spread at its current rate, we will have more than 20,000 cases by October 24.

The report also states that it will likely take six to nine months to stop the epidemic. However, if nothing changes and the epidemic continues to rage as it currently does, my projections estimate that as many as 4.7 million people will have been infected and 1.2 million will have died by then.

These are extremely dire predictions, and I hope that they are purely extrapolations based on what will later be seen as the scariest month of this epidemic. However, the exponential growth model fits the data very well, and at least in the short term it should be expected to be fairly accurate.

All of this analysis is done in R and the code can be found on GitHub.

From 41 CDC Ebola reports I have assembled a small database of cases by country, listing the number of 'Suspected or Confirmed Cases', the number of 'Deaths' suspected to be associated with Ebola, and the number of 'Laboratory Confirmed' cases of Ebola. You can find and add to this database as a Google spreadsheet here. If you run the code yourself, it will import the spreadsheet data directly.

Mapping this data by country and fitting a local polynomial regression to give a fitted line for each country reveals a very disturbing trend. The current outbreak originated in Guinea, and though the disease continues to claim new victims there, it is much less worrisome than Sierra Leone and Liberia, where counts of suspected cases and deaths are growing exponentially.

Figure 2: The increase in deaths in Liberia is much steeper than in the other two heavily afflicted countries, Guinea and Sierra Leone.

Figure 3: The increase in laboratory confirmed cases in Liberia is much less steep than the increase in deaths, indicating that the poor medical infrastructure is unable to keep up with the number of diagnoses demanded.

Figure 4: The increase in deaths in Liberia is much steeper than in the other two heavily afflicted countries, Guinea and Sierra Leone.
By exponential growth, we mean that whatever the current number of infected people, we can expect them to infect some additional number of people proportional to the transmission rate. The problem with exponential growth is that while the inflow of new victims starts out small, the more victims there are, the more are added to the victim pool each day. At 2% growth per day, for example, 1,000 cases become about 1,020 the next day and roughly double within 35 days.
Figure 5: The total number of cases is rising extremely quickly.
When we look at the total numbers for each measure summed across countries, we arrive at the graph above.
From this graph it is clear that a simple linear model cannot fit well at all. Suspecting that the change over time might follow an exponential growth function, I take the natural log of the values mapped above.
Figure 6: A log transformation of the total number of cases creates a relatively linear relationship between time and number of cases reported.

This transformed graph provides an extremely disturbing confirmation that an exponential growth model is an appropriate way of modelling the spread of Ebola. In order to estimate the spread of Ebola, I define a simple model with a constant and a linear relationship between days since the outbreak was announced and the log of the measure we are concerned with:
$$\log(Y)=\alpha+\beta_1 \text{Day}$$
I estimate the model with weights based on the number of days into the outbreak, so that more recent observations are considered more important. I also discard the observations from the first 21 days, because we can expect the preliminary data from that period to be less accurate. The model above gives:


           Intercept          Day 
Suspected  4.38881946  0.02245505
Deaths     4.00491144  0.02096758
Laboratory 3.86052949  0.02314866


While the intercept estimates are generally of less interest, the coefficients on Day can be directly interpreted as approximate percent changes per day. Thus we can expect from the current data that each day will bring a little over 2% additional suspected cases, deaths, and laboratory confirmations.
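As a rough sketch of how this estimation can be set up (the full code is on GitHub), suppose the assembled reports sit in a data frame called ebola with columns day, Susp, Deaths, and Lab; these are hypothetical names used only for illustration:

# Weighted log-linear fit: drop the first 21 days and weight by day
# so that more recent reports count for more
fit_log_growth <- function(df, outcome, burn_in=21) {
  d <- df[df$day > burn_in & df[[outcome]] > 0, ]
  lm(log(d[[outcome]]) ~ day, data=d, weights=day)
}

fit <- fit_log_growth(ebola, 'Susp')
coef(fit)  # the Day slope of ~0.022 implies roughly 2% growth per day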

In order to allow the model to be a little more flexible in my projections, I also include a slightly more complex model with a squared term for the days since the announcement.
$$\log(Y)=\alpha+\beta_1 \text{Day}+\beta_2 \text{Day}^2$$
I use this model to project suspected cases, deaths, and laboratory results for the next three weeks. The rows up until today compare the expected values estimated from the model (EDeaths, ESusp, and ELab) with the data (Deaths, Susp, and Lab). The model fits the data quite well, with all estimates within 100 of the observed values and most much closer. Using this model, we can expect around 3,500 total deaths and 7,200 suspected cases by September 24. Things just get worse from there.
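The projection step can be sketched in the same spirit, again using the hypothetical ebola frame from the sketch above:

# Quadratic-in-days model, then project forward three weeks
fit2 <- lm(log(Deaths) ~ day + I(day^2), data=ebola,
           subset = day > 21 & Deaths > 0, weights=day)
future <- data.frame(day = 158:184)
future$EDeaths <- round(exp(predict(fit2, newdata=future)))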

           date day   Deaths EDeaths  Susp ESusp     Lab ELab
1   2014-03-25   1       59   89       86  140        0   49
2   2014-03-26   2       60   90       86  141        1   50
3   2014-03-27   3       66   90      103  143        4   51
7   2014-03-31   7       70   94      112  149       24   56
8   2014-04-01   8       80   95      122  151       24   57
9   2014-04-02   9       83   97      127  152       35   59
14  2014-04-07  14       95  102      151  161       52   66
17  2014-04-10  17      101  106      157  167       66   71
24  2014-04-17  24      122  115      197  182      101   83
28  2014-04-21  28      129  122      203  192      109   91
30  2014-04-23  30      136  125      208  197      112   95
37  2014-04-30  37      146  137      221  218      126  112
42  2014-05-05  42      155  148      231  235      127  126
51  2014-05-14  51      157  169      233  270      129  155
60  2014-05-23  60      174  196      258  315      146  190
65  2014-05-28  65      191  213      290  344      170  214
70  2014-06-02  70      199  232      341  377      186  240
73  2014-06-05  73      222  245      425  399      238  257
78  2014-06-10  78      244  269      462  440      253  289
79  2014-06-11  79      261  274      494  449      277  296
86  2014-06-18  86      337  313      528  517      364  348
92  2014-06-24  92      338  353      599  587      441  399
100 2014-07-02 100      467  416      759  700      544  481
105 2014-07-07 105      481  462      779  784      557  540
106 2014-07-08 106      518  472      844  803      626  552
112 2014-07-14 112      539  539      888  925      664  634
114 2014-07-16 114      601  564      964  971      706  665
122 2014-07-24 122      632  677     1048 1183      745  800
126 2014-07-28 126      672  744     1201 1311      814  877
129 2014-07-31 129      728  800     1323 1417      909  941
132 2014-08-03 132      826  860     1439 1533      953 1008
133 2014-08-04 133      887  882     1603 1574     1009 1032
137 2014-08-08 137      961  974     1779 1753     1134 1132
141 2014-08-12 141     1013 1077     1848 1956     1176 1242
142 2014-08-13 142     1069 1105     1975 2011     1251 1271
144 2014-08-15 144     1145 1163     2127 2127     1310 1332
148 2014-08-19 148     1229 1290     2240 2381     1383 1461
150 2014-08-21 150     1350 1360     2473 2522     1460 1530
151 2014-08-22 151     1427 1397     2561 2596     1528 1566
157 2014-08-28 157     1552 1641     3069 3094     1752 1800
158 2014-08-29 158          1686          3188          1842
159 2014-08-30 159          1733          3284          1885
160 2014-08-31 160     1848 1782     3707 3384     1848 1930 Update 9/8
161 2014-09-01 161          1831          3488          1975
162 2014-09-02 162          1883          3595          2021
163 2014-09-03 163          1936          3705          2069
164 2014-09-04 164          1991          3820          2117
165 2014-09-05 165     2097 2047     3944 3939          2167 Updated 9/11
166 2014-09-06 166          2106          4062          2218
167 2014-09-07 167          2166          4189          2270
168 2014-09-08 168          2228          4321          2323
169 2014-09-09 169     2296 2292     4293 4457          2378 Updated 9/13
170 2014-09-10 170          2359          4599          2433
171 2014-09-11 171     2400 2427     4784 4745          2491 Updated 9/13
172 2014-09-12 172          2498          4897          2549
173 2014-09-13 173          2572          5055          2609
174 2014-09-14 174          2647          5218          2670
175 2014-09-15 175          2725          5386          2733
176 2014-09-16 176          2806          5562          2797
177 2014-09-17 177          2890          5743          2863
178 2014-09-18 178          2976          5931          2930
179 2014-09-19 179          3065          6126          2998
180 2014-09-20 180          3157          6329          3069
181 2014-09-21 181          3253          6539          3141
182 2014-09-22 182          3351          6756          3215
183 2014-09-23 183          3453          6982          3290
184 2014-09-24 184          3559          7217          3367


Falseness of my Model
This model by definition cannot be true globally (into the distant future). This is obvious when we use the model to project out to one year: at that point the number of infected cases is estimated at 436 billion. Since the entire population of the Earth is only around 7 billion, we know that this cannot be true.

However, this kind of model can be a good approximation locally (in the near future). If it is a good approximation locally, then the next WHO report should list around 2,100 deaths and 4,060 suspected cases as of today.

So I ask the question, "is 1.2 million deaths a local or a global projection?" I cannot answer this, but it is certainly within the realm of feasibility, since Liberia alone has over 4 million people, Guinea 10 million, and Sierra Leone 6 million. The real question becomes, "do we think the ability of Liberia and the other afflicted nations to control the spread of Ebola will increase, decrease, or remain the same over time?"

Figure 7: The relationship over time between the number of laboratory confirmed cases and the number of suspected cases. If a country is able to apply effective diagnostic tools, this ratio should be high. In most countries we see a rise in the ability to diagnose Ebola, except in Liberia, where there is a steep decline.

From Figure 7 we can see that Liberia is significantly behind the other nations in its ability to diagnose Ebola. This, and the well-known lack of medical facilities, suggests to me that as the crisis escalates, Liberia's ability to maintain any sense of order, and with it any hope of controlling the spread of the disease, is likely to degrade. If this is the case, then it is quite possible that even this horrifying projection underestimates the pain and carnage likely to result from this outbreak.

What to Do?
News reports and the governments they report on seem to be placing a good deal of emphasis on investing in vaccines and treatment options. While all of these options are good, they are long-term options (6 to 12 months). In the meantime, every resource available must be used to contain and restrict the spread of this outbreak.

It is extremely foolish to think that any nation is immune to this disease. In the entire history of Ebola outbreaks up to the present, fewer than 10,000 people have been infected. This relatively low infection count, coupled with rapid mortality, has made it unlikely for the disease to significantly mutate within the human population.

However, if my projections are anywhere close to accurate, then the number of infected people is going to be much higher than has ever occurred previously. This will create many more hosts in which the virus could possibly mutate new traits that increase its transmission rate. These mutations could take the form of longer gestation periods, which might lead to a greater window between being infectious and being detectable.

Another possible trait might be the ability to go airborne, which would significantly increase its ability to be transmitted. Some scientists consider it very unlikely to become airborne because the virus is too heavy. This may be the case. However, since the disease becoming airborne could result in a global spread with an unprecedented number of deaths worldwide, it is more than prudent to heavily invest in controlling the number of new patients infected by this disease.


In addition, even if the disease does not mutate from its current state, it has shown itself to be extremely effective at being transmitted, with a large number of health workers becoming infected and dying from the disease. These health workers should have known how to control the spread of the disease and prevent infection. Do we really expect that if the disease were to enter any other nation on Earth, the general population would be better prepared to protect themselves than the specialists who have already fallen victim to this disease?

Thus, it is imperative that we do everything within our power to control the spread of this terrible disease. Even if my model has only a ten percent chance of being accurate over the next six months, we would be extremely foolish to risk not responding to this outbreak with every reasonable resource humanity can muster.

Thursday, September 4, 2014

Stata: Detecting deviations in input on double entry data

In this post I present code for detecting deviations in variable values for data that have been entered twice. First I simulate some data; then I detect deviations.

clear
set obs 300

* Define a class ID (this generates twenty observations per classroom)
gen cID = ceil(_n/20)

* Define a student ID.
* We have two entries per student which are defined
* only within each class from 0 to 9

gen sID = mod(ceil(_n/2)-1, 10)

* Imagine we have some variables that are inconsistently
* recorded at different times. With two entries per student,
* these periods fall out of sync with the student IDs,
* creating the within-student deviations we want to detect.

gen a = ceil(_n/3)
* a changes every 3 entries
gen b = ceil(_n/5)
* b changes every 5 entries
gen c = ceil(_n/7)
* c changes every 7 entries

* Let's also imagine that variables a and c are strings

tostring a c, replace

* Let's get an idea of what this looks like
list in 1/6
*     | cID   sID   a   b   c |
*     |-----------------------|
*  1. |   1     0   1   1   1 |
*  2. |   1     0   1   1   1 |
*  3. |   1     1   1   1   1 |
*  4. |   1     1   2   1   1 |
*  5. |   1     2   2   1   1 |
*  6. |   1     2   2   2   1 |


list in 21/22
*     | cID   sID   a   b   c |
*     |-----------------------|
* 21. |   2     0   7   5   3 |
* 22. |   2     0   8   5   4 |



* ----------------------------------------------------------------
* ----------------------------------------------------------------
* Now let's set up what we need to detect deviations

* Specify the group variables

global grouping cID sID

* Specify the variables to check
global checkvariables a b c

egen uniqueID = group($grouping)
* This gives us a total of 150 'groups', which is what we expect
* since we have 15 classes with 10 students each.

* Let's imagine that twenty of our entries are missing

forv i=1/20 {
  drop if _n==`=ceil(runiform()*_N)'
  }

* We would like to detect divergences in the variables a b and c
* within a student ID group.

* We would also like to detect if we only have one entry for a student

* First we grab the max uniqueID

quietly: sum uniqueID

* Now we do a student-by-student search
forv i=1(1)`r(max)' {
  preserve
  qui keep if uniqueID == `i'
  local divergence
  foreach v of varlist ${checkvariables} {
    quietly: tabulate `v'
    if (r(r)>1) local divergence `divergence' `v'
  }
  local sID = sID
  local cID = cID

  if ("`divergence'" != "") ///
    di "Student `sID' in class `cID' variable(s): `divergence'"
  if (_N==1) ///
    di "Student `sID' in class `cID' has only ONE entry"
  restore
}

* ----------------------------------------------------------------
* ----------------------------------------------------------------
* Output

Student 1 in class 1 variable(s): a
Student 2 in class 1 variable(s): b
Student 3 in class 1 variable(s): c
Student 4 in class 1 variable(s): a
Student 7 in class 1 variable(s): a b
Student 0 in class 2 variable(s): a c
Student 1 in class 2 has only ONE entry
Student 2 in class 2 variable(s): b
Student 3 in class 2 variable(s): a
Student 6 in class 2 variable(s): a
Student 7 in class 2 variable(s): b c
Student 9 in class 2 variable(s): a
Student 2 in class 3 variable(s): a b
Student 4 in class 3 variable(s): c
Student 5 in class 3 variable(s): a
Student 7 in class 3 has only ONE entry
Student 8 in class 3 variable(s): a
Student 9 in class 3 has only ONE entry
Student 1 in class 4 has only ONE entry
Student 2 in class 4 variable(s): b
Student 3 in class 4 has only ONE entry
Student 4 in class 4 variable(s): a
Student 7 in class 4 variable(s): a b
Student 8 in class 4 has only ONE entry
Student 0 in class 5 variable(s): a
Student 2 in class 5 has only ONE entry
Student 3 in class 5 has only ONE entry
Student 5 in class 5 variable(s): c
Student 6 in class 5 variable(s): a
Student 7 in class 5 variable(s): b
Student 9 in class 5 variable(s): a
Student 2 in class 6 variable(s): a b c
Student 5 in class 6 has only ONE entry
Student 7 in class 6 has only ONE entry
Student 8 in class 6 variable(s): a
Student 9 in class 6 variable(s): c
Student 1 in class 7 variable(s): a
Student 2 in class 7 variable(s): b
Student 4 in class 7 variable(s): a
Student 6 in class 7 variable(s): c
Student 7 in class 7 variable(s): a b
Student 0 in class 8 variable(s): a
Student 2 in class 8 has only ONE entry
Student 3 in class 8 variable(s): a c
Student 6 in class 8 variable(s): a
Student 7 in class 8 variable(s): b
Student 9 in class 8 variable(s): a
Student 0 in class 9 variable(s): c
Student 2 in class 9 variable(s): a b
Student 3 in class 9 has only ONE entry
Student 5 in class 9 variable(s): a
Student 7 in class 9 variable(s): b c
Student 8 in class 9 variable(s): a
Student 1 in class 10 variable(s): a
Student 2 in class 10 has only ONE entry
Student 4 in class 10 has only ONE entry
Student 7 in class 10 variable(s): a b
Student 0 in class 11 variable(s): a
Student 1 in class 11 variable(s): c
Student 2 in class 11 variable(s): b
Student 3 in class 11 variable(s): a
Student 6 in class 11 has only ONE entry
Student 7 in class 11 has only ONE entry
Student 8 in class 11 variable(s): c
Student 9 in class 11 variable(s): a
Student 2 in class 12 variable(s): a b
Student 5 in class 12 variable(s): a c
Student 7 in class 12 variable(s): b
Student 8 in class 12 variable(s): a
Student 1 in class 13 variable(s): a
Student 2 in class 13 variable(s): b c
Student 4 in class 13 variable(s): a
Student 5 in class 13 has only ONE entry
Student 7 in class 13 variable(s): a b
Student 9 in class 13 variable(s): c
Student 0 in class 14 variable(s): a
Student 1 in class 14 has only ONE entry
Student 2 in class 14 variable(s): b
Student 3 in class 14 variable(s): a
Student 6 in class 14 variable(s): a c
Student 7 in class 14 variable(s): b
Student 9 in class 14 variable(s): a
Student 2 in class 15 variable(s): a b
Student 3 in class 15 variable(s): c
Student 5 in class 15 has only ONE entry
Student 6 in class 15 has only ONE entry
Student 7 in class 15 variable(s): b
Student 8 in class 15 variable(s): a

Tuesday, August 5, 2014

Stata: Generate a Spatial Moving Average

Oftentimes we may be interested in generating a spatial moving average of a characteristic X. We may use this moving average to help control for heterogeneity in the population that is related to the spatial distribution of observations. In order to do that, we need a method of generating a spatial mean.



I code this manually because I do not have experience with spatial data in Stata and do not know the built-in command (assuming there is one). If you just want the spatial mean, you may prefer the built-in command. However, this method is flexible and easily modifiable: for instance, you could replace the Euclidean 2D distance formula with a 3D (or, really, nD) formula, and the moving-average statistic could easily be replaced by a moving variance or any other statistic that can be generated via the egen command. Thus this exercise might be useful to examine even if it is redundant.

global Nobs = 1000

clear
set obs $Nobs

* Generate 2D coordinates
gen latt  = runiform()*100
gen longg = runiform()*100

* Generate the variable of interest. The variable will
* have a random component and a spatially dependent
* component.

gen X = (latt+longg)/100+rnormal()

two (scatter latt X) (scatter longg X)





* We can see that though there is a general trend toward larger values as
* longitude or latitude increases, it is hard to identify any strong pattern.

* Now let's calculate the moving average of X for each
* observation. (There is probably a command for this
* which I do not know).

global meanrange=30

gen Xave = .
gen dist = .

forv i=1/$Nobs {
  * Calculate the distance of all points from obs i
  replace dist = ((latt-latt[`i'])^2+(longg-longg[`i'])^2)^.5
  * Calculate the mean of X if distance is within the range of interest
  egen tempx = mean(X) if dist<$meanrange
  replace Xave = tempx if _n==`i'
  drop tempx
}

drop dist

two (scatter latt Xave) (scatter longg Xave)
* Now, looking at the moving average, we can easily see the effect of
* location on the expected value of X.

Wednesday, July 30, 2014

More Readable Code with Pipes in R

Several blog posts have made mention of the 'magrittr' package, which allows values to be passed to functions in a pipe-style fashion (David Smith).

This stylistic option has several advantages:
 
1. Fewer nested parentheses are required
2. Functional operations now read from left to right
3. The organizational style of the code may be improved

The library provides a new operator, %>%, which tells R to take the value on its left and pass it to the function on its right as its first argument. Let us see this in action with some text functions.
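The basic equivalence is easiest to see with a trivial example before moving on to strings:

library('magrittr')

sqrt(16)              # [1] 4
16 %>% sqrt           # [1] 4, the same call written as a pipe
16 %>% log(base=2)    # equivalent to log(16, base=2), i.e. [1] 4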




require('magrittr')
 
# Let's play with some strings
 
str1 = "A scratch? Your arm's off."
str2 = "I've had worse."
 
str1 %>% substr(3,9)   
# [1] "scratch"
 
str1 %>% strsplit('?',fixed=TRUE)
#[[1]]
#[1] "A scratch"        " Your arm's off."
 
# Pipes can be chained as well
str1 %>% paste(str2) %>% toupper()
# [1] "A SCRATCH? YOUR ARM'S OFF. I'VE HAD WORSE."
 
# Let's see how pipes might work with drawing random variables
 
# I like to define a function that allows an element by element maximization
 
vmax <- function(x, maximum=0) x %>% cbind(maximum) %>% apply(1, max)
-5:5 %>% vmax
# [1] 0 0 0 0 0 0 1 2 3 4 5
 
# This is identical to defining the function as:
vmax <- function(x, maximum=0) apply(cbind(x, maximum), 1, max)
vmax(-5:5)
 
# Notice that the latter formulation uses the same number of parentheses
# and may be more readable.
 
# However, recently I was drawing data for a simulation in which I wanted to
# draw Nitem values from the quantiles of the normal distribution, censor the
# values at 0, and then randomize their order.
 
Nitem  <- 100
ctmean <- 1
ctsd   <- .5
 
draws <- seq(0, 1, length.out = Nitem+2)[-c(1,Nitem+2)] %>% 
         qnorm(ctmean,ctsd) %>% vmax %>% sample(Nitem)
 
# While this looks ugly, let's see how much worse it would have been without pipes
draws <- sample(vmax(qnorm(seq(0, 1, length.out = Nitem+2)[-c(1,Nitem+2)]
                  ,ctmean,ctsd)),Nitem)
 
# Both functional sequences are ugly, though I prefer the first, which reads
# as: seq is passed to qnorm, then to vmax, then to sample
 
# A few things to note about the %>% operator. If you want to send the value
# to an argument which is not the first, or to a named argument, use '.'
 
mydata <- seq(0, 1, length.out = Nitem+2)[-c(1,Nitem+2)] %>% 
          qnorm(ctmean,ctsd) %>% vmax %>% sample(Nitem) %>%
          data.frame(index = 1:Nitem , theta = .)
 
# Also note that the operator's precedence is not as low as you might think
# it should be. Thus:
 
1 + 8 %>% sqrt
# [1] 3.828427, since this parses as 1 + sqrt(8)
 
# Rather than
(1 + 8) %>% sqrt
# [1] 3