Monday, June 17, 2013

Zombie Apocalypse Survival Test - R-Powered (using Concerto)

This test is a first attempt to seriously assess the ability of individuals to survive a zombie apocalypse.  It is administered using Concerto, the R-powered open-source testing platform developed at the University of Cambridge.

The test administers a number of questions which are expertly chosen to evaluate the likelihood of an individual surviving a zombie apocalypse.  The R code which powers the test can be found on GitHub.  The test makes extensive use of the supplementary RConcerto package that I am working on.

The purpose of the package is to facilitate test creation using Concerto.  Many of the functions either wrap Concerto functions or build HTML objects, such as the function mk.facebook(link), which takes a link and turns it into a Facebook share button.
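For instance, such a helper can be little more than a wrapper around Facebook's sharer URL.  Below is a minimal sketch of the idea (an illustration only, not necessarily the RConcerto implementation):

# Sketch of an HTML-building helper in the spirit of mk.facebook(link);
# this is an illustration, not the package code.
mk.facebook <- function(link) {
  paste0('<a href="https://www.facebook.com/sharer/sharer.php?u=',
         URLencode(link, reserved = TRUE),
         '" target="_blank">Share on Facebook</a>')
}

mk.facebook("http://www.econometricsbysimulation.com")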

In order to implement the test, I run the code straight from a Dropbox permalink using the function dropbox.eval(), which I demonstrated in a previous post and which is also contained within the RConcerto package.

Please try out this test at:

Zombie Survival Test

Friday, June 14, 2013

Sudoku Automation Solver Challenge - R

On a recent flight I was bored waiting for the plane to land and I tried out the electronic sudoku game that they had offered.  I found the game surprisingly interesting as I realized that it is far more entertaining when you cannot use paper or pencil to augment the cognitive solution seeking process.  In addition, having your score based on your time presented an objective challenge and measure to test yourself against.

Figure 1: A sample Sudoku grid
After that I decided to design a Sudoku generator, which I have posted here:

https://github.com/EconometricsBySimulation/2013-06-14-Sudoku

It has been an entertaining exercise figuring out methods to produce matrices that conform to the specifications of a Sudoku table.  That is, a 9x9 grid in which the numbers 1 through 9 appear without duplicates in every row, every column, and every 3x3 subgrid.

Figure 2: Ultimate Sudoku
You can see from the R script (and the grid it produced, Figure 1) that I am able to accomplish this very easily through a randomized guess-and-check process (with the computer finding a solution in roughly 1 in 50-500 attempts) that is capable of drawing a practically unlimited number of grids (of course the number of possible grids is finite, though very large).
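To give a flavor of what a randomized guess-and-check generator can look like, here is a minimal sketch (an illustration of the approach, not the script from the repository): it fills the grid cell by cell with a random admissible value and restarts whenever it paints itself into a corner.

# Randomized guess-and-check Sudoku grid generator (illustration only).
gen.sudoku <- function(max.tries = 1e5) {
  for (attempt in 1:max.tries) {
    grid <- matrix(0L, 9, 9)
    failed <- FALSE
    for (i in 1:9) {
      for (j in 1:9) {
        # Values already used in this row, column, and 3x3 block
        block <- grid[3 * ((i - 1) %/% 3) + 1:3, 3 * ((j - 1) %/% 3) + 1:3]
        used  <- c(grid[i, ], grid[, j], block)
        candidates <- setdiff(1:9, used)
        if (length(candidates) == 0) { failed <- TRUE; break }
        grid[i, j] <- if (length(candidates) == 1) candidates
                      else sample(candidates, 1)
      }
      if (failed) break
    }
    if (!failed) {
      cat("Found a valid grid on attempt", attempt, "\n")
      return(grid)
    }
  }
  stop("No grid found within max.tries attempts")
}

gen.sudoku()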

I have also made a variant that I call Ultimate Sudoku, which conforms to the standard Sudoku rules and adds one more: each position within a 3x3 block forms a set with the corresponding positions in the other 3x3 blocks, and that set may contain no duplicate entries.  For instance, the first square of one 3x3 block cannot share a value with the first square of any other block.  An example table that I generated using R is shown in Figure 2.

I don't think it is likely that Ultimate Sudoku will take off, but for hard-core Sudoku players this variant would allow for more challenging games, since players must keep track of four overlapping sets of elements rather than three.  Under this setup, more spaces could effectively be left blank.  The R script is also capable of generating these grids, though it is a bit more challenging to find ones that conform to all four rules (usually around 1 in 4,000-10,000 attempts).

I have also defined a function to strategically remove values from the plotted grid so that players are faced with an incomplete puzzle.  I can vary the difficulty of these grids by specifying the average number of missing elements (from the set of 1 through 9) in each grid section.  You can see examples of problem grids I have generated in Figures 3 and 4.

Figures 3 and 4
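A minimal sketch of how such a removal step might look (a hypothetical helper for illustration, not the repository's function): each cell is blanked with a probability chosen so that, on average, the requested number of elements is missing from each 3x3 section.

# Blank out cells of a completed grid to create a puzzle;
# avg.missing is the expected number of blanks per 3x3 section.
make.puzzle <- function(solution, avg.missing = 4) {
  p <- avg.missing / 9                  # per-cell removal probability
  blank <- matrix(runif(81) < p, 9, 9)
  solution[blank] <- NA
  solution
}

puzzle <- make.puzzle(gen.sudoku(), avg.missing = 5)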

What I realized, though, is that my system is still not perfect, because it is possible to come up with grids that do not have solutions, which would present a problem.  What is really needed is an algorithm capable of doing what Sudoku players do: solving the tables.  So rather than taking the next step and designing an algorithm to solve my sample tables myself, I thought that perhaps some of my readers out there might be interested in attempting to design an algorithm that solves these potential Sudoku tables and identifies which ones are not feasible.

So, that is the challenge!  It should be easy to generate and map the tables by running the code from the GitHub repository.
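For anyone looking for a starting point, a simple recursive backtracking solver might look like the sketch below (a rough illustration only, not a reference solution); it returns a completed grid if one exists and NULL if the puzzle is infeasible.

# Backtracking sketch: fill the first blank (NA) cell with each
# admissible value in turn and recurse; return the completed grid,
# or NULL if no completion exists.
solve.sudoku <- function(grid) {
  blanks <- which(is.na(grid), arr.ind = TRUE)
  if (nrow(blanks) == 0) return(grid)          # nothing left to fill
  i <- blanks[1, 1]; j <- blanks[1, 2]
  block <- grid[3 * ((i - 1) %/% 3) + 1:3, 3 * ((j - 1) %/% 3) + 1:3]
  used  <- c(grid[i, ], grid[, j], block)
  for (v in setdiff(1:9, used)) {
    grid[i, j] <- v
    filled <- solve.sudoku(grid)
    if (!is.null(filled)) return(filled)
  }
  NULL                                         # dead end: backtrack
}

solve.sudoku(make.puzzle(gen.sudoku()))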

Tuesday, June 11, 2013

More explorations of Shiny

I have continued to explore the functionality of the Shiny package released by the RStudio team and I have been increasingly impressed.  The code fits together very cleanly and is easy to manipulate or add to.  If you have some knowledge of HTML or JavaScript, Shiny presents an excellent opportunity for developing web apps with a background in R.

In this post I present my recent explorations of Shiny, which include a simple yet effective hit counter, a new screen layout allowing for three panels, and a demonstration of how to add a simple HTML line break as well as a link to a Shiny app.  I will copy excerpts of the code dealing with these features below, though the entire code for the app can be found at https://github.com/EconometricsBySimulation/2013-06-11-Shiny-Exploration
(You will need the shiny package and the shinyIncubator package.)

1. Make a hit counter
The hit counter is easily made with the following code in the server.R file:
SP <- list() # Server parameters
# Record the number of people who have used the app
#   since initiation on the server.
SP$npers <- 0

shinyServer(function(input, output) {
  # The shinyServer function runs every time the domain is called.
  # Use <<- to assign to the global server environment.
  SP$npers <<- SP$npers + 1
...
})

In the ui.R file there is a single function call at the appropriate place:

# ui.R
...
  # Display the total number of hits on the app.
  h5(textOutput("hits")),
...
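For reference, the same idea fits in a single self-contained sketch (a minimal illustration using shinyApp, not the app's actual code): the counter lives outside the server function so it persists across sessions, and <<- increments it whenever a new session connects.

library(shiny)

# Minimal self-contained hit counter sketch (illustration only).
SP <- list(npers = 0)   # server-level state shared across sessions

server <- function(input, output) {
  SP$npers <<- SP$npers + 1   # one increment per new session
  output$hits <- renderText(paste("Hits since server start:", SP$npers))
}

ui <- bootstrapPage(h5(textOutput("hits")))

shinyApp(ui, server)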

2. Allow for three panels in the user interface.  In order to accomplish this I simply modified the pageWithSidebar function, replacing 'div(class = "row-fluid", sidebarPanel, mainPanel)' with 'div(class = "row-fluid", left, middle, right)'.  I tried to write a function that more generally took ... but could not figure out exactly the right syntax.  The new function is defined below:

# This is a UI page with three panels
threepage <- function(headerPanel,left,middle,right) {
  bootstrapPage(div(class = "container-fluid", div(class = "row-fluid",
  headerPanel), div(class = "row-fluid", left,  middle, right)))}
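A hypothetical call to threepage might look like the following (the panel contents and span classes are invented for illustration); giving each panel a Bootstrap span class lets the three columns sit side by side:

shinyUI(threepage(
  headerPanel("Three panel demo"),
  div(class = "span4", wellPanel("Left panel")),
  div(class = "span4", wellPanel("Middle panel")),
  div(class = "span4", wellPanel("Right panel"))
))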

3. Insert an HTML break and an external link.  Both of these things are extremely easy.  Many HTML tags already have Shiny functions which handle them.  However, there are quite a few tags, so not all of them have an associated function.  A typical HTML function is br, which returns:
> br()
[1] "<br/>"
attr(,"html")
[1] TRUE
when called.  In terms of HTML, this tag inserts a line break.  If you want to include HTML for which there is no pre-programmed function, you can use the HTML function:
> HTML("<hr/>")
[1] "<hr/>"
attr(,"html")
[1] TRUE
In order to insert a link into a Shiny app, you could either use an HTML('<a href="...">...</a>') type sequence or use the built-in function:
a("Text", href="link.com")

Personally I prefer the built-in functions because they are appealing and concise for an R programmer.

Take a look at the app at:
App Link

Saturday, June 8, 2013

Bulk search for domain names using R

# There are several domain name services that allow 
# for bulk searching of domain names, for example:
# http://www.godaddy.com/bulk-domain-search.aspx
# http://www.namestation.com/bulk-domain-search
# However, they do not provide any wildcard support 
# and instead expect users to supply their own list.
# That is not a problem if you are an R coder.
 
# This single command will copy to the clipboard a list 
# of mysite1 mysite2 mysite3....mysite500. You can then 
# paste it directly into the godaddy window.
writeClipboard(paste0("mysite", 1:500))

# Using the built in vector letters will produce:
# mysite-a mysite-b mysite-c....mysite-z
writeClipboard(paste0("mysite-", letters))
# I have also written a bulk search function, but 
# I am not sure that it is really any better than 
# the above commands since it is not much more powerful.
 
# However, it does demonstrate how one might 
# generate a wildcard search algorithm.
bulk.search <- function(intext) {
  # Add spaces before and after
  intext <- paste0(" ", intext, " ")
  # Declare the wildcards
  card <- c("|", "?", "#")
  # Declare the values each wildcard expands to
  cardvalues <- list(c(letters, 0:9), letters, 0:9)
  # A vector for counting wildcard detections
  ncards <- NULL

  for (i in 1:length(card)) ncards[i] <-
    nchar(gsub(sprintf("[^%s]", card[i]), "", intext))
  # Allow at most one wildcard in the input
  stopifnot(sum(ncards) < 2)

  # If a wildcard is present, expand the intext
  if (sum(ncards) == 1) {
    intext <- unlist(strsplit(intext,
                     card[ncards == 1], fixed = TRUE))
    intext <- paste0(intext[1],
             cardvalues[[(1:3)[ncards == 1]]], intext[2])
  }

  intext <- gsub(" ", "", intext) # Remove the extra spaces
  writeClipboard(intext)
  cat(paste("Copied to clipboard:\n",
        paste(intext, "\n", collapse=" ")))
  intext
}
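# The wildcards expand as follows: "?" becomes the letters a-z, 
# "#" becomes the digits 0-9, and "|" becomes both, so for example 
# bulk.search("mysite#") copies mysite0, mysite1, ..., mysite9 
# to the clipboard.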
bulk.search("mysite?")
bulk.search("mysite|")
bulk.search("mysite#")

Friday, June 7, 2013

A Shiny App Goes Viral

I am not sure how many of you have seen this Business Insider article.  It is basically about a Shiny app created by Joshua Katz at NC State.  It is really fun to play with.

With nearly a million Facebook likes, this web app built using R and Shiny is a clear demonstration of how effective and professional Shiny can be.

Surprisingly, even though the app is hosted on Amazon's web servers, it seems a bit laggy.  Given the immense traffic the article is generating, I hope Joshua has some plan to pay Amazon.

Though I cannot be sure because I do not have access to the code, the amazing thing is that the whole server-side code developed by Joshua was probably no more than 100-200 lines of R script.
The Shiny App


The Article

Thursday, June 6, 2013

Bootstrapping Time Series Data

* Every so often someone asks how to bootstrap time series data.

* The answer is neither easy nor clear (at least as far as I know, though there are many people more expert on the subject than I am).

* The problem is that time series data is by nature linearly dependent on its own past values (autocorrelated).

* Let's see how this looks:

clear

set obs 300

gen t = _n
gen x = 0
gen y = 0
gen u = rnormal()*7.5

local N=_N

* You must fill in the simulated data sequentially
qui forv i=2(1)`N' {
 replace x = rnormal() + .75*x[`i'-1] if t==`i'
 replace y = x*1.5 + .75*y[`i'-1] + u if t==`i'
}

* Let's see what it looks like
* I will first standardize the two.
egen ystd = std(y)
egen xstd = std(x)

two (line ystd t) (line xstd t), name(original, replace) legend(off) ///
          title("We can see that the two series are moving together")



* In order to estimate the relationship between y and x properly, we must account for the linearly dependent structure of the data.
reg y x

* This estimate is obviously very biased.  Let's add a lag of y.
gen yl = y[t-1]

* Now we can see that our estimate on x is much closer to our true (1.5).
reg y yl x

* It is worth noting that at this point it is completely unnecessary to include any lagged values of x.
* Though this is only because we are assuming contemporaneous exogeneity between y and x.

* Now let's make the obvious mistake and try to bootstrap.
bs: reg y yl x

* Wow, our estimates look great!  .... Not quite.

* The problem is that we are no longer truly bootstrapping our data, because in the resampled data the variable yl no longer holds the value of y from the preceding observation.

* Drawing on a previous post:
* http://www.econometricsbysimulation.com/2012/07/write-your-own-bootstrap-command.html

cap program drop bootme
program bootme
    $nq preserve
    syntax anything [, Reps(integer 100) c1(string) c2(string) c3(string) c4(string) c5(string) c6(string) Keep(string) Noisily]
    if "`noisily'" == "" local nq quietly
    if "`noisily'" != "" local nq
    `nq' gen bootmatcher = _n-1
    `nq' gl boot_N=_N
     tempfile drawfile
    `nq' save `drawfile' , replace
  `nq' clear
  `nq' set obs `reps'
  foreach v in `keep' {
   local vtemp=strtoname("v`v'")
  `nq' gen `vtemp'=.
  }
  tempfile returnlist
  `nq' save `returnlist' , replace
  di as text _newline "Bootstrap replications ({result:`reps'})"
  di "----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5"
  forv i=1/`reps' {
  clear
  `nq' set obs $boot_N
  `nq' gen bootmatcher = int(runiform()*$boot_N)
  `nq' sort bootmatcher
  `nq' merge m:1 bootmatcher using `drawfile', keep(3)
  * Making a slight modification to the code here
  gen tempvarzzz = rnormal()
  sort tempvarzzz
  `nq' `c1'
  `nq' `c2'
  `nq' `c3'
  `nq' `c4'
  `nq' `c5'
  `nq' `c6'
  drop tempvarzzz

  `nq' use `returnlist', clear
  foreach v in `keep' {
    local vtemp=strtoname("v`v'")
    `nq' replace `vtemp'=`v' if _n==`i'
  }
  `nq' save `returnlist', replace
  if `i'/50 == int(`i'/50) di ".  " `i'
  else di _continue "."
  }
  sum
  `nq' restore
  gen tempvarzzz = rnormal()
  sort tempvarzzz
  `c1'
  `c2'
  `c3'
  `c4'
  `c5'
  `c6'
  drop tempvarzzz
end

* We can see that we get basically the same results.
bootme 1, r(200) k(_b[x]) c1(reg y yl x)

* However, now let's add a line of code that generates y lagged as a function of the bootstrapped data.
bootme 1, r(200) k(_b[x]) c1(gen ylboot = y[t-1]) c2(reg y ylboot x)

* In this formulation, unsurprisingly ylboot has almost no explanatory power.
gen tboot = _n

two (line ystd tboot) (line xstd tboot), name(boot1, replace) ///
       title("Bootstrapped data is not autocorrelated")  legend(off)

graph combine boot1 original, rows(2)



* We can see that clearly bootstrapping our results in this way is not going to work.

* One of my classmates is using a method that I think he got from the literature which is a little more complicated.

* First he estimates his model.

reg y yl x
* Then predicts the fitted and residuals

predict yhat
predict uhat, resid

* Then he bootstraps those values and reconstructs the y values from the fitted values by randomly drawing the errors.

* Let's see if we can accomplish this with the previous command.

* First we need to save the fitted values to be grabbed later.
global N=_N
forv i=1(1)$N {
  gl yhat`i' = yhat[`i']
  gl x`i' = x[`i']
}

* Write a small program to load yhat values back into the memory
cap program drop datasetup
program datasetup
  cap gen yhat0 = .
  cap gen ynew = .
  cap gen ynewlag = .
  cap gen xnew = .

  forv i=1(1)$N {
    qui replace yhat0 = ${yhat`i'} if _n==`i'
    qui replace xnew = ${x`i'}  if _n==`i'
    if `i'>1 qui replace ynew = yhat0+uhat
  }
  replace ynewlag=ynew[_n-1]
end

datasetup
corr y ynew x xnew
* Initially these values will be identical

bootme 1, r(200) k(_b[x]) c1(datasetup) c2(reg ynew ynewlag xnew)
* As with my classmate's results, the new confidence interval for the x coefficient does not even include the original estimate ~1.2.

* Looking at the data generated in this method
egen ynewstd = std(ynew)
egen xnewstd = std(xnew)

replace t=_n
two (line ynewstd t) (line xnewstd t), name(original, replace) legend(off) ///
          title("The new time series")



* Clearly this method is not accomplishing what we would like, at least from the perspective of replicating the estimated coefficients.

* However, interestingly (though not necessarily believably), the standard errors from the bootstrap are close in size to the standard errors from the OLS regression above (reg y yl x).

* Hope this is helpful.  I will send my friend this link to see if he has anything to add.

Wednesday, June 5, 2013

Running R Scripts Directly From Dropbox

I have written a little function that allows users to run R scripts out of Dropbox directly from any location.  It was aided by this post on biobucket.  The reason I am particularly interested in this feature is that I often use a server to run my R code, which would otherwise require me to upload my R script to the server every time I make a change.  Instead, I can instantly test my code on the server by having it fetch and run the script remotely, since Dropbox is very good at detecting changes and updating its files instantly.

# Evaluate code directly from a dropbox file
dropbox.eval <- function(x, noeval=F) {
  require(RCurl)
  # Load the file into memory as a text file with getURL
  intext <- getURL(paste0("https://dl.dropboxusercontent.com/",x), 
                        ssl.verifypeer = FALSE)
  # For some reason \r characters seem to be frequently 
  #   inserted into the script files that I save.  They 
  #   present a problem so I remove them using gsub.
  intext <- gsub("\r","", intext)
  # Evaluate the input file.
  if (!noeval) eval(parse(text = intext), envir= .GlobalEnv)
  # Finally return the dropbox script as text.
  return(intext)
}
 
dropbox.eval("sh/1fjpw58gko634ye/C74hTEkknP/Demo.R")

I also found that there is an R package built specifically for interfacing with Dropbox, rDrop, which looks pretty great: https://github.com/karthikram/rDrop.  I have not yet tried it.

This command can also be found at
https://github.com/EconometricsBySimulation/RConcerto/blob/master/Package.R

I am starting work on a package (RConcerto) meant to provide tools that help in the production of Concerto online tests.  It also includes tools for generating HTML objects.  More on this in the future.