Thursday, May 1, 2014

How to Code Something ‘New’ in R

Programming New Things
I have been programming in R for more than half a decade and can now fondly look back on the days when I went on a spring break with Venables' little blue book (now out of print).  Back then I was entertained and wowed by R’s object-oriented environment and its intuitive language, and by the way it seemed you could do so much with short, precise commands.  I remember being particularly impressed with how seamlessly R could manipulate matrices or vectors, and especially with how easy it was to write new functions.

But all was not rainbows and sunshine.  I was less impressed by how challenging it was to browse data (for example head(mydata)), to estimate common statistical models, which required overly complex commands (for example summary(glm(y ~ x1 + x2 + x3 - 1, family = binomial(link = "probit"), data = mydata))), or to modify elements within datasets.  However, over time I grew more comfortable with R, and Google seems to have also learned that R is a word.

In turn, I began publishing more extensively in R on my blog and now have over 100 posts in the language.  Each of these posts is unique, and the majority required that I learn new techniques in order to accomplish the tasks I had set out for myself.  As a result, I have made an extensive and prolonged study of how to teach oneself new things in R.  In this post I will describe my process for learning new R techniques.

0. Think through the task
Imagine each step you want your code to accomplish.  If you can, write out a diagram or step-by-step process that you think will be sufficient to accomplish the task at hand.  Having a well-thought-out plan could save you many an hour of frustration.  Work from the diagram.  If you find that you are getting frustrated or sidetracked during your coding, go back to your diagram and investigate alternative paths to your goal.

1. Consult an R guru
If you are fortunate enough to have an R guru in your life, do not neglect this incredible resource.  Asking someone else for help with a problem in R is neither humiliating nor a sign of ignorance: R can be used for so many things that it seems nearly impossible to me for anybody to have a monopoly on knowledge of it.  That said, do not use up your good graces with your R guru too quickly.  R gurus tend to be extremely well paid and therefore have a high opportunity cost of their time.

2. Figure out what your question is
Oftentimes we have a question but have difficulty finding the right words to express it.  Usually the simplest questions are the most difficult to ask.  If you have an R guru available, he or she can probably tell you what your question is.  Absent such a guru, however, do your best to figure out what it is you want to ask.  Search online for words similar to how you would describe what you wish to accomplish.  If you want to join two datasets together by a common variable, look for “join”… no, “group”… no, “merge”… Bingo!  In this amazing age of information, figuring out how to express your question is 90% of solving it.
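Once the right word turns up, the answer is usually only a line or two of code.  As a minimal sketch of the “merge” case, using two small data frames that I made up for illustration (the names students and scores and the key column id are not from any real dataset):

```r
# Two small made-up data frames that share the key column "id"
students <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
scores   <- data.frame(id = c(2, 3, 4), score = c(88, 92, 75))

# Inner join: keeps only the ids present in both data frames
both <- merge(students, scores, by = "id")
print(both)

# Left join: keeps every row of students and fills missing scores with NA
left <- merge(students, scores, by = "id", all.x = TRUE)
print(left)
```

The all.x, all.y and all arguments of merge() control which unmatched rows are kept.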

3. Search Google
Yup, almost all of my questions are directly answered by sticking them into a Google search.  If you are asking a question, you are probably not the first one to ask it.

4. Search R’s Interactive Documentation
If you know a command in R which is close to the command that you want, then it is often fruitful to look up that command with help(command) or ?command, then scroll to the bottom and search through the list of related commands.
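A short sketch of what this looks like at the console, using merge purely as a stand-in for whatever command you already know:

```r
# Open the help page for a command you already know (same as ?merge);
# the "See Also" section near the bottom lists related commands
help(merge)

# List objects on the search path whose names contain "merge"
apropos("merge")

# Search the installed help pages by keyword when you do not know
# the exact command name (same as ??merge)
help.search("merge")
```

These calls are meant for an interactive session; how the help pages are displayed depends on your R front end.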

5. Search and Possibly Post a Question on Stack Overflow
The only reason I did not list searching Stack Overflow earlier is that if you ask the question sufficiently clearly, searching Google will probably bring you to a Stack Overflow page which answers your question.  However, if your other attempts to find a solution have failed, you should strongly consider posting your question on Stack Overflow.

Stack Overflow is a magical place in which the community of R users is so responsive that the average time I have experienced between posting a question and getting an answer is less than 10 minutes.  Not only are Stack Overflow users extremely responsive, they are also generally very nice and helpful.  I believe this is because the reputation point system on the site rewards users for providing answers to your questions.

That said, whenever possible, check that your question has not already been answered on the site.  Also, be as clear as possible when asking questions.  Giving clear examples of what you would like to accomplish is always encouraged.  Please read over the Asking Guidelines before posting.
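One way to give a clear example is to include a tiny dataset and the exact code that misbehaves, so that anyone can paste it into R and see the same thing.  The sketch below assumes a made-up data frame called mydata with a missing value; it is only an illustration of the idea, not a required format:

```r
# A tiny made-up dataset that reproduces the problem
mydata <- data.frame(id = 1:3, x1 = c(0.5, 1.2, NA))

# dput() prints R code that rebuilds the object exactly,
# ready to paste into a question
dput(mydata)

# Show what you tried and what you got ...
mean(mydata$x1)                 # NA, because of the missing value
# ... and state what you wanted instead
mean(mydata$x1, na.rm = TRUE)   # 0.85

# Helpers often ask which R and package versions you are running
sessionInfo()
```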

There is an additional reason to be an active member on Stack Overflow.  Though I have no data to back up this assertion, I believe having a good reputation on the site is a good indicator of programming ability and would make a good résumé item (though I do not have a particularly fantastic reputation, as I have only recently become active).

6. Contact Package Specific Help
There are several packages in R, such as shiny or ggplot2, which have active package-specific user groups.  These users can be extremely helpful and timely in responding to questions.
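If you are not sure where a package's community lives, the package's own description file is a reasonable place to start.  A small sketch, using the stats package only because it ships with every R installation; substitute the name of the package you actually need help with:

```r
# "stats" ships with R; replace it with your package, e.g. "ggplot2"
pkg <- "stats"

# Name and contact address of the package maintainer
maintainer(pkg)

# Project home page and issue tracker, if the package lists them
# (NA when the field is absent)
packageDescription(pkg, fields = "URL")
packageDescription(pkg, fields = "BugReports")
```

Not every package fills in the URL or BugReports fields, so an NA here simply means you will have to search for the user group by name.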

7. Submit to an R Email List
This definitely ranks as the last option, as I have had mixed success with the R email lists.  Some questions which seem worthy are readily answered, while others seem to be ruthlessly ridiculed.  Unfortunately, I cannot tell ex ante how any question will be received, as that is entirely at the discretion of whoever feels inclined to respond.  In addition, becoming a subscribing member seems to result in my mailbox quickly being overwhelmed with emails from the list.  That being said, some very generous folks on the R email list have been so good as to answer some very unusual questions I have posed in the past.