Monday, June 4, 2012

A note on Simulations in Stata

At this point I would like to make a brief comment to anybody who reads too much into my many posts on programming in Stata.  One might think that because of these many posts I "endorse" Stata over other statistical programs.  This is only partly true!  I believe that Stata is very good for the beginner getting into econometrics who would like a large built-in database with easy commands to execute a wide variety of powerful operations.  Likewise, for the advanced user who has spent hundreds of hours learning the oddities of particular Stata commands and operations, Stata is extremely powerful.

However, for the mid-level user of Stata, or for an "expert" learning a new element of Stata, the way can be extremely frustrating.  In general, Stata documentation is short on examples, especially for any command or function that is geared towards programmers.  It is easy to find examples of how to run ordinary least squares, yet figuring out how to write your own program and syntax can mean countless hours spent in trial and error.

When I first started writing simulations in Stata, I did not know if the complexity of simulations that I needed was even possible.  To this day I believe I might be the only person crazy enough to write such detailed simulations in Stata.  There is good reason for this.  The up-front environment in Stata is extremely limited.  Looking at my previous examples of merging in data in my simulations on value added and malaria, it is easy to imagine a different language that allows, for instance, multiple data sets to be held in memory simultaneously.  Such a capability would make everything so much easier for a simulation designer like me.  For this reason and others, I strongly recommend programming simulations in R.

R is the superior language for programming simulations for many reasons.  It is free, its documentation is in general superior to Stata's, its community is more active, and, most importantly, its environment is more flexible, especially when it comes to handling multiple data objects and programming user-written functions.

I will, however, continue to post simulations written in Stata on this blog, mostly because nobody else is doing this kind of work in Stata publicly so far as I know.  And no matter what I say, people will still continue to use Stata.  I have been a reluctant convert to Stata, and unless Stata is able to pull a rabbit out of a hat and majorly reform its environment, I am going to continue to be a reluctant advocate of Stata for use in complex simulations.

Therefore please don't be surprised if some or all of my posts begin to be in R.

1 comment:

  1. As someone who does occasionally totally insane things that Stata wasn't designed to do in Stata, I agree with you that some functionality is missing. The multiple data sets not being held in memory is absolutely aggravating, but there are workarounds. One thing I've done is save the values I need from another dataset in global macros, or somehow develop a way of converting the values into a format that would work in a merge.

    I guess the point is - Stata has some annoying shortcomings, but there's always a way to do anything. I suppose that would be true of BASIC too, though.