Saturday, September 8, 2012

Mata speed gains over Stata

* The inclusion of Mata as an available alternative programming language for Stata users was a great move by Stata.

* Mata generally runs much faster than code written at the surface level in Stata.

* In Stata, each pass through a loop is interpreted as it runs, creating a lot of work for the machine.

* In Mata on the other hand, the entire loop is compiled prior to running.

* Let's see how this works.

* Let's say we want to add up the squares of the numbers 1 through 1000000.

* Method 1: Surface loop

timer clear 1
timer on 1
local x2 = 0

forv i = 1/1000000 {
  local x2 = `x2'+`i'^2
}
di `x2'

timer off 1
timer list 1

* On my laptop, this takes about 13.5 seconds
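
* A commenter on this post reports that a scalar seems faster than a local in this loop.
* A minimal sketch of that variant, assuming the scalar name s2 (hypothetical, not in the original) is free to use:
timer clear 1
timer on 1
scalar s2 = 0

forv i = 1/1000000 {
  // Update the scalar directly instead of re-expanding a local macro on each pass
  scalar s2 = s2+`i'^2
}

di scalar(s2)

timer off 1
timer list 1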

* Method 2: Mata loop
timer clear 1
timer on 1
mata
  // Initialize the accumulator before the loop.
  x2 = 0
  // This command can be read as: start i at 1,
  // keep looping so long as i is less than or equal to 1000000,
  // and add 1 to i after each pass.
  // The third argument, i++, looks a little fishy but it is syntax
  // that has been around for a while (at least since C).
  // It is identical to writing i=i+1.
  // Following the for statement we can immediately place a single-line command.
  for (i = 1; i <= 1000000; i++) x2=x2+i^2
  // Typing x2 by itself, with nothing done to the value, makes Mata display it.
  // R handles this identically.
  x2
end

timer off 1
timer list 1

* In contrast, my computer completed the loop using Mata in .27 seconds, roughly 50 times faster.

* However, this does not mean you need to learn Mata (which has its own limitations and syntax) in order to speed up your commands.

* Method 3: Use Stata's data structure to accomplish vector tasks
timer clear 1
timer on 1

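* Drop any data in memory so that set obs can run.
clear
set obs 1000000
* Store as double, since the default float type cannot hold squares this large exactly.
gen double x2 = _n^2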

* The sum (summarize) command calculates the mean of x2, which is the same as the sum of x2 divided by its number of observations.
sum x2
* We can reverse that operation easily.
di r(N)*r(mean)

timer off 1
timer list 1
* Using a little knowledge of how Stata stores results after a command, this method does the same trick in .2 seconds.
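
* A hedged aside: summarize also stores the total itself in r(sum), so the multiplication can be skipped.
* A minimal sketch, assuming x2 from Method 3 is still in memory:
sum x2, meanonly
di r(sum)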

* Method 4: The speed gain in Method 3 came from using the vector structure of data columns.  Mata can do very similar things even more easily.

timer clear 1
timer on 1
// This command looks a little fishy, but it is easy to understand.
// Order of operations must be taken into account.
// First 10^6 is evaluated, which equals 1000000.
// Then the vector 1..10^6 is made, which looks like 1 2 3 ... 1000000.
// The .. tells Mata to make a row vector of consecutive integers.
// If I had written :: then Mata would have made a column vector instead.
// Once the vector is made, the operator :^2 tells Mata to square each element of the vector.
// Finally, the sum() function adds all of the elements of the vector together to generate the result we were looking for.
mata: sum((1..10^6):^2)
timer off 1
timer list 1
* This command took only .04 seconds to run, thanks to efficient vectorized coding in Mata.
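
* As a sanity check on the methods above (not part of the original timings), the sum of the first n squares
* has the closed form n(n+1)(2n+1)/6. A minimal sketch, assuming the Mata variable name n is free to use;
* the value should agree with the vectorized sum up to double-precision rounding:
mata: n = 10^6
mata: n*(n+1)*(2*n+1)/6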

# As a matter of comparison, this command
# took .04 seconds in R

# And the loop:
# x must exist before the loop can add to it
x = 0
system.time(for(i in 1:10^6) x=x+i^2)
# 1.3 seconds

# Thus Mata in this example is significantly faster than Stata and about the same speed as R.

1 comment:

  1. Testing it on my machine, using a scalar instead of a local in method 1 seems faster, though still significantly slower than the other methods.

    In method 3, -summarize- with option -meanonly- should be slightly faster. But I think you can skip -summarize-, which either way is going to do more calculations than you need:

    set obs 1000000
    gen x2 = sum(_n^2)
    di x2[_N]