Thursday, May 23, 2013

A Prime Sieve for Stata As Well

* I feel like I have been neglecting Stata since most of my posts have been in R.

* But like speaking different languages it takes effort to switch mental states and programming styles from Stata to R.

* For really, there is a great deal of differences between programming in Stata and in R.

* Both are powerful but excel in different strengths.

* Stata for providing a unified, cutting edge data analysis tool, and R for providing a powerful and flexible tool with a vibrant and growing open source community.

* As a means of comparison I will program up a prime sieve in Stata that accomplishes the same thing as the prime sieve I programmed in my previous post in R:


* The way a prime sieve works is that it goes through a number line starting at 2 up until some pre-specified maximum.

* In each step the prime sieve removes from the available numbers the first number in the list and keeps that number as a "prime" and removes all remaining numbers in the list divisible by that number.

* Since all smaller primes cannot be divisible by a larger prime, starting from the smallest is the natural way of performing a sieve.


* First lets set out max that we will search for.
set obs 1000

* Now we will set our index
gen i=_n

* Now we generate our global that we will keep our primes in
gen prime=0
replace prime=1 if _n==1

* Sorting will place our primes at the end of the list
sort prime i

* We specify the stopping condition for the sieve that the first element of the list is a prime.
local nextprime = prime[1]

* Loop through all of the elements until our stopping condition is achieved.
while `nextprime'==0 {
  * The first element in our sort will be out prime
  qui replace prime=1 if _n==1

  * Drop anything divisible by that number
  qui drop if i/i[1]==int(i/i[1]) & prime==0

  * Sort to place primes at the end of our list
  sort prime i

  * Set the indicator nextprime to be equal to the prime value of the next value in the list.
  local nextprime = prime[1]

  count if prime==0

* Looking good.
hist i, title("Finding the primes for the first 1000 numbers is no sweat") bin(20)

hist i, title("Finding the primes for the first 100,000 is harder") bin(20)

* Unlike with my R post, I did not calculate the primes for the first 1,000,000 numbers.

* The above code took a while with the 100,000.  It probably runs much slower than R but that is to be expected.

* The above program is not the optimal way of writing such routines.  They should ideally be done using Mata.

* However, despite my best efforts I have been unable to find a sufficiently comprehensive text to learn Mata syntax from.

* The result is programs that run inefficiently.  Which usually in not a big deal because programming speed is typically the least important feature of most Stata problems.

* More typically the most important feature is Stata's ability to handle single data sets (such as panel data) well and efficiently.

* I hope you can see by the above code some of the differences between Stata and R.

* Some thing are intuitive and easy in each language while they seem convoluted or unnecessarily challenging in the other.  Pick your poison, but try not to be wed to it.  Some software is good for some things and other good for other thing.  Be ready to change and learn as the need arises.


  1. the following is more idiomatic and runs about 5 times faster for me:

    set obs 1000
    gen i = _n

    local i 2
    while (`i' < _N) {
    qui drop if (_n > `i') & (i / i[`i']) == int(i / i[`i'])
    local ++i

  2. oh, and the mata one (runs a bit faster than R for me):


    real matrix function primes(real scalar n) {
    prime = 1
    set = 2::n
    while (length(set) > 0) {
    prime = prime \ set[1]
    set = select(set, floor(set / set[1]) :!= set /set[1])


    mata : primes(1000)

    1. Awesome, thanks for the code! I have struggled to learn Mata.