Saturday, July 21, 2012

A note on Temporary Variables in Stata


* It is easy to create temporary variables in Stata. They are automatically dropped from memory as soon as the current do-file or program finishes.

clear
set obs 10000

* For example
tempvar temp1 temp2 temp3

gen `temp1' = rnormal()
gen `temp2' = rnormal()^2
gen `temp3' = runiform()

sum
* Conveniently, these variables are dropped from memory once they are no longer useful.

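* However, if we try to overwrite a temporary variable by declaring another one under the same local name, it will not work. Each call to tempvar hands out a new internal name, so the loop below creates ten separate variables instead of overwriting one.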

forv i = 1/10 {
  tempvar temp
  gen `temp' = rnormal()
}

sum
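
* A quick way to see this (a sketch that assumes the temporary variables show up under Stata's internal __00000# style names, as in the sum output above):
ds __*
local ntemps : word count `r(varlist)'
di "temporary variables in memory: `ntemps'"
* The local macro temp only holds the most recent internal name
di "temp currently points to: `temp'"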

* However, it is impossible to target the earlier temporary variables through the usual `temp' identifier, since it now points to the most recently created variable.

sum __000003

* Referring to the internal name directly, however, works.

* This is not very useful in practice, though it potentially could be.
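
* One way to keep every variable reachable is to give each temporary variable its own local name.
* A minimal sketch; the macro prefix draw is only an illustrative choice, not something from the manual.
forv i = 1/3 {
  tempvar draw`i'
  gen `draw`i'' = rnormal()
}
* Each macro still points to its own variable
sum `draw1' `draw2' `draw3'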

* This may not appear to be a problem, but looping through commands can leave a large number of temporary variables clogging active memory, and things start to slow down.

clear
set obs 10000

* For example:
forv i = 0/10000 {
  tempvar temp
  gen `temp' = rnormal()
  if mod(`i',1000)==0 di "$S_TIME"
}
* For me this takes about 2 seconds for every 1,000 variables created.

* Easy fix:
clear
set obs 10000

forv i = 1/10000 {
  tempvar temp
  gen `temp' = rnormal()
  * Add drop
  drop `temp'
  if mod(`i',1000)==0 di "$S_TIME"
}
* Dropping each variable as we go reduces the run time for all 10,000 iterations (for me) to only 2 seconds.
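
* Another option is to declare the temporary variable once and overwrite it with replace on each pass.
* A sketch under the same setup as above; quietly is only there to suppress the "real changes made" messages.
clear
set obs 10000

tempvar temp
gen `temp' = .
forv i = 1/10000 {
  quietly replace `temp' = rnormal()
  if mod(`i',1000)==0 di "$S_TIME"
}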

* This is all fine, but the only problem is that if we go through the work of creating temporary variables just to drop them later, then why not simply create normal variables?

* There is only one reason I can see.  We do not want to risk our new variable names overlapping with existing variable names.

* However, if this is not a problem, then the following commands are that much cleaner:

clear
set obs 10000

forv i = 1/10000 {
  gen temp = rnormal()
  drop temp
  if mod(`i',1000)==0 di "$S_TIME"
}
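
* If a name clash is a possibility, one could check for it before generating.
* A sketch: confirm new variable returns a nonzero code when the name is already taken (temp is just the illustrative name used above).
capture confirm new variable temp
if _rc != 0 {
  di as error "temp already exists; pick another name"
}
else {
  gen temp = rnormal()
  drop temp
}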

* In general I do not use temporary variables.

2 comments:

  1. Overall it is not clear what we should learn from your post. Everything is pretty much described in the manual for tempvar. However, here are some comments:

    Quote: "However, if we were to try to override a temporary variable with one of the same name then it will not work."

    That is applicable to all variables, not just temporary.
    sysuse auto
    generate price=7654
    will fail since price already exists.

    No problem, there is a command for that: replace

    In your loop you can reuse the temporary variables as needed. It is a very bad idea to make the number of temporary variables proportional to the number of loop iterations.

    Quote: "However, it is impossible to target previous temporary variables that were created..."
    Just append the tempvar name to a list

    sysuse auto, clear
    forval i=1/10 {
      tempvar temp
      local mytemps `"`mytemps' `temp'"'
      generate `temp'=`i'
      sum `mytemps'
    }


    Stata will drop the temp vars when the scope ends. You can however drop them earlier to save memory if needed.

    S.R.

    Replies
    1. I like what you are saying. I see the logic to tempvars.

      My personal opinion is that they introduce a level of opaqueness that I would rather do without. In general I find the notation unnecessarily complex so I shy away from them. But I can definitely see why some people use them.
