Monday, July 9, 2012

Write your own bootstrap command.

* A similar function is also programmed in R.  See:

* The bootstrap resampling technique is one of the coolest techniques that have developed in econometrics/statistics in the last 50 years (that and maximum likelihood).  Bootstrap releases the user from strong assumptions about the underlying distribution of errors.

* As far as I understand the bootstrapping assumptions follow, so long as the data you draw are reasonable draws from the underlying distribution then resampling from that sample will give you a reasonable estimate of the populations underlying parameters. Note, this method is obviously related to Baysian methods.

* B Efron formalized the technique and termonology of Bootstrap in a paper in 1979.

* First let us define a program that will do our bootstrap
cap program drop bootme
program bootme

* First we will set up the data that we will use as a resource to draw from

  * Preserve the current data
  $nq preserve
  syntax anything [, Reps(integer 100) Command(string) Keep(string) Noisily]

  * Tell stata to display what it is doing or the default is not to
  if "`noisily'" == "" local nq quietly
  if "`noisily'" != "" local nq

  * Generate a variable that counts from 0 to _N-1
  `nq' gen bootmatcher = _n-1

  * This is the number of observations to draw from.
  `nq' gl boot_N=_N

  * We will save the data file to draw from first
   tempfile drawfile
  `nq' save `drawfile' , replace

* Draw file set up

* Clear the memory
`nq' clear

* Set the number of repetitions to that specified by the user
`nq' set obs `reps'

* Keep tells the bootstrap rountine what values from the estiamtion to keep
foreach v in `keep' {
 * Creates emptly variables to hold the coefficients of interest
 local vtemp=strtoname("v`v'")
`nq' gen `vtemp'=.

* Create a data file that will list the results of the bootstrap
tempfile returnlist
`nq' save `returnlist' , replace

* Display the bootstraps progress
di as text _newline "Bootstrap replications ({result:`reps'})"
di "----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5"

* Loop through the number of repetitions in the bootstrap
forv i=1/`reps' {
* First clear the data
* Set the number of observations equal to the number to draw
`nq' set obs $boot_N
* Generate a variable that represents random draws.
`nq' gen bootmatcher = int(runiform()*$boot_N)
* Sort the draws from 1 to _N to assist in the merge command
`nq' sort bootmatcher
* Merge in the drawfile
`nq' merge m:1 bootmatcher using `drawfile', keep(3)

* Now run the user specified command
`nq' `command'

* Now load up the return list file
`nq' use `returnlist', clear

* For each value of the list of keep values loop through
foreach v in `keep' {
  * We need to make sure the variables are readable as names
  local vtemp=strtoname("v`v'")
  * Replace each observation with the estimated value
  `nq' replace `vtemp'=`v' if _n==`i'
`nq' save `returnlist', replace

* If the observation is divisible by 50 insert a space in the progress bar.
if `i'/50 == int(`i'/50) di ".  " `i'
* Otherwise keep adding .s
else di _continue "."
* This summarizes the results
* Restores the data to the previous state
`nq' restore
* Runs the command one last time for display
* End of the Bootme command

set obs 1000

gen x = rnormal()
gen u = rnormal()*10
gen y = x + u

* This is Stata's default bootstrap command
bs _b[x] _se[x] e(F) e(ll), reps(200): reg y x

* Our bootstrap seems to work very similarly though it seems to me that our standard errors seem to be a little wider in general.  I am not sure why.
bootme 1, r(200) k(_b[x] _se[x] e(F) e(ll)) c(reg y x)


  1. Thank you so much, I have been struggling with this myself.

  2. What do you think about sampling from the posterior distribution instead using Markov chain Monte Carlo? Bootstrap might get unwieldy with large datasets.

    1. I don't believe we can resample from the "posterior" distribution under the standard statistical estimators. Though I am no Baysian expert.

  3. Thank you for your resourceful code. I am having trouble interpreting your code, would you recommend me some references for your usage of preserve; you used preserve as "$nq preserve", which I have never seen before. Also, I think `nq' used in the code is related to the $nq preserve. Where could I find study materials on that? I have searched statalist, stata manual with macro, save, preserve keywords. Anyway, thank you so much for your helpful posts.