Econometrics By Simulation: Write your own bootstrap command.

Monday, July 9, 2012

Write your own bootstrap command.

* A similar function is also programmed in R. See: http://www.econometricsbysimulation.com/2012/09/program-your-own-bootstrap-command-r.html

* The bootstrap resampling technique is one of the coolest techniques that have developed in econometrics/statistics in the last 50 years (that and maximum likelihood). Bootstrap releases the user from strong assumptions about the underlying distribution of errors.

* As far as I understand the bootstrapping assumptions follow, so long as the data you draw are reasonable draws from the underlying distribution then resampling from that sample will give you a reasonable estimate of the populations underlying parameters. Note, this method is obviously related to Baysian methods.

* B Efron formalized the technique and termonology of Bootstrap in a paper in 1979.

* First let us define a program that will do our bootstrap
cap program drop bootme
program bootme

********
* First we will set up the data that we will use as a resource to draw from

* Preserve the current data
$nq preserve
syntax anything [, Reps(integer 100) Command(string) Keep(string) Noisily]

* Tell stata to display what it is doing or the default is not to
if "`noisily'" == "" local nq quietly
if "`noisily'" != "" local nq

* Generate a variable that counts from 0 to _N-1
`nq' gen bootmatcher = _n-1

* This is the number of observations to draw from.
`nq' gl boot_N=_N

* We will save the data file to draw from first
tempfile drawfile
`nq' save `drawfile' , replace

* Draw file set up
********

* Clear the memory
`nq' clear

* Set the number of repetitions to that specified by the user
`nq' set obs `reps'

* Keep tells the bootstrap rountine what values from the estiamtion to keep
foreach v in `keep' {
* Creates emptly variables to hold the coefficients of interest
local vtemp=strtoname("v`v'")
`nq' gen `vtemp'=.
}

* Create a data file that will list the results of the bootstrap
tempfile returnlist
`nq' save `returnlist' , replace

* Display the bootstraps progress
di as text _newline "Bootstrap replications ({result:`reps'})"
di "----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5"

* Loop through the number of repetitions in the bootstrap
forv i=1/`reps' {
* First clear the data
clear
* Set the number of observations equal to the number to draw
`nq' set obs $boot_N
* Generate a variable that represents random draws.
`nq' gen bootmatcher = int(runiform()*$boot_N)
* Sort the draws from 1 to _N to assist in the merge command
`nq' sort bootmatcher
* Merge in the drawfile
`nq' merge m:1 bootmatcher using `drawfile', keep(3)

* Now run the user specified command
`nq' `command'

* Now load up the return list file
`nq' use `returnlist', clear

* For each value of the list of keep values loop through
foreach v in `keep' {
* We need to make sure the variables are readable as names
local vtemp=strtoname("v`v'")
* Replace each observation with the estimated value
`nq' replace `vtemp'=`v' if _n==`i'
}
`nq' save `returnlist', replace

* If the observation is divisible by 50 insert a space in the progress bar.
if `i'/50 == int(`i'/50) di ". " `i'
* Otherwise keep adding .s
else di _continue "."
}
* This summarizes the results
sum
* Restores the data to the previous state
`nq' restore
* Runs the command one last time for display
`command'
end
* End of the Bootme command

clear
set obs 1000

gen x = rnormal()
gen u = rnormal()*10
gen y = x + u

* This is Stata's default bootstrap command
bs _b[x] _se[x] e(F) e(ll), reps(200): reg y x

* Our bootstrap seems to work very similarly though it seems to me that our standard errors seem to be a little wider in general. I am not sure why.
bootme 1, r(200) k(_b[x] _se[x] e(F) e(ll)) c(reg y x)

5 comments:

AnonymousJuly 9, 2012 at 11:10 AM
Thank you so much, I have been struggling with this myself.
ReplyDelete
Replies
AnonymousJuly 10, 2012 at 4:58 PM
What do you think about sampling from the posterior distribution instead using Markov chain Monte Carlo? Bootstrap might get unwieldy with large datasets.
ReplyDelete
Replies
AnonymousJuly 11, 2015 at 9:47 PM
Thank you for your resourceful code. I am having trouble interpreting your code, would you recommend me some references for your usage of preserve; you used preserve as "$nq preserve", which I have never seen before. Also, I think `nq' used in the code is related to the $nq preserve. Where could I find study materials on that? I have searched statalist, stata manual with macro, save, preserve keywords. Anyway, thank you so much for your helpful posts.
ReplyDelete
Replies

Add comment