## Thursday, June 6, 2013

### Bootstrapping Time Series Data * Every so often someone asks about how to bootstrapping time series data.

* The answer is not either easy or clear (at least as far as I know though there are many people mroe expert on the subject than I).

* The problem is that time series data is by its nature linearly dependent with itself (auto-correlated).

* Let's see how this looks:

clear

set obs 300

gen t = _n
gen x = 0
gen y = 0
gen u = rnormal()*7.5

local N=_N

* You must fill in the simulated data sequentially
qui forv i=2(1)`N' {
replace x = rnormal() + .75*x[`i'-1] if t==`i'
replace y = x*1.5 + .75*y[`i'-1] + u if t==`i'
}

* Let's see what it looks like
* I will first standardize the two.
egen ystd = std(y)
egen xstd = std(x)

two (line ystd t) (line xstd t), name(original, replace) legend(off) ///
title("We can see that the two series are moving together")

* In order to estimate the relationship between y properly we must account for the linear depenent structure of the data.
reg y x

* Is obviously very biased.
gen yl = y[t-1]

* Now we can see that our estimate on x is much closer to our true (1.5).
reg y yl x

* It is worth noting that at this point it is completely unnecccesary to included any lagged values of x.
* Though this is only because we are assuming contemporaneous exogeneity between y and x.

* Now let's make the obvious mistake and try to boostrap.
bs: reg y yl x

* Wow, our estimates look great!  .... Not quite.

* The problem is we are no longer trully bootstrapping our data because lagged values of y are not lagged values of y in our new distributions.

* Drawing on a previous post:
* http://www.econometricsbysimulation.com/2012/07/write-your-own-bootstrap-command.html

cap program drop bootme
program bootme
\$nq preserve
syntax anything [, Reps(integer 100) c1(string) c2(string) c3(string) Keep(string) Noisily]
if "`noisily'" == "" local nq quietly
if "`noisily'" != "" local nq
`nq' gen bootmatcher = _n-1
`nq' gl boot_N=_N
tempfile drawfile
`nq' save `drawfile' , replace
`nq' clear
`nq' set obs `reps'
foreach v in `keep' {
local vtemp=strtoname("v`v'")
`nq' gen `vtemp'=.
}
tempfile returnlist
`nq' save `returnlist' , replace
di as text _newline "Bootstrap replications ({result:`reps'})"
di "----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5"
forv i=1/`reps' {
clear
`nq' set obs \$boot_N
`nq' gen bootmatcher = int(runiform()*\$boot_N)
`nq' sort bootmatcher
`nq' merge m:1 bootmatcher using `drawfile', keep(3)
* Making a slight modification to the code her
gen tempvarzzz = rnormal()
sort tempvarzzz
`nq' `c1'
`nq' `c2'
`nq' `c3'
`nq' `c4'
`nq' `c5'
`nq' `c6'
drop tempvarzzz

`nq' use `returnlist', clear
foreach v in `keep' {
local vtemp=strtoname("v`v'")
`nq' replace `vtemp'=`v' if _n==`i'
}
`nq' save `returnlist', replace
if `i'/50 == int(`i'/50) di ".  " `i'
else di _continue "."
}
sum
`nq' restore
gen tempvarzzz = rnormal()
sort tempvarzzz
`c1'
`c2'
`c3'
`c4'
`c5'
`c6'
drop tempvarzzz
end

* We can see that we get basically the same results.
bootme 1, r(200) k(_b[x]) c1(reg y yl x)

* However, now let's add a line of code that generates y lagged as a function of the boostrapped data.
bootme 1, r(200) k(_b[x]) c1(gen ylboot = y[t-1]) c2(reg y ylboot x)

* In this formulation, unsurprisingly ylboot has almost no explanatory power.
gen tboot = _n

two (line ystd tboot) (line xstd tboot), name(boot1, replace) ///
title("Boostrapped data is not autocorrelated")  legend(off)

graph combine boot1 original, rows(2)

* We can see that clearly bootstrapping our results in this way is not going to work.

* One of my classmates is using a method that I think he got from the literature which is a little more complicated.

* First he estimates his model.

reg y yl x
* Then predicts the fitted and residuals

predict yhat
predict uhat, resid

* Then he bootstraps those values and reconstructs the y values from the fitted values by randomly drawing the errors.

* Let's see if we can accomplish this with the previous command.

* First we need to save the fitted values to be grabbed later.
global N=_N
forv i=1(1)\$N {
gl yhat`i' = yhat[`i']
gl x`i' = x[`i']
}

* Write a small program to load yhat values back into the memory
cap program drop datasetup
program datasetup
cap gen yhat0 = .
cap gen ynew = .
cap gen ynewlag = .
cap gen xnew = .

forv i=1(1)\$N {
qui replace yhat0 = \${yhat`i'} if _n==`i'
qui replace xnew = \${x`i'}  if _n==`i'
if `i'>1 qui replace ynew = yhat0+uhat
}
replace ynewlag=ynew[_n-1]
end

datasetup
corr y ynew x xnew
* Initially these values will be identical

bootme 1, r(200) k(_b[x]) c1(datasetup) c2(reg ynew ynewlag xnew)
* As with my classmate's results, the new confidence interval for the x coefficient does not even include the original estimate ~1.2.

* Looking at the data generated in this method
egen ynewstd = std(ynew)
egen xnewstd = std(xnew)

replace t=_n
two (line ynewstd t) (line xnewstd t), name(original, replace) legend(off) ///
title("The new time series")

* Clearly this method is not accomplishing what we would like it to accomplish from a perspective of replicating the estimated coefficients.

* However, interestingly (though not neccessarily believably) the standard errors from the bootstrap are close to the size of the standard errors from the OLS regression above reg y y1 x

* Hope this is helpful.  I will send my friend this link to see if he has anything to add.

#### 1 comment:

1. Francis - check out the block bootstrap and its extensions.