## Thursday, June 6, 2013

### Bootstrapping Time Series Data

* Every so often someone asks about how to bootstrapping time series data.

* The answer is not either easy or clear (at least as far as I know though there are many people mroe expert on the subject than I).

* The problem is that time series data is by its nature linearly dependent with itself (auto-correlated).

* Let's see how this looks:

clear

set obs 300

gen t = _n
gen x = 0
gen y = 0
gen u = rnormal()*7.5

local N=_N

* You must fill in the simulated data sequentially
qui forv i=2(1)N' {
replace x = rnormal() + .75*x[i'-1] if t==i'
replace y = x*1.5 + .75*y[i'-1] + u if t==i'
}

* Let's see what it looks like
* I will first standardize the two.
egen ystd = std(y)
egen xstd = std(x)

two (line ystd t) (line xstd t), name(original, replace) legend(off) ///
title("We can see that the two series are moving together")

* In order to estimate the relationship between y properly we must account for the linear depenent structure of the data.
reg y x

* Is obviously very biased.
gen yl = y[t-1]

* Now we can see that our estimate on x is much closer to our true (1.5).
reg y yl x

* It is worth noting that at this point it is completely unnecccesary to included any lagged values of x.
* Though this is only because we are assuming contemporaneous exogeneity between y and x.

* Now let's make the obvious mistake and try to boostrap.
bs: reg y yl x

* Wow, our estimates look great!  .... Not quite.

* The problem is we are no longer trully bootstrapping our data because lagged values of y are not lagged values of y in our new distributions.

* Drawing on a previous post:
* http://www.econometricsbysimulation.com/2012/07/write-your-own-bootstrap-command.html

cap program drop bootme
program bootme
$nq preserve syntax anything [, Reps(integer 100) c1(string) c2(string) c3(string) Keep(string) Noisily] if "noisily'" == "" local nq quietly if "noisily'" != "" local nq nq' gen bootmatcher = _n-1 nq' gl boot_N=_N tempfile drawfile nq' save drawfile' , replace nq' clear nq' set obs reps' foreach v in keep' { local vtemp=strtoname("vv'") nq' gen vtemp'=. } tempfile returnlist nq' save returnlist' , replace di as text _newline "Bootstrap replications ({result:reps'})" di "----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5" forv i=1/reps' { clear nq' set obs$boot_N
nq' gen bootmatcher = int(runiform()*$boot_N) nq' sort bootmatcher nq' merge m:1 bootmatcher using drawfile', keep(3) * Making a slight modification to the code her gen tempvarzzz = rnormal() sort tempvarzzz nq' c1' nq' c2' nq' c3' nq' c4' nq' c5' nq' c6' drop tempvarzzz nq' use returnlist', clear foreach v in keep' { local vtemp=strtoname("vv'") nq' replace vtemp'=v' if _n==i' } nq' save returnlist', replace if i'/50 == int(i'/50) di ". " i' else di _continue "." } sum nq' restore gen tempvarzzz = rnormal() sort tempvarzzz c1' c2' c3' c4' c5' c6' drop tempvarzzz end * We can see that we get basically the same results. bootme 1, r(200) k(_b[x]) c1(reg y yl x) * However, now let's add a line of code that generates y lagged as a function of the boostrapped data. bootme 1, r(200) k(_b[x]) c1(gen ylboot = y[t-1]) c2(reg y ylboot x) * In this formulation, unsurprisingly ylboot has almost no explanatory power. gen tboot = _n two (line ystd tboot) (line xstd tboot), name(boot1, replace) /// title("Boostrapped data is not autocorrelated") legend(off) graph combine boot1 original, rows(2) * We can see that clearly bootstrapping our results in this way is not going to work. * One of my classmates is using a method that I think he got from the literature which is a little more complicated. * First he estimates his model. reg y yl x * Then predicts the fitted and residuals predict yhat predict uhat, resid * Then he bootstraps those values and reconstructs the y values from the fitted values by randomly drawing the errors. * Let's see if we can accomplish this with the previous command. * First we need to save the fitted values to be grabbed later. global N=_N forv i=1(1)$N {
gl yhati' = yhat[i']
gl xi' = x[i']
}

* Write a small program to load yhat values back into the memory
cap program drop datasetup
program datasetup
cap gen yhat0 = .
cap gen ynew = .
cap gen ynewlag = .
cap gen xnew = .

forv i=1(1)$N { qui replace yhat0 =${yhati'} if _n==i'
qui replace xnew = \${xi'}  if _n==i'
if i'>1 qui replace ynew = yhat0+uhat
}
replace ynewlag=ynew[_n-1]
end

datasetup
corr y ynew x xnew
* Initially these values will be identical

bootme 1, r(200) k(_b[x]) c1(datasetup) c2(reg ynew ynewlag xnew)
* As with my classmate's results, the new confidence interval for the x coefficient does not even include the original estimate ~1.2.

* Looking at the data generated in this method
egen ynewstd = std(ynew)
egen xnewstd = std(xnew)

replace t=_n
two (line ynewstd t) (line xnewstd t), name(original, replace) legend(off) ///
title("The new time series")

* Clearly this method is not accomplishing what we would like it to accomplish from a perspective of replicating the estimated coefficients.

* However, interestingly (though not neccessarily believably) the standard errors from the bootstrap are close to the size of the standard errors from the OLS regression above reg y y1 x

* Hope this is helpful.  I will send my friend this link to see if he has anything to add.

#### 1 comment:

1. Francis - check out the block bootstrap and its extensions.