* One of the first simulations I ever wrote used random walks to see how OLS fails when the data-generating process does not experience decay.

* A good example of this might be the stock market. We do not expect that because prices go down today they must go up tomorrow, or vice versa. Likewise, the currency market (comparing two developed economies) is unlikely to experience any kind of decay toward the previous value. Either the dollar is up relative to the euro or it is down relative to the euro; the best we can do to predict the dollar today is to look at what it was yesterday.

* Let's imagine that we have two random walks with 730 observations (two years of daily information):

clear

set seed 111

set obs 730

gen t = _n

label var t "Time (day #)"

gen y = 100 if t==1

label var y "% value of dollars relative to t=1"

gen x = 100 if t==1

label var x "% value of euros relative to t=1"

* Initially both values are set the same

replace y = y[_n-1] + rnormal() if _n>1

replace x = x[_n-1] + rnormal() if _n>1

* However now there is no relationship between y and x

two (line y t) (line x t)

* Looking at the two values plotted next to each other we can almost imagine a relationship. Play around with different seeds. If your mind is anything like mine it will start seeing patterns.

reg y x

* We strongly reject the null that there is no relationship. This is no accident of this particular draw: with almost any seed you set you will get similar results.
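* This is easy to verify by simulation: draw many fresh pairs of independent random walks and record the p-value on x each time. A minimal sketch (the program name and number of repetitions are arbitrary; preserve/restore keeps our current data intact):

preserve

capture program drop spurious
program define spurious, rclass
	clear
	set obs 730
	gen y = sum(rnormal())
	gen x = sum(rnormal())
	reg y x
	return scalar p = 2*ttail(e(df_r), abs(_b[x]/_se[x]))
end

simulate p=r(p), reps(200): spurious

count if p < .05

restore

* If the t-test were behaving, only about 5% of the draws would reject at the 5% level; with independent random walks the rejection rate is far higher.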

* So is the problem with the random draws? Are they not truly random?

* Well, yes, they are not truly random, but they are as close to random as we need to be concerned with.

* It is simply the nature of the random walk. Fortunately there are tests for that.

tsset t

dfuller y

dfuller x

* The null in this test is that the process follows a random walk (contains a unit root). For both series, the test fails to reject the null.
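* Given the unit roots, a standard remedy is to estimate the relationship in first differences. Since the data are already tsset, a minimal sketch:

reg D.y D.x

* Differencing turns each random walk back into its independent rnormal() innovations, so the spurious relationship should disappear.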

*********************************

* We can do the exact same thing in a panel data setting.

* Imagine you have the quiz scores for 50 students over 52 periods.

clear

set seed 111

set obs 50

gen id = _n

label var id "Student id"

expand 52

bysort id: gen t = _n

label var t "Year"

* The student's first year is the reference year

gen y = 100 if t==1

label var y "student quiz score"

* The explanatory variable also follows a random walk

gen x = 100 if t==1

label var x "Hours spent watching TV"

* Initially both values are set the same

bysort id (t): replace y = y[_n-1] + rnormal() if t>1

bysort id (t): replace x = x[_n-1] + rnormal() if t>1

* However now there is no relationship between y and x

reg y x

* In the panel data case we are much less likely to reject the null, given that the random walk for each individual is independent of those of the others.

tsset id t

* For the test of the unit root we will use the aptly named madfuller test (just kidding), written by Christopher F Baum, which can be found at http://ideas.repec.org/c/boc/bocode/s418701.html.

* The test can be installed via ssc install madfuller

madfuller x, lags(1)

madfuller y, lags(1)

* This test uncovers the truth that there is a random walk process at work.

* Note that the biggest limitation of the test is that the number of time periods must exceed the number of ids. Thus there are 50 students and 52 quizzes.
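* As an aside: if you are running Stata 11 or newer, the built-in xtunitroot command provides official panel unit-root tests without that limitation. A sketch using the Fisher-type test:

xtunitroot fisher y, dfuller lags(1)

xtunitroot fisher x, dfuller lags(1)

* As before, the null is that all panels contain a unit root.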
