Sunday, April 29, 2012

Seemingly Unrelated Regression - with Correlated Errors


* Stata Simulation and Estimation

* This code simulates a systems OLS: Seemingly Unrelated Regression (SUR) with Panel data
* on three different dependent variables each with different explanatory variables.
* In this case we will image that we want to predict the domestic price of
* wheat, corn, soybeans from the world price of wheat, corn, soybeans in Moldova
* This simple analysis would not be as easy if you wanted to use a large country
* since the price of these products domestically could cause changes in world prices.

* Imagine a 3 equation SUR system with:

* y1 = x1B1 + u1
* y2 = x2B2 + u2
* y3 = x3B3 + u3

* The key in SUR is to assume that E(y1|x1,x2,x2)=E(y1|x1)=x1B1 or in other words that it
* does not create bias in estimating y1 if you were to leave out x2 and x3.  In the example
* this argument explicitly states that you do not need to know the world price of corn to
* predict the domestic price of wheat so long as you are already using the world price of
* wheat in your alanysis.

* Clear the memory
clear
set seed 101

* Image that we have data for 20 years starting in 1985
set obs 20
gen year=1984+_n

* First let's image that over the years there are random price shocks that hit all three domestic
* products such as drought, heat waves, early frosts.  These are captured by the error term v.

gen v=4*rnormal()

* Let's imagine also that the expected price of wheat is 4, corn 3, and soybeans 3.5

* First we generate the shocks to different prices that are not shared plus the shocks that
* are shared (v).

gen u_wheat=rnormal()+v
gen u_corn=rnormal()+v
gen u_soy=rnormal()+v

* Then we generate world prices for year 1985

gen wheat_world=abs(4 + 2*rnormal()) if year ==1985
gen corn_world =abs(3 + 2*rnormal()) if year ==1985
gen soy_world=abs(3.5 + 2*rnormal()) if year ==1985

* Let's assume that world prices are AR1.  That is, last year's price is positively correlated
* with this year's price but not next year's price.

replace wheat_world=abs(4/2 + 2*rnormal() + wheat_world[_n-1]/2) if year !=1985
replace corn_world =abs(3/2 + 2*rnormal() + corn_world[_n-1]/2)  if year !=1985
replace soy_world=abs(3.5/2 + 2*rnormal() + soy_world[_n-1]/2)   if year !=1985

* World prices are also autoregressive to the first degree AR1.  That is controlling for
* previous period prices perfectly predicts current prices.

* The abs function ensures that world prices never become negative.

* In addition, let's say there was a common trend that caused world prices to rise graduatelly
* near the center of our data before falling again.

* We want the year 1 and year 20 to have no trend.  Let's let the trend be a second degree polynomial.
* so t[year]=alpha1*year + beta*year^2.  we want t[1]=0, t[10]=5, t[20]=0

gen t = 0 if year == 2004 | year == 1985
replace t = 5 if year == 1995
gen year2=year^2

reg t year year2
predict trend
drop t

* Now lets add the trend back to world prices

replace wheat_world=wheat_world+trend
replace corn_world =corn_world+trend
replace soy_world=soy_world+trend

* Let's see what the world price look like
twoway (line wheat_world year) (line corn_world year) (line soy_world year)

* Now, let us specify how we want world prices to influence domestic prices.
* In an free global market world prices should be perfectly predictive of local prices.
* So let us say the coefficients on world prices in predicting domestic prices is 1.5.
* To put everything together:

gen wheat_moldova = .75*wheat_world + abs(u_wheat + 2)
gen corn_moldova  = .75*corn_world + abs(u_corn + 2)
gen soy_moldova   = .75*soy_world + abs(u_soy +2)

* Let's see what the world price look like
twoway (line wheat_world year) (line wheat_moldova year) ///
       (line corn_world year)  (line corn_moldova year) ///
       (line soy_world year)   (line soy_moldova year)

* We could do three different regressions:
reg wheat_moldova wheat_world
reg corn_moldova corn_world
reg soy_moldova soy_world

* Which gives us unbiased and consistent estimates.  However, we could also do
* SUR which should allow for correlation between the errors (v).  Thus

sureg (wheat_moldova wheat_world) (corn_moldova corn_world) (soy_moldova soy_world)

* Since we think (know) that the coefficients on the different world prices are the same
* we could specify a constraint to force them to be the same accross equations.

constraint 1 [wheat_moldova]wheat_world=[corn_moldova]corn_world
constraint 2 [wheat_moldova]wheat_world=[soy_moldova]soy_world

sureg (wheat_moldova wheat_world) (corn_moldova corn_world) (soy_moldova soy_world) , const(1 2 )


* To test the relative merits of SUR next to OLS we will use Monte Carlo simulation.
* Same code as previously but now a program needs be defined.  With return statements.
cap program drop sur_test

program define sur_test, rclass
  clear
* set seed 101  - must turn this off or else every repetition will be exactly the same
  set obs 20
  gen year=1984+_n
  gen v=4*rnormal()
  gen u_wheat=rnormal()+v
  gen u_corn=rnormal()+v
  gen u_soy=rnormal()+v
  gen wheat_world=abs(4 + 2*rnormal()) if year ==1985
  gen corn_world =abs(3 + 2*rnormal()) if year ==1985
  gen soy_world=abs(3.5 + 2*rnormal()) if year ==1985
  replace wheat_world=abs(4/2 + 2*rnormal() + wheat_world[_n-1]/2) if year !=1985
  replace corn_world =abs(3/2 + 2*rnormal() + corn_world[_n-1]/2)  if year !=1985
  replace soy_world=abs(3.5/2 + 2*rnormal() + soy_world[_n-1]/2)   if year !=1985
  gen t = 0 if year == 2004 | year == 1985
  replace t = 5 if year == 1995
  gen year2=year^2
  reg t year year2
  predict trend
  drop t
  replace wheat_world=wheat_world+trend
  replace corn_world =corn_world+trend
  replace soy_world=soy_world+trend
  gen wheat_moldova = .75*wheat_world + abs(u_wheat + 2)
  gen corn_moldova  = .75*corn_world + abs(u_corn + 2)
  gen soy_moldova   = .75*soy_world + abs(u_soy +2)
* Same code as before but without the comments up to this point
  reg wheat_moldova wheat_world
* Return, returns a scalar
return scalar wheat_ols=_b[wheat_world]
  reg corn_moldova corn_world
return scalar corn_ols=_b[corn_world]
  reg soy_moldova soy_world
return scalar soy_ols=_b[soy_world]
  sureg (wheat_moldova wheat_world) (corn_moldova corn_world) (soy_moldova soy_world)

* This retrieves the cofficients and sends them out as returns.  There must be a better
* way of doing this.
matrix A= e(b)'
svmat A, names(sureg)

cap return scalar wheat_sureg=sureg[1]
cap return scalar corn_sureg=sureg[3]
cap return scalar soy_sureg=sureg[5]

* Since we think (know) that the coefficients on the different world prices are the same
* we could specify a constraint to force them to be the same accross equations.

constraint 1 [wheat_moldova]wheat_world=[corn_moldova]corn_world
constraint 2 [wheat_moldova]wheat_world=[soy_moldova]soy_world

sureg (wheat_moldova wheat_world) (corn_moldova corn_world) (soy_moldova soy_world) , const(1 2 )

matrix A2=e(b)'
svmat A2, names(sureg2)

cap return scalar wheat_sureg2=sureg2[1]
cap return scalar corn_sureg2=sureg2[3]
cap return scalar soy_sureg2=sureg2[5]

end
* End of the program

sur_test

simulate wheat_ols=r(wheat_ols) wheat_sureg=r(wheat_sureg) wheat_sureg2=r(wheat_sureg2) ///
   corn_ols=r(corn_ols) corn_sureg=r(corn_sureg) corn_sureg2=r(corn_sureg2) ///
   soy_ols=r(soy_ols) soy_sureg=r(soy_sureg) soy_sureg2=r(soy_sureg2), rep(100): sur_test

sum
 
* From the simulation we can see that SUR outperforms ols and SUR2 outperforms sureg.
* This is visible generally by how much closer the average estimate of the cofficients are
* on the true value which is .75 In addition, you can see that SUR and SUR2 are more
* efficient because they have smaller standard errors.

2 comments:

  1. Hi,

    I don't think this code works with panel data. I think it is for cross section data. Can you verify that it works with panel data?

    ReplyDelete
    Replies
    1. I believe the example is using time series data.

      Delete