## Monday, August 13, 2012

### Write your own System IV estimator in R (and SOLS)

# Specify a variable to hold number of observations
obs = 10000

# Create independent variables
z1 = rnorm(obs)
z2 = rnorm(obs)

# Create error
u1 = rnorm(obs)
u2 = rnorm(obs)

# Create endogenous variable
x1 = .5*z1  -   z2 + u1 + rnorm(obs)
x2 = 1.5*z1 - 2*z2 - u2 + rnorm(obs)
x3 = rnorm(obs)

# Create dependent variable
y1 =  5 + 2*x1 - x2 +   x3 + u1*5 - u2*2
y2 = -5 - 2*x1 + x2 - 2*x3 + u2*5 - u1*1

# First let's attempt a nieve SOLS
# Y = XB
# X'Y = X'XB
# (X'X)^-1(X'Y) = Bhat

X = cbind(x1,x2,x3,1)
Y = cbind(y1,y2)

A = solve(t(X)%*%X)
# solve simply finds the inverse of its argument
B = t(X)%*%Y

SOLS = A%*%B
SOLS

# We can see the coefficient on x is upwards biased.

# Now let's construct the system IV

# Y = XB
# Z'Y = Z'XB
# (Z'X)'Z'Y = (Z'X)'Z'XB
# (X'Z)Z'Y  = (X'Z)Z'XB
# ((X'Z)Z'X)^-1 (X'Z)Z'Y = B
# ((X'Z Z'X)^-1 (X'Z Z'Y) = IVhat
# ((X'Z Z'X)^-1 (X'Z Z'Y) = IVhat
# C * D = IVhat

X = cbind(x1, x2, x3, 1)

# It is important to remember that any explanatory variables that are not instrumented for must be included in the instrument.  Thus x3 in Z.
Z = cbind(x3, z1, z2, 1)

C = solve(t(X)%*%Z%*%t(Z)%*%X)
D = t(X)%*%Z%*%t(Z)

SIV = C%*%D%*%Y
SIV

# System IV seems to be working pretty well
# Alternative non-system IV on the two equations would be

IV1 = C%*%D%*%y1
IV1

IV2 = C%*%D%*%y2
IV2

# We can see that there is no difference in coefficient estimates between seperate IV estimates and SIV.  This, I believe, is because like in the SOLS model if the regressors are the same in both equations and there is no cross equation restrictions then the system produces identical results to the non-system estimates.  This however is not true when estimating standard errors.