Tuesday, March 18, 2014

Omitting the Constant May Introduce Biased Coefficients

It is well known that dropping the constant from a regression may bias the estimated coefficients. However, bias is not the deeper issue. The deeper issue is that by omitting the constant, you impose a very specific form on the relationship between y and x: when x equals 0, the expected value of y must also equal 0. In other words, the regression line is forced through the origin. It is very hard to think of examples, outside of geometric proofs, where this should be the case.
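Writing α for the intercept and β for the slope (notation introduced here for illustration), the two specifications differ only in the intercept term, and the no-constant model pins the conditional mean of y at zero when x is zero:

```latex
\text{With a constant:}\quad E[y \mid x] = \alpha + \beta x

\text{Without a constant:}\quad E[y \mid x] = \beta x
  \;\Rightarrow\; E[y \mid x = 0] = 0
```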

Let's see this in action.

* Start from an empty dataset; the seed makes the draws reproducible
clear
set seed 1234
set obs 10000

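* x is the absolute value of a standard normal draw, so x >= 0 and its mean is about 0.8
gen x=abs(rnormal())
* u is a normal error with standard deviation 3
gen u=rnormal()*3

* True model: intercept 10, slope 1
gen y=10+x+u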

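* Regression without a constant: the slope absorbs part of the intercept
reg y x, nocon

* Fitted line with a constant
graph twoway (lfit y x) (scatter y x)

* Fitted line without a constant (run from a do-file: the /// continuation
* does not work at the interactive command line)
scatter y x || lfit y x , estopts(nocons)  /// 
   title("Without constant, the best fit line must pass through the origin") 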
 

 
* Note that omitting the constant only produces a large bias in the slope
* when x does not have a mean of zero.

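* x2 is a standard normal draw, so it has a mean of zero
gen x2=rnormal()

* Same true model (intercept 10, slope 1) and the same error u
gen y2=10+x2+u

* Compare the slope estimates without and with a constant
reg y2 x2, nocon
reg y2 x2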

* Fitted line with a constant
graph twoway (lfit y2 x2) (scatter y2 x2)

* Overlay both fitted lines: without a constant and with a constant
two (scatter y2 x2) (lfit y2 x2 , estopts(nocons)) (lfit y2 x2 ),  /// 
   title("Without constant, the best fit line must pass through the origin") ///
   legend(off)
 
 
 
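* Because x2 has a mean of zero, the coefficient estimate on x2 is close to
* the estimate from the regression that includes a constant. The fitted line
* without a constant is still wrong, however: it passes through the origin
* and misses the data by roughly 10 at every value of x2.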
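A short derivation shows why the size of the bias depends on the mean of x. This is a sketch that assumes, as in the simulation above, that the error u is independent of x; α = 10 and β = 1 denote the true intercept and slope (notation introduced here). The OLS slope without a constant is the sum of x times y divided by the sum of x squared, and substituting the true model gives:

```latex
\hat{\beta}_{\text{nc}}
  = \frac{\sum_i x_i y_i}{\sum_i x_i^2}
  = \beta + \alpha \frac{\sum_i x_i}{\sum_i x_i^2} + \frac{\sum_i x_i u_i}{\sum_i x_i^2}
  \;\xrightarrow{p}\; \beta + \alpha \frac{E[x]}{E[x^2]}

x = |z|,\ z \sim N(0,1):\quad E[x] = \sqrt{2/\pi} \approx 0.798,\quad E[x^2] = 1
  \;\Rightarrow\; \operatorname{plim} \hat{\beta}_{\text{nc}} \approx 1 + 10(0.798) \approx 8.98

x_2 \sim N(0,1):\quad E[x_2] = 0
  \;\Rightarrow\; \operatorname{plim} \hat{\beta}_{\text{nc}} = \beta = 1
```

So for x = abs(rnormal()) the no-constant slope should land near 9 rather than the true value of 1, while for x2 = rnormal() the bias term vanishes. Estimates from any single run will differ slightly because of sampling noise.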


Formatted By Econometrics by Simulation
