It is well known that dropping the constant in a regression may introduce bias. However, bias is not the deeper issue. The deeper issue is that by omitting the constant you impose a very specific form on the relationship between y and x: when x equals 0, the expected value of y must also equal 0. Outside of geometric proofs, it is very hard to think of examples where this should be the case.
Let's see this in action.
* Start from an empty dataset so that set obs does not fail
clear
set obs 10000
gen x=abs(rnormal())
gen u=rnormal()*3
gen y=10+x+u
reg y x, nocon
graph twoway (lfit y x) (scatter y x)
scatter y x || lfit y x , estopts(nocons) ///
title("Without constant, the best fit line must pass through origin")
* Notice that dropping the constant only biases the slope estimate badly
* when x does not have a mean of zero.
gen x2=rnormal()
gen y2=10+x2+u
reg y2 x2, nocon
reg y2 x2
graph twoway (lfit y2 x2) (scatter y2 x2)
two (scatter y2 x2) (lfit y2 x2 , estopts(nocons)) (lfit y2 x2 ), ///
title("Without constant, the best fit line must pass through origin") ///
legend(off)
* We can see that, because x2 has a mean of zero, the slope estimate
* without a constant is close to the estimate when a constant is included.
Formatted By Econometrics by Simulation
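The comments in the code say that the slope is only badly biased when x does not have a mean of zero. A short derivation shows why. It is a sketch that assumes the data-generating process used in the simulation, y = α + βx + u with u independent of x; the symbols α, β, and z are notation introduced here and are not part of the original code.

```latex
% Sketch under the assumed model y = \alpha + \beta x + u, with E[u \mid x] = 0.
% In the simulation above, \alpha = 10 and \beta = 1.
\begin{gather*}
\hat{\beta}_{\text{nocons}}
  = \frac{\sum_i x_i y_i}{\sum_i x_i^2}
  \;\xrightarrow{p}\;
  \frac{E[xy]}{E[x^2]}
  = \beta + \alpha\,\frac{E[x]}{E[x^2]} \\[6pt]
% First case: x = |z| with z standard normal, so x has a nonzero mean.
x = |z|,\ z \sim N(0,1):\quad
  E[x] = \sqrt{2/\pi} \approx 0.798,\quad E[x^2] = 1
  \;\Rightarrow\;
  \operatorname{plim}\hat{\beta}_{\text{nocons}} \approx 1 + 10 \times 0.798 \approx 8.98 \\[6pt]
% Second case: x_2 standard normal, so the bias term vanishes.
x_2 \sim N(0,1):\quad
  E[x_2] = 0
  \;\Rightarrow\;
  \operatorname{plim}\hat{\beta}_{\text{nocons}} = \beta = 1
\end{gather*}
```

Under these assumptions the first regression without a constant should report a slope near 9 rather than 1, while the second should report a slope near 1. Even in the second case the fitted line is still forced through the origin, so predictions of y are off by roughly 10 everywhere.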