* When I first started taking stats, there was some discussion about the relative merits of R2 and adjusted R2.

* People were concerned that adding any regressor, by construction, never decreases R2 (and almost always increases it), so there was a need for a measure that corrects for the number of regressors.
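
* For reference, these are the standard textbook definitions of the two measures. Here n is the number of observations, k is the number of regressors excluding the constant, SSR is the residual sum of squares and SST is the total sum of squares. As far as I know, the second formula is what Stata's regress stores in e(r2_a).

```latex
R^2 = 1 - \frac{SSR}{SST},
\qquad
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n-1}{n-k-1}
```

* Adding a regressor can never increase SSR, so R2 never falls. The factor (n-1)/(n-k-1) grows with k, so the adjusted measure can fall when an extra regressor adds little explanatory power.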

* In this small command I generate a dependent variable, then generate independent explanatory variables, and see what happens to the R2 and adjusted R2 as we increase the number of explanatory variables.

cap program drop R2r
program define R2r
    args nvars nobs              // first argument: number of regressors; second: number of observations
    clear
    set obs `nobs'
    tempvar y
    gen `y' = rnormal()          // dependent variable: pure noise
    forvalues i = 1/`nvars' {    // create one regressor per iteration
        tempvar v`i'
        gen `v`i'' = rnormal()   // regressor unrelated to y
    }
    reg `y' `v1'-`v`nvars''      // regress y on all generated regressors
end

set seed 1

R2r 2 10000  // The R-squared is quite small with only two explanatory variables

R2r 20 10000 // The R-squared is much larger

* But we should not rely on just two draws. Let's repeat the exercise 200 times using the simulate command.

simulate r2=e(r2) r2a=e(r2_a), rep(200): R2r 2 10000

sum

* The R2 is a little less than .02%

* The adjusted R2 is a little less than 0
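
* Why should the adjusted R2 hover around zero? A sketch, assuming (as in R2r) that y is unrelated to the k regressors and the errors are normal: a standard result is that R2 then follows a Beta(k/2, (n-k-1)/2) distribution, whose mean is k/(n-1). Plugging that mean into the adjusted formula, which is linear in R2, gives an expected adjusted R2 of exactly zero.

```latex
\begin{aligned}
E[R^2] &= \frac{k/2}{k/2 + (n-k-1)/2} = \frac{k}{n-1},\\
\bar{R}^2 &= 1 - \left(1 - R^2\right)\frac{n-1}{n-k-1},\\
E[\bar{R}^2] &= 1 - \left(1 - \frac{k}{n-1}\right)\frac{n-1}{n-k-1}
              = 1 - \frac{n-k-1}{n-1}\cdot\frac{n-1}{n-k-1} = 0.
\end{aligned}
```

* Individual draws of the adjusted R2 can be negative, so a simulated average slightly below zero is consistent with sampling noise.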

simulate r2=e(r2) r2a=e(r2_a), rep(200): R2r 20 10000

sum

* With 20 explanatory variables, the R2 is close to .2%

* The adjusted R2 is very close to zero

simulate r2=e(r2) r2a=e(r2_a), rep(200): R2r 200 10000

sum

* The adjusted R2 is again almost zero, while the R2 now averages around 2%
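
* A quick benchmark for these numbers, assuming the standard null-case result E[R2] = k/(n-1) (y unrelated to the k regressors, normal errors). With n = 10000:

```latex
\frac{2}{9999} \approx 0.0002\ (0.02\%),\qquad
\frac{20}{9999} \approx 0.002\ (0.2\%),\qquad
\frac{200}{9999} \approx 0.02\ (2\%)
```

* So the raw R2 grows roughly in proportion to the number of irrelevant regressors: about tenfold for each tenfold increase in k.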

* With only 1,000 observations, the R2 is more sensitive to the number of regressors

simulate r2=e(r2) r2a=e(r2_a), rep(200): R2r 2 1000

sum

* Now the R2 is a little greater than .2%

* The adjusted R2 is a little greater than .02%

simulate r2=e(r2) r2a=e(r2_a), rep(200): R2r 20 1000

sum

* With 20 explanatory variables, the R2 is about 2%

simulate r2=e(r2) r2a=e(r2_a), rep(200): R2r 200 1000

sum

* With 200 explanatory variables, the average R2 is greater than 20%
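
* The same benchmark, assuming E[R2] = k/(n-1) when y is unrelated to the k regressors and the errors are normal, now with n = 1000:

```latex
\frac{2}{999} \approx 0.002\ (0.2\%),\qquad
\frac{20}{999} \approx 0.02\ (2\%),\qquad
\frac{200}{999} \approx 0.2\ (20\%)
```

* With 200 regressors and 1,000 observations, roughly a fifth of the variation in pure noise is "explained" by chance alone.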

* Overall, the takeaway seems to be: only worry about R2 inflation when the number of observations is small relative to the number of regressors.
