## Monday, November 19, 2012

### t-tests and F-tests and rejection rates

Original Code

* When I was first learning about t-tests and f-tests I was told that a t-test estimated the probability of falsely rejecting the null for a single estimator.

* The f-test, in turn, estimated the probability of falsely rejecting the null that the model explained nothing significant.

* It was also stated that sometimes the t-tests can fail to reject the null when the f-test will reject the null and this is the result primarily of correlation among explanatory variables.

* All of these things I believe are well understood.

* However, what I always wanted to know was, if the t-test rejected did that mean that the f-test must also reject?

* This seems intuitively to me to be true.

* That is, if one part of the model seems to be statistically significant, mustn't the whole model be statistically significant?

* Now thinking back on the question, I think the answer must be no.

* My reasoning, assuming we are using rejection rates of 10%, is that if the f-test assumptions are met then it should falsely reject the null 10% of the time.

* Likewise, if the t-test's assumptions are met it should falsely reject the null for its coefficient 10% of the time.

* However, if we are estimating two ts and they are independent then the probability that at least one of them rejects the null at 10% is 1-(1-.10)^2=19%

* Thus if the f-test rejects only 10% of the time, there must be a range over which one or more t-stats reject but the f-stat fails to reject.
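
* As a quick check of the 19% figure, and a rough sketch of why such a range exists, we can look at the critical values directly.  The degrees of freedom here (2 and 997) are an assumption on my part: they match a regression of 1000 observations on two regressors plus a constant.
display 1 - (1 - .10)^2
* Critical value for a single squared t-stat at the 10% level (two sided).
display invttail(997, .05)^2
* Critical value for the F-stat with 2 and 997 degrees of freedom at the 10% level.
display invFtail(2, 997, .10)
* When the regressors are nearly uncorrelated the F-stat is roughly the average of the two squared t-stats, so one t-stat just past its critical value paired with a t-stat near zero leaves the F-stat short of its own critical value.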

* Let's see if we can see this in action through simulation.

cap program drop ft_test
program define ft_test, rclass

clear
set obs 1000

gen x1=rnormal()
gen x2=rnormal()
gen y=rnormal()

reg y x?

* Calculate the p-values for the individual coefficients.
* We multiply by 2 because ttail() returns a one sided tail probability and we are interested in the two sided alternative.
return scalar pt1 = 2 * ttail(e(df_r), abs(_b[x1]/_se[x1]))
return scalar pt2 = 2 * ttail(e(df_r), abs(_b[x2]/_se[x2]))

* We also want the p-value for the F stat
return scalar pF  = Ftail(e(df_m),e(df_r),e(F))

end

ft_test
ft_test
ft_test
* Running the simulated regression a few times, I can easily see that the p-values for the t-tests diverge from the p-value for the f-test fairly frequently.

simulate pt1=r(pt1) pt2=r(pt2) pF=r(pF), reps(1000): ft_test

gen rt1 = (pt1<=.1)
gen rt2 = (pt2<=.1)
gen rF  = (pF<=.1)

sum r*
* All three tests seem to be rejecting at about the right rate.

* It might be the case that we always reject the null for the f if the rejections of the null for the t-tests are correlated.
pwcorr rt1 rt2, sig

* There does not appear to be any correlation between the rejections of the two t-tests.

* By now we should already know the answer to the question.

* But let's check directly.

gen rtF = 0 if rt1 | rt2
replace rtF = 1 if rF == 1 & rtF == 0
sum rtF

* Thus the probability of rejecting the f-null given that we have rejected at least one of the t-nulls is only a little above 50%.
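
* As a cross-check of this conditional rate that creates no new variables, we can tabulate the F rejections among only those draws in which at least one t-test rejected.  This is just a sketch of an alternative to the rtF variable above.
tabulate rF if rt1 | rt2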

* It does make sense that the f and t rejections be correlated.

* That is, when the individual coefficients seem to be explaining the unknown variance then overall the model seems to be working relatively well.

pwcorr rF rt1 rt2, sig

* There is one more thing to check.  How frequently do we reject the null for the F but not for either of the ts?
gen rFt = 0 if rF
* The parentheses matter here because in Stata & binds more tightly than |.
replace rFt = 1 if (rt1 | rt2) & rFt == 0

sum rFt
* In this simulation, we only reject the F-stat when at least one of the t-stats rejects.
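
* A more direct way to look at this, as a sketch: count the draws in which the F-test rejects while neither t-test does.
count if rF & !rt1 & !rt2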

* We could therefore argue that the F-stat is a more conservative test than the t-stats.

* However, I do not believe this to be entirely the case.

* As mentioned before, I think that when the explanatory variables are correlated the t-stats can fail to reject even though the F-stat does reject.

* Let's see if we can simulate this.

cap program drop ft_test2
program define ft_test2, rclass

clear
set obs 1000

gen x1=rnormal()
gen x2=rnormal()+x1*3
* This will cause x1 and x2 to be strongly correlated.
gen y=rnormal()

reg y x?

* Calculate the p-values for the individual coefficients.
* We multiply by 2 because ttail() returns a one sided tail probability and we are interested in the two sided alternative.
return scalar pt1 = 2 * ttail(e(df_r), abs(_b[x1]/_se[x1]))
return scalar pt2 = 2 * ttail(e(df_r), abs(_b[x2]/_se[x2]))

* We also want the p-value for the F stat
return scalar pF  = Ftail(e(df_m),e(df_r),e(F))

end
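
* For reference, a back-of-the-envelope check on how strongly the two regressors are correlated by construction.  Since x2 is 3*x1 plus independent unit variance noise, the population correlation is 3/sqrt(1+3^2).
display 3/sqrt(10)
* The implied variance inflation factor, 1/(1-corr^2), for either slope coefficient.
display 1/(1 - 9/10)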

simulate pt1=r(pt1) pt2=r(pt2) pF=r(pF), reps(1000): ft_test2

* Same analysis as previously
gen rt1 = (pt1<=.1)
gen rt2 = (pt2<=.1)
gen rF  = (pF<=.1)

sum r*
pwcorr rt1 rt2, sig
* The rejections of the two t-tests are now highly correlated.

gen rtF = 0 if rt1 | rt2
replace rtF = 1 if rF == 1 & rtF == 0
sum rtF
* Under this setup, the F rejects the null about 45% of the time when at least one of the ts rejects.

pwcorr rF rt1 rt2, sig
* We can see that the correlation between the F rejections and each of the t rejections is still very strong.

* There is one more thing to check.  How frequently do we reject the null for the F but not for either of the ts?
gen rFt = 0 if rF
* The parentheses matter here because in Stata & binds more tightly than |.
replace rFt = 1 if (rt1 | rt2) & rFt == 0

sum rFt
* Now we can see the result as discussed previously.  About 25% of the time the f-stat is rejecting the null even though neither t-stat is rejecting the null.
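
* As a sketch of a more direct calculation, we can compute the share of F rejections in which neither t-test rejects.  The local macro name n_Fonly is my own and is only for illustration.
count if rF & !rt1 & !rt2
local n_Fonly = r(N)
count if rF
display `n_Fonly' / r(N)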

* Thus it may be informative to use an F-stat to check for model fit even when the t-stats do not suggest statistical significance of the individual components.

* The ultimate result of this simulation is to emphasize for me the need to do tests of model fit.

* If I were to look only at the t-tests in this example, I would falsely assume that the model fits well nearly twice as frequently as if I were to look only at the F-stat.

#### 1 comment:

1. Francis - I have been trying to leave this comment on your post from Monday, but apparently I am a robot!
Here's the comment:

Francis: you might be interested in the following 2 papers:
The article by Geary and Leser in "The American Statistician" (1968):
http://www.jstor.org/stable/pdfplus/2681875.pdf

and the one by Duchan, in the same journal (1969):
http://www.jstor.org/stable/pdfplus/2682578.pdf

Best,

Dave Giles