Friday, July 6, 2012
Logit: Logistic regression on a factor variable
* Logistic regression on a factor variable
* A reader recently contacted me with a request: she wanted to run a logistic regression to examine whether a person had visited a dentist in the last year.
* The example of the data she sent looked something like the following:
/*                     A12 |      Freq.     Percent        Cum.
   ------------------------+-----------------------------------
       In the last 4 weeks |      2,028       20.28       20.28
   Between 1 and 12 months |      2,036       20.36       40.64
             1-2 years ago |      1,997       19.97       60.61
     More than 2 years ago |      1,963       19.63       80.24
                     Never |      1,976       19.76      100.00
   ------------------------+-----------------------------------
                     Total |     10,000      100.00 */
* Let us first generate data that looks something like her data.
clear
set obs 10000
gen A12 = int(runiform()*5)+1
label define dental 1 "In the last 4 weeks" ///
2 "Between 1 and 12 months" ///
3 "1-2 years ago" ///
4 "More than 2 years ago" ///
5 "Never"
label values A12 dental
tab A12
* The problem is that the dependent variable is coded as a factor variable with five levels, but logistic regression requires a binary variable.
* First we want to find out which value label is attached to the A12 variable.
desc A12
* But A12 might not be a factor variable at all. It could actually be a string variable.
* Let us generate a string duplicate to compare.
decode A12, gen(A12b)
tab A12b
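* We can see that the two tab outputs are identical except for the order in which the categories are listed: A12 is sorted by its underlying numeric codes, while the string copy A12b is sorted alphabetically.
* Thus we can infer that the original data is in factor form, that is, numeric codes with a value label attached.
* Just looking at the output of desc tells us the same thing. If the storage type is numeric (not str#) and a value label is listed, the variable is a labeled factor variable.
* In this case desc shows that A12 has the value label dental applied to it.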
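* Had A12 turned out to be a string, the binary outcome could still be built directly from the text. The lines below are a minimal sketch using the string copy A12b created above. The names A12c and visited_str are hypothetical and used only for this illustration.

```stata
* encode assigns numeric codes in alphabetical order of the strings,
* so the codes in A12c need not match the original codes in A12.
encode A12b, gen(A12c)
label list A12c

* Safer with strings: match on the text itself rather than on the codes.
gen byte visited_str = inlist(A12b, "In the last 4 weeks", "Between 1 and 12 months")
tab A12b visited_str
```

* Matching on the text avoids relying on whatever numeric order encode happens to produce.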
label list
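* The same check can also be done programmatically. This is a small sketch that assumes A12 exists as created above. The local macro name lblname is just an illustrative choice.

```stata
* Stops with an error if A12 is not numeric (for example, if it were a string).
confirm numeric variable A12

* Retrieve the name of the value label attached to A12 (empty if there is none).
local lblname : value label A12
display "Value label attached to A12: `lblname'"

* List only the codes and text of that one label.
label list `lblname'
```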
* Here is a detailed post on how to convert factor variables to dummies:
* http://www.econometricsbysimulation.com/2012/06/convert-factor-variables-dummy-lists.html
* However, it might be a bit of overkill for this problem. Instead we can manually convert the factor variables as we need to.
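* First generate a variable that equals 0 for everyone. Note that with real data any observations with A12 missing would also be set to 0 here.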
gen dental_yr1 = 0
label var dental_yr1 "Went to the dentist in the last year"
replace dental_yr1 = 1 if A12 == 1
* We know from the label list that A12 == 1 is "In the last 4 weeks"
replace dental_yr1 = 1 if A12 == 2
* We know from the label list that A12 == 2 is "Between 1 and 12 months"
sum dental_yr1
* Everything is looking good.
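* As a cross-check, the same dummy can be built in one step with inlist(). This is only a sketch. dental_yr1_alt is a hypothetical name, and the if qualifier is an added assumption about how missing values of A12 should be treated (left missing rather than coded 0).

```stata
* 1 if A12 is code 1 or 2, 0 for the other codes, missing if A12 is missing.
gen byte dental_yr1_alt = inlist(A12, 1, 2) if !missing(A12)

* In the simulated data A12 is never missing, so the two versions should agree.
assert dental_yr1 == dental_yr1_alt

* Cross-tabulate to confirm which categories map to 1.
tab A12 dental_yr1
```

* Since the five categories were simulated as equally likely, the mean of dental_yr1 should be close to 0.4.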
* Now in order to do a logistic regression we need to have some explanatory variables so let's generate some independent ones for now.
gen indepvar1 = rnormal()
gen indepvar2 = rnormal()
* Finally:
logit dental_yr1 indepvar1 indepvar2
* Unsurprisingly, the independent variables are not statistically significant, since they were generated independently of the outcome.
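* To see what the output looks like when a covariate does matter, one option is to simulate an outcome that depends on indepvar1. This is purely illustrative: the names xb and dental_sim are hypothetical, and the intercept -0.4 and slope 0.5 are arbitrary values chosen for the sketch.

```stata
* Linear index with an arbitrary intercept and an arbitrary effect of indepvar1.
gen xb = -0.4 + 0.5*indepvar1

* Draw a binary outcome whose probability of being 1 is invlogit(xb).
gen byte dental_sim = runiform() < invlogit(xb)

* indepvar1 should now show a coefficient near 0.5. indepvar2 still has no true effect.
logit dental_sim indepvar1 indepvar2
```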