Econometrics By Simulation: Logit: Logistic regression on a factor variable

Friday, July 6, 2012

Logit: Logistic regression on a factor variable

* Logistic regression on a factor variable

* A reader recently contacted me with a request: she wants to run logistic regressions to examine whether a person visited a dentist in the last year.

* The sample of the data she sent looked something like the following:

/*
                    A12 |      Freq.     Percent        Cum.
------------------------+-----------------------------------
    In the last 4 weeks |      2,028       20.28       20.28
Between 1 and 12 months |      2,036       20.36       40.64
          1-2 years ago |      1,997       19.97       60.61
  More than 2 years ago |      1,963       19.63       80.24
                  Never |      1,976       19.76      100.00
------------------------+-----------------------------------
                  Total |     10,000      100.00
*/

* Let us first generate data that looks something like her data.

clear
set obs 10000
gen A12 = int(runiform()*5)+1
label define dental 1 "In the last 4 weeks" ///
2 "Between 1 and 12 months" ///
3 "1-2 years ago" ///
4 "More than 2 years ago" ///
5 "Never"

label values A12 dental

tab A12
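
* A side note: because no seed is set, the counts above change on every run.
* Here is a minimal sketch of a reproducible version, run on a temporary copy via
* preserve/restore. The seed value and the names A12_run1, A12_run2 and
* n_seed_diff are arbitrary, and runiformint() assumes Stata 14 or later.
preserve
clear
set obs 10000
set seed 20120706
* runiformint(1, 5) draws the integers 1 to 5 with equal probability
gen byte A12_run1 = runiformint(1, 5)
set seed 20120706
gen byte A12_run2 = runiformint(1, 5)
* Same seed, same draws: this count should be zero
count if A12_run1 != A12_run2
scalar n_seed_diff = r(N)
restore
* Note that set seed also fixes the random draws made later in the do-file.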
* The problem is that the dependent variable is coded as a factor variable, but logistic regression requires a binary (0/1) dependent variable.
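
* To see the problem concretely, here is a minimal sketch on a temporary copy of
* the data. Stata's logit treats 0 as a negative outcome and any other nonmissing
* value as a positive one, so codes 1 to 5 are all "positive" and logit stops
* with an "outcome does not vary" error. capture noisily shows the error but lets
* the do-file continue; logit_rc is a hypothetical scalar name.
preserve
clear
set obs 100
gen A12 = int(runiform()*5)+1
capture noisily logit A12
scalar logit_rc = _rc
display "logit return code: " scalar(logit_rc)
restore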

* First we want to figure out which value label (if any) is attached to the A12 variable.
desc A12
* However, A12 might not be a labeled numeric (factor) variable at all; it could instead be a string variable.

* Let us generate a string duplicate to compare against.
decode A12, gen(A12b)

tab A12b
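* The two tabulations are identical except for the order in which the categories are listed: the string variable is sorted alphabetically, while the labeled numeric variable is sorted by its underlying codes.
* Thus we can infer that the original data is a labeled numeric (factor) variable.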
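
* If the data had instead arrived as a string variable, encode would convert it
* back to a labeled numeric one. A minimal sketch on a temporary copy: by default
* encode numbers the categories alphabetically (so "1-2 years ago" would become 1),
* whereas label(dental) reuses the existing value label and reproduces the
* original codes. A12_alpha, A12c and the scalar names are hypothetical.
preserve
clear
set obs 1000
gen A12 = int(runiform()*5)+1
label define dental 1 "In the last 4 weeks" ///
2 "Between 1 and 12 months" ///
3 "1-2 years ago" ///
4 "More than 2 years ago" ///
5 "Never"
label values A12 dental
decode A12, gen(A12b)
* Default: codes follow the alphabetical order of the strings
encode A12b, gen(A12_alpha)
* label(dental): codes follow the existing dental value label
encode A12b, gen(A12c) label(dental)
count if A12_alpha != A12
scalar n_alpha_mismatch = r(N)
count if A12c != A12
scalar n_label_mismatch = r(N)
restore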

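* Simply looking at the output of desc tells us the same thing. A string variable would show a storage type of str#; a numeric storage type combined with an entry in the value label column indicates a labeled numeric (factor) variable.

* Here the value label column tells us that A12 has the label dental applied to it. label list shows the code behind each label: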

label list
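
* The same checks can be scripted, which is handy when a do-file has to cope with
* either storage type. A minimal sketch on a temporary copy: capture confirm
* numeric variable sets _rc to 0 only for numeric variables, and the extended
* macro function ": value label" returns the name of the attached label (empty if
* there is none). The scalar names are hypothetical.
preserve
clear
set obs 10
gen A12 = int(runiform()*5)+1
label define dental 1 "In the last 4 weeks" ///
2 "Between 1 and 12 months" ///
3 "1-2 years ago" ///
4 "More than 2 years ago" ///
5 "Never"
label values A12 dental
capture confirm numeric variable A12
local rc = _rc
local lbl ""
if `rc' == 0 {
    local lbl : value label A12
    display "A12 is numeric with value label: `lbl'"
}
else {
    display "A12 is stored as a string"
}
scalar A12_is_numeric = (`rc' == 0)
scalar A12_has_dental = ("`lbl'" == "dental")
restore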

* Here is a detailed post on how to convert factor variables to dummies:
* http://www.econometricsbysimulation.com/2012/06/convert-factor-variables-dummy-lists.html
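
* For comparison, a minimal sketch of Stata's built-in shortcut on a temporary
* copy: tabulate with generate() creates one 0/1 dummy per category, here A12_d1
* to A12_d5 (the stub name is arbitrary). The dummies are numbered in the sorted
* order of the values present, so with all five codes observed, A12_d1 + A12_d2
* equals the "visited in the last year" indicator.
preserve
clear
set obs 1000
gen A12 = int(runiform()*5)+1
tabulate A12, generate(A12_d)
gen byte visit_yr = A12_d1 + A12_d2
count if visit_yr != inlist(A12, 1, 2)
scalar n_dummy_mismatch = r(N)
restore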

* However, converting every category into its own dummy may be overkill for this problem. Instead, we can manually recode the factor variable into just the dummy we need.

* First generate a variable that equals 0 for everyone
gen dental_yr1 = 0
label var dental_yr1 "Went to the dentist in the last year"
replace dental_yr1 = 1 if A12 == 1
* We know from the label list that A12 == 1 is "In the last 4 weeks"
replace dental_yr1 = 1 if A12 == 2
* We know from the label list that A12 == 2 is "Between 1 and 12 months"

sum dental_yr1
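* The mean should be close to 0.4, since two of the five equally likely categories count as a visit in the last year. Everything is looking good.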
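
* One caution for real data: with the replace approach above, a missing A12 ends
* up coded 0 ("no visit"). A minimal sketch on a temporary copy, where the first
* 10 responses are set to missing purely for illustration: inlist() builds the
* same dummy in one line, and the if qualifier keeps it missing when A12 is
* missing. dental_yr1_alt and n_diff are hypothetical names.
preserve
clear
set obs 1000
gen A12 = int(runiform()*5)+1
replace A12 = . in 1/10
gen dental_yr1 = 0
replace dental_yr1 = 1 if A12 == 1 | A12 == 2
gen byte dental_yr1_alt = inlist(A12, 1, 2) if !missing(A12)
* The two versions differ only in the 10 rows with missing A12
count if dental_yr1 != dental_yr1_alt
scalar n_diff = r(N)
restore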

* To run a logistic regression we need some explanatory variables, so for now let us generate some that are independent of the outcome.
gen indepvar1 = rnormal()
gen indepvar2 = rnormal()

* Finally:
logit dental_yr1 indepvar1 indepvar2
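* Unsurprisingly, the explanatory variables are not statistically significant, since they were generated independently of the outcome (at the 5% level, each will still appear significant about 5% of the time by chance).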
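
* For contrast, a minimal sketch in which the outcome really does depend on an
* explanatory variable. The coefficients (-0.5 and 1), the seed, and the names x,
* y and b_x are arbitrary choices for illustration. invlogit() turns the linear
* index into a probability, so y follows a logit model by construction and the
* estimated coefficient on x should land close to 1.
preserve
clear
set obs 10000
set seed 101
gen x = rnormal()
gen byte y = runiform() < invlogit(-0.5 + 1*x)
logit y x
scalar b_x = _b[x]
* logistic fits the same model but reports odds ratios, i.e. exp(_b[x])
logistic y x
restore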
