Econometrics By Simulation: Principal Component Analysis

Saturday, October 26, 2013

Principal Component Analysis

* Principal component analysis (PCA) is a method for identifying the
* underlying driving factors, or components, behind the observable
* values in data.

* Imagine that you have demographic data on people. The data records
* things like class rank, number of sports played, salary, marital
* status, number of children, etc.

* Now let's imagine that each observed value is a function of
* underlying latent personal traits: intelligence, athleticism, and
* family orientation.

set seed 101

clear
set obs 1000

* Latent traits
gen inte = rnormal()
gen athl = rnormal()
gen famo = rnormal()

* Observable traits
gen class_rank = 2*inte   - .1*athl +  1*famo   + rnormal()
gen nsports    = -.5*inte +  2*athl + .5*famo   + rnormal()
gen salary     = 1*inte   + .5*athl -  1*famo   + rnormal()
gen married    = .1*inte  + .5*athl +  1.5*famo + rnormal()
gen children   = -.5*inte +  0*athl +  2*famo   + rnormal()
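
* An optional check, not part of the original walkthrough: Stata's pca
* works from the correlation matrix of the variables by default, so it
* can help to look at that matrix before extracting components.
corr class_rank nsports salary married children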

* Now let us attempt to identify our latent traits

pca  class_rank nsports salary married children

screeplot
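
* A possible follow-up to the scree plot, added here as a sketch: list
* the component loadings to see how each observable variable
* contributes to each component.
estat loadings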


predict lt1 lt2 lt3
* This generates three variables holding the principal component
* scores, which serve as our estimates of the latent traits.

corr lt1 lt2 lt3 inte athl famo
* By correlating the true latent traits with the PCA-generated
* variables, we can check how well the PCA is recovering them.

* We can see that the first component identified is famo
* (family orientation), followed by intelligence and then athleticism.

* It is important to note that in practice family orientation,
* intelligence, and athleticism can be correlated. If they were,
* principal component analysis would have difficulty identifying
* them, since it relies on finding orthogonal components.
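
* A minimal sketch of that caveat, not part of the original simulation:
* redraw the latent traits with an assumed pairwise correlation of 0.6,
* rebuild the observables with the same coefficients, and repeat the
* comparison. The matrix C and the 0.6 value are illustrative
* assumptions. Because the component scores are orthogonal by
* construction, one would expect them to line up less cleanly with the
* individual traits than in the uncorrelated case.
clear
set seed 101

* Assumed correlation structure among the latent traits
matrix C = (1, .6, .6 \ .6, 1, .6 \ .6, .6, 1)
drawnorm inte athl famo, n(1000) corr(C)

* Observable traits, same coefficients as above
gen class_rank = 2*inte   - .1*athl +  1*famo   + rnormal()
gen nsports    = -.5*inte +  2*athl + .5*famo   + rnormal()
gen salary     = 1*inte   + .5*athl -  1*famo   + rnormal()
gen married    = .1*inte  + .5*athl +  1.5*famo + rnormal()
gen children   = -.5*inte +  0*athl +  2*famo   + rnormal()

pca class_rank nsports salary married children
predict lt1 lt2 lt3
corr lt1 lt2 lt3 inte athl famo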


1 comment:

Anonymous, May 3, 2014 at 5:44 AM
What you are describing is not PCA; it is in fact factor analysis, a similar but different method that is appropriate in different settings than PCA.