* Path analysis is a statistical method that can be used to identify complex relationships between a set of variables and an outcome variable.
* As with all statistical methods, the modelling framework is essential for deriving reasonable results.
* Conveniently, I am only interested in simulating data, so as usual my data will conform perfectly to the model's specifications.
* Imagine the following model.
* All of the boxes are observable variables. The arrows indicate the causal direction of the effects.
* There are two exogenous variables: A and D. These variables are not influenced by any other variables in the model.
* All other variables are endogenous.
* Each arrow represents the direct effect of a one-unit change in one variable on another; for example, pHG is the direct effect of G on H.
* This framework is convenient because it allows us to identify a "total effect", which combines the direct and indirect effects of a variable on the outcome variable.
* The variable we are primarily interested in explaining is H.
* The variable G has only a direct effect on H (pHG),
* while the variable C has only an indirect effect on H (pFC*pHF).
* The indirect effect is a product because C has a pFC effect on F and F has a pHF effect on H. A change in H resulting from a change in C is therefore how much F changes as a result of C, times how much that change in F affects H.
* Variables can have both an indirect and a direct effect.
* B, for instance, has the direct effect: pHB
* Indirect effects: pCB*pFC*pHF + pFB*pHF + pEB*pHE
* (B reaches H through C and then F, through F directly, and through E.)
* Total effect: pHB + pCB*pFC*pHF + pFB*pHF + pEB*pHE
* The key feature of this particular example is that all of the arrows are one-directional,
* which makes possible a great deal of inference that otherwise would not be.
* Usually, when trying to explain H with explanatory variables A through G, we cannot say that A causes B and B causes H.
* However, if we do the work to identify reasonable pathways, then this type of analysis can be quite interesting.
* Let's generate our data.
clear
* Let's imagine 6000 youth in our sample.
set obs 6000
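* Optional (an addition, not part of the original walkthrough): fixing the
* random-number seed makes the simulated draws, and therefore every estimate
* below, reproducible. The seed value itself is arbitrary.
set seed 12345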
* Let's first specify our effects
* pEA = .3
* pEB = .13
* pHA = .2
* pHB = .2
* pHE = .3
* pHG = 1.1
* pHF = .2
* pBA = .5
* pCB = .2
* pCD = .1
* pGD = .2
* pFC = .76
* pFB = .4
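* For B we can calculate our true effects:
* B Direct: pHB = .2
* Indirect effects: pCB*pFC*pHF + pFB*pHF + pEB*pHE
* (the pFB*pHF term is there because F is generated from B directly as well as through C)
* Indirect effects: .2*.76*.2 + .4*.2 + .13*.3 = .1494
* Total effect: .1494 + .2 = .3494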
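* As a second worked example (a sketch added for illustration), the same
* path-tracing logic gives D's true effect on H. D has no arrow pointing
* directly into H, so its total effect is purely indirect:
* D -> G -> H (pGD*pHG) plus D -> C -> F -> H (pCD*pFC*pHF).
* The scalar names below are hypothetical.
scalar trueD_viaG  = .2*1.1
scalar trueD_viaCF = .1*.76*.2
scalar trueD_total = trueD_viaG + trueD_viaCF
di "D's true total effect on H = " trueD_total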
gen A = rnormal()
gen B = A*.5 + rnormal()
gen D = rnormal()
gen C = B*.2 + D*.1 + rnormal()
gen E = A*.3 + B*.13 + rnormal()
gen F = B*.4 + C*.76 + rnormal()
gen G = D*.2 + rnormal()
gen H = E*.3 + A*.2 + B*.2 + F*.2 + G*1.1 + rnormal()
* Simulation done.
* To estimate the different effects, we simply run OLS for each endogenous variable.
* B is generated from A, so B is the dependent variable here.
reg B A
local pBA = _b[A]
reg C B D
local pCB = _b[B]
local pCD = _b[D]
reg G D
local pGD = _b[D]
reg F C B
local pFB = _b[B]
local pFC = _b[C]
reg E A B
local pEA = _b[A]
local pEB = _b[B]
reg H A B E F G
local pHA = _b[A]
local pHB = _b[B]
local pHE = _b[E]
local pHF = _b[F]
local pHG = _b[G]
* To estimate the indirect effect of, say, B on H,
* we just plug our estimates into the equation.
* B direct effect: pHB
* Indirect effects: pCB*pFC*pHF + pFB*pHF + pEB*pHE
* Total effect: pHB + pCB*pFC*pHF + pFB*pHF + pEB*pHE
di "B's estimated indirect effect = `pCB'*`pFC'*`pHF' + `pFB'*`pHF' + `pEB'*`pHE'"
di "B's estimated indirect effect = " `pCB'*`pFC'*`pHF' + `pFB'*`pHF' + `pEB'*`pHE'
* This should be close to the true value implied by the simulation parameters:
* .2*.76*.2 + .4*.2 + .13*.3 = .1494
di "B's total estimated effect on H is " `pHB' + `pCB'*`pFC'*`pHF' + `pFB'*`pHF' + `pEB'*`pHE'
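* The same plug-in approach works for any other variable. As a sketch (added
* for illustration), D's estimated total effect on H is pGD*pHG + pCD*pFC*pHF,
* whose true value under the simulation parameters is .2*1.1 + .1*.76*.2 = .2352.
di "D's total estimated effect on H is " `pGD'*`pHG' + `pCD'*`pFC'*`pHF'
* Because D is exogenous and drawn independently of A, regressing H on D alone
* should recover roughly the same total effect in a single step.
reg H D
di "Reduced-form estimate of D's total effect = " _b[D]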
* The user-written command pathreg can make things easier.
* To find and install it, type: findit pathreg
pathreg (H E B F G) (G D) (C B D) (B A) (E A B) (F B C)
* This command does not currently calculate all of the indirect and direct effects.
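* If Stata 12 or later is available, the built-in sem command is one
* alternative (a sketch that assumes the same six equations; I have not
* checked it against pathreg). It fits all of the paths at once, and the
* postestimation command estat teffects then tabulates the direct, indirect,
* and total effects of each variable.
sem (H <- A B E F G) (G <- D) (C <- B D) (B <- A) (E <- A B) (F <- B C)
estat teffects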
* I am not sure of the best way to calculate the standard errors of the different effect estimates.
* My guess is that, since this is just a series of fast OLS regressions, the easiest thing to do would be to bootstrap the entire process.
* This would require slightly more code but would be easy to do from this point.
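* A minimal sketch of such a bootstrap, shown here for D's total effect on H
* (pGD*pHG + pCD*pFC*pHF). The program name totalD, the result name tot_D,
* the number of replications, and the seed are all arbitrary choices of mine,
* not anything prescribed above.
capture program drop totalD
program define totalD, rclass
    * Re-estimate every path that D's effect travels along.
    reg G D
    local pGD = _b[D]
    reg C B D
    local pCD = _b[D]
    reg F C B
    local pFC = _b[C]
    reg H A B E F G
    * Total effect = D -> G -> H plus D -> C -> F -> H.
    return scalar total = `pGD'*_b[G] + `pCD'*`pFC'*_b[F]
end
* Resample observations, rerun the whole chain each time, and report the
* bootstrap standard error of the total effect.
bootstrap tot_D = r(total), reps(200) seed(12345): totalD
* The same pattern applies to any other effect: return the relevant sum of
* products from the program instead.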
Stata has (relatively new, I think) SEM features:
http://www.stata.com/stata12/structural-equation-modeling/
Yeah I know, I am a version behind the times :D