Tuesday, January 22, 2013

Use Expand to Manually Reshape Data

do file

* Imagine that you have a wide data set that you would like to convert to a long data set.

* Your data set is structured in the following manner.

clear
set obs 1000

gen id = _n

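forv i=1/4 {
  * Time-varying variables: one column per period, with suffixes _1 to _4
  gen var1_`i' = rnormal()
  gen var2_`i' = rnormal()
  gen var3_`i' = rnormal()
  * Time-invariant 0/1 variables: the inline expression evaluates to i+3, giving var4-var7
  gen var`=`i'+3' = rbinomial(1,.5)
}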

order _all, alphabetic

* You should now have three time-varying variables (var1, var2, and var3), each recorded in four periods (var1_1-var1_4 and so on), plus four time-invariant variables, var4-var7.

* We can use expand to reshape the data.

expand 4

* Now we have four instances of each of the original observations.
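
* A quick, optional check of that claim: every id should now appear exactly four times, for 4000 rows in total.

bysort id: assert _N == 4
assert _N == 4000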

* We now want to create variables var1, var2, and var3, each holding the value for the corresponding panel period.

* First let's sort and label our different panel periods.

* Group all copies of the same id together and number them 1 to 4 within each id
bysort id: gen year=_n

* Now we copy each wide variable into the long variable for its matching time period.

* I will do this manually, though it would be very easy to do with a loop over macros.

gen var1 = var1_1 if year == 1
replace var1 = var1_2 if year == 2
replace var1 = var1_3 if year == 3
replace var1 = var1_4 if year == 4

gen var2 = var2_1 if year == 1
replace var2 = var2_2 if year == 2
replace var2 = var2_3 if year == 3
replace var2 = var2_4 if year == 4

gen var3 = var3_1 if year == 1
replace var3 = var3_2 if year == 2
replace var3 = var3_3 if year == 3
replace var3 = var3_4 if year == 4
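
* A sketch of the loop version mentioned above. So that it can run alongside the manual code, it builds check copies (chk1-chk3, names made up for this example) and confirms that they match var1-var3.

forv v = 1/3 {
  gen chk`v' = .
  forv t = 1/4 {
    replace chk`v' = var`v'_`t' if year == `t'
  }
  assert chk`v' == var`v'
  drop chk`v'
}

* To use the loop in place of the manual code, generate var1-var3 directly instead of chk1-chk3 and skip the assert and drop lines.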

* Now we just need to drop the extra variables.

drop var?_?
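
* As an optional sanity check, id and year together should now identify each row, and only the long variables var1-var3 and the time-invariant var4-var7 should remain.

isid id year
describe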

************************************************
* This can also be accomplished with the reshape command. First rebuild a wide data set with the same structure:

clear
set obs 1000

gen id = _n

forv i=1/4 {
  gen var1_`i' = rnormal()
  gen var2_`i' = rnormal()
  gen var3_`i' = rnormal()
  gen var`=`i'+3' = rbinomial(1,.5)
}

order _all, alphabetic

reshape long var1_ var2_ var3_, i(id)

* The tricky thing to remember with reshape is that it requires your variable names to follow an exact pattern.

* That is, reshape looks for a number directly after each stub (here var1_, var2_, and var3_), at the end of the variable name.  It turns this number into the j variable, which is named _j unless you specify j().
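
* A minimal follow-up sketch, assuming Stata 12 or later for the grouped rename. After the reshape above, the long variables keep their stub names (var1_, var2_, var3_) and, because no j() was given, the period index is stored in _j. Renaming gives the same layout as the manual approach:

rename (var1_ var2_ var3_) (var1 var2 var3)
rename _j year
isid id year

* If the number sits in the middle of the name instead of at the end, reshape lets you mark its position with @. The toy variables x1_a and x2_a below are hypothetical, made up only to illustrate this:

preserve
clear
set obs 3
gen id = _n
gen x1_a = rnormal()
gen x2_a = rnormal()
reshape long x@_a, i(id) j(wave)
* The long variable is now named x_a, and wave takes the values 1 and 2.
list
restore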
