Thursday, November 29, 2012

Pre and Post Test Data Merge/Append

* Stata Script

* Imagine that you have two sets of data.

* 1 pre-test
* 2 current

* You would like to merge the data together.

* You have two options.

* 1 Create a tall data set (using the append command)
* 2 Create a wide data set (using the merge command)

* Both forms of joined data are common.  Let's see how to do this.

* First let's create our pretest data.

clear
set obs 1000
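
* A possible addition, not part of the original script: fixing the
* random-number seed makes the simulated data identical on every run.
* The value 12345 is an arbitrary choice made for this sketch.
set seed 12345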

* We have some unique identifier on each person/student
gen ID = _n

gen score = rnormal()

* Now let's generate say 89 variables (v11 through v99)
forv i=11/99 {
  gen v`i' = runiform()
  label var v`i' "Random uniform variable `i' - pretest"
}

save pretest, replace
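
* A quick sanity check one might run here (a sketch, not part of the
* original script): merge 1:1 later relies on ID identifying each row
* uniquely, and isid stops with an error if it does not.
isid ID
describe, short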

* Now we have our pretest data saved

* Let's create our "current" data.  The pretest data is still in memory, so we build on it.

* Let's imagine that there is some kind of treatment
gen treatment = rbinomial(1,.3)

* It has a positive effect
replace score = score+.4*treatment

* Now let's modify some of the 89 variables (v11 through v99)
forv i=11/99 {
  * Each variable has a 10% chance of changing between pretest and current
  local change = rbinomial(1,.1)

  if `change'==1 replace v`i' = v`i'+runiform()
  label var v`i' "Random uniform variable `i' - current"
}

save current, replace
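
* One way to see what changed between the two files (a sketch, not part
* of the original script) is cf, which compares variables in memory
* against a saved file.  cf exits with an error when it finds
* differences, so capture noisily lets the do-file continue.
* score should differ for the treated students.
capture noisily cf score v11-v99 using pretest, verbose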

clear

* END DATA SIMULATION
******************************************************************
* Begin file management

* Now we have two data sets that share the same variables, except that only the current file has treatment.
* Let's start by appending the data together into a long file.

use pretest, clear

* First let's generate a variable to indicate pretest
gen phase = "Pretest"

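* Now, let's append the data together:
append using current

* Observations from the current file have an empty phase, so label them too
replace phase = "Current" if phase == ""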

sum
* We can see that now we have 2000 observations as expected.
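
* The same expectation can be checked in code (a sketch, not part of the
* original script): every ID should appear exactly twice, once per file.
* Note that bysort re-sorts the data by ID.
assert _N == 2000
bysort ID: assert _N == 2
tabulate phase, missing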

* Now our data is in a tall format

* We might also be interested in making sure the "treatment" variable is duplicated for every observation.
bysort ID: egen treat = mean(treatment)
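
* A hedged check, not part of the original script: after egen, treat
* should be 0 or 1 and never missing, in both the pretest and the
* current rows.
assert inlist(treat, 0, 1)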

* Drop the old treatment variable
drop treatment
rename treat treatment

clear
*************
* Alternatively, we may want to put our data into a wide format.

* First we need to rename variables so that they will not be overwritten.
use pretest, clear

foreach v of varlist * {
  rename `v' `v'_0
}

* The merging variable, however, must keep the same name in both files.

rename ID_0 ID

save pretest_rename, replace

* Now we load the current test data.

use current, clear

* We will rename the variables in the current as well
foreach v of varlist * {
  rename `v' `v'_1
}

* Making sure to change ID_1 back to ID
rename ID_1 ID

* Now we merge the data together
merge 1:1 ID using pretest_rename
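
* merge adds a variable _merge recording where each row came from.
* A sketch of a check (not part of the original script): every ID exists
* in both files here, so every row should be matched (_merge == 3).
tabulate _merge
assert _merge == 3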

* Now our data is wide.  Note, the reshape command could be used to change data from wide to tall or tall to wide.
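
* One thing the wide layout makes easy (a sketch, not part of the
* original script; the name gain is made up for this example) is a
* per-student change in score.  In this simulation it should be about .4
* for treated students and 0 for everyone else.
gen gain = score_1 - score_0
tabstat gain, by(treatment_1) statistics(n mean)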

* It might be useful to order all of the variables from before and after next to each other.

order *, alphabetic
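
* As mentioned above, reshape can move between the two layouts.  A
* minimal sketch, not part of the original script: build the list of
* stubs (score_ and v11_ through v99_), then go from wide to tall.  The
* name wave for the new 0/1 indicator is made up here, and
* preserve/restore leaves the wide data in memory untouched.
preserve
local stubs score_
forv i=11/99 {
  local stubs `stubs' v`i'_
}
reshape long `stubs', i(ID) j(wave)
tab wave
restore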
