* Imagine that you have two sets of data.
* 1 pre-test
* 2 current
* You would like to combine the two data sets.
* You have two options.
* 1 Create a tall data set (using the append command)
* 2 Create a wide data set (using the merge command)
* Both forms of joined data are common. Let's see how to do each.
* First let's create our pretest data.
clear
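* Optional addition: setting the seed makes the random draws below
* reproducible from run to run. The value 12345 is arbitrary; any integer works.
set seed 12345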
set obs 1000
* Each person/student has a unique identifier
gen ID = _n
gen score = rnormal()
* Now let's generate, say, 89 variables (v11 through v99)
forv i=11/99 {
gen v`i' = runiform()
label var v`i' "Random uniform variable `i' - pretest"
}
save pretest, replace
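* Optional checks (an addition to the original workflow): merge 1:1 later
* requires ID to uniquely identify each row, and isid stops with an error if
* it does not. The data should also hold 91 variables: ID, score and v11-v99.
isid ID
assert c(k) == 91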
* Now we have our pretest data saved
* Let's create our "current" data
* Let's imagine that some kind of treatment is given to about 30% of students
gen treatment = rbinomial(1,.3)
* It has a positive effect: it raises the score by 0.4 (score has SD 1)
replace score = score+.4*treatment
* Now let's randomly change some of the 89 variables (v11 through v99)
forv i=11/99 {
* Each variable has a 10% chance of being shifted for all observations
local change = rbinomial(1,.1)
if `change'==1 replace v`i' = v`i'+runiform()
label var v`i' "Random uniform variable `i' - current"
}
save current, replace
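* A quick sanity check (not part of the original workflow): regressing score
* on treatment should recover an effect close to the 0.4 built in above,
* up to sampling noise.
regress score treatment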
clear
* END DATA SIMULATION
******************************************************************
* Begin file management
* Now we have two data sets with the same variable names (current also has treatment) but different values.
* Let's start by appending the data together into a tall (long) file.
use pretest, clear
* First let's generate a variable to indicate pretest
gen phase = "Pretest"
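* Now, let's append the data together:
append using current
* Observations from current have an empty phase, so fill it in
replace phase = "Current" if phase == ""
sum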
* We can see that now we have 2000 observations as expected.
* Now our data is in a tall format
* The treatment variable is missing for the pretest rows. We might want it copied to both observations of each ID.
bysort ID: egen treat = mean(treatment)
* Drop the old treatment variable
drop treatment
rename treat treatment
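* Optional check: every ID should now carry a single treatment value on
* both of its rows. assert stops the do-file with an error if it does not.
bysort ID (treatment): assert treatment[1] == treatment[_N]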
clear
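* Alternative sketch: append's generate() option creates an indicator of the
* file each observation came from (0 = data in memory, 1 = first appended
* file), so the phase variable does not have to be built by hand.
* The names source and source_lbl are illustrative.
use pretest, clear
append using current, generate(source)
label define source_lbl 0 "Pretest" 1 "Current"
label values source source_lbl
tab source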
*************
* Alternatively, we may want to put our data into a wide format.
* First we need to rename variables so that they will not be overwritten.
use pretest, clear
foreach v of varlist * {
rename `v' `v'_0
}
* We need to make sure that only the merging variable keeps its original name.
rename ID_0 ID
save pretest_rename, replace
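* Alternative, assuming Stata 12 or later: a single group rename does the
* same job as the foreach loop above. Running it simply rebuilds the same file.
use pretest, clear
rename * *_0
rename ID_0 ID
save pretest_rename, replace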
* Now we load the current test data.
use current, clear
* We will rename the variables in the current data as well
foreach v of varlist * {
rename `v' `v'_1
}
* Making sure to change ID_1 back to ID
rename ID_1 ID
* Now we merge the data together
merge 1:1 ID using pretest_rename
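* merge creates _merge: 1 = in the master (current) only, 2 = in the using
* (pretest) only, 3 = matched. Every ID should match here, so confirm that
* and then drop the indicator.
tab _merge
assert _merge == 3
drop _merge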
* Now our data is wide. Note that the reshape command could be used to change data from wide to tall or from tall to wide.
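* A minimal reshape sketch on one variable: score_0 and score_1 become a
* single score column, with phase taking the values 0 (pretest) and 1 (current).
* preserve/restore leaves the wide data in memory untouched.
preserve
keep ID score_0 score_1
reshape long score_, i(ID) j(phase)
rename score_ score
assert _N == 2000
list in 1/4
restore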
* It might be useful to order all of the variables from before and after next to each other.
order *, alphabetic