Thursday, June 21, 2012
Social network analysis simulation - contagious depression
* Social network analysis simulation - contagious depression
* We would like to simulate a panel data set in which we have students who might be related to each other as friends.
* If one of the persons becomes depressed then that person will increase the likelihood of friends becoming depressed.
***** *****
* Model Parameters *
***** *****
clear
* Set the number of students to be simulated
local Num_students = 50
* Set the minimum probability that any two people are friends despite distance
local min_friend_prob = .01
* Set the max friend probability
local max_friend_prob = .75
* Set the distance coefficient. P(j&i being Friends)=max(`max_friend_prob'-`alpha'*distance(i,j),`min_friend_prob')
local alpha = 1
* Initial likelihood of facing depression P(depression)~base_likelihood + beta*#_friends_w_depression
local base_likelihood=.15
local beta = .1
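* A quick worked example of the friendship rule above (a sketch only; the distances .3 and 1
* are arbitrary illustrative values, not part of the simulation).
* With the parameters set above, two students .3 apart are friends with probability .75-1*.3=.45,
* while two students 1 apart fall back to the floor of .01.
di max(`max_friend_prob'-`alpha'*.3,`min_friend_prob')
di max(`max_friend_prob'-`alpha'*1,`min_friend_prob')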
***** *****
* Generate Data *
***** *****
set obs `Num_students'
gen stud_id = _n
* Let's imagine there is a uniform three-dimensional space in x, y, and z, each on (-1/2,1/2), in which students are positioned.
* This space is unobservable yet for the purposes of this simulation let's give it labels.
gen x = runiform()-.5
label var x "Punk/Goth/Alternative (conformity scale)"
gen y = runiform()-.5
label var y "Socio-economic status (wealth scale)"
gen z = runiform()-.5
label var z "Attractiveness (physical appearance scale)"
* Now let's say the probability of any two people being friends is equal to the max friend probability (75%) less alpha times the Euclidean distance between them, with a minimum of 1%.
* To do this we will use a nested loop that runs through all of the students and all of their potential "friendship" matches.
* First let's generate indicator variables to indicate if the student is friends with the other students
forv i=1(1)`Num_students' {
gen fri_`i' = 0
label var fri_`i' "This student is friends with student `i'"
}
* Count the number of friends
gen num_friends = 0
gen connection = 0
label var connection "The connection number of the current friendship"
local num_connections = 0
* Now let's generate friendship levels
forv i=1(1)`Num_students' {
* This inner loop runs through all of the students up to student i, so each pair is considered only once
forv j=1(1)`i' {
* A student cannot be friends with himself
if `i'!=`j' {
* calculate the euclidean distance between student i and j
local dist_ij = ((x[`i']-x[`j'])^2+(y[`i']-y[`j'])^2+(z[`i']-z[`j'])^2)^.5
* Calculate the probability that i and j are friends
local prob_ij = max(`max_friend_prob'-`alpha'*`dist_ij',`min_friend_prob')
* Draw a bernoulli result for friendship
local friends_ij = rbinomial(1,`prob_ij')
di "`i'&`j' are " string(`dist_ij',"%9.2f") " apart ; friends prob=" string(`prob_ij',"%9.2f") " ; they are friends <`friends_ij'> 1=yes, 0=no"
* This changes the variable indicators of how many friends students have.
qui if `friends_ij' == 1 {
local num_connections = `num_connections' + 1
replace num_friends = num_friends+1 if stud_id==`i' | stud_id==`j'
replace fri_`i' = 1 if stud_id==`j'
replace fri_`j' = 1 if stud_id==`i'
}
}
}
}
di "--- `num_connections' connections (friendships) made ---"
tab num_friends
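* As a sanity check (a sketch, not part of the original simulation), the friend count can be
* recomputed directly from the indicator variables. fri_i equals 1 on row j when student j is
* friends with student i, so summing fri_1 ... fri_N across a row should reproduce num_friends.
* num_friends_check is only a temporary helper name introduced here.
egen num_friends_check = rowtotal(fri_*)
assert num_friends_check == num_friends
drop num_friends_check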
* My prediction is that the farther students get from the center in any/all of the categories, the fewer friends they will have.
reg num_friends x y z
* We cannot see this prediction hold true with this specification.
* That is because the effect is mirrored: friends are lost whether x, y, and z move above zero or below zero.
* A linear term cannot capture that, so this simple OLS should fail.
* However, if we split each variable into its positive and negative parts then I think we will come up with more interesting results.
foreach i in x y z {
gen `i'_pos = max(0,`i')
label var `i'_pos "Variable `i''s positive values"
gen `i'_neg = min(0,`i')
label var `i'_neg "Variable `i''s negative values"
}
* Now let's see what the results yield
reg num_friends *pos *neg
* This is because in a uniform distribution (like almost all distributions) the closest any observation can get to the expected position of all other observations is the center of the distribution, so students near the center are on average closer to everyone else.
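* One way to check this more directly (a sketch; dist_center is a helper variable introduced
* here for illustration) is to regress the number of friends on each student's distance from
* the center of the space. If the reasoning above holds, the coefficient should tend to be
* negative, though with only 50 students a single draw can be noisy.
gen dist_center = sqrt(x^2+y^2+z^2)
label var dist_center "Euclidean distance from the center (0,0,0)"
reg num_friends dist_center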
***** *****
* Spread of depression *
***** *****
* We would like to simulate the spread of depression through our network.
* First let us calculate the original level of depression:
* Initial likelihood of facing depression P(depression)~base_likelihood + beta*#_friends_w_depression
gen depression = rbinomial(1,`base_likelihood')
label var depression "Initial students depressed"
two (scatter y x if depression==0) (scatter y x if depression==1) , legend(label(1 "Healthy") label(2 "Depressed"))
gen depression2=0
* Next we will loop through every student who is not yet depressed to see whether that student becomes depressed
forv i=1(1)`Num_students' {
* Skip students who are already depressed
if depression[`i']==0 {
forv j=1(1)`Num_students' {
* This checks if i is friends with j and j is depressed
qui if fri_`j'[`i'] == 1 & depression[`j']==1 {
local depression_spread=rbinomial(1,`beta')
if `depression_spread' == 1 {
replace depression2=1 if _n==`i'
noi di "Student `i' becomes depressed as a result of `j''s depression"
* Stop checking further friends once student i has become depressed
continue, break
}
}
}
}
}
two (scatter y x if depression==0) (scatter y x if depression==1) ///
(scatter y x if depression2==1) , legend(label(1 "Healthy") ///
label(2 "Depressed") label(3 "Newly depressed"))
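* The loop above gives each depressed friend one independent chance (with probability beta) of
* passing depression on, so a healthy student with k depressed friends becomes depressed with
* probability 1-(1-beta)^k. The sketch below computes k and that implied probability so they
* can be compared with the simulated outcome. num_dep_friends and p_new_dep are helper names
* introduced here for illustration, and the locals are assumed to still be in scope.
gen num_dep_friends = 0
forv j=1(1)`Num_students' {
* fri_`j' equals 1 on the rows of students who are friends with j
qui replace num_dep_friends = num_dep_friends + fri_`j' if depression[`j']==1
}
gen p_new_dep = 1-(1-`beta')^num_dep_friends if depression==0
label var p_new_dep "Implied probability of becoming newly depressed"
tab num_dep_friends depression2 if depression==0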
***** *****
* Generate network graphs *
***** *****
* Can we plot one of those awesome social network graphs in Stata?
gl two_list
* Person
forv i=1(1)`Num_students' {
local x_pos = x[`i']
local y_pos = y[`i']
qui gen conct_x_`i' = `x_pos' if _n==1
qui gen conct_y_`i' = `y_pos' if _n==1
* This creates two variables for every person in the network.
* It allows in effect a separate network for each person.
local count_position = 1
forv j=1(1)`i' {
* This checks if j is friends with i
qui if fri_`j'[`i'] == 1 {
replace conct_x_`i' = x[`j'] if _n==`count_position'+1
replace conct_y_`i' = y[`j'] if _n==`count_position'+1
replace conct_x_`i' = `x_pos' if _n==`count_position'+2
replace conct_y_`i' = `y_pos' if _n==`count_position'+2
* This draws a connection from the person i to the person j before returning to person i.
local count_position = `count_position'+2
* This moves the position up 2 spaces to make space for the next expansion of the set.
}
}
* This adds one entry per person to the list of plots to be graphed.
gl two_list ${two_list} (line conct_y_`i' conct_x_`i', lcolor(gs8))
}
* Warning: this graph can take a while to draw.
two ${two_list} (scatter y x if depression==0) (scatter y x if depression==1) ///
(scatter y x if depression2==1) , legend(off)
* Note, this is not how network data is typically stored and created.
* A typical way of storing network data is as an edge list, with one row per connection and the two connected nodes in separate columns (see: http://en.wikipedia.org/wiki/Doubly_connected_edge_list).
* We can create an edge list from our data with the following code:
gen edge1 = .
gen edge2 = .
local count_position = 1
* This will loop through all of the students and add to the edge list one observation for every connection (each friendship is recorded once, with student i in edge1 and the lower-numbered friend j in edge2)
forv i=1(1)`Num_students' {
forv j=1(1)`i' {
* This checks if j is friends with i
qui if fri_`j'[`i'] == 1 {
replace edge1 = `i' if _n==`count_position'
replace edge2 = `j' if _n==`count_position'
local count_position = `count_position'+1
noi di "edge1=`i' edge2=`j'"
* This will make sure we do not run out of observations in the data set to record our edges
if `count_position'==_N set obs `=`count_position'+1'
}
}
}
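* A quick consistency check (a sketch, not part of the original post): because the loop above
* visits each pair only once (j<=i), the edge list should hold exactly one row per friendship,
* which is the count stored in the local num_connections earlier. This assumes the whole
* do-file is run in one go so that the local is still defined. n_edges is a helper name.
qui count if !missing(edge1)
local n_edges = r(N)
di "Edges recorded: `n_edges' ; friendships made: `num_connections'"
assert `n_edges' == `num_connections'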
* netplot is a user-written program that allows social network plots to be easily drawn from edge list data.
* It works much faster than my network mapping code generated above.
cap netplot edge?
* If you do not have netplot installed the following code should install it.
if _rc!=0 {
ssc install netplot
netplot edge*
}
* This option allows network plots to be drawn as a circle.
cap netplot edge*, type(circle)