## Thursday, June 21, 2012

### Social network analysis simulation - contagious depression

* We would like to simulate a panel data set in which we have students who might be related to each other as friends.

* If one of the persons becomes depressed then that person will increase the likelihood of friends becoming depressed.

*****          *****
* Model Parameters *
*****          *****

clear

* Set the number of students to be simulated
local Num_students = 50

* Set the minimum probability that any two people are friends despite distance
local min_friend_prob = .01

* Set the max friend probability
local max_friend_prob = .75

* Set the distance coefficient. P(i & j being friends) = max(max_friend_prob - alpha*distance(i,j), min_friend_prob)
local alpha = 1

* Initial likelihood of facing depression P(depression)~base_likelihood + beta*#_friends_w_depression
local base_likelihood=.15
local beta = .1
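
To get a feel for the friendship rule before running the simulation, here is a minimal sketch that evaluates it at a few distances. The distances are hypothetical illustration values, and the constants .75, 1, and .01 are simply copied from max_friend_prob, alpha, and min_friend_prob above.

```stata
* Evaluate max(max_friend_prob - alpha*distance, min_friend_prob)
* at a few illustrative distances (values chosen only as examples)
foreach d in 0 .25 .5 .75 1 {
    di "distance = `d' -> P(friends) = " %4.2f max(.75 - 1*`d', .01)
}
```

The probability falls linearly with distance until it reaches the floor set by min_friend_prob.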

*****       *****
* Generate Data *
*****       *****

set obs `Num_students'

gen stud_id = _n

* Let's imagine there is a uniform three-dimensional space x, y, and z (each on (-1/2,1/2)) in which students are connected to each other.

* This space is unobservable yet for the purposes of this simulation let's give it labels.

gen x = runiform()-.5
label var x "Punk/Goth/Alternative (conformity scale)"

gen y = runiform()-.5
label var y "Socio-economic status (wealth scale)"

gen z = runiform()-.5
label var z "Attractiveness (physical appearance scale)"

* Now let's say the likelihood of any two people being friends is equal to 75% less the Euclidean distance between them, with a minimum of 1%.

* To do this we will use a nested loop that runs through all of the students and all of their potential "friendship" matches.

* First let's generate indicator variables to indicate if the student is friends with the other students
forv i=1(1)`Num_students' {
  gen fri_`i' = 0
  label var fri_`i' "This student is friends with student `i'"
}

* Count the number of friends
gen num_friends = 0

gen connection = 0
label var connection "The connection number of the current friendship"
local num_connections = 0

* Now let's generate friendship levels
forv i=1(1)`Num_students' {
  * This inner loop runs through all of the students up to student i, so each pair is drawn once
  forv j=1(1)`i' {
    * A student cannot be friends with himself
    if `i'!=`j' {
      * Calculate the Euclidean distance between student i and j
      local dist_ij = ((x[`i']-x[`j'])^2+(y[`i']-y[`j'])^2+(z[`i']-z[`j'])^2)^.5
      * Calculate the probability that i and j are friends
      local prob_ij = max(`max_friend_prob'-`alpha'*`dist_ij',`min_friend_prob')
      * Draw a Bernoulli result for friendship
      local friends_ij = rbinomial(1,`prob_ij')
      di "`i'&`j' are " string(`dist_ij',"%9.2f") " apart ; friends prob=" string(`prob_ij',"%9.2f") " ; they are friends <`friends_ij'> 1=yes, 0=no"

      * This updates the friend counts and the friendship indicators for both students.
      qui if `friends_ij' == 1 {
        local num_connections = `num_connections' + 1
        replace num_friends = num_friends+1 if stud_id==`i' | stud_id==`j'
        replace fri_`i' = 1 if stud_id==`j'
        replace fri_`j' = 1 if stud_id==`i'
      }
    }
  }
}

di "--- `num_connections' connections (friendships) made ---"

tab num_friends

* My prediction is that the farther students get from the center in any/all of the categories, the fewer friends they will have.
reg num_friends x y z

* We cannot see this prediction hold true with this specification.

* That is because the effect is mirrored whether x y and z get larger than zero or less than zero.

* This simple OLS should fail.

* However, if we split each variable into its positive and negative parts then I think we will come up with more interesting results.

foreach i in x y z {
  gen `i'_pos = max(0,`i')
  label var `i'_pos "Variable `i''s positive values"
  gen `i'_neg = min(0,`i')
  label var `i'_neg "Variable `i''s negative values"
}

* Now let's see what the results yield
reg num_friends *pos *neg

* Students near the center should have more friends because in a uniform distribution (like almost all distributions) the closest any observation can get to the expected position of all other observations is the center of the distribution.
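
The same idea can be checked more directly by regressing the number of friends on each student's distance from the center. This is only a sketch: dist_center is a hypothetical variable name introduced here, and the code assumes x, y, z, and num_friends exist as generated above.

```stata
* Euclidean distance of each student from the center (0,0,0) of the space
gen dist_center = sqrt(x^2 + y^2 + z^2)
label var dist_center "Euclidean distance from the center of the space"

* If the prediction holds, the coefficient on dist_center should be negative
reg num_friends dist_center
```

A single distance measure uses one coefficient instead of six, at the cost of assuming the three dimensions matter equally.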

*****              *****
*****              *****

* We would like to simulate the spread of depression through our network.

* First let us calculate the original level of depression:
* Initial likelihood of facing depression P(depression)~base_likelihood + beta*#_friends_w_depression

gen depression = rbinomial(1,`base_likelihood')
label var depression "Initial students depressed"

two (scatter y x if depression==0) (scatter y x if depression==1) , legend(label(1 "Healthy") label(2 "Depressed"))

gen depression2=0

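* Next we loop through every student who is not yet depressed and check whether a depressed friend passes depression on.
forv i=1(1)`Num_students' {
  * If the student is already depressed skip that student
  if depression[`i']==0 {
    forv j=1(1)`Num_students' {
      * This checks if j is friends with i and j is depressed
      qui if fri_`j'[`i'] == 1 & depression[`j']==1  {
        * Assumption: each depressed friend passes depression on with probability beta
        local depression_spread = rbinomial(1,`beta')
        if `depression_spread' == 1 {
          replace depression2=1 if _n==`i'
          noi di "Student `i' becomes depressed as a result of `j''s depression"
          * Stop checking the remaining friends once student i is depressed
          continue, break
        }
      }
    }
  }
}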

two (scatter y x if depression==0) (scatter y x if depression==1) ///
(scatter y x if depression2==1) , legend(label(1 "Healthy") ///
label(2 "Depressed") label(3 "Newly depressed"))
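
To see how exposure relates to new cases, one option is to count each student's initially depressed friends and cross-tabulate that count against depression2. This is a sketch under stated assumptions: num_dep_friends is a hypothetical variable name, and the code relies on the fri_ indicators, depression, depression2, and the Num_students local from the same run of the do-file.

```stata
* fri_`j' equals 1 on the rows of students who are friends with student j,
* so summing fri_`j' over the depressed j gives each student's exposure
gen num_dep_friends = 0
forv j = 1/`Num_students' {
    if depression[`j'] == 1 {
        qui replace num_dep_friends = num_dep_friends + fri_`j'
    }
}
label var num_dep_friends "Number of initially depressed friends"

* Exposure versus new cases among the initially healthy students
tab num_dep_friends depression2 if depression == 0
```

If each depressed friend transmits independently with probability beta, a student with k depressed friends becomes depressed with probability 1-(1-beta)^k, so the share of new cases should rise with num_dep_friends.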

*****                 *****
* Generate network graphs *
*****                 *****

* Can we plot one of those awesome social network graphs in Stata?
gl two_list

* Person
forv i=1(1)`Num_students' {
  local x_pos = x[`i']
  local y_pos = y[`i']
  qui gen conct_x_`i' = `x_pos' if _n==1
  qui gen conct_y_`i' = `y_pos' if _n==1
  * This creates two variables for every person in the network.
  * It allows in effect a separate network for each person.

  local count_position = 1

  forv j=1(1)`i' {
    * This checks if j is friends with i
    qui if fri_`j'[`i'] == 1  {
      replace conct_x_`i' = x[`j'] if _n==`count_position'+1
      replace conct_y_`i' = y[`j'] if _n==`count_position'+1
      replace conct_x_`i' = `x_pos' if _n==`count_position'+2
      replace conct_y_`i' = `y_pos' if _n==`count_position'+2
      * This draws a connection from person i to person j before returning to person i.
      local count_position = `count_position'+2
      * This moves the position up 2 spaces to make space for the next expansion of the set.
    }
  }
  gl two_list ${two_list} (line conct_y_`i' conct_x_`i', lcolor(gs8))
  * This adds an entry to the list of graphs to be drawn, once for each person.
}

* Warning: this graph can take a while to draw.
two ${two_list} (scatter y x if depression==0) (scatter y x if depression==1) ///
(scatter y x if depression2==1) , legend(off)

* Note, this is not how network data is typically stored and created.

* A typical way of storing network data is as edgelists (see: http://en.wikipedia.org/wiki/Doubly_connected_edge_list).

* We can create an edge list from our data with the following code:

gen edge1 = .
gen edge2 = .

local count_position = 1

* This will loop through all of the students and add one observation to the edge list for every connection (each friendship is recorded once, with edge1 > edge2)
forv i=1(1)`Num_students' {
  forv j=1(1)`i' {
    * This checks if j is friends with i
    qui if fri_`j'[`i'] == 1  {
      replace edge1 = `i' if _n==`count_position'
      replace edge2 = `j' if _n==`count_position'
      local count_position = `count_position'+1
      noi di "edge: `i' `j'"
      * This will make sure we do not run out of observations in the data set to record our edges
      if `count_position'==_N set obs `=`count_position'+1'
    }
  }
}
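
A quick sanity check on the edge list is to compare the number of recorded edges with the number of friendships drawn earlier. This sketch assumes it runs in the same do-file session, so that the num_connections local from the friendship loop is still defined.

```stata
* Count the rows of the edge list that actually hold an edge
count if !missing(edge1)
di "Edges recorded: " r(N) " ; friendships drawn earlier: `num_connections'"

* Look at the first few edges
list edge1 edge2 in 1/5
```

The two counts should agree, since every friendship is written to the edge list exactly once.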

* netplot is a user-written program that allows social network plots to be easily drawn from edge list data.
* It works much faster than my network mapping code generated above.
cap netplot edge?
* If you do not have netplot installed the following code should install it.
if _rc!=0 {
ssc install netplot
netplot edge*
}

* This option allows network plots to be drawn as a circle.
cap netplot edge*, type(circle)