Thursday, December 10, 2015

Cause of Death: Melanin | Evaluating Death-by-Police Data

Widespread attention to the deaths of black men at the hands of police has sparked protests and public outrage in many cities. Recently, the federal government launched an investigation into the conduct of the entire Chicago police department. However, judging by the protests and videos popping up all over the country, the apparent problems of police abuse do not appear to be confined to one city.

As part of this movement, Black Lives Matter (BLM) protesters have drawn much attention from the media over the last couple of years as they have staged large protests in cities such as Toronto, Chicago, San Francisco, and Ferguson. These protests have also drawn international attention, especially when they led to riots in Baltimore, Maryland.

The reception of BLM by the general American public has been mixed. Unfortunately, the violence at some protests has distracted from the message that the BLM movement has been attempting to convey: that black people are more likely to be abused or killed by police than their white counterparts and that this abuse is unjustified.

But is this true? The answer is not immediately obvious, as data about people killed by police is not systematically collected by any federal agency; police departments report deaths to the FBI only voluntarily. What partially comprehensive data exists is instead collected through the efforts of activists such as those at FatalEncounters (see discussion of data issues below).

The folks at FatalEncounters say that their data should not be used to analyze racial differences because so many reports are missing information on race. However, by making two simple assumptions we are able to start drawing inferences from the data (see discussion of assumption issues below):
A1. Data is missing at random without regard to the race.
A2. Data is missing at random without regard to state.

Before getting any further into this I would like to point out that the likelihood of an individual officer being involved in a killing is relatively low (9% over the last 16 years or 0.56%/year [1]). Furthermore, without stretching the truth we can probably assume the vast majority of such killings are justifiable to varying degrees.

Thus the average person probably should not be concerned for one's life when encountering police officers. That said, what is the case for the average person might not be the case for all segments of the population. The BLM movement would argue that young black males should be justifiably much more concerned about being killed by police than other segments of the population. So let's get into the data and see what we can find.

Q1. Are black males more likely to be killed by police than their white counterparts?

Table 1: The likelihood of being killed by police, for 16-30 year olds nationally, relative to each group's share of the population. Under assumption A1, the likelihood for a black person is roughly 7 times that of a white person (2.1/.3). Loosening the assumption and assigning all of the unknown-race kills to white people (an extreme assumption) gives column Kill2. Even under this extreme assumption, the likelihood of being killed by a police officer is about 2.3 times greater for a black person than for a white person (2.1/.9).

     Race  Kills  Kill%  Pop%  Kill%/Pop%  | Kill2 Kill2/Pop%
1   black    937    27%   13%         2.1  |   27%        2.1
2 unknown   1202    35%    0%           -  |    0%          -
3   white    712    21%   63%         0.3  |   56%         .9
4   other     79     2%    8%         0.2  |    2%         .2
5  latino    527    15%   17%         0.9  |   15%         .9
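
For anyone who wants to re-run the arithmetic behind Table 1, here is a minimal Python sketch (it is not the code used for this post). It takes the kill counts and population shares straight from the table and computes each group's share of kills divided by its share of the population, first ignoring the unknown-race deaths and then assigning all of them to white people. The function and variable names are mine. Note that working from the unrounded counts gives ratios of roughly 6.4 and 2.4, slightly different from the 7 and 2.3 obtained by dividing the rounded table entries.

```python
# Kill counts and population shares copied from Table 1.
kills = {"black": 937, "unknown": 1202, "white": 712, "other": 79, "latino": 527}
pop_share = {"black": 0.13, "white": 0.63, "other": 0.08, "latino": 0.17}

total = sum(kills.values())


def risk_index(counts):
    """Share of kills divided by share of population, for each race."""
    return {race: (counts[race] / total) / pop_share[race] for race in pop_share}


# Base case: unknown-race deaths are simply left out of every group.
base = risk_index(kills)

# Extreme case (column Kill2): every unknown-race death is counted as white.
kill2 = dict(kills, white=kills["white"] + kills["unknown"])
extreme = risk_index(kill2)

black_vs_white = base["black"] / base["white"]
black_vs_white_extreme = extreme["black"] / extreme["white"]
print(round(black_vs_white, 1), round(black_vs_white_extreme, 1))
```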

Q2. How does the risk of being killed by police for black people relative to that of white people vary by state?

Figure 1: The relative likelihood of a black person being killed by police in each state compared with that of a white person. Grayed out states are those in which no black people were killed by police.
From Figure 1 we can see that the relative risk of being killed by police varies significantly by state. Surprisingly, states south of the Mason-Dixon line seem to have more equal rates of being killed by police than some states north of the line, with Illinois, Ohio, and Massachusetts being states in which the likelihood of being killed by police for a black person is over 7 times higher than that for a white person.

Q3. Is poverty the driving force causing black people to be at higher risk than their white counterparts?

Figure 2: The relative risk of being poor for black people relative to white against that of being killed for black people relative to that of white.

From Figure 2 we can see that there does appear to be a relationship between the relative risk of black people being poor and their relative risk of being killed by a police officer (with the exception of OH and IL, which have much higher risk than would otherwise be expected). If you regress the relative risk of being killed on the relative risk of being poor and the relative risk of failing to complete high school, each one-unit increase in a black person's relative risk of being poor is associated with a 2.8 times higher risk of being killed by police.
Figure 3: The relative risk of being poor mapped against the relative rates of high school completion.
From Figure 3 we can see that poverty and failure to complete high school are correlated, with some states such as Minnesota and Wisconsin having less than 25% completion rates among blacks relative to their white peers while simultaneously having the highest rates of poverty among blacks relative to whites in all of the data.

Q4. So is it wise for black people to move to southern states if they would like to avoid police violence?

Figure 4: The overall risk of anybody getting killed by police.
From Figure 4 we can see the story is not quite so simple. Louisiana, Mississippi, Alabama, and Nevada, four states which had the lowest relative risk for black people of being killed by police, have some of the highest rates of any state in terms of the likelihood of any resident being killed. In contrast, Illinois, Indiana, and Vermont have some of the lower rates in the country in terms of likelihood of being killed by a police officer.

Figure 5: Likelihood of a black person being killed by police.
From Figure 5 we can see that black people in the Northwest of the United States have the lowest possible chance of being killed by the police since in the last 16 years there is no record of any deaths by police. However, this might be comforting to few since the population of black people in this part of the country is very small. Strangely Vermont and Nevada both have very high rates of deaths for black people. The high rate in Vermont could be explained by the state having the second smallest black population while the even higher rate in Nevada cannot easily be explained as it has the 19th largest population of black people in the country.

Q5: What about white people?

Figure 6: Likelihood of a white person being killed by police.
From Figure 6 we can see that Nevada is also the highest risk state for white people being killed by police. Interestingly, states above the Mason-Dixon line now seem to be the safest places for white people when it comes to being at risk from the police. Overall, we should not compare Figure 6 with Figure 5 and say "oh well, white people are killed by the police also".

The reason is that we need to keep the scale of the legend in mind. In Figure 5 the scale goes all the way up to nearly 20 deaths per 100 thousand people while in Figure 6 the scale goes only to around 8 deaths per 100 thousand people. This means that consistent with the national level data, black people are significantly more likely to be killed by police than white people.

Q6: So what? Black people are at higher risk of committing crime as well! Aren't they just getting their fair share of the risk inherent in criminal activity?

The blog FactCheck discusses this point, especially the possibility that the higher rate of black males killed by police officers relative to white males is due to black males being much more likely than white males to engage in violent crime. I will not discuss at this time theories as to why black people might be at a higher risk of committing crimes except to say that I think poverty is a better explanation for these differences.

That said, I think the Fatal Encounters data can get us further into examining the nature of police violence towards black people than the FBI data used by FactCheck. In particular, the Fatal Encounters data set has a brief description of the circumstances involved in each death. The text of these descriptions can be quantified in order to help naive data explorers better understand what is going on.

If no racial discrimination exists in how police force is used against individuals then, among those killed, the proportion of blacks who were unarmed should equal the proportion of whites who were unarmed. If, however, black people who are killed are more likely to be killed unjustifiably, then we should see that evidenced in an increased likelihood of being unarmed when killed.

In the data there are 315 cases in which "unarmed" appears in connection with a death resulting from interaction with a police officer. 104 of the 1934 black people killed were unarmed, while 60 of the 2578 white people who died at the hands of police were unarmed. This means that the proportion of unarmed people among blacks killed by police (104/1934=5.4%) is 2.3 times greater than that among white people killed by police (60/2578=2.3%). In probability notation:

"Unarmed": 315 Cases
P(Unarmed|Black & Killed)/P(Unarmed|White & Killed)=2.3
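
The ratio above can be reproduced in a few lines of Python. This is only a sketch for checking the arithmetic; the counts are the ones quoted in the text and the variable names are mine.

```python
# Counts quoted in the text above.
black_killed, black_unarmed = 1934, 104
white_killed, white_unarmed = 2578, 60

p_unarmed_black = black_unarmed / black_killed  # P(Unarmed | Black & Killed)
p_unarmed_white = white_unarmed / white_killed  # P(Unarmed | White & Killed)
ratio = p_unarmed_black / p_unarmed_white

print(f"{p_unarmed_black:.1%} {p_unarmed_white:.1%} {ratio:.1f}")
```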

In other words, among people killed by police, a black person is 130% more likely than a white person to have been unarmed.

Now you might say, that this is just the case for the term "unarmed". However, it is really hard to find any term associated with police brutality which does not appear disproportionately higher for blacks than it does for whites.

Table 2: The number of cases in which key words were reported and the frequency with which they appear for black people's deaths relative to those of whites.

Descriptor          Cases  P(Word|Black)/P(Word|White)
Unarmed             315     2.3
Naked               67      1.6
Toy                 41      3.6
Cooperative         23      1.7

Excessive Force

Age 9 or less       54      1.4
Age 10-15           131     3.1
Shot 10+ times      168     1.7
Wrongful Death      83      1.4
Indictment          106     1.5

Cause of Death

Trauma/beating/etc. 105     1.7
Taser               372     1.7
Asphyxiation        115     1.8
Medical Compli.     173     2.0
Gunshot            7450     1.0
Vehicle            1225     0.9

Justified Homicide keywords
"Reaching...gun"    109     1.6
"Robb(ed/ing)"     1006     1.4

Violent Crime Act
"Hostage"           160     0.8
"Standoff"          360     0.6

From Table 2 we can see that black people are 130% more likely to be killed unarmed, 60% more likely to be killed while naked, 260% more likely to be killed in association with a toy (probably a toy handgun), and 70% more likely to be killed even when the word "cooperative" was used to describe the situation.
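
The kind of text flagging behind Table 2 can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the code used for this post: the function name, the (race, description) record layout, and the sample rows are all hypothetical and are not taken from the Fatal Encounters data.

```python
import re


def keyword_ratio(records, pattern):
    """Return P(match | black) / P(match | white) for a regex pattern.

    `records` is a list of (race, description) pairs. The layout is an
    assumption made for this sketch, not the real data set's schema.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = {"black": 0, "white": 0}
    totals = {"black": 0, "white": 0}
    for race, text in records:
        if race in totals:
            totals[race] += 1
            if regex.search(text):
                hits[race] += 1
    p_black = hits["black"] / totals["black"]
    p_white = hits["white"] / totals["white"]
    return p_black / p_white  # raises ZeroDivisionError if no white matches


# Hypothetical rows, invented purely to exercise the function.
sample = [
    ("black", "Unarmed man shot after traffic stop."),
    ("black", "Suspect fired at officers during robbery."),
    ("white", "Man killed after armed standoff."),
    ("white", "Unarmed woman struck by patrol vehicle."),
    ("white", "Suspect robbed a store and fled."),
    ("white", "Hostage taker shot by SWAT."),
]

unarmed_ratio = keyword_ratio(sample, r"\bunarmed\b")
robbery_ratio = keyword_ratio(sample, r"robb(ed|ing|ery)")
print(unarmed_ratio, robbery_ratio)
```

The word boundary in the "unarmed" pattern matters: without it, a looser pattern such as "armed" would also flag the unarmed cases.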

Black people who are killed are also 40% more likely to be reported as aged 9 or less and 210% more likely to be between 10 and 15 than their white counterparts. When killed, black people are 70% more likely to be shot 10 or more times. Furthermore, the deaths of black people are 40% more likely to give rise to a wrongful death suit and 50% more likely to result in some kind of indictment against the officers involved.

When it comes to cause of death, black people are 70% more likely to die from trauma such as beating, stabbing, falling, etc. while also being 70% more likely to have tasing as a contributing cause of death. Likewise, black people are 80% more likely to die of asphyxiation and 100% more likely to die of medical complications than their white counterparts.

Overall, blacks are equally likely as whites to die by gunshot while being 10% less likely than whites to die by vehicle. The killing of black people is often justified as resulting from them either being in the act of robbing or stealing (40% more likely than whites) or "reaching for a gun" (60% more likely than whites).

However, among serious crimes, blacks are 20% less likely to be killed while involved in a hostage situation and 40% less likely to be killed following a standoff with police.

Conclusion: So, what are we to take from this?

Black people are clearly at much higher risk of being killed by police officers than white people. This statistic does not immediately lead to the conclusion that there is discrimination by police. However, it does raise some red flags. Looking more closely at state level data, we can see that in states above the Mason-Dixon line black people are often at much higher relative risk than their white peers of being killed by police, while certain states below the Mason-Dixon line have generally much higher rates of citizens being killed by police.

Critics of any kind of simple analysis of death by police for black people argue that black people are more likely to commit violent crimes and therefore should be more likely to suffer violent deaths. Those who live by the sword, die by the sword and all that.

If that were the case then we should expect a higher likelihood of death by police for black people than for white people (which is what we see), but equivalent or lower rates of black people getting killed accidentally or while vulnerable compared with their white counterparts. Vulnerability is difficult to measure, but text flags for "unarmed", "naked", age less than 10, shot 10 or more times, etc. attempt to get at this.

In the data we see exactly the opposite with black people much more likely to be killed while vulnerable. We also see that the rates of unusual or brutal deaths (result of beatings, asphyxiation, trauma, excessive or ineffective taser usage, etc.)  resulting from police contact tend to be much higher among blacks.

We also see that black people seem to be at higher risk of dying while committing robbery or while "reaching for a gun", while whites are more likely to die while participating in more serious sounding situations such as taking a hostage or after a standoff with police.

Overall, the picture that the data paints is that of a population which is more likely to commit crimes but also much more likely to suffer excessive force resulting from police prejudice. If there were evidence that excessive force leads to neighborhood reforms, lower crime, and rehabilitation then these kinds of actions might be seen as justifiable. However, there is no evidence that excessive force does anything but reinforce racial stereotypes, resulting in more cyclical crime and poverty.

You can find my code to do your own analysis here. I apologize that it is messy. I have been working on this "quick one-day-post" for two weeks now and need to get it out so that I can focus on my dissertation work.

Footnote: [1] Total number of killings * Average number of police involved / Number of police nationally = 20,000 killings over 16 years * let's say 4 police / 890,000 police = 9% over 16 years = 0.56%/year.
* Fatal Encounters Data Issues

This data has some obvious issues, primarily that it is based on what can be gathered from spotty and inconsistent newspaper reports. Newspaper reports are problematic because they often fail to provide important information such as the ethnicity of victims, the cause of death, or how the case was resolved, such as whether any disciplinary action was taken regarding the officers involved.

Even more problematic, newspaper data cannot be assumed to be complete. That is, some portion of deaths are not reported, or if they are reported the article is only kept for a limited time. FatalEncounters has attempted to remedy this problem by going through government records. However, as they say, many government records are only required to be kept for a few years. Since the records are so sparse in several indicators (race, disposition, and mental state) they suggest not using these records for analysis.

This, however, is selling ourselves short, as all we need is to make some basic assumptions in order to let the data work for us. The biggest issue with the data is that, though it has nearly ten thousand cases, the data set maintainer D. Brian Burghart believes that it is only about 46 percent complete. This is based on assuming a constant rate of deaths for all 16 years of data collection. For the two years with the most complete information, 2013 and 2014, there are 1,257 and 1,292 deaths respectively.
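
A quick way to see where a figure of about 46 percent can come from is sketched below in Python. The record count is the sum of the race breakdown reported further down; treating the average of 2013 and 2014 as the constant yearly rate is my assumption about how the estimate was made, not something confirmed by the maintainer.

```python
# Records in the data set: sum of the race breakdown counts given below.
records = 1934 + 1122 + 213 + 3624 + 2578

best_years = [1257, 1292]  # deaths recorded for 2013 and 2014
years_covered = 16

# Assumption: the two best-covered years represent the true yearly rate.
expected_total = sum(best_years) / len(best_years) * years_covered
completeness = records / expected_total

print(records, round(expected_total), f"{completeness:.0%}")
```

The expected total of roughly 20,000 deaths is also the figure used in the footnote above.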

* Simplifying assumption discussions

The first assumption might be debatable since some might argue that race might be more likely to be reported for black people or for white people. However, since our data deals with an abundance of black deaths relative to that of white the only type of under-reporting of race that might make our analysis invalid is that of under-reporting of white race. Fortunately we can test our analysis in that case by looking at what would happen if we assumed all unknown races were white. 

Race breakdown:
  black  latino  other  unknown  white
   1934    1122    213     3624   2578

We can see that the number of unknowns is large but not that large. If we assume all of them are white people then this more than doubles the number of white people who died from police.

Wednesday, November 18, 2015

The Unreported War On America's Poor

The Democratic firebrand Bernie Sanders keeps harping on this point about income inequality in the United States, yet I have to wonder: how bad is it really, and do we care?

First off, there is a legitimate reason to ask, if we should care. After all, throughout history, nations have gotten more wealthy with some people getting a bit more of the wealth over time yet things seem to keep going. So long as individuals feel that they can have reasonable access to opportunities for income growth and that a significant portion of the population is not living in poverty, things are okay, right?

Yes, but maybe there is a point at which income inequality can grow so vast and its influence over government so enormous that that which used to be a democracy ruled by the people becomes an oligarchy ruled by the few.

But politics is not what I am here to talk about. I am here to ask if inequality is getting worse or better and what the effects of inequality are on the less fortunate in society. In order to address this question, I will use US survey data from 1980, 1990, 2000, 2005, 2010, and 2013, sampling 200,000 individuals from each year.

The first figure is more or less what we would expect. We can see that the top 1% has increased in earning capacity massively since the 1980s. In terms of income in contemporary dollars income seems to be increasing throughout the population.
Figure 1: Family Average Income by Bracket. Looks like more or less everybody is increasing over time. Remember for the top 1%, incomes are top coded. This means that this is likely the lower limit of the true value for those top income makers. The red dotted line is median income. 

We can see that the income being made by the top 1% has grown significantly over time. Looking at the next graph we can see that compared with the median household income this growth has been huge.
Figure 2: Income relative to that of median household income.
From Figure 2 we can see that the top 1% have increased in earnings significantly since the 1980s, from making on average a minimum of around 4 times that of the median household per year to a minimum of 10 times. The reason I say minimum is because the census does not release true earnings numbers but rather median earnings, which are a lower estimate of a strongly skewed value.

But what does this all mean for buying power and the ability of individuals to pay for their own living expenses and those of their families?

Figure 3: Incomes by Bracket adjusting for inflation (based on Consumer Price Index). The red dotted line is median income.

Figure 3 starts to show us that things are not necessarily as good for everybody involved as they appear in Figure 1. First off, median income seems to be holding constant. Second, though it is difficult to see, the bottom 25% of families seem to be losing income. How significant is that loss?

Figure 4: Total Family Income for individuals at different age groups.
Looking at Figure 4 we can see that for the poorest 25%, all age groups have experienced a dramatic reduction in total family income since the 1980s. Those 60+ are somewhat immune from the worst of these falls in income probably due to social support programs such as social security.

So is the cost of living decreasing to match the falling income?
Figure 5: The cost of rent for the bottom 25% of the population sliced by age group.

From Figure 5 we can see that the cost of rent has significantly increased since 1980 with all age groups being hit hard. Though information is not available in the census the cost of higher education has increased a staggering 538% since 1985 while medical costs have increased 286% relative to increases in the CPI of only 121% (Source).
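
To put those cost increases on the same footing as general inflation, the nominal growth can be divided by the CPI growth. The Python sketch below does this for the three percentages quoted above; the function name is mine, and it assumes all three figures cover the same period.

```python
def real_increase(nominal_pct, cpi_pct):
    """Percent increase left over after dividing out CPI growth."""
    return ((1 + nominal_pct / 100) / (1 + cpi_pct / 100) - 1) * 100


education_real = real_increase(538, 121)  # higher education
medical_real = real_increase(286, 121)    # medical costs

print(round(education_real), round(medical_real))
```

Under that assumption, higher education costs rose by roughly 189% and medical costs by roughly 75% over and above inflation.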

So how has the decrease in earning power and increase in cost of living affected vulnerability to poverty?
Figure 6: The likelihood of being in poverty by age group at time of survey.
From Figure 6 we can see that the effect of decreased income coupled with increased costs has created a situation in which a higher proportion of the lower 25% are at risk of being classified in poverty than at any time since the 1980s. Different age groups suffer from different levels of risk, though clearly minors are at the highest risk, with over 70% of them being raised in poverty in 2013.

So what is the deal?

Why are America's less fortunate so much more vulnerable to poverty than before?

Are the poor giving up on school?
Figure 7: Highest level of attainment for the lowest 25% of the income spectrum.
From Figure 7 we can see that the lowest 25% of society has higher high-school and college completion rates than at any time ever in our data.

Is it just that people are poor because they can't find employment?

Figure 8: The likelihood of being poor if you work 40 hours or more a week.
From Figure 8 we can see that working 40 hours or more a week does not insulate one from poverty if you are among the bottom 50% of the poorest families. Whether you are male or female you are more likely to be in poverty while working full time today than at any time since 1980.

Is there any way to measure the effect of poverty on the health of families and individuals within our dataset? Unfortunately the census data is surprisingly lacking in any indicators of mental state or of physical or mental health. The one small measure we can look at is marital status, inferring that happy people are more likely to stay married while unhappy ones, especially those under a lot of stress, are less likely to stay married.

Figure 9: Likelihood of being divorced by income bracket.
Now it seems to be frequently argued that marriage is under attack within society in general. However, when looking at Figure 9, we can see that marriage seems to be more under attack for the poorest segments of society with the bottom 25% more than 10x more likely to be divorced or separated at the time of the survey than that of the top 1%.

The decreased earning power of the bottom 25% is also reflected in a decreased ability to purchase and retain capital in the form of home ownership.
Figure 10: Home ownership over time by income bracket.
From Figure 10 we can see that home ownership, apart from a brief spike in 2005, has decreased dramatically for the bottom 25% of the population since 1980. If you look more recently, home ownership has decreased dramatically within the last decade for all segments of the population.


Democratic presidential candidate Bernie Sanders has claimed that income inequality is leading to the rich getting richer and the poor getting poorer. Nothing in the results presented here has contradicted this claim. Since 1980 the wealthiest segments of our population have increased in income level from making on average 4 times the median income to 10 times.

At the same time, the poorest 25% of society now have less purchasing power than they did in the 1980s, while the costs of rent, education, and health care have all risen. Despite having the odds stacked against them, the poorest 25% are better educated than at any time in history. Yet their incomes have continued to falter while their divorce rates have grown steadily higher.

We must therefore conclude that the society we live in today is less fair to the poor than it was 35 years ago despite the earnest efforts of the poor to become more educated.

Data Source

The source of the data is from IPUMS-USA.

In order to control for the effect of disproportionate representation of ages across years, each year's sample has been reduced so that a constant proportion of all ages is represented in each year.
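
One way such an age adjustment could be done is sketched below in Python. This is my own minimal illustration, not the code linked from this post: for each age it finds the smallest count observed in any year and randomly keeps that many people of that age from every year, so each year ends up with the same age distribution. The function name, the (year, age) record layout, and the sample rows are hypothetical.

```python
import random
from collections import Counter, defaultdict


def standardize_ages(records, seed=0):
    """Downsample so every year keeps the same number of people at each age.

    `records` is a list of (year, age) tuples. For each age the quota is
    the smallest count of that age found in any single year.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for year, age in records:
        cells[(year, age)].append((year, age))
    years = sorted({year for year, _ in cells})
    ages = sorted({age for _, age in cells})
    kept = []
    for age in ages:
        quota = min(len(cells.get((year, age), [])) for year in years)
        for year in years:
            kept.extend(rng.sample(cells.get((year, age), []), quota))
    return kept


# Hypothetical sample: two survey years with different age mixes.
sample = ([(1980, 30)] * 4 + [(1980, 60)] * 2
          + [(2013, 30)] * 3 + [(2013, 60)] * 5)
balanced = Counter(standardize_ages(sample))
print(balanced)
```

Taking the minimum count per age throws away data, but it guarantees that differences between years cannot be driven by one year simply having more people of a given age.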

To create these graphs on your own you can find my code here. You will need to request your own data from IPUMS.

Friday, November 13, 2015

What it means to be a US Veteran Today

Six easy graphs that tell a big story:

1. You represent a much smaller portion of the American people than veterans did in the 1980s. The different lines represent different income quartiles, with the 1st being the lowest income and the 4th being the highest income. We can see that veterans formerly had the highest representation among the top quartile. This has changed significantly since 2005, with the middle two quartiles now representing the largest two bodies of veterans.

2. You currently have the highest risk of being classified as poor for any time period since 1980. Since 2005, the rate of poverty among veterans has nearly doubled. That makes you still better off than the general population but not by much.

3. You are now less likely to own your own home than at any time since 1980.

4. On a positive note, you are more likely to have completed high-school than at any other time in history.

5. On a not so positive note, as a veteran in 2010+ you are much less likely to have completed four or more years of college than those with no military service. Unfortunately the current world is not friendly to those without a college degree.

6. And to top it off, the strain of being a veteran has negatively affected your marriage. Starting in the 1990s and getting worse over time, the likelihood of being separated or divorced from your spouse is significantly higher than that of the no-military-service population.

So what is the takeaway?

Vote for a president you know is going to support your issues. 


In this quick analysis, I look at the census records of 470 thousand random adults between the ages of 18 and 65 sampled in each of the years (1980, 1990, 2000, 2005, 2010, 2013). The source of the data is IPUMS-USA.

In order to control for the effect of disproportionate representation of ages across years, each year's sample has been reduced so that a constant proportion of all ages is represented in each year.

IPUMS-USA, University of Minnesota

Find My Code Here

Friday, October 30, 2015

The Traveling Vampire Problem

Let's say you are a vampire and you would like to figure out the shortest route to visit the supple necks of N maidens. But there is only so much time in any night!

You can fly from location to location, ignoring barriers.

With a few maidens, the problem is trivial.

However, as you entice more and more maidens you find the task of route management increasingly complex.

You buy a computer but find that using a blanket search algorithm to check all possible routes quickly becomes very time consuming as each additional maiden is added to the optimization.

The problem, you realize, is that each additional maiden increases the number of routes significantly. This is because the number of routes is equal to the number of permutations of N items taken N at a time, which is N!.

Four maidens, an easy problem.
So the number of routes:
1 maiden:   1=1
2 maidens: 1*2=2 (for example 1,2 or 2,1)
3 maidens: 1*2*3=6 (for example 1,2,3 or 1,3,2 or 2,1,3 or 2,3,1 or 3,2,1 or 3,1,2)
4 maidens: 1*2*3*4=24
5 maidens: 1*2*3*4*5=120
6 maidens: 1*2*3*4*5*6=720
7 maidens: 1*2*3*4*5*6*7=5,040
8 maidens: 1*2*3*4*5*6*7*8=40,320
9 maidens: 1*2*3*4*5*6*7*8*9=362,880
10 maidens:  1*2*3*4*5*6*7*8*9*10=3,628,800
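
The counts above can be checked by brute force. The short Python sketch below (my own, not the code linked from this post) enumerates every visiting order with itertools.permutations and compares the count with N!.

```python
from itertools import permutations
from math import factorial


def count_routes(n):
    """Count the visiting orders of n maidens by listing every one."""
    return sum(1 for _ in permutations(range(n)))


# Enumeration is only practical for small n, which is the whole point.
for n in range(1, 9):
    assert count_routes(n) == factorial(n)

routes_for_ten = factorial(10)
print(routes_for_ten)
```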

As you start getting more and more maidens your algorithm to select the best route becomes extremely slow. You realize that using R you are going to face a practical limitation of spending as much time running the optimization as you will actually sucking necks. You know of Julia (which can run up to 500x faster than R) but you quickly realize that this is just postponing the problem. Even if you were running 500 times faster, the same algorithm on Julia is going to be about four times faster after two more maidens (11*12/500=.26) but more than three times slower after three more maidens (11*12*13/500=3.4).
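
The speed-up arithmetic generalizes easily. Here is a small Python sketch, with a function name of my own choosing, that gives the running time on the faster machine with extra maidens relative to the slower machine on the 10 maiden problem, assuming running time is proportional to the number of routes.

```python
from math import factorial


def relative_runtime(extra, base=10, speedup=500):
    """Runtime with `extra` more maidens on a machine `speedup` times
    faster, relative to the slower machine on the `base` problem."""
    return factorial(base + extra) / factorial(base) / speedup


for extra in range(1, 5):
    print(extra, round(relative_runtime(extra), 2))
```

By the fourth extra maiden the faster machine is already about 48 times slower than the original 10 maiden run.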
Seven Maidens. Getting a bit more tricky.

You consider hibernating for a hundred years to see if computational speed increases will simplify the problem but also realize that if you keep approaching the problem using a binary computer with the same strategies as previously, you will always face similar computational limits. Eventually, and even very far into the future you will run out of computer speed long before you run out of maidens.

Being a clever vamp, you decide to start looking into alternative strategies to solving this kind of problem. But that is for another day.


For what it is worth, I wrote a traveling vamp optimizer allowing for an arbitrary number of dimensions to be specified. The most complex problem it solved was a 10 maiden problem, which took a little over an hour.
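
For readers who want to experiment without the linked code, a brute-force search of this kind can be sketched in Python as below. This is not the optimizer described above, just a minimal illustration under my own assumptions: the route is open (the vampire starts at the first maiden in the order and does not return), distances are straight-line, and the function names are hypothetical. Points can have any number of dimensions.

```python
from itertools import permutations
from math import dist


def route_length(points, order):
    """Total straight-line distance when visiting points in this order."""
    return sum(dist(points[a], points[b]) for a, b in zip(order, order[1:]))


def extreme_routes(points):
    """Check all n! orders; return (length, order) for shortest and longest."""
    best = worst = None
    for order in permutations(range(len(points))):
        length = route_length(points, order)
        if best is None or length < best[0]:
            best = (length, order)
        if worst is None or length > worst[0]:
            worst = (length, order)
    return best, worst


# Four hypothetical maidens standing in a line.
maidens = [(0, 0), (1, 0), (2, 0), (3, 0)]
shortest, longest = extreme_routes(maidens)
print(shortest, longest)
```

Tracking the best and worst route in a single pass avoids holding all N! orders in memory, though the running time still grows factorially.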

Two solutions for a 10 maiden problem. Top is shortest route while bottom is longest.
Find the code here.

Friday, October 23, 2015

Diagnosing the Multilevel Marketing Trap: How MLM Survives Only through New Entrants

Over the years I have been amazed by how many friends of mine who seem otherwise very intelligent have gotten involved in Multilevel Marketing (MLM).

And, like most people who have been involved with these organizations, all of my friends have ended up, after many hours and sometimes years of effort, with less return for their efforts than what they paid to be part of the system. Many of these MLMs call participants "Independent Business Owners" (IBOs).

The problem with multilevel marketing is that it sounds like it should work!

You, the IBO, buy products at wholesale price that you in turn sell to someone else at retail price. You get the majority of the revenue from the sales of these products. Some smaller portion of the revenue is distributed to the person who recruited you and some portion to the person who recruited that person, all the way up to the top. Simple, right?

All you have to do is go out there and start recruiting others and selling some goods yourself and before you know it you will have flocks of people underneath you selling and recruiting others and you can sit back and retire, right?

Pretty picture that just does not seem to work out for just about anybody.

But why?

The problem is that the system is inherently unstable. What one finds is that communities are rapidly saturated with IBOs. Once saturated, it is extremely difficult to recruit new IBOs from such a community. Not only that, but once recruited, IBOs who fail to recruit others risk becoming discouraged and dropping out of the system causing you to potentially become discouraged and drop out of the system.

Thus, MLM is really much more like a house of cards. The first ones in have a good position, but the later you enter the system the more vulnerable you become.

But don't take my word for it. You can simulate it fairly easily. (Well not so easy but I will show you how to. Find the code here)

The simulation works as follows:
1. First there is one IBO. That IBO is able to interact with a number of other agents. Some of those agents the IBO will sell to based on a random probability (in this case 20%), and some of those agents the IBO will infect, or convert to other IBOs, based again on a probability (10%).
2. A successful sale yields a return of $10.
3. Each round costs $10 to remain an IBO.
4. An IBO also makes commission on sales from anybody that IBO has recruited or that the IBO's recruits have recruited. The commission rate is 25%, meaning each upstream person makes an extra 25% on the sales of the downstream person.
5. Each IBO actively engages in sales and recruiting and meets 30 people per time period.
6. Once approached by one IBO in a time period additional approaches will be all rejected for that time period.
7. IBOs cannot recruit or sell to each other or former IBOs (defined as "immune" from the MLM)
8. Each encounter costs the IBO $1 of effort.
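
The eight rules above can be sketched in a few dozen lines of R. This is not the code linked above, only a minimal illustration with no population replacement. The function name `simulate_mlm`, its arguments, and several simplifications are my own assumptions: commission is a flat 25% of each sale paid to every still-active ancestor, a contact can both buy and be recruited in the same encounter, and an IBO drops out after any round with zero income.

```r
# Minimal sketch of the rules above (no population replacement).
# All names and simplifications here are illustrative assumptions.
simulate_mlm <- function(n_pop = 2000, n_rounds = 30, n_meet = 30,
                         p_sale = 0.20, p_recruit = 0.10,
                         sale_value = 10, fee = 10, effort = 1,
                         commission = 0.25, seed = 1) {
  set.seed(seed)
  state  <- rep("open", n_pop)        # "open", "ibo", or "immune" (former IBO)
  parent <- rep(NA_integer_, n_pop)   # who recruited whom
  state[1] <- "ibo"                   # rule 1: start with a single IBO
  out <- data.frame(round = seq_len(n_rounds), active_ibos = 0L,
                    mean_net = NA_real_)

  for (r in seq_len(n_rounds)) {
    ibos <- which(state == "ibo")
    out$active_ibos[r] <- length(ibos)
    if (length(ibos) == 0) next
    income     <- numeric(n_pop)
    approached <- rep(FALSE, n_pop)
    recruits   <- integer(0)

    for (k in ibos[sample.int(length(ibos))]) {            # random order of play
      met <- sample.int(n_pop, n_meet)                     # rule 5
      ok  <- met[state[met] == "open" & !approached[met]]  # rules 6 and 7
      approached[met] <- TRUE
      n_sold <- sum(runif(length(ok)) < p_sale)            # rules 1 and 2
      income[k] <- income[k] + sale_value * n_sold
      a <- parent[k]                                       # rule 4: pay the upline
      while (!is.na(a)) {
        if (state[a] == "ibo")
          income[a] <- income[a] + commission * sale_value * n_sold
        a <- parent[a]
      }
      joined <- ok[runif(length(ok)) < p_recruit]          # rule 1: conversion
      parent[joined] <- k
      recruits <- c(recruits, joined)
    }

    net <- income[ibos] - fee - effort * n_meet            # rules 3 and 8
    out$mean_net[r] <- mean(net)
    state[ibos[income[ibos] == 0]] <- "immune"             # lost the full $40
    state[recruits] <- "ibo"                               # recruits start next round
  }
  out
}

res <- simulate_mlm()
print(res)
```

Plotting `res$active_ibos` against `res$round` gives a rough feel for how quickly a small population saturates under these assumptions.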

Replacement Rate:
In order to give a "fair" representation of the model we must allow for replacement of the population with new population. This of course is only fair in the long run as the replacement rate in the US and most industrialized societies is less than 2% per year. This of course assumes that parents do not pass on immunity to their young.
A. No Replacement: 0%
B. Replacement: 2%
C. Replacement: 20%

Other Factors:
Which can be varied but do not affect the big picture. Results not presented here.
1. Sales rate: You can change how effective individuals are at selling dramatically without having any significant effect on the long term sustainability of the model. If individuals are less effective at selling, then participants become more vulnerable to dropping out, resulting in faster burn-out rates.
2. Recruitment rate: You can increase or decrease the recruitment rate and this will only cause the population of IBOs to peak either faster or slower. Either way, it always leads to the same end result eventually.
3. Population size: Increasing the population beyond 100 thousand tends to slow down my machine, but I did run it once with a population of 1 million up to round 40. The result is nearly identical to that of a population of 100 thousand. The only difference is that the peak recruitment occurs one round later. Because the MLM model relies upon an exponential growth model, you would need to continue to increase the size of the population exponentially each round to make it sustainable.
4. Return from sales: It does not matter how much profit each sale is worth; eventually the market will get saturated and people will start dropping out.
5. Encounters per round: Like sales rate and recruitment rate this only changes the rate at which the system peaks but does not change the end result.
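
The point about population size can be checked with back-of-the-envelope arithmetic. The sketch below is an idealization of my own, not output from the simulation: it assumes every one of the 30 encounters is with a fresh prospect and nobody drops out, so each IBO adds 0.10 * 30 = 3 recruits per round and the IBO count quadruples. The helper name `rounds_to_saturate` is hypothetical.

```r
# Idealized growth: each IBO converts p_recruit * n_meet new IBOs per round.
p_recruit <- 0.10
n_meet    <- 30
growth    <- 1 + p_recruit * n_meet   # IBO count is multiplied by 4 each round

# Rounds until the IBO count would exceed the whole population.
rounds_to_saturate <- function(pop) ceiling(log(pop) / log(growth))

rounds_to_saturate(1e5)   # 9
rounds_to_saturate(1e6)   # 10
```

Real growth is slower because many approaches are wasted on people who were already approached or are immune, which is why the simulated peak comes later. Still, the arithmetic shows why a tenfold larger population buys only a round or two.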

So here are graphs illustrating the results by "level" within the organization or rather time period of entry into the model.


Results: Replacement of 0%

Figure 1: This graph shows how many IBOs exist at each level during each time period. The first period of each level is by definition the maximum number of IBOs in that level. The majority of the time they drop out of the model. Around periods 12-15 there are the most new IBOs. But as the market gets saturated, they quickly drop out.

Figure 2: And why drop out, you ask? The problem is that the market is saturated. The above graph shows anybody who has not entered by period 6 is going to have difficulty making a living. Remember the dropout conditions are pretty strict. You need to lose $40 (your participation fee of $10 and $30 in sales efforts) during a period in order to drop out. This only occurs if you pay your mandatory membership fee and fail to make a single sale.
Figure 3: But what about the "top" people in their level? Aren't they doing well? The above graph shows the maximums for each level in terms of profitability. It is pretty much the same picture as the last. Unless you entered very early, you have to be very lucky to make any money. Remember that for these late entrants, there are thousands of them and this is the maximum profitability for the entire level.
Figure 4: But some people are making money! How much money is made by each group of people in total? The above graph shows that the total amount of money for each level over the entire 50 periods of the simulation changes rather abruptly around period 11. This is the point when the market gets saturated. After that point people who enter are extremely unlikely to make any money.
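
The dropout condition described under Figure 2 can be put in numbers. As a rough sketch that ignores commission income, the chance of making no sale at all is 0.8 raised to the number of usable contacts, meaning people who are neither immune nor already approached that round. The helper `p_no_sale` is a hypothetical name for illustration.

```r
p_sale <- 0.20

# Probability of zero sales given m usable contacts in a round
p_no_sale <- function(m) (1 - p_sale)^m

round(p_no_sale(c(30, 10, 5, 1)), 4)
# 0.0012 0.1074 0.3277 0.8000
```

With all 30 contacts usable, dropping out is a roughly one-in-a-thousand event. Once saturation cuts the usable contacts to a handful, it becomes routine.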

Results: Replacement of 2%

Fine! But there is replacement in the world! New young people grow up and enter the system and replace those who have dropped out of the system. How does that change things?

Figure 5: At a 2% population replacement rate we can see that recruitment, while still down from its early peak, is more robust in the later periods of the model.

Figure 6: Does this mean that late entries are going to do better? This figure shows clearly that on average late entries are still going to do very poorly. The fact is that the market is saturated, and freeing up 2% of it through replacement is not going to significantly affect the long term outcomes.
Figure 7: What about the best seller/recruiter? The story is basically the same even with some replacement. Even the very best seller/recruiter is going to struggle if they did not enter in one of the top 4 levels.
Figure 8: So what about the total profitability of those new waves of recruits? The story is essentially the same as with no replacement. Up to a certain point (round 11) there is money to be made, but after that the people who enter generally lose money. The overall profitability of the model is unsustainable.

Results: Replacement of 20%!

So, is the problem one of replacement? Perhaps if we had higher replacement rates then the model would work? For the following graphs, I set the replacement rate at 20%. This is a huge number but perhaps not impossible if you are thinking about developing nations where people are rapidly moving from poverty to middle class.
Figure 9: Does high replacement make the model more sustainable? This figure shows that this is not the case! The problem is that the market still becomes saturated in rounds 12-15 and, though 20% of the population gets replaced, it remains saturated, leading to massive and systematic dropouts in every period as new entrants find it nearly impossible to make sales.
Figure 10: Well what about for those who stick it out? Does it become any easier? This figure answers that question. No, if you did not enter within the first 5 levels population replacement is not going to help. Why? Because the market is saturated! No matter how successful you are, the people you recruit will compete with the ones they recruit and in the end very few are going to make sales.
Figure 11: What about the best seller/recruiter in each period? The story is largely the same. There are the guys with big homes and fancy cars on the top and that is because they entered early. Nearly everybody else is going to struggle with thousands of others to make very little income. And remember, there are a lot of new faces constantly entering the market to compete with (See Figure 9).

Figure 12: And this is where it becomes clear that this model is just that of a pyramid scheme. Except for a very few IBOs who enter early, the vast majority of IBOs are paying a lot of out-of-pocket costs and time in order to lose money overall!


Though the MLM business model is inherently and irredeemably broken, there are hundreds of MLM businesses out there in the world "making" billions of dollars. Searching for "multilevel marketing" on Google you will find a lot of people who say they only lost money, time, and friends by participating in MLM but who still generally believe that if you stick it out and work hard enough it could have worked. They have bought into the lie that if they were just better at selling or better at recruiting they could have been successful.

But the truth that these simulations have convinced me of is that this is not the case. Those who are going to make money from MLM have already established themselves within the organization. Everybody else is, unfortunately, one of the suckers who do the hard work of making sales and recruiting new suckers.

You will hear people make analogies about how some established industries such as insurance are also commission based systems and therefore just another form of MLM. However, I have interacted with many insurance agents over the years and none of them have ever tried to recruit me to be an insurance agent as well.

Going into this project, I was uncertain about how "bad" MLM was. Now, I am certain. The only way MLM models are marginally sustainable is by continually feeding new bodies into the machine. This results in early entrants making big money and everybody else losing out.

But don't take my word for it! Play with my simulation! Tweak the parameters however you want. The only way you make the system sustainable is if you make the cost of attempting to sell or recruit equal to zero and you set the cost of membership equal to zero. Of course these assumptions do not and cannot reflect the real world or any MLM model!

Thus, there is no possible way to build a model that allows late entrants to be profitable!

Monday, October 12, 2015

Debunking Magical 9s - Numerology or Numerical Illiteracy?

The other day one of my friends on Facebook posted this video about the mystical nature of the number 9. Being skeptical of all things hokey, I decided to experiment with numbers to see how special 9 really is.

There are four claims made in the video about the magic of nine.
1. Partition a circle in half as many times as you like and the digits of the resulting angle add up to 9
2. Add the angles of a regular polygon together and their digits sum to 9
3. Add all of the digits less than 9 and the digits of the total sum to 9
4. Add 9 to any other digit and it returns that digit.

In this post I will address all of these patterns and demonstrate conclusively that 9 is not special but only a feature of using a base 10 system (that is, we count to 9 before starting over again, for example 9, 10, 11 or 19, 20, 21, etc.).

This may seem like a silly issue to address. However, a lot of people have watched this video (6 million plus either on YouTube or the original Facebook post). In this post I will support my arguments by use of some custom R functions built for this task. You need not have R or run the functions to understand the results.

1: Magic 9 is embedded in the circle

At the beginning of the video, the author suggests that there is something special about 9 because when you divide the degrees of a circle in half all of the digits add up to 9. Not only that, but when you divide each of those halves again, the digits still add up to nine.

360   ... 3+6+0=9
180   ... 1+8+0=9
90     ... 9+0=9
45     ... 4+5=9
22.5  ... 2+2+5=9

Further splits give double-digit sums, which you then add together. So the pattern continues.

5.625 ... 5+6+2+5=18 ... 1+8=9

At 150 splits (ignoring 0s and decimals) you get a series of numbers that look like this:


And when you add them all together they equal: 396 ... 3+9+6=18 ... 1+8=9

So, as far as I am willing to explore, the pattern seems to hold.

At first look, I said, "okay, but is this pattern just for 9s?"

After some exploration I came to the conclusion "yes" (at least with a base 10 system). There seem to be some other patterns but nothing as straightforward as the 9s.

In order to accomplish this efficiently I programmed a "splitter" algorithm in R. This will split any number in half, take the digits and add them together. See below:

library(gmp)       # as.bigz: arbitrary-precision integers
library(magrittr)  # %>% pipe

splitter <- function(X, it, nest, noisy=TRUE) {
  Ys <- matrix(NA, nrow=it, ncol=nest)
  # esum: drop the decimal point and return the individual digits
  esum <- function(x) 
    x %>% toString %>% sub(".", "", ., fixed = TRUE) %>% strsplit("") %>% unlist %>% as.numeric
  for (i in 0:(it-1)) {
    x <- as.bigz(X)
    x <- x*10^(i)/2^i
    Y <- x %>% esum
    if (noisy) print(sprintf("%s: %s -> sum(%s)=%s",i, x ,paste(Y, collapse=" "), sum(Y)))
    Ys[i+1, 1] <- sum(Y)
    for (j in 2:nest) Ys[i+1, j] <- Ys[i+1, j-1] %>% esum %>% sum
  }
  Ys
}
# So let's first examine 9
splitter(X=9, it=150, 3)
# The first column is the sum of the digits
# The second column is the sum of the sum of the digits
# The third column is the sum of the sum of the sum of the digits
# Yes the first 4 halves produce a situation in which the digits all add up to 9
# If you combine the next 5 through 30 splits then the sum of the sum must be 
# added together to also produce the designated 9.
# As we get deeper there is no reason to suspect that this will not carry to the
# next level.
splitter(8, 50, 3, noisy = FALSE)
splitter(7, 50, 3, noisy = FALSE)
splitter(6, 50, 3, noisy = FALSE)
splitter(5, 50, 3, noisy = FALSE)
splitter(4, 50, 3, noisy = FALSE)
splitter(3, 50, 3, noisy = FALSE)
splitter(2, 50, 3, noisy = FALSE)
splitter(1, 50, 3, noisy = FALSE)
# Looking at 1-8 we do not ever get the same number out as with 9.
# Does this make 9 unique, special, or even magical?

So, does this mean there is something to 9s? Well, maybe, but maybe it is a pattern that naturally emerges because we are using a base 10 system. What would happen if we switched to base 9? Or base 8?

In order to test this idea, I first programmed a function to switch numbers from base 10 to any other base.

base10to <- function(x, newbase=10, sep='') {
  # Output has the same shape as the input (vector or matrix)
  if (length(dim(x))==0) xout <- rep("", length(x))
  if (length(dim(x))==2) xout <- matrix("", dim(x)[1], dim(x)[2])
  for (j in 1:length(x)) {
    x2 <- x[j]
    digits <- ((1+x2) %>% as.bigz %>% log(newbase) %>% floor)
    d <- rep(NA, digits+1)
    # Peel off digits from the most significant position down
    for (i in 0:(digits))  {
      d[i+1] <- (x2/newbase^(digits-i)) %>% as.numeric %>% floor
      x2 <- x2-d[i+1]*newbase^(digits-i)
    }
    xout[j] <- paste(d, collapse=sep)
  }
  xout
}
x <- matrix(1:100, 10, 10)
base10to(x, 5)
base10to(x, 9)
base10to(x, 2) 

Seems to be working
Note, it does not work with decimals

Then I integrated it with my splitter:

# Now let's redefine our splitter allowing for non-base 10

splitter2 <- function(X, it, nest, noisy=TRUE, base=10) {
  Ys <- matrix(NA, nrow=it, ncol=nest)
  # esum: convert to the target base, then return the individual digits
  esum <- function(x, base) 
    x %>% base10to(base) %>% strsplit("") %>% unlist %>% as.numeric
  for (i in 0:(it-1)) {
    x <- as.bigz(X)
    x <- (x*10^(i)/2^i)
    Y <- x %>% esum(base)
    if (noisy) 
      print(sprintf("%s: %s -> sum(%s)=%s base %s",
                    i,   x,
                    paste(Y, collapse=" "), 
                    base10to(sum(Y), base),
                    base))
    Ys[i+1, 1] <- sum(Y) %>% base10to(base)
    for (j in 2:nest) Ys[i+1, j] <- Ys[i+1, j-1] %>% as.numeric %>% esum(10) %>% 
      sum %>% base10to(base)
  }
  Ys
}

splitter2(9, 15, 2, noisy = TRUE)


      [,1] [,2]
 [1,] "9"  "9" 
 [2,] "9"  "9" 
 [3,] "9"  "9" 
 [4,] "9"  "9" 
 [5,] "18" "9" 
 [6,] "18" "9" 
 [7,] "18" "9" 
 [8,] "18" "9" 
 [9,] "27" "9" 
[10,] "36" "9" 
[11,] "45" "9" 
[12,] "36" "9" 
[13,] "45" "9" 
[14,] "45" "9" 
[15,] "45" "9"

splitter2(8, 15, 2, noisy = TRUE, base=9)


      [,1] [,2]
 [1,] "8"  "8" 
 [2,] "8"  "8" 
 [3,] "8"  "8" 
 [4,] "8"  "8" 
 [5,] "26" "8" 
 [6,] "26" "8" 
 [7,] "17" "8" 
 [8,] "17" "8" 
 [9,] "35" "8" 
[10,] "26" "8" 
[11,] "26" "8" 
[12,] "35" "8" 
[13,] "35" "8" 
[14,] "53" "8" 
[15,] "35" "8" 

splitter2(7, 15, 2, noisy = TRUE, base=8)

      [,1] [,2]
 [1,] "07" "07"
 [2,] "07" "07"
 [3,] "16" "07"
 [4,] "16" "07"
 [5,] "16" "07"
 [6,] "25" "07"
 [7,] "34" "07"
 [8,] "25" "07"
 [9,] "34" "07"
[10,] "34" "07"
[11,] "34" "07"
[12,] "43" "07"
[13,] "52" "07"
[14,] "52" "07"
[15,] "70" "07"

splitter2(6, 15, 2, noisy = TRUE, base=7)
      [,1] [,2]
 [1,] "6"  "6" 
 [2,] "6"  "6" 
 [3,] "6"  "6" 
 [4,] "6"  "6" 
 [5,] "24" "6" 
 [6,] "24" "6" 
 [7,] "24" "6" 
 [8,] "33" "6" 
 [9,] "33" "6" 
[10,] "24" "6" 
[11,] "33" "6" 
[12,] "33" "6" 
[13,] "51" "6" 
[14,] "60" "6" 
[15,] "51" "6"

splitter2(1, 15, 4, noisy = TRUE, base=2)

     [,1]    [,2]  [,3] [,4]
 [1,] "01"    "01"  "01" "01"
 [2,] "10"    "01"  "01" "01"
 [3,] "11"    "10"  "01" "01"
 [4,] "110"   "10"  "01" "01"
 [5,] "101"   "10"  "01" "01"
 [6,] "110"   "10"  "01" "01"
 [7,] "0111"  "11"  "10" "01"
 [8,] "1000"  "01"  "01" "01"
 [9,] "1100"  "10"  "01" "01"
[10,] "1101"  "11"  "10" "01"
[11,] "1011"  "11"  "10" "01"
[12,] "01111" "100" "01" "01"
[13,] "1101"  "11"  "10" "01"
[14,] "1110"  "11"  "10" "01"
[15,] "10001" "10"  "01" "01"

Now we can see that by changing the base, the same pattern emerges. With base 9, the "magic" number is 8. With base 8, 7 etc. All the way down to base 2 where the "magic" number is 1.
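
The reason is simple arithmetic: in base b, every power of b leaves remainder 1 when divided by b-1, so a number and the sum of its digits always leave the same remainder. Any positive multiple of b-1 therefore reduces to b-1 under repeated digit sums. Below is a base-R sketch that does not need gmp. The helper names `digits_in_base` and `digital_root` are mine, and it mirrors the x*10^i/2^i step from the splitter by multiplying by 5 each round.

```r
# Digits of a non-negative whole number n in the given base,
# most significant digit first.
digits_in_base <- function(n, base) {
  d <- c()
  repeat {
    d <- c(n %% base, d)
    n <- n %/% base
    if (n == 0) break
  }
  d
}

# Repeatedly sum the digits until a single digit is left.
digital_root <- function(n, base) {
  while (n >= base) n <- sum(digits_in_base(n, base))
  n
}

# "Halve" the magic number (base - 1) ten times in each base and
# collect the distinct digital roots that show up.
for (base in c(10, 9, 8, 7, 2)) {
  roots <- sapply(0:10, function(i) digital_root((base - 1) * 5^i, base))
  cat("base", base, "-> digital roots:", unique(roots), "\n")
}
```

In each base the only root that appears is base - 1, which is the same result as the tables above.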

2. Sides of the polygon

What about the other claims put forward in the video? That is, that the angles of any polygon add up to a number whose digits sum to 9. For a regular polygon:
triangle 60+60+60=180...1+8+0=9
square 90+90+90+90=360...3+6=9
pentagon 108+108+108+108+108=540...5+4=9

We can think of any regular polygon as forming n identical triangles. The tips of those triangles meet at the center with angle 360/n, where n is the number of sides of the polygon. To find the other angles, we use the fact that the two other angles are equal, and since all of the angles of a triangle add up to 180, each must be (180-360/n)/2. However, each of these is only half of an interior angle of the polygon (two neighboring triangles meet at every corner), so we need to double it, giving:
$$\theta(n)=180-(360/n) =180(1-2/n)$$ with n being the number of sides.

We can see that this could be written: 9*20(1-2/n). Summing over all n angles gives 9*20(n-2), which is always a multiple of 9.

So the question is, using an alternative base system will we get the same pattern?

Let's try base 9 instead of 10. Let's now define the angles of the polygon as 8*20(1-2/n) in base 10, making a circle 320 degrees in base 10, or 385 in base 9. Now let's see about the sum of the angles.

triangle 53⅓+53⅓+53⅓=160 base 10 or 187 base 9 ... 1+8+7 = 16 base 10 or 17 base 9 ... 1+7=8
square 80+80+80+80=320 base 10 or 385 base 9 ... 3+8+5 = 16 ... 8
pentagon 96+96+96+96+96=480 base 10 or 583 base 9 ... 5+8+3 = 16 ... 8

Hail the magical 8!

Do I need to do this again with a different base?
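
Rather than repeating this by hand, the check can be automated. The sketch below is my own generalization of the base 9 example: it assumes that in base b the angles are rescaled so that a regular n-gon's angles sum to (b-1)*20*(n-2), which gives 180(n-2) for base 10 and 160(n-2) for base 9. The names `angle_sum` and `digital_root` are hypothetical helpers.

```r
# Repeated digit sum of a whole number n written in the given base.
digital_root <- function(n, base) {
  while (n >= base) {
    s <- 0
    while (n > 0) {
      s <- s + n %% base
      n <- n %/% base
    }
    n <- s
  }
  n
}

# Sum of the angles of a regular polygon when the "magic" number is base - 1.
angle_sum <- function(n_sides, base) (base - 1) * 20 * (n_sides - 2)

for (base in c(10, 9, 8)) {
  roots <- sapply(3:12, function(n) digital_root(angle_sum(n, base), base))
  cat("base", base, "-> digital roots for 3 to 12 sides:", unique(roots), "\n")
}
```

Every polygon from 3 to 12 sides reduces to base - 1 in its own base, because each angle sum is a multiple of base - 1 by construction.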

3. All of the digits less than 9 add up to 9 (1+2+3+4+5+6+7+8=36...3+6=9)

Base 10 magic 9:1+2+3+4+5+6+7+8=36 base 10 ... 3+6=9 YES!
Base 9 magic 8: 1+2+3+4+5+6+7=28 base 10 or 31 base 9... 3+1=4 nope!
Base 8 magic 7: 1+2+3+4+5+6=21 base 10 or 25 base 8 ... 2+5=7 YES!
Base 7 magic 6: 1+2+3+4+5=15 base 10 or 21 base 7 ... 2+1=3 nope!
Base 6 magic 5: 1+2+3+4=10 base 10 or 14 base 6 ... 1+4=5 YES!

I think the pattern is pretty obvious here as well.
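
The pattern can be made explicit with a short derivation (my own sketch, writing b for the base; this is not notation from the video). The digits below the "magic" number b-1 are 1 through b-2, so
$$\sum_{k=1}^{b-2} k = \frac{(b-2)(b-1)}{2}$$
Repeated digit sums in base b return b-1 exactly when the number is a positive multiple of b-1. Here that happens when (b-2)/2 is a whole number, that is, when b is even. When b is odd the result is (b-1)/2 instead, which matches the 4 for base 9 and the 3 for base 7 above.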

4. Nine plus any digit returns that digit (9+7=16...1+6=7)

Base 10 magic 9: 9+5=14...1+4=5
Base 9 magic 8: 8+5=13 base 10 or 14 base 9...1+4=5
Base 8 magic 7: 7+5=12 base 10 or 14 base 8...1+4=5
Base 7 magic 6: 6+5=11 base 10 or 14 base 7...1+4=5
Base 6 magic 5: 5+5=10 base 10 or 14 base 6...1+4=5

Need I say more?
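
Written out for a general base b and any digit d from 1 to b-1 (again a sketch in my own notation):
$$(b-1)+d = 1\cdot b + (d-1)$$
So in base b the result is the two-digit number with digits 1 and d-1, and those digits sum to d again. Nothing in this depends on b being 10.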

5. The Magical Marvelous Mystery of Base 10

Clearly, we can see that there is nothing special about 9. If there is any mystery, it is in the base 10 since as soon as we change the system from base 10 to base 9 the "magic" moves to another number.

We must ask ourselves therefore, "why are we using base 10?"

This is an excellent question! Other systems have been developed, such as binary and the corresponding hexadecimal system, which are powerful systems that are intuitively more consistent than the tens system. With hexadecimal, every digit can be written as a series of four binary digits, reducing all communication to 0 or 1 signals. If you think about it, this is a much more intuitive communication system than a 10 digit one.
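
To make the hexadecimal-to-binary correspondence concrete, here is a small base-R sketch. The function name `hex_to_bits` is a hypothetical helper of mine; `strtoi` and `intToBits` are standard base R functions.

```r
# Map each hexadecimal digit of a string to its four binary digits.
hex_to_bits <- function(hex) {
  chars <- strsplit(toupper(hex), "")[[1]]
  bits <- sapply(chars, function(ch) {
    v <- strtoi(ch, base = 16L)   # value of the hex digit, 0 to 15
    # intToBits lists the least significant bit first, so reverse the low four
    paste(rev(as.integer(intToBits(v))[1:4]), collapse = "")
  })
  unname(bits)
}

hex_to_bits("F3")   # "1111" "0011"
```

Each hex digit maps to its own fixed group of four bits, independent of its neighbors, which is the consistency described above.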

As for why we are using the 10 digit system, my best guess is that we are using base 10 because most people have 10 fingers and it is therefore easier to teach someone to count on a 10 digit system.

So, yes, 9 is special, but only because 10 is special, and 10 is only special because our hands developed in such a way that we typically have 5 fingers on each hand, summing to 10.