Friday, October 30, 2015

The Traveling Vampire Problem

Let's say you are a vampire and you would like to figure out the shortest route to visit the supple

necks of N maidens. But, there is only so much time in any night!

You can fly from location to location, ignoring barriers.

With a few maidens, the problem is trivial.

However, as you entice more and more maidens you find the task of route management increasingly complex.

You buy a computer but find that using a blanket search algorithm to check all possible routes quickly becomes very time consuming as each additional maiden is added to the optimization.

The problem you realize is that each additional maiden increases the number of routes significantly. This is because there is a number of routes is equal to the permutation of N select N = N!.

Four maidens, an easy problem.
So the number of routes:
1 maiden:   1=1
2 maidens: 1*2=2 (for example 1,2 or 2,1)
3 maidens: 1*2*3=6 (for example 1,2,3 or 1,3,2 or 2,1,3 or 2,3,1 or 3,2,1 or 3,1,2)
4 maidens: 1*2*3*4=24
5 maidens: 1*2*3*4*5=120
6 maidens: 1*2*3*4*5*6=720
7 maidens: 1*2*3*4*5*6*7=5,040
8 maidens: 1*2*3*4*5*6*7*8=40,320
9 maidens: 1*2*3*4*5*6*7*8*9=362,880
10 maidens:  1*2*3*4*5*6*7*8*9*10=3,628,800

As you start getting more and more maidens your algorithm to select the best route becomes extremely slow. You realize that using R your are going to face a practical limitation of spending as much time running the optimization as you will actually sucking necks. You know of Julia (which can run up to 500x faster than R) but you quickly realize that this is just postponing the problem. Even if you were running 500 times faster. Running the same algorithm on Julia is going to be four times faster after two more maidens (11*12/500=.26) but three times slower after 3 more maidens (11*12*13/500=3.4).
Seven Maidens. Getting a bit more tricky.

You consider hibernating for a hundred years to see if computational speed increases will simplify the problem but also realize that if you keep approaching the problem using a binary computer with the same strategies as previously, you will always face similar computational limits. Eventually, and even very far into the future you will run out of computer speed long before you run out of maidens.

Being a clever vamp, you decide to start looking into alternative strategies to solving this kind of problem. But that is for another day.


For what it is worth, I wrote a traveling vamp optimizer allowing for an arbitrary number dimensions to be specified. The most complex problem it solved was a 10 maiden problem and took a little over an hour.

Two solutions for a 10 maiden problem. Top is shortest route while bottom is longest.
Find the code here.

Friday, October 23, 2015

Diagnosing the Multilevel Marketting Trap: How MLM Survives Only through New Entrants

Over the years I have been amazed by how many friends of mine who seem otherwise very intelligent have gotten involved in Multilevel Marketing (MLM).

And, as most people who have been involved with these organizations, all of my friends involved in these organization have ended up after many hours and sometimes years of effort with less return for their efforts than what the paid to be part of the system. Many of these MLMs call participants "Independent Business Owners" (IBOs).

The problem with multilevel marketing is that it sounds like it should work!

You, the IBO, buy products at wholesale price that you are in turn sell to someone else at retail price. You get the majority of the revenue from the sales of these products. Some smaller portion of the revenue is distributed to the person who recruited you and some portion to the person who recruited that person all the way up to the top. Simple right?

All you have to do is go out there and start recruiting others and selling some goods yourself and before you know it you will have flocks of people underneath you selling and recruiting others and you can sit back and retire, right?

Pretty picture that just does not seem to work out for just about anybody.

But why?

The problem is that the system is inherently unstable. What one finds is that communities are rapidly saturated with IBOs. Once saturated, it is extremely difficult to recruit new IBOs from such a community. Not only that, but once recruited, IBOs who fail to recruit others risk becoming discouraged and dropping out of the system causing you to potentially become discouraged and drop out of the system.

Thus, MLM is really much more like a house of cards. The first ones in have a good position, but the later you enter the system the more vulnerable you become.

But don't take my word for it. You can simulate it fairly easily. (Well not so easy but I will show you how to. Find the code here)

The simulation works as follows:
1. First there is one IBO, that IBO is able to interact with a number of other agents. Some of those agents the IBO will sell to based on a random probability (in this case 20%) and some of those agents the IBO will infect or convert to other IBOs base on probability again (10%).
2. A successful sales yields a return of $10.
3. Each round costs $10 to remain an IBO.
4. An IBO also makes commission on sales from anybody that IBO has recruited or that IBO has recruited. The commission rate is 25% meaning each upstream person makes an extra 25% on the sales of the downstream person.
5. Each IBO actively engages in sales and recruiting and meets 30 people per time period.
6. Once approached by one IBO in a time period additional approaches will be all rejected for that time period.
7. IBOs cannot recruit or sell to each other or former IBOs (defined as "immune" from the MLM)
8. Each encounter costs the IBO $1 of effort.

Replacement Rate:
In order to give a "fair" representation of the model we must allow for replacement of the population with new population. This of course is only fair in the long run as the replacement rate in the US and most industrialized societies is less than 2% per year. This of course assumes that parents do not pass on immunity to their young.
A. No Replacement: 0%
B. Replacement: 2%
C. Replacement: 20%

Other Factors:
Which can be varied but do not affect the big picture. Results not presented here.
1. Sales rate: You can change how effective individuals are at selling dramatically without having any significant effect on the long term sustainability of the model. If individuals are less effective at selling then participation will increase vulnerability to dropout within the MLM model resulting in faster burn-out rates.
2. Recruitment rate: You can increase the recruitment rate or decrease it and this will only cause the population of IBO to peak either faster or slower. Either way always leading to the same end result eventually.
3. Population size. Increasing the population larger than 100 thousand tends to slow down my machine but I did run it once with a population of 1 million up to round 40. The result is nearly identical to that of a population of 100 thousand. The only difference is that the peak recruitment occurs one round later. Because the MLM model relies upon an exponential growth model you would need to continue to increase the size of the population exponentially each round to make it sustainable.
4. Return from sales: It does not matter how much profit each sales is worth, eventually the market will get saturated and people will start dropping out.
5. Encounters per round: Like sales rate and recruitment rate this only changes the rate at which the system peaks but does not change the end result.

So here are graphs illustrating the results by "level" within the organization or rather time period of entry into the model.


Results: Replacement of 0%

Figure 1: This graph shows how many IBO's exist at each level during each time period. The first period of a each level is by definition the maximum number of IBOs in that level. The majority of the time they drop out of the model. In period around 12-15 there are the most new IBOs. But as the market gets saturated, they quickly drop out.

Figure 2: And why drop out, you ask? The problem is that the market is saturated. The above graph shows anybody who has not entered by period 6 is going to have difficulty making a living. Remember the dropout conditions are pretty strict. You need to loose $40 (your participation fee $10 and $30 in sales efforts) during a period in order to drop out. This only occurs if you pay you mandatory membership see and fail to make a single sale.
Figure 3: But what about the "top" people in their level? Aren't they doing well. The above graph shows the maximums for each level in terms of profitability. It is pretty much the same picture as the last. Unless you entered very early, you have to be very lucky to make any money. Remember that for these late entrants, there are thousands of them and this is the maximum profitability for the entire level.
Figure 4: But some people are making money! How much money is made by each group of people in total? The above graph shows that the total amount of money for each level over the entire 50 periods of the simulation changes rather abruptly around period 11. This is the point when the market gets saturated. After that point people who enter are extremely unlikely to make any money.

Results: Replacement of 2%

Fine! But there is replacement in the world! New young people grow up and enter the system and replace those who have dropped out of the system. How does that change things?

Figure 5: At a 2% population replacement rate we can see that recruitment while still down from early peak is more robust in the later periods of the model.

Figure 6: Does this mean that late entries are going to do better? This figure shows clearly that on average late entries are still going to do very poorly. The fact is that the market is saturated and clearing up 2% new replacement is not going to significantly affect the long term outcomes.
Figure 7: What about the best seller/recruiter? The story is basically the same even with some replacement. Even the very best seller/recruiter is going to struggle if they did not enter as one of the top 4 levels.
Figure 8: So what about the total profitability of those new waves of recruits? The story is essentially the same as with no replacement. Up to a certain point (round 11) there is money to be made but after that the people who generally loose money. The overall profitability of the model is unsustainable.

Results: Replacement of 20%!

So, is the problem one of replacement? Perhaps if we had higher replacement rates then the model would work? For the following graphs, I set the replacement rate at 20%. This is a huge number but perhaps not impossible if you are thinking about developing nations where people are rapidly moving from poverty to middle class.
Figure 9: Does high replacement make the model more sustainable? This figure shows that this is not the case! The problem is that the market becomes saturated still in round 12-15 and though 20% of the population gets replaced, it remains saturated leading to massive and systematic dropouts in every period as new entrants find it nearly impossible to make sales.
Figure 10: Well what about for those who stick it out? Does it become any easier? This figure answers that question. No, if you did not enter within the first 5 levels population replacement is not going to help. Why? Because the market is saturated! No matter how successful you are, the people you recruit will compete with the ones they recruit and in the end very few are going to make sales.
Figure 11: What about the best seller/recruiter in each period? The story is largely the same. There are the guys with big homes and fancy cars on the top and that is because they entered early. Nearly everybody else is going to struggle with thousands of others to make very little income. And remember, there are a lot of new faces constantly entering the market to compete with (See Figure 9).

Figure 12: And this is where it becomes clear that this model is just that of a pyramid scheme. Except for a very few IBOs who enter early, the vast majority of IBOs are paying a lot of out of pocket costs and time in order to loose money overall! 


Though the MLM business model is inherently and irredeemably broken, there are hundreds of MLM business out there in the world "making" billions of dollars. Searching for "multilevel marketing" on Google you will find a lot of people who say they only lost money, time, and friends by participating in MLM but they still generally believe that if you stick it out and work hard enough it could have worked. They have bought into the lie that if they were just better at selling or better at recruiting they could have been successful.

But the truth that these simulations have convinced me of is that this is not the case. Those who are going to make money from MLM have already established themselves within the organization in order to make money. Everybody else, unfortunately are the suckers who do the hard work of making sales and recruiting new suckers.

You will hear people make analogies about how some established industries such as insurance are also commission based systems and therefore just another form of MLM. However, I have interacted with many insurance agents over the years and none of them have ever tried to recruit me to be an insurance agent as well.

Going into this project, I was uncertain about how "bad" MLM was. Now, I am certain. The only way MLM models are marginally sustainable is by continually feeding new bodies into the machine. This results in early entrants making big money and everybody else loosing out.

But don't take my word for it! Play with my simulation! Tweak the parameters however you want. The only way you make the system sustainable is if you make the cost of attempting to sell or recruit equal to zero and you set the cost of membership equal to zero. Of course these assumptions do not and can not reflect the real world or any MLM model!

Thus, there is no possible way to build a model that allows late entrants to be profitable!

Monday, October 12, 2015

Debunking Magical 9s - Numerology or Numerical Illiteracy?

The other day one of my friends on facebook posted this video about the mystical nature of the number 9. Being a skeptical of all things hokey, I decided to experiment with numbers to see how special 9 really is.

There are four claims made in the video about the magic of nine.
1. Partition a circle as many times and its digits add up to nine
2. Add the sides of a regular polygon together and their digits sum to 9
3. Add all of the digits up to 9 and they sum to 9
4. Add 9 to any other digit and it returns that digit.

In this post I will address all of these patterns and demonstrate conclusively that 9 is not special but only a feature of using a base 10 system (that is we count to 9 before starting over again for example 9 10 11 or 19 20 21 etc).

This may seem like a silly issue to address. However, a lot of people have watched this video (6 million plus either on youtube or the original facebook post). In this post I will support my arguments by use of some custom R functions built for this task. You need not have R or run the functions to understand the results.

1: Magic 9 is embedded in the circle

At the beginning of the video, the author suggests that there is something special about 9 because when you divide the degrees of a circle in half all of the digits add up to 9. No only that but when you divide each of those halves each digit adds up to nine.

360   ... 3+6+0=9
180   ... 1+8+0=9
90     ... 9+0=9
45     ... 4+5=9
22.5  ... 2+2+5=9

Up to double digits which you then add together. So the pattern continues.

5.625 ... 5+6+2+5=18 ... 1+8=9

At 150 splits (ignoring 0s and decimals) you get a series of numbers that look like this:


And when you add them all together they equal: 396 ... 3+9+6=18 ... 1+8=9

So, as far as I am willing to explore, the pattern seems to hold.

First look at this, I said, "okay, but is this pattern just for 9s?"

After some exploration I came to the conclusion "yes" (at least with a base 10 system). There seems to be some other patterns but nothing as straightforward as the 9s.

In order to accomplish this efficiently I programmed a "splitter" algorithm in R. This will split any number in half, take the digits and add them together. See below:

splitter <- function(X, it, nest, noisy=TRUE) {
  Ys <- matrix(NA, nrow=it, ncol=nest)
  esum <- function(x) 
    x %>% toString %>% sub(".", "", ., fixed = TRUE) %>% strsplit("") %>% unlist %>% as.numeric
  for (i in 0:(it-1)) {
    x <- as.bigz(X)
    x <- x*10^(i)/2^i
    Y <- x %>% esum
    if (noisy) print(sprintf("%s: %s -> sum(%s)=%s",i, x ,paste(Y, collapse=" "), sum(Y)))
    Ys[i+1, 1] <- sum(Y)
    for (j in 2:nest) Ys[i+1, j] <- Ys[i+1, j-1] %>% esum %>% sum
# So let's first examine 9
splitter(X=9, it=150, 3)
# The first column is the sum of the digits
# The second column is the sum of the sum of the digits
# The third column is the sum of the sum of the sum of the digits
# Yes the first 4 halves produce a situation in which the digits all add up to 9
# If you combine the next 5 through 30 splits then the sum of the sum must be 
# added together to also produce the designated 9.
# As we get deeper there is no reason to suspect that this will not carry to the
# next level.
splitter(8, 50, 3, noisy = FALSE)
splitter(7, 50, 3, noisy = FALSE)
splitter(6, 50, 3, noisy = FALSE)
splitter(5, 50, 3, noisy = FALSE)
splitter(4, 50, 3, noisy = FALSE)
splitter(3, 50, 3, noisy = FALSE)
splitter(2, 50, 3, noisy = FALSE)
splitter(1, 50, 3, noisy = FALSE)
# Looking at 1-8 we do not ever get the same number out as with 9.
# Does this make 9 unique, special, or even magical?
Created by Pretty R at

So, does this mean there is something to 9s? Well, maybe, but maybe it is a pattern that naturally emerges because we are using a base 10 system. What would happen if we switched to base 9? Or base 8?

In order to test this idea, I first programmed a function to switch numbers from base 10 to any other base.

base10to <- function(x, newbase=10, sep='') {
  if (length(dim(x))==0) xout <- rep("", length(x))
  if (length(dim(x))==2) xout <- matrix("", dim(x)[1], dim(x)[2])
  for (j in 1:length(x)) {
    x2 <- x[j]
    digits <- ((1+x2) %>% as.bigz %>% log(newbase) %>% floor)
    d <- rep(NA, digits+1)
    for (i in 0:(digits))  {
      d[i+1] <- (x2/newbase^(digits-i)) %>% as.numeric %>% floor
      x2 <- x2-d[i+1]*newbase^(digits-i)
    xout[j] <- paste(d, collapse=sep)
x <- matrix(1:100, 10, 10)
base10to(x, 5)
base10to(x, 9)
base10to(x, 2) 
Created by Pretty R at

Seems to be working
Note, it does not work with decimals

Then I integrated it with my switcher:

# Now let's redefine our splitter allowing for non-base 10

splitter2 <- function(X, it, nest, noisy=TRUE, base=10) {
  Ys <- matrix(NA, nrow=it, ncol=nest)
  esum <- function(x, base) 
    x %>% base10to(base) %>% strsplit("") %>% unlist %>% as.numeric
  for (i in 0:(it-1)) {
    x <- as.bigz(X)
    x <- (x*10^(i)/2^i)
    Y <- x %>% esum(base)
    if (noisy) 
      print(sprintf("%s: %s -> sum(%s)=%s base %s",
                    i,   x,
                    paste(Y, collapse=" "), 
                    base10to(sum(Y), base),
    Ys[i+1, 1] <- sum(Y) %>% base10to(base)
    for (j in 2:nest) Ys[i+1, j] <- Ys[i+1, j-1] %>% as.numeric %>% esum(10) %>% 
      sum %>% base10to(base)
Created by Pretty R at

splitter2(9, 15, 3, noisy = TRUE)


      [,1] [,2]
 [1,] "9"  "9" 
 [2,] "9"  "9" 
 [3,] "9"  "9" 
 [4,] "9"  "9" 
 [5,] "18" "9" 
 [6,] "18" "9" 
 [7,] "18" "9" 
 [8,] "18" "9" 
 [9,] "27" "9" 
[10,] "36" "9" 
[11,] "45" "9" 
[12,] "36" "9" 
[13,] "45" "9" 
[14,] "45" "9" 
[15,] "45" "9"

splitter2(8, 15, 2, noisy = TRUE, base=9)


      [,1] [,2]
 [1,] "8"  "8" 
 [2,] "8"  "8" 
 [3,] "8"  "8" 
 [4,] "8"  "8" 
 [5,] "26" "8" 
 [6,] "26" "8" 
 [7,] "17" "8" 
 [8,] "17" "8" 
 [9,] "35" "8" 
[10,] "26" "8" 
[11,] "26" "8" 
[12,] "35" "8" 
[13,] "35" "8" 
[14,] "53" "8" 
[15,] "35" "8" 

splitter2(7, 15, 2, noisy = TRUE, base=8)

      [,1] [,2]
 [1,] "07" "07"
 [2,] "07" "07"
 [3,] "16" "07"
 [4,] "16" "07"
 [5,] "16" "07"
 [6,] "25" "07"
 [7,] "34" "07"
 [8,] "25" "07"
 [9,] "34" "07"
[10,] "34" "07"
[11,] "34" "07"
[12,] "43" "07"
[13,] "52" "07"
[14,] "52" "07"
[15,] "70" "07"

splitter2(6, 15, 2, noisy = TRUE, base=7)
      [,1] [,2]
 [1,] "6"  "6" 
 [2,] "6"  "6" 
 [3,] "6"  "6" 
 [4,] "6"  "6" 
 [5,] "24" "6" 
 [6,] "24" "6" 
 [7,] "24" "6" 
 [8,] "33" "6" 
 [9,] "33" "6" 
[10,] "24" "6" 
[11,] "33" "6" 
[12,] "33" "6" 
[13,] "51" "6" 
[14,] "60" "6" 
[15,] "51" "6"

splitter2(1, 15, 4, noisy = TRUE, base=2)

     [,1]    [,2]  [,3] [,4]
 [1,] "01"    "01"  "01" "01"
 [2,] "10"    "01"  "01" "01"
 [3,] "11"    "10"  "01" "01"
 [4,] "110"   "10"  "01" "01"
 [5,] "101"   "10"  "01" "01"
 [6,] "110"   "10"  "01" "01"
 [7,] "0111"  "11"  "10" "01"
 [8,] "1000"  "01"  "01" "01"
 [9,] "1100"  "10"  "01" "01"
[10,] "1101"  "11"  "10" "01"
[11,] "1011"  "11"  "10" "01"
[12,] "01111" "100" "01" "01"
[13,] "1101"  "11"  "10" "01"
[14,] "1110"  "11"  "10" "01"
[15,] "10001" "10"  "01" "01"

Now we can see that by changing the base, the same pattern emerges. With base 9, the "magic" number is 8. With base 8, 7 etc. All the way down to base 2 where the "magic" number is 1.

2. Sides of the polygon

What about the other claims put forward in the video? That is, that all sides of all polygons angles add up to 9. That is for a regular polygon
triangle 60+60+60=180...1+8+0=9
square 90+90+90+90=360...3+6=9
pentagon 108+108+108+108+108=540...5+4=9

We can think of any equilateral polygon forming n identical triangles. The tips of those triangles meet at the center with angle 360/n where n is the number of sides of the polygon. To find the other angles we reflect on knowing that the two other angles are equal and since all of the sides of a triangle add up to 180 we can figure their size is (180-360/n)/2. Nowever, the triangles formed by the polygons only represent half the triangle's edge so we need to double that giving:
$$\theta(n)=180-(360/n) =180(1-2/n)$$ with n being the number of sides.

We can see that this could be written: 9*20(1-2/n):

So the question is, using an alternative base system will we get the same pattern?

Let's try base 9 instead of 10. Let's define the angles of the polygon as now: 8*20(1-2/n) base 10. Making a circle now 320 degrees base 10 or 385 base 9. Now let's see about the sum of the sides.

triangle 53+53+53=160 base 10 or 187 base 9 ... 1+8+7 = 16 base 10 or 17 base 9 ... 1+7=8
square 80+80+80+80=320 base 10 or 385 base 9 ... 3+8+5 = 16 ... 8
pentagon 96+96+96+96+96=480 base 10 or 583 base 9 ... 5+8+3 = 16 ... 8

Hail the magical 8!

Do I need to do this again with a different base?

3. All of the digits less than 9 add up to 9 (1+2+3+4+5+6+7+8=36...3+6=9)

Base 10 magic 9:1+2+3+4+5+6+7+8=36 base 10 ... 3+6=9 YES!
Base 9 magic 8: 1+2+3+4+5+6+7=28 base 10 or 31 base 9... 3+1=4 nope!
Base 8 magic 7: 1+2+3+4+5+6=21 base 10 or 25 base 8 ... 2+5=7 YES!
Base 7 magic 6: 1+2+3+4+5=15 base 10 or 21 base 7 ... 2+1=3 nope!
Base 6 magic 5: 1+2+3+4=10 base 10 or 14 base 6 ... 1+4=5 YES!

I think the pattern is pretty obvious here as well.

4. Nine plus any digit returns that digit (9+7=16...1+6=7)

Base 10 magic 9: 9+5=14...1+4=5
Base 9 magic 8: 8+5=13 base 10 or 14 base 9...1+4=5
Base 8 magic 7: 7+5=12 base 10 or 14 base 8...1+4=5
Base 7 magic 6: 6+5=11 base 10 or 14 base 7...1+4=5
Base 6 magic 5: 5+5=10 base 10 or 14 base 6...1+4=5

Need I say more?

5. The Magical Marvelous Mystery of Base 10

Clearly, we can see that there is nothing special about 9. If there is any mystery, it is in the base 10 since as soon as we change the system from base 10 to base 9 the "magic" moves to another number.

We must ask ourselves therefore, "why we are using base 10?"

This is an excellent question! Other systems have developed such as the binary and corresponding hexidecimal system which are powerful systems that intuitively are more consistent than the tens system. With hexidecimals, everything can be written as a series of four binaries reducing all communication to 0 or 1 signals. If you think about it, this is a much more intuitive communication system than a 10 digit one. 

As for why we are using the 10 digit system. My best guess is that we are using base 10 because most people have 10 fingers and it is therefore easier to teach someone to count on a 10 digit system.

So, yes, 9 is special but only because 10 is special and that is only special because our hands developed in such a way as that we have typically have 5 fingers on each hand summing to 10.


Friday, October 9, 2015

Flip a fair coin 4x. Probability of H following H is 40%???

A recent working paper has come out arguing for the existence of Hot Hands (in basketball), a concept psychologists had dismissed decades ago. Hot hands is where a player is thought to have a higher likelihood of scoring the next basket if the last three baskets where shot successfully. (In NBA Jam, that is when you hands caught on fire).

The first paragraph of the paper reads, "Jack takes a coin from his pocket and decides that he will flip it 4 times in a row, writing down the outcome of each flip on a scrap of paper. After he is done flipping, he will look at the flips that immediately followed an outcome of heads, and compute the relative frequency of heads on those flips. Because the coin is fair, Jack of course expects this empirical probability of heads to be equal to the true probability of flipping a heads: 0.5. Shockingly, Jack is wrong. If he were to sample one million fair coins and flip each coin 4 times, observing the conditional relative frequency for each coin, on average the relative frequency would be approximately 0.4."

In other words $$P(H_t|H_{t-1}) \ne .5$$ even though $$P(H)=.5$$

If you are anything like me, you will say, "WTF, that can't be true!"

Before getting any further into the logic of the paper, let us do some simulations.

First off, could this be the result of looking only after heads values?

That is, perhaps by selecting only heads values we are reducing the number of available heads.

But, no, this does not make sense!

# x is a binary random variable drawn from 1000^2 draws with a .5 probability  of success.

x <- rbinom(10^6,1,.5)   mean(x[-length(x)][x[-1]==1])

# Pretty much as close as you can get to 50% as possible    

# To make this feel more concrete for me I ended up flipping a coin 252 times 
# with the following results (1=heads)

myx <- c(1,1,0,1,1,1,1,1,0,1,1,1,0,1,0,1,1,1,1,1,1,1,0,0,0,1,0,1, 1,1,1,0,1,0,1,1,0,1,1,1,1,0,1,0,1,0,0,0,1,0,1,1,0,1,1,0, 0,1,1,0,0,0,0,1,1,0,1,0,0,1,0,1,0,1,0,1,0,1,0,0,1,1,0,0, 1,0,0,1,0,0,1,0,0,0,0,0,0,1,0,1,1,0,1,0,1,1,1,0,0,0,1,0, 1,0,1,1,0,1,1,1,0,0,0,1,1,1,0,0,0,1,0,1,0,1,0,1,1,1,0,1, 0,1,1,1,1,0,0,1,0,0,0,0,1,0,1,1,1,0,1,1,1,0,1,0,1,0,0,1, 0,0,1,1,0,1,1,0,0,1,0,1,1,0,1,0,1,0,0,0,0,1,0,0,1,0,0,0, 0,1,0,0,0,0,0,1,0,0,1,1,0,1,0,1,1,0,1,1,1,1,0,1,0,0,1,0, 0,1,0,0,1,1,1,1,0,1,1,1,0,0,0,0,0,1,1,1,1,0,1,1,1,0,1,0)   # We can see we drew 131 heads or 51.9% mean(myx);sum(myx)   # This is very close to 50%. What is the likelihood of heads coming 
# up this many times on a fair coin?
plot(100:152,pbinom(100:152, 252, .5), type='l', lwd=3, 
     main="CDF of binomial(252,.5)", 
     xlab="# of Heads")
abline(v=sum(myx), lwd=3, lty=2) 
# The likelihood of gettings 131 or higher given the coin is fair is 
# about 24.4%
1-pbinom(sum(myx), 252, .5)
# I am fairly confident therefore that the coin is therefore fair. 
# However, it could be somewhat unfair and still achieve this outcome 
# without straining the bounds of probability. 
# But this is not the point.
# Now let's look at the claim. The probability of observing a heads 
# after tossing a heads is argued to be .4.
# So let's imagine a million fair coins sequences of length 4. Say set Y.
# This set will be composed of the outcomes of 10^6 probabilities 
# where those probabilities are the likelihood that 
# the coin value that follows a heads is a head.
# I write function nheads for this purpose
nhead <- function(flips=4, N=10^6, prev=1)  {
  Y <- rep(NA,N)
  for (i in 1:N) {
    # Draw four binomial draws
    x <- rbinom(flips,1,.5)
    # Now keep only draws after any heads
    Y[i] <- mean(x[-length(x)][x[-1]==prev]) 
# Now let's take the average across the coins
mean(nhead(4,10^6), na.rm=TRUE)
# Damn, sure enough! The expected number of heads is 40.8%
# Let's see if the empirical results adhere. We can think of the 
# 252 previous coinflips as 63 four coin flip trials.
myset <- matrix(myx, 4)
myY <- Y <- rep(NA,63)
# Now let's calculate the likelihood of heads after the first heads in each set
for (i in 1:ncol(myset)) myY[i] <- mean(myset[-4,i][myset[-1,i]==1])
# Pooling across sets
mean(myY, na.rm=TRUE)
# We get 34.2% which could be interpretted as an unluckily low deviation 
# from 50% except it is surprisingly close to 40%.
# That is 19.5 heads 
sum(myY, na.rm = TRUE)
# However the total number of trials that returned a result is 57
# If we believed that the true expected number of heads was .5 then the 
# likelihood of the 19 heads or less appearing is less than 1%.
pbinom(sum(myY, na.rm = TRUE), sum(!, .5)
# This seems compelling and surprising evidence before even looking at 
# the arguments presented in the paper.
# Looking at the paper, Table 1 is quite convincing. It is organized 
# by # of Heads.
#    Heads            Sequence        P(H|H)
#1   0                TTTT            -
#2   1                TTTH            -
#3   1                TTHT            0
#4   1                THTT            0
#5   1                HTTT            0
#6   2                TTHH            1
#7   2                THTH            0
#8   2                THHT            1/2
#9   2                HTTH            0
#10  2                HTHT            0
#11  2                HHTT            1/2
#12  3                THHH            1
#13  3                HTHH            1/2
#14  3                HHTH            1/2
#15  3                HHHT            2/3
#16  4                HHHH            1
# To come up with the P(H|H) we take the average of the feasible outcomes:
# Sure enough! 40.47
# So what is driving this? Selection bias certainly! But from where?
# Is it the two values that cannot be computed because no heads are generated?
# Even if we specify those as heads this does not fix the probabilities.
# 48.9
# So it must be a feature of the series since we know that 
# if we sample 6 million then we are spot on (near) 1/2.
# Let's see what we can discover
I <- 60
Y <- rep(NA, I-1)
for (i in 2:I) Y[i-1] <- mean(nhead(i,10^5), na.rm=TRUE)
plot(2:I, Y, cex=1, lwd=2, 
     xlab="# of coins")
abline(h=.5, lwd=3) 
# This graph can be quite disorienting. 

It shows that only in the case of flipping two coins is the probability of observing a head after the first head  equal to .5 (well .5 with measurement error).

There is still a significant divergence from .5 even in the case of 60 coints being flipped!

This is so non-intuative as to be downright disorienting.

Does this mean that if someone is flipping four coins and one of them pop up heads you can bet them $10 that the next result is a tail and feel confident the odds are in your favor?

We can see this by looking at our probability table with each sequence arranged by the first head up to that point.

The answer is not quite. I have added up the likelihood of seeing a heads after the previous heads that appears in both sequences (H|1H, H|2H, H|3H).

Heads        Sequence        P(H|H)     H|1H   H|2H   H|3H
  0              TTTT           -
  1              TTTH           -
  2              TTHH           1        1
  1              TTHT           0        0

  2              THTH           0        0
  1              THTT           0        0
  3              THHH           1        1      1
  2              THHT           1/2      1      0

  1              HTTT           0        0
  2              HTTH           0        0
  2              HTHT           0        0      0
  3              HTHH           1/2      0      1
  3              HHTH           1/2      1      0
  2              HHTT           1/2      1      0
  3              HHHT           2/3      1      1      0
  4              HHHH           1        1      1      1 

Thus we can see that according to this, no matter that the sequence the likelihood of the next flip being a head after the first is .5. That is, equal number of heads as tails.

How about the second head (H|1H)? Looking over the table we can see the same thing. As well as for the 3rd head.

From this we can see that in part our intuition is correct. Knowing the previous outcome in the sequence does not provide information on future outcomes. THANK THE GODS!

To get some insight into what is happening, you have to think about how we are scoring each sequence. After each flip of a head we score either 0 or 1. If this is the end of a sequence then we stop there and things work fine: TH (-), TT (-), HT (0), HH (1) -> P(H|H)=1/2.

If it is not the end of a sequence let's see what happens.
THT (0), THH (1), TTT (-), TTH (-), {this part looks okay so far}

HTT (0), HTH (0), HHT ((1+0)/2=1/2), HHH (1).

Now we can begin to see what is happening with the coin flip.

The previously unscorable series TH now becomes scorable. No problem.

The unscorable series TT remanes unscorable.

And the zero scored series HT remains zero scoring!

To review, we now have a previous sequence TH which was unscoreable and now has a probability of Heads of .5 (confirming our expectations). We had a sequence TT which was discarded and remains discarded. And we had a sequence HT which was scored 0 and remains scored 0.

So all of the action must be happening in the HH part of the series!

And we can now think of what is happening. The problem is in the averaging across heads within a series. In this case, by flipping HH this means that the next flip must be evaluated. With HT the next flip does not matter, the sequence will be scored 0 no matter what.

However, with HH, we know that the next flip can be either H or T which means HHT (scored (1+0)/2=.5) and HHH (scored (1+1)/2=1) and $$ mean(H|HH)=.75$$. Thus effectively all other probabilities in the sequence remain the same except what was previously a P(H|HH)=1 is now a P(H|HH)=-.75.

But why? Well, there is a selection issue going on. In the case of HH we are asking what the next score is and rounding our previous positive result to match that. But in the case of say HT we not asking what our next result is and rounding. Thus an isolated few negative outcomes are kept while a few positive outcomes are downgraded resulting in an overall selection effect which reduces the likelihood of seeing a head overall after seeing a previous head.

Now that we can see why this is happening, why does it ever go away? The truth is, that it never goes away. The deviation from .5 gets really small as the sample size gets large. This is because the selective scoring (last value scored) is only happening on a smaller and smaller portion of the series.

I am tempted to say something about the larger implications on series analysis. However, I don't really know how well this sampling issue is known to time series people. I do not recall hearing of it in my meager studies of the matter, but frankly that could mean very little.