## Tuesday, July 17, 2012

### The Problem with Probabilities - how many rolls does it take to get a 1!

* So, you roll a fair six-sided die once.  What is the probability of getting a 1 on the first roll?

* Simple:

di 1/6

* How about the probability of not getting a 1:

di 1-1/6

* or

di 5/6

* OK, easy, right?  Now is when things get ugly.

* What happens if you want to know the probability of rolling at least one 1 with two fair dice?

* 1/6 + 1/6, right?  I mean, you roll one die and 1 in 6 times you will get a 1, and you roll the other die and 1 in 6 times you will get a 1.  2/6 = 1/3, right?

di 1/6 + 1/6

* Wrong!

* Weirdly, it is not obvious with 2 dice why this statement is not true.  However, start adding dice to the equation: 3/6 (three dice), 4/6 = 2/3 (four dice), and so on up to 6/6 = 1 (six dice).  Wait a second!  I know that if I roll six dice, there is a possibility that none of them will be a 1.  I mean, it might be small but it exists!  Then what happens when we go to 7 dice?  7/6 > 1, which cannot be a probability.
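
* A quick sketch of how fast the naive sum drifts away: the loop below prints n/6 next to 1-(5/6)^n (one minus the chance that every die misses) for 1 to 7 dice.  The loop index n is just an illustrative name.
forvalues n = 1/7 {
    di "`n' dice: naive n/6 = " %6.4f (`n'/6) "   1-(5/6)^n = " %6.4f (1-(5/6)^`n')
}
* The naive column reaches 1 at six dice and passes it at seven, while the second column always stays below 1.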

* The reason the intuitive guess does not work is that the intuition answers a different question: "how many 1s will be rolled on x dice?"  To see this, take the expected number of 1s on one die roll:
* E(x=1) = 1/6*1[x1=1] + 5/6*0[x1!=1] = 1/6
* Likewise: E(x=2) = 1/6*1[x1=1] + 5/6*0[x1!=1] + 1/6*1[x2=1] + 5/6*0[x2!=1] = 2/6 = 1/3
* The difference between the two questions is that, for "at least one 1", rolling a 1 on both dice counts only once, while in the expected number of 1s rolling two 1s counts as 2.

* Thus we realize the utter failure of our intuition.

* The thing about "at least one" is that the two ways of succeeding overlap, so we have to split the event into pieces that cannot happen together.  You roll two dice and only care about the probability that at least one of them is a 1.  Either the first die is a 1 (probability 1/6), or the first die is not a 1 (5/6) and the second die is a 1 (1/6).  Because the dice are independent, the second case has probability 5/6*1/6.

* So the total probability that at least one die is a 1 is:
di 1/6 + 1/6*5/6

* About 30.6% (11/36), not quite 33%
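
* To see where the difference comes from, here is a small sketch that lists all 36 equally likely outcomes of two dice and compares the two questions.  The variable names (d1, d2, ones_count, any_one) are made up for this illustration.
clear
set obs 36
gen d1 = ceil(_n/6)
gen d2 = mod(_n-1, 6) + 1
* ones_count is the number of 1s showing; any_one flags outcomes with at least one 1
gen ones_count = (d1==1) + (d2==1)
gen any_one = (d1==1 | d2==1)
sum ones_count any_one
* The mean of ones_count is 12/36 (the intuitive 1/3), while the mean of any_one is 11/36: the outcome (1,1) holds two 1s but is only one success.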

* It becomes more exaggerated the more dice you add to the mix

*  P(a1=1) + P(a1!=1 & a2=1) + P(a1!=1 & a2!=1 & a3=1)
di 1/6 + 1/6*5/6 + 1/6*(5/6)^2

* Okay, one more:
*  P(a1=1) + P(a1!=1 & a2=1) + P(a1!=1 & a2!=1 & a3=1) + P(a1!=1 & a2!=1 & a3!=1 & a4=1)
di 1/6 + 1/6*5/6 + 1/6*(5/6)^2 + 1/6*(5/6)^3

* An easier way to do these things is to take 1 minus the probability that the event never occurs:
di 1-(5/6)^4
* These two expressions are exactly equivalent: both equal 671/1296, about 0.5177.  Any discrepancy between them comes from a miscounted 5/6 factor in the sum, not from rounding error.
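
* As a sketch of why the two approaches agree for any number of dice, the loop below builds the long sum term by term and prints it beside 1-(5/6)^n.  The scalar name s and the loop indices n and k are illustrative choices.
forvalues n = 1/6 {
    scalar s = 0
    forvalues k = 1/`n' {
        * die k shows the first 1, so the k-1 earlier dice all missed
        scalar s = s + (1/6)*(5/6)^(`k'-1)
    }
    di "`n' dice: sum = " %6.4f scalar(s) "   1-(5/6)^n = " %6.4f (1-(5/6)^`n')
}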

* So it is easy to say this stuff, but do we really believe it?

* Fortunately we can test our intuition with simulation!

* Let's generate some random six-sided draws

clear
* The seed value is arbitrary; setting one just makes the run reproducible
set seed 12345
set obs 100000

* First the draws
gen draw1 = ceil(runiform()*6)

tab draw1

gen draw2 = ceil(runiform()*6)
gen draw3 = ceil(runiform()*6)
gen draw4 = ceil(runiform()*6)

* Now let's generate some indicator variables of different outcomes.

* Each indicator is 1 when at least one of the draws it covers is a 1, and 0 otherwise.
* draw1or3eq1 covers draws 1 through 3; draw1or4eq1 covers draws 1 through 4.
gen draw1eq1 = (draw1==1)

gen draw1or2eq1 = (draw1==1 | draw2==1)

gen draw1or3eq1 = (draw1==1 | draw2==1 | draw3==1)

gen draw1or4eq1 = (draw1==1 | draw2==1 | draw3==1 | draw4==1)

sum draw*eq1
* Thus we can see that, contrary to our intuition, the probability of a 1 being rolled on at least 1 of four dice is about 52%, not our intuitive guess of 67% (4/6).
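
* For comparison, a sketch that prints the exact values to set beside the simulated means, and also counts the number of 1s in each row.  The variable name ones_count is an illustrative choice.
forvalues n = 1/4 {
    di "`n' dice: exact P(at least one 1) = " %6.4f (1-(5/6)^`n')
}
gen ones_count = (draw1==1) + (draw2==1) + (draw3==1) + (draw4==1)
sum ones_count
* The mean of ones_count should land near 4/6: the intuitive guess is the expected number of 1s, not the chance of seeing at least one.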

* This could make a big difference in an intense game of D&D!    :)