* Probability Theory by Example
* Oftentimes some of the rules of probability theory can seem quite abstract.
* Hopefully simulating data will yield a clearer understanding of how these rules work.
* This is Stata code.
version 11
set seed 11
clear
set obs 100
* First off, if A and B are independent then the expected value of A*B is equal to the expected value of A times the expected value of B.
* E(A*B)=E(A)E(B) if A and B are independent.
gen A=rnormal()+9
gen B=runiform()
gen AB = A*B
sum A
local A=r(mean)
sum B
local B=r(mean)
di "mean_A * mean_B = `=`A'*`B''"
sum AB
* We can see that the mean of A times the mean of B is close to the mean of AB.
* They need not be exactly equal, because each of these estimates is noisy: we are working with a finite sample.
* As the sample gets larger they should get closer.
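* As a small side sketch (my addition, not part of the original post), we can sweep a few sample sizes and watch the two quantities converge:
foreach n in 100 10000 1000000 {
	clear
	quietly set obs `n'
	gen A = rnormal() + 9
	gen B = runiform()
	gen AB = A*B
	quietly sum A
	local mA = r(mean)
	quietly sum B
	local mB = r(mean)
	quietly sum AB
	di "N = `n': mean_A*mean_B = " `mA'*`mB' "   mean_AB = " r(mean)
}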
clear
* Increase sample size to 100,000 observations
set obs 100000
gen A=rnormal()+9
gen B=runiform()
gen AB = A*B
sum A
local A=r(mean)
sum B
local B=r(mean)
di "mean_A * mean_B = `=`A'*`B''"
sum AB
* However, E(AB)=E(A)*E(B) by itself does not imply independence.
clear
* Keep the sample size at 100,000 observations
set obs 100000
gen A=rnormal()
gen B=abs(A)
* B equals the absolute value of A
gen AB=A*B
sum A
local A=r(mean)
sum B
local B=r(mean)
di "mean_A * mean_B = `=`A'*`B''"
sum AB
* The two quantities are again pretty close: E(A)=0 and, by the symmetry of the normal distribution, E(A*|A|)=0, so both sides are approximately zero.
* However, their closeness clearly does not imply independence, since we know the value of B exactly once we know A.
* Another way of thinking about independence is to ask: "Does knowing A tell us anything about B, or does knowing B tell us anything about A?"
* Clearly A tells us a lot about B.
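* As a quick illustrative check (my addition, using the A and B already in memory), compare the distribution of B for small and large values of A:
sum B if abs(A) < .5
sum B if abs(A) > 2
* The conditional means of B differ sharply, so B cannot be independent of A.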
* Let's look at Bayes' Theorem briefly.
* Bayes' Theorem states
* P(A|B)=P(B|A)P(A)/P(B)
* If A and B are independent then P(B|A)=P(B), and plugging that in gives P(A|B)=P(B)P(A)/P(B)=P(A).
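* A small illustrative sketch (my addition, not in the original post): with two independently generated dummies, conditioning on one should leave the mean of the other essentially unchanged.
clear
quietly set obs 100000
gen A = rbinomial(1,.6)
gen B = rbinomial(1,.3)
sum A
sum A if B == 1
* The two means are nearly identical, consistent with P(A=1|B=1)=P(A=1).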
* Let's try simulating an example:
clear
* Keep the sample size at 100,000 observations
set obs 100000
gen A=rbinomial(1,.6)
* The probability that A equals 1 is P(A=1)=.6; the sample mean of A estimates it:
sum A
di r(mean)
gen B=rbinomial(1,.1+A*.7)
* The probability of B given A is P(B=1|A)=.1+A*.7, so P(B=1|A=1)=.1+.7=.8 and P(B=1|A=0)=.1
* Therefore, by the law of total probability, P(B=1)=.1+E(A)*.7=.1+.6*.7=.1+.42=.52
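* As a quick sanity check (my addition), the sample mean of B should be close to .52:
sum B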
* So P(A=1|B=1)= P(B=1|A=1)*P(A=1)/P(B=1)
di "P(A=1|B=1)=".8*.6/.52
* Let's see if the data confirms this.
sum A if B == 1
* This gives us the probability that A==1 given B==1 (because A is a dummy variable, its mean is the proportion of ones)
* The mean, which is what we are looking at, is pretty close to the theoretical value
* Likewise:
di "P(A=1|B=0)="(1-.8)*.6/.52
sum A if B == 0
* Thus Bayes rule seems to work pretty well in our generated data.
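* As a final check (my addition), the conditional means of B should recover the probabilities built into the simulation:
sum B if A == 1
* The mean above should be near .8 = P(B=1|A=1)
sum B if A == 0
* The mean above should be near .1 = P(B=1|A=0)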