Bayes
$$ {P(A | B)} = {{P(B | A)P(A)} \over{P(B)}}$$
Where \(A\) and \(B\) are events, \(P(B) \ne 0\), and:
• \(P(A)\) and \(P(B)\) are the probabilities of observing \(A\) and \(B\)
• \(P(A | B)\) is the conditional probability of observing \(A\) _given_ that \(B\) is true
• \(P(B | A)\) is the probability of observing \(B\) _given_ that \(A\) is true
There are other formulations and re-arrangements of this. I quite like the 'Better Explained' roundup of Bayes.
Here's a pretty good YouTube vid on the subject:
So, what does it all mean Professor?
Let's take a 'modern' example which demonstrates its power. Let's suppose that we have a new test for that awful scourge of modern life, hypocondritis, that is 98% sensitive and 91% specific. Sensitivity refers to how good the test is at detecting hypocondritis (in this case, if you have hypocondritis it will show that correctly 98 times in a hundred). Specificity refers to how good it is at clearing hypocondritis-free patients (in this case it gets that right 91 times in a hundred, so it wrongly flags 100 - 91 = 9% of them, or 9 'healthy' patients in 100). So, sensitivity governs 'false negatives' and specificity governs 'false positives'.
Phew, glad that's out the way, so how does this work then?
Consider the population as a whole, and let's suppose that at any given time 1% (1 in every 100) of the unsuspecting populace have the dreaded hypocondritis. I don't show any symptoms but at a visit to the local GP for an unrelated reason (seasonal aversion to work - it's a real problem you know) I'm randomly selected for a test and, Oh No! the result is positive. The question is, what is the chance that I actually have hypocondritis and this isn't a freak occurrence?
Let's do some sums:
\(P(A)\) is the probability that I've actually got the disease - easy, 1% or .01 (we're given this figure)
\(P(B)\) is the probability that I'll give a positive test whether I've got the disease or not - hmm … tricky, we'll come to that.
\(P(B | A)\) is the probability that I'll test positive if I've actually got the disease - also easy, 98% or 0.98 (also given).
OK, how about that pesky \(P(B)\)? Well, thinking about it, this is the chance of the infected group (1% of the population) giving a positive test (98% likely) plus the chance of the much larger uninfected group (99% of the population) giving a positive result (100% - 91% = 9% likely). So…
\(\begin{equation}
\begin{split}
P(B) &= (0.01 \times 0.98) + (0.99 \times 0.09)\\
&= 0.0989
\end{split}
\end{equation}
\)
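In symbols, the sum above is an instance of the law of total probability. Writing \(\neg A\) for the event that I don't have the disease (a notation introduced here for illustration, not used elsewhere in this post), it reads:

$$ P(B) = P(B | A)P(A) + P(B | \neg A)P(\neg A) $$

Here \(P(B | \neg A)\) is the false positive rate, i.e. 1 minus the specificity, and \(P(\neg A) = 1 - P(A)\).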
Plugging all these figures into Bayes' rule, we get:
\(\begin{equation}
\begin{split}
P(A | B) &= { (0.98) \times (0.01) \over (0.0989) } \\
&= 0.0991
\end{split}
\end{equation}
\)
or nearly 10%
What does this mean? It means that even though I've tested positive for hypocondritis I've actually only got a 10% chance of genuinely having it. So although the test is 98% 'accurate' I'm still about 9 times more likely not to have it.
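If you'd rather not trust the hand arithmetic, the sums above can be sketched in a few lines of Python. The function name `posterior` and its parameter names are just labels chosen for this illustration; the figures are the ones given in the example.

```python
def posterior(prevalence, sensitivity, specificity):
    """Return P(disease | positive test) using Bayes' rule."""
    # P(B | A) * P(A): infected people who test positive
    true_pos = prevalence * sensitivity
    # P(B | not A) * P(not A): healthy people who test positive anyway
    false_pos = (1 - prevalence) * (1 - specificity)
    # P(B): everyone who tests positive
    p_positive = true_pos + false_pos
    return true_pos / p_positive


p = posterior(prevalence=0.01, sensitivity=0.98, specificity=0.91)
print(f"P(disease | positive) = {p:.4f}")  # 0.0991
```

Try changing `prevalence` to see how strongly the answer depends on how rare the disease is: the rarer it is, the more the false positives swamp the true ones.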
Here's another example:
A family has two children. Given that one of the children is a boy and that he was born on a Tuesday, what is the probability that both of the children are boys?
At first glance you might think that the Tuesday bit is irrelevant - I did - but, believe it or not, you would be wrong. Here's why.
If we ignore the day-of-the-week information it is pretty easy to show that the probability of 2 children being boys, given that at least one of them is, is \(1 \over 3\) (the equally likely combinations with at least 1 boy are gb, bg and bb, so the probability of bb is 1/3).
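The one-in-three figure can be checked by brute force. This is a minimal sketch, assuming boys and girls are equally likely and that the two births are independent; the variable names are made up for the illustration.

```python
from fractions import Fraction
from itertools import product

# All equally likely (first child, second child) combinations: bb, bg, gb, gg
families = list(product("bg", repeat=2))

# Keep only the families with at least one boy
with_boy = [f for f in families if "b" in f]

# Of those, count the families where both children are boys
both_boys = [f for f in with_boy if f == ("b", "b")]

p_two_boys = Fraction(len(both_boys), len(with_boy))
print(p_two_boys)  # 1/3
```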
Now let's look at the actual problem using Bayes' theorem.
In this case let's assume that \(A\) is the event that both children are boys and that \(B\) is the event that the family has a boy born on a Tuesday. Also let's assume that the chance of being born on any particular day of the week is \(1 \over 7 \).
It is easy to see that \(P(A) = {1 \over 4} \) (the possible combinations are gg, gb, bg, bb).
What will \(P(B | A)\) (the probability that a boy is born on a Tuesday given there are two boy children) be? Well, there are 49 combinations of days of the week that the two boys could be born on and 13 of these will contain at least one Tuesday. So \(P(B | A) = {13 \over 49} \).
For \(P(B)\) you need to note that for a single child there are \( 2 \times 7 = 14 \) combinations of gender/day and for 2 children this becomes \(14^2 = 196 \) combinations. Of these there will be \(13^2 = 169\) which do not have a boy born on a Tuesday and therefore \(196 - 169 = 27\) that do. So, \(P(B)\), the probability that the family has a boy born on a Tuesday, is \( 27 \over 196\).
Plugging these numbers into Bayes we get:
\(P(A | B) = {{13 \over 49} \times {1 \over 4} \over {27 \over 196}} = {13 \over 27}\) which is much closer to \(1 \over 2\) than the original \( 1 \over 3\)
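The counting can also be done exhaustively rather than by argument. The sketch below lists all 196 gender/day combinations and counts the ones that matter; it assumes every combination is equally likely, and the choice of `1` as the label for Tuesday is arbitrary.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # arbitrary label; days are numbered 0..6

# 14 possible (gender, day) pairs for one child, 196 for two children
children = list(product("bg", range(7)))
families = list(product(children, repeat=2))


def has_tuesday_boy(family):
    """Event B: at least one child is a boy born on a Tuesday."""
    return any(child == ("b", TUESDAY) for child in family)


def both_boys(family):
    """Event A: both children are boys."""
    return all(sex == "b" for sex, _ in family)


b_families = [f for f in families if has_tuesday_boy(f)]
ab_families = [f for f in b_families if both_boys(f)]

print(len(families), len(b_families), len(ab_families))  # 196 27 13
print(Fraction(len(ab_families), len(b_families)))  # 13/27
```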
This really did seem ridiculous to me - why should naming a day have such an effect? So I wrote a simple (and pretty kludgy - don't look at it too hard) piece of Python code to check it out using a Monte Carlo simulation technique (10,000 iterations). Last time I ran this it gave the incidence of 2 boys, one at least born on a Tuesday, as something like \(4800 \over 10000\) which is remarkably close to \(13 \over 27\). You can find the code here.
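For anyone who wants to try this without hunting down the original script, here is a minimal Monte Carlo sketch of the same idea. It is not the code referred to above, just one way of doing it: the function name `simulate`, the fixed seed and the sample size are all choices made for this illustration. It generates random two-child families, keeps only those with a Tuesday boy, and reports the fraction of those with two boys.

```python
import random


def simulate(n_families, seed=0):
    """Estimate P(two boys | at least one boy born on a Tuesday)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    tuesday_boy_families = 0
    two_boy_families = 0
    for _ in range(n_families):
        # Each child is a (gender, day) pair; day 1 stands for Tuesday
        family = [(rng.choice("bg"), rng.randrange(7)) for _ in range(2)]
        if any(sex == "b" and day == 1 for sex, day in family):
            tuesday_boy_families += 1
            if all(sex == "b" for sex, _ in family):
                two_boy_families += 1
    return two_boy_families / tuesday_boy_families


estimate = simulate(200_000)
print(f"estimate = {estimate:.3f}, exact = {13 / 27:.3f}")
```

Only about 27 in every 196 random families satisfy the condition, so most of the generated families are thrown away. That is why the sketch generates far more than 10,000 families in total.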
Now these are just toy examples, but the figures aren't outrageous, so consider what would happen if the hypocondritis test were a real clinical issue. The anxiety caused by a false positive could be pretty detrimental and further testing to 'confirm' the test diagnosis might involve unpleasant and potentially harmful procedures. Is it worth running a screening programme in which roughly 9 out of every 10 positive results belong to healthy patients - even though the test looks really accurate?
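To make the scale of the problem concrete, the same figures can be turned into head counts. This sketch assumes a hypothetical screening of 10,000 people (a number picked purely for illustration) with the 1% prevalence, 98% sensitivity and 91% specificity from the example.

```python
from fractions import Fraction

population = 10_000  # hypothetical number of people screened
prevalence = Fraction(1, 100)
sensitivity = Fraction(98, 100)
specificity = Fraction(91, 100)

sick = population * prevalence  # people who really have the disease
healthy = population - sick  # people who don't
true_positives = sick * sensitivity  # sick people correctly flagged
false_positives = healthy * (1 - specificity)  # healthy people wrongly flagged
positives = true_positives + false_positives

print(f"{positives} positives: {true_positives} sick, {false_positives} healthy")
# 989 positives: 98 sick, 891 healthy
print(f"share of positives who are healthy = {float(false_positives / positives):.3f}")
# share of positives who are healthy = 0.901
```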
Have a look at things like the PSA test for prostate cancer for loads of heated debate around this subject but do try and view the 'evidence' through a properly objective and analytical eye. You might also consider dear old Bayes when you see the outraged bellows from the Daily Wail on a regular basis calling for some form of blanket screening to help 'save' innocent lives.
NOTE no puppies, kittens or other doe-eyed innocent animals were harmed in the writing of this piece.