Conditional probability is the only probability and stay out of the rain

If you have had "normal" math classes, you'll probably be used to seeing entities like "the probability of being struck by lightning." You have probably been misled. For background reference, the number of people who are struck by lightning in the US is about one per million. (They don't necessarily die.)

That's a number. A ratio. A frequency. It is not a probability. Not yet. If you had a normal math education, you should want to argue with me on this point. You were taught that probabilities are just frequencies, which are just facts about the world like the lightning fact. Another such number might be that two in four balloons are red, in a bag containing two red and two blue balloons.

The lightning number can be turned into a probability, though, provided we supply some conditions. I say all probabilities are conditional probabilities, but you should ask: "Conditional on what?" This depends on the relevant information we have. The only information we have about the one in a million number is that it's for people who live in the US. Thus we might express the above lightning number as $Prob(being\ hit\ by\ lightning | live\ in\ US) = 1/1,000,000$. The $|$ is read as "given" or "supposing" or "on the condition that the following is true", so you would read that mathematical equation as "The probability of a person being hit by lightning given that person lives in the US is one in a million." But this fact shouldn't make you feel safe if you find yourself in the middle of a nasty thunderstorm without shelter.

This number has a use. Its use is as prior information, aka a prior probability, or just simply a prior. It represents our uncertainty about being hit by lightning if all we know is that the person wondering about being hit lives in the US. It represents a starting point from which we can depart when we discover new information.

What kinds of new information? For example, the fact that the weather is stormy. Then you would ask $Prob(being\ hit\ by\ lightning | it\ is\ stormy\ and\ live\ in\ US)$. This is obviously different than before because we have new information that we can condition on. So what does it equal? It equals this: $Prob(being\ hit\ by\ lightning | live\ in\ US) * Prob(it\ is\ stormy | being\ hit\ by\ lightning\ and\ live\ in\ US) / Prob(it\ is\ stormy | live\ in\ US)$.

You can see we do take into account the prior information, but now we need to multiply it by some factor in order to take into account the new information, namely that it's stormy outside. We have two new probabilities of interest, respectively the numerator and the denominator of our new multiplier.
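If it helps to see the update as something you can run, here is a minimal Python sketch of that multiplication. The function name `bayes_update` and the value used for $Prob(it\ is\ stormy | live\ in\ US)$ are my own placeholders for illustration, not measured numbers.

```python
def bayes_update(prior, likelihood, evidence):
    """Return prior * likelihood / evidence.

    prior      -- Prob(hit | live in US)
    likelihood -- Prob(stormy | hit and live in US), the numerator
    evidence   -- Prob(stormy | live in US), the denominator
    """
    if not 0 < evidence <= 1:
        raise ValueError("evidence must be a probability greater than 0")
    return prior * likelihood / evidence


prior = 1 / 1_000_000        # the one in a million number from the text
p_stormy_given_hit = 1.0     # simplification: every strike happens in a storm
p_stormy = 0.25              # hypothetical placeholder, not a real measurement

posterior = bayes_update(prior, p_stormy_given_hit, p_stormy)
print(f"Prob(hit | stormy and live in US) is about {posterior:.2e}")
```

With these placeholder inputs the prior gets multiplied by 4. The rest of the work is in finding defensible values for the two new probabilities.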

The first probability represents our uncertainty about whether it's a stormy day given someone got hit by lightning in the US. For the moment we can assume anyone getting hit by lightning had it happen during a storm, and so this first probability is very close to 1. (Remember that probabilities are between 0 and 1 exclusively, but to simplify the math we often pretend something actually is 0 or 1.)

The second probability, the denominator, represents our uncertainty about it being a stormy day somewhere in the US. So here's a question: over the whole US, has there ever been a day that wasn't characteristically "stormy" in at least one place? I don't know. If there hasn't been such a day, then both these new probabilities are useless to the calculation: the denominator would also be about 1, the multiplier would be about 1/1, there would be no new information for us to take advantage of, and so our prior doesn't change into anything new at all!

We need more data, specifically we need more things we can condition on that are useful. For example, it would be nice to know how many people get hit by lightning per year in the state of Florida, because then if you live in Florida you can calculate your prior probability of being hit and realize it will be significantly more than the national prior. Of course, if you find even more data, such as the number of people who get hit by lightning per year if those people rarely left their house, you might find that your probability will go down far below the national prior if you're the type of person who rarely leaves your house.

Let's look at what the national number does, though, if we did find some new information to give it. Let us pretend that the US is a single city in a single location, such that the second probability now becomes useful to us. A simple way of measuring the second probability is to count up the number of stormy days in the US for this year, and divide by 365. This is not the only way, however.

(Remembering that this is pretend, in real life we can't see into the future and so measuring this probability becomes a bit more difficult--a simple "probably good enough" measure though would just be the average stormy days per year for the past however many years of data you have.)
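As a rough sketch of that "probably good enough" measure, the snippet below averages stormy-day counts over past years and divides by 365. The yearly counts are made up purely for illustration, and `estimate_p_stormy` is a name I invented.

```python
from statistics import mean

# Hypothetical stormy-day counts for five past years.
# These are invented for illustration, not real weather data.
stormy_days_per_year = [41, 37, 45, 39, 43]


def estimate_p_stormy(yearly_counts, days_in_year=365):
    """Average stormy days per year, as a fraction of the year."""
    return mean(yearly_counts) / days_in_year


p_stormy = estimate_p_stormy(stormy_days_per_year)
print(f"Estimated Prob(it is stormy | live in pretend-US) = {p_stormy:.3f}")
```

This is only one way to get the number. Anything that gives you a better-informed estimate of how often it storms would do the same job.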

What this means is that we're multiplying 1/1,000,000 by 1/(stormyDays/365), because we already decided the first probability was about 1. This is equivalent to multiplying 1/1,000,000 by 365/stormyDays.

If we assume we'll always have at least one stormy day (so we don't worry about dividing by zero!), then the quantity $365/numberOfStormyDays$ has two possible categories of values.

The first is supposing every day is stormy, which means we had 365 stormy days in the year, and 365/365 is 1. Our value doesn't change! Knowing that it's stormy when every day is stormy is useless information. We're back in the same spot as we were before where I guessed that in the real United States there's at least some place every day that's stormy.

The second category is any other case where there are fewer than 365 stormy days. Let's look at some possibilities.

There are 362 stormy days. The value is 365/362 which is about 1.008. We multiply this with the 1/1,000,000 and we end up with something that's slightly larger than before.

If there are 182.5 stormy days in the year, aka half the days are stormy, half are not stormy, the value is 2. We're twice as likely to get hit knowing we're actually in a storm compared to not knowing whether we are or not. One in five hundred thousand.

If there's just 1 stormy day per year, the value is 365. This means that our previous probability changes from one in a million to 3.65 in ten thousand! That's quite a bit more likely! (Probably not enough to scare you, of course, but I'd go inside.)
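The same arithmetic can be tabulated with a short Python sketch. It uses exact fractions so nothing gets lost to rounding, and it keeps the simplification that the first probability is 1. The function name `prob_hit_given_stormy` is mine.

```python
from fractions import Fraction

PRIOR = Fraction(1, 1_000_000)  # Prob(hit | live in US), from the text


def prob_hit_given_stormy(stormy_days, days_in_year=365):
    """Prior times 365/stormyDays, assuming Prob(stormy | hit) is 1."""
    if not 0 < stormy_days <= days_in_year:
        raise ValueError("stormy_days must be above 0 and at most days_in_year")
    multiplier = Fraction(days_in_year) / Fraction(stormy_days)
    return PRIOR * multiplier


for days in (365, 362, Fraction(365, 2), 1):
    p = prob_hit_given_stormy(days)
    print(f"{float(days):>6} stormy days -> about 1 in {float(1 / p):,.0f}")
```

The 365-day row gives back the prior unchanged, the 182.5-day row gives one in five hundred thousand, and the 1-day row gives 365 in a million.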

This may seem counter-intuitive at first. The fewer days that are stormy in a year, the higher chance we have of getting hit by lightning? That seems bizarre, doesn't it?

Remember what it is we're calculating! We're calculating our conditional probability of getting hit, given that we know we're in a storm. If storms only happen once a year, and we know we're in one, we're in a special circumstance. Since we know it's practically impossible to get hit by lightning when it's not stormy, we don't worry for 364 days of the year. But for that one day that it's stormy, holy crap, we better be careful! That's when all the lightning strikes occur!
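One way to convince yourself nothing fishy is going on is to check that the stormy and non-stormy cases, weighted by how often each happens, add back up to the original one in a million. Here is a small sketch of that check for the one-storm-a-year case, under the same simplification that every strike happens during a storm. The variable names are mine.

```python
from fractions import Fraction

prior = Fraction(1, 1_000_000)     # Prob(hit | live in US), from the text
p_stormy = Fraction(1, 365)        # one stormy day per year
p_stormy_given_hit = Fraction(1)   # simplification: every strike is in a storm

# Condition on each kind of day separately.
p_hit_given_stormy = prior * p_stormy_given_hit / p_stormy
p_hit_given_calm = prior * (1 - p_stormy_given_hit) / (1 - p_stormy)

# Weight each case by how often it occurs and add them back together.
recovered_prior = (p_hit_given_stormy * p_stormy
                   + p_hit_given_calm * (1 - p_stormy))

print("given stormy:", p_hit_given_stormy)
print("given calm:  ", p_hit_given_calm)
print("recombined:  ", recovered_prior)
```

The risk hasn't grown overall. It has just been squeezed out of the 364 calm days and piled onto the one stormy day.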

Does it make sense yet? It's actually very important that we "just know" it's practically impossible to get hit by lightning when it's not stormy, because that is what allows us to treat $Prob(it\ is\ stormy | being\ hit\ by\ lightning\ and\ live\ in\ US)$ as almost 1. What we multiply the prior by is the ratio of the two new probabilities, so the numerator matters just as much as the denominator. If we knew that it was stormy only half the time people got hit by lightning (because half was from storm-lightning and half was from anything else that can only happen on a non-stormy day, such as tesla-coil lightning or Force Lightning or whatever), the math values change a bit.

Instead of the multiplying ratio being $1/(stormyDays/365)$, it becomes $(1/2) / (stormyDays/365)$ which is $182.5/stormyDays$. The behavior is similar to the above when $stormyDays$ is between 1 and 182.5: the multiplier is at least 1 and grows as storms get rarer. But if every day is stormy, we're left with multiplying by 1/2, which makes our probability half as large! One in two million. Because the original probability of getting hit, our prior, only assumes to know we live in the US, it implicitly takes into account all possible conditions for being hit in the US, whether it's stormy or not-stormy. Thus if you know that it's stormy, which is no news at all when every day is stormy, and only half of lightning strikes happen when it's stormy, then you just reduced your odds by a factor of 2 from the baseline. Symmetrically, if practically no day is stormy and you know it's not stormy, but half your lightning strikes happen when it's not stormy (because what Sith wants to get his cloak wet), you also just reduced your odds by a factor of 2 from the baseline.
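Here is a hedged sketch of the half-and-half version, handling both "I know it's stormy" and "I know it's not stormy". The function name `posterior_given_weather` and its parameters are my own invention. Strictly speaking, if every single day were stormy then every strike would land on a stormy day, so read the 365 case as the limit of the pretend scenario rather than something that could literally hold.

```python
from fractions import Fraction

PRIOR = Fraction(1, 1_000_000)       # Prob(hit | live in US), from the text
P_STORMY_GIVEN_HIT = Fraction(1, 2)  # pretend: half of strikes come in storms


def posterior_given_weather(stormy_days, stormy, days_in_year=365):
    """Prob(hit | weather and live in US) under the half-and-half assumption."""
    p_stormy = Fraction(stormy_days) / days_in_year
    if stormy:
        likelihood, evidence = P_STORMY_GIVEN_HIT, p_stormy
    else:
        likelihood, evidence = 1 - P_STORMY_GIVEN_HIT, 1 - p_stormy
    if evidence == 0:
        raise ValueError("cannot condition on weather that never happens")
    return PRIOR * likelihood / evidence


print(posterior_given_weather(365, stormy=True))               # every day stormy
print(posterior_given_weather(Fraction(365, 2), stormy=True))  # half the days
print(posterior_given_weather(1, stormy=True))                 # one storm a year
print(posterior_given_weather(1, stormy=False))                # a calm day that year
```

When exactly half the days are stormy, knowing the weather changes nothing in either direction, since strikes are then spread across stormy and calm days in the same proportion as the days themselves.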

So coming back around, there are relationships between probability and frequency, but they are not the same.

Frequencies are just numbers. Relative frequencies, like one in a million or two in four, start at 0 and end at 1. Rates, like 1.8 people dying per second across the world, are ratios too. They may or may not be true, but we like ones that are. We expect frequencies to remain the same; for example, when sending radio signals it's nicer for both the sender and the receiver to agree on a message frequency that is constant.

Probabilities are our own uncertainty or certainty in some matter such as being hit by lightning. They are conditional on some assumptions (which may be as broad as "the universe works the way we think it works" or as narrow as "we measure 5 seconds after pressing the start button a quark traveling at 0.9c located 50 nanometers from another quark traveling at 0.7c"). We represent our uncertainty with numbers between 0 and 1, excluding 0 and 1 themselves except for rare cases of mathematical convenience. We could have picked something else to represent them with, whether a different range of numbers or even non-numbers, and we could have said that 0, instead of 1, is the limit of complete certainty. When we measure probabilities, we do so using our best judgment. Oftentimes, we may find a handy frequency value to use as a tool, and we'll use that. Other times, we'll use raw bits of information from some other (hopefully empirical) data we can find.

It doesn't help remove confusion that all the tools we use to calculate and measure probabilities can in turn be expressed as frequencies, and so I just ask you to remember that while 0.1 is the same as 1/10, a probability of 0.1 is not the same as the number 1/10, but it is the same as a probability of 1/10.

If you don't quite get the difference, I have only one other way to express it, and that's through a leaky metaphor and personification. The metaphor is a name-tag. Pets run around houses, we all agree they're pets, they all wear name-tags that spell "pet". But cats wear an additional name-tag with "cat" spelled on it, and dogs wear an additional name-tag with "dog" spelled on it. This enables us to talk about cats or dogs specifically, in their own context that doesn't necessarily translate to the other (talk about litterboxes for cats, doghouses for dogs), and if we want we can always talk about them both in general as pets but in doing so we lose the distinction of what class of pet each one is.

Numbers like 1, 4, 1/3, and so on, fly around freely as themselves, all wearing the name-tag "number". Some numbers like 1/3 wear a name-tag that says "ratio". All numbers wear a blank name-tag where they write things in pencil instead of permanent marker. At one moment they might write "the answer to problem #36 on Shelly's homework she's currently doing", the next moment they might erase that and write "the answer to problem #14 on Greg's current homework".

For numbers between 0 and 1, they get the special privilege of being able to sometimes write "probability of Y given Context". This signals that these numbers are, for the moment, temporarily representing probabilities, and so we can talk about them--in this specific instance, before the pencil marks get erased--as probabilities. So if you ask yourself "what's the probability that I'll eat dinner tonight given my laziness towards cooking and driving", and answer with "7/8", the number 7/8 for that moment has three name-tags. The first two have, written in permanent marker, "number" and "ratio". The third has, in pencil, "probability that etc.".
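If you think in code, the name-tag picture maps loosely onto wrapping a plain number in a small labeled type. The `Probability` class below is entirely my own illustration of the metaphor, not a standard library type: the bare `Fraction` is the number with its permanent tags, and the wrapper is the pencilled-in tag naming the event and the context.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Probability:
    """A number wearing a pencilled-in tag: probability of event given context."""
    value: Fraction
    event: str
    context: str

    def __post_init__(self):
        # Only numbers from 0 to 1 get to wear this tag.
        if not 0 <= self.value <= 1:
            raise ValueError("a probability must lie between 0 and 1")

    def __str__(self):
        return f"Prob({self.event} | {self.context}) = {self.value}"


seven_eighths = Fraction(7, 8)  # permanently a number and a ratio
dinner = Probability(
    seven_eighths,
    "I eat dinner tonight",
    "my laziness towards cooking and driving",
)

print(dinner)             # the number, currently tagged as a probability
print(seven_eighths + 1)  # the bare number still does ordinary arithmetic
```

Notice that the tag can't be written without a context. That is the whole point of the post in type form.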

This post was meant to introduce and reinforce the claim that your school notion of probability is wrong--probabilities are something extra to mere frequencies--and to give an example of why you need to ask "odds conditional on what?" if you want to get any use out of probabilities. As a subtext, you should remember that data beats math--you can crank away all you like on that one in a million number for being hit by lightning, there are many theorems in probability theory and ad-hoc tools in Statistics where you could just "plug it in" and see what happens, but you're much better off collecting more data to condition on instead.

Posted on 2012-03-21 by Jach

Tags: bayes, math, probability
