Correlation is evidence of causation

I've been bringing out the title line frequently for the past few years in response to people repeating the somewhat true phrase "correlation does not imply causation", or the true phrase "correlation is not causation", phrases they've been indoctrinated with by fraudsters protecting Big Tobacco.

When asked for a proof, I often just link to this page: http://oyhus.no/CorrelationAndCausation.html (the simplest and easiest-to-understand version I've come across). But I think it's sort of missing a final step, and a longer proof will fill that in.

In order to prove the title statement, we have to back up a bit and ask what evidence is, and before we do that we have to ask what belief is. Or rather, we don't really need to define what they are so much as how to measure them. Bets are a way of measuring your confidence in your beliefs, and odds ratios and other aspects of betting can be expressed through probability theory, so your confidence that your beliefs are true can be expressed using probability theory as well. (If you're interested in a non-betting-based foundation for probability theory governing beliefs, see Jaynes. If you're interested in representing uncertainty of several "flavors", see Goertzel.)

So if we have a probability for a belief, and we encounter a new piece of evidence, then that will either raise or lower the probability of the belief depending on whether it's evidence for or against. Formally, if some fact A is evidence for belief B being true, that means that the probability of B being true is greater if A is true than if A is false. In math, $P(B|A) \gt P(B|\overline{A})$ means A is evidence of B.
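To make the definition concrete, here is a minimal sketch that checks $P(B|A) \gt P(B|\overline{A})$ for a small joint distribution. The function name `is_evidence_for` and the wet-ground/rain numbers are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def is_evidence_for(joint):
    """Return (P(B|A), P(B|not A), verdict) for a joint distribution.

    `joint` maps (a, b) truth-value pairs to probabilities summing to 1.
    The verdict is True when A counts as evidence for B.
    """
    p_a = joint[(True, True)] + joint[(True, False)]
    p_not_a = joint[(False, True)] + joint[(False, False)]
    p_b_given_a = joint[(True, True)] / p_a
    p_b_given_not_a = joint[(False, True)] / p_not_a
    return p_b_given_a, p_b_given_not_a, p_b_given_a > p_b_given_not_a


# Made-up numbers: A = "the ground is wet", B = "it rained".
joint = {
    (True, True): Fraction(3, 10),
    (True, False): Fraction(1, 10),
    (False, True): Fraction(1, 10),
    (False, False): Fraction(5, 10),
}

p_b_given_a, p_b_given_not_a, verdict = is_evidence_for(joint)
print(p_b_given_a, p_b_given_not_a, verdict)  # 3/4 1/6 True
```

With these numbers, seeing wet ground moves the probability of rain to 3/4, versus 1/6 for dry ground, so A is evidence of B in exactly the sense defined above.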

So the above link proves that correlation is evidence of causation, but here I'll repeat the math (more verbosely) and add one additional fact to make things crystal clear.

Let $c$ be defined to mean "correlation", let $a$ be defined to mean "causation". Given the universe of background information $I$, we know that not everything correlates: $P(c|I) \lt 1$. Additionally we assume that if we have causation, then there is also a correlation. (It may not be linear correlation, but it will be correlation of some kind.) i.e. $P(c|a,I) = 1$.

If we're trying to determine whether a particular correlation $c$ is evidence for a particular causation $a$, we need to find out if $P(a|c,I) \gt P(a|\overline{c},I)$. We can do that with Bayes' theorem and substitution.

\begin{align} P(a|c,I) & & \text{(causation given correlation)} \\ P(a|c,I) &= P(a|I) \frac{P(c|a,I)}{P(c|I)} & \text{(Bayes theorem)} \\ &= P(a|I) \frac{1}{P(c|I)} & \text{(assumption that causation gives correlation)} \end{align}

Since we know that $P(c|I) \lt 1$, dividing by it can only increase the value of $P(a|c,I)$ relative to $P(a|I)$ (as long as $P(a|I) \gt 0$, i.e. causation isn't already ruled out), so we now know that $P(a|c,I) \gt P(a|I)$.
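As a quick sanity check with made-up numbers (these values are illustrative assumptions, not taken from the link), suppose $P(a|I) = 0.1$ and $P(c|I) = 0.4$. Then:

\begin{align} P(a|c,I) &= P(a|I) \frac{1}{P(c|I)} = \frac{0.1}{0.4} = 0.25 \gt 0.1 = P(a|I) \end{align}

So observing the correlation raises the probability of causation from 10% to 25%. (The numbers have to satisfy $P(c|I) \ge P(a|I)$, since we assumed every causation comes with a correlation.)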

The proof is arguably done at this point, since some might consider the next step obvious (it was left out of the link), but I'm going to follow through anyway and show it. Let $P(a|c,I) \gt P(a|I)$, which we just proved, be known as Lemma 1. Now we just need to marginalize $P(a|I)$ with respect to $c$ to see that $c$ is indeed evidence of $a$. The marginalization is done by: $P(a|I) = P(a,c|I) + P(a,\overline{c}|I) = P(a|c,I) P(c|I) + P(a|\overline{c},I) P(\overline{c}|I)$. (The second step was just using the product rule.)

So now the final proof:
\begin{align} P(a|c,I) &\gt P(a|I) & \text{Lemma 1} \\ P(a|c,I) &\gt P(a|c,I) P(c|I) + P(a|\overline{c},I) P(\overline{c}|I) & \text{Marginalization} \\ P(a|c,I) &\gt P(a|c,I) (1 - P(\overline{c}|I)) + P(a|\overline{c},I) P(\overline{c}|I) & \text{Sum rule} \\ P(a|c,I) &\gt P(a|c,I) + P(\overline{c}|I) (P(a|\overline{c},I) - P(a|c,I)) \\ 0 &\gt P(\overline{c}|I) (P(a|\overline{c},I) - P(a|c,I)) \\ 0 &\gt P(a|\overline{c},I) - P(a|c,I) & \text{dividing by } P(\overline{c}|I) \gt 0 \\ P(a|c,I) &\gt P(a|\overline{c},I) & \text{c is evidence of a, QED} \end{align}
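The result can also be checked numerically. The sketch below is not part of the linked proof: it draws random values for $P(a|I)$ and $P(c|\overline{a},I)$, fixes $P(c|a,I) = 1$ as assumed above, and confirms both Lemma 1 and the final inequality. The helper name `posteriors` and the sampling ranges are my own choices for illustration.

```python
import random


def posteriors(p_a, p_c_given_not_a):
    """Return (P(a|c), P(a|not c)) under the assumption P(c|a) = 1."""
    p_c_given_a = 1.0  # the post's assumption: causation always yields correlation
    # Joint probabilities via the product rule.
    p_a_and_c = p_a * p_c_given_a
    p_a_and_not_c = p_a * (1.0 - p_c_given_a)
    p_not_a_and_c = (1.0 - p_a) * p_c_given_not_a
    p_not_a_and_not_c = (1.0 - p_a) * (1.0 - p_c_given_not_a)
    # Marginals for c and not-c.
    p_c = p_a_and_c + p_not_a_and_c
    p_not_c = p_a_and_not_c + p_not_a_and_not_c
    return p_a_and_c / p_c, p_a_and_not_c / p_not_c


rng = random.Random(0)  # fixed seed so the run is repeatable
for _ in range(1000):
    p_a = rng.uniform(0.01, 0.99)  # prior probability of causation
    q = rng.uniform(0.01, 0.99)    # chance of correlation without causation
    given_c, given_not_c = posteriors(p_a, q)
    assert given_c > p_a           # Lemma 1
    assert given_c > given_not_c   # c is evidence of a
    assert given_not_c == 0.0      # follows from P(c|a) = 1
print("all checks passed")
```

The last assertion highlights a side effect of the assumption $P(c|a,I) = 1$: it forces $P(a,\overline{c}|I) = 0$, so $P(a|\overline{c},I)$ is exactly zero. Under this model, a complete absence of correlation rules causation out entirely.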

If you've ever taken a stats course or a course that covered deductive reasoning with the predicate calculus, you were probably beaten over the head with the phrase "correlation is not causation!" or even "correlation does not imply causation!" You are beaten over the head with these phrases for two reasons. First, they were used to defend Big Tobacco by noting that smoking may not cause lung cancer. Second, it intuitively seems like correlation and causation are related, and that by noticing correlations you can then proceed to do experiments that show possible causation, and stats and logic courses try to beat intuition out of you because in those fields it can often lead you astray (especially when you're new).

And I do agree that these phrases are true: correlation is not the same as causation, and correlation does not logically, deductively, imply causation within the predicate calculus. However, as we just showed, when you move to probability theory and allow for probabilistic inferences, you get the result that correlation is evidence of causation, and this is proved deductively. This matches the common-sense view that correlation "implies" causation in a looser (probabilistic) sense. And if you find a bunch of correlations, like objects of widely different masses falling at the same rate in a vacuum, that hints very strongly at a cause (like gravity or the Flying Spaghetti Monster's invisible noodly appendages).

Note that the above proof did not specify how much evidence a given correlation provides for a given causation; it just says that there's some evidence. Finding out how much takes more work.

To the hardcore logician who only knows deduction in the predicate calculus, this may sound like heresy. But the predicate calculus is contained as a special case in probability theory; probability theory is more general and allows for more general inferences, including ones that match up with common sense better. (For example, a future post will cover logical fallacies, and how if you analyze a "fallacy" in the domain of probability theory it ceases to be a fallacy and instead becomes a theorem! If you assume (or measure a prediction sample to verify) that authorities are more often right than wrong about their topic of expertise, an argument from authority is valid!)

Posted on 2014-06-03 by Jach

Tags: bayes, math, probability, rationality
