# Notes from Probability Theory Chapter 1

Probability Theory: The Logic of Science, by the great E.T. Jaynes, has been in my reading queue for quite some time now. Unfortunately for me it's a dense book after the first few chapters, so I've kind of plateaued around chapter 3 while reading from a bunch of other sources.

I've found that my brain is like boiling soup, in a sense, with different things coming up to my attention almost randomly but the important ones usually coming up just in time. So now a Jaynes bubble has reappeared and I'm going to review what I've read! If you're interested I highly recommend the actual book, since here I'm going to be sometimes more verbose, sometimes less, sometimes tangential, and always less organized than Jaynes; leave your email in the comment form and I'll send you a PDF copy if you want.

Chapter one begins with this thought-provoking quote:
> The actual science of logic is conversant at present only with things either certain, impossible, or entirely doubtful, none of which (fortunately) we have to reason on. Therefore the true logic for this world is the calculus of Probabilities, which takes account of the magnitude of the probability which is, or ought to be, in a reasonable man’s mind.
>
> James Clerk Maxwell (1850)

### 1.1 - Deductive and Plausible (Inductive) Reasoning (Inference)

Not long after Maxwell wrote that, quantum mechanics was born, and if you go back and listen to or read Feynman, he'll tell you that science has been "reduced" to probabilities.

Jaynes asks us to consider a scenario: it's a dark night, a cop is walking down a street which he thought seemed deserted. He suddenly hears a burglar alarm, looks across the street, and sees a jewelry store with a broken window. A moment later, a gentleman wearing a mask comes crawling out through the broken window, carrying a bag which is full of expensive jewelry.

No cop would hesitate at all in deciding that this gentleman is dishonest. How does he come to this conclusion? Without going too far into the depths of the brain, what, in general, is his mind probably doing?

Was this a deductive decision by the cop, assuming just the evidence alone? (You can of course assume the conclusion but that's not helpful.) Clearly not, because with a little thought one might come up with an entirely innocent explanation for the observed evidence. Jaynes offers: perhaps the gentleman is the store owner who was walking home from a masquerade party, and he didn't have his store key with him. When he was walking by his store, a passing truck may have thrown a brick through the window, and the owner was just protecting his own property. The alarm may have had a delay since the cop never heard the window break.

So clearly, because a reasonable alternative exists, the cop cannot logically deduce the dishonesty of the masked person. Yet we still think the cop has valid reasoning to suspect dishonesty. We think the evidence available, while not presenting a logically certain conclusion, nevertheless presents an extremely plausible conclusion. Why do we think this? Our brains are doing this all the time: we're constantly faced with situations where we simply don't have enough information available to allow ourselves enough assumptions to perform deductive reasoning. Premature deduction can lead you off a cliff.

It turns out that Probability Theory is a concrete, axiomatic, deductive theory for describing what roughly goes on in the cop's mind. It's a theory to explain plausible reasoning. What sort of power does this give us? Through human inductive (plausible) reasoning that we do with intuition and general, qualitative ideas, we can come up with a couple rules to assume as axioms that should govern induction in general.

All of us can think of cases where humans reasoned poorly. Can we actually quantify different kinds of errors and produce a better alternative that the person in error could have used instead? Yup, we can. It's also known that humans can reason very poorly even while under the delusion they're doing just fine. I know I've convinced myself of many falsehoods that had to be corrected. If we can define a minimal set of rules, which should be simple enough not to let us make any errors, then we can deductively produce more theorems from those rules, which give us fences to help us combat our failures when our intuitive induction leads us astray.

I've gotten ahead, though. (This is why you should read Jaynes, especially if you're finding it difficult to follow me.) What do I even mean by deduction? Why do we even want deduction?

Deductive reasoning, or classical two-value logic, typically credited to Aristotle, basically boils down to two strong syllogisms repeated over and over. The first, symbolically, follows the form: $A \to B;\ A\ is\ true;\ \therefore B\ is\ true$. In English:

1: We are given (or assume) the fact that if some proposition A is true, then automatically we know proposition B to be true as well. Some time through our day, we stumble upon an instance of A that we discover is true. We can automatically deduce that the other proposition B must also be true.

2: The second syllogism uses the same premise in the other direction (the contrapositive). Throughout our day, we find an instance of B and learn that it is false. Therefore we can automatically deduce that proposition A must also be false.

It should be obvious why we should prefer these two rules of logic: they're simple, and it's incredibly hard to make a mistake. Indeed, we can even program computers to check mistakes for us automatically! These rules also give us the power of Laziness. Suppose we know that if we ever find a golden apple today, then it will be raining above the Empire State Building for the day. So if we stumble upon a golden apple, we know it's raining above the Empire State Building. You can go look if you want, but it's not necessary. Going the other way, if we're on top of the Empire State Building and it's sunny, we know that we won't find a golden apple today no matter how hard we look, and we don't even need to try. Notice, however, that if we're at the Empire State Building and it's raining, that doesn't guarantee we'll find a golden apple. The if-then works one way.
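To make the "computers can check this" remark concrete, here is a minimal sketch that brute-forces the truth table for the two strong syllogisms, plus the tempting-but-wrong reverse direction (it's raining, so there must be a golden apple). The helper names `implies` and `is_valid` are my own for illustration, not anything from Jaynes.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication: 'if a then b' fails only when a is true and b is false."""
    return (not a) or b


def is_valid(premises, conclusion) -> bool:
    """An argument form is valid if the conclusion holds in every
    truth assignment of (A, B) where all the premises hold."""
    return all(
        conclusion(a, b)
        for a, b in product([True, False], repeat=2)
        if all(p(a, b) for p in premises)
    )


rule = lambda a, b: implies(a, b)  # the shared premise "A -> B"

# Syllogism 1: A -> B, A is true, therefore B is true.
modus_ponens = is_valid([rule, lambda a, b: a], lambda a, b: b)
# Syllogism 2: A -> B, B is false, therefore A is false.
modus_tollens = is_valid([rule, lambda a, b: not b], lambda a, b: not a)
# NOT a valid form: A -> B, B is true, therefore A is true.
affirming_consequent = is_valid([rule, lambda a, b: b], lambda a, b: a)

print(modus_ponens, modus_tollens, affirming_consequent)  # True True False
```

The third check fails because of the case where A is false and B is true: rain without a golden apple is perfectly consistent with the premise.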

Of course, rarely can you get away with saying you know such an absurd thing as golden apples implying rain. This isn't a knock against logic, though: logic only requires that you suppose, or assume. If your assumption turns out to be true, because you bothered to check it enough times until you were convinced, then hey, you get a nice bonus of having something useful. Nothing is stopping you from proving whatever you want, apart from people questioning your assumptions. Your deductive conclusion is only as valid as the place you started, and any wrong move in the dance destroys it all.

But it's not even those absurd things like my example that we generally can't get away with. Even the cop example doesn't let us get away with it, unless you want to assume that if someone is wearing a mask then they're dishonest! What'll you do at Halloween?

For many $A \to B$ relationships, it's often the case that we simply don't have enough information about A which makes the first syllogism useless to us. So in general, for everyday life, we have to fall back on a weaker syllogism:

3: Again we assume that if A is true, then B is true. We learn that B is true, therefore A becomes more plausible.

Jaynes gives another example here: suppose proposition A is "it will start to rain by 10am at the latest", and B is "the sky will become cloudy before 10am".

If we notice clouds before 10am, in other words we observe that B is true, that doesn't mean it will start to rain. But our common sense, our intuition, follows this weak syllogism in that, even if we can't logically prove for certain that it will rain, we nevertheless find the idea that it's going to rain more plausible. This process of obeying the weak syllogism is a form of inductive reasoning.

Note that $A \to B$ does not need to be causal. Clearly, as far as most people's conceptions of causality go, clouds in the past cause rain in the future. Rain in the future doesn't cause clouds in the past! But rain in the future does mean that there were clouds in the past, while clouds in the past do not necessarily mean rain in the future. And so it is that if we're trying to be careful in what we assume, only the implication of $Future\ Rain \to Past\ Clouds$ seems certain enough to let by.

We are not proving causality; we are only proving valid implication according to some rules we intuitively like and agree on. Proving causality is tricky business, and for years most people assumed it couldn't be done. Nowadays we have Judea Pearl.

4: There is one more related weak syllogism, using the same premise of "if A, then B." If we discover that A is false, then (precisely because we're not talking about causality) we can't say decisively whether B is true or false. The syllogism then is that if A is false, B becomes less plausible.

We say it's less plausible because, while there may be another reason for B to be true, we've at least eliminated one possible reason out of an unknown number.
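Looking ahead a bit: if we borrow the ordinary rules of conditional probability (which these notes haven't introduced yet, so treat this as a preview rather than Jaynes's own presentation), both weak syllogisms can be checked numerically. The sketch below uses a made-up joint distribution over (A, B) in which "A implies B" is encoded by giving the case "A true, B false" probability zero. All the numbers are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical joint probabilities over (A, B); the numbers are invented.
# "A implies B" is encoded by giving (A true, B false) probability 0.
joint = {
    (True, True): F(2, 10),
    (True, False): F(0),
    (False, True): F(3, 10),
    (False, False): F(5, 10),
}


def prob(event):
    """Total probability of the (a, b) cases where event(a, b) holds."""
    return sum(p for (a, b), p in joint.items() if event(a, b))


def cond(event, given):
    """Conditional probability P(event | given)."""
    return prob(lambda a, b: event(a, b) and given(a, b)) / prob(given)


A = lambda a, b: a
B = lambda a, b: b
not_A = lambda a, b: not a

# Weak syllogism 3: learning B is true makes A more plausible.
print(prob(A), cond(A, B))      # 1/5 2/5
# Weak syllogism 4: learning A is false makes B less plausible.
print(prob(B), cond(B, not_A))  # 1/2 3/8
```

Neither observation settles anything for certain, but each one moves the plausibility in the direction the syllogism says it should.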

These last two weak syllogisms account for almost all of the reasoning scientists do. Is that surprising? The two strong syllogisms are so rare, and their premises so hard to justify, that they're reserved largely for the realm of mathematics. (And even then, what mathematicians actually use initially is often these weaker forms and the next one, just like everyone else. It's only when they want a formal proof that they try hard to use the strongest syllogisms.) So what does our cop use? He uses a still weaker syllogism, one that has a different premise entirely:

5: We assume that if A is true, then B becomes more plausible. We observe B as true, therefore A becomes more plausible.

Even though this seems weak when stated formally, our brains nevertheless feel like a policeman's reaction of taking the masked man to be a dishonest criminal has almost all the power of deductive logic! What this means is that this word "plausible" comes in degrees. This will be important later. In the specific cop case, we're assuming that "If a man is a dishonest thief robbing a jewelry store, then it is more plausible for him to be wearing a mask and coming out of a broken window of an alarm-sounding jewelry store." We observe the mask, broken window, and alarm, therefore we become almost entirely certain that this man is also a dishonest thief.

In the case of the clouds before 10am, how much we believe that it will rain depends very much on the darkness of the clouds. Why is this so? Because as humans with memory we have past experiences. Imagine that there's a city where the above cop scenario happens all the time, except every time it actually is just the partying store owner protecting his property after finding his window broken. After a while the cops would just start ignoring such circumstances since they aren't needed: they're making use of prior information to determine what degree of plausibility something should have.
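As a rough preview of how "degrees of plausibility" and prior information might be put into numbers, here is a small sketch using Bayes' rule for two hypotheses (thief vs. innocent). This is my own illustration, not Jaynes's calculation, and every number in it is invented: the only thing that differs between the two cities is how often innocent people produce the mask-window-alarm evidence.

```python
def posterior(prior, p_evidence_if_thief, p_evidence_if_innocent):
    """Bayes' rule for two hypotheses: returns P(thief | evidence)."""
    joint_thief = prior * p_evidence_if_thief
    joint_innocent = (1 - prior) * p_evidence_if_innocent
    return joint_thief / (joint_thief + joint_innocent)


# All numbers below are hypothetical, chosen only to illustrate the idea.
prior_thief = 0.001            # plausibility of "thief" before seeing anything
p_evidence_if_thief = 0.5      # thieves often produce mask + window + alarm

# Ordinary city: innocent people almost never produce this evidence.
ordinary_city = posterior(prior_thief, p_evidence_if_thief, 1e-6)

# Masquerade city: partying store owners do this all the time.
masquerade_city = posterior(prior_thief, p_evidence_if_thief, 0.01)

print(round(ordinary_city, 3))    # roughly 0.998
print(round(masquerade_city, 3))  # roughly 0.048
```

In both cities the evidence makes "thief" more plausible than it was beforehand, which is all syllogism 5 promises. How much more depends on the background information, which is why the cops in the second city eventually stop bothering.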

This reasoning about how plausible something sounds given past information we usually just call common sense, and much of it is unconscious and very rapid. What's something that goes against your common sense? Maybe that you'll be fine after jumping off a roof; that doesn't sound very plausible. Imagine your shock and subsequent arguments at finding someone who says "Pfft, of course you'll be fine. It's common sense." What explains the differences in plausibilities you each have? Maybe the other person has jumped off many roofs without harm. In other words, the two of you have different prior information that you're making use of.

If you want to resolve a debate, you need to first get to common ground with respect to prior information. I lamented in my previous post about how difficult it is to have a debate with someone not at my level in terms of understanding information and probability theory: my arguments rely on prior information they don't have, and it is a lot of prior information. Imagine you're trying to prove to a member of the Pirahã tribe that for any equation of the form $Ax^2 + Bx + C = 0$ (with $A \neq 0$), there are roots at $x = \frac{-B \pm \sqrt{B^2 - 4AC}}{2A}$. In the US, kids take at least 6 school years to learn the symbols and operations needed just to state the problem! To actually prove it, should it be disputed in debate, requires prior knowledge of things like "Completing the Square".

Jaynes mentions George Pólya as an author who from 1945 to 1954 wrote three books about plausible reasoning and how, as we can learn through experiment, humans tend to follow certain qualitative rules. (And as later experiments have shown, humans also tend to violate certain rules.) Subsequent work by Jaynes and others has attempted to define such rules quantitatively, with great success.

### 1.2

Join me next time.

#### Posted on 2011-10-01 by Jach

Tags: bayes, books, math, probability
