You look at a thing.

Is it a duck?

Re-phrase: What is the probability that it’s a duck, if it looks like that?

Bayes’ rule says that the probability of it being a duck, if it looks like that, is the same as the probability of any old thing being a duck, times the probability of a duck looking like that, divided by the probability of a thing looking like that:

$$P(\text{duck} \mid \text{looks}) = \frac{P(\text{duck}) \, P(\text{looks} \mid \text{duck})}{P(\text{looks})}$$

This makes sense:

- If ducks are mythical beasts, then $P(\text{duck})$ (our “prior” on ducks) is very low, and the thing would have to be *very* duck-like before we’d believe it’s a duck. On the other hand, if we’re at some sort of duck farm, then $P(\text{duck})$ is high and anything that looks even a little like a duck is probably a duck.
- If it’s very likely that a duck would look like that ($P(\text{looks} \mid \text{duck})$ is high) then we’re more likely to think it’s a duck. This is the “likelihood” of a duck looking like that thing. In practice it’s based on how the ducks we’ve seen before have looked.
- The denominator normalizes things. After all, we’re in some sense portioning out the probabilities of this thing being whatever it could be. If 1% of things look like this, and 1% of things look like this *and* are ducks, then 100% of things that look like this are ducks. So $P(\text{looks})$ is what we’re working with; it’s the denominator.
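To see how much the prior matters, here is a minimal sketch in Python. The function name `posterior`, the extra input `false_alarm` (the probability that a non-duck looks like that), and all of the numbers are assumptions made up for illustration; the denominator is computed by splitting things into ducks and non-ducks.

```python
def posterior(prior, likelihood, false_alarm):
    """Return P(duck | looks) using Bayes' rule.

    prior       -- P(duck)
    likelihood  -- P(looks | duck)
    false_alarm -- P(looks | not duck), an assumed extra input
    """
    # Denominator P(looks): add up the duck and non-duck ways of looking like that.
    p_looks = prior * likelihood + (1 - prior) * false_alarm
    return prior * likelihood / p_looks


# Same appearance evidence, two very different priors (made-up numbers).
mythical = posterior(prior=0.001, likelihood=0.9, false_alarm=0.1)
duck_farm = posterior(prior=0.9, likelihood=0.9, false_alarm=0.1)

print(f"mythical ducks: {mythical:.3f}")
print(f"duck farm:      {duck_farm:.3f}")
```

With these made-up inputs the same duck-like appearance gives a posterior under 1% when ducks are nearly mythical and close to 99% at the duck farm.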

Here’s an example of a strange world to test this in:

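There are ten things. Six of them are ducks. Five of them look like ducks. Four of them both look like ducks and are ducks. One thing looks like a duck but is not a duck. Maybe it’s a fake duck? Two ducks do not look like ducks. Ducks in camouflage. Test the equality of the two sides of Bayes’ rule:

$$P(\text{duck} \mid \text{looks}) = \frac{4}{5}, \qquad \frac{P(\text{duck}) \, P(\text{looks} \mid \text{duck})}{P(\text{looks})} = \frac{\frac{6}{10} \cdot \frac{4}{6}}{\frac{5}{10}} = \frac{4}{5}$$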
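The ten-thing world is small enough to check by brute-force counting. The sketch below is one way to do it in Python; the tuple encoding and the helper `prob` are illustrative choices, and the three things that neither are ducks nor look like ducks are implied by the counts rather than stated outright.

```python
from fractions import Fraction

# Each thing is (is_duck, looks_like_duck); counts follow the example.
things = (
    [(True, True)] * 4      # ducks that look like ducks
    + [(True, False)] * 2   # ducks in camouflage
    + [(False, True)] * 1   # the fake duck
    + [(False, False)] * 3  # the rest: not ducks, not duck-like
)


def prob(event, given=lambda t: True):
    """Fraction of things satisfying `event` among those satisfying `given`."""
    pool = [t for t in things if given(t)]
    return Fraction(sum(1 for t in pool if event(t)), len(pool))


def is_duck(t):
    return t[0]


def looks(t):
    return t[1]


# Left side: P(duck | looks), counted directly.
lhs = prob(is_duck, given=looks)
# Right side: P(duck) * P(looks | duck) / P(looks).
rhs = prob(is_duck) * prob(looks, given=is_duck) / prob(looks)

print(lhs, rhs)  # 4/5 4/5
```

Using `Fraction` keeps the arithmetic exact, so the two sides can be compared with `==` rather than with a floating-point tolerance.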

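It’s true here, and it’s not hard to show that it must be true, using two ways of expressing the probability of being a duck and looking like a duck. We have both of these:

$$P(\text{duck and looks}) = P(\text{duck}) \, P(\text{looks} \mid \text{duck})$$

$$P(\text{duck and looks}) = P(\text{looks}) \, P(\text{duck} \mid \text{looks})$$

Check those with the example as well, if you like. Using the equality, we get:

$$P(\text{looks}) \, P(\text{duck} \mid \text{looks}) = P(\text{duck}) \, P(\text{looks} \mid \text{duck})$$

Then dividing by $P(\text{looks})$ we have Bayes’ rule, as above.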
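The two ways of writing the probability of being a duck and looking like a duck can also be checked against the ten-thing example. This is a small sketch working from the counts alone; the variable names are just labels chosen for illustration.

```python
from fractions import Fraction

# Counts from the ten-thing example.
total, ducks, lookalikes, both = 10, 6, 5, 4

p_duck = Fraction(ducks, total)                  # P(duck)
p_looks = Fraction(lookalikes, total)            # P(looks)
p_both = Fraction(both, total)                   # P(duck and looks)
p_looks_given_duck = Fraction(both, ducks)       # P(looks | duck)
p_duck_given_looks = Fraction(both, lookalikes)  # P(duck | looks)

# Both factorizations give the same joint probability.
assert p_both == p_duck * p_looks_given_duck
assert p_both == p_looks * p_duck_given_looks

# Dividing through by P(looks) recovers Bayes' rule.
assert p_duck_given_looks == p_duck * p_looks_given_duck / p_looks

print(p_both)  # 2/5
```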

This is not a difficult proof at all, but for many people the result feels very unintuitive. I’ve tried to explain it once before in the context of statistical claims. Of course there’s a Wikipedia page and many other resources. I wanted to try to do it with a single simple example that makes the equations easy to parse, and this is what I’ve come up with.