Bayes’ Rule for Ducks

You look at a thing.


Is it a duck?

Re-phrase: What is the probability that it’s a duck, if it looks like that?

Bayes’ rule says that the probability of it being a duck, if it looks like that, is the same as the probability of any old thing being a duck, times the probability of a duck looking like that, divided by the probability of a thing looking like that.

\displaystyle Pr(duck | looks) = \frac{Pr(duck) \cdot Pr(looks | duck)}{Pr(looks)}

This makes sense:

  • If ducks are mythical beasts, then Pr(duck) (our “prior” on ducks) is very low, and the thing would have to be very duck-like before we’d believe it’s a duck. On the other hand, if we’re at some sort of duck farm, then Pr(duck) is high and anything that looks even a little like a duck is probably a duck.
  • If it’s very likely that a duck would look like that (Pr(looks|duck) is high) then we’re more likely to think it’s a duck. This is the “likelihood” of a duck looking like that thing. In practice it’s based on how the ducks we’ve seen before have looked.
  • The denominator Pr(looks) normalizes things. After all, we’re in some sense portioning out the probabilities of this thing being whatever it could be. If 1% of things look like this, and 1% of things look like this and are ducks, then 100% of things that look like this are ducks. So Pr(looks) is what we’re working with; it’s the denominator.

Here’s an example of a strange world to test this in:


There are ten things. Six of them are ducks. Five of them look like ducks. Four of them both look like ducks and are ducks. One thing looks like a duck but is not a duck. Maybe it’s a fake duck? Two ducks do not look like ducks. Ducks in camouflage. Test the equality of the two sides of Bayes’ rule:

\displaystyle Pr(duck | looks) = \frac{Pr(duck) \cdot Pr(looks | duck)}{Pr(looks)}

\displaystyle \frac{4}{5} = \frac{\frac{6}{10} \cdot \frac{4}{6}}{\frac{5}{10}}

It’s true here, and it’s not hard to show that it must be true, using two ways of expressing the probability of being a duck and looking like a duck. We have both of these:

\displaystyle Pr(duck \cap looks) = Pr(duck|looks) \cdot Pr(looks)

\displaystyle Pr(duck \cap looks) = Pr(looks|duck) \cdot Pr(duck)

Check those with the example as well, if you like. Using the equality, we get:

\displaystyle Pr(duck|looks) \cdot Pr(looks) = Pr(looks|duck) \cdot Pr(duck)

Then dividing by Pr(looks) we have Bayes’ rule, as above.

\displaystyle Pr(duck | looks) = \frac{Pr(duck) \cdot Pr(looks | duck)}{Pr(looks)}

This is not a difficult proof at all, but for many people the result feels very unintuitive. I’ve tried to explain it once before in the context of statistical claims. Of course there’s a wikipedia page and many other resources. I wanted to try to do it with a unifying simple example that makes the equations easy to parse, and this is what I’ve come up with.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s