# Bayes’ Rule for Ducks

You look at a thing.

Is it a duck?

Re-phrase: What is the probability that it’s a duck, if it looks like that?

Bayes’ rule says that the probability of it being a duck, if it looks like that, is the same as the probability of any old thing being a duck, times the probability of a duck looking like that, divided by the probability of a thing looking like that.

$\displaystyle Pr(duck | looks) = \frac{Pr(duck) \cdot Pr(looks | duck)}{Pr(looks)}$

This makes sense:

• If ducks are mythical beasts, then $Pr(duck)$ (our “prior” on ducks) is very low, and the thing would have to be very duck-like before we’d believe it’s a duck. On the other hand, if we’re at some sort of duck farm, then $Pr(duck)$ is high and anything that looks even a little like a duck is probably a duck.
• If it’s very likely that a duck would look like that ($Pr(looks|duck)$ is high) then we’re more likely to think it’s a duck. This is the “likelihood” of a duck looking like that thing. In practice it’s based on how the ducks we’ve seen before have looked.
• The denominator $Pr(looks)$ normalizes things. After all, we’re in some sense portioning out the probabilities of this thing being whatever it could be. If 1% of things look like this, and 1% of things look like this and are ducks, then 100% of things that look like this are ducks. So $Pr(looks)$ is what we’re working with; it’s the denominator.

Here’s an example of a strange world to test this in:

There are ten things. Six of them are ducks. Five of them look like ducks. Four of them both look like ducks and are ducks. One thing looks like a duck but is not a duck. Maybe it’s a fake duck? Two ducks do not look like ducks. Ducks in camouflage. Test the equality of the two sides of Bayes’ rule:

$\displaystyle Pr(duck | looks) = \frac{Pr(duck) \cdot Pr(looks | duck)}{Pr(looks)}$

$\displaystyle \frac{4}{5} = \frac{\frac{6}{10} \cdot \frac{4}{6}}{\frac{5}{10}}$
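As a worked check of the right-hand side, using only the counts from the ten-thing world above: the numerator collapses to $\frac{4}{10}$, which is the fraction of things that both are ducks and look like ducks, and dividing by $\frac{5}{10}$ leaves $\frac{4}{5}$.

$\displaystyle \frac{\frac{6}{10} \cdot \frac{4}{6}}{\frac{5}{10}} = \frac{\frac{4}{10}}{\frac{5}{10}} = \frac{4}{5}$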

It’s true here, and it’s not hard to show that it must be true, using two ways of expressing the probability of being a duck and looking like a duck. We have both of these:

$\displaystyle Pr(duck \cap looks) = Pr(duck|looks) \cdot Pr(looks)$

$\displaystyle Pr(duck \cap looks) = Pr(looks|duck) \cdot Pr(duck)$

Check those with the example as well, if you like. Using the equality, we get:

$\displaystyle Pr(duck|looks) \cdot Pr(looks) = Pr(looks|duck) \cdot Pr(duck)$

Then dividing by $Pr(looks)$ we have Bayes’ rule, as above.

$\displaystyle Pr(duck | looks) = \frac{Pr(duck) \cdot Pr(looks | duck)}{Pr(looks)}$

This is not a difficult proof at all, but for many people the result feels very unintuitive. I’ve tried to explain it once before in the context of statistical claims. Of course there’s a Wikipedia page and many other resources. I wanted to try to do it with a unifying simple example that makes the equations easy to parse, and this is what I’ve come up with.

# A micro-intro to ggmap

This describes what we did in the break-out session I facilitated for the illustrious Max Richman’s Open Mapping workshop at Open Data Day DC. For more detail, I recommend the original paper on ggmap.

ggmap is an R package that does two main things to make our lives easier:

• It wraps a number of APIs (chiefly the Google Maps API) to conveniently facilitate geocoding and raster map access in R.
• It operates together with ggplot2, another R package, which means all the power and convenience of the Grammar of Graphics is available for maps.

To install ggmap in R:

install.packages("ggmap")


Then you can load the package.

library(ggmap)



One thing that ggmap offers is easy geocoding with the geocode function. Here we get the latitude and longitude of The World Bank:

address <- "1818 H St NW, Washington, DC 20433"
geocode(address)

##      lon  lat
## 1 -77.04 38.9


The ggmap package makes it easy to get quick maps with the qmap function. There are a number of options available from various sources:

# A raster map from Google
qmap("Washington, DC", zoom = 13)


# An artistic map from Stamen
qmap("Washington, DC", zoom = 13, source = "stamen",
maptype = "watercolor")


Since we were at The World Bank, here’s a quick map showing where we were. This shows for the first time how ggplot2 functions (geom_point here) work with ggmap.

bankmap <- qmap(address, zoom = 16, source = "stamen",
                maptype = "toner")
bankmap + geom_point(data = geocode(address),
                     aes(x = lon, y = lat),
                     color = "red",
                     size = 10)


To connect with Max’s demo, we can load in his data about cities in Ghana.

ghana_cities <- read.csv("ghana_city_pop.csv")


We’ll pull in a Google map of Ghana and then put dots for the cities, sized based on estimated 2013 population.

ghanamap <- qmap("Ghana", zoom = 7)
ghanamap + geom_point(data = ghana_cities,
aes(x = longitude, y = latitude,
size = Estimates2013), color = "red") +
theme(legend.position = "none")


Another useful feature to note is the gglocator function, which lets you click on a map and get the latitude and longitude of where you clicked.

gglocator()


This is all the tip of the iceberg. You’ll probably want to know more about ggplot2 if you’re going to make extensive use of ggmap. RMaps is another (and totally different) great way to do maps in R.

This document is also available on RPubs.

# doge coding: much wow

I have recently come across two more or less doge-titled educational resources for coding. This definitely constitutes a trend.

First up is Learn You a Haskell for Great Good!. I’m pretty sure the title includes the exclamation point. It’s a free book about Haskell, of course. (You can also buy it if you want.)

Last up is Learn You The Node.js For Much Win!. Same deal with the exclamatory title. This one is a command-line interactive tutorial about node.js that runs on workshopper. I found out about this after first hearing about a similar thing for git called git-it.

I, for one, would love to see these somehow form the basis for an entire line of amusingly titled “Learn you” books (and so on).

# Set up git/hub on Ubuntu

This is an abbreviated version of the official help (e.g., github’s) that will work on Ubuntu.

Git will want to know who you are. It’ll tell you to do this if you try to commit and haven’t, so just go ahead and do it:

git config --global user.email "you@example.com"
git config --global user.name "Your Name"

To easily work with github, you need to have a key and put the public one into github so it knows you.

ssh-keygen -t rsa -C "you@example.com"
cat ~/.ssh/id_rsa.pub

Copy the output from that last command and then paste it as a new key in the github SSH settings section. You should be good to go!
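A quick sanity check before the first commit can save a confusing error later. The sketch below only reads the global settings made above and reports any that are missing; `check_git_identity` is a hypothetical helper name for illustration, not part of git.

```shell
# Hypothetical helper: report whether the global git identity is configured.
check_git_identity() {
  for key in user.name user.email; do
    # git config exits non-zero when the key is unset, so fall back to empty.
    value=$(git config --global --get "$key" 2>/dev/null) || value=""
    if [ -n "$value" ]; then
      echo "$key = $value"
    else
      echo "$key is not set"
    fi
  done
}

check_git_identity
```

Once the public key has been added on the github side, `ssh -T git@github.com` is the usual way to confirm that the key is accepted.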

# A shared playground on Ubuntu

After setting up a machine, I’d like to set up a bunch of users who can log in and give them a common space in which to do some work. The goal is convenience for demonstration and education.

Assume usernames are in a file called names.txt, one per line. This will create users with those names, put them in the users group, and make their passwords “none”. As root:

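# Read one username per line from names.txt
while read -r line
do
  # Create the account with no password and no GECOS prompts
  adduser --gecos "" --disabled-password "$line"
  # Add the new user to the users group
  adduser "$line" users
  # Set the initial password to "none"
  echo "$line:none" | chpasswd
done < names.txt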
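Since the loop runs as root, it can be worth previewing what it would do first. This is a minimal dry-run sketch: `preview_users` is a hypothetical helper that prints the commands for each name instead of running them, and the sample file with `alice` and `bob` is made up for the demonstration.

```shell
# Hypothetical helper: print the commands the loop would run, without running them.
preview_users() {
  # $1 is the file of usernames, one per line.
  while IFS= read -r name; do
    # Skip blank lines so they do not produce an empty username.
    [ -n "$name" ] || continue
    printf 'adduser --gecos "" --disabled-password %s\n' "$name"
    printf 'adduser %s users\n' "$name"
    printf 'echo %s:none | chpasswd\n' "$name"
  done < "$1"
}

# Try it on a throwaway sample file (made-up names).
sample=$(mktemp)
printf 'alice\nbob\n' > "$sample"
preview_users "$sample"
```

Pointing it at the real names.txt gives a chance to spot stray whitespace or blank lines before any accounts are created.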

Now those users should really log in and change their passwords with passwd. Up next, we make a shared directory that everybody has access to.

mkdir /home/shared
chgrp users /home/shared
chmod g+w /home/shared
chmod g+s /home/shared

That makes the directory, sets the group to users, gives group members write access, and sets the setgid bit so that files created in the directory will have the users group. (The “sticky” bit is a different permission, set with chmod +t.)
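To see what `chmod g+s` does without touching /home, the same steps can be tried in a throwaway directory. This is only a sketch: it uses a temporary directory and your own default group rather than /home/shared and users, so it needs no root access.

```shell
# Repeat the permission steps in a temporary directory (no root needed).
demo=$(mktemp -d)
mkdir "$demo/shared"
chmod g+w "$demo/shared"
chmod g+s "$demo/shared"

# test -g is true when the setgid bit is set on the path.
if [ -g "$demo/shared" ]; then
  echo "setgid is set on $demo/shared"
fi

# Anything created inside takes the directory's group; on Linux,
# new subdirectories also carry the setgid bit forward.
mkdir "$demo/shared/project"
touch "$demo/shared/notes.txt"
ls -ld "$demo/shared" "$demo/shared/project" "$demo/shared/notes.txt"
```

In the `ls -ld` output the setgid bit shows up as an `s` in the group permissions of the directory. The demo directory can be removed afterwards with `rm -rf "$demo"`.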

# Official GeekyBack and “SSH in a Box” Lyrics

You may be familiar with Justin Timberlake’s SexyBack

GeekyBack

I’m bringing geeky back.
Those other newbs don’t know how to hack
If you’re so 1337, why’d you buy a Mac?
Sudo apt-get and I’ll complete your stack.

[Bridge]
Dirty data
You see I’ve tackled
unicode in waves.
I’d keep you open if you’d only save.
It’s just that no one hexed in ASA.

[Repeat 6 times]

I’m bringing geeky back
Them other … don’t know how to hack
Come let me pyparse all up in your stack.
‘Cause your regex sucks I gotta fix it fast.

[Bridge]

I’m bringing hexy back
You subpar hackers watch my grep attack.
If that’s your URL you best urlparse a path
‘Cause your punctuation is all outta whack.

Take ’em to the meetup
[Chorus]

[Chorus]
Come here URL
come for the hack
S-E-D
Scripts in C?

You may also know The Lonely Island’s Dick in a Box

SSH in a Box

Not gonna mail you a cleartext string
Cause creds like that don’t hide anything.
Not gonna force another RSA start

Not gonna default you to homelessville
Cause a password like that is the kind they steal.
Wanna MD5 you to a UPC code
Somethin’ unique, girl.

It’s ssh in a box
ssh in a box, babe
It’s ssh in a box
Ooh, ssh in a box, girl

I’m secure enough to know
You need encryption, and I got just the key,
A two factor key–that’s right–a second plus one.

To all the admins out there with users to protect
It’s easy to do just follow these steps…

Step 1, apt-get on that box
Step 2, open ports on that box
Step 3, ssh to that box
And that’s the way you do it

It’s ssh in a box!

Special thanks to the anonymous poet. Inspired by Travis Hoppe’s excellent PyParsing talk at Data Wranglers DC, Helping data get its sexy back. And, of course, shellinabox.