Bad Data Handbook: Quite Good

Preparing for an upcoming talk, I thought I’d read this book:

[Image: Bad Data Handbook cover]

(You can find this book online for free – is that legit?)

I like this book quite a lot. It’s a collection of chapters by different authors, and reads something like a series of excellent blog posts. With the exception of chapter 18, it’s quite good. It covers a lot of the issues that arise in practice when gathering and starting to work with data. The explanation of text encoding in chapter 4 could be the best I’ve seen, and chapter 14 (“myths of cloud computing”) is something I wish a lot of people who present themselves as “cloud experts” would read and understand. Philipp K. Janert, author of Data Analysis with Open Source Tools, contributes a very nice chapter as well.

The book closes with a “framework” for data quality, with these “four Cs”:

  • Complete
  • Coherent
  • Correct
  • aCcountable

It’s not bad, this book. I’d recommend it to anyone who needs to work with data in the real world. I think there’s room for even more theory and practice of data cleaning; I’d like to see an even better book yet!
