The Information: a History, a Theory, a Flood

This is a really good book.


James Gleick is excellent. The history is beautifully researched and explained; there is so much content, and it is all fitted together very nicely.

The core topic is information theory, with the formalism of entropy, but perhaps it’s better summarized as the story of human awakening to the idea of what information is and what it means to communicate. It is a new kind of awareness. Maybe the universe is nothing but information! I’m reminded of the time I met Frederick Kantor.
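As a concrete illustration of the entropy formalism (a minimal sketch, not taken from the book), the Python below computes Shannon entropy in bits per symbol from the symbol frequencies of a short string. The function name `shannon_entropy` is made up for this example, and treating each character as an independent symbol is a simplifying assumption.

```python
import math
from collections import Counter


def shannon_entropy(message):
    """Entropy, in bits per symbol, of the empirical symbol distribution.

    Assumes each character is an independent draw (a simplification).
    """
    counts = Counter(message)
    total = len(message)
    # H = sum over symbols of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(shannon_entropy("aaaa"))  # a single repeated symbol carries no surprise
print(shannon_entropy("abab"))  # two equally likely symbols
print(shannon_entropy("abcd"))  # four equally likely symbols
```

The more evenly spread the symbols, the higher the entropy: a message that always says the same thing tells you nothing new.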

I’m not sure if The Information pointed me to it, but I’ll also mention Information Theory, Inference, and Learning Algorithms by David J.C. MacKay. This book can be read in PDF for free. I haven’t gone all the way through it, but it seems to be a good, more advanced reference.

The Information: Highly recommended for all!

Dataclysm: There’s another book


Dataclysm is a nicely made book. In the Coda (p. 239) we learn something of why:

Designing the charts and tables in this book, I relied on the work of the statistician and artist Edward R. Tufte. More than relied on, I tried to copy it.

The book is not unpleasant to read, and it goes quickly. It may be successful as a popularization. I rather wish it had more new interesting results. Perhaps the author agrees with me; often the cheerleading for the potential of data reads like disappointment with the actuality of the results so far.

The author’s voice was occasionally quite insufferable. He describes himself “photobombing before photobombing was a thing” in a picture with Donald Trump and Mikhail Gorbachev, for example. This anecdote is around an eighth of the text in the second chapter; perhaps more. The chapter is about the value of being polarizing, so if he alienated me there it may count as a success.

In conclusion: the OkTrends blog is fun; there’s also a book version now.

Here Comes Everybody

Harlan mentioned this book so I read it.


It came out back in 2008 and was a lot more timely then, I imagine.

There are lots of interesting tidbits in here. It’s largely anecdote-based, and it uses the word “suasion” twice. Here are some quotes:

… large social systems cannot be understood as a simple aggregation of the behaviors of some nonexistent “average” user.

… it’s easier to like people who are odd in the same ways you are odd, but it’s harder to find them.

… trying something is often cheaper than making a formal decision about whether to try it.

… the question “Do the people who like it take care of each other?” turns out to be a better predictor of success than “What’s the business model?”

Shirky also brings up the Bill Joy quote, “No matter who you are, most of the smart people work for someone else.” This made me wonder whether Google agrees, these days.

I like reciprocal altruism a lot: “With reciprocal altruism, favors are exchanged without formal bookkeeping …” (emphasis mine). This is my preferred way of doing things. The problem seems to be the number of people and anonymity online, and so there are systems with formal bookkeeping like eBay’s buyer/seller rating system, or points on StackOverflow. Is this the direction that everything is moving in? If we end up with zero privacy/anonymity online, will that solve the problem of freeloaders and other bad behavior?

Things I hadn’t previously heard of: asmallworld (gross), Dodgeball (people are still doing this stuff). Also Richard Gabriel’s Worse Is Better talk (increasingly it seems LISP people have all the ideas).

Maybe the most interesting bit from the book was this forward-looking claim:

So here’s a hypothesis about the near future, based on little more than a hunch and some tantalizing examples: we’re about to experience a revolution in collective action, and the driver of that revolution will be new legal structures that will support productive collective action.

I don’t know if that has happened, or if it is happening. Shirky pointed out that intellectual property was the main collective product at the time of his writing – things like Linux and Wikipedia, where licenses like the GPL protect the product. The only things I can think of that are beyond software and writing are products that get kickstarted, for example, and I don’t know if that counts. Restricting to financial structures seems unfortunate. But crowd-funding and anonymous currencies like Bitcoin might be the closest thing to steps in this direction, as far as I can see. Meetup was in the book, and doesn’t have any special legal structures for organizations as far as I know. What else am I missing?

Quizz Quotes

I was exploring Google Papers the other day and came across Quizz: Targeted Crowdsourcing with a Billion (Potential) Users by Ipeirotis and Gabrilovich. Downside: occasionally reads like a Google ad. Upside: really interesting results from an experimental Q&A system which is still live. It’s very cool. Here are some quotes with my commentary:

… the strong self-selection of high-quality users to continue contributing, while low-quality users self-select to drop out.

… there is little incentive for unpaid users to continue participating when there is no monetary reward and they are not good at the task.

The goal of the system was not educational, so they celebrate the fact that it isn’t fun if you suck.

These results indicate that users may be more interested in learning about the topic rather than just knowing whether they answered correctly.

The results included that people answer more questions when the interface shows the correct answer as “feedback” rather than just showing “correct” or “incorrect.” This section of experimental results was particularly interesting, including commentary on possible failures of leaderboards.

… as more and more users participate, the achievements of the top users are difficult to match, effectively discouraging users from trying harder.

They did say that a leaderboard including only the last week’s worth of results was more effective.

I’m less interested in the application of this kind of system for crowd-sourcing information, more interested in educational applications, but there is some clear overlap, and cited papers such as The multidimensional wisdom of crowds seem very interesting. Also through Ipeirotis’ blog I found out about Smarterer, which is interesting as well. There’s some sort of spectrum, or multi-dimensional thing going on, with education, crowdsourcing, and evaluation all in the mix.

The authors’ use of information gain and of a Markov Decision Process is also interesting.
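To give a rough feel for what information gain means for a multiple-choice quiz, here is a minimal sketch in Python. It is an illustration under stated assumptions, not necessarily the paper's exact formulation: a user answers an n-choice question correctly with probability q, errors are spread evenly over the wrong choices, and the gain is measured against a uniform guess. The function names are made up for this example.

```python
import math


def answer_entropy(q, n):
    """Entropy (bits) of one answer to an n-choice question.

    Assumes the user is correct with probability q and that mistakes
    are spread evenly over the n - 1 wrong choices.
    """
    h = 0.0
    if q > 0:
        h += q * math.log2(1 / q)
    if q < 1:
        p_each_wrong = (1 - q) / (n - 1)
        h += (1 - q) * math.log2(1 / p_each_wrong)
    return h


def information_gain(q, n):
    """Bits gained per answer, relative to a uniform random guess."""
    return math.log2(n) - answer_entropy(q, n)


for q in (0.25, 0.5, 0.9, 1.0):
    print(q, round(information_gain(q, 4), 3))
```

Under these assumptions a user who guesses at random on four-choice questions contributes nothing, and a user who is always right contributes the full two bits per answer.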

And Another Thing… from the Hitchhiker’s Guide

Somehow I hadn’t known about Eoin Colfer’s addendum to Douglas Adams’ Hitchhiker’s Guide to the Galaxy series until just recently. Maybe I hadn’t heard about it because it wasn’t terribly good. I don’t know a lot about fan fiction, but I imagine on the fan fiction spectrum it was pretty good.


My little sister is reading the James Potter series, which is a fan fiction extension of the Harry Potter universe, naturally. That one has gotten so popular that over a million people have read it, apparently. For both these series, were the originals so mind-shatteringly good that they defy imitation? I think it may be that people (including me) fell in love with the originals for reasons not limited to the isolated merits of the work.

I watched Star Wars as a kid on the floor in my grandparents’ living room. It was warm and comfortable and amazingly good. But I don’t think I like the Star Wars movies because they’re the best films out. And I have a hard time hating the newer Star Wars movies. I feel instead a sort of impossible nostalgia for the pasts that children watching them now might recall years hence.

This is all to say, I was aware that Eoin Colfer was not Douglas Adams, but I enjoyed what he did for him.

Daily Rituals is sort of inspiring

Be regular and orderly in your life so that you may be violent and original in your work.

Somehow I saw that Sam Harris was recommending this book. It looked interesting, so I bought and read it, though much of the material can be read on the blog it started as. It seems to have enjoyed some success now, getting press coverage here and there.

There are a lot of idiosyncrasies described, but the most common thread seems to be one that is rather at odds with the bite-sized micro-sectioning of the book: many, many productive people are productive by focusing on work for stretches of three or more hours at a time, as close to undisturbed and undistracted as possible. Nobody seems to write novels while reading BuzzFeed lists and watching videos on YouTube.

Another interesting tendency is that a lot of people consume a lot of caffeine, and a fairly large number even use amphetamines (it isn’t just Erdős). I’m not sure I’ll start going in for the harder stuff, but it makes me feel better about drinking Red Bull every now and then.

Bad Data Handbook: Quite Good

Preparing for an upcoming talk, I thought I’d read this book:


(You can find this book online for free – is that legit?)

I like this book quite a lot. It’s a collection of chapters by different authors, and reads something like a series of excellent blog posts. With the exception of chapter 18, it’s quite good. It covers a lot of the issues that arise in practice when gathering and starting to work with data. The explanation of text encoding in chapter 4 could be the best I’ve seen, and chapter 14 (“myths of cloud computing”) is something I wish a lot of people who present themselves as “cloud experts” would read and understand. Philipp K. Janert, author of Data Analysis with Open Source Tools, contributes a very nice chapter as well.
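For a taste of the kind of text encoding problem that comes up in practice, here is a small Python sketch (an illustration only, not an example from the book). It encodes a string as UTF-8, decodes the bytes with the wrong codec, and then undoes the damage.

```python
text = "café"

# Encode as UTF-8: the single character "é" becomes two bytes.
raw = text.encode("utf-8")
print(len(text), len(raw))

# Decode those bytes with the wrong codec and each byte turns into
# its own character, which is the classic garbled-text symptom.
garbled = raw.decode("latin-1")
print(garbled)

# If the wrong guess is known, the mistake can be reversed.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == text)
```

The repair only works because Latin-1 maps every byte to some character, so nothing is lost on the way through. With other codecs the round trip can fail or silently drop data.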

The book closes with a “framework” for data quality, with these “four Cs”:

  • Complete
  • Coherent
  • Correct
  • aCcountable

It’s not bad, this book. I’d recommend it to anyone who needs to work with data in the real world. I think there’s room for even more theory and practice of data cleaning; I’d like to see an even better book yet!