Use counts, not percentages

Consider this data:

total    part   percent
  765      47        6%

Clearly, there is some redundancy. Given the total, part and percent both express the same thing: 47 out of 765 is about 6%.

With infinite precision, you could use either part or percent at your pleasure. However, in the common case where the counts (total and part) are integers and the percentages are not, computers will generally store the integers exactly and more compactly than the non-integer values (floats or string representations).
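
To make the storage point concrete, here is a minimal sketch using Python's standard `struct` and `fractions` modules. The 2-byte integer width and the 8-byte double are assumptions chosen for illustration; a real file format or database may use different widths.

```python
import struct
from fractions import Fraction

total, part = 765, 47

# Assumption: the count fits in a 2-byte unsigned integer, stored exactly.
count_bytes = struct.pack("<H", part)

# The unrounded percentage stored as an 8-byte double.
percent = 100 * part / total
percent_bytes = struct.pack("<d", percent)

# The same percentage stored as text.
percent_text = repr(percent).encode("ascii")

print(len(count_bytes), len(percent_bytes), len(percent_text))

# The double is only an approximation of the true ratio 4700/765.
print(Fraction(percent) == Fraction(100 * part, total))  # False
```

The count round-trips exactly in a quarter of the space of the double, while the double cannot represent 4700/765 exactly at all.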

Percentages also commonly get rounded off, in which case information is lost. In the above example, if the percentage was rounded to the nearest whole number, 6% of 765 could be any part from 43 to 49, and the range can shift or widen depending on what rounding method and precision were used for the calculation.
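
The range is easy to check by brute force. The sketch below uses a hypothetical helper, `counts_matching_percent`, which is not from any library. It assumes the displayed percentage is a whole number produced either by rounding to nearest (the default) or by truncation.

```python
import math

def counts_matching_percent(total, shown_percent, to_whole=round):
    """List every integer part whose percentage of total displays as shown_percent.

    to_whole is the assumed function that turned the exact percentage into
    the displayed whole number (round-to-nearest by default).
    """
    return [
        part
        for part in range(total + 1)
        if to_whole(100 * part / total) == shown_percent
    ]

rounded = counts_matching_percent(765, 6)
truncated = counts_matching_percent(765, 6, to_whole=math.floor)

print(rounded[0], rounded[-1])      # 43 49
print(truncated[0], truncated[-1])  # 46 53
```

Seven different counts all display as 6% under rounding, and eight under truncation, so the original 47 cannot be recovered from the percentage alone.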

The moral of the story is that for data, you should always use counts, not percentages.

2 thoughts on “Use counts, not percentages”

  1. A very similar message to what John Rauser said at his Velocity 2011 talk, “Look At Your Data”:

    He called summary statistics “lossy compression algorithms.” In the case you presented, a percentage is basically a lossy compression algorithm, too.
