How to calculate Gini Coefficient from raw data in Python

The Gini Coefficient is a measure of inequality. It’s well described on its wiki page and also with simpler examples here.

I don’t find the implementation in the R package ineq particularly conversational, and also I was working on a Python project, so I wrote this function to calculate a Gini Coefficient from a list of actual values. It’s just a fun little integration-as-summation. Not bad!

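def gini(list_of_values):
  # Sort ascending so the running total traces the Lorenz curve.
  sorted_list = sorted(list_of_values)
  height, area = 0, 0
  for value in sorted_list:
    height += value
    # One bar at the previous height plus a triangle of half this value.
    area += height - value / 2.
  # Area under the line of equality; the float divisor avoids integer division on Python 2.
  fair_area = height * len(list_of_values) / 2.
  return (fair_area - area) / fair_area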

To me this is fairly readable and maps nicely to the mental picture of adding up the area under the Lorenz curve and then comparing it to the area under the line of equality. It’s just bars and triangles! And I don’t think it’s any less performant than the ineq way of calculating it.
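A quick way to sanity-check the bars-and-triangles picture is to run the function on a few tiny lists where the answer can be worked out by hand. The sketch below repeats the function so that it runs on its own; the three sample lists are made up for illustration.

```python
def gini(list_of_values):
    # Same bars-and-triangles sum as the function in the post.
    sorted_list = sorted(list_of_values)
    height, area = 0, 0
    for value in sorted_list:
        height += value
        area += height - value / 2.
    fair_area = height * len(list_of_values) / 2.
    return (fair_area - area) / fair_area

# Illustrative inputs (not from any real data set).
print(gini([1, 1, 1, 1]))  # everyone equal: 0.0
print(gini([0, 0, 0, 1]))  # one holder of everything: 0.75
print(gini([1, 2, 3, 4]))  # area 15 under the curve vs. fair area 20: 0.25
```

Note that with one person holding everything the result is 0.75 rather than 1: for n values this sum tops out at (n - 1) / n.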

(update: lalala, I think there are some edge cases where the standard way of calculating gini and this way are not in agreement; I’ll look into it if I ever think about this again – feel free to figure it out and leave a comment!)
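It isn't clear which edge cases the update refers to, so the following is only a sketch of how one might go looking. It compares the summation above against the textbook mean-absolute-difference definition (the helper name `gini_mean_abs_diff` is made up for this example) and pokes at one input that certainly breaks: a list whose total is zero. Other candidates worth checking, as guesses rather than findings, are negative values, integer division under Python 2, and implementations that apply a small-sample correction.

```python
import math

def gini(list_of_values):
    # Same summation as the function in the post.
    sorted_list = sorted(list_of_values)
    height, area = 0, 0
    for value in sorted_list:
        height += value
        area += height - value / 2.
    fair_area = height * len(list_of_values) / 2.
    return (fair_area - area) / fair_area

def gini_mean_abs_diff(values):
    # Hypothetical reference helper: sum of |a - b| over all ordered pairs,
    # divided by 2 * n^2 * mean, which equals 2 * n * total.
    n = len(values)
    total = sum(values)
    diff_sum = sum(abs(a - b) for a in values for b in values)
    return diff_sum / (2. * n * total)

def raises_zero_division(values):
    # True if gini() fails with ZeroDivisionError on this input.
    try:
        gini(values)
    except ZeroDivisionError:
        return True
    return False

# Illustrative sample; the two methods should agree up to float rounding.
sample = [3, 1, 4, 1, 5, 9, 2, 6]
print(math.isclose(gini(sample), gini_mean_abs_diff(sample), abs_tol=1e-12))

# An empty list or an all-zero list has no fair area to divide by.
print(raises_zero_division([]))
print(raises_zero_division([0, 0, 0]))
```

On ordinary positive data the two approaches line up, so any disagreement is more likely to come from degenerate inputs or from a correction factor than from the summation itself.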

One thought on “How to calculate Gini Coefficient from raw data in Python”

  1. Thanks for this. I’m playing around with using the Gini coefficient to measure competitiveness in NBA seasons, and this saved me a lot of time trying to program the calculation.
