The Maximal Information Coefficient (MIC) is a measure of two-variable dependence with some interesting properties. Full information and details on how to calculate MIC using R are available at exploredata.net. Here I calculate the MIC for an example (originally here) that Andrew Gelman posed on his blog.

This is Gelman’s proposed test data, with seed set to 42 for replicability.

```
set.seed(42)
n <- 1000
theta <- runif(n, 0, 2 * pi)
x <- cos(theta)
y <- sin(theta)
plot(x, y)
```

Sure enough this looks like a circle, which is clearly a relationship between `x` and `y`, but a standard correlation coefficient will not indicate this.

```
cor(x, y)
## [1] -0.008817
```

The interface to work with MINE is a little clumsy, but this will do it. (Be sure to have the R package `rJava` installed.)

```
df <- data.frame(x, y)
write.csv(df, "data.csv", row.names = FALSE)
source("MINE.r")
MINE("data.csv", "all.pairs")
```

```
## *********************************************************
## MINE version 1.0.1d
## Copyright 2011 by David Reshef and Yakir Reshef.
##
## This application is licensed under a Creative Commons
## Attribution-NonCommercial-NoDerivs 3.0 Unported License.
## See
## http://creativecommons.org/licenses/by-nc-nd/3.0/ for
## more information.
## *********************************************************
##
##
## input file = data.csv
## analysis style = allpairs
## results file name = 'data.csv,allpairs,cv=0.0,B=n^0.6,Results.csv'
## print status frequency = every 100 variable pairs
## status file name = 'data.csv,allpairs,cv=0.0,B=n^0.6,Status.txt'
## alpha = 0.6
## numClumpsFactor = 15.0
## debug level = 0
## required common values fraction = 0.0
## garbage collection forced every 2147483647 variable pairs
## reading in dataset...
## done.
## Analyzing...
## 1 calculating: "y" vs "x"...
## 1 variable pairs analyzed.
## Sorting results in descending order...
## done. printing results
## Analysis finished. See file "data.csv,allpairs,cv=0.0,B=n^0.6,Results.csv" for output
```

```
results <- read.csv("data.csv,allpairs,cv=0.0,B=n^0.6,Results.csv")
t(results)
```

```
##                        [,1]
## X.var                  "y"
## Y.var                  "x"
## MIC..strength.         "0.6545"
## MIC.p.2..nonlinearity. "0.6544"
## MAS..non.monotonicity. "0.04643"
## MEV..functionality.    "0.3449"
## MCN..complexity.       "5.977"
## Linear.regression..p.  "-0.008817"
```

The result is that the MIC for this case is 0.6545, and the relation is highly non-linear. It isn’t clear that 0.65 is the ‘right’ answer for a circle, but it does provide more indication of a relationship than a correlation coefficient of zero does.

A perfectly circular relationship will have mutual information = infinity and MIC = 1.

Well, you know, that seems reasonable – but the number that pops out of the reference implementation is 0.65. Do you think it’s a problem with the implementation, or with some choice the authors made in the definition of MIC? Another point mentioned on Gelman’s blog is that people might want to see a measure of the relation between two variables as the degree to which you can predict one if you know the other. In the case of a circle, you can predict perfectly (given the circle) but you need two guesses – which is why I said it isn’t clear what the result should really be.

The degree to which you can predict one variable if you know the other is quantified uniquely by mutual information. This is covered well in Cover and Thomas, “Elements of Information Theory”, chapters 2 and 6. There are infinitely many bits of mutual information for the circle relationship because, if you know x exactly, you can predict infinitely many digits of y, though not the sign of y (conceptually it’s like infinity – 1 bits).
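That “everything but the sign” property is easy to check numerically. This is my own small sketch using the same simulated circle as above (it is not from the original discussion): the magnitude of y is recovered from x to machine precision, while x tells us essentially nothing about the sign of y.

```r
set.seed(42)
n <- 1000
theta <- runif(n, 0, 2 * pi)
x <- cos(theta)
y <- sin(theta)

# Knowing x pins down y up to sign: |y| = sqrt(1 - x^2) exactly.
y_hat <- sqrt(pmax(0, 1 - x^2))
max(abs(abs(y) - y_hat))   # numerically ~0

# But x carries (essentially) no information about the sign of y:
cor(x, sign(y))
```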

Personally, I cannot envision a situation in which using MIC would make sense. It’s just a messed up version of mutual information.

Is it relevant that the sign is in some sense the most significant bit, and a reasonable measure of prediction error will be greatly affected by this one bit? By analogy, if I can predict stock market movement perfectly, except I can only get 50/50 odds for the sign, it’s certainly interesting, but really hardly useful at all.

If you could predict stock market variation perfectly up to an unknown sign, you could still make lots of money. You’d just need a complicated investment strategy. Same goes for digits in a number. No single digit is inherently more or less important than any other. Some are just known better than others. Actually, the specific values of a variable are irrelevant for computing mutual information; all that matters is how distinguishable the values are from one another. This is a fundamental aspect of information theory.
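That invariance can be checked directly. Here is a hedged sketch (the helper `mi_discrete` is mine, not part of the original discussion): a plug-in estimate of mutual information for two discrete variables is identical after any one-to-one relabeling of the values, because only the joint distribution of labels matters.

```r
# Plug-in mutual information (in bits) for two discrete vectors.
mi_discrete <- function(a, b) {
  p  <- table(a, b) / length(a)        # empirical joint distribution
  px <- rowSums(p); py <- colSums(p)   # marginals
  nz <- p > 0                          # skip empty cells (0 * log 0 = 0)
  sum(p[nz] * log2(p[nz] / (px[row(p)[nz]] * py[col(p)[nz]])))
}

set.seed(1)
a <- sample(1:4, 5000, replace = TRUE)
b <- (a + sample(0:1, 5000, replace = TRUE)) %% 4   # noisy copy of a

mi_discrete(a, b)
mi_discrete(a^3 + 10, exp(b))   # bijective relabeling: identical MI
```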

I guess the stock market analogy doesn’t go very far! It still seems to me that regardless of information theory, which I think is quite interesting, in practice some digits are more important than others. The important ones tend to be the ones to the left. The sign is often quite important. But I have to defer to your expertise regarding mutual information. How should I calculate the mutual information for the example above?
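One crude way to estimate it (this is my own sketch, not necessarily the estimator the commenter had in mind) is a plug-in estimate on a binned grid. For the noiseless circle the estimate keeps growing as the grid is refined, which is at least consistent with the claim that the true mutual information is infinite:

```r
# Plug-in MI (bits) after cutting each variable into `bins` equal-width bins.
mi_binned <- function(x, y, bins) {
  p  <- table(cut(x, bins), cut(y, bins)) / length(x)
  px <- rowSums(p); py <- colSums(p)
  nz <- p > 0
  sum(p[nz] * log2(p[nz] / (px[row(p)[nz]] * py[col(p)[nz]])))
}

set.seed(42)
theta <- runif(1000, 0, 2 * pi)
x <- cos(theta); y <- sin(theta)

sapply(c(2, 4, 8, 16), function(b) mi_binned(x, y, b))
# The estimate increases with the number of bins; with finite data it is
# eventually limited by sample size rather than by the relationship itself.
```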