Clustered R-squared Heat-Maps in R

Sometimes you want to quickly see how strongly variables are related to one another (linearly, here). You might be considering factor analysis or something similar, and you’d like to see whether the variables separate neatly into groups. Here’s a function that I wrote for this purpose:

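clusterRsquared <- function(dataframe) {
  # Dissimilarity: 1 minus the squared pairwise (Pearson) correlation
  dissimilarity <- 1 - cor(dataframe)^2
  # Hierarchical clustering of the variables (complete linkage by default)
  clustering <- hclust(as.dist(dissimilarity))
  ord <- clustering$order  # named 'ord' to avoid masking base::order
  # Draw the heat map with no margins, restoring graphics settings on exit
  oldpar <- par(no.readonly = TRUE)
  on.exit(par(oldpar))
  par(mar = c(0, 0, 0, 0))
  image(dissimilarity[ord, rev(ord)], axes = FALSE)
  # Return the R-squared matrix in clustered order
  1 - dissimilarity[ord, ord]
}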

Call it like this, for example:

round(clusterRsquared(mtcars),2)

You’ll get output like this:

     drat   am gear  mpg   wt   hp  cyl disp carb qsec   vs
drat 1.00 0.51 0.49 0.46 0.51 0.20 0.49 0.50 0.01 0.01 0.19
am   0.51 1.00 0.63 0.36 0.48 0.06 0.27 0.35 0.00 0.05 0.03
gear 0.49 0.63 1.00 0.23 0.34 0.02 0.24 0.31 0.08 0.05 0.04
mpg  0.46 0.36 0.23 1.00 0.75 0.60 0.73 0.72 0.30 0.18 0.44
wt   0.51 0.48 0.34 0.75 1.00 0.43 0.61 0.79 0.18 0.03 0.31
hp   0.20 0.06 0.02 0.60 0.43 1.00 0.69 0.63 0.56 0.50 0.52
cyl  0.49 0.27 0.24 0.73 0.61 0.69 1.00 0.81 0.28 0.35 0.66
disp 0.50 0.35 0.31 0.72 0.79 0.63 0.81 1.00 0.16 0.19 0.50
carb 0.01 0.00 0.08 0.30 0.18 0.56 0.28 0.16 1.00 0.43 0.32
qsec 0.01 0.05 0.05 0.18 0.03 0.50 0.35 0.19 0.43 1.00 0.55
vs   0.19 0.03 0.04 0.44 0.31 0.52 0.66 0.50 0.32 0.55 1.00

And a plot like this, which is much easier to start investigating visually:

[Figure: clustered heat map]

Is this necessarily the One True Clustering? No, but it isn’t terribly bad either.
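If you want explicit groups rather than just an ordering, one option is to cut the tree. This is only a sketch: the choice of three groups and the comparison with average linkage are my own assumptions for illustration, not part of the function above.

```r
# Same dissimilarity and clustering as inside clusterRsquared
dissimilarity <- 1 - cor(mtcars)^2
clustering <- hclust(as.dist(dissimilarity))  # default method is "complete"

# Cut the tree into k groups; k = 3 is an arbitrary choice for illustration
groups <- cutree(clustering, k = 3)
split(names(groups), groups)

# A different linkage method may order (and group) the variables differently,
# which is one reason no single clustering is the One True Clustering
clustering_avg <- hclust(as.dist(dissimilarity), method = "average")
colnames(mtcars)[clustering_avg$order]
```

Trying a few values of k and a couple of linkage methods is a cheap way to see how stable the apparent groups are.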

e is the best base

Here’s a fun little problem: What list of positive integers that sum to 25 will give you the biggest product when multiplied together? You can think of it as breaking 25 into pieces (additively) and then multiplying the pieces, and it’s that product that you want to be large. So the list of twenty-five ones is pretty bad, because 1^{25}=1. The list of ten and fifteen is better, because 10 \cdot 15 = 150. The list of five fives is better yet, because 5^5 = 3125. Can you do better?

(Think about it for a minute if you want.)

Okay, the answer is this: 2^2 \cdot 3^7 = 8748. It’s the list of two twos (or one four, it doesn’t matter) and seven threes.

I noticed all these threes and remembered something I once heard in a computer science course: that three is really the optimal base in terms of balancing the number of unique digit symbols against the number of digits needed to store a given number. (Binary, base two, has “too few” symbols, just 0 and 1, which means lots of digits are needed to store a big number, while decimal, base ten, has “too many” symbols, ten of them, but correspondingly uses fairly few digits to represent even largish numbers.) Of course it wasn’t really three that was best, it was e, which is about 2.718. And of course computers use binary anyway, so it was kind of beside the point. But I remembered this statement, which I never saw an explanation for or any other mention of.
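Here is one way to make that half-remembered claim concrete. The cost model is an assumption on my part (the course statement didn't come with one): cost is the number of symbols times the number of digits, with the digit count approximated by log(n) / log(base). The function name radix_cost is made up for this sketch.

```r
# Assumed cost model: (symbols in the base) * (approximate digits needed for n)
radix_cost <- function(base, n) base * log(n) / log(base)

bases <- c(2, exp(1), 3, 10)
costs <- radix_cost(bases, n = 1e6)
names(costs) <- c("2", "e", "3", "10")

# Relative cost; under this model the ranking does not depend on n
costs / min(costs)
```

Under this model e comes out cheapest, with 3 ahead of 2 and both well ahead of 10.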

Now I offer that the same sort of thing is going on here, and that you will be able to correctly answer any question like the above (not just for 25) by using as many threes as possible and then being smart with what’s left over. And it’s because 3 is close to e!
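The “as many threes as possible” rule can be sketched and checked against brute force like this. Both function names are made up for illustration, and “being smart with what’s left over” is spelled out the way I understand it: a leftover 1 is merged with one of the threes to make 2 + 2, and a leftover 2 is kept as it is.

```r
# Rule of thumb: as many threes as possible, then fix up the remainder
threes_rule <- function(n) {
  stopifnot(n >= 2)
  r <- n %% 3
  if (r == 0) return(3^(n %/% 3))
  if (r == 1) return(4 * 3^(n %/% 3 - 1))  # trade 3 + 1 for 2 + 2
  2 * 3^(n %/% 3)                           # r == 2: keep the leftover 2
}

# Check by dynamic programming: best[m] is the largest product over all
# ways of writing m as a sum of positive integers (a single piece is allowed)
best_product <- function(n) {
  best <- numeric(n)
  best[1] <- 1
  for (m in seq_len(n)[-1]) {
    k <- seq_len(m - 1)                 # size of the first piece split off
    best[m] <- max(m, k * best[m - k])  # keep m whole, or split off k
  }
  best[n]
}

threes_rule(25)   # 8748, matching the answer above
all(sapply(2:40, function(n) threes_rule(n) == best_product(n)))
```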

If we relax the rule about whole numbers and agree that every piece should be the same size (convince yourself), the problem becomes this: for whatever starting number N (the 25 from the original problem), find the number of pieces x' that maximizes \left(\frac{N}{x'}\right)^{x'}, or equivalently (with x = \frac{N}{x'} as the size of each piece) find the x that maximizes x^{\left(\frac{N}{x}\right)}. Using the standard method for maximizing (set the derivative to zero), the N quickly drops out and we soon find that x = e. (Thanks Joe!)
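For anyone who wants the steps written out, here is one way the maximization can go. Taking the logarithm first is my choice of route; it is fine because the logarithm is increasing, so x^{N/x} and its log peak at the same x.

```latex
\begin{align*}
f(x) &= x^{N/x}, \qquad x > 0 \\
\ln f(x) &= \frac{N}{x} \ln x \\
\frac{d}{dx} \ln f(x) &= N \cdot \frac{1 - \ln x}{x^{2}} \\
\frac{d}{dx} \ln f(x) = 0 &\iff \ln x = 1 \iff x = e \\
\frac{d^{2}}{dx^{2}} \ln f(x) \Big|_{x = e} &= N \cdot \frac{2 \ln x - 3}{x^{3}} \Big|_{x = e} = -\frac{N}{e^{3}} < 0
\end{align*}
```

The N only appears as a positive factor, which is why it drops out of the condition for the maximum. The maximum value itself is e^{N/e}.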

The way I first investigated this was by choosing 100 and using R to try a bunch of numbers and plot the results. The products get large and I don’t care much about their absolute values, and the curvature changes if you start from a number other than 100, but the graph always looks basically like this:

[Figure: Rplot]
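A minimal sketch of that kind of experiment follows. It is not necessarily the code behind the plot; the grid of piece sizes and the decision to plot the log of the product (to keep the numbers manageable) are my assumptions.

```r
N <- 100
x <- seq(1, 10, by = 0.01)        # candidate piece sizes
log_product <- (N / x) * log(x)   # log of x^(N/x)

plot(x, log_product, type = "l",
     xlab = "size of each piece", ylab = "log of product")
abline(v = exp(1), lty = 2)       # dashed vertical line at x = e

x[which.max(log_product)]         # grid point where the curve peaks
```

Changing N rescales the curve vertically but leaves the location of the peak where it is.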

So now we have both real math (calculus) and computational-experiment demonstrations of yet another way that e is cool! I don’t know if it’s worth adding to this list, but I like it.

Some quotes from If on a winter’s night a traveler by Italo Calvino

You know that the best you can expect is to avoid the worst.

The dimension of time has been shattered; we cannot love or think except in fragments of time each of which goes off along its own trajectory and immediately disappears.

What you would like is the opening of an abstract and absolute space and time in which you could move, following an exact, taut trajectory; but when you seem to be succeeding, you realize that you are motionless, blocked, forced to repeat everything from the beginning.

…it is my relationship with my life, consisting of things never concluded and half erased…

a novel that gives the sense of living through an upheaval that still has no name, has not yet taken shape….

And so if by chance I happen to dwell on some ordinary detail of an ordinary day, … , I can be sure that even in this tiny, insignificant episode there is implicit everything I have experienced, all the past, the multiple pasts I have tried in vain to leave behind me, the lives that in the end are soldered into an overall life, my life, which continues even in this place …

It was all very well for me to say that every time I had landed in a jam I had always extricated myself, from every lucky situation as well as from every disaster.

And as well, I say, this might be the time when I can convince myself that all my pasts are burned and forgotten, as if they had never existed.

“… They have been waiting for me for some time, since I wired from Switzerland that I had managed to persuade that elderly author of thrillers to entrust to me the beginning of the novel he was unable to continue, assuring him that our computers would be capable of completing it easily, programmed as they are to develop all the elements of a text with perfect fidelity to the stylistic and conceptual models of the author.”

… now one can ask of the novel only to stir a depth of buried anguish, …

That in every experience you take for granted a dissatisfaction that can be redeemed only in the sum of all dissatisfactions?

One reads alone, even in another’s presence.

What makes lovemaking and reading resemble each other most is that within both of them times and spaces open, different from measurable time and space.

I would like to be able to write a book that is only incipit, that maintains for its whole duration the potentiality of the beginning, the expectation still not focused on an object.

I expect readers to read in my books something I didn’t know, but I can expect it only from those who expect to read something they didn’t know.

She explained to me that a suitably programmed computer can read a novel in a few minutes and record the list of all the words contained in the text, in order of frequency. “That way I can have an already completed reading at hand,” Lotaria says, “with an incalculable savings of time. What is the reading of a text, in fact, except the recording of certain thematic recurrences, certain insistences of forms and meanings? An electronic reading supplies me with a list of the frequencies, which I have only to glance at to form an idea of the problems the book suggests to my critical study. Naturally, at the highest frequencies the list records countless articles, pronouns, particles, but I don’t pay them any attention. I head straight for the words richest in meaning; they can give me a fairly precise notion of the book.”

Perhaps instead of a book I could write lists of words, in alphabetical order, an avalanche of isolated words which expresses that truth I still do not know, and from which the computer, reversing its program, could construct the book, my book.

… the gay and carefree air with which certain children who grow up amid bitter family dissension defend themselves against their surroundings …

… eyes that, like those of children, look at an eternal present without forgiveness.

… always curious, always insatiable reading that managed to uncover truths hidden in the most barefaced fake, and falsity with no attenuating circumstances in words claiming to be the most truthful.

The book is an accessory aid, or even a pretext.

“Don’t be amazed if you see my eyes wandering. In fact, this is my way of reading, and it is only in this way that reading proves fruitful for me. If a book truly interests me, I cannot follow it for more than a few lines before my mind, having seized on a thought that the text suggests to it, or a feeling, or a question, or an image, goes off on a tangent and springs from thought to thought, from image to image, in an itinerary of reasonings and fantasies that I feel the need to pursue to the end, moving away from the book until I have lost sight of it. The stimulus of reading is indispensable to me, and of meaty reading, even if, of every book, I manage to read no more than a few pages. But those few pages already enclose for me whole universes, which I can never exhaust.”

“Reading is a discontinuous and fragmentary operation. Or, rather, the object of reading is a punctiform and pulviscular material. …”