A Review of Clojure High Performance Programming

Packt Publishing asked me to review Clojure High Performance Programming, and these are my (largely unfettered) notes on it.

Given my time with Clojure and our recent interest in using Cascalog for a number of things Hadoop, squeezing as much performance out of Clojure as possible is of particular interest to me, so I breezed through the introductory chapter and associated definitions and dove into the book with relish.

However, as I progressed through the book, I noticed that a fair number of sections followed a predictable, repeating pattern:

  1. Introduce a concept or language feature and explain its relationship to performance
  2. Tour the relevant bits of Clojure or JVM design that are associated with it
  3. Cap off with a terse, generic example that is supposed to demonstrate how it works

But there were a few problems with this approach, at least as far as I’m concerned:

  • There is fairly little in-depth detail in a lot of the explanations – they’re systematic and cover the breadth of the topics at hand, but they don’t go into specifics or convey usable techniques
  • The examples aren’t very practical or dissected in depth – they formally illustrate the topics, but aren’t explored as much as I’d like and lack applicability

A good example of this is the section on Clojure to Java compilation, where we’re encouraged to go and fetch a decompiler to analyze the resulting bytecode for our own programs. All we get is a cursory look at a single, rudimentary example (little more than an arithmetic operation), with no usable piece of code to dig into and no discussion of how it could (eventually) be made to go faster.

Another is the discussion of immutable data structures, where I expected to find a lot more detail (including, perhaps, code examples and discussion of some techniques) and came away with an O(n) table.
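To make that concrete, here is a minimal sketch of the kind of measured example that would have helped. It is not taken from the book, and the function names (`build-persistent`, `build-transient`) are made up for illustration. It builds the same vector twice, once with plain `conj` and once through a transient, and times both:

```clojure
;; Illustrative sketch only; build-persistent and build-transient are
;; hypothetical names, not code from the book.

(defn build-persistent
  "Builds [0 1 ... n-1] with conj, producing a new persistent vector per step."
  [n]
  (reduce conj [] (range n)))

(defn build-transient
  "Builds the same vector through a transient, which is mutated locally
  and frozen back into a persistent vector at the end."
  [n]
  (persistent! (reduce conj! (transient []) (range n))))

;; Both approaches must yield the same value.
(assert (= (build-persistent 1000) (build-transient 1000)))

;; `time` is a crude stopwatch: it ignores JIT warm-up and GC pauses,
;; so treat the numbers as a rough indication only.
(time (dotimes [_ 50] (build-persistent 100000)))
(time (dotimes [_ 50] (build-transient 100000)))
```

The timings will vary with the machine and JVM, so a dedicated benchmarking library such as Criterium is a better choice for anything beyond a quick check. It is also worth knowing that `into` already uses transients internally where it can, so hand-rolled transient code is often unnecessary.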

The highlights for me were some of the bits on JVM internals (I haven’t really paid attention to internals since 1.3 or so) and the formal background on testing and performance monitoring, but I’m not sure if those would be appealing to a general audience.

In my opinion the book’s approach ought to have been reversed, focusing less on concepts and more on practical examples – things would probably work out much better if challenging examples were put forward, benchmarked and then dissected to reveal their impact on Clojure/JDK internals.

Then you could discuss how those fit into the grand scheme of things, and refer the reader to further sources to boost their theoretical background.

All in all, it reminded me a lot of some of the academic papers I peruse regularly – a gradual buildup of theoretical background and prior art that leads to a set of constraints or a comparison of design choices, and a generalized, watered-down algorithmic example (or result) that is supposed to embody the core postulate of said paper.

That kind of overview and distillation is fine and good when you’re doing a comparative thesis (you’re expected to demonstrate your firm grasp on an entire field), but it falls short for a programming book, where the reader usually expects to be guided along a well-defined path.

But in the end, it all comes down to whether the book is helpful to you – if you want a roadmap for investigation of all the factors associated with JVM performance, the book might suit you just fine (as long as you don’t expect a truly in-depth analysis of performance). After all, it does provide good grounding on the fundamentals of performance measurements, and gives enough pointers for you to research the rest.

As for me, it was a bit of a miss. I don’t think of myself as experienced with Clojure (not by a long shot), but I recently (and, typically, rather obsessively) optimized a (conceptually) simple bit of code.

I ended up doing around ten versions of it using simple threading, chunking, atoms, the works, and came across a good deal of the practical information I could glean from the book entirely by myself. If I had read the book earlier, it might have made a difference, but (and this is my main point) only in terms of conceptual background, since few of the techniques I used are discussed there in any sort of depth. Some (like atoms) are mentioned and their internals are delved into, but their impact isn’t measured, nor are you guided on how best to apply them.
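As a rough illustration of the sort of comparison I mean (this is neither the code I was working on nor anything from the book, and `sum-squares-atom` and `sum-squares-chunked` are made-up names), here are two ways of summing squares in parallel. One has every task update a shared atom, while the other gives each task a chunk to reduce on its own and only combines the partial results at the end:

```clojure
;; Illustrative sketch only; both function names are hypothetical.

(defn sum-squares-atom
  "One pmap task per element, each updating a shared atom.
  Every swap! may retry under contention."
  [xs]
  (let [total (atom 0)]
    (dorun (pmap (fn [x] (swap! total + (* x x))) xs))
    @total))

(defn sum-squares-chunked
  "Splits xs into chunks, reduces each chunk in its own pmap task,
  then adds up the partial sums. No shared mutable state."
  [xs chunk-size]
  (->> (partition-all chunk-size xs)
       (pmap (fn [chunk] (reduce + (map #(* % %) chunk))))
       (reduce +)))

;; Sanity check: both versions agree on a small input.
(assert (= (sum-squares-atom (range 10))
           (sum-squares-chunked (range 10) 3)))

;; `time` is only a rough guide (no JIT warm-up, GC noise included).
(time (sum-squares-atom (range 100000)))
(time (sum-squares-chunked (range 100000) 10000))
```

How the two compare will depend on core count, chunk size and how expensive the per-element work is, which is exactly why measuring matters more than any description of the internals. When run as a standalone script, a call to `(shutdown-agents)` at the very end lets the JVM exit promptly, since `pmap` runs on the same thread pool that futures use.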

And that, I think, is the main point – if you’re looking for a cookbook of usable techniques, this isn’t it. But if you need a suitable frame of reference for thinking about performance in Clojure, it might be a good starting point – and it certainly seems like a solid enough foundation for a second edition to build and expand upon.