The weather is back to its (un)usual antics, and so am I, to a degree – a sizable portion of the week was spent attending the Data Storm Big Data Summer School at my alma mater.
One of the nice things about Big Data (provided you can cut through all of the hype and get down to brass tacks) is that industry and academia are mostly on par on a number of topics, so (save for a particular session where I had to bite my tongue and avoid mentioning that doing I/O-bound database benchmarks on virtualized hardware is anything but conclusive) the whole thing provided a sizable amount of food for thought.
And some evening entertainment as well: even though I had to skip some of the lab sessions due to work commitments, I still managed to do the assignments remotely on college systems and my dinky toy cluster, since it turns out that Spark is actually usable on extremely low-end hardware.
I’d had a brief encounter with it back when it was at 0.8 or so and flagged it as “check again when it reaches 1.0”, and although it isn’t a silver bullet, it nicely sidesteps a lot of my gripes with Hadoop – and Spark SQL looks promising, too, even though I still think SQL isn’t the right way to tackle complex queries in this day and age.
But it seems to be a popular enough abstraction to deal with all kinds of data, even event streams – and speaking of that, Esper was also on the menu, and harder to get going with (partly due to its nature and my not having useful event streams handy at home). I’d glanced at it three years ago or so, but never really had the time to try it out, so that was another net gain (and another nice addition to my tool kit).