Nothing much to report this week, thanks to a bout of the flu which my sinuses decided to support by providing me with a completely free (and quite painful) earache that has made it very hard to sleep at all.
If you’re interested in that, the right way to do things is by using these images, which work just fine – I’ve used them to test a few things without breaking my existing install, and the base image takes up less than 300MB, so there’s no need to worry about filling up your SD card.
On the “Little Big Data” front, here are a few notes on getting Spark to run – assuming you already know how to set it up in standalone cluster mode, it’s completely painless to get it working with the IPython notebook and have your jobs run on remote executors:
```shell
# get the cluster going, so that we can have remote workers
/opt/spark/sbin/start-all.sh

# tell PySpark we intend to use the IPython notebook
# (exported so it actually reaches the pyspark launcher)
export IPYTHON_OPTS="notebook --pylab inline --ip=* --port=8889"

# start PySpark (and the notebook server), pointed at our master
/opt/spark/bin/pyspark --master spark://master:7077
```
…and that’s it – you automatically get a working `SparkContext` as the `sc` global inside your notebooks, so you’re good to go.
Time to grab some more tea (discreetly seasoned with ibuprofen) and see if I can get well quickly enough to be of some use at the office tomorrow.