Distributed MapReduce in Python

I had quite a bit of fun processing some logs with this during the week, spreading the load across several machines running PyPy. It’s a trifle IO-bound, and it requires some clever arrangement of your data source as a generator (preferably one that hands out a chunk of data on each iteration). But it automagically ships your data and your map and reduce functions across the network to workers that can join and leave the cluster dynamically, so it takes next to zero effort to set up and is trivial to modify. Not too shabby for a mere 13K of nearly magical code.
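To make the “data source as a chunked generator” advice concrete, here is a minimal single-machine sketch of the same map/reduce shape. It is not the linked library’s API. A generator yields chunks of log lines, a map function runs on each chunk, and a reduce function merges the partial results. The log format (`<ip> <path> <status>`) and every name here (`chunked`, `map_chunk`, `reduce_counts`, `run`) are hypothetical, and a local `ThreadPoolExecutor` stands in for the networked workers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator


def chunked(lines: Iterable[str], size: int) -> Iterator[list[str]]:
    """Data source: yield lists of up to `size` lines, one chunk per iteration."""
    it = iter(lines)
    while chunk := list(islice(it, size)):
        yield chunk


def map_chunk(chunk: list[str]) -> Counter:
    """Map step (hypothetical log format): count the status code in the last field."""
    counts = Counter()
    for line in chunk:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[-1]] += 1
    return counts


def reduce_counts(partials: Iterable[Counter]) -> Counter:
    """Reduce step: merge the per-chunk counters into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def run(lines: Iterable[str], chunk_size: int = 2, workers: int = 4) -> Counter:
    """Fan chunks out to a local thread pool (stand-in for remote workers), then reduce."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map collects the input iterable up front, so this sketch
        # does not stream; results come back in chunk order.
        partials = pool.map(map_chunk, chunked(lines, chunk_size))
        return reduce_counts(partials)


if __name__ == "__main__":
    log = [
        "10.0.0.1 /index.html 200",
        "10.0.0.2 /missing 404",
        "10.0.0.1 /about 200",
        "10.0.0.3 /index.html 200",
        "10.0.0.2 /admin 403",
    ]
    print(run(log))
```

Yielding chunks rather than single lines keeps the per-task overhead low relative to the work done, which is presumably why a chunked generator is the preferred shape for the data source.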

Tao of Mac