For a while now I've been visiting companies and doing a 'brown bag' lunch, where I gather a bunch of engineers and database people, and walk them through writing their own simple MapReduce jobs in Python. Ever since I discovered how straightforward the MapReduce approach actually was behind the intimidating jargon, I've been on a mission to spread the word. You can write a useful MapReduce job using just a couple of simple Python scripts, run it from the Unix command line, and then take the same scripts and run them as Hadoop streaming jobs. A few months ago I got together with the O'Reilly team and filmed an extended version of one of those training sessions, which I'm hoping will help my message reach a wider audience.
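As a rough illustration of the "couple of simple Python scripts" idea, here is a minimal word-count sketch. It is not taken from the course material, and the names `map_lines`, `reduce_lines`, and `run_local` are hypothetical. It assumes the usual streaming convention of tab-separated key/value lines, with a sort step between the mapper and the reducer so that equal keys arrive together.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts per word. Input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines):
    """In-memory stand-in for a `mapper | sort | reducer` shell pipeline."""
    return list(reduce_lines(sorted(map_lines(lines))))


# Small demo on in-memory data (an assumption for illustration). As two
# separate scripts, each function would read lines from sys.stdin and
# print its results to stdout instead.
for result in run_local(["the cat sat", "the dog sat"]):
    print(result)
```

Split into a mapper script and a reducer script that read standard input and write standard output, the same logic can be piped together on the command line with a `sort` in between, which is the step Hadoop performs between the map and reduce phases of a streaming job.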
I used to think that MapReduce was an esoteric, academic approach to data processing that was too much trouble to learn. Once I wrapped my head around it, I realized how simple and useful it actually is, so my goal is to help other people over that same hump so they can start using it in their daily work. The main link to the course is at:
It's $20 for the full two-hour video, but check out the free preview to get a flavor before you buy. A big thanks to the students who volunteered their day. It turned out to be a long recording session, thanks to some technical issues in the second half, but they were all wonderfully patient and fantastic collaborators.