Five short links


Photo by Sami Sieranoja

JSON Pointer – For one crazy moment I thought this was an attempt to squeeze C's memory management into JavaScript. It's actually a very useful effort to standardize how we describe parts of JSON structures, much as XPath does for XML.
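
To make the idea concrete, here is a minimal sketch of a pointer resolver in Python. It assumes the syntax as later standardized in RFC 6901 (slash-separated tokens, with `~1` standing for `/` and `~0` for `~`), which may differ in details from the draft linked above. The `resolve_pointer` name and the sample document are my own illustrations, not part of any library.

```python
def resolve_pointer(document, pointer):
    """Return the value inside `document` that `pointer` refers to.

    Hypothetical helper, assuming RFC 6901 syntax.
    """
    # The empty pointer refers to the whole document.
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")

    current = document
    for raw_token in pointer[1:].split("/"):
        # Unescape '~1' first, then '~0', so '~01' decodes to '~1'.
        token = raw_token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            # Arrays are addressed by zero-based decimal index.
            current = current[int(token)]
        else:
            # Objects are addressed by member name.
            current = current[token]
    return current


sample = {"foo": ["bar", "baz"], "a/b": 1, "m~n": 8}

first_foo = resolve_pointer(sample, "/foo/0")  # first element of "foo"
slash_key = resolve_pointer(sample, "/a~1b")   # the member named "a/b"
tilde_key = resolve_pointer(sample, "/m~0n")   # the member named "m~n"
```

A missing key or an out-of-range index simply raises Python's usual `KeyError` or `IndexError` here; a real implementation would probably want friendlier errors.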

Microsoft and Hadoop – Even the Beast of Redmond digs Hadoop these days. Do I need to be a code hipster and find something more obscure to evangelize now?

Pygmalion – I've been knee-deep in Pig and Cassandra internals for the last week, trying to build an approachable analytics solution for a massive, dynamic data set. It has been something of a struggle, thanks to the combination of my unfamiliarity with both Pig and Cassandra, and the scarcity of other users. I've had some fantastic help from the community though, especially from Jeremy Hanna and Brandon Williams, and I recommend checking out Jeremy's library and talks if you're also wandering into this area.

SMS Corpus – The National University of Singapore has made around 60,000 voluntarily collected text messages in English and Chinese available as a research data set. There's precious little like this available for academic researchers, so asking for contributions is an interesting solution to the privacy problem.

Bill Nguyen – I met Bill briefly at the Color offices, and he is startlingly charismatic. This profile includes some thoughtful quotes from Paul Kedrosky and Eric Ries, but the one that rang most true was the old Hollywood saying that "nobody knows anything". I'm lousy at predicting which companies will go on to succeed, and I have my own mental anti-portfolio of fantastic startups I could have got more deeply involved in. The only way to keep my sanity is to work on products I'm proud of, and hope everything else works out.

