Open Data Handbook Launched – I love what the Open Knowledge Foundation are doing with their manuals. Documentation is hard and unglamorous work, but has an amazing impact. I'm looking forward to their upcoming title on data journalism.
My first poofer Workshop – This one's already gone, but I'm hoping there will be another soon. I can't think of a better way to spend an afternoon than learning to build your very own ornamental flamethrower.
Using photo networks to reveal your home town – Very few people understand how the sheer volume of data that we're producing makes it possible to draw scarily accurate conclusions from seemingly sparse fragments of information. When you look at a single piece in isolation it looks harmless, but pull enough together and the result becomes very revealing.
Introducing SenseiDB – Another intriguing open-source data project from LinkedIn. There's a strong focus on the bulk loading process, which in my experience is the hardest part to engineer. Reading the documentation leaves me wanting more information on their internal DataBus protocol; I bet it includes some interesting tricks.
IPUMS and NHGIS – As someone who recently spent far too long trying to match the BLS's proprietary codes for counties with the US Census's FIPS standard, I know how painful the process of making statistics usable can be. There's a world of difference between file dumps in obscure formats with incompatible time periods and units, and a clean set that you can perform calculations on. I was excited to discover the work being done at the University of Minnesota to create unified data sets that cover a long period of time, and much of the world.
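
To give a flavor of the code-matching chore described above, here is a minimal Python sketch. It relies on one fact about the standard: a county FIPS code is five digits, a two-digit state code followed by a three-digit county code. The function names and the sample values are hypothetical, made up for illustration, and real BLS files would need their own parsing step first.

```python
def county_fips(state_code, county_code):
    """Build a 5-digit county FIPS string: 2-digit state + 3-digit county.

    Accepts ints or strings, so 6 and "06" both normalize to "06".
    """
    return f"{int(state_code):02d}{int(county_code):03d}"


def join_on_fips(left, right):
    """Inner-join two {fips: value} dicts into {fips: (left_value, right_value)}."""
    return {key: (left[key], right[key]) for key in left.keys() & right.keys()}


# Placeholder values only, not real statistics.
table_a = {county_fips(6, 75): 1.0}        # source that stores codes as numbers
table_b = {county_fips("06", "075"): 100}  # source that stores zero-padded strings
merged = join_on_fips(table_a, table_b)
print(merged)
```

Storing the codes as zero-padded strings rather than integers keeps leading zeros intact, which is a common reason two files that describe the same county fail to match.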