Data Patterns – A pithy, useful and opinionated (in a good way) collection of advice and techniques for dealing with common data problems, from parsing HTML and threading scrapers to the joy of CSV for data storage. It's early days and there's lots more to be filled out, but what's there is great.
The Guild of Silicon Valley – This article makes me want to grow a chin beard. One funny thing about the 'new wave' of data technologies like Hadoop, Lucene and Cassandra is that they're written in Java, a language most startup web developers avoid like the plague. The painful thing about Java and C++ is that they force you to think hard up front about what you're building before you dive in. The insight of agile programming is that for smaller projects that's a waste, but these projects show you still need that up-front thinking for industrial-grade frameworks. Or maybe it's just that Doug Cutting's a force of nature and Java happens to be his favorite language, since he's responsible for two of the three projects above?
WeoGeo – The interface is mind-boggling, but if you persevere, there's a rich set of free and commercial geographic data sets available. I discovered a compendium of cell tower locations from the FCC that I was unaware of, amongst other goodies.
Scaling Up Machine Learning – Solid advice from people who've obviously been fighting in the trenches.
Xeround – I'm tired of spending my time dealing with database housekeeping for uninteresting transactional data problems, so I love the idea of a relational database that just works, a turnkey service that I don't have to set up but that can still scale. I haven't used it or similar services like ScaleDB, so I'm sure there are caveats, but it's a problem that needs solving. Today it feels like I have to build my own power plant just to get electricity. I'd much rather pay somebody else to deal with a lot of the solved database issues so I can focus on the more interesting problems.