Twelve steps to running your Ruby code across five billion web pages

Photo by Andrew Ferguson

Common Crawl is one of those projects where I rant and rave about how world-changing it will be, and often all I get in response is a quizzical look. It's an actively updated, programmatically accessible archive of public web pages, with over five billion crawled so far. So what, you say? This …