Five short links

How to not log personally-identifiable information – IP addresses are PII, so removing them from your server logs should be standard practice unless you have a specific need for them.
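One common way to act on this is to truncate addresses before they reach the log: zero the last octet of an IPv4 address, and everything after the first 48 bits of an IPv6 address. This Python sketch is an assumption on my part, not necessarily what the linked article recommends. The prefix lengths and the helper names `anonymize_ip` and `scrub_log_line` are illustrative choices. The sketch also assumes the client address is the first field of each line, as in Common Log Format. Truncation makes an address less identifying but doesn't make it anonymous, so dropping the field entirely is still the stricter option.

```python
import ipaddress


def anonymize_ip(addr: str) -> str:
    """Zero the host bits: keep a /24 for IPv4 and a /48 for IPv6 (illustrative choices)."""
    ip = ipaddress.ip_address(addr)  # raises ValueError if addr isn't an IP address
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def scrub_log_line(line: str) -> str:
    """Anonymize the client address, assumed to be the first space-separated field."""
    first, sep, rest = line.partition(" ")
    try:
        return anonymize_ip(first) + sep + rest
    except ValueError:
        # First field isn't an IP address (e.g. '-' or a hostname); leave the line untouched.
        return line
```

For example, `scrub_log_line('198.51.100.7 - - "GET / HTTP/1.0" 200')` returns `'198.51.100.0 - - "GET / HTTP/1.0" 200'`. The idea is to run every line through it before it's written, rather than scrubbing old logs after the fact.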

Inside Google's MapReduce infrastructure – Bloody hell, they're processing one exabyte of data a month! I didn't even know the term for 1,000 petabytes (a million terabytes) before. That's an astonishing number.
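For a sense of scale, here is the conversion using decimal (SI) prefixes and assuming a 30-day month:

```latex
\begin{aligned}
1\,\text{EB} &= 10^{18}\,\text{bytes} = 10^{3}\,\text{PB} = 10^{6}\,\text{TB} \\
\frac{1\,\text{EB}}{30\,\text{days}} &\approx 3.3 \times 10^{16}\,\text{bytes/day} \approx 33\,\text{PB/day} \\
\frac{10^{18}\,\text{bytes}}{30 \times 86{,}400\,\text{s}} &\approx 3.9 \times 10^{11}\,\text{bytes/s} \approx 386\,\text{GB/s}
\end{aligned}
```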

Netflix cloud storage – A white paper on Netflix's use of SimpleDB. I have to admit I've given up on SimpleDB as a solution myself, since the obstacles to loading large amounts of data overwhelmed me, but it's great to see they've had success with it.

Feedera – An intriguing take on 'personalized PageRank' for surfacing interesting articles from Twitter. I had a great geek-out last night with its creator, Sachin Rekhi, too.
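I don't know the details of how Feedera does it, so what follows is only a textbook sketch of personalized PageRank in Python, not Feedera's algorithm. It's ordinary PageRank power iteration, with one change. When the random surfer "teleports," it jumps back to a user-specific distribution (the `personalization` weights) rather than to every node uniformly. As a hypothetical mapping, nodes could be Twitter accounts, edges could be follows, and the teleport weight could sit on accounts a given user already trusts. That mapping, the 0.85 damping factor, and the function name are all assumptions for illustration.

```python
def personalized_pagerank(graph, personalization, alpha=0.85, tol=1e-10, max_iter=100):
    """Power iteration for personalized PageRank.

    graph: dict mapping node -> list of nodes it links to (hypothetical: who follows whom).
    personalization: dict mapping node -> non-negative teleport weight (normalized here).
    alpha: probability of following a link instead of teleporting.
    """
    nodes = set(graph) | set(personalization)
    for targets in graph.values():
        nodes.update(targets)

    total = sum(personalization.values())
    if total <= 0:
        raise ValueError("personalization weights must have a positive sum")
    teleport = {n: personalization.get(n, 0.0) / total for n in nodes}

    rank = dict(teleport)  # start the walk at the teleport distribution
    for _ in range(max_iter):
        new_rank = {n: (1 - alpha) * teleport[n] for n in nodes}
        dangling_mass = 0.0
        for node in nodes:
            out = graph.get(node, [])
            if out:
                # Spread this node's rank evenly across its outgoing links.
                share = alpha * rank[node] / len(out)
                for target in out:
                    new_rank[target] += share
            else:
                dangling_mass += alpha * rank[node]
        # Nodes with no outlinks send their mass back along the teleport distribution,
        # so the ranks keep summing to 1.
        for n in nodes:
            new_rank[n] += dangling_mass * teleport[n]
        diff = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if diff < tol:
            break
    return rank
```

Moving the teleport weight changes who ranks highly. On a three-node cycle `a -> b -> c -> a`, putting all the weight on `a` ranks `a` first, while putting it on `c` ranks `c` first. The link structure is the same in both cases, which is exactly the "personalized" part.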

Tealeaf – Remember that data entry field that cost Expedia $12m in lost sales? Julian Green had lots of similar tales to tell from his eBay experience, and apparently Tealeaf is a great tool for analyzing and diagnosing that sort of customer behavior.

