Five short links

Photo by Metrix X

How to not log personally-identifiable information – IP addresses are PII, so removing them from your server logs should be standard practice unless you have a specific need.
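If your logging happens in application code rather than in the web server's own config, the scrubbing can be done in a filter. Here's a minimal Python sketch, assuming plain IPv4 addresses in the log text. The `scrub_ips` helper and `ScrubIPFilter` class are hypothetical names of my own, not something from the linked article, and IPv6 addresses aren't handled:

```python
import logging
import re

# Matches dotted-quad IPv4 addresses (loose: doesn't check each octet is <= 255)
IPV4_RE = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b")


def scrub_ips(text: str, keep_prefix: bool = False) -> str:
    """Remove IPv4 addresses from a log line.

    With keep_prefix=True the last octet is zeroed instead, which keeps
    coarse network information without pinpointing a single machine.
    """
    if keep_prefix:
        return IPV4_RE.sub(lambda m: f"{m.group(1)}.{m.group(2)}.{m.group(3)}.0", text)
    return IPV4_RE.sub("[ip removed]", text)


class ScrubIPFilter(logging.Filter):
    """Hypothetical logging filter that scrubs IPs before a record is written."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message with its args first, so IPs passed as args are caught too
        record.msg = scrub_ips(record.getMessage())
        record.args = ()
        return True  # never drop the record, just clean it
```

Attach it with `handler.addFilter(ScrubIPFilter())` on each handler that writes to disk, so nothing reaches the log file before it's been cleaned.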

Inside Google's MapReduce infrastructure – Bloody hell, they're processing one exabyte of data a month! I didn't even know the term for a million terabytes (1,000 petabytes) before; that's an astonishing number.
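To get a feel for the scale, here's a quick unit check in Python. It assumes decimal SI prefixes (1 EB = 10^18 bytes); the binary exbibyte (2^60 bytes) is about 15% larger, and the `convert` helper is just an illustration:

```python
# Decimal (SI) byte units, as used for headline storage figures
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18}


def convert(amount, from_unit, to_unit):
    """Convert an amount between decimal byte units."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


print(convert(1, "EB", "TB"))        # 1000000.0, a million terabytes
print(convert(1000, "TB", "PB"))     # 1.0, so 1,000 TB is a petabyte
print(convert(1, "EB", "PB") / 30)   # roughly 33 PB a day over a 30-day month
```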

Netflix cloud storage – A white paper on Netflix's use of SimpleDB. I have to admit I've given up on it as a solution, since the obstacles to loading large amounts of data overwhelmed me, but it's great to see they've had success with it.

Feedera – An intriguing take on 'personalized PageRank' for surfacing interesting articles shared on Twitter. I had a great geek-out last night with its creator, Sachin Rekhi, too.
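For anyone who hasn't met personalized PageRank before, here's a minimal power-iteration sketch in Python. This is the generic textbook technique, not Feedera's actual algorithm, which I don't know the details of. The idea is that the random surfer teleports back to a set of seed nodes (say, your own account) instead of to a uniformly random node, so scores reflect what's close to you in the graph. The graph shape and the `seeds` and `alpha` parameters are assumptions for illustration:

```python
def personalized_pagerank(graph, seeds, alpha=0.85, iterations=100):
    """Personalized PageRank by power iteration.

    graph: dict mapping each node to the list of nodes it links to
           (e.g. accounts it follows, or articles it shares).
    seeds: nodes the random surfer teleports back to.
    alpha: probability of following a link rather than teleporting.
    """
    nodes = set(graph) | set(seeds)
    for targets in graph.values():
        nodes.update(targets)
    seeds = set(seeds)
    # Teleport distribution concentrated on the seeds
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1 - alpha) * teleport[n] for n in nodes}
        for node in nodes:
            out = graph.get(node, [])
            if out:
                # Split this node's rank evenly across its outgoing links
                share = alpha * rank[node] / len(out)
                for target in out:
                    new_rank[target] += share
            else:
                # Dangling node: send its mass back to the seeds
                for s in seeds:
                    new_rank[s] += alpha * rank[node] / len(seeds)
        rank = new_rank
    return rank


follows = {"me": ["alice", "bob"], "alice": ["carol"], "bob": ["carol"],
           "carol": ["me"], "dave": ["carol"]}
scores = personalized_pagerank(follows, seeds={"me"})
```

Here "dave" ends up with a score of zero because nothing reachable from "me" links to him, which is exactly the personalization at work: an ordinary PageRank would still give him a share.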

Tealeaf – Remember that data entry field that cost Expedia $12m in lost sales? Julian Green had lots of similar tales to tell from his eBay experiences, and apparently Tealeaf is a great tool for analyzing and diagnosing that sort of customer behavior.
