Yesterday I had lunch with Stan James of Lijit fame, and it was a blast. One of the topics that’s fascinated both of us is breaking down the walls that companies put up around your data. In the 90’s it was undocumented file formats and this decade it’s EULAs on web services like Facebook. The intent is to keep your data locked in to a service, so that you’ll remain a customer, but what’s interesting is that they don’t have any legal way of enforcing exactly that. Instead they forbid processing the data with automated scripts and giving out your account information to third-party services. It’s pretty simple to detect when somebody’s using a robot to walk your site, and so this is easy to enforce.
The approach I took with Google Hot Keys was to rely on users themselves to visit sites and view pages. I was then able to analyze and extract semantic information on the client side, as a post processing step using a browser extension. It would be pretty straightforward to do the same thing on Facebook, sucking down your friends information every time you visited their profile. I Am Not A Lawyer, but this sort of approach is both impossible to detect from the server side and seems hard to EULA out of existence. You’re inherently running an automated script on the pages you receive just to display them, unless you only read the raw HTTP/HTML responses.
So why isn’t this approach more popular? One thing both me and Stan agreed on is that getting browser plugins distributed is really, really hard. Some days the majority of Google’s site ads seem to be for their very useful toolbar, but based on my experience only a tiny fraction of users have it installed. If Google’s marketing machine can’t persuade people to install client software, it’s obvious you need a very compelling proposition before you can get a lot of uptake.