One of the most interesting questions to come out of the Facebook debate was about making public data more easily accessible. Everything I was looking at releasing was available through a Google search and through many other commercial companies, so in a simplistic view it was already completely public and releasing it in a convenient form made no difference. However, that doesn't match our intuitive reactions: we are a lot more relaxed when data is theoretically available to anyone but hard to get to than when there's an easy way to access it.
One of my favorite researchers in this area, Arvind Narayanan, recently started a series of articles that try to turn this gut reaction into a usable model. I also spent a very productive lunch with Jud Valeski, Josh Fraser and Jon Fox hashing out the implications of the coming wave of accessibility, so here are a few highlights from that discussion.
Prop 8. Information about donors to political campaigns has always been public, but traditionally required a visit to city hall to dig through piles of paper. Suddenly the donors behind Prop 8 in California found themselves listed on a map anyone could access on the internet. While predictions of violence or boycotts didn't materialize, Scott Eckern ended up resigning from his job once his donation became widely known. I'm pretty certain he wasn't aware that his donation would be public knowledge. It's a clear case where the distribution channel made the information much more powerful.
InfoUSA. Imagine a thought experiment where I downloaded the income, charitable donations, pets and military service information for all 89,000 Boulder residents listed in InfoUSA's marketing database, and put that information up on a public web page. That's obviously pretty freaky, but absolutely anyone with $7,000 to spare can grab exactly the same information! That intuitive reaction is very hard to model. Is it because at the moment someone has to make more of an effort to get that information? Do we actually prefer that our information is for sale, rather than free? Or are we just comfortable with a 'privacy through obscurity' regime?
So what's my conclusion? On the one hand, the web has created so many amazing innovations because it's a fantastic way to make information more available, and initial privacy concerns have faded into the background as people become more used to services. On the other, the jury's still out on how the revolution will end. Is everyone really going to be their own public broadcaster on Twitter, or are we going to retreat into more private forums in the wake of future freakouts? I don't know the answer, but everyone working in this area needs to be thinking about more than the technical aspects of data accessibility.