As we share more and more data about our lives, there's a lot of discussion about what organizations should be allowed to do with this information. The longer I've spent in this world, the more I think that this might be a pointless debate. Controlling what happens with our data requires punishing people who are caught misusing it. The trouble is, how do you tell where they got it from?
If an organization has your name, friends and interests, here are just a few of the places that information could have come from:
– Your Facebook account, via the API or hacking.
– Your email inbox, through a browser extension or hacking, gathering a list of the people you email and your purchase confirmations.
– Your phone company, analyzing the calls you get and the URLs you navigate to on your smart phone.
– Your credit card company. Friends would be harder to infer, though in theory split checks and simultaneous purchases should be strong clues.
– Retailers sharing data with each other about their customers.
There are now so many ways of gathering facts about your life that it's usually impossible to tell where a particular set of data came from. You can inject fake Mountweazel values (fictitious entries planted to expose copying) into databases to catch unskilled abusers, but as soon as there are multiple independent sources for any given fact, an abuser can avoid them by including only values that appear in more than one source. If I know your postal address, can you prove that I hacked into the DMV, rather than just getting it from a phone book or one of your friends?
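To make the evasion concrete, here is a minimal Python sketch of the "only keep values present in more than one source" trick. Everything in it is a hypothetical toy: the source names (`dmv`, `phone_book`, `friends`), the helper functions `launder` and `caught`, and all the people and addresses are made up for illustration. It assumes each source maps a person to a single postal address, and that the DMV has planted one fictitious person as its Mountweazel.

```python
from collections import Counter

# Hypothetical toy sources: each maps a person to a postal address.
# All names and addresses are made up for illustration.
dmv = {"alice": "12 Oak St", "bob": "9 Elm Ave", "zed": "1 Nowhere Ln"}
phone_book = {"alice": "12 Oak St", "bob": "9 Elm Ave"}
friends = {"alice": "12 Oak St"}

# The DMV's planted Mountweazel: a fictitious person that exists nowhere else.
TRAPS = {("zed", "1 Nowhere Ln")}


def launder(sources, min_sources=2):
    """Return the (person, value) facts found in at least `min_sources` sources."""
    counts = Counter()
    for source in sources:
        # A dict holds one value per person, so each source counts a given
        # (person, value) fact at most once.
        counts.update(source.items())
    return {fact for fact, n in counts.items() if n >= min_sources}


def caught(dataset, traps):
    """True if any planted trap value shows up in `dataset`."""
    return any(fact in dataset for fact in traps)


if __name__ == "__main__":
    copied = set(dmv.items())
    clean = launder([dmv, phone_book, friends])
    print(caught(copied, TRAPS))  # True: a straight copy carries the trap
    print(caught(clean, TRAPS))   # False: the trap appeared in only one source
    print(sorted(clean))          # [('alice', '12 Oak St'), ('bob', '9 Elm Ave')]
```

The cost to the launderer is that any genuine fact known to only one source gets thrown away too, which is why the trick only becomes cheap once many independent copies of the same data exist.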
In practice this means that creepy marketing data gathered by underhand means can easily be laundered into openly sold data sets, since nobody can prove it has murky origins. This has always been theoretically possible, but what has changed is that there are now so many copies of our personal data floating around that it's far easier to gather and harder to trace. From a technical point of view, I don't see how we can stop it as long as we continue to instrument our lives more and more.
I'm actually very excited by the new world of data we're moving into, but I'm worried that we're giving people false assurances about how much control they can keep over their information. On the other hand, the offline marketing world has gathered detailed data on all of us for decades without raising much public outrage, so maybe we don't really care?