Data, Story Telling and Mass Murders

Photo from the Calvin propaganda collection

I've just finished Bloodlands, a scholarly but powerful overview of the millions of killings committed by both the Soviets and the Nazis before and during World War II. What was especially striking was how little the murders relied on technology. I was more familiar with the western portion of the Holocaust, which required an elaborate system of transportation and bureaucracy to carry out its murders. Most of the eastern victims were simply rounded up where they lived, forced to dig their own graves and then shot.

This is important because the threat model we use for privacy is based on the dangers of giving a state bureaucracy too much information about our lives, to feed into a sophisticated Big Brother operation that constantly monitors us. The chilling examples of both Germany under the Nazis and communism demonstrate how much truth there is to that fear, but I think it also causes us to miss other risks.

Think about the Rwandan genocide. The killings weren't carried out by a sophisticated organization; they were carried out by people killing their neighbors with machetes. What the Nazis, the Soviets and Rwanda all had in common was a sophisticated and effective propaganda campaign. They successfully convinced large numbers of people that unarmed men, women and children were deadly threats who had to be exterminated. Hutu broadcasters branded Tutsis as 'cockroaches', Nazi propaganda claimed Jews poisoned Aryan children, and Stalin claimed the Ukrainians were deliberately withholding food and causing starvation.

Story-telling like this is a vital component of every genocide. Killings on that sort of scale require active effort from hundreds of thousands of people, and passive acquiescence from millions. This requires a massive amount of motivation, and the only way to drive that is through effective propaganda.

What this doesn't explain is why the propaganda succeeded so well in those particular cases but not in others. My theory is that the recent introduction of new communications technologies should take a lot of the blame. The Nazis were able to harness films and radio to spread their message, while the Hutus used radio almost exclusively. What the situations have in common is an elite who discover a way to tell stories in a very powerful way, thanks to a previously untapped medium. They succeed because the audience lacks a healthy scepticism. Stalin could edit people out of photos and be believed because 'Cameras don't lie'.

In this model, a new form of media is like an infection hitting a previously unexposed population. Some people figure out how it can be used to breach the weak spots in the audience's mental 'immune system', and how to persuade people to believe lies that serve the propagator's purpose. Eventually the deviation from reality becomes too obvious, people wise up to the manipulation and a certain level of immunity is propagated throughout the culture.

What does this have to do with our work in the startup and data worlds? I'm passionate about this area because I truly believe it can change the world. The downside is that history shows the power inherent in any new communications tool is often abused by evil people. Why should what we're doing be any different? I've already had some of my visualizations used by 'White Power' groups to argue that the US is being taken over by Mexicans, thanks to a few border counties where Jose pops up as a common first name. In general I'm worried by the lack of scepticism about the truth behind the results of large-scale data analysis and visualization. My Facebook map was a Saturday afternoon effort with a paint program, and the methodology would never survive peer review, but it still ended up getting discussed very seriously in influential publications. As long as a visualization has decent-looking production values, or an analysis claims to use a sufficiently large set of data, most people will take it at face value, no matter how murky its hidden foundations. You might call this the Freakonomics Effect; while their arguments are backed up with real evidence, a lot of people have copied the same form without putting in the hard work needed to be as sure of the conclusions.

What's vital is that we take responsibility for the effects of the tools we're creating. In practical terms that means relentlessly pushing journalists to improve their understanding and scepticism of our results, and taking a stand when we see bogus work being promoted. We need to have standards we expect reputable data scientists to adhere to, and demand enough information and reproducibility before we take data-backed claims seriously. Science polices itself thanks to an informal community structure, and we need to learn lessons from that. If we don't, then our wonderful new tools will just end up being hijacked, and will mislead instead of enlightening.
