What is data journalism?

Photo by Ian S

If I were the Lord High Dictator of English Usage, the very first term I'd ban is NoSQL, since all it does is enrage potential sympathizers whilst failing to accurately summarize what the new wave of database technologies has in common. Right after that though, I'd turn my beady eye towards 'data journalism', and give it a good hard stare.

The term actually has a long and illustrious history, if you consider it a synonym for database journalism. The trouble is, every modern journalist uses databases constantly, even if it's just via LexisNexis or Google searches. That makes the original term so broad as to be meaningless.

I'd prefer to reserve the name for some of the interesting and unique work that's been emerging, work that's driven by a lot of the same shifts that have propelled Big Data into prominence. Here are the characteristics that I think true data journalism stories should possess:

The data is a lead protagonist. It's common for trend stories to reach for a few statistics to back up a pre-determined conclusion, but this use of data as a Greek chorus is seldom very enlightening or rigorous, as it lends itself to cherry-picking. I'm much more interested when the journalist treats the data as an interview subject, asking it questions and letting the answers drive the story's conclusion. With the Wikileaks dumps, the data is the lead character in most of the reports, and it's unearthed some unexpected results.

The source material is public. If you quote a named source to back up your story, then anyone who wants to check whether you're distorting them can go back and talk to that person. With data-driven journalism, the only way to keep reporters honest and enable a real debate is to make the original information you base your conclusions on publicly available. Otherwise it's like using unnamed sources: you require a leap of faith from the reader that you're not being misled. The Guardian is a shining example of how easy this can be, but the New York Times consistently refuses to release copies of the original source documents it bases its stories on.

There's real detective work involved. Is reporting on the unemployment rate or stock market data journalism? Almost always it's just repeating a pre-digested number, with a hand-wavey explanation thrown in for good measure – "The stock market was up today because of <random correlation>". What I love instead is when a reporter is clever about finding unusual data sources or powerful tools to uncover new information, often information that was hidden in plain sight. My favorite recent example is Marshall Kirkpatrick's use of Needlebase to uncover information on Twitter's new data center, just by analyzing public Tweets from their employees.

On second thoughts, forget about changing the name. These principles are going to win out because they lead to more interesting and trustworthy stories, no matter what you call the genre. So let's just call it plain old good journalism instead.
