Five short links

Picture by John Gurche

The Battle of Towton – Archaeological detective work on a mass grave from Britain’s Wars of the Roses. The violence of the deaths is chilling, but I can’t help being impressed by how much information they’re able to infer from the remains. I think data science has a lot to learn from the humanities when it comes to recovering insights from fragmentary, unstructured sources.

Firefighting with data – Lessons learned at the micro-level about what works when you’re building an integrated information system for firefighters. With eight data display devices already in their truck cab, they don’t need a ninth.

Earth Project aims to simulate everything – We’d be much better off finding targeted applications for the new data sets; this simulation idea seems like trying to boil the ocean.

You should blog even if you have no readers – “Consider anything else a side-benefit” is the key phrase. I’ve got hooked on writing because it’s like a workout for my brain.

The fine art of terrible lizards – I never realized ‘Paleo art’ even existed as a term. It may never hang in the Louvre, but it takes me back to the hours I spent entranced by my dinosaur books as a kid, and it’s somehow very heartening to know there’s a whole world of it available now.

How does Apple cure cash cow disease?

Photo by Justine Otto

It's both easy and fun to criticize Microsoft's and Google's failed innovations, but as so often in the tech world, Apple confounds most theories. Michael Dell advised Steve Jobs to close shop in 1997, to avoid wasting more of the shareholders' money. Later, MP3 players looked like a classic loss of focus. The mistakes MS and Google made are only obvious with hindsight, and any simple rule you apply to weed out Kins or Waves would also kill the iPod.

Microsoft's track record on innovation is terrible, and Google's is starting to look troubled, so what's Apple doing differently? The key word from the original article is 'discipline'. There's a legend that when Steve returned, he asked employees who ended up in an elevator with him "What do you do for Apple?". Anyone who couldn't come up with something convincing by the end of the ride was fired. Whether or not that's true, it sums up the company culture.

Apple builds products that people pay for, at 30% margins. If your project can't meet those tough criteria, even as an idea, no resources are wasted on it. There are obviously cases like Safari that appear not to fit, but have you noticed how the OS folks beg and borrow everything they can from the open-source world? That's because they're incredibly small teams of engineers, focused on supporting the money-making divisions.

Apple's different because everything it does is focused on making a near-term profit. Demanding money for products gives a flow of hard information about what's working, something other big tech firms have lost. They make mistakes just like any other company, but there's a tight feedback loop that catches them early, and lets them take baby steps to execute on big visions.

Both Microsoft's and Google's big successes came after years of massive investment with little visible progress, so they seem relaxed about money-losing projects. That leaves them deaf to the grumbles of the market, unslapped by the invisible hand. They don't have to stay grounded in reality. Apple are rigorous about demanding money from their users, and so they pay close attention to what people in the real world actually want. That's how you avoid catching cash cow disease.

Why do you weigh yourself?

Photo by Phillip Fierlinger

I was recently having a discussion with Paul Kedrosky and Deva Hazarika about weight, and it reminded me of my own aversion to scales. I have literally no idea what I weigh; I've avoided scales for over twenty years, and if I ever have to be weighed for a checkup, I don't look at the result. This is because numbers have power. If I start measuring my weight, then even small daily or weekly gains will make me anxious. The stress is counter-productive because it saps my morale and reduces my motivation to stay fit and eat well.

I'm a particularly obsessive person when it comes to goals, which is why I've learned to be careful about which targets I use. Maybe other people find watching their weight helpful, but looking at the people I know well leaves me wondering whether it works for anyone. If you think about it, nobody's end goal is to lose weight. What we actually want is to be healthier, fitter and sexier. There are better measures for all of those than raw mass, whether it's cholesterol levels, running times or just checking yourself in a mirror.

A quote I'll always remember from an undergraduate course is "You start off measuring what you value, and end up valuing what you measure". Any time that we reduce a complex situation down to a single number, it distorts every action we take. We judge results by that metric, so anything it doesn't capture gets neglected.

My sister worked at a call center for years, where they were measured primarily on how many calls an hour they dealt with. She told me her colleagues would hang up on long-winded customers – apparently if you do it while you're talking, people assume it was a glitch. The numbers looked great, despite the harm it was doing the business.

If you watch stock market prices every day, you're likely to sell low and buy high. The fear of seeing the numbers fall gets hard to resist, as does the greed when they've been rising for weeks. What you actually care about is how much money you end up making, but the stock ticker doesn't capture that. It's one reason the average investor's returns are so much lower than average investment returns.

Pay a programmer based on how many lines of code she produces, and you'll get incredibly verbose code with more room for bugs.

Is measuring your weight helping or hindering you from reaching your real goals?

What has San Francisco taught me?

Photo by Mezarc Buslon

Dogs can shop. I was amazed to see people casually wandering the local Safeway (Church and Market) with pet dogs. According to a worker, they're forbidden by law from asking for proof that a dog is a service animal, and customers take full advantage. I'm torn: as a dog lover this is incredibly convenient, but it seems like a really underhand way of achieving it. If I wasn't a canine fan, I'd probably be aghast. I haven't taken Thor shopping yet, but I have been taking advantage of other places' dog friendliness. Shotwell's beer bar in the Mission not only welcomes them, the bar staff even give out treats, so my evening dog walks have a strange tendency to head in that direction.

Clippercard is hard. I've lost around $30 thanks to my inability to understand the nuances of the system. Among other mistakes, I swiped through the BART gate instead of the Muni one at Montgomery St; on a Caltrain return trip my balance went negative after I swiped, so I assumed my card wasn't valid and bought a paper ticket; and I swiped in at a station before realizing there wasn't a train due in time and I'd have to take a taxi. I'm starting to figure it all out now, but I'm still confused by the apparent lack of a way to top up your card at a station. I was excited by the idea of doing it online, until I found out it might take three days for the money to appear on my card. Maybe the autopay system is what I need?

I can parallel park. I never owned a car in the UK, and living in LA and Colorado I was spoilt by having a compact in a sea of parking lots, so I never had to learn how to squeeze my vehicle into a tiny curbside spot. That all changed here: I spent the last few weeks becoming a parallel parking ninja and discovering the secrets of dealing with a car in the city. Almost all streets in my neighborhood limit cars without a residential permit to two hours, but there are a couple of back-streets that are open to all comers. Of course, you still have to move your car weekly for street-cleaning, so it's hardly a painless solution. After a month of navigating the system, I was very relieved to sell my car. It had been such a source of stress, and the only way out looked like paying a few hundred bucks to rent a garage, which made no sense for something I rarely used.

Big umbrellas are obnoxious. I didn't own an umbrella when I moved here, so as I was loading up with furniture at Ikea, I picked one up. Big mistake. You could have camped under it, and while it kept me perfectly dry, it was very embarrassing to jostle through the downtown crowds knocking lesser umbrellas out of the way. I found a nano-umbrella that folds down to pocket size, and now feel much less of a jerk.

French food is good. I'm a bit late to the party on this one, I know. I've never gone out of my way to eat at French restaurants; the few I've been to were long on pretension and short on execution, and very forgettable. A couple of weeks back, an old Apple colleague and I got into a conversation with another diner at a local sushi bar, and it turned out he was a restaurateur himself. My companion is an impressive foodie, so she grilled him on his favorite places. He went into rhapsodies over L'Ardoise, a traditional French bistro that's only a few blocks from my place. Intrigued, I tried it out and found myself in heaven. It's a tiny, galley-style place, the staff are entirely French, the atmosphere and service were very welcoming and the prices are reasonable, but most importantly, the food was a revelation: light but rich soups, salmon that melts under your fork, and coq au vin with a perfect sauce. Even their crème brûlée isn't the stodgy pudding I remember, but a soft, creamy delight. I can't help myself, I keep going back. They have bar seating for when I'm on my own, which gives you a great view of the chefs preparing the meals behind the hatch, and I've never needed a reservation.

Those blue boxes are for mail. It's a small thing, but I'd never lived anywhere without a mailbox in the front yard. For the first few days I left my outgoing mail in my apartment box, and couldn't figure out why they wouldn't take it. Finally I cornered our mail-lady, and she very patiently explained what those big blue boxes on most street corners are for.

Five short links

Picture by Hideji Terada

Locksmiths and perception of value – If you don't understand what somebody's doing, then you judge them on how hard they make their job look. In software engineering I used to think of this as the acrobats vs gymnasts problem. Some programmers put on a massive performance to make the most pedestrian feats look as impressive as possible, like acrobats at the circus. Often the best engineers make the miraculous look easy, like Olympic gymnasts. If you're not careful as a manager, you'll reward the acrobats and neglect the gymnasts.

Glitch art – I've always found the random results of computer errors compelling and thought-provoking on a visual level, ever since I was old enough to wave a magnet near my CRT TV.

The wonder of the universe – I hadn't realized how much the estimate of the number of stars in the universe has grown in the last couple of decades. "200 sextillion stars, a 2 followed by 23 zeroes" is the latest, which has to be good news for Drake's equation, even if almost all of them are out of reach.

Some problems with Mexican mortality – Another great piece from Diego, showing that the victims of the Acteal Massacre were listed as 'accidental deaths' in the government statistics. A good reminder that we often accept numbers as hard facts, when to judge them you always need to know how rigorous the process that produced them was.

SchoolView – The interface is a bit tricky to navigate at first, but once you're in, this is actually a compelling visualization of schools' quality across Colorado. Even more impressively, it's from the public sector, built using Oracle Fusion, according to a page I don't think they realize they're exposing.

Strata Rejects goes legit

Photo by Ronayne

Mad scientists of the world rejoice, Edd and the other Strata folks liked the idea of a Rejects session so much, it's now part of the official pre-conference program. It will now start at 6:15pm at the hotel, as one of the first sessions of the BigDataCamp unconference on the 31st, the day before the full show. The only downside is that we now have a hard time limit, so my original plan of running talks until the audience walked out has been scrapped. We still have some space left in the hour, and there are likely to be one or two slots open due to travel delays, but email me now if you want to avoid being rejected from the rejects! The whole BigDataCamp is going to be a lot of fun, so make sure you sign up for it, and you'll also get a 25% discount on the main event.

The wonderful data team at LinkedIn have even offered to help with liquid refreshments, so I know it's going to be a blast. Bring your lab coats, bubbling flasks and grudges against the uncomprehending world.

Five short links

Picture by Mike Lay

So THIS Is How Bloomberg Gets Earnings Reports Hours Before They’re Publicly Released… – Remember, kids, anything you post to a web server is accessible, even before you link to it. I’m still aghast that Apache’s default behavior is to serve up directory listings for folders with no index, which makes this sort of thing even easier.

Law and the Multiverse – Real lawyers deal with the implications of imaginary superheroes. The beauty of this site is how much depth and rigor the participants bring to the problems.

MapEveryBit – Early-stage but interesting tool for mapping your social network across Twitter and Facebook.

Visualizations to show causality – I like this exploration of graphing techniques to explain cause-and-effect. Animated and interactive graphs can be a lot more than toys or eye-candy.

Lenana – Tales from the front-lines of education in Tanzania. The statistics are powerful (only a third of the kids are in school now, but that’s up from just 12% a few years ago) but I was really affected by the stories of kids like Nawasa. The practical steps they’re taking to solve their problems are heartening, like the Empowered Girls Club they’ve set up.

Bad writing is good


Last night I ended up at Shotwell's with Mike Melanson, and we spent quite a lot of our time talking about journalism. He's a professionally trained reporter with a master's degree, but the sheer pace of blogging at ReadWriteWeb means a lot of that education is not directly applicable. I'm not saying his standards have lowered, but producing twenty articles a week requires a whole different approach to writing than a traditional US newspaper article.

The discussion reminded me of an article defending Stieg Larsson's books against critics complaining that they've crowded out high literature. Laura Miller makes the point that literary books are demanding to read: they require us to put in effort to understand unfamiliar ways of expressing ideas and emotions. That effort is rewarded by the revelations and sense of wonder you can only get from challenging works, but sometimes we don't have the energy left to tackle a tough read. Bad writing is often more enjoyable, because clichés, genre conventions and predictable plots all help a book 'flow' more smoothly. They demand a lot less from the reader. That got me thinking about how I approach producing five or six posts a week.

American journalism is built on the assumption that reporters are providing a public service, and the top priority is communicating important truths to their readership. In turn, the readers are expected to be engaged and curious, willing to put in some effort to understand a complex story. This is a worthy goal, but leads to some painfully dry writing.

In contrast, the only value British newspapers hold sacred is entertainment. Even the serious newspapers go out of their way to avoid boring their readers, and the tabloids are full-blown three-ring circuses of populism, happy to publish blatant lies or fan prejudice in the pursuit of higher circulation numbers. I'm sure that sounds like a nightmare to American reporters, but somehow it works, producing a better-informed readership than the US model.

That background leaves me very comfortable with the blogging approach to news. We still need traditional in-depth newspaper articles, but the popularity of blog-like news sites with off-the-cuff writing styles, liberal use of clichés and a willingness to publish before all the facts are in, shows that there was an unmet demand for digestible stories. I'm not saying we should emulate the dark side of the British tabloids, but we need to understand that journalism is writing for a purpose, and it sometimes requires embracing the tools that bad writers rely on.

Don't expect the public to read you because what you're writing is important; grab them by the throat with every cheap trick at your disposal, from sensational teaser headlines to hyperbole and synthesized conflict within the article. If the story is worth telling, you'll be doing more good than harm by reaching more readers.

Correlation, Causation and Thor’s Raincoat


My dog Thor hates getting wet, but even when there's rain lashing against the windows he still starts off dancing in circles when it's time for his walk. It's only when I pull out his yellow rain jacket that he slumps and stares at me mournfully. He seems convinced that if I just left the jacket off, the rain would go away.

Much as I try to convince him of the error in his logic, he's unmoved, and it's hard to blame him. Humans will happily swallow studies that use the weasel word 'link' to imply that something associated with an outcome is its cause. Does obesity spread through your friendships? No, you just share the same risk factors as your friends.

As the Big Data revolution gives us more and more data to play with, we'll find many more suggestive correlations like these appearing. Our whole mental architecture is about seeing meaningful patterns, even when we're staring at random phenomena like clouds in the sky. How much these mirages matter depends on how we want to apply them:


Sometimes you don't care whether something causes an outcome; you just want some early warning of what that outcome will be. Thor knows it will be wet when he sees the raincoat. The main danger is that the two variables aren't actually dependent in any way, and just happen to have been moving in an apparently synchronized way recently. The more variables you compare, the more likely these sorts of false correlations become, so expect a lot of them with Big Data.
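To make the "more variables, more false correlations" point concrete, here's a minimal Python sketch, my own illustration rather than anything from a study. It generates independent random walks, which by construction share no causal link at all, and counts how many pairs still look strongly correlated. The 50-step length, the 0.8 threshold and the seed are all arbitrary assumptions.

```python
import random
import statistics


def random_walk(length, rng):
    """Cumulative sum of independent +/-1 steps: a trend-like series with no real driver."""
    position, walk = 0, []
    for _ in range(length):
        position += rng.choice((-1, 1))
        walk.append(position)
    return walk


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def count_strong_pairs(num_series, length=50, threshold=0.8, seed=0):
    """Count pairs of independent random walks whose |r| exceeds the threshold."""
    rng = random.Random(seed)
    series = [random_walk(length, rng) for _ in range(num_series)]
    strong = 0
    for i in range(num_series):
        for j in range(i + 1, num_series):
            if abs(pearson(series[i], series[j])) > threshold:
                strong += 1
    return strong


for n in (5, 20, 80):
    pairs = n * (n - 1) // 2
    print(f"{n} series, {pairs} pairs: {count_strong_pairs(n)} with |r| > 0.8")
```

The number of pairs you can compare grows roughly with the square of the number of variables, so the count of impressive-looking but meaningless correlations climbs much faster than the size of your data set.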

If you're going to rely on a correlation to predict outcomes, you need at least a plausible story for the mechanism behind the correlation, and ideally multiple independent data sets that back it up.


If I notice I'm struggling to get Thor out of the front door, then maybe I'll hide the rain jacket until we're in the porch. Thor's resistance means that he can no longer use the raincoat as a reliable signal of rain. The only reason to make a prediction is to take some action, and those actions may destroy the correlation. This is a painfully common problem in economics, and is usually expressed as Goodhart's Law: "once a social or economic indicator or other surrogate measure is made a target for the purpose of conducting social or economic policy, then it will lose the information content that would qualify it to play such a role".

This means that even if you've found a correlation with predictive power, you have to constantly measure its effectiveness, since the very act of relying on it as a guide may degrade its usefulness.
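One way to "constantly measure its effectiveness" is to track how often the signal was right over a recent window and raise a flag when that drops. The sketch below is a hypothetical illustration: the `RollingHitRate` class, the 30-day window, the 0.7 alert level and the made-up alternating weather are all my assumptions. It replays Thor's situation, where the coat reliably signals rain until I start hiding it.

```python
from collections import deque


class RollingHitRate:
    """Tracks how often a binary signal matched the outcome over the last `window` observations."""

    def __init__(self, window=30, alert_below=0.7):
        self.window = window
        self.alert_below = alert_below
        self.hits = deque(maxlen=window)  # oldest results drop off automatically

    def record(self, predicted, actual):
        self.hits.append(predicted == actual)

    @property
    def hit_rate(self):
        return sum(self.hits) / len(self.hits) if self.hits else None

    def degraded(self):
        # Only alert once the window is full, to avoid noisy early readings.
        return len(self.hits) == self.window and self.hit_rate < self.alert_below


monitor = RollingHitRate(window=30, alert_below=0.7)
for day in range(120):
    raining = day % 2 == 0  # made-up weather: rain every other day
    # Before day 60 the coat comes out whenever it rains; afterwards it stays hidden until the porch.
    coat_seen = raining if day < 60 else False
    monitor.record(predicted=coat_seen, actual=raining)
    if day in (59, 119):
        print(day, monitor.hit_rate, monitor.degraded())
# prints "59 1.0 False" then "119 0.5 True"
```

Nothing about the weather changed; only my behavior did, and the monitor is what notices that the old correlation no longer holds.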


I half-expect to get up one morning and discover that Thor's eaten the raincoat, in the hope of bringing back the sun. Once we notice a correlation, it's easy to convince ourselves and others that it's actually causing the outcome. Humans love stories, and stories have their own rules. If X happens, followed by Y, narrative logic requires that X caused Y. That makes it simple to persuade other people that they're seeing cause and effect rather than correlation, without really having to prove it.

The only reliable way I know to figure out whether you really can affect outcomes is by experiment, so before you put time or money behind an attempt, require a prototype. If the guy or girl trying to persuade you to take action can't show a small-scale proof-of-concept, then they might as well be trying to sell you a bridge in Brooklyn. Even if they show compelling results, the Hawthorne Effect may be kicking in, but at least you've got some evidence, however weak.

If you want to be effective in a world awash with data, it pays to be skeptical of correlations, since you'll be seeing a lot more of them over the next few years.