What makes a good data API?

Picture by Victoria (Mouse World)

I’ve been working on a guide to data APIs, and making decisions about what to include has forced me to think about exactly what I look for. If you’re going to build an API that’s useful to a wide range of people, and will add value to the whole data ecosystem, here’s what you need.

  • Free, or self-service signup. Traditional commercial data agreements are designed for enterprise companies, so they’re very costly and time-consuming to experiment with. APIs that are either free or have a simple sign-up process make it a lot easier to get started.

  • Broad coverage. There have been quite a few startups that build infrastructure, and hope that users will then populate it with data. Most of the time, this doesn’t happen, so you end up with APIs that look promising on the surface but actually contain very little useful data.

  • Online API or downloadable bulk data. Most of us now develop in the web world, so anything else requires a complex installation process that makes it much harder to try out.

  • Linked to outside entities. There has to be some way to look up information that ties the service’s data to the outside world. For example, the Twitter and Facebook APIs don’t qualify because you can only find users by internal identifiers, whereas LinkedIn does because you can look up accounts by their real-world names and locations.

The first three principles are just about ease of use, but having linkable data is essential if you’re going to allow developers to innovate by combining data sources. Once you’ve got an external reference point, you can join information to come up with insights you’d never expect.
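
To make the linking point concrete, here’s a minimal Python sketch of the kind of join I mean. Everything in it is an assumption for illustration: the two record lists are made-up stand-ins for unrelated services, the numbers are invented, and `normalize_key` and `join_on_place` are hypothetical helper names rather than part of any real API. The only thing the two sources share is a real-world city and country, and that’s enough to combine them.

```python
def normalize_key(city, country):
    """Build a join key from real-world identifiers, ignoring case and stray spaces."""
    return (" ".join(city.split()).casefold(), country.strip().casefold())


def join_on_place(left, right):
    """Inner-join two lists of dicts on their (city, country) fields."""
    # Index the right-hand source by the shared external reference point.
    index = {normalize_key(r["city"], r["country"]): r for r in right}
    joined = []
    for row in left:
        match = index.get(normalize_key(row["city"], row["country"]))
        if match is not None:
            # On overlapping fields, the right-hand record's spelling wins.
            joined.append({**row, **match})
    return joined


# Invented records standing in for two services that know nothing about each other.
bike_accidents = [
    {"city": "Tucson", "country": "US", "accidents": 12},
    {"city": "  london ", "country": "uk", "accidents": 7},
]
populations = [
    {"city": "London", "country": "UK", "population": 100},
    {"city": "Austin", "country": "US", "population": 50},
]

combined = join_on_place(bike_accidents, populations)
for record in combined:
    print(record)
```

If either service only exposed its own internal IDs, there would be nothing to build the key from, and the join would be impossible.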

Five short links

Photo by Chris in Plymouth

The Linked Open Data cloud diagram – I disagree with the Linked Data philosophy: I think top-down, formal semantic approaches are a dead end, and I believe RDF is the Devil’s Own Format. I can’t deny that the array of sources they’ve linked together is impressive though, and it’s beautifully presented here.

Taco Bell Programming – The hacker mentality can be an incredibly powerful tool for compressing days-long tasks into minutes, if you can just look at them from the right angle. Mmmm, Taco Bell….

The Perils of Kinder Surprise – I’m so glad we’re being protected from the dangers of small chocolate eggs with plastic toys inside. I never really liked them growing up in the UK, but it depresses me that here we’re paying border guards to seize an average of 25,000 of them every year. When Kinder Surprises are outlawed, only outlaws will have Kinder Surprises.

The Myth and Truth of the NYC Engineer Shortage – Hiring ‘A players’ doesn’t mean hiring people with the exact skills you need, or even experienced engineers. Hire for smarts and enthusiasm, give your experienced folks time to help them, and within a few months you’ll have productive employees. Even better, they’ll be cheaper, and more loyal than that hot-shot you keep dreaming of. Hire for the right mentality, and everything else will follow.

Elusive Forger, Giving but Never Stealing – My favorite character when I read the Norse myths as a kid was always Loki the Trickster, so I find this story of a non-profit forger slipping his works into museums’ collections delightful.

Thoughts on London

Photo by Ian Brumpton

It's always strange going back to the country I grew up in. I spent my early academic and professional life there in a near-constant state of frustration, so it's hard for me to analyze it rationally. Bearing that in mind, here are some of the impressions I was left with after spending a week back in London.

High Finance. I was there to help out some startup friends, and their biggest problem was that big financial firms could easily outbid any early-stage startup for technical talent. If an experienced developer can get $500,000 a year, it takes a lot to lure them. This might sound great for developers, but only if it's a long-term, sustainable situation. My fear is that the current high levels of financial firm profits won't last, those jobs will vanish, and without a widespread startup culture there will be no good replacements. Felix Salmon did a great article on the problem of finance sucking up all the oxygen, and I think he's spot on. I don't have figures to back this up, but even New York with its massive finance industry feels like it has a lot more diversity to fall back on than London.

Deference. There's a real reluctance to give young punks responsibilities, and a separation between management and engineering. As a 25-year-old with big ideas, the difference between what I was allowed to do in British and American companies was amazing. I went from getting into trouble to being given pats on the back for coding outside of my assigned areas. I was included in management discussions, not kept in the dark. The conservative social system in the UK makes it tough to be flexible like that, and discourages a lot of troublemaking innovators.

Don't look! Keep your eyes focused on the ground five feet in front of you at all times when walking. I hadn't realized how much my habits had changed until I was wandering around London and wondering why everyone was bumping into each other. Do I walk like an American now?

Drip-feeding. European investors are complete wimps. They have the terrible habit of handing out investment in tiny chunks. This forces entrepreneurs to constantly be fund-raising, unable to plan more than a couple of months ahead. I talked to several startups with traction that would earn them millions in VC investment on the west coast, and they're all struggling with this issue. I'm not the only one to have spotted the problem, though Paul has different ideas on the cause.

Raw Potential. Despite all these criticisms, I met so many clever, motivated people and great startups. I'm just a tourist there these days, so my hat goes off to everyone working to make London the tech innovation hub it deserves to be.

Five short links

Photo by Francisco Nogueira

History of the English language – I knew the general outlines already, but there are some fascinating details in here, especially the example of how the Lord’s Prayer would have been written at different times. The 1000 AD sample is unintelligible, but by 1384 it’s hard but readable. It also led me to discover that Illinois had a law on the books until the 1960s that the official language was American, not English. Makes sense to me.

CC San Francisco Salon – This looks like a stellar line-up of data folks for an informal discussion around openness in a data-driven world. I’m disappointed I can’t make it since I’m out of the country, but I’ll be checking out the video record of the event.

DataDay Austin – Texas has a cluster of cutting-edge data companies, and they’ve lined up an impressive day of training and talks. Folks from Infochimps, Google, 80legs and more will be there.

DataSets, Redistributable Data Sets – Delicious is still an essential tool for easily sharing resources, and I’m thankful that Julian and Peter are publishing their finds.

AsciiDoc – Why didn’t somebody tell me about this before? It’s an elegant little tool for taking the plain-text conventions we all use when creating READMEs, and formalizing them into a markup language that can be used to create everything from HTML to PDF and epub documents. I’ve been using Pages or Word to build books, and the boiler-plate formatting work was so time-consuming. This has made my latest project a breeze.

Five short links

Photo by Balloon Shop Enfield

DataSift – UK startup focused on making it easy to build your own tools on top of massive social media streams like the Twitter firehose. Seems a bit like Yahoo Pipes for social data, without the visual interface, and could open up the area to a much wider audience of developers.

The Doctor vs the Computer – A thousand-character limit on descriptions in medical records is so obviously arbitrary and unneeded, it hurts. Websites that have code to complain about spaces in credit-card numbers but somehow can't strip them out are bad enough, but here the bondage-and-discipline over-specification could kill people.

Trouble in the House of Google – Google's had massive success because they realized that inelegant statistical methods of detecting things like spam, plagiarism and relevance work a lot better than more elegant traditional semantic/AI techniques. Unfortunately, the black hats have figured out that there's no statistical technique in the world that can truly rate the quality of a page. Google's relying on statistical measures that used to correlate with that quality, but as the bad guys mimic those more closely, they are tricking the search engine into believing spam is the real thing. We need more inputs, whether that's a return of some kind of manual rating system, data from social networks or click-through rates.

Bike Accidents in Tucson – Exactly the sort of thing I built OpenHeatMap for. Collin Forbes is using it to help influence the debate about policing in his city.

This isn't a post about Facebook – Mourning the rise of a service that's a closed system, instead of the openness of Google. I'm not as pessimistic as Paul; I think that Facebook is demonstrating how much people want tools that reflect their off-line social world and behaviors, and once the open world absorbs that lesson, we'll see a new wave of competition for the social network. That competition will have to be more open in a technical sense, just because that's such a tempting way to get early traction.

Leave a trail of breadcrumbs

Photo by Virelai

Maybe your purpose in life is to serve as an example to others of what not to do? That's a thought that actually cheers me up when I'm feeling down, because at least it adds some meaning to horrible experiences. I was thinking about that when I read about Jud's brush with personal disaster. Anyone searching on BPPV now has a detailed account of what went wrong and how he recovered. That may seem like a small thing, but for a handful of sufferers it will be information that helps them immensely. There's no theoretical limit on how long it could remain useful either – our great-great-grandkids could still be learning from his experiences.

We take economic growth for granted, but did you ever stop and think about what it actually means? Why should the same number of people be able to produce a few percent more for the same amount of effort, year after year, for centuries now? The secret is culture. As one person or organization discovers how to do more with less, that secret gets passed around and remembered collectively by humanity. Productivity is actually a massive series of niche lessons about what works and what doesn't. Our whole world is built on millennia of anecdotes like Jud's.

That's why the internet leaves me with so much hope for the future. Over my lifetime we've created an incredibly powerful way of transmitting our experiences to others who care. Even if there's only a handful of people in the world who might benefit from a particular insight, for very little effort you have a good chance of reaching them and improving their lives.

People ask me if they should blog or tweet, and I tell them it won't make them money, it won't bring them fame, and in terms of concrete returns, it's a waste of time. I still encourage them to do it though, because every true story is worth telling. For years, despite low traffic, I kept going because the search logs told me there were one or two people a day who found a solution to their problem thanks to a post I'd written. If you think about it, that's hundreds of people a year you can help, just by writing down a few of your experiences.

So, when you look at your life in 2011, ask yourself whether you're leaving a trail of breadcrumbs. It might be the most effective way you can make the world a better place.

Five short links

Picture by John Gurche

The Battle of Towton – Archaeological detective work on a mass grave from Britain’s Wars of the Roses. The violence of the deaths is chilling, but I can’t help being impressed by the amount of information they’re able to infer from the remains. I think data science has a lot to learn from the humanities, when it comes to recovering insights from fragmentary, unstructured sources.

Firefighting with data – Lessons learned at the micro-level about what works when you’re building an integrated information system for firefighters. With eight data display devices already in their truck cab, they don’t need a ninth.

Earth Project aims to simulate everything – We’d be much better off finding targeted applications for the new data sets; this simulation idea seems like trying to boil the ocean.

You should blog even if you have no readers – “Consider anything else a side-benefit” is the key phrase. I’ve got hooked on writing because it’s like a workout for my brain.

The fine art of terrible lizards – I never realized ‘Paleo art’ even existed as a term. It may never hang in the Louvre, but it takes me back to the hours I spent entranced by my dinosaur books as a kid, and it’s somehow very heartening to know there’s a whole world of it available now.

How does Apple cure cash cow disease?

Photo by Justine Otto

It's both easy and fun to be a critic of Microsoft and Google's failed innovations, but as so often in the tech world, Apple confounds most theories. Michael Dell advised Steve Jobs to close shop in 1997, to avoid wasting more of the shareholders' money. Later, MP3 players looked like a classic loss of focus. The mistakes MS and Google made are only obvious with hindsight, and any simple rule you apply to weed out Kins or Waves would also kill the iPod.

Microsoft's track record on innovation is terrible, and Google's is starting to look troubled, so what's Apple doing differently? The key word from the original article is 'discipline'. There's a legend that when Steve returned, he asked employees who ended up in an elevator with him "What do you do for Apple?". Anyone who couldn't come up with something convincing by the end of the ride was fired. Whether or not that's true, it sums up the company culture.

Apple builds products that people pay for, at 30% margins. If your project can't meet those tough criteria, even as an idea, no resources are wasted on it. There are obviously cases like Safari that appear not to fit, but do you notice how the OS folks beg and borrow everything they can from the open-source world? That's because they're incredibly small teams of engineers, focused on supporting the money-making divisions.

Apple's different because everything it does is focused on making a near-term profit. Demanding money for products gives a flow of hard information about what's working, something other big tech firms have lost. Apple makes mistakes just like any other company, but there's a tight feedback loop that catches them early, and lets the company take baby steps to execute on big visions.

Both Microsoft's and Google's big successes came after years of massive investments with little visible progress, so they seem relaxed about money-losing projects. That leaves them deaf to the grumbles of the market, unslapped by the invisible hand. They don't have to stay grounded in reality. Apple is rigorous about demanding money from its users, and so it pays close attention to what people in the real world actually want. That's how you avoid catching cash cow disease.

Why do you weigh yourself?

Photo by Phillip Fierlinger

I was recently having a discussion with Paul Kedrosky and Deva Hazarika about weight, and it reminded me of my own aversion to scales. I have literally no idea what I weigh: I've avoided scales for over twenty years, and if I ever have to be weighed for a checkup, I don't look at the result. This is because numbers have power. If I start measuring my weight, then even small daily or weekly gains will make me anxious. The stress is counter-productive because it saps my morale and reduces my motivation to stay fit and eat well.

I'm a particularly obsessive person when it comes to goals, which is why I've learned to be careful what targets I use. Maybe other people find watching their weight helpful, but looking at the people I know well leaves me wondering whether it works for anyone. If you think about it, nobody's end goal is to lose weight. What we actually want is to be healthier, fitter and sexier. There are better measures for all of those than raw mass, whether it's cholesterol levels, running times or just checking yourself in a mirror.

A quote I'll always remember from an undergraduate course is "You start off measuring what you value, and end up valuing what you measure". Any time that we reduce a complex situation down to a single number, it distorts every action we take. We judge results by that metric, so anything it doesn't capture gets neglected.

My sister worked at a call center for years, where they were measured primarily on how many calls an hour they dealt with. She told me her colleagues would hang up on long-winded customers – apparently if you do it while you're talking, people assume it was a glitch. The numbers looked great, despite the harm it was doing the business.

If you watch the stock market prices every day, you'll sell low and buy high. The emotional response to seeing the numbers fall gets hard to resist, as does the greed when they've been rising for weeks. What you actually care about is how much money you end up making, but the stock ticker doesn't capture that. It's why the average investor's returns are so much lower than average investment returns.

If you pay a programmer based on how many lines of code she produces, you'll get incredibly verbose code with more room for bugs.

Is measuring your weight helping or hindering you from reaching your real goals?

What has San Francisco taught me?

Photo by Mezarc Buslon

Dogs can shop. I was amazed to see people casually wandering the local Safeway (Church and Market) with pet dogs. According to a worker, they're forbidden by law from asking for proof that a dog is a service animal, and customers take full advantage. I'm torn – as a dog lover this is incredibly convenient, but it seems like a really underhand way of achieving it. If I wasn't a canine fan, I'd probably be aghast. I haven't taken Thor shopping yet, but I have been taking advantage of other places' dog friendliness. Shotwell's beer bar in the Mission not only welcomes them, the bar staff even give out treats, so my evening dog walks have a strange tendency to head in that direction.

Clippercard is hard. I've lost around $30 thanks to my inability to understand the nuances of the system. Among other mistakes, I swiped through the BART gate instead of the Muni one at Montgomery St; bought a paper ticket on a Caltrain return trip because I thought my card wasn't valid after a swipe took it into a negative balance; and swiped at a station before I realized there wasn't a train due in time and I'd have to take a taxi. I'm starting to figure it all out now, but I'm still confused by the apparent lack of a way to top up your card at a station. I was excited by the idea of doing it online, until I found out it might take three days for the money to appear on my card. Maybe the autopay system is what I need?

I can parallel park. I never owned a car in the UK, and living in LA and Colorado I was spoilt by having a compact in a sea of parking lots, so I didn't have to learn how to squeeze my vehicle into a tiny curbside spot. That all changed here. I spent the last few weeks becoming a parallel parking ninja, and discovering the secrets of dealing with a car in the city. Almost all streets in my neighborhood are residential permit only over two hours, but there are a couple of back-streets that are open to all comers. Of course, you still have to move your car weekly for street-cleaning, so it's hardly a painless solution. After spending the last month navigating the system, I was very relieved to sell my car. It was such a source of stress, and the only way out looked like paying a few hundred bucks to rent a garage, which made no sense for something I rarely used.

Big umbrellas are obnoxious. I didn't own an umbrella when I moved here, so as I was loading up with furniture at Ikea, I picked one up. Big mistake. You could have camped under it, and while it kept me perfectly dry, it was very embarrassing to jostle through the downtown crowds knocking lesser umbrellas out of the way. I found a nano-umbrella that folds down to pocket size, and now feel much less of a jerk.

French food is good. I'm a bit late to the party on this one, I know. I've never gone out of my way to eat at French restaurants; the few I've been to have been long on pretensions and short on execution, and very forgettable. A couple of weeks back, an old Apple colleague and I got into a conversation with another diner at a local sushi bar, and it turned out he was a restaurateur himself. My companion is an impressive foodie, so she grilled him on his favorite places. He went into rhapsodies over L'Ardoise, a traditional French bistro that's only a few blocks from my place. Intrigued, I tried it out and found myself in heaven. It's a tiny place, galley-style, the staff are entirely French, the atmosphere and service were very welcoming, the prices are reasonable, but most importantly, the food was a revelation: from light but rich soups, through salmon that melts under your fork, to coq au vin with a perfect sauce. Even their crème brûlée isn't the stodgy pudding I remember, but a soft, creamy delight. I can't help myself, I keep going back. They have bar seating for when I'm on my own, which gives you a great view of the chefs preparing the meals behind the hatch, and I've never needed a reservation.

Those blue boxes are for mail. It's a small thing, but I'd never lived anywhere without a mailbox in the front yard. For the first few days I left my outgoing mail in my apartment box, and couldn't figure out why they wouldn't take it. Finally I cornered our mail-lady, and she very patiently explained what those big blue boxes on most street corners are for.