Five short links

Photo by Chris in Plymouth

Extraordinary Claims – An in-depth look at the methodology behind both Daryl Bem’s research claiming evidence of precognition, and the critical responses to it. I’m deeply sceptical that his claims are correct myself, but Peter clearly lays out how the critics are trying to change the rules to dismiss them, rather than having a fair fight. As the Climategate coverage shows, science isn’t just about getting the right answer. Like justice, it has to be seen to be done, so having a transparent and even-handed process for dealing with heretics is important.

25 Commandments for Journalists – I’ve been thinking a lot about ‘sensationalism’ in writing, engaging the reader and how to square that with truth, justice and the American Way (of journalism). It’s been one of the most controversial topics I’ve tackled, provoking some insightful pushback from regulars like Emily Cunningham. This manifesto from Tim Radford articulates the British position far better than I’ve managed to so far, with key phrases like “Nobody has to read this crap” and “Words like ‘sensational’ and ‘trivial’ are not insults to a journalist”. The final commandment is the most important, though, because it describes the balancing act we all need to do.

25. Writers have a responsibility, not just in law. So aim for the truth. If that’s elusive, and it often is, at least aim for fairness, the awareness that there is always another side to the story. Beware of all claims to objectivity. This one is the dodgiest of all. You may report that the Royal Society says that genetic modification is a good thing, and that depleted uranium is mostly harmless. But you should remember that genetic modification was invented by people who were immediately elected to the Royal Society for their cleverness, by people already in there because they knew how to enrich uranium fuel rods and deplete the rest. So to paraphrase Miss Mandy Rice-Davies (1963) “They would say that, wouldn’t they?”

Cuatro años de ejecuciones en México (Four years of executions in Mexico) – This is exactly the sort of important story I hoped OpenHeatMap could help tell. A sobering read, even through a stuttering automatic translation.

Gremlin – A screencast introducing Gremlin, a Groovy-based graph processing language. Uses analysis of Grateful Dead playlists as the example, and makes dealing with graph traversal look easy. Thanks to Chris Diehl for the heads-up.
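Gremlin has its own syntax, so rather than guess at it, here's a rough Python sketch of the kind of two-hop traversal the screencast makes look easy. The "followed by" graph, the placeholder song names and the counts are all invented for illustration, not taken from real Grateful Dead set lists.

```python
from collections import Counter

# Hypothetical "followed by" graph: song -> {next_song: times_observed}.
# Names and counts are invented placeholders, not real set-list data.
FOLLOWED_BY = {
    "song_a": {"song_b": 3, "song_c": 1},
    "song_b": {"song_c": 2, "song_d": 4},
    "song_c": {"song_d": 1},
    "song_d": {},
}


def two_step_successors(graph, start):
    """Count songs reachable in exactly two hops, weighting each path by its edge counts."""
    totals = Counter()
    for middle, first_weight in graph.get(start, {}).items():
        for end, second_weight in graph.get(middle, {}).items():
            totals[end] += first_weight * second_weight
    return totals


print(two_step_successors(FOLLOWED_BY, "song_a").most_common())
```

Each extra hop is another nested loop here, which is exactly the bookkeeping a dedicated traversal language is meant to hide.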

Wolfram Alpha’s API is free, but is it open? – Wolfram has assembled an awesome collection of knowledge and aims to make it ‘computable’, but their API only returns images and textual descriptions of their data. If we’re going to do more than just display supplemental search results to users, we’ll need a machine-readable version. Does anybody know folks there I could quiz about that? Email me if you do (or if you have any other thoughts, of course). Thanks!

Five short links


Eve Online User-Generated Portraits – Just look at the quality of those pictures above – they’re all created by players using Eve Online’s character generation system. Back in 1997 I worked on a pool game with animated characters and a character generator we nicknamed the Barbie Fashion Show, but I hadn’t realized how far the technology had come since then. Just check out this video of the interface to see how easy it is to create amazing results. In a funny twist, one of my old colleagues from that pool game is now working for Eve out in Iceland as a senior designer.

Carpets for airports – A connoisseur’s guide to airport flooring, revealing its secret meanings. The Da Vinci Code for carpets, with a funky Flash interface.

Without adding context, a journalist with data can be dangerous – A fantastic example of something I’ve been struggling to get across to people. At the moment we’re incredibly susceptible to believing the numbers people throw at us, in a way we wouldn’t be with stories told in prose. As a society, we need to wise up and develop enough savvy to build an immune system against this sort of manipulation, and part of that has to be calling out distortions like the one David pounces on.
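One common way to add that missing context is to turn raw counts into rates. This is a minimal sketch with two made-up cities and invented figures; it isn't the example from David's post.

```python
# Hypothetical figures for two made-up cities, used only to show the arithmetic.
cities = {
    "Bigtown": {"incidents": 900, "population": 3_000_000},
    "Smallville": {"incidents": 150, "population": 100_000},
}


def rate_per_100k(incidents, population):
    """Convert a raw count into a rate per 100,000 residents."""
    return incidents / population * 100_000


for name, figures in cities.items():
    rate = rate_per_100k(figures["incidents"], figures["population"])
    print(f"{name}: {figures['incidents']} incidents, {rate:.1f} per 100k")
```

Bigtown reports six times as many incidents, but Smallville's rate per resident is five times higher, which is the kind of reversal a bare headline number hides.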

Lighting the dark continent – Africa’s lack of development can seem staggering, but as Jon points out, it’s also a massive opportunity.

The Quantified Self Conference – I didn’t know there was a grass-roots movement around this idea; I thought it would be something driven by the product folks. It makes sense that there are people interested in instrumenting their lives, though. This looks like a great opportunity to get feedback if you’re in the business of offering solutions in this area.

Spotlight your startup at Strata

Photo by Bryan Stevenson

Are you a data startup who'd love to be at Strata but can't afford the admission? You now have a chance to attend the conference and show off what you've been building, thanks to the Strata Startup Showcase. There's space for fifteen startups, and successful companies will be given two free passes and five minutes to show off their work in front of investors. It's a great opportunity, but the deadline for applications is Friday, so you'll need to be quick.

Don't forget the free Big Data Camp Unconference on the Monday before the main event, too; the price is specially tailored for starving entrepreneurs' wallets.

What makes a good data API?

Picture by Victoria (Mouse World)

I’ve been working on a guide to data APIs, and making decisions about what to include has forced me to think about exactly what I look for. If you’re going to build an API that’s useful to a wide range of people, and will add value to the whole data ecosystem, here’s what you need.

  • Free, or self-service signup. Traditional commercial data agreements are designed for enterprise companies, so they’re very costly and time-consuming to experiment with. APIs that are either free or have a simple sign-up process make it a lot easier to get started.

  • Broad coverage. There have been quite a few startups that build infrastructure and hope that users will then populate it with data. Most of the time this doesn’t happen, so you end up with APIs that look promising on the surface but actually contain very little useful data.

  • Online API or downloadable bulk data. Most of us now develop in the web world, so anything else requires a complex installation process that makes it much harder to try out.

  • Linked to outside entities. There has to be some way to look up information that ties the service’s data to the outside world. For example, the Twitter and Facebook APIs don’t qualify because you can only find users by internal identifiers, whereas LinkedIn does because you can look up accounts by their real-world names and locations.

The first three principles are just about ease of use, but having linkable data is essential if you’re going to allow developers to innovate by combining data sources. Once there’s an external reference point, developers can join information to come up with insights you’d never expect.
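To make the joining idea concrete, here's a minimal Python sketch, assuming two hypothetical data sources that share no internal IDs but both expose a person's name and location. The records are invented, and the crude lowercase matching stands in for the much harder entity resolution that real data needs.

```python
# Two hypothetical data sources. Neither shares internal IDs with the other,
# but both expose a real-world key: a person's name plus their location.
profiles = [
    {"internal_id": 101, "name": "Ada Smith", "location": "Boulder, CO", "title": "Engineer"},
    {"internal_id": 102, "name": "Raj Patel", "location": "Austin, TX", "title": "Designer"},
]
talks = [
    {"speaker": "ada smith", "city": "boulder, co", "talk": "Graph databases"},
    {"speaker": "Lee Wong", "city": "Seattle, WA", "talk": "Stream processing"},
]


def key(name, location):
    """Build a crude join key by normalising case and surrounding whitespace."""
    return (name.strip().lower(), location.strip().lower())


def join_on_real_world_key(left, right):
    """Inner-join profiles to talks on the normalised (name, location) key."""
    index = {key(p["name"], p["location"]): p for p in left}
    joined = []
    for t in right:
        match = index.get(key(t["speaker"], t["city"]))
        if match is not None:
            joined.append({**match, "talk": t["talk"]})
    return joined


print(join_on_real_world_key(profiles, talks))
```

Only the record with a matching real-world key survives the join; an API that exposed nothing but internal IDs would leave no key to match on at all.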

Five short links

Photo by Chris in Plymouth

The Linked Open Data cloud diagram – I disagree with the Linked Data philosophy: I think top-down, formal semantic approaches are a dead end, and I believe RDF is the Devil’s Own Format. I can’t deny that the array of sources they’ve linked together is impressive, though, and it’s beautifully presented here.

Taco Bell Programming – The hacker mentality can be an incredibly powerful tool for compressing days-long tasks into minutes, if you can just look at them from the right angle. Mmmm, Taco Bell….

The Perils of Kinder Surprise – I’m so glad we’re being protected from the dangers of small chocolate eggs with plastic toys inside. I never really liked them growing up in the UK, but it depresses me that here in the US we’re paying border guards to seize an average of 25,000 of them every year. When Kinder Surprises are outlawed, only outlaws will have Kinder Surprises.

The Myth and Truth of the NYC Engineer Shortage – Hiring ‘A players’ doesn’t mean hiring people with the exact skills you need, or even experienced engineers. Hire for smarts and enthusiasm, give your experienced folks time to mentor them, and within a few months you’ll have productive employees. Even better, they’ll be cheaper and more loyal than that hotshot you keep dreaming of. Hire for the right mentality, and everything else will follow.

Elusive Forger, Giving but Never Stealing – When I read the Norse myths as a kid, my favorite character was always Loki the Trickster, so I find this story of a non-profit forger slipping his works into museums’ collections delightful.

Thoughts on London

Photo by Ian Brumpton

It's always strange going back to the country I grew up in. I spent my early academic and professional life there in a near-constant state of frustration, so it's hard for me to analyze it rationally. Bearing that in mind, here are some of the impressions I was left with after spending a week back in London.

High Finance. I was there to help out some startup friends, and their biggest problem was that big financial firms could easily outbid any early-stage startup for technical talent. If an experienced developer can get $500,000 a year, it takes a lot to lure them away. This might sound great for developers, but only if it's a long-term, sustainable situation. My fear is that the current high levels of financial firm profits won't last, those jobs will vanish, and without a widespread startup culture there will be no good replacements. Felix Salmon wrote a great article on the problem of finance sucking up all the oxygen, and I think he's spot on. I don't have figures to back this up, but even New York, with its massive finance industry, feels like it has a lot more diversity to fall back on than London.

Deference. There's a real reluctance to give young punks responsibility, and a separation between management and engineering. As a 25-year-old with big ideas, I found the difference between what I was allowed to do in British and American companies amazing. I went from getting into trouble to being given pats on the back for coding outside of my assigned areas. I was included in management discussions, not kept in the dark. The conservative social system in the UK makes it tough to be flexible like that, and discourages a lot of troublemaking innovators.

Don't look! Keep your eyes focused on the ground five feet in front of you at all times when walking. I hadn't realized how much my habits had changed until I was wandering around London and wondering why everyone was bumping into each other. Do I walk like an American now?

Drip-feeding. European investors are complete wimps. They have the terrible habit of handing out investment in tiny chunks. This forces entrepreneurs to constantly be fund-raising, unable to plan more than a couple of months ahead. I talked to several startups with traction that would earn them millions in VC investment on the west coast, and they're all struggling with this issue. I'm not the only one to have spotted the problem, though Paul has different ideas on the cause.

Raw Potential. Despite all these criticisms, I met so many clever, motivated people and great startups. I'm just a tourist there these days, so my hat goes off to everyone working to make London the tech innovation hub it deserves to be.

Five short links

Photo by Francisco Nogueira

History of the English language – I knew the general outlines already, but there are some fascinating details in here, especially the example of how the Lord’s Prayer would have been written at different times. The 1000 AD sample is unintelligible, but by 1384 it’s difficult yet readable. It also led me to discover that Illinois had a law on the books until the 1960s declaring that its official language was American, not English. Makes sense to me.

CC San Francisco Salon – This looks like a stellar line-up of data folks for an informal discussion around openness in a data-driven world. I’m disappointed I can’t make it since I’m out of the country, but I’ll be checking out the video recording of the event.

DataDay Austin – Texas has a cluster of cutting-edge data companies, and they’ve lined up an impressive day of training and talks. Folks from Infochimps, Google, 80legs and more will be there.

DataSets, Redistributable Data Sets – Delicious is still an essential tool for easily sharing resources, and I’m thankful that Julian and Peter are publishing their finds.

AsciiDoc – Why didn’t somebody tell me about this before? It’s an elegant little tool for taking the plain-text conventions we all use when creating READMEs, and formalizing them into a markup language that can be used to create everything from HTML to PDF and EPUB documents. I’ve been using Pages or Word to build books, and the boilerplate formatting work was so time-consuming. This has made my latest project a breeze.

Five short links

Photo by Balloon Shop Enfield

DataSift – UK startup focused on making it easy to build your own tools on top of massive social media streams like the Twitter firehose. Seems a bit like Yahoo Pipes for social data, without the visual interface, and could open up the area to a much wider audience of developers.
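The Pipes comparison boils down to chaining small filters over an endless stream. Here's a rough Python sketch of that idea using generators; the message format, the filters and the sample data are all hypothetical, and this is not DataSift's actual API or query language.

```python
from typing import Callable, Dict, Iterable, Iterator

Message = Dict[str, str]


def keep_if(predicate: Callable[[Message], bool]):
    """Return a pipeline stage that passes through only messages matching predicate."""
    def stage(stream: Iterable[Message]) -> Iterator[Message]:
        for message in stream:
            if predicate(message):
                yield message
    return stage


def pipeline(stream: Iterable[Message], *stages) -> Iterator[Message]:
    """Chain stages lazily, so an unbounded stream is processed one message at a time."""
    for stage in stages:
        stream = stage(stream)
    return iter(stream)


# Hypothetical sample messages standing in for a live social media feed.
sample = [
    {"user": "alice", "text": "Loving the new data API"},
    {"user": "spambot", "text": "Cheap API keys here!!!"},
    {"user": "bob", "text": "Lunch was great"},
]

mentions_api = pipeline(
    sample,
    keep_if(lambda m: m["user"] != "spambot"),
    keep_if(lambda m: "api" in m["text"].lower()),
)
print([m["text"] for m in mentions_api])
```

Because each stage is a generator, the same pipeline could sit on top of a live feed without loading it all into memory.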

The Doctor vs the Computer – A thousand-character limit on descriptions in medical records is so obviously arbitrary and unneeded, it hurts. Websites that have code to complain about spaces in credit-card numbers but somehow can't strip them out are bad enough, but here the bondage-and-discipline over-specification could kill people.
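Normalizing the input instead of rejecting it usually takes a couple of lines. This minimal Python sketch assumes spaces and hyphens are the only separators people type, and only checks that what's left is all digits; a real payment form would also check length and checksum.

```python
import re


def normalise_card_number(raw: str) -> str:
    """Strip the spaces and hyphens people naturally type, instead of rejecting the input.

    Raises ValueError if anything other than ASCII digits remains afterwards.
    """
    digits = re.sub(r"[\s-]", "", raw)
    if not re.fullmatch(r"[0-9]+", digits):
        raise ValueError("card number should contain only digits, spaces or hyphens")
    return digits


print(normalise_card_number("4111 1111-1111 1111"))
```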

Trouble in the House of Google – Google's had massive success because they realized that inelegant statistical methods of detecting things like spam, plagiarism and relevance work a lot better than more elegant traditional semantic/AI techniques. Unfortunately, the black hats have figured out that there's no statistical technique in the world that can truly rate the quality of a page. Google's relying on statistical measures that used to correlate with that quality, but as the bad guys mimic those more closely, they are tricking the search engine into believing spam is the real thing. We need more inputs, whether that's a return of some kind of manual rating system, data from social networks or click-through rates.
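Naive Bayes is one classic example of the inelegant-but-effective statistical approach to spam. This is a textbook sketch with an invented four-document training set, not anything Google has described using. It also hints at the weakness above: padding a spam page with words common in legitimate pages tends to push its score toward the legitimate class.

```python
import math
from collections import Counter


def train(docs):
    """docs: list of (text, label). Returns per-label word counts and per-label document counts."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Pick the label with the highest log-probability under naive Bayes with add-one smoothing."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny invented training set, purely illustrative.
training = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("notes from the data conference", "ham"),
    ("new release of the graph library", "ham"),
]
model = train(training)
print(classify("buy cheap pills", *model))
print(classify("conference notes on graph data", *model))
```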

Bike Accidents in Tucson – Exactly the sort of thing I built OpenHeatMap for. Collin Forbes is using it to help influence the debate about policing in his city.

This isn't a post about Facebook – Mourning the rise of a service that's a closed system, in contrast to the openness of Google. I'm not as pessimistic as Paul; I think that Facebook is demonstrating how much people want tools that reflect their off-line social world and behaviors, and once the open world absorbs that lesson, we'll see a new wave of competition for the social network. That competition will have to be more open in a technical sense, just because that's such a tempting way to get early traction.

Leave a trail of breadcrumbs

Photo by Virelai

Maybe your purpose in life is to serve as an example to others of what not to do? That's a thought that actually cheers me up when I'm feeling down, because at least it adds some meaning to horrible experiences. I was thinking about that when I read about Jud's brush with personal disaster. Anyone searching on BPPV now has a detailed account of what went wrong and how he recovered. That may seem like a small thing, but for a handful of sufferers it will be information that helps them immensely. There's no theoretical limit on how long it could remain useful either – our great-great-grandkids could still be learning from his experiences.

We take economic growth for granted, but did you ever stop and think about what it actually means? Why should the same number of people be able to produce a few percent more for the same amount of effort, year after year, for centuries now? The secret is culture. As one person or organization discovers how to do more with less, that secret gets passed around and remembered collectively by humanity. Productivity is actually a massive series of niche lessons about what works and what doesn't. Our whole world is built on millennia of anecdotes like Jud's.
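The compounding is what makes this so striking. As a rough illustration, with rates chosen arbitrarily since the text only says "a few percent", growth of 2% a year sustained for a century multiplies output by a little over seven.

```python
def cumulative_growth(annual_rate: float, years: int) -> float:
    """Return how many times larger output becomes after compounding annual_rate for years."""
    return (1 + annual_rate) ** years


# Illustrative rates only; the text doesn't pin "a few percent" to a specific figure.
for rate in (0.01, 0.02, 0.03):
    print(f"{rate:.0%} a year for 100 years -> {cumulative_growth(rate, 100):.1f}x")
```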

That's why the internet leaves me with so much hope for the future. Over my lifetime we've created an incredibly powerful way of transmitting our experiences to others who care. Even if there's only a handful of people in the world who might benefit from a particular insight, for very little effort you have a good chance of reaching them and improving their lives.

People ask me if they should blog or tweet, and I tell them: it won't make you money, it won't bring you fame, and in terms of concrete returns, it's a waste of time. I still encourage them to do it, though, because every true story is worth telling. For years, despite low traffic, I kept going because the search logs told me there were one or two people a day who found a solution to their problem thanks to a post I'd written. If you think about it, that's hundreds of people a year you can help, just by writing down a few of your experiences.

So, when you look at your life in 2011, ask yourself: are you leaving a trail of breadcrumbs? It might be the most effective way you can make the world a better place.

Five short links

Picture by John Gurche

The Battle of Towton – Archaeological detective work on a mass grave from Britain’s Wars of the Roses. The violence of the deaths is chilling, but I can’t help being impressed by the amount of information they’re able to infer from the remains. I think data science has a lot to learn from the humanities when it comes to recovering insights from fragmentary, unstructured sources.

Firefighting with data – Lessons learned at the micro-level about what works when you’re building an integrated information system for firefighters. With eight data display devices already in their truck cab, they don’t need a ninth.

Earth Project aims to simulate everything – We’d be much better off finding targeted applications for the new data sets; this simulation idea seems like trying to boil the ocean.

You should blog even if you have no readers – “Consider anything else a side-benefit” is the key phrase. I’ve got hooked on writing because it’s like a workout for my brain.

The fine art of terrible lizards – I never realized ‘Paleo art’ even existed as a term. It may never hang in the Louvre, but it takes me back to the hours I spent entranced by my dinosaur books as a kid, and it’s somehow very heartening to know there’s a whole world of it available now.