Why is open geocoding important?

Photo by Werner Kunz

A few years ago I had what I thought was a simple problem. I had a bunch of place names, and I needed to turn them into latitude and longitude coordinates. To my surprise, it turned out to be extremely hard. Google has an excellent geocoder, but you're only allowed to use it for data you're displaying on Google Maps, and there are rate limits and charges if you use it in bulk. Yahoo has an excellent array of geo APIs with much better conditions, but there are still rate limits, and their future was in doubt even then!

So, I ended up hacking up my own very basic solution based on open data. It turned out to be a fascinating problem, one you could spend a lifetime on, trying to draw a usable, detailed picture of the world from freely available data. I bulked up the underlying data and algorithms, and it became the core of the Data Science Toolkit. Turning addresses into coordinates may sound like a strange obsession, but it has become my white whale.

There are some folks who agree that this is an important problem, but I've been surprised there aren't more. Place names describe our world, and we need an open and democratic way for machines to interpret them. Almost any application that uses locations needs to do this operation, and right now we have no alternative to commercial systems.

What are the practical impacts of this? We've got no control over what our neighborhoods are called, or how they're defined. We can't fix problems in the data that impact us, like correcting the location of our address so that delivery drivers can find us. We can't build applications that take in large amounts of address data unless we can afford high fees, which cuts out a whole bunch of interesting projects.

This is on my mind because I'm making another attack on improving the DSTK solution. I've already added a lot of international postal codes thanks to GeoNames, but next I want to combine the public domain SimpleGeo point-of-interest dump with OpenStreetMap data to see if I can synthesize more addressable ranges for at least some more countries. That will be an interesting challenge, but if I get something usable, it opens the door to adding more coverage through any open data set that combines street addresses and coordinates. I can't wait to see where this takes me!
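To make the idea of synthesizing addressable ranges a bit more concrete, here is a minimal sketch of one way it could work, not the code DSTK actually uses. It assumes each open data record has already been reduced to a hypothetical `(street, house_number, lat, lon)` tuple, groups the records by street, and linearly interpolates between the two nearest known house numbers. The function names and the sample coordinates are made up for illustration.

```python
from collections import defaultdict


def build_ranges(points):
    """Group (street, house_number, lat, lon) tuples by lowercased street
    name, with each street's entries sorted by house number.
    The tuple layout is an assumption made for this sketch."""
    by_street = defaultdict(list)
    for street, number, lat, lon in points:
        by_street[street.lower()].append((number, lat, lon))
    return {street: sorted(entries) for street, entries in by_street.items()}


def interpolate(ranges, street, number):
    """Estimate coordinates for a house number by linear interpolation
    between the two known addresses that bracket it on the same street.
    Returns None when the street is unknown or the number is not covered."""
    entries = ranges.get(street.lower())
    if not entries:
        return None
    # Walk consecutive pairs of known addresses looking for a bracket.
    for (n0, lat0, lon0), (n1, lat1, lon1) in zip(entries, entries[1:]):
        if n0 <= number <= n1:
            if n1 == n0:
                return (lat0, lon0)
            t = (number - n0) / (n1 - n0)
            return (lat0 + t * (lat1 - lat0), lon0 + t * (lon1 - lon0))
    # A street with a single known address only supports an exact match.
    for n, lat, lon in entries:
        if n == number:
            return (lat, lon)
    return None


# Made-up sample data: two known points on one street.
sample_points = [
    ("Main St", 100, 40.000, -105.000),
    ("Main St", 200, 40.002, -105.000),
]
sample_ranges = build_ranges(sample_points)
estimate = interpolate(sample_ranges, "main st", 150)
```

With the sample data, `estimate` lands halfway between the two known points. A real version would have to cope with odd and even sides of the street, inconsistent street spellings, and streets that bend, which is a big part of why this is such an interesting challenge.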
