Why we need an open-source geocoding alternative to Google

Photo by Marc Levin

You can't use Google's geocoding for anything but map display! I've always been surprised by how many services rely on the Google Maps API for general address-to-coordinate translation, even though that's prohibited unless you're displaying the results on one of their maps. Google has provided some fantastic resources for geo developers and moved the whole field forward, but we can't rely on them for everything. The recent changes to their terms of service have alerted a few people to this long-standing issue, so here are the alternatives I've discovered over the years, and why I think you should look into open-source solutions.


The easiest change for an application developer is to use one of Yahoo's excellent geocoding APIs: either Placefinder for street addresses, or Placemaker for less structured place names like towns, provinces or countries. There are no restrictions on how you use the data, and you get 50,000 requests a day. Coverage is good worldwide (though I recently noticed an issue with Finland).

The biggest downside is that they clearly have an uncertain future. Yahoo hasn't managed to monetize their awesome developer APIs, and most of the engineers involved in setting them up have left. It's nerve-wracking to build your application on an API that could disappear at any point!

Schuyler Erle

Schuyler is a one-man open-source-geocoding machine! He wrote the original Perl module for taking US Census data and looking up addresses, and also created an updated Ruby version for the Geocommons folks. I've found it works impressively well on US addresses. The biggest drawbacks are the requirement that you download and import many gigabytes of US census data before you can set it up on your own machine, and a lack of international coverage.
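
For a feel of how a geocoder can work from census-style data: street segments in TIGER/Line carry address ranges, and a house number's position is typically estimated by interpolating along the segment. The snippet below is only a minimal sketch of that idea, not Schuyler's actual code. The function name, the straight-line segment, and the sample coordinates are all assumptions made for illustration.

```python
def interpolate_address(number, range_start, range_end, start_point, end_point):
    """Estimate (lat, lon) for a house number on a straight street segment.

    Hypothetical helper: assumes the segment is a straight line between
    start_point and end_point, each given as a (lat, lon) tuple, and that
    house numbers are spread evenly between range_start and range_end.
    """
    if range_start == range_end:
        # A single-number range gives no direction, so use the midpoint.
        fraction = 0.5
    else:
        fraction = (number - range_start) / (range_end - range_start)
    # Clamp so out-of-range numbers land on the nearest end of the segment.
    fraction = min(max(fraction, 0.0), 1.0)
    lat = start_point[0] + fraction * (end_point[0] - start_point[0])
    lon = start_point[1] + fraction * (end_point[1] - start_point[1])
    return lat, lon


# Made-up segment covering house numbers 100 to 200.
lat, lon = interpolate_address(150, 100, 200, (40.000, -75.000), (40.002, -75.004))
print(round(lat, 6), round(lon, 6))
```

Because the position is an even split of the range rather than a surveyed point, the result can sit some distance from the real building, which is the usual trade-off with range-based data.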


OpenStreetMap has created the Nominatim project for converting addresses into coordinates using its open collection of mapping information. Unfortunately it's way too logical for its own good, expecting to receive addresses that are strictly hierarchical. For example, it can't understand "40 Meadow Lane, Over, Cambridge CB24 5NF, United Kingdom". You have to mangle it into something unnatural like "40 Meadow Lane, Over, Cambridgeshire, England" before it starts to parse it, and even then it picks the wrong match as the first result. It also generally doesn't know where numbers fall on particular streets, since it relies on landmark points like pubs with numbers attached, and these are generally very sparse.
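
If you want to try the two phrasings yourself, here is a rough sketch of a free-form Nominatim query. It assumes the public search endpoint at nominatim.openstreetmap.org with its `q`, `format` and `limit` parameters as they exist today; the helper names and the User-Agent string are made up for this example, and the results you get back may well differ from what I saw when writing this.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint: the public Nominatim instance run by OpenStreetMap.
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def nominatim_search_url(query, limit=5):
    """Build a free-form search URL (hypothetical helper name)."""
    params = {"q": query, "format": "json", "limit": limit}
    return NOMINATIM_SEARCH + "?" + urlencode(params)


def nominatim_search(query, limit=5):
    """Fetch matches as (display_name, lat, lon) tuples.

    Makes a live network request. The public instance expects an
    identifying User-Agent; the value here is a placeholder.
    """
    request = urllib.request.Request(
        nominatim_search_url(query, limit),
        headers={"User-Agent": "geocoding-example/0.1"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        results = json.load(response)
    # Nominatim returns lat and lon as strings, so convert them.
    return [(r["display_name"], float(r["lat"]), float(r["lon"])) for r in results]


if __name__ == "__main__":
    # Only build the URLs here; call nominatim_search() to hit the service.
    for address in (
        "40 Meadow Lane, Over, Cambridge CB24 5NF, United Kingdom",
        "40 Meadow Lane, Over, Cambridgeshire, England",
    ):
        print(nominatim_search_url(address, limit=3))
```

Comparing the first result for each phrasing is a quick way to see how sensitive the parser is to the way an address is written.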

Data Science Toolkit

Since I couldn't find anything that met my needs, I decided to take a shot at pulling together a lot of the existing resources into a more convenient package. I took Schuyler's work on the TIGER/Line data for US addresses, and used some of the Nominatim backend code with a more flexible front-end to handle more postal addresses. I then rolled up a couple of virtual machine packages, available as an Amazon AMI or a VMware image, so you don't have to do the messy data importing yourself. You can also get started through the API on the main datasciencetoolkit.org site, but I wouldn't recommend it for heavy use since it's just a single machine.

Its main limitation is that it only handles US and UK addresses. The UK lookups are all done through OpenStreetMap data, so it should be possible to extend it worldwide given enough work; I just haven't been able to devote enough time to that. I'd love to see someone extend the current code though, or improve a different project like Nominatim, or even start a whole new one. There's already enough data out there to build a truly open API for geocoding, so let's make it happen!

One response

  1. It is not only Google that requires results from postal address geocoding to be displayed in the vendor's map services. Pretty much everyone does it. That is how they make money.
    When you talk about open source, you mean a specific license, right?

    I tried to work with the machines you posted and could launch them. Are they still in place?
    Also, the TIGER database interpolates the results and gives a 10 to 300 feet offset from the actual location. For many that is nothing, but still.
    I work for a postal address geocoding batch service called CSV2GeoData at https://csv2geo.com
