Free bulk geocoding for US addresses

Mappins
Photo by Chris Blakeley

My goal with OpenHeatMap is to have the computer handle all of the messing around that’s usually required to load data into a GIS. I want to accept anything that describes a location, rather than forcing users to spend endless time massaging their input data.

This is fairly straightforward with country names, zip codes, and even US county names, but I’ve struggled to find a good solution for turning addresses into latitude/longitude coordinates. All of the free APIs out there are either very accurate but have crippling limits on how often you can call them, or are unlimited but with very low precision. The going rate for commercial geocoding is $10 per thousand addresses, which ruled that right out!

Happily, I’ve found a solution. Schuyler Erle and Jo Walsh created an open-source Perl module a few years ago called Geo-Coder-US. It uses the public-domain TIGER/Line data from the US Census Bureau to look up American addresses. In my tests of the online version it was remarkably accurate (much better than OpenStreetMap’s Nominatim, for example), though the authors warn that rural coverage is not as good. The only downside was that the database file that accompanies the code was too large for the authors to host. I had to spend some time digging around the Census FTP site to find the right source files, download all 9 GB of them, and then run the database creation, which took several hours.

To save anyone else from having to go through the same struggle, I’ve uploaded a version of the project to GitHub that contains the compiled database file. Be warned, the database is almost a gigabyte in size, so it’s not a quick download! You may also need to install Geo-Coder-US-1.00.tar.gz via CPAN to pull in all the dependencies. Once you have it, cd into the directory and try running:

eg/lookup.pl "2543 Graystone Pl, Simi Valley, CA 93065"

You should see the following output:

"2543 Graystone Pl, Simi Valley, CA 93065", 34.280874, -118.766207

You can either pass multiple addresses as command line arguments, or pipe a file to the script and it will treat each line as an address and output as CSV. The original authors also include a SOAP server script for Perl, so you could also run this as a web service. I’m going to be moving OpenHeatMap to using this, so look out for more accurate address locations, at least for American data.
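If you want to drive the script from another program instead of the shell, here is a rough Python sketch of the piping approach. It assumes the output always looks like the sample line above (a quoted address followed by latitude and longitude) and that you run it from the project directory. The names `parse_lookup_output` and `geocode_file` are my own inventions for illustration, not part of Geo-Coder-US, and I'm guessing at how unmatched addresses are reported, so treat that branch as a placeholder.

```python
import csv
import subprocess
from typing import Iterable, List, Optional, Tuple

# One geocoded row: (address, latitude, longitude). The coordinates are None
# when a row does not carry two parseable numbers.
Row = Tuple[str, Optional[float], Optional[float]]


def parse_lookup_output(text: str) -> List[Row]:
    """Parse CSV lines shaped like the sample output shown in this post."""
    rows: List[Row] = []
    # skipinitialspace handles the space that follows each comma in the sample.
    for fields in csv.reader(text.splitlines(), skipinitialspace=True):
        if not fields:
            continue  # ignore blank lines
        address = fields[0]
        try:
            lat, lon = float(fields[1]), float(fields[2])
        except (IndexError, ValueError):
            # Assumption: an unmatched address comes back without coordinates.
            lat = lon = None
        rows.append((address, lat, lon))
    return rows


def geocode_file(addresses: Iterable[str], script: str = "eg/lookup.pl") -> List[Row]:
    """Pipe one address per line to the script and parse what comes back."""
    result = subprocess.run(
        [script],
        input="\n".join(addresses) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise if the script exits with an error
    )
    return parse_lookup_output(result.stdout)


# Demonstrate the parser on the sample line from this post (no script needed).
sample = '"2543 Graystone Pl, Simi Valley, CA 93065", 34.280874, -118.766207\n'
print(parse_lookup_output(sample))
```

Calling `geocode_file(["2543 Graystone Pl, Simi Valley, CA 93065"])` should then hand back a list of tuples you can feed straight into whatever comes next, provided the script and database are installed as described above.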

A big thanks to Schuyler and Jo for making this code available in the first place. Do keep them in mind for any location consulting work you might have.
