How to use Yahoo’s Placemaker API to extract places from documents


Today I was lucky enough to hear Greg Cohn walk us through all the goodies Yahoo offers developers. I'm a big fan and heavy user of their GeoPlanet geocoding API, so I was stoked to hear they'd just launched a service that recognizes place names in arbitrary HTML and XML documents. Why is this so interesting? Look at what Just Landed have done by searching for the words "Just landed in" in Twitter messages and then geocoding and visualizing the place names. Placemaker makes it a lot simpler to build tools like this with anything that can be expressed as XML or HTML. That covers web pages, REST APIs like Twitter's, and even RSS feeds, so you can see why I'm excited!

I've put together a simple example, a bash script tested on OS X, that shows how to use it. You can download it as geturlplaces.zip here, or use the source included below. To run it, pass a web page address as the first argument, e.g. ./geturlplaces http://news.bbc.co.uk/

For production code you'll want a real XML parser rather than the regexes used below.

#!/bin/bash

# enter your Yahoo geo app id here – to obtain one go to http://developer.yahoo.com/wsregapp/index.php and register
# (interestingly as of May 20th 2009 it works with a bogus id!)
APPID=XXXXX

if [ $# -ne 1 ]
then
  echo "Extract a list of all the recognized place names from a web page using Yahoo's Placemaker API"
  echo "Usage: $(basename "$0") <web page url>"
  exit 65
fi

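# POST the page URL to Placemaker, keep the lines holding a matched place name,
# strip the surrounding <text><![CDATA[ ... ]]></text> wrapper, then de-duplicate
curl --silent \
  --data-urlencode "documentURL=$1" \
  -d "documentType=text/html&outputType=xml&appid=$APPID" \
  "http://wherein.yahooapis.com/v1/document" \
  | grep '<text><!\[CDATA\[' \
  | sed -e 's/<text><!\[CDATA\[//' -e 's/\]\]><\/text>//' \
  | sort | uniq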
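
If you want to try the extraction step on its own without hitting the API, here's a minimal sketch. The extract_places function name and the sample XML are my own inventions for illustration; the sample is a cut-down stand-in, not a real Placemaker response. The only thing it assumes is that each place name arrives on its own line as <text><![CDATA[...]]></text>, which is what the script above relies on.

```shell
#!/bin/bash

# Hypothetical helper (not part of the script above): reads Placemaker-style
# XML on stdin and prints each place name once, in sorted order.
extract_places() {
  # sed -n ... p prints only lines that contain a <text><![CDATA[...]]></text>
  # element, replacing the whole line with the CDATA payload.
  # sort -u then sorts the names and removes duplicates.
  sed -n 's|.*<text><!\[CDATA\[\(.*\)]]></text>.*|\1|p' | LC_ALL=C sort -u
}

# Made-up sample input, for illustration only.
extract_places <<'EOF'
<document>
  <text><![CDATA[London]]></text>
  <text><![CDATA[Paris]]></text>
  <text><![CDATA[London]]></text>
</document>
EOF
```

Run as is, this prints London and Paris on separate lines. It's still a regex, so it shares the limits mentioned above: it will miss a place name whose element is split across lines.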
