Today I was lucky enough to hear Greg Cohn walk us through all the goodies Yahoo offers developers. I'm a big fan and heavy user of their GeoPlanet geocoding API, so I was stoked to hear they'd just launched Placemaker, a service that recognizes placenames in arbitrary HTML and XML documents. Why is this so interesting? Look at what Just Landed have done by searching Twitter messages for the words "Just landed in" and then geocoding and visualizing the placenames. Placemaker makes it a lot simpler to build tools like this with anything that can be expressed as XML or HTML. That covers web pages, REST APIs like Twitter's and even RSS feeds, so you can see why I'm excited!
I've put together a simple example that shows how to call it from a bash script, tested on OS X. You can download it as geturlplaces.zip here, or use the source included below. To run it, pass a web page address as the first argument, e.g. ./geturlplaces http://news.bbc.co.uk/
For production code you'll want a real XML parser rather than the regexes used below.
#!/bin/bash
# Enter your Yahoo geo app id here. To obtain one, go to http://developer.yahoo.com/wsregapp/index.php and register
# (interestingly as of May 20th 2009 it works with a bogus id!)
APPID=XXXXX
# Require exactly one argument: the URL of the page to scan
if [ $# -ne 1 ]
then
  echo "Extract a list of all the recognized place names from a web page using Yahoo's Placemaker API"
  echo "Usage: $(basename "$0") <web page url>"
  exit 65
fi
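# POST the page URL to Placemaker (URL-encoded so any & or = in it survives),
# keep only the lines carrying a place name, strip the CDATA wrapper, and de-duplicate
curl --silent --data-urlencode "documentURL=$1" -d "documentType=text/html&outputType=xml&appid=$APPID" "http://wherein.yahooapis.com/v1/document" \
  | grep '<text><!\[CDATA\[' \
  | sed -e 's/<text><!\[CDATA\[//' -e 's/]]><\/text>//' \
  | sort | uniq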
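If you'd like to poke at the extraction step without hitting the API, here's a minimal sketch that wraps the same grep/sed idea in a function and feeds it some canned XML. The function name extract_places and the sample document are my own inventions for illustration; the sample is not real Placemaker output, it only mimics the one thing the pipeline relies on, namely that each place name sits on its own line wrapped in <text><![CDATA[ ... ]]></text>.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): read XML on stdin
# and print each place name once, sorted.
# Assumes every name is on its own line as <text><![CDATA[...]]></text>.
extract_places() {
    grep '<text><!\[CDATA\[' \
        | sed -e 's|.*<text><!\[CDATA\[||' -e 's|]]></text>.*||' \
        | sort -u
}

# Made-up sample input for illustration only; not real service output.
sample='<place>
<text><![CDATA[London]]></text>
</place>
<place>
<text><![CDATA[Paris]]></text>
</place>
<place>
<text><![CDATA[London]]></text>
</place>'

printf '%s\n' "$sample" | extract_places
```

Run as-is, this prints London and Paris on separate lines, with the duplicate London collapsed. It shares the weakness mentioned earlier: if the service ever put two names on one line or split a tag across lines, the line-based matching would miss them, which is why a proper XML parser is the safer choice for anything serious.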