Where am I, who am I?


"Queequeg was a native of Rokovoko, an island far away to the West and South. It is not down in any map; true places never are."

Where am I right now? Depending on who I'm talking to, I'm in SoMa, San Francisco, South Park, the City, or the Bay Area. What neighborhood is my apartment in? Craigslist had it down as Castro when it was listed. Long-time locals often describe it as Duboce Triangle, but people less concerned with fine differences lump it into the Lower Haight, since I'm only two blocks from Haight Street.

When I first started working with geographic data, I imagined this was a problem to be solved. There had to be a way to cut through the confusion and find a true definition, a clear answer to the question "Where am I?"

What I've come to realize over the last few years is that geography is a folksonomy. Sure, there are political boundaries, but the only ones people pay much attention to are states and countries. City limits have little effect on how people describe where they live. Just take a look at this map of Los Angeles' official boundaries:


There's clearly little correlation between the legal city boundaries and how people describe the place that they live. You could argue that Los Angeles County is the correct region to use, but then people way out in the desert by Littlerock would be included!

The arbitrary and human nature of places is even more pronounced with neighborhoods. As I showed above, there's a surprising amount of consensus on the names of the neighborhoods, but almost none on their boundaries.

Why do I care about all this? Anyone processing this kind of data needs to recognize that forcing whatever the user types into the 'Location' box into a standardized form loses information. For example, knowing how somebody naturally describes where they are is going to be a lot more useful for grouping people together than a street address or latitude/longitude coordinates. If I choose the Lower Haight label, I'm more likely to be a hippie or a punk; if I pick the Castro, I probably want to identify with its gay image; and if I go for the Mission, I'm associating myself with hipsters.
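Here's a minimal sketch of what keeping that information might look like in practice, assuming each user record is just a hypothetical `(user_id, location_text)` pair. The helper names `normalize_label` and `group_by_self_description` are made up for illustration. The only cleanup applied is trimming whitespace and folding case; nothing is mapped onto official boundaries, and the user's original wording is stored alongside each entry.

```python
from collections import defaultdict


def normalize_label(raw: str) -> str:
    # Light-touch cleanup only (hypothetical helper): collapse whitespace and
    # fold case so "Lower Haight" and "lower haight " land in the same group.
    # No attempt is made to map the label onto official boundaries.
    return " ".join(raw.split()).casefold()


def group_by_self_description(users):
    """Group users by how they describe their own location.

    `users` is an iterable of (user_id, location_text) pairs, an assumed
    shape standing in for whatever a real profile record holds.
    """
    groups = defaultdict(list)
    for user_id, location_text in users:
        key = normalize_label(location_text)
        # Keep the exact text the user typed next to their id, so none of
        # the information in their self-description is thrown away.
        groups[key].append((user_id, location_text))
    return dict(groups)


users = [
    ("alice", "Lower Haight"),
    ("bob", "Duboce Triangle"),
    ("carol", "lower haight "),
    ("dave", "The Castro"),
]
grouped = group_by_self_description(users)
```

In this made-up data, Alice and Bob could live on the same street, yet they end up in different groups because they chose different labels, which is exactly the distinction a coordinate-based grouping would erase.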

I'm glad Twitter has stuck with its free-form text fields, and I hope Facebook will become more flexible. Don't throw this data away; treasure it! Free text makes it a lot harder for machines to deal with the content people produce, but unless you're shipping packages or targeting ICBMs, the payoff of richer knowledge about your users is worth it.
