How to extract and categorize email addresses


Photo by SlightlyLessRandom

An email address can tell you a surprising amount about its owner: which organization they represent, what type of organization it is, and whether it’s a work or personal account. That’s very useful if you want to do automatic contact location in a Spoke-like way (e.g. “who do I know at company X?”), and for the statistical analysis of large email stores that I do in my own Mailana.
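As a rough illustration of the extraction step, here is a minimal Python sketch that pulls addresses out of a block of text and splits off the domain. It is not the code behind the demo; the pattern is deliberately loose, and the names `EMAIL_PATTERN`, `extract_addresses` and `domain_of` are made up for this example.

```python
import re

# Loose pattern for scanning free text. It is an assumption made for this
# sketch and is not a full RFC 5322 validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_addresses(text):
    """Return each distinct address in text, lowercased, in order of first appearance."""
    found = []
    for match in EMAIL_PATTERN.findall(text):
        address = match.lower()
        if address not in found:
            found.append(address)
    return found


def domain_of(address):
    """Return the part of the address after the last '@'."""
    return address.rsplit("@", 1)[1]


if __name__ == "__main__":
    sample = "Contact Jane.Doe@Example.com or bob@mail.example.org."
    for address in extract_addresses(sample):
        print(address, "->", domain_of(address))
```

The domain is the piece that matters for everything that follows, since that is what gets matched against the list of known organizations.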

The key is the 80/20 rule: roughly 80% of emails come from 20% of organizations. That makes it feasible to create a white-list that covers the most common US companies, colleges and ISPs, noting each organization’s type and full name. With Liz’s help, I’ve put together an initial list of 2,200 organizations. Here’s a demonstration of it in practice, or you can enter some addresses into the box below:

You can also download the source and list at

It’s definitely not infallible, but it’s good enough to be useful for my purposes. The more organizations get added, the more accurate it gets, so to add your own, edit the domaininformation.txt file. There’s one line per organization, in this format:

organization domain|display name|type
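To make the format concrete, here is a small Python sketch of how lines like this could be loaded and used to categorize an address. It is not the downloadable source. The sample entries and the type labels (`company`, `college`) are invented for illustration, and the fallback from a subdomain to its parent domain is my assumption about how a lookup might reasonably behave.

```python
def parse_domain_list(lines):
    """Build a {domain: (display_name, org_type)} table from 'domain|name|type' lines."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.split("|")
        if len(parts) != 3:
            continue  # ignore lines that do not have exactly three fields
        domain, name, org_type = (part.strip() for part in parts)
        table[domain.lower()] = (name, org_type)
    return table


def categorize(address, table):
    """Return (display_name, org_type) for an address, or None if the domain is unknown."""
    domain = address.rsplit("@", 1)[-1].lower()
    labels = domain.split(".")
    # Try the full domain first, then each parent domain (assumed behaviour),
    # stopping before the bare top-level domain.
    for start in range(len(labels) - 1):
        candidate = ".".join(labels[start:])
        if candidate in table:
            return table[candidate]
    return None


if __name__ == "__main__":
    # Hypothetical entries; the real domaininformation.txt may use different type names.
    sample_lines = [
        "example.com|Example Corp|company",
        "example.edu|Example University|college",
    ]
    table = parse_domain_list(sample_lines)
    print(categorize("jane@mail.example.edu", table))
    print(categorize("someone@unknown.test", table))
```

With the real file you would pass the open file object straight to `parse_domain_list`, since it only needs something that yields lines.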

Let me know if you do generate a larger list you’re willing to share, and I’ll update the example. Thanks to Christine DeMello for compiling her directory of colleges.

