Photo by Stephanie Booth
When I designed the Mailana architecture I built my data pipeline around an XML format capturing the message information I need. That meant I could support a wide variety of sources simply by writing a single import component for each source that translates its native format into my XML. That's worked out really well, letting me pull in data directly from Exchange servers, Outlook PST files, Gmail and other IMAP services, and of course from Twitter.
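To make the "one importer per source, one shared XML format" idea concrete, here's a rough Python sketch. The element names (`messages`, `message`, `from`, `to`, `subject`, `date`), the `Message` fields and the `Importer`/`ListImporter` classes are all hypothetical stand-ins, not Mailana's actual schema or code. The point is that each importer only has to yield messages, and a single serializer turns them into XML for the rest of the pipeline.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Message:
    """Source-independent view of one message (hypothetical fields)."""
    sender: str
    recipients: List[str]
    subject: str
    date: str  # kept as the source's date string for simplicity


class Importer(ABC):
    """One importer per source; each one only knows its own native format."""

    @abstractmethod
    def messages(self) -> Iterable[Message]:
        ...


class ListImporter(Importer):
    """Toy importer over in-memory dicts, standing in for a PST, IMAP or Twitter reader."""

    def __init__(self, records):
        self.records = records

    def messages(self):
        for r in self.records:
            yield Message(r["from"], list(r["to"]), r.get("subject", ""), r["date"])


def to_xml(importer: Importer) -> str:
    """Serialize whatever an importer yields into the shared XML format."""
    root = ET.Element("messages")
    for m in importer.messages():
        el = ET.SubElement(root, "message")
        ET.SubElement(el, "from").text = m.sender
        for r in m.recipients:
            ET.SubElement(el, "to").text = r
        ET.SubElement(el, "subject").text = m.subject
        ET.SubElement(el, "date").text = m.date
    return ET.tostring(root, encoding="unicode")
```

Adding a new source then means writing one more `Importer` subclass; nothing downstream of `to_xml` has to change.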
I've been having some fascinating chats with Pete Sheinbaum, and one thing he's been enthusiastic about is tapping into the mass market by grabbing communications data that isn't easily accessible. In practice that means screen-scraping and other unconventional techniques, all of which are immensely appealing to my subversive geeky streak (see my old GoogleHotKeys project) and would be easy to integrate into Mailana as import components. Here are some of my favorite approaches to grabbing email:
Yahoo IMAP Spoofing
Normally you can only get IMAP or POP access to your Yahoo inbox if you upgrade to a premium account. Last year they introduced their Zimbra desktop client, which works even with free accounts, and it wasn't long before some enterprising coders discovered it was using a slightly modified version of IMAP. To access any Yahoo email account programmatically, free or premium, all you need to do is send one non-standard command!
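Here's a hedged sketch of what that looks like from Python's standard `imaplib`. The assumptions are mine, not from a Yahoo document: that the extra command is the IMAP `ID` extension (RFC 2971) carrying a `GUID` parameter, that the value `"1"` is what was reported to work at the time, that it's sent before `LOGIN`, and that `imap.mail.yahoo.com` is the right host. Yahoo's requirements may well have changed since. `imaplib` has no dedicated `ID` method, so the sketch uses `xatom`, which sends an arbitrary extension command. `id_command_line` is just a helper showing the raw line that goes over the wire.

```python
import imaplib

YAHOO_HOST = "imap.mail.yahoo.com"  # assumption: IMAP host of the time
ID_ARGS = '("GUID" "1")'  # assumption: the ID parameters reported to unlock free accounts


def id_command_line(tag: str, args: str = ID_ARGS) -> bytes:
    """Build the raw IMAP line a client would send, e.g. b'a1 ID (...)\\r\\n'."""
    return f"{tag} ID {args}\r\n".encode("ascii")


def open_yahoo_inbox(user: str, password: str) -> imaplib.IMAP4_SSL:
    """Connect, send the non-standard ID command, then log in normally."""
    conn = imaplib.IMAP4_SSL(YAHOO_HOST)
    # xatom registers "ID" as an allowed command and sends it with our arguments
    typ, _ = conn.xatom("ID", ID_ARGS)
    if typ != "OK":
        raise RuntimeError("server rejected the ID command")
    conn.login(user, password)
    conn.select("INBOX", readonly=True)  # read-only so scraping never alters the mailbox
    return conn
```

From there the connection behaves like any other `imaplib` session, so an existing IMAP import component could reuse it unchanged.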
Outlook Web Access Screen Scraping
The Substandard Evil Genius has a nice little snippet for logging into OWA and grabbing the HTML for an email inbox. Parsing that would give you the headers for a page of emails, and then you could grab the links to download each message's content. It's definitely tougher than using a real API, but with some time and care it's very feasible to pull down everything from an Outlook account.
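As a rough illustration of the parsing step, here's a Python sketch that pulls per-message links out of an inbox page once you already have its HTML. It doesn't cover logging in or fetching. The assumption that message links end in `.EML` (as in older Exchange-era OWA) is mine and hypothetical, so the suffix is a parameter; real OWA markup varies by version and will need inspecting by hand.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MessageLinkParser(HTMLParser):
    """Collect hrefs that look like links to individual messages."""

    def __init__(self, base_url: str, suffix: str = ".EML"):
        super().__init__()
        self.base_url = base_url
        self.suffix = suffix.lower()  # hypothetical marker for message URLs
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the suffix
        if href and href.lower().split("?", 1)[0].endswith(self.suffix):
            full = urljoin(self.base_url, href)  # resolve relative links
            if full not in self.links:  # inbox pages often link a message twice
                self.links.append(full)


def message_links(page: str, base_url: str) -> list:
    """Return absolute message URLs found in one page of inbox HTML, in order."""
    parser = MessageLinkParser(base_url)
    parser.feed(page)
    parser.close()
    return parser.links
```

Each returned URL could then be downloaded with the same logged-in session and handed to an import component; paging through the inbox is just a matter of repeating this on each page.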
TrueSwitch's Uber Screen Scraper
I hadn't run across TrueSwitch until recently, but they're a fascinating company. Their purpose in life is to let people transfer everything from old to new email accounts, including all messages and contacts. What rocks is that they use screen scraping to support all the main email services, even those like Hotmail that only have a webmail interface. Amusingly, they're also used by all the main email providers to make it easy for users to switch to them, even those like Yahoo and MSN that deliberately don't offer an API, presumably to make it harder for users to switch away!
They don't offer an API themselves, but their service demonstrates that it's possible to access almost everyone's email by screen scraping if you're willing to invest the time and effort.