Google, Yahoo and MSN Mail APIs

Whilst there’s no official Gmail API, there is an unsupported but widely used de facto standard: the functions used by mobile phones to access Google mail. Luckily for me, a lot of work has already been done to figure out the format and protocol; probably the best documentation is the source of the libgmailer PHP project. The downside of it being unofficial is that it keeps getting broken by Google’s changes, but there’s an active community using it who seem to patch it up again very quickly.

Yahoo actually has an official mail API, but it suffers from a couple of serious flaws. First there’s this language at the start of the documentation: "You may not use the Yahoo! Mail Web Service API to mine or scrape user data from the user’s Yahoo! account." Umm, so I can access the data but can’t ‘mine’ or ‘scrape’ it, whatever that means? Does that include creating a social graph from their mailbox? It certainly sounds like it.

Secondly, some basic functions like GetMessage to grab information about an individual email are only available to premium accounts. I’d imagine that would instantly cut down the potential audience by an order of magnitude.

MSN/Hotmail used to have a nice undocumented API through WebDAV/HttpMail. Unfortunately they shut down access to non-premium customers in apparent response to spammers. There are reports (bottom of article) that it’s still possible to use it to download messages, just not send them, but I haven’t tested that. It looks like the only alternative is screen-scraping.

This is a great example of the ‘separate data silos with unusable content’ problem that Doc Searls discussed in his Defrag talk. The user could gain a lot from allowing other services access to their mail, for example decent external mail integration into Facebook, but it’s not in the interest of any of the companies that physically hold their data to allow that.

Lots of interesting mail/social graph buzz

As Brad says, it is pretty obvious once you connect the dots, but I was still interested to see the NY Times article about the big players looking at their email services, and figuring out they’re not that far from having their own social networks.

It was good to learn about a new site covering this thanks to the comments section of Feld Thoughts: Email Dashboard. I’ll also need to write up what I learnt about Trampoline Systems and ClearContext at Defrag, but that’s for another post.

Defrag has arrived!

I flew into Denver this morning, and even though Defrag doesn’t officially start until tomorrow, I’ve already had a couple of early meet-ups with some of the local folks. It was fun seeing Rob and Josh from EventVue in the flesh for the first time, and hearing about all their hard work. They’ve been running at full steam since May, and I hope they get a chance for a break soon.

I also made an interesting discovery: Denver has two Hyatt hotels just a couple of blocks from each other, the Grand Hyatt and the Hyatt Regency. I only found this out after I’d dropped my car at the Grand’s valet parking and tried to check in! Luckily I was able to make it to the right one without further misadventure.

Funhouse Photo User Count: 2,097 total, 111 active. Ticking up gradually, with some good weekend active numbers.

Event Connector User Count: 109 total, 4 active.

Beautiful data

With the mass of raw data I’m getting from a couple of years of my own email, I’m looking around for a good way to turn that into information. A simple ranking of my closest contacts is a good start, but I want to also see how much of the real-life groupings between others can be revealed. I’m working on a basic force-directed graph implementation, but that still leaves a lot of display choices.
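
To give a flavor of what a basic force-directed layout involves, here is a minimal Python sketch. It is not the implementation described in this post; it assumes a spring-style scheme in the spirit of Fruchterman-Reingold, where every pair of nodes repels and linked nodes attract in proportion to a weight. The function name `force_layout`, the `(a, b, weight)` edge format, and the cooling schedule are all made up for illustration.

```python
import math
import random


def force_layout(edges, iterations=200, width=1.0, seed=0):
    """Return {node: (x, y)} for a list of (a, b, weight) edges (hypothetical format)."""
    rng = random.Random(seed)
    nodes = sorted({n for a, b, _ in edges for n in (a, b)})
    pos = {n: [rng.uniform(0, width), rng.uniform(0, width)] for n in nodes}
    k = width / math.sqrt(max(len(nodes), 1))  # rough ideal spacing between nodes

    for step in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Every pair of nodes pushes apart, more strongly when close.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force

        # Linked nodes pull together, scaled by the edge weight.
        for a, b, weight in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = weight * dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force

        # Cap each move by a "temperature" that cools linearly to zero.
        temp = width * 0.1 * (1 - step / iterations)
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy) or 1e-9
            move = min(length, temp)
            pos[n][0] += dx / length * move
            pos[n][1] += dy / length * move

    return {n: (p[0], p[1]) for n, p in pos.items()}


# Example: a frequent correspondent should end up nearer than an occasional one.
layout = force_layout([("me", "alice", 10.0), ("me", "bob", 1.0)])
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

The all-pairs repulsion loop is quadratic in the number of contacts, which is fine for a few hundred nodes but would need a spatial grid or similar trick for anything much larger. The display choices (color, node size, labels) are a separate problem that this sketch does not touch.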

VisualComplexity.com is one of my favorite places to find inspiration. They’ve done a great job collecting some of the most striking methods of presenting graph data visually. I also enjoy the Data Mining blog. Matthew’s a great resource and he’s good at reminding me to focus on getting something useful from my visualizations, not just pretty pictures. He’s headed to Defrag, so I hope I’ll get a chance to say hello.

Funhouse Photo User Count: 2,042 total, 52 active. Steady growth, but a low active count.

Event Connector User Count: 106 total, 13 active. A miniature growth spurt over the last day or two, with a comparatively large number of engaged users.

Beetlejuice, Beetlejuice, Beetlejuice

When I was 16, I’d sneak into a University of Cambridge computer lab, with the desire to download Amiga mods and demos from ftp sites. This was pre-web, but I owned The New Hacker’s Dictionary (aka the Jargon File), and had been very excited by the concepts of email and Usenet. What blew me away when I actually tried them was that the person who wrote my assembler reference was on the ARM newsgroup and answering questions from mere mortals like me!

I was reminded of that yesterday when Matt Brezina from Xobni left a comment here after I reviewed their product. I’m old enough to remember a time before the internet, when people just couldn’t connect like that. Now there are services that let you know when someone talks about something you’re tracking, and start a conversation. That’s what I find really interesting about working with a social graph: building tools that help people build relationships.

That computer lab also led to me living in an Alaskan treehouse for three months, but that’s a story for another post!

Funhouse Photo User Count: 2,003 total, 73 active. Broke 2000!

Event Connector User Count: 100 total, 5 active. Broke 100, which isn’t quite as exciting.

Competing mail graph services

My idea of deriving a social graph from your email messages isn’t new. In a simple form, sites like Facebook already pull your contacts from webmail sites to build a first draft of your friends list. Unfortunately, your contacts list is a poor map of your actual relationships.

SNARF is a Microsoft project from 2005 which analyzes your email in a much more sophisticated way, and uses this information to help you triage your messages. It has a flexible system of metrics, for example the number of emails you’d sent someone, to calculate an importance score for each contact. Emails from important people are displayed in a top priority area of the UI, away from the less important bacn.
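
As a rough illustration of the kind of metric-based scoring described above, here is a small Python sketch. It is not SNARF's actual algorithm, which isn't spelled out here; the function name `importance_scores`, the `(sender, recipients)` message format, and the two weights are assumptions chosen purely for the example. The only idea taken from the description is that mail you send to someone counts towards their importance.

```python
from collections import Counter


def importance_scores(messages, me, sent_weight=2.0, received_weight=1.0):
    """Rank contacts from (sender, [recipients]) pairs (hypothetical format).

    Mail I sent to someone counts more than mail they sent to me,
    on the assumption that people write to the contacts they care about.
    """
    scores = Counter()
    for sender, recipients in messages:
        if sender == me:
            for recipient in recipients:
                if recipient != me:
                    scores[recipient] += sent_weight
        elif me in recipients:
            scores[sender] += received_weight
    # Highest score first.
    return scores.most_common()


messages = [
    ("me@example.com", ["alice@example.com"]),
    ("me@example.com", ["alice@example.com", "bob@example.com"]),
    ("bob@example.com", ["me@example.com"]),
    ("newsletter@example.com", ["me@example.com"]),
]
for contact, score in importance_scores(messages, "me@example.com"):
    print(contact, score)
```

A sender you never reply to, like the newsletter address in the example, stays near the bottom of the ranking, which is the behavior you would want for separating real correspondents from bacn.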

Xobni is a more recent Outlook add-in that analyzes your messages. The slick UI gives you a lot of interesting ways to drill down into the relationships it finds. It gives you fast searching, raw stats on each contact’s mailing patterns, and lots more. Its automatic phone number extraction from messages and the display of attachments by sender look particularly useful.

Looking further afield, there’s some overlap with spam protection services. Spam guards attempt to exclude a particular class of email by analyzing messages. I don’t know of any that go beyond checking if you’ve sent someone an email before in their social analysis, but it seems like a natural direction for some of those companies to head in.

Yahoo and Google both have access to the raw information from millions of mail users, so if they see an advantage in this sort of mail-based social graph, they could create something really compelling. One interesting area is the APIs they offer, which might be enough for a third-party developer to at least show a proof-of-concept demo, though it would no doubt be against the ToS.

What I’ve yet to see is a really painful problem that is solved by any of these services. Xobni is the closest, but it still feels like a great set of additional features for Outlook, not a make-the-pain-stop solution that people will pay for.

I’m certain there are some great ways to solve problems using a mail-based social graph, I just need to find them!

Funhouse Photo User Count: 1,998 total, 73 active. Almost up to 2000 total, with no work on it for the last month.

Event Connector User Count: 91 total, 5 active. A few new users still coming in through the directory, but no conference sign-ups.

Accessing mail with Extended MAPI

In the first part of this series, I gave an overview of the APIs you can use to access Outlook mail, along with directions for how to get started with Visual Basic for Applications.

As I mentioned, using the Outlook Object Model/VBA is slow and throws up a lot of security warnings. I need to process a lot of mail messages, and I need a language that will handle both heavy data processing and advanced graphics for the work I’ll need to do to turn the messages into a graph. This means I need a non-interpreted language with a lot of libraries, such as Java, C# or C++. Luckily, Extended MAPI is an old-school COM API that fits well with C++, and avoids the security restrictions.

One thing I didn’t realize about MAPI is that it’s usable even if you’re not running as an Outlook plugin. It’s designed to let third-party programs interact with the user’s mail store, and is available as long as Outlook has been installed on the system. This OutlookCode.com page is the most in-depth description and bibliography I’ve found.

The down-side of MAPI is that it’s COM, offers a lot of functionality, and has been implemented slightly differently by each version of Office. This means that using it can be tough: just getting to the ‘hello world’ level requires a lot of work, and past that there are a lot of pits with spikes at the bottom to wander into. What I needed was a good code example to get me started.

Lucian Wischik came to the rescue, with some wonderful sample code demonstrating exactly how to use MAPI, along with some utility functions to simplify access, ample commentary explaining what was going on, and a completely public domain license for use. Amazingly, this is the third time in recent months that I’ve found what I was looking for on Lucian’s site. He also has the best samples on the web for writing a BHO without ATL, and hosting IE web rendering inside your own window. He must be accruing some serious good karma.

With Lucian’s help, it’s not too hard to create your own standalone executable that enumerates Outlook folders and items, and extracts information from them. Unfortunately, it’s not always possible to pull out the full HTML body of the message, but I’m mostly interested in the sender and recipients of each email, which are easily accessible. It runs a lot faster than the equivalent VBA code, and doesn’t bring up any security dialogs.

Funhouse Photo User Count: 1,892 total, 101 active. Almost exactly the same as last night, so I must be looking at the same sample window in the stats.

Event Connector User Count: 82 total, 4 active. Also very similar.

Outlook plugin basics

My current mission is to create a social graph from my mailbox. I need to get access to the raw data on my email habits, and the easiest platform for that is Outlook, since it offers a plugin API. I had no clue how to get started with it though, and most articles I could find assumed some level of familiarity with the technology, so here’s a guide covering what I discovered on the absolute basics.

The first thing I discovered is that, as with standards, the great thing about Outlook APIs is that there are so many to choose from! The second discovery was that the flood of email viruses using those APIs a few years ago led Microsoft to barricade most of them behind heavy security. Luckily, there are a lot of people writing Outlook code, and so there are a lot of examples and resources to learn from. The best starting point is http://outlookcode.com/, but plain Google searching will usually throw up some answers to even fairly obscure questions.

Collaboration Data Objects (CDO) are a legacy interface API, and using them will often bring up scary security dialogs, since Outlook assumes you’re doing something malicious.

The Outlook Object Model (OOM) is the most common way of working with the data held by Outlook. It’s the interface you use by default when writing Visual Basic for Applications macros within the app. It’s well documented and supported, though you have to watch out for differences between app versions. Unfortunately for us, it’s also hedged in by security warning dialogs and restrictions. Because you generally have to make an API call for each mail item you want to deal with, it can be very slow if you want to handle thousands of them at once.

Extended MAPI is an advanced interface to the mail store in Outlook, and it’s most commonly used from C++ or other compiled languages. The good news is that you’ve got access to almost any information, with no security UI. The bad news is that it’s not well documented, and is tricky to work with. Because you can access lots of items at once, without the overhead of an interpreted call every time, it can be a lot faster when you’re working with large numbers of emails.

Redemption is a third-party library that combines a lot of the advantages of Extended MAPI and OOM. Under the hood it uses MAPI, and so it’s fast and avoids security warnings, but it exposes an interface that’s a carbon copy of OOM, which makes it simple to use. It’s a real feat of engineering by Dmitry Streblechenko, and he’s also been very prolific in answering Outlook programming questions. At only $200 for an unlimited distribution version, it’s a great deal.

Getting started prototyping your code is as simple as pressing Alt+F11 in Outlook. This will bring up a separate VBA code window. If you select the session in the left pane, then take any of the simple examples out there and paste them in, you should be able to run them by hitting F5.

To make Redemption available, run its installer and then it should appear in the code editor’s Tools->References dialog box as an option. Then just reference its objects like any of the built-in types.

I’ve managed to create a simple prototype that loops through all the messages in a folder and outputs the basic information I need to build my graph to a text file. It’s slow, probably only a few dozen messages a second, but it should be good enough to let me create the foundation for a social graph. I’m moving over several years’ worth of email now, so I’ll have a good data-set to work with.
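
Once a dump like that exists, turning it into graph edges is the easy part. The Python sketch below shows one way it might go. The file layout is purely hypothetical, since the prototype's output format isn't described here: it assumes one message per line, with the sender, a tab, and then a semicolon-separated list of recipients. The function name `read_edges` is likewise made up for the example.

```python
from collections import Counter


def read_edges(lines):
    """Count messages exchanged between each pair of addresses.

    Assumes a hypothetical dump format: "sender<TAB>recipient1;recipient2".
    Addresses are lower-cased so that Alice@ and alice@ count as one person.
    """
    edges = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sender, _, recipient_field = line.partition("\t")
        sender = sender.strip().lower()
        for recipient in recipient_field.split(";"):
            recipient = recipient.strip().lower()
            if not recipient or recipient == sender:
                continue  # ignore empty fields and mail sent to yourself
            # Sort the pair so A->B and B->A land on the same undirected edge.
            edges[tuple(sorted((sender, recipient)))] += 1
    return edges


sample = [
    "me@example.com\talice@example.com;bob@example.com",
    "Alice@example.com\tme@example.com",
]
for (a, b), count in read_edges(sample).most_common():
    print(a, b, count)
```

Treating the graph as undirected throws away who initiated each exchange. If that matters, keeping `(sender, recipient)` as an ordered pair instead of sorting it would preserve the direction.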

Funhouse Photo User Count: 1,847 total, 93 active. I had a new review, requesting larger photos. That’s an interesting idea; if more people had photos in their albums, that could be a compelling feature. I also saw the stats showing more people removing than adding the app, which doesn’t seem to match the total figures!

Event Connector User Count: 77 total, 4 active. Still very quiet.