PostPath’s drop-in replacement for Exchange

Path
Photo by Rogilde

Everyone knows there has to be a better hub for your mail than the current Exchange, but you can’t adopt a system like Zimbra or Open Exchange without making changes to all your desktop and mobile clients. (Correction: Gray pointed out that ActiveSync means no mobile headache!) So I was excited to see PostPath, a server solution that aims to leave the rest of your communications world untouched. They emulate a whole series of Microsoft’s proprietary communication APIs, and they’ve done it the hard way, by sniffing network packets, since they were tackling this before the latest Open Specification releases.

They seem to have done an impressive job emulating Exchange, with a good crop of real-world deployments showing that it’s production quality. There are some things I’d like to figure out, like how much of Exchange you need to keep around for Active Directory and management tools, but I’m looking forward to downloading the evaluation version and seeing for myself.

How to improve Gmail

Lab
Photo by Michael Bonnet Jr

I’m a big fan of Google Mail. I’ve moved most of my family over to it, and I use it for my own accounts. They offer a lot of tools for searching and filtering your email, their browser interface is top-notch with advanced support for things like hot-keys, and they support APIs like IMAP so you can easily connect to non-web devices. I’m eagerly anticipating the day that they apply the same smarts they use to process web data to all of that information in my inbox, but the service seems to have stood still since it was launched.

My hopes were raised when I saw the launch of Gmail Labs, but I was disappointed when I looked through the experimental tools available. They were all fairly minor UI tweaks, things like removing the chat sidebar or the unread item counts. I was looking forward to seeing some funky analytical magic, things like Xoopit’s innovative attachment display, or extracting your social network from your mail history.

I’m not sure why Google is being so slow to innovate with Gmail. Part of it may be technical: doing that sort of analytics requires a lot of database work, and that may be too resource-intensive and scary for the spare-time Labs model to produce results. They may also be worried about the negative effects on their reputation if they’re seen to be data-mining people’s emails. It makes sense for them to focus on attracting users to their service. If anything like Xoopit does become popular, they can imitate it and rely on their existing customer base to be a big barrier to any smaller competitors. Hotmail and Yahoo could pose a threat thanks to their larger user count, but they seem even less likely to do something radical and new.

To move forward, I agree with Marshall Kirkpatrick that Gmail should offer an API for email content, one that doesn’t require users to hand over their passwords like IMAP. Imagine all the fun stuff that a Facebook-style plugin API could offer to mail users, operating securely within a Google sandbox to limit the malicious possibilities. If the reputation risks of that are too scary, they could make progress with an internal push to do something similar, encouraging their mail developers to move beyond incremental improvements and really sink their teeth into some red meat innovation.

I think the biggest barrier is the perception of email as boring, which leads to few resources being devoted to it, which leads to few innovations, which makes it appear boring. Hopefully services like Xoopit and experiments like Mail Trends will break that cycle by opening people’s eyes to the possibilities.

Mailbox quotas, and why businesses don’t use Gmail

Subway
Photo by Gullevek

Google Mail offers me 6.7 GB of storage and counting. Your IT department probably sends nagging emails once you hit 200 MB. Why?

Your corporate mail runs on Exchange. Microsoft makes money by selling licenses for Exchange to run on a single machine. Gmail runs on the Google operating system, so the work to deal with your mail is spread across an arbitrary network of machines and storage. They’re already dealing with insane storage and performance requirements to run search, and they can largely reuse the same technology for mail.

With Exchange, dealing with scaling is left to your local IT administrator. They buy a single fast machine with speedy RAID drives, and any upgrade to capacity or speed means moving everything over to new hardware. This makes it vital to avoid filling up the disk, and the only way to guarantee that is to put a hard limit on the size any user’s mailbox can reach. It’s not only disk capacity that’s constrained: the high cost of CPU cycles in this setup means that new algorithms to do interesting things are hard to justify.

This all sounds like a textbook case for software-as-a-service. So why aren’t all companies running on Gmail? There are things Microsoft offers that are missing from the Google services, but I think the real blocker is the business importance of email. When mail stops working, modern businesses are crippled. For something that vital, CEOs want to know there’s someone they can fire if it breaks. With an outsourced service, especially one with Google’s consumer focus, it’s hard to feel confident that someone will be sufficiently motivated to solve your problem right now.

Why use email as an interface?

Piping_2
Photo by VoxPhoto

There are some great examples out there of using email as the gateway to a service. I Want Sandy is a fantastic automated personal assistant that you drive entirely through email. You send emails containing natural language details of your events and lists, and you get back timely reminders and updates. Posterous lets you email files and documents directly to a website, with an incredibly streamlined interface.

So why do they use email as an interface, rather than the web?

Everybody can email. You don’t have to teach anyone a new web interface. You type in a mail, choose an address and hit send.

Mail programs make great content. You can easily attach files, add text styles and include photos. If I forget and hit Command-B in Firefox while I’m writing a blog post, my text doesn’t get bolded, I just get to see the bookmarks sidebar. Email programs get this right: they give you drag and drop and hot-keys, and they let you create good-looking documents easily.

Email is everywhere. Sure, most devices also have the web, but they usually have a much better UI for mail.

Email contains everything. Outlook is the center of most professional lives, and personal email already has most of the information, files and pictures you want to share. Being able to do interesting things with all of that without stepping outside of your mail service is really convenient. All of your history with any service is stored in the same place you keep everything else.

So how can you tap into that power? I don’t know what Sandy and Posterous are using, but GoodServer looks like an intriguing solution. It’s a Java library that implements an IMAP server that you can then plug your custom application logic into. They’ve got good documentation, a free evaluation copy, and it’s been battle-tested by a lot of commercial outfits.

Cross-platform Exchange connectivity with Moonrug

Cables
Photo by Melissa Morano

Thanks to a Gmail ad I recently discovered Moonrug Software. They offer a Java-based library that uses the MAPI network protocol to interface with any Exchange server. This is the same way that Outlook connects to Exchange, so it has the potential to support everything Outlook has access to, including calendar and contact information. This makes it a lot more comprehensive than basic email protocols like IMAP.

I’ve exchanged a few emails with Moonrug’s founder, and they’re still rolling out their full package, but they have recently released a sample demonstrating synchronization with Exchange. It’s good to see someone figuring out a cornerstone of the Exchange connectivity puzzle. Traditionally Microsoft has tried to maintain a competitive advantage by keeping its mail ecosystem as closed as possible. In theory that’s changing with the new Windows Open Protocols initiative. In practice they’ve not yet got around to releasing the really juicy details of things like the MAPI network protocol, so you’re stuck trying to reverse engineer them instead. Moonrug have been working on that approach for the last couple of years, long before the protocol initiative was announced.

Their product should be a great alternative to trying to do the same yourself, helping to open up the Exchange world to some real innovations.

How to fix illegal character errors in PHP XML parsing

Stop
Photo by Intimaj

I’m still plagued by occasional failures in my XML parsing due to illegal characters. Explicitly setting the character encoding reduced the frequency, but they still pop up occasionally. I’ve tried a couple of techniques. One is to use iconv() to strip out any illegal characters for the set I’m using, e.g.

$output = iconv("ISO-8859-1", "ISO-8859-1//IGNORE", $input);

This apparently works with more complex Unicode sets, but at the moment I’m sticking with an 8-bit character encoding. The problem is that all values correspond to a defined character in ISO-8859-1. It took some head-scratching to realize that ISO-8859-1 is not the same as ISO 8859-1! The extra hyphen after ISO denotes an extended version that includes values in the range 0x00 to 0x1f, 0x7f and 0x80 to 0x9f. This fills up the range of mapped values, so that any number between 0 and 255 corresponds to a valid character in ISO-8859-1, and the line above does nothing.

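So, in theory that will fix Unicode encodings, but I need something that will handle the characters that are valid in ISO-8859-1 but that aren’t allowed by the XML spec. These are the control characters in the range 0x00 to 0x1f, with the exception of tab (0x09), line feed (0x0a) and carriage return (0x0d), which XML 1.0 does allow. I also remove 0x7f, which the spec discourages rather than forbids. To strip these you can run a regular expression substitution that looks something like this:

s/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]//g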
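
For reference, the same substitution can be sketched in PHP with preg_replace(). This is only an illustration: the helper name strip_xml_control_chars() is hypothetical, and it assumes you want to keep tab, line feed and carriage return, which XML 1.0 accepts. It works byte by byte, so it suits a single-byte encoding such as ISO-8859-1.

```php
<?php
// Hypothetical helper: removes the control characters that XML 1.0 rejects
// (0x00-0x08, 0x0B, 0x0C, 0x0E-0x1F) plus 0x7F, and leaves tab (0x09),
// line feed (0x0A) and carriage return (0x0D) untouched.
function strip_xml_control_chars(string $input): string
{
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $input);
}

// Example input with a NUL, a unit separator and a DEL mixed into the text.
$dirty = "<msg>Hello\x00 wor\x1Fld\x7F\n\tok</msg>";
$clean = strip_xml_control_chars($dirty);
echo $clean, "\n";
```

Run the cleaned string through the XML parser as before; anything that still fails is a different class of problem from illegal control characters.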

I had a large file on disk that I wanted to change, so I actually used sed and its control character class shorthand:

sed 's/[[:cntrl:]]//g' messages.xml > messages.xml.fixed

This solved the illegal character error I was hitting. Now I’m hitting "XML error: EntityRef: expecting ‘;’ at line 451837", and inspection of the text hasn’t helped me figure out what’s wrong yet. At least I’ve got a lot further through the file.

Even more ways to speed up IMAP Gmail importing in PHP

Bomberos
Photo by Zerega

In my last two articles on importing mail from Google in PHP I thought I’d got performance up to a pretty high level, but once I started testing with mailboxes with over 30,000 mails, I realized I had to be more creative.

The main trick I discovered in that investigation is using imap_fetch_overview() to get information on a lot of messages at once. This is a lot faster than grabbing the full header info for a single message at a time using imap_headerinfo(). The downside is that it doesn’t return as much information about each message. For me the most painful loss was that you only get the first recipient. Another wrinkle is that you don’t get the sender information separated into the email address and display portions, you just get a single string that may contain either both, or just the address. I had to write my own regex parser to pull out the two components.
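
A rough sketch of the batched approach follows. It is not the contents of the sample zip: overview_ranges() and fetch_all_overviews() are hypothetical names, the batch size of 500 is an arbitrary assumption, and the second function needs PHP’s imap extension and a connection already opened with imap_open().

```php
<?php
// Split message numbers 1..$total into IMAP sequence ranges such as "1:500".
// The default batch size is an assumption, not a measured optimum.
function overview_ranges(int $total, int $batchSize = 500): array
{
    $ranges = [];
    for ($start = 1; $start <= $total; $start += $batchSize) {
        $end = min($start + $batchSize - 1, $total);
        $ranges[] = "$start:$end";
    }
    return $ranges;
}

// Collect summary rows for every message, one batch per round trip.
// $imap is a connection returned by imap_open().
function fetch_all_overviews($imap): array
{
    $rows = [];
    foreach (overview_ranges(imap_num_msg($imap)) as $range) {
        // imap_fetch_overview() returns an array of objects, or false on failure.
        $batch = imap_fetch_overview($imap, $range);
        if ($batch === false) {
            continue;
        }
        foreach ($batch as $overview) {
            // Any of these properties can be missing on a given message.
            $rows[] = [
                'from'    => $overview->from ?? '',
                'subject' => $overview->subject ?? '',
                'date'    => $overview->date ?? '',
            ];
        }
    }
    return $rows;
}
```

Fetching in ranges rather than one giant request keeps memory use bounded on very large mailboxes, at the cost of a few extra round trips.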

I’ve updated my sample code to use the overview function, and it includes the code to split up the combined sender string too. You can try it online, or download it as evenfasterphpgmail.zip. The sender parsing code is also included below:

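// Split a combined sender string such as 'Jane Doe <jane@example.com>'
// into its email address and display name.
function extract_address_from_display($full)
{
    // Case 1: a display name followed by an address in angle brackets.
    $found = preg_match(
        "/(.*)<[^._a-zA-Z0-9-]*([._a-zA-Z0-9-]+@[._a-zA-Z0-9-]+).*>/",
        $full, $matches);
    if ($found)
    {
        $address = $matches[2];
        // Drop the whitespace and quotes that usually surround the name.
        $display = trim($matches[1], " \t\"");
    }
    else
    {
        // Case 2: a bare address with no display name.
        $found = preg_match(
            "/[._a-zA-Z0-9-]+@[._a-zA-Z0-9-]+/",
            $full, $matches);
        if ($found)
        {
            $address = $matches[0];
            $display = $address;
        }
        else
        {
            // No address found, so return the raw string as the display name.
            $address = "";
            $display = $full;
        }
    }

    return array("address" => $address, "display" => $display);
}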

Speed up your Gmail IMAP downloading

Launch
Photo by IslandBoy

Now that I’m getting deeper into using the IMAP API to pull email from Google, I’m hitting a lot of performance issues. Most of them are on the parsing and database loading side, but while profiling I did discover a few ways I was using IMAP inefficiently. I’ve updated my original PHP/Gmail example with some optimizations. The main speed boost came from no longer grabbing all the email headers with imap_headers() just to get the total number of messages in the mailbox. That’s very inefficient, especially on large mailboxes. Instead I just call imap_num_msg() to get the count directly, and that’s much faster. Another wrinkle was asking for the INBOX mailbox to get all the messages. If the user has organized their mail into different folders, it’s better to open [Gmail]/All Mail to get the complete set of non-spam email, though you do also get the sent mail as part of that.
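
As a minimal sketch of those two changes (not the code in the zip): gmail_mailbox() and count_all_mail() are hypothetical names, the example assumes PHP’s imap extension is installed, and it assumes the account allows IMAP logins with a username and password.

```php
<?php
// Build the mailbox string that imap_open() expects for a Gmail folder.
function gmail_mailbox(string $folder = '[Gmail]/All Mail'): string
{
    return '{imap.gmail.com:993/imap/ssl}' . $folder;
}

// Count the messages in All Mail without downloading any headers.
// Returns null if the connection or the count fails.
function count_all_mail(string $user, string $password): ?int
{
    $imap = imap_open(gmail_mailbox(), $user, $password);
    if ($imap === false) {
        return null;
    }
    // imap_num_msg() asks the server for the count directly.
    $count = imap_num_msg($imap);
    imap_close($imap);
    return $count === false ? null : $count;
}
```

Passing 'INBOX' to gmail_mailbox() gives the old behavior, which is handy for comparing timings between the two folders.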

Here’s the source code as a zip, or you can give it a try online. Big thanks to Rob and Josh at EventVue for trying some of this out on their mailboxes too, they’ve been a fantastic help.

My medium can beat up your medium

Scarybloke
Scary bloke by AphasiaFilms

I recently indulged in some arm-waving about how email is the Big Daddy of message systems, despite all the glamorous alternatives taking the spotlight. To back this up with some data, I set out to get some rough global usage figures for the top text-based mediums out there: email, SMS, Facebook, IM, blogs and Twitter.

  • Facebook has over 70 million active users. As a closed system, it’s hard to work out the message frequency, but around 2 a day seems plausible to me. That would mean around 50 billion sent each year.
  • The comScore global IM user count is 800 million. Guessing again an average of 2 messages a day, that’s 600 billion messages a year.
  • SenderBase indicates that there’s around 3 billion non-spam emails a day. That’s around 1 trillion messages annually.
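
The annual totals above are just the daily figures multiplied by 365 and rounded. A small sketch of that arithmetic, where annual_volume() is a hypothetical helper and the two-messages-a-day rates are the guesses stated in the list:

```php
<?php
// Convert a daily message count into a yearly total.
function annual_volume(float $perDay): float
{
    return $perDay * 365;
}

$totals = [
    'Facebook' => annual_volume(70000000 * 2),   // 70 million users, guessed 2 messages a day
    'IM'       => annual_volume(800000000 * 2),  // 800 million users, guessed 2 messages a day
    'Email'    => annual_volume(3000000000),     // 3 billion non-spam emails a day
];

foreach ($totals as $name => $total) {
    echo $name, ': ', number_format($total / 1e9, 1), " billion a year\n";
}
```

Changing the guessed per-user rates shifts the Facebook and IM totals proportionally, so the ranking only flips if those guesses are off by an order of magnitude.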

Email, SMS and IM are the clear winners in raw volume. It does lead me to wonder about the driving forces behind choosing which system to use.

Privacy is obviously important. Tomi Ahonen has a great comment on this story where he talks about kids using SMS to friends in the same room, not for convenience but because a clandestine communication channel is a powerful social bonding tool. There’s a widespread assumption that openness is both good and inevitable, but we’re just primates at heart, and sharing secrets is one equivalent of picking fleas off each other’s backs.

Using the raw numbers like this is obviously unfair. I put a lot more time into an average blog post than an email, my Facebook messages have more content than my IMs, and I seldom use anything but email for business communications. Even so, the statistics make a strong case that despite their growth, other systems will take a long time to pass email.

Is email dying?

Dummy
Photo by TCMHitchhiker

Technologies die pretty frequently. When I first logged on in 1992, Usenet was the place to be, with thousands of high-quality, high-traffic discussion groups hosted on an open system. Its very openness destroyed it as a mainstream technology, with first the Eternal September when AOL allowed access to a large group of people unfamiliar with the voluntary netiquette required to keep it functioning, and then the first Green Card Lottery spam that signalled the start of the battle against unsolicited ads on open networks. Usenet retains some bright spots, and I still love rec.arts.sf.written for its quality and depth, but the vast majority of discussions now happen on website forums, and most users have never heard of newsgroups.

It’s tempting to see an analogy with internet mail. It’s another open system that suffers from bad actors abusing its lack of restrictions. Even with Gmail’s spam filter I still end up with about an email a month incorrectly marked as spam, and there’s no way to verify anyone’s identity, easily guarantee security or prioritize messages from people you know. This makes Facebook’s system very alluring if everyone you want to talk to is on there. Twitter is in an interesting space between IM and email too, with the potential for interesting consequences as people adapt to its rules.

A lot of the commenters on Seth’s post argue that traditional email isn’t ever going away because it has such a massive network effect advantage over any closed system. I don’t agree. I think that it will remain as the lowest common denominator of internet communication, but a new service like Facebook’s that had a large user base could offer a high quality experience for proprietary communications but also fall back to internet mail for talking to anyone who’s not signed up. Even today most companies’ Exchange setups offer extra features for internal communications, like a global address book allowing you to just type in a name rather than a full address.

The big reason I’m working in the email area is that writing notes to other people is never going away, but the existing options are very limited. Internet email is by social convention a private medium, and it’s non-realtime. IM is also private but real-time. Twittering is public and real-time. Blogging is public and non-realtime. At their heart they’re all about writing down your thoughts and communicating them to other people. So why can’t I take an interesting email discussion and add it to my blog? Email an IM discussion to someone else who might be interested, and seamlessly continue the conversation through both email and IM?

The need for private, non-realtime electronic messages isn’t going away, and internet mail will remain alive as part of that, but there will be an increasing number of services that offer a higher quality experience.