PostPath’s drop-in replacement for Exchange

Path
Photo by Rogilde

Everyone knows there has to be a better hub for your mail than the current Exchange, but you can’t adopt a system like Zimbra or Open Exchange without making changes to all your desktop and mobile clients. (Correction: Gray pointed out that ActiveSync means there’s no mobile headache!) So I was excited to see PostPath, a server solution that aims to leave the rest of your communications world untouched. They emulate a whole series of Microsoft’s proprietary communication APIs, and they’ve done it the hard way, by sniffing network packets, since they were tackling this before the latest Open Specification releases.

They seem to have done an impressive job emulating Exchange, with a good crop of real-world deployments showing that it’s at production quality. There are some things I’d like to figure out, like how much of Exchange you need to keep around for Active Directory and management tools, but I’m looking forward to downloading the evaluation version and seeing for myself.

How to improve Gmail

Lab
Photo by Michael Bonnet Jr

I’m a big fan of Google Mail. I’ve moved most of my family over to it, and I use it for my own accounts. They offer a lot of tools for searching and filtering your email, their browser interface is top-notch, with advanced support for things like hot-keys, and they support protocols like IMAP so you can easily connect from non-web devices. I’m eagerly anticipating the day that they apply the same smarts they use to process web data to all of that information in my inbox, but the service seems to have stood still since it was launched.

My hopes were raised when I saw the launch of Gmail Labs, but I was disappointed when I looked through the experimental tools available. They were all fairly minor UI tweaks, things like removing the chat sidebar or the unread items counts. I was looking forward to seeing some funky analytical magic, things like Xoopit’s innovative attachment display, or extracting your social network from your mail history.

I’m not sure why Google is being so slow to innovate with Gmail. Part of it may be technical: doing that sort of analytics requires a lot of database work, and that may be too resource-intensive and scary for the spare-time Labs model to produce results. They may also be worried about the negative effects on their reputation if they’re seen to be data-mining people’s emails. It makes sense for them to focus on attracting users to their service. If anything like Xoopit does become popular, they can imitate it and rely on their existing customer base to be a big barrier to any smaller competitors. Hotmail and Yahoo could pose a threat thanks to their larger user counts, but they seem even less likely to do something radical and new.

To move forward, I agree with Marshall Kirkpatrick that Gmail should offer an API for email content, one that doesn’t require users to hand over their passwords the way IMAP does. Imagine all the fun stuff that a Facebook-style plugin API could offer to mail users, operating securely within a Google sandbox to limit the malicious possibilities. If the reputation risks of that are too scary, they could make progress with an internal push to do something similar, encouraging their mail developers to move beyond incremental improvements and really sink their teeth into some red meat innovation.

I think the biggest barrier is the perception of email as boring, which leads to few resources being devoted to it, which leads to few innovations, which makes it appear boring. Hopefully services like Xoopit and experiments like Mail Trends will break that cycle by opening people’s eyes to the possibilities.

Mailbox quotas, and why businesses don’t use Gmail

Subway
Photo by Gullevek

Google Mail offers me 6.7 GB of storage and counting. Your IT department probably sends nagging emails once you hit 200 MB. Why?

Your corporate mail runs on Exchange. Microsoft makes money by selling licenses for Exchange to run on a single machine. Gmail runs on the Google operating system, so the work to deal with your mail is spread across an arbitrary network of machines and storage. They’re already dealing with insane storage and performance requirements to run search, and they can largely reuse the same technology for mail.

With Exchange, dealing with scaling is left to your local IT administrator. They buy a single fast machine with speedy RAID drives, and any upgrade to capacity or speed means moving everything over to new hardware. This makes it vital to avoid filling up the disk, and the only way to guarantee that is to put a hard limit on the size any user’s mailbox can reach. It’s not only disk capacity that’s constrained. The high cost of CPU cycles in this setup means that new algorithms to do interesting things are hard to justify.

This all sounds like a textbook case for software-as-a-service. So why aren’t all companies running on Gmail? There are things missing from the Google services that Microsoft offers, but I think the real blocker is the business importance of email. When mail stops working, modern businesses are crippled. For something that vital, CEOs want to know there’s someone they can fire if it breaks. With an outsourced service, especially one with Google’s consumer focus, it’s hard to feel confident that someone will be sufficiently motivated to solve your problem right now.

Why use email as an interface?

Piping_2
Photo by VoxPhoto

There are some great examples out there of using email as the gateway to a service. I Want Sandy is a fantastic automated personal assistant that you drive entirely through email. You send emails containing natural language details of your events and lists, and you get back timely reminders and updates. Posterous lets you email files and documents directly to a website, with an incredibly streamlined interface.

So why do they use email as an interface, rather than the web?

Everybody can email. You don’t have to teach anyone a new web interface. You type in a mail, choose an address and hit send.

Mail programs make great content. You can easily attach files, add text styles and include photos. If I forget and hit Command-B in Firefox while I’m writing a blog post, my text doesn’t get bolded, I just get to see the bookmarks sidebar. Email programs get this right: they give you drag and drop and hot-keys, and they let you create good-looking documents easily.

Email is everywhere. Sure, most devices also have the web, but they usually have a much better UI for mail.

Email contains everything. Outlook is the center of most professional lives, and personal email already has most of the information, files and pictures you want to share. Being able to do interesting things with all of that without stepping outside of your mail service is really convenient. All of your history with any service is stored in the same place you keep everything else.

So how can you tap into that power? I don’t know what Sandy and Posterous are using, but GoodServer looks like an intriguing solution. It’s a Java library that implements an IMAP server that you can then plug your custom application logic into. They’ve got good documentation, a free evaluation copy, and it’s been battle-tested by a lot of commercial outfits.

Cross-platform Exchange connectivity with Moonrug

Cables
Photo by Melissa Morano

Thanks to a Gmail ad I recently discovered Moonrug Software. They offer a Java-based library that uses the MAPI network protocol to interface with any Exchange server. This is the same way that Outlook connects to Exchange, so it has the potential to support everything Outlook has access to, including calendar and contact information. This makes it a lot more comprehensive than basic email protocols like IMAP.

I’ve exchanged a few emails with Moonrug’s founder, and they’re still rolling out their full package, but they have recently released a sample demonstrating synchronization with Exchange. It’s good to see someone figuring out a cornerstone of the Exchange connectivity puzzle. Traditionally Microsoft has tried to maintain a competitive advantage by keeping its mail ecosystem as closed as possible. In theory that’s changing with the new Windows Open Protocols initiative. In practice they’ve not yet got around to releasing the really juicy details of things like the MAPI network protocol, so you’re stuck trying to reverse engineer them instead. Moonrug have been working on that approach for the last couple of years, long before the protocol initiative was announced.

Their product should be a great alternative to trying to do the same yourself, helping to open up the Exchange world to some real innovations.

How to fix illegal character errors in PHP XML parsing

Stop
Photo by Intimaj

I’m still plagued by occasional failures in my XML parsing due to illegal characters. Explicitly setting the character encoding reduced the frequency, but they’re still popping up occasionally. I’ve tried a couple of techniques. One is to use iconv() to strip out any characters that are illegal in the set I’m using, e.g.

$output = iconv("ISO-8859-1", "ISO-8859-1//IGNORE", $input);

This apparently works with more complex Unicode sets, but at the moment I’m sticking with an 8-bit character encoding. The problem is that all values correspond to a defined character in ISO-8859-1. It took some head-scratching to realize that ISO-8859-1 is not the same as ISO 8859-1! The extra hyphen after ISO denotes an extended version that includes values in the ranges 0x00 to 0x1f, 0x7f and 0x80 to 0x9f. This fills up the range of mapped values, so that any number between 0 and 255 corresponds to a valid character in ISO-8859-1, and the line above does nothing.
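A quick way to check that claim is to push every possible byte value through the same iconv() call and see whether anything is dropped. This is only a sketch, and the variable names are mine. It assumes your PHP build’s iconv accepts the //IGNORE suffix for this conversion, which I’d expect with glibc but which can vary between iconv implementations.

```php
<?php
// Build a string containing every possible byte value, 0x00 through 0xff.
$all_bytes = '';
for ($i = 0; $i < 256; $i++)
{
    $all_bytes .= chr($i);
}

// Same conversion as the line above, applied to all 256 byte values.
$converted = iconv("ISO-8859-1", "ISO-8859-1//IGNORE", $all_bytes);

// $unchanged is true when iconv() dropped nothing, i.e. every byte value
// was treated as a legal ISO-8859-1 character.
$unchanged = ($converted === $all_bytes);
var_dump($unchanged);
```

If the reasoning above holds, nothing is stripped and the comparison comes back true, which is why this call can’t help with control characters.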

So, in theory that will fix Unicode encodings, but I need something that will handle the characters that are valid in ISO-8859-1 but that aren’t allowed by the XML spec. These are the control characters in the range 0x00 to 0x1f, apart from tab (0x09), line feed (0x0a) and carriage return (0x0d), which XML does permit. The spec also discourages 0x7f, so I strip that too. To replace these you can run a regular expression that looks something like this:

s/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]//g
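In PHP the same substitution can be done with preg_replace(). This is a minimal sketch rather than the code I actually ran: the function name strip_xml_control_chars() is made up for illustration, and it assumes a single-byte encoding like ISO-8859-1. It leaves tab, line feed and carriage return alone, since XML 1.0 permits those three control characters.

```php
<?php
// Hypothetical helper: removes the control bytes that XML 1.0 rejects in a
// single-byte encoding such as ISO-8859-1, plus 0x7f. Tab (0x09), line feed
// (0x0a) and carriage return (0x0d) are kept because XML allows them.
function strip_xml_control_chars($input)
{
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $input);
}

// Made-up sample input containing a NUL and a BEL byte.
$dirty = "<msg>bad\x00 bell\x07 ok\ttab</msg>\n";
$clean = strip_xml_control_chars($dirty);
```

Running the string through this before handing it to the XML parser keeps the fix inside the import script, at the cost of holding the whole document in memory.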

I had a large file on disk that I wanted to change, so I actually used sed and its control character class shorthand (note that this class also removes tabs and carriage returns):

sed 's/[[:cntrl:]]//g' messages.xml > messages.xml.fixed

This solved the illegal character error I was hitting. Now I’m hitting "XML error: EntityRef: expecting ‘;’ at line 451837", and inspection of the text hasn’t helped me figure out what’s wrong yet. At least I’ve got a lot further through the file.
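One thing worth checking, though I haven’t confirmed it’s the cause in this file: that message is what the parser typically reports when it meets an ampersand that isn’t part of a complete entity reference, such as a bare & in a company name. The sketch below escapes only those stray ampersands. The function name escape_bare_ampersands() and the sample string are hypothetical.

```php
<?php
// Hypothetical helper: escapes any "&" that does not start something shaped
// like an entity reference (&name;, &#123; or &#x1F;).
function escape_bare_ampersands($xml)
{
    return preg_replace(
        '/&(?!(?:[A-Za-z_][A-Za-z0-9_.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/',
        '&amp;',
        $xml);
}

// Made-up sample: two bare ampersands and two well-formed references.
$fixed = escape_bare_ampersands("<from>AT&T &amp; R&D &#38; co</from>");
```

This only checks the shape of each reference. A named entity that the document never declares would pass through untouched and could still trip the parser.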

Even more ways to speed up IMAP Gmail importing in PHP

Bomberos
Photo by Zerega

In my last two articles on importing mail from Google in PHP I thought I’d got performance up to a pretty high level, but once I started testing with mailboxes with over 30,000 mails, I realized I had to be more creative.

The main trick I discovered in that investigation is using imap_fetch_overview() to get information on a lot of messages at once. This is a lot faster than grabbing the full header info for a single message at a time using imap_headerinfo(). The downside is that it doesn’t return as much information about each message. For me the most painful loss was that you only get the first recipient. Another wrinkle is that you don’t get the sender information separated into the email address and display portions, you just get a single string that may contain either both, or just the address. I had to write my own regex parser to pull out the two components.
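As a rough illustration of the batching idea, here is a sketch that asks for overviews in fixed-size ranges rather than one message at a time. It isn’t lifted from my sample code: the helper names and the batch size of 500 are assumptions for the example, and $mbox stands for a connection you have already opened with imap_open().

```php
<?php
// Hypothetical helper: splits message numbers 1..$total into IMAP sequence
// ranges like "1:500", so each overview request covers one batch.
// $batch_size must be a positive integer.
function overview_batch_ranges($total, $batch_size)
{
    $ranges = array();
    for ($start = 1; $start <= $total; $start += $batch_size)
    {
        $end = min($start + $batch_size - 1, $total);
        $ranges[] = $start . ":" . $end;
    }
    return $ranges;
}

// Sketch of the fetch loop; $mbox is a connection returned by imap_open().
function fetch_all_overviews($mbox, $batch_size = 500)
{
    $overviews = array();
    $ranges = overview_batch_ranges(imap_num_msg($mbox), $batch_size);
    foreach ($ranges as $range)
    {
        // One request returns an array of overview objects for the batch.
        $batch = imap_fetch_overview($mbox, $range, 0);
        if ($batch !== false)
        {
            foreach ($batch as $overview)
            {
                $overviews[] = $overview;
            }
        }
    }
    return $overviews;
}

// Example of the ranges generated for a 1,200 message mailbox.
$ranges = overview_batch_ranges(1200, 500);
```

Each overview object carries fields such as subject, from, to and date, which is where the combined sender string comes from.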

I’ve updated my sample code to use the overview function, and it includes the code to split up the combined sender string too. You can try it online, or download it as evenfasterphpgmail.zip. The sender parsing code is also included below:

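// Splits a combined sender string such as "Pete <pete@example.com>" into
// its email address and display name.
function extract_address_from_display($full)
{
    // First try the "Display Name <address>" form.
    $matchcount = preg_match(
        "/(.*)<[^._a-zA-Z0-9-]*([._a-zA-Z0-9-]+@[._a-zA-Z0-9-]+).*>/",
        $full, $matches);
    if ($matchcount)
    {
        $address = $matches[2];
        // Drop the whitespace left between the name and the "<".
        $display = trim($matches[1]);
    }
    else
    {
        // Otherwise look for a bare address and use it as the display name too.
        $matchcount = preg_match(
            "/[._a-zA-Z0-9-]+@[._a-zA-Z0-9-]+/",
            $full, $matches);
        if ($matchcount)
        {
            $address = $matches[0];
            $display = $address;
        }
        else
        {
            // No address found: return the raw string as the display name.
            $address = "";
            $display = $full;
        }
    }

    return array("address" => $address, "display" => $display);
}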