How to use IMAP as a Gmail API in PHP

Palomarstamp
Photo by Voxphoto

I’ve tended to avoid client/server APIs like IMAP or POP for my mail analysis work, because they’re inherently limited to a single account and a lot of the information I’m interested in comes from looking at an entire organization’s data. Mihai Parparita’s work with MailTrends impressed me though, so I’m going to show you how to access Gmail messages using IMAP as an API. I’ll be using a PHP script, since I have an irrational bias against Python. Something about semantically significant whitespace really gets my goat.

I’ve got a demonstration page up at http://funhousepicture.com/phpgmail/. You’ll need to enter your full gmail address and password if you want to try it out there, or you can download the sourcecode and run it on your own server. I’ve also included it inline below. After connecting, it will fetch all of the headers from your account, along with the full content of the first ten messages. This may take a few seconds

You’ll need PHP with support for the IMAP library enabled to use it yourself. I was surprised to find this wasn’t included by default in the OS X distribution, and after some considerable yak shaving trying to get my own copy of PHP compiled, along with all its dependencies, I gave up doing local development and relied on my hosted Linux server instead. Thankfully that worked right out of the box.

<?php

function gmail_login_page()
{
?>
<html>
<head><title>Gmail summary login</title>
<style type="text/css">body { font-family: arial, sans-serif; margin: 40px;}</style>
</head>
<body>
<div>This page demonstrates how to access your Gmail account using IMAP in PHP. </div><br/>
<div>Enter your full email address and password, and the next page will show a selection of information about your account.</div><br/>
<div>See <a href="http://petewarden.typepad.com/">http://petewarden.typepad.com/</a&gt; for more information.</div><br/>
<hr/><br/>
<div>
<form action="index.php" method="POST">
<input type="text" name="user"> Gmail address<br/>
<input type="password" name="password"> Password<br/>
<br/>
<input type="submit" value="Get summary">
</form>
</div>
<hr/>
</body>
</html>
<?php
}

function gmail_summary_page($user, $password)
{
?>
<html>
<head><title>Gmail summary for <?=$user?></title>
<style type="text/css">body { font-family: arial, sans-serif; margin: 40px;}</style>
</head>
<body>
<?php
   
    $imapaddress = "{imap.gmail.com:993/imap/ssl}";
    $imapmainbox = "INBOX";
    $maxmessagecount = 10;

    display_mail_summary($imapaddress, $imapmainbox, $user, $password, $maxmessagecount);
?>
</body>
</html>
<?php
}

function display_mail_summary($imapaddress, $imapmainbox, $imapuser, $imappassword, $maxmessagecount)
{
    $imapaddressandbox = $imapaddress . $imapmainbox;

    $connection = imap_open ($imapaddressandbox, $imapuser, $imappassword)
        or die("Can’t connect to ‘" . $imapaddress .
        "’ as user ‘" . $imapuser .
        "’ with password ‘" . $imappassword .
        "’: " . imap_last_error());

    echo "<u><h1>Gmail information for " . $imapuser ."</h1></u>";

    echo "<h2>Mailboxes</h2>\n";
    $folders = imap_listmailbox($connection, $imapaddress, "*")
        or die("Can’t list mailboxes: " . imap_last_error());

    foreach ($folders as $val)
        echo $val . "<br />\n";

    echo "<h2>Inbox headers</h2>\n";
    $headers = imap_headers($connection)
        or die("can’t get headers: " . imap_last_error());

    $totalmessagecount = sizeof($headers);

    echo $totalmessagecount . " messages<br/><br/>";

    if ($totalmessagecount<$maxmessagecount)
        $displaycount = $totalmessagecount;
    else
        $displaycount = $maxmessagecount;

    for ($count=1; $count<=$displaycount; $count+=1)
    {
        $headerinfo = imap_headerinfo($connection, $count)
            or die("Couldn’t get header for message " . $count . " : " . imap_last_error());
        $from = $headerinfo->fromaddress;
        $subject = $headerinfo->subject;
        $date = $headerinfo->date;
        echo "<em><u>".$from."</em></u>: ".$subject." – <i>".$date."</i><br />\n";
    }

    echo "<h2>Message bodies</h2>\n";

    for ($count=1; $count<=$displaycount; $count+=1)
    {
        $body = imap_body($connection, $count)
            or die("Can’t fetch body for message " . $count . " : " . imap_last_error());
        echo "<pre>". htmlspecialchars($body) . "</pre><hr/>";
    }

    imap_close($connection);
}

$user = $_POST["user"];
$password = $_POST["password"];

if (!$user or !$password)
    gmail_login_page();
else
    gmail_summary_page($user, $password);

?>

My own private Los Angeles

Gunplay

A friend who lives nearby sent me this photo. It’s pretty mind-blowing that there’s parts of LA where this is a necessary public service announcement, and got me thinking about how I experience the city. When I talked to the recruiter about jobs in the US the only guidance I gave was "anywhere but LA". I had grown up with LA Law, Baywatch and countless movies that left me certain that I’d hate it. Of course, all the interviews he arranged were in LA. I ended up accepting an offer here, with the idea I’d stay maybe a year.

A couple of days after I landed, I pulled out a street map and looked for any big patches of green, in the hope of finding some small place to walk in peace. I was surprised by the size of the blank spaces and picked one that looked promising. Rancho Sierra Vista was only a few minutes from where I was staying, and I found I could walk 9 miles straight through wilderness along Sycamore Canyon, right to the Pacific. Even more amazing was that this was the narrow axis of the parkland, it stretched for over 30 miles from Santa Monica to Camarillo. Ever since then, the Santa Monica Mountains have been my real Los Angeles.

Unlike any other city I’ve lived in, LA is entirely optional. Hardly anyone I know visits the east side, or even the sketchy neighborhoods near Santa Monica. The reliance on freeways means that downtown is a lot less important than you’d expect, with events and attractions scattered through the other hot locales like Hollywood. You can pick and choose which areas you want to visit and miss out on very little. It’s not like London where the center has all of the biggest shops, tourist traps and entertainment, reinforced by the flow of the tube lines. The only place that forces you to come into contact with Angelenos from the whole city is the freeway itself, with Humvees scattered between gardener’s pickups.

I’m not proud of my isolation from the majority of the city, but it does seem characteristic of LA. One of my favorite parts of trail work is getting local kids who have no idea there’s even wilderness on their doorstep excited about the outdoors. Many of their families are as ignorant of the beauty on offer as I was when I arrived, so getting the word out is crucial. The reason I’m writing up the local spots is so anybody who starts an internet search for hiking or camping hears about all the choices. I love my Los Angeles but I want to share it, even if that makes it a little less private.

Analyzing your Gmail

Mailtrends


Mihai Parparita
, a Google developer, has created a system to display information about your email over time. Mail Trends is a python script that connects to your Gmail account through IMAP, and generates a series of tables and graphs showing information about your mail account over time. The time aspect is key, it’s one of the most interesting parts of email, and something that distinguishes it from other implicit data we have access to. He has a demonstration using part of the Enron data set, and you can see the most prolific emailers, subjects and who sends you the most email. I was hoping it would also demonstrate searching by keyword, since being able to look for specific terms is very useful for research in Google Trends and similar buzz tracking sites for the web. One of my goals is to both show graphs of search keywords over time in your mail, in the same way that MarkMail does for its public mailing list search, and also have a animated tag clouds that show the most popular terms as they change over time. I’ll be watching closely for future developments, at least one of the blog commenters understands how this could build into something larger.

On the technical side, using IMAP is a great way to work around the lack of a proper Gmail API. He’s using the Python IMAPLib, I’ll have to look at the equivalents for other languages, since I have an irrational prejudice against any language in which whitespace is significant. Tabs in make files also bother me, but I’ve learnt to live with them. A hat tip to Brad and Googlified for pointing me towards Mail Trends.

How to build your own Facebook server

Sunstorm
Photo by Coccinelle69

In the last post I talked about the mechanics of how an app communicates with Facebook. With the alpha release of Ringside, there’s now an example of how to implement the server side of Facebook. It’s open-source and the two most interesting parts are their underlying mysql database and the PHP interface code that implements the API on top of that. Using mysql makes it hard to scale to massive numbers of users, so it’s not ready to power Facebook yet. On the other hand, having enough users to strain a single database server is a good problem to have. At that point you should have the resources to reimplement something more advanced under the hood.

Having a reference host for any plugin architecture is immensely helpful, especially one that’s open source. For example, if I was having trouble with the details of fetching events, I could open up ringside/api/includes/ringside/api/facebook/EventsGet.php and inspect exactly what their implementation is. There’s no guarantee that it’s the same as Facebook’s code, but it’s at least an unambiguous and exact specification of what somebody else thinks it should be doing. To get your own copy of the source using SVN, run
svn co https://ringside.svn.sourceforge.net/svnroot/ringside ringside

The other exciting part of Ringside’s release is their mysql schema. It could become a defacto standard for expressing the data that underlies all social networks. Anybody who’s able to take their own data source and translate it into the same tables can plug that into Ringside’s system. Turn the key, and you’ve got your own private Facebook. The schema is at ringside/api/config/ringside-schema.sql

If you want to customize it, the API source is full of great examples of how to work with the database to extend its capabilities, though the LGPL licence might require your changes to also be published.

What’s going on under the hood of Facebook’s API?

Clockwork

Photo by fallsroad

Facebook’s API comes wrapped in libraries for all the popular server languages, but there will come a day when you need to debug the raw HTTP transactions that they all boil down to. As a scripting language, the PHP implementation is easy to understand, and I ended up tweaking mine to output the exact text that’s flowing between me and Facebook. This was partly to help debugging, but also for my own curiosity. I’d like to model some of my interfaces on Facebook’s since it’s simple, robust and flexible.

You call a method by sending an HTTP request to "http://api.facebook.com/restserver.php&quot;. Arguments to the method are passed in the POST string sent as part of the request. Here’s an example for an event API call, split up on ampersands so that it won’t go off the edge of the blog, and with any secret values replaced with X:

uid=XXXXXXXXX&
eids=&
start_time=0&
end_time=1000000000000&
rsvp_status=&
method=facebook.events.get&
session_key=XXXXXXXXXXXXXXXXXXXXXXXX_XXXXXXXXX&
api_key=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&
call_id=1206547876.5053
&v=1.0&
sig=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

This is generated by taking the normal PHP arguments to each method, along with stored login and API keys, and serializing them into this string. If CURL is present on the server, this is then used to send the request, otherwise PHP’s native HTTP access functions are used.

Assuming that the call name (specified in "method") and the other arguments check out, then the Facebook server will return a string as its response. This string is in XML, and looks something like this:

<?xml version="1.0" encoding="UTF_8"?>
<events_get_response xmlns="http://api.facebook.com/1.0/&quot;
xmlns:xsi="http://www.w3.org/2001/XMLSchema_instance&quot;
xsi:schemaLocation="http://api.facebook.com/1.0/ http://api.facebook.com/1.0/facebook.xsd&quot;
list="true">
  <event>
    <eid>5172087276</eid>
    <name>Blog World Expo example</name>
    <tagline>http://www.blogworldexpo.com/</tagline&gt;
    <nid>0</nid>
    <pic>http://profile.ak.facebook.com/object2/5/55/s5172087276_7478.jpg</pic&gt;
    <pic_big>http://profile.ak.facebook.com/object2/5/55/n5172087276_7478.jpg</pic_big&gt;
    <pic_small>http://profile.ak.facebook.com/object2/5/55/t5172087276_7478.jpg</pic_small&gt;
    <host>BlogWorld</host>

… <snip …
</event>
</events_get_response>

The library then takes this simple XML string, and parses it into a PHP hierarchical array of values that looks like this:

Array
(
    [0] => Array
        (
            [eid] => 5172087276
            [name] => Blog World Expo example
            [tagline] => http://www.blogworldexpo.com/
            [nid] => 0
            [pic] => http://profile.ak.facebook.com/object2/5/55/s5172087276_7478.jpg
            [pic_big] => http://profile.ak.facebook.com/object2/5/55/n5172087276_7478.jpg
            [pic_small] => http://profile.ak.facebook.com/object2/5/55/t5172087276_7478.jpg
            [host] => BlogWorld

… <snip> …
        )
)

This always matches the structure of the XML. Facebook use a restricted subset that avoids tag attributes and anything else that might make it hard to map to this JSON style format.

Another possibility is that an error will be returned. In that case, the XML will normally just be a couple of tags, the error message string and the numeric error code. This gets converted to a PHP exception.

To dig into this code yourself, I recommend looking through facebookapi_php5_restlib.php in the client folder of the Facebook SDK. That’s a good place to add your own debugging code too, though there’s already some that can be enabled by setting the $GLOBALS[‘facebook_config’][‘debug’] variable to true.

How to convert mbox files to an Outlook pst

Kenlay
Photo by MotherPie

[Update- There's now a good alternative that includes separate PSTs for each user]

After getting the Enron emails into the mbox format, the next step was to convert them into something that the Outlook/Exchange world can understand. Thankfully I already had a great conversion program in mind, Aid4Mail. At its core its a translator between a large number of mail formats, including Outlook, Outlook Express, Windows Mail, Eudora, Thunderbird, Netscape Messenger, Pegasus Mail and a whole bunch of generic formats including several mbox variants. It can read and write to all of these formats, and has a large number of options to transform the mail as you do so. For example you can choose to only convert mails sent between certain dates, or to ignore attachments. If you're working with mail, I highly recommend giving this program a try, it's the swiss army knife of email tools.

To do the Enron conversion, I selected generic unix mbox as the input format. On the next screen I navigated to the root folder that contained all my files, and then chose Outlook pst as the destination type. I left all the other options at their defaults, so no filtering was done and the folder hierarchy was preserved. It took around 16 hours to process all 500,000 messages, and the pst file came out at around 5 GB.

I'm able to open it in Outlook and browse through the messages, and can also add them to my Exchange server. There are some issues, it doesn't preserve the original user structure, since they're all in one pst, attachments aren't included, and some of the addresses are obsfucated. It's good enough to give me the testbed I need to put some of my tools through some real-world stress tests.

Once the upload has finished, you should be able to access the pst yourself at
http://funhousepicture.com/enron.pst
It's 5 GB, so it won't be all there for a good few hours, and be prepared for a long download time.

The joy of nearly being eaten

Kingsnake_2

After growing up in Britain, where the apex predator is the badger, I feel lucky to be living where there’s truly wild wildlife. There’s something about the knowledge that you could be eaten or poisoned around the next corner to add an edge of alertness to any trip. The possible downside is being somebody’s next meal, but the certain upside is appreciating you’re in a true wilderness.

Liz once saw the rear end of a mountain lion disappear down the trail, but I’ve had to content myself with plenty of bob cats, coyotes and rattlesnakes. Two weeks ago, we even had a rattler who refused to leave our worksite, so he watched us warily for a few hours. Above you can see me relocating a harmless California King Snake after our maintenance had disturbed its home. Below are a few more of the lovely beasties we’ve encountered.

Scorpion_2

It’s not unusual to come across these small scorpions when you turn over a rock. So far nobody’s been stung, and from what I understand our local variety aren’t too poisonous anyway. It makes me feel like I’m in a western every time I spot one though.

Blackwidow1_2

This action shot is a Black Widow in our back yard. We seem to have dozens around the outside of the house, they have the most beautiful sleek black bodies, with the distinctive red hourglass marking. We don’t have many closeup photos of them for obvious reasons.

Walkingstick_2

I’m not too worried about this Walking Stick insect eating me, but he’s one of the coolest designs I’ve seen in a long time. He’s definitely got the Apple elegance about him, the MacBook Air of the insect world.

How to convert individual email files to mbox

Pearls
Pearls by Matuko Amini

I need to load my Exchange server with a large set of real emails, so I can simulate how my tools will work on a big organization’s mail. The best data set out there is the Enron collection, but since most researchers are doing static analysis, it’s only available in easy-to-process forms like a mysql database or as individual files. There’s no obvious way to turn them back into something that can be imported into Outlook or Exchange.

I needed a way to get it into a form that standard mail programs would recognize. The easiest format to convert individual files to is mbox. In this setup, a set of email messages is stored in a single file ending with .mbox. Within each file, messages are seperated by a "From line". This consists of the characters ‘F’, ‘r’, ‘o’, ‘m’, ‘ ‘, followed by an email address and a date in asctime format. Each of these from lines must be preceded by a blank line. To make sure there’s no confusion with message content, any line beginning "From " in the body of a message must be changed to ">From ".

Since this all involves heavy text processing, I turned to Perl. Here’s a copy of my mailconvert.pl script, and I’ve included it inline at the bottom. It will take a directory hierarchy of individual email files, and for each folder will create a mailbox.mbox that contains all of the messages in that folder. It recognises emails by the inclusion of a "From: " header, and uses that address and the date header to create a complete from line seperator. Run it with the current working directory set to the root of the hierarchy. For example, cd to inside the maildir if you’re trying to convert the files extracted from the Enron tar.

I’ve tested with Apple Mail, and I’m able to import the files this generates. It’s a bit eerie seeing all the Enron mails show up in my inbox, and it’s a good reminder that these are messages that the senders never intended to be public. If you do use these mails yourself, please be respectful of their privacy.

Once you’re in mbox there’s a lot of tools available to convert them to Microsoft-friendly formats like psts. I’ll be covering those in a future article, along with some enhancements like grabbing the attachments and keeping the folder structure from the originals.

#!/usr/bin/perl
use strict;
use warnings;
use Cwd;
use POSIX;
use File::Find;
use Date::Parse;

# You need a date in the from line, though it seems redundant with the headers.
# Without a date there, Apple Mail at least won't parse the mbox files, so pick
# an arbitrary value to put in there if we don't find a header.
my $datedefault = "Tue, 18 Mar 2008 12:11:51";

# The name of the mbox file created from all the messages in the directory
my $outputfilename = "mailbox.mbox";

# Empty the file
open(OUTPUT, "> $outputfilename");
close(OUTPUT);

my $count = 0;

find(\&findcallback, cwd);

# This is called back for every file found, and appends the contents to the
# main mbox file for that directory, together with a from line of the format
# "From <email address> <asctime format date" and a blank line.
sub findcallback
{
my $file = $File::Find::name;

# If it's not a file then don't do anything
return unless -f $file;

# Avoid processing the output file
if ($file eq $outputfilename)
{
return;
}

open F, $file or print "couldn't open $file\n" && return;

my $text = "";
my $from = "";
my $date = "";
while (<F>)
{
my $line = $_;
# If this line is a From: header, and we haven't found one before, then
# grab the address to use in the "From " seperator between mail messages
if( ($from eq "") and ($line =~ /^From: .*$/) )
{
$from = $line;
$from =~ s/^From: /From /;
# remove the new line
$from =~ s/[\r\n]//g;
}
elsif ($line =~ /^From .*$/)
{
# If there's a line that looks like a "From " seperator, add a > to
# prevent it messing up the mbox parsing
$line = ">" . $line;
}

# If this is a Date: header, then grab the value to use after the address
if( ($date eq "") and ($line =~ /^Date: .*$/) )
{
my $inputdate = $line;
$inputdate =~ s/^Date: //g;

my $datevalue = str2time($inputdate);
$date = POSIX::gmtime($datevalue);
}
$text .= $line;
}

close F;

# If no date header was found, pick an arbitrary one with the correct format
if ($date eq "")
{
$date = $datedefault;
}

# Work out the final string if this looks like a valid mail file
if ((length($text)>0) and (length($from)>0))
{
my $outputstring .= $from . " " . $date . "\n" . $text . "\r\n\r\n";

open(OUTPUT, ">> $outputfilename");
print OUTPUT $outputstring;
close(OUTPUT);
}

}

A Facebook Ajax Example

Ajax
Photo of the original Ajax by Oboulko

One of the toughest parts of the Facebook API is their Ajax support. There’s a good page on their wiki with a small piece of sample code, but since Event Connector uses Ajax heavily, I thought it would be a good real-world example. Here’s the PHP source code.

I’ve removed the application settings from config.php, so you’ll need to create your own application in Facebook and follow the same steps you do for the Footprints sample before you can use it. There’s some inline comments explain the control flow, and covering some of the Ajax quirks. One thing to be aware of is the 10 second time-out in all Facebook page requests. If you’re doing any heavy work on the server, or it could get overloaded, you’ll need a strategy to prevent your users seeing an error screen, which is exactly why I went with Ajax for this situation.

More Facebook API posts

How X1 approaches enterprise search

Skyscrapers
Photo by 2Create

X1 are best known for their desktop search tool, but they offer an enterprise-wide solution that tries to integrate a lot of different data sources to allow searches that cover all of a company’s information. It mostly sounds very similar to Google’s search appliance, but they do have an interesting architecture that includes an Exchange component. It uses server-side MAPI, which limits it to Exchange 2003 and earlier unless you download the optional MAPI components for 2007. There’s also no mention of hooking into the event model, so I would be curious to know how much of a lag there is between a message arriving, and it being indexed. For my email search I’m working on Exchange Web Services support, since that’s the supported 2007 API to replace MAPI, and trying to get real-time access to the data by hooking into the Exchange event model.

It sounds as if they’re focusing more on the enterprise side of the world, after a recent change of management and a switch to a paid model for their desktop client. Back in November they mentioned signing up 60 large companies as customers for their enterprise service, which sounds promising, especially alongside their 40,000 desktop downloads at $50 each.