How to use IMAP as a Gmail API in PHP

Palomarstamp
Photo by Voxphoto

I’ve tended to avoid client/server APIs like IMAP or POP for my mail analysis work, because they’re inherently limited to a single account and a lot of the information I’m interested in comes from looking at an entire organization’s data. Mihai Parparita’s work with MailTrends impressed me though, so I’m going to show you how to access Gmail messages using IMAP as an API. I’ll be using a PHP script, since I have an irrational bias against Python. Something about semantically significant whitespace really gets my goat.

I’ve got a demonstration page up at http://funhousepicture.com/phpgmail/. You’ll need to enter your full Gmail address and password if you want to try it out there, or you can download the source code and run it on your own server. I’ve also included it inline below. After connecting, it will fetch all of the headers from your inbox, along with the full content of the first ten messages. This may take a few seconds.

You’ll need PHP with support for the IMAP library enabled to use it yourself. I was surprised to find this wasn’t included by default in the OS X distribution, and after some considerable yak shaving trying to get my own copy of PHP compiled, along with all its dependencies, I gave up doing local development and relied on my hosted Linux server instead. Thankfully that worked right out of the box.
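Before running the script, it's worth confirming that your PHP build actually has the IMAP functions available. This is only a minimal sketch; the helper name imap_support_available is my own invention, not part of the script below.

```php
<?php
// Returns true when the IMAP extension is loaded and imap_open() is callable.
// (Hypothetical helper name, used purely for illustration.)
function imap_support_available()
{
    return extension_loaded('imap') && function_exists('imap_open');
}

if (imap_support_available()) {
    echo "IMAP support is enabled\n";
} else {
    echo "IMAP support is missing; install or enable the PHP IMAP extension\n";
}
```

If the second message appears, the calls to imap_open in the script below will fail with an undefined function error.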

<?php

function gmail_login_page()
{
?>
<html>
<head><title>Gmail summary login</title>
<style type="text/css">body { font-family: arial, sans-serif; margin: 40px;}</style>
</head>
<body>
<div>This page demonstrates how to access your Gmail account using IMAP in PHP. </div><br/>
<div>Enter your full email address and password, and the next page will show a selection of information about your account.</div><br/>
<div>See <a href="http://petewarden.typepad.com/">http://petewarden.typepad.com/</a> for more information.</div><br/>
<hr/><br/>
<div>
<form action="index.php" method="POST">
<input type="text" name="user"> Gmail address<br/>
<input type="password" name="password"> Password<br/>
<br/>
<input type="submit" value="Get summary">
</form>
</div>
<hr/>
</body>
</html>
<?php
}

function gmail_summary_page($user, $password)
{
?>
<html>
<head><title>Gmail summary for <?php echo htmlspecialchars($user); ?></title>
<style type="text/css">body { font-family: arial, sans-serif; margin: 40px;}</style>
</head>
<body>
<?php
   
    $imapaddress = "{imap.gmail.com:993/imap/ssl}";
    $imapmainbox = "INBOX";
    $maxmessagecount = 10;

    display_mail_summary($imapaddress, $imapmainbox, $user, $password, $maxmessagecount);
?>
</body>
</html>
<?php
}

function display_mail_summary($imapaddress, $imapmainbox, $imapuser, $imappassword, $maxmessagecount)
{
    $imapaddressandbox = $imapaddress . $imapmainbox;

    // Never echo the password back to the browser in an error message
    $connection = imap_open($imapaddressandbox, $imapuser, $imappassword)
        or die("Can’t connect to ‘" . $imapaddress .
        "’ as user ‘" . htmlspecialchars($imapuser) .
        "’: " . imap_last_error());

    echo "<h1><u>Gmail information for " . htmlspecialchars($imapuser) . "</u></h1>";

    echo "<h2>Mailboxes</h2>\n";
    $folders = imap_listmailbox($connection, $imapaddress, "*")
        or die("Can’t list mailboxes: " . imap_last_error());

    foreach ($folders as $val)
        echo $val . "<br />\n";

    echo "<h2>Inbox headers</h2>\n";
    $headers = imap_headers($connection)
        or die("can’t get headers: " . imap_last_error());

    $totalmessagecount = sizeof($headers);

    echo $totalmessagecount . " messages<br/><br/>";

    if ($totalmessagecount<$maxmessagecount)
        $displaycount = $totalmessagecount;
    else
        $displaycount = $maxmessagecount;

    for ($count=1; $count<=$displaycount; $count+=1)
    {
        $headerinfo = imap_headerinfo($connection, $count)
            or die("Couldn’t get header for message " . $count . " : " . imap_last_error());
        $from = $headerinfo->fromaddress;
        $subject = $headerinfo->subject;
        $date = $headerinfo->date;
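        // Escape header values, since they come from untrusted mail, and close the tags in the right order
        echo "<em><u>" . htmlspecialchars($from) . "</u></em>: " . htmlspecialchars($subject) . " – <i>" . htmlspecialchars($date) . "</i><br />\n";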
    }

    echo "<h2>Message bodies</h2>\n";

    for ($count=1; $count<=$displaycount; $count+=1)
    {
        $body = imap_body($connection, $count)
            or die("Can’t fetch body for message " . $count . " : " . imap_last_error());
        echo "<pre>". htmlspecialchars($body) . "</pre><hr/>";
    }

    imap_close($connection);
}

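// On the first visit nothing has been posted yet, so fall back to empty strings
$user = isset($_POST["user"]) ? $_POST["user"] : "";
$password = isset($_POST["password"]) ? $_POST["password"] : "";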

if (!$user or !$password)
    gmail_login_page();
else
    gmail_summary_page($user, $password);

?>

My own private Los Angeles

Gunplay

A friend who lives nearby sent me this photo. It’s pretty mind-blowing that there are parts of LA where this is a necessary public service announcement, and it got me thinking about how I experience the city. When I talked to the recruiter about jobs in the US the only guidance I gave was "anywhere but LA". I had grown up with LA Law, Baywatch and countless movies that left me certain that I’d hate it. Of course, all the interviews he arranged were in LA. I ended up accepting an offer here, with the idea I’d stay maybe a year.

A couple of days after I landed, I pulled out a street map and looked for any big patches of green, in the hope of finding some small place to walk in peace. I was surprised by the size of the blank spaces and picked one that looked promising. Rancho Sierra Vista was only a few minutes from where I was staying, and I found I could walk 9 miles straight through wilderness along Sycamore Canyon, right to the Pacific. Even more amazing, this was the narrow axis of the parkland: it stretched for over 30 miles from Santa Monica to Camarillo. Ever since then, the Santa Monica Mountains have been my real Los Angeles.

Unlike any other city I’ve lived in, LA is entirely optional. Hardly anyone I know visits the east side, or even the sketchy neighborhoods near Santa Monica. The reliance on freeways means that downtown is a lot less important than you’d expect, with events and attractions scattered through the other hot locales like Hollywood. You can pick and choose which areas you want to visit and miss out on very little. It’s not like London, where the center has all of the biggest shops, tourist traps and entertainment, reinforced by the flow of the tube lines. The only place that forces you to come into contact with Angelenos from the whole city is the freeway itself, with Humvees scattered between gardeners’ pickups.

I’m not proud of my isolation from the majority of the city, but it does seem characteristic of LA. One of my favorite parts of trail work is getting local kids who have no idea there’s even wilderness on their doorstep excited about the outdoors. Many of their families are as ignorant of the beauty on offer as I was when I arrived, so getting the word out is crucial. The reason I’m writing up the local spots is so anybody who starts an internet search for hiking or camping hears about all the choices. I love my Los Angeles but I want to share it, even if that makes it a little less private.

Analyzing your Gmail

Mailtrends


Mihai Parparita, a Google developer, has created a system to display information about your email over time. Mail Trends is a Python script that connects to your Gmail account through IMAP and generates a series of tables and graphs showing how your mail changes over time. The time aspect is key: it’s one of the most interesting parts of email, and something that distinguishes it from other implicit data we have access to. He has a demonstration using part of the Enron data set, and you can see the most prolific emailers, subjects and who sends you the most email. I was hoping it would also demonstrate searching by keyword, since being able to look for specific terms is very useful for research in Google Trends and similar buzz tracking sites for the web. One of my goals is both to show graphs of search keywords over time in your mail, in the same way that MarkMail does for its public mailing list search, and to have animated tag clouds that show the most popular terms as they change over time. I’ll be watching closely for future developments; at least one of the blog commenters understands how this could build into something larger.
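To make the keyword-over-time idea concrete, here is a rough sketch of the counting step in PHP. It is not how Mail Trends works; the function name and the shape of the $messages array (a 'date' and a 'subject' per message, as you might collect from IMAP headers) are assumptions made for illustration.

```php
<?php
// Count how many messages mention $keyword in their subject, grouped by month.
// $messages is assumed to be a list of array('date' => ..., 'subject' => ...),
// with dates in any format strtotime() understands (e.g. mail Date headers).
function keyword_counts_by_month(array $messages, $keyword)
{
    $counts = array();
    foreach ($messages as $message) {
        $timestamp = strtotime($message['date']);
        if ($timestamp === false) {
            continue; // skip messages whose date can't be parsed
        }
        if (stripos($message['subject'], $keyword) === false) {
            continue; // case-insensitive match on the subject line only
        }
        $month = gmdate('Y-m', $timestamp);
        if (!isset($counts[$month])) {
            $counts[$month] = 0;
        }
        $counts[$month] += 1;
    }
    ksort($counts); // chronological order, ready for graphing
    return $counts;
}
```

The resulting array maps a month such as "2008-03" to a count, which is the raw series you would feed into a graph or an animated tag cloud.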

On the technical side, using IMAP is a great way to work around the lack of a proper Gmail API. He’s using Python’s imaplib. I’ll have to look at the equivalents for other languages, since I have an irrational prejudice against any language in which whitespace is significant. Tabs in makefiles also bother me, but I’ve learnt to live with them. A hat tip to Brad and Googlified for pointing me towards Mail Trends.

How to build your own Facebook server

Sunstorm
Photo by Coccinelle69

In the last post I talked about the mechanics of how an app communicates with Facebook. With the alpha release of Ringside, there’s now an example of how to implement the server side of Facebook. It’s open source, and the two most interesting parts are the underlying MySQL database and the PHP interface code that implements the API on top of it. Using MySQL makes it hard to scale to massive numbers of users, so it’s not ready to power Facebook yet. On the other hand, having enough users to strain a single database server is a good problem to have. At that point you should have the resources to reimplement something more advanced under the hood.

Having a reference host for any plugin architecture is immensely helpful, especially one that’s open source. For example, if I was having trouble with the details of fetching events, I could open up ringside/api/includes/ringside/api/facebook/EventsGet.php and inspect exactly what their implementation is. There’s no guarantee that it’s the same as Facebook’s code, but it’s at least an unambiguous and exact specification of what somebody else thinks it should be doing. To get your own copy of the source using SVN, run
svn co https://ringside.svn.sourceforge.net/svnroot/ringside ringside

The other exciting part of Ringside’s release is their MySQL schema. It could become a de facto standard for expressing the data that underlies all social networks. Anybody who’s able to take their own data source and translate it into the same tables can plug that into Ringside’s system. Turn the key, and you’ve got your own private Facebook. The schema is at ringside/api/config/ringside-schema.sql

If you want to customize it, the API source is full of great examples of how to work with the database to extend its capabilities, though the LGPL licence might require your changes to also be published.

What’s going on under the hood of Facebook’s API?

Clockwork

Photo by fallsroad

Facebook’s API comes wrapped in libraries for all the popular server languages, but there will come a day when you need to debug the raw HTTP transactions that they all boil down to. Because PHP is a scripting language, its version of the library is easy to read, and I ended up tweaking mine to output the exact text that’s flowing between me and Facebook. This was partly to help debugging, but also for my own curiosity. I’d like to model some of my interfaces on Facebook’s since it’s simple, robust and flexible.

You call a method by sending an HTTP request to "http://api.facebook.com/restserver.php". Arguments to the method are passed in the POST string sent as part of the request. Here’s an example for an event API call, split up on ampersands so that it won’t go off the edge of the blog, and with any secret values replaced with X:

uid=XXXXXXXXX&
eids=&
start_time=0&
end_time=1000000000000&
rsvp_status=&
method=facebook.events.get&
session_key=XXXXXXXXXXXXXXXXXXXXXXXX_XXXXXXXXX&
api_key=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&
call_id=1206547876.5053&
v=1.0&
sig=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

This is generated by taking the normal PHP arguments to each method, along with stored login and API keys, and serializing them into this string. If CURL is present on the server, this is then used to send the request, otherwise PHP’s native HTTP access functions are used.
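As a rough illustration of that serialization step, here is a sketch that turns an argument array into a signed POST string. The signing recipe (sort the arguments by key, concatenate them as key=value with no separators, append the application secret, then take the MD5) is my recollection of how the old REST client worked rather than something spelled out above, and the function name and the secret are placeholders, so treat it as an assumption and check it against the library source.

```php
<?php
// Build a POST body like the example above from an array of arguments.
// ASSUMPTION: sig = md5(sorted "key=value" pairs joined together + secret).
// build_signed_post_string is a hypothetical name, not the library's own.
function build_signed_post_string(array $params, $secret)
{
    ksort($params); // the signature depends on a stable key order

    $to_sign = '';
    foreach ($params as $key => $value) {
        $to_sign .= $key . '=' . $value;
    }
    $params['sig'] = md5($to_sign . $secret);

    // URL-encode everything into name=value pairs joined by ampersands
    return http_build_query($params, '', '&');
}

$post_string = build_signed_post_string(
    array(
        'method'  => 'facebook.events.get',
        'api_key' => 'XXXXXXXX',                    // placeholder value
        'call_id' => sprintf('%.4F', microtime(true)),
        'v'       => '1.0',
    ),
    'XXXXXXXX'                                       // placeholder secret
);
echo $post_string . "\n";
```

The resulting string is what would be handed to cURL (or PHP's native HTTP functions) as the body of the POST request.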

Assuming that the call name (specified in "method") and the other arguments check out, then the Facebook server will return a string as its response. This string is in XML, and looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<events_get_response xmlns="http://api.facebook.com/1.0/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://api.facebook.com/1.0/ http://api.facebook.com/1.0/facebook.xsd"
list="true">
  <event>
    <eid>5172087276</eid>
    <name>Blog World Expo example</name>
    <tagline>http://www.blogworldexpo.com/</tagline>
    <nid>0</nid>
    <pic>http://profile.ak.facebook.com/object2/5/55/s5172087276_7478.jpg</pic>
    <pic_big>http://profile.ak.facebook.com/object2/5/55/n5172087276_7478.jpg</pic_big>
    <pic_small>http://profile.ak.facebook.com/object2/5/55/t5172087276_7478.jpg</pic_small>
    <host>BlogWorld</host>

… <snip> …
</event>
</events_get_response>

The library then takes this simple XML string, and parses it into a PHP hierarchical array of values that looks like this:

Array
(
    [0] => Array
        (
            [eid] => 5172087276
            [name] => Blog World Expo example
            [tagline] => http://www.blogworldexpo.com/
            [nid] => 0
            [pic] => http://profile.ak.facebook.com/object2/5/55/s5172087276_7478.jpg
            [pic_big] => http://profile.ak.facebook.com/object2/5/55/n5172087276_7478.jpg
            [pic_small] => http://profile.ak.facebook.com/object2/5/55/t5172087276_7478.jpg
            [host] => BlogWorld

… <snip> …
        )
)

This always matches the structure of the XML. Facebook use a restricted subset that avoids tag attributes and anything else that might make it hard to map to this JSON style format.
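Here is a minimal sketch of that XML-to-array mapping using SimpleXML. It is not the library's own parser; the function name is made up, and it assumes, based on the sample above, that an element marked list="true" becomes a numerically indexed array while everything else is keyed by tag name.

```php
<?php
// Recursively convert a SimpleXML node into nested PHP arrays and strings.
// ASSUMPTION: list="true" marks a container whose children form a plain list.
// xml_node_to_array is a hypothetical name used for illustration.
function xml_node_to_array(SimpleXMLElement $node)
{
    $children = $node->children();
    if (count($children) === 0) {
        return (string) $node; // leaf element: just its text content
    }

    $is_list = ((string) $node['list'] === 'true');
    $result = array();
    foreach ($children as $name => $child) {
        if ($is_list) {
            $result[] = xml_node_to_array($child);     // [0], [1], ...
        } else {
            $result[$name] = xml_node_to_array($child); // [eid], [name], ...
        }
    }
    return $result;
}

$xml = '<events_get_response list="true"><event>'
     . '<eid>5172087276</eid><host>BlogWorld</host>'
     . '</event></events_get_response>';
print_r(xml_node_to_array(simplexml_load_string($xml)));
```

Because the XML avoids attributes on data elements and mixed content, a converter this small is enough; a richer XML dialect would need a lot more special cases.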

Another possibility is that an error will be returned. In that case, the XML will normally just be a couple of tags, the error message string and the numeric error code. This gets converted to a PHP exception.
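The error path could be sketched along these lines. The tag names error_response, error_code and error_msg are what I remember the REST API returning, and the exception class and function names are invented for the example, so all of them should be read as assumptions rather than a copy of the library.

```php
<?php
// Hypothetical exception type standing in for the library's own.
class ExampleRestException extends Exception
{
}

// Parse a response and throw if it is an error document.
// ASSUMPTION: errors look like
//   <error_response><error_code>N</error_code><error_msg>text</error_msg></error_response>
function parse_or_throw($xml_string)
{
    $xml = simplexml_load_string($xml_string);
    if ($xml === false) {
        throw new ExampleRestException('Response was not valid XML');
    }
    if ($xml->getName() === 'error_response') {
        throw new ExampleRestException(
            (string) $xml->error_msg,
            (int) $xml->error_code
        );
    }
    return $xml;
}

try {
    parse_or_throw('<error_response><error_code>102</error_code>'
        . '<error_msg>Session key invalid</error_msg></error_response>');
} catch (ExampleRestException $e) {
    echo 'API error ' . $e->getCode() . ': ' . $e->getMessage() . "\n";
}
```

Turning the error into an exception means calling code can't silently treat an error document as an empty result.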

To dig into this code yourself, I recommend looking through facebookapi_php5_restlib.php in the client folder of the Facebook SDK. That’s a good place to add your own debugging code too, though there’s already some that can be enabled by setting the $GLOBALS['facebook_config']['debug'] variable to true.

How to convert mbox files to an Outlook pst

Kenlay
Photo by MotherPie

[Update- There's now a good alternative that includes separate PSTs for each user]

After getting the Enron emails into the mbox format, the next step was to convert them into something that the Outlook/Exchange world can understand. Thankfully I already had a great conversion program in mind, Aid4Mail. At its core it's a translator between a large number of mail formats, including Outlook, Outlook Express, Windows Mail, Eudora, Thunderbird, Netscape Messenger, Pegasus Mail and a whole bunch of generic formats including several mbox variants. It can read and write to all of these formats, and has a large number of options to transform the mail as you do so. For example you can choose to only convert mails sent between certain dates, or to ignore attachments. If you're working with mail, I highly recommend giving this program a try; it's the Swiss Army knife of email tools.

To do the Enron conversion, I selected generic unix mbox as the input format. On the next screen I navigated to the root folder that contained all my files, and then chose Outlook pst as the destination type. I left all the other options at their defaults, so no filtering was done and the folder hierarchy was preserved. It took around 16 hours to process all 500,000 messages, and the pst file came out at around 5 GB.
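Since a run like this takes hours, it can be worth counting the messages in the mbox files first so you know what total to expect in the pst. This sketch relies on the usual mbox convention that each message starts with a line beginning "From " (with body lines that look the same escaped as ">From "); the function name is my own, and it has nothing to do with how Aid4Mail works internally.

```php
<?php
// Count messages in an mbox stream by looking for "From " separator lines.
// Reads line by line so multi-gigabyte files never need to fit in memory.
// count_mbox_messages is a hypothetical helper name.
function count_mbox_messages($handle)
{
    $count = 0;
    while (($line = fgets($handle)) !== false) {
        if (strncmp($line, 'From ', 5) === 0) {
            $count += 1;
        }
    }
    return $count;
}

// Small in-memory example; for a real file use fopen($path, 'r') instead.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "From alice@example.com Mon Jan  1 00:00:00 2001\nSubject: one\n\nHello\n\n");
fwrite($handle, "From bob@example.com Tue Jan  2 00:00:00 2001\nSubject: two\n\n>From the top\n");
rewind($handle);
echo count_mbox_messages($handle) . " messages\n";
fclose($handle);
```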

I'm able to open it in Outlook and browse through the messages, and can also add them to my Exchange server. There are some issues: it doesn't preserve the original user structure, since they're all in one pst, attachments aren't included, and some of the addresses are obfuscated. Still, it's good enough to give me the testbed I need to put some of my tools through some real-world stress tests.

Once the upload has finished, you should be able to access the pst yourself at
http://funhousepicture.com/enron.pst
It's 5 GB, so it won't be all there for a good few hours, and be prepared for a long download time.

The joy of nearly being eaten

Kingsnake_2

After growing up in Britain, where the apex predator is the badger, I feel lucky to be living where there’s truly wild wildlife. There’s something about the knowledge that you could be eaten or poisoned around the next corner that adds an edge of alertness to any trip. The possible downside is being somebody’s next meal, but the certain upside is appreciating you’re in a true wilderness.

Liz once saw the rear end of a mountain lion disappear down the trail, but I’ve had to content myself with plenty of bobcats, coyotes and rattlesnakes. Two weeks ago, we even had a rattler who refused to leave our worksite, so he watched us warily for a few hours. Above you can see me relocating a harmless California King Snake after our maintenance had disturbed its home. Below are a few more of the lovely beasties we’ve encountered.

Scorpion_2

It’s not unusual to come across these small scorpions when you turn over a rock. So far nobody’s been stung, and from what I understand our local variety isn’t too venomous anyway. It makes me feel like I’m in a western every time I spot one though.

Blackwidow1_2

This action shot is a Black Widow in our back yard. We seem to have dozens around the outside of the house. They have the most beautiful sleek black bodies, with the distinctive red hourglass marking. We don’t have many closeup photos of them for obvious reasons.

Walkingstick_2

I’m not too worried about this Walking Stick insect eating me, but he’s one of the coolest designs I’ve seen in a long time. He’s definitely got the Apple elegance about him, the MacBook Air of the insect world.