In my last two articles on importing mail from Google in PHP I thought I’d got performance up to a pretty high level, but once I started testing on mailboxes holding over 30,000 messages, I realized I had to be more creative.
The main trick I discovered in that investigation is using imap_fetch_overview() to get information on a lot of messages at once. This is a lot faster than grabbing the full header info one message at a time using imap_headerinfo(). The downside is that it doesn’t return as much information about each message. For me the most painful loss was that you only get the first recipient. Another wrinkle is that the sender information isn’t separated into the email address and the display name; you just get a single string that may contain both, or only the address. I had to write my own regex parser to pull out the two components.
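As a rough illustration of fetching in bulk, here is a minimal sketch that splits a mailbox into sequence ranges and passes each one to imap_fetch_overview(). The helper name build_overview_ranges() and the batch size of 500 are my own assumptions for this example, not part of the sample code, and the live-mailbox part is left commented out because it needs the imap extension and real credentials.

```php
<?php
// Hypothetical helper (not from the sample code): split message numbers
// 1..$total into IMAP sequence ranges such as "1:500", "501:1000".
function build_overview_ranges($total, $batch_size)
{
    // Guard against a zero or negative batch size, which would loop forever.
    $batch_size = max(1, (int) $batch_size);
    $ranges = array();
    for ($start = 1; $start <= $total; $start += $batch_size) {
        $end = min($start + $batch_size - 1, $total);
        $ranges[] = $start . ":" . $end;
    }
    return $ranges;
}

// Sketch of use against a live mailbox; $user and $password are placeholders.
//
// $mbox  = imap_open("{imap.gmail.com:993/imap/ssl}INBOX", $user, $password);
// $total = imap_num_msg($mbox);
// foreach (build_overview_ranges($total, 500) as $range) {
//     $overviews = imap_fetch_overview($mbox, $range, 0);
//     if ($overviews === false) { continue; }
//     foreach ($overviews as $overview) {
//         // Fields can be missing when a header is absent, so check first.
//         $from    = isset($overview->from) ? $overview->from : "";
//         $subject = isset($overview->subject) ? $overview->subject : "";
//     }
// }
```

For a mailbox of 1,200 messages and a batch size of 500, the helper yields the ranges "1:500", "501:1000" and "1001:1200", so the server is asked three times instead of 1,200. The right batch size is something to tune against your own mailboxes.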
I’ve updated my sample code to use the overview function, and it includes the code to split up the combined sender string too. You can try it online, or download it as evenfasterphpgmail.zip. The sender parsing code is also included below:
function extract_address_from_display($full)
{
    // First try the "Display Name <user@example.com>" form.
    $matchcount = preg_match_all(
        "/(.*)<[^\._a-zA-Z0-9-]*([\._a-zA-Z0-9-]+@[\._a-zA-Z0-9-]+).*>/i",
        $full, $matches);
    if ($matchcount)
    {
        $address = $matches[2][0];
        $display = $matches[1][0];
    }
    else
    {
        // Otherwise look for a bare address anywhere in the string.
        $matchcount = preg_match_all(
            "/[\._a-zA-Z0-9-]+@[\._a-zA-Z0-9-]+/i",
            $full, $matches);
        if ($matchcount)
        {
            $address = $matches[0][0];
            $display = $address;
        }
        else
        {
            // No address found: keep the whole string as the display name.
            $address = "";
            $display = $full;
        }
    }
    return array( "address" => $address, "display" => $display);
}
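One thing to be aware of is that the display part comes back exactly as it appeared before the angle bracket, so an input like `"John Doe" <john@example.com>` gives a display string that still has its quotes and a trailing space, and `<john@example.com>` on its own gives an empty one. If that matters for your use, a small cleanup step along these lines may help. The name clean_display_name() is hypothetical and not part of the downloadable sample.

```php
<?php
// Hypothetical helper: tidy the "display" value returned by
// extract_address_from_display(). Strips surrounding whitespace and quote
// characters, and falls back to the address when nothing is left.
function clean_display_name($display, $address)
{
    $cleaned = trim($display, " \t\n\r\"'");
    return ($cleaned === "") ? $address : $cleaned;
}
```

You would call it as clean_display_name($result["display"], $result["address"]) on the array returned by the parser. It only trims characters at the two ends of the string, so quotes or apostrophes inside a name are left alone.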