Even more ways to speed up IMAP Gmail importing in PHP

Bomberos
Photo by Zerega

In my last two articles on importing mail from Google in PHP I thought I’d got performance up to a pretty high level, but once I started testing with mailboxes with over 30,000 mails, I realized I had to be more creative.

The main trick I discovered in that investigation is using imap_fetch_overview() to get information on a lot of messages at once. This is a lot faster than grabbing the full header info for a single message at a time using imap_headerinfo(). The downside is that it doesn’t return as much information about each message. For me the most painful loss was that you only get the first recipient. Another wrinkle is that you don’t get the sender information separated into the email address and display portions; you just get a single string that may contain both, or just the address. I had to write my own regex parser to pull out the two components.
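As a rough illustration of the batching idea, here is a minimal sketch that asks for overviews in ranges of message numbers rather than one message at a time. The helper names (build_sequence_ranges, fetch_all_overviews) and the batch size of 500 are my own assumptions for the example, not taken from the sample code; only imap_num_msg() and imap_fetch_overview() are real IMAP extension calls.

```php
<?php
// Hypothetical helper: build IMAP sequence ranges such as "1:500", "501:1000"
// that together cover message numbers 1..$total.
function build_sequence_ranges($total, $batch_size)
{
    $batch_size = max(1, (int) $batch_size);
    $ranges = array();
    for ($start = 1; $start <= $total; $start += $batch_size) {
        $end = min($start + $batch_size - 1, $total);
        $ranges[] = $start . ':' . $end;
    }
    return $ranges;
}

// Hypothetical helper: collect overview records for every message in an
// already opened mailbox, one batch per imap_fetch_overview() call.
// The batch size of 500 is an arbitrary assumption; tune it for your server.
function fetch_all_overviews($mbox, $batch_size = 500)
{
    $overviews = array();
    $total = imap_num_msg($mbox);
    foreach (build_sequence_ranges($total, $batch_size) as $range) {
        $batch = imap_fetch_overview($mbox, $range, 0);
        if ($batch === false) {
            continue; // skip a failed batch rather than abort the whole import
        }
        foreach ($batch as $overview) {
            // Not every property is guaranteed to be present on every message.
            $overviews[] = array(
                'msgno'   => $overview->msgno,
                'from'    => isset($overview->from) ? $overview->from : '',
                'to'      => isset($overview->to) ? $overview->to : '',
                'subject' => isset($overview->subject) ? $overview->subject : '',
                'date'    => isset($overview->date) ? $overview->date : '',
            );
        }
    }
    return $overviews;
}
```

Called with a connection from imap_open(), fetch_all_overviews($mbox) makes one round trip per batch instead of one per message, which is where the speedup comes from.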

I’ve updated my sample code to use the overview function, and it includes the code to split up the combined sender string too. You can try it online, or download it as evenfasterphpgmail.zip. The sender parsing code is also included below:

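function extract_address_from_display($full)
{
    // Case 1: "Display Name <user@example.com>". Capture the name and the address.
    // The local part also accepts "+", so user+tag@gmail.com is kept whole.
    $matchcount = preg_match(
        "/(.*)<[^._a-zA-Z0-9+-]*([._a-zA-Z0-9+-]+@[._a-zA-Z0-9-]+).*>/",
        $full, $matches);
    if ($matchcount)
    {
        $address = $matches[2];
        // Drop surrounding whitespace and quotes left around the display name.
        $display = trim($matches[1], " \t\"'");
    }
    else
    {
        // Case 2: a bare address with no angle brackets.
        $matchcount = preg_match(
            "/[._a-zA-Z0-9+-]+@[._a-zA-Z0-9-]+/",
            $full, $matches);
        if ($matchcount)
        {
            $address = $matches[0];
            $display = $address;
        }
        else
        {
            // No address found at all: hand back the raw string as the display name.
            $address = "";
            $display = $full;
        }
    }

    return array("address" => $address, "display" => $display);
}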

Welcome to the United States of America

Greencard

I’ve just been accepted as a permanent resident here in the US, with the green card (actually mostly white) arriving a few days ago. It’s taken me 7 years of patience and struggle, but now I’ve graduated from a temporary work visa tied to a single employer, to an independent person, free to follow my dreams. It’s a giddy feeling, both the new-found security that I won’t have to leave the country and the liberation of having no restrictions on my professional life.

I’m counting down the days to naturalization now; in just 5 years I can be a full citizen. I knew very quickly after arriving that I belonged here, as much as I miss my family and friends from Britain. America is full of encouragement for people dreaming big dreams, and it’s the best place in the world for doing something that’s never been done. Thanks to everyone who’s kept me going through the long process of getting this sorted out, especially Liz.

How social networks control your company

Chat
Photo by Belinketeneghe

Brokerage and Closure by Ronald Burt is a must-read for anyone interested in innovation and social networks. He’s a sociologist with the Chicago Graduate School of Business who’s spent years mapping and analyzing the patterns of relationships in large companies like Raytheon. This book describes how new ideas, trust and power flow directly from these networks.

The title refers to the two forces that shape who you talk to. Closure is the technical term for how insular a group of people are, measured by the strength of relationships between all the insiders, and the weakness of ties with outsiders. If you draw a graph of the communications within a group with high closure, you see a lot of lines between the members, and few contacts with others:
Closure
In everyday language, a cluster of people with high closure would be called a clique. They form because they have some big advantages. It’s a lot easier to trust someone you’ve no experience with if you share mutual friends, because the risk to their reputation will be severe if they let you down. The dense pattern of communications also makes sure that practices and beliefs get spread and standardized quickly throughout the group.

Large organizations are made up of many of these self-contained teams, each with their own shared experiences, ideas and ways of doing things. Brokerage is the act of bridging the gaps, or structural holes, between these groups in the network. People who have connections with multiple groups that would be otherwise unconnected are known as brokers or bridges.

Broker

They play an important role in innovation because they have the chance to introduce good ideas from one team into another, or combine partial insights from multiple groups into a new approach to a problem. They also have political advantages because they have more information about the motivations and goals of other teams, and can use that knowledge to help steer decision-making to avoid conflicts and gain support for initiatives.

Where Burt really shines is the application of this general model to the wealth of data from sociological studies within companies, together with his own personal experiences of working with large businesses. He sets out to prove 4 ‘stylized facts’ about how brokerage and closure work in practice:

Brokers do better. He uses network analysis together with personnel records to show that people who have strong connections outside their immediate team get paid more, and promoted faster.

Brokers have better ideas. Analyzing the ranking of improvements for a supply-chain management department together with the connectedness of the people suggesting them, he builds a case that brokers do better because of the quality of the ideas they come up with.

Brokerage is useless without closure. This is less of a slam-dunk, but he gathers evidence that brokers don’t help when the teams themselves are fragmented and poorly coordinated. Intuitively this makes sense: groups who can’t communicate internally won’t be able to execute even given the best ideas.

The echo chamber amplifies closure. Treating networks as information circuits ignores the primate biases that actually guide our social behavior. In particular, etiquette demands that we avoid contradicting a conversation partner when possible. This and similar habits mean that reputations are exaggerated in a feedback loop through gossip, since people you talk to will tend to agree with your assessment of someone, even if they don’t hold the same opinion. This gives the illusion of corroborating evidence for your views, and tends to tighten the bonds that bind a group together and more strongly exclude outsiders. This is a tough one to tease out from the data, but he shows that the more mutual contacts you share with someone, the stronger your opinion of them, even if that opinion disagrees with the assessments of your shared contacts.

This is vital reading for anyone dealing with social networks because of the applications of these theories to the design of our tools. At the start he talks about the delusion that having lots of contacts in a network adds value, when instead the really valuable connections are those outside your immediate group, and how this is where businesses like LinkedIn and Tacit should be focusing their efforts.

I’m particularly interested because most of my work has been aimed at making brokerage easier and faster. Defrag Connector was about establishing initial trust between conference attendees by revealing mutual friends. I’m analyzing email to reveal the existing communication networks, and identify good candidates for brokerage contacts because they’re experts in a helpful area, or have external contacts that would be useful. Most of his data comes from self-reported surveys of who people talk to, so I’d love to run some of his work against my large company email data sets. He mentions Valdis Krebs in the foreword, but I was disappointed I didn’t see any references to his work deriving networks from implicit communication data.

Burt is writing for an academic audience, so he presents a lot of the primary data backing up his arguments, which can make it a tough read for generalists like me. He’s got a readable style though, and I love some of the anecdotes that pop up throughout, such as the quote from a manager explaining that when analyzing improvement ideas "that were either too local in nature, incomprehensible, vague or too whiny, I didn’t rate them."

Why the passive voice is considered harmful

Faceless
Photo by MadMannequin

I really, really hate the passive voice. I had to rewrite my bachelor’s thesis after my supervisor rejected my active version. People use it to add an aura of faceless authority to what they’re writing, as if it’s not just someone’s opinion, it’s the way the world is. Things occur, there’s nobody to argue with, they just are. George Orwell agreed too, including it as one of his 5 rules of effective writing.

Most companies I admire write their copy in the active voice; see Feedburner’s about page for a good example. It’s part of a stance that they are in a conversation with their customers as equals, not talking down to them. The passive voice says "There’s no one you can talk to, this is a one-way communication". Active verbs give the feeling that you’re hearing from a human being who might welcome a response. Blogs use the active voice, and that’s what makes them seem so fresh and energetic.

It’s tough when you’re starting off to steer clear of passivity. You want as much authority as you can fake, since a big hurdle is getting anyone to take a chance on a startup with no history, but the language you use affects your thoughts and actions. Using the passive voice is all about putting distance between you and your customers, and you’ll end up losing out. Be active and engage people instead.

Death of a startup

Graveyard
Photo by Auchinoon

My old roommate Dave taught me snowboarding, and one thing he said stuck with me: "If you don’t fall down at least once every day, you’re not pushing yourself hard enough". (He also comforted me with the claim that "chicks dig scars" after I impaled my leg on a fencepost on my first day out.) One of the things I’ve found liberating here in the US compared to England is that it’s possible to fail without being labeled as a failure. On that topic Bob Sutton has a post on why "Am I a success or a failure?" is the wrong question to ask.

I’ve never been through the death-throes of a startup, but Visual Sciences, a games startup I worked at for four years, collapsed in a painful bankruptcy, throwing a lot of good friends out of work. Andrew Hyde laments the sense of shame that still comes when you’re involved in a failed business, and like me wishes there were more post-mortems out there to help us all learn. Nick Napp, founder of the promising Disruptor Monkey, has taken up that challenge with a post explaining what happened to the company. It’s tough because it’s an emotionally charged topic, and there are always details that have to remain private, but he’s done a great job covering what he’s learnt. Now I guess it’s up to me to pick one of my own professional failures and return the favor.

Easily create gorgeous graphs with the Google Charts API

I’ve looked at a lot of ways to create graphs dynamically on the web. PHP/SWF charts are fantastic if you want beautiful results, a lot of options, and interactivity, but they require Flash, which both limits the platforms that can use them, and can result in slower loading. For better compatibility, you need something that generates images on the fly.

I’d investigated using jpgraph, but the results looked really ugly and it takes up precious cycles on your own server. Then I discovered a free Google web service that generates images on the fly for you, the Charts API. The pictures above are examples of the high-quality results it produces, with clean fonts, nice 3D and most importantly antialiasing. The API is incredibly simple to use: you just pass in the data and options as parameters to the URL. You don’t even need to register or get a key. Here are the URLs for the two images:

http://chart.apis.google.com/chart?cht=p3&chs=480x200&chd=s:Hellob&chl=May|June|July|August|September|October

http://chart.apis.google.com/chart?cht=lc&chd=s:pqokeYONOMEBAKPOQVTXZdecaZcglprqxuux393ztpoonkeggjp&chco=676767&chls=4.0,3.0,0.0&chs=480x200&chxt=x,y&chxl=0:|1|2|3|4|5|1:|0|50|100&chf=c,lg,90,76A4FB,0.5,ffffff,0|bg,s,EFEFEF
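If you’d rather build these URLs from PHP than type them by hand, here is a minimal sketch. The function names are my own invention for illustration. It assumes the API’s "simple" data encoding (the chd=s: form used in the URLs above), where each data point becomes one character from A-Z, a-z, 0-9, standing for the values 0 to 61, and an underscore marks a missing value.

```php
<?php
// Hypothetical helper: encode numbers with the Charts API "simple" encoding.
// Each value is scaled into 0..61 and mapped to one character;
// null or negative values become "_", the marker for a missing point.
function chart_simple_encode(array $values, $max_value)
{
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    $encoded = '';
    foreach ($values as $value) {
        if ($value === null || $value < 0 || $max_value <= 0) {
            $encoded .= '_';
            continue;
        }
        // Values above $max_value are clamped to the top of the range.
        $index = (int) round(61 * min($value, $max_value) / $max_value);
        $encoded .= $alphabet[$index];
    }
    return $encoded;
}

// Hypothetical helper: turn an array of chart options into a request URL.
// http_build_query() percent-encodes characters such as "|" and ":".
function chart_url(array $params)
{
    return 'http://chart.apis.google.com/chart?' . http_build_query($params, '', '&');
}

// Example: a 3D pie chart with three labeled slices.
$url = chart_url(array(
    'cht' => 'p3',
    'chs' => '480x200',
    'chd' => 's:' . chart_simple_encode(array(10, 25, 40), 40),
    'chl' => 'May|June|July',
));
```

The resulting string can go straight into the src attribute of an img tag.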

While it’s easy to get started with this style, it does have some downsides. Because the data is encoded as part of the URL, there’s a hard limit on how many points you can have, since some systems choke on URLs over 2000 characters long. The API also doesn’t support as many styles or options as PHP/SWF, and no animation is possible.
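One way to live with the URL length limit is to thin out the data before encoding it. The sketch below is only one possible approach, and the function names are hypothetical: it averages neighboring points into buckets until the series fits. It assumes a single series in simple encoding, which costs one character per point, and uses the 2000 character figure mentioned above as the budget.

```php
<?php
// Hypothetical helper: how many single-character points still fit once the
// rest of the URL ($base_url) has used up part of the length budget.
function max_points_for_url($base_url, $limit = 2000)
{
    return max(0, $limit - strlen($base_url));
}

// Hypothetical helper: shrink a series to at most $max_points values by
// averaging consecutive values into equally sized buckets.
function downsample_points(array $values, $max_points)
{
    $max_points = max(1, (int) $max_points);
    $values = array_values($values);
    if (count($values) <= $max_points) {
        return $values; // already short enough, leave untouched
    }
    $bucket_size = (int) ceil(count($values) / $max_points);
    $result = array();
    foreach (array_chunk($values, $bucket_size) as $bucket) {
        $result[] = array_sum($bucket) / count($bucket);
    }
    return $result;
}
```

Averaging smooths out short spikes, so if the peaks matter more than the trend, taking the maximum of each bucket instead would be a reasonable swap.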

Despite those disclaimers, this is an amazing tool, and I’ll be having a lot of fun with it. One of my favorite features is the map graph type, which lets you easily specify just colors and states or countries, and it generates an image showing that on a simple map. It would be insanely easy to create some geographic data visualizations using it if you’ve got interesting data. Here’s an example of the results:

http://chart.apis.google.com/chart?chco=f5f5f5,edf0d4,6c9642,365e24,13390a&chd=s:fSGBDQBQBBAGABCBDAKLCDGFCLBBEBBEPASDKJBDD9BHHEAACAC&chf=bg,s,eaf7fe&chtm=usa&chld=NYPATNWVNVNJNHVAHIVTNMNCNDNELASDDCDEFLWAKSWIORKYMEOHIAIDCTWYUTINILAKTXCOMDMAALMOMNCAOKMIGAAZMTMSSCRIAR&chs=440x220&cht=t
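The map URL above follows a pattern that is easy to generate: chld is the two-letter state codes run together, and chd holds one simple-encoded character per state in the same order. Here is a minimal sketch, with a hypothetical function name and made-up sample values; the parameter names are the ones visible in the URL above, and reading the first chco color as the default for states without data is my assumption.

```php
<?php
// Hypothetical helper: build a US map chart URL from a state => value array.
// Assumes the first color in $colors is the default for states with no data
// and the remaining colors form the gradient from low to high values.
function map_chart_url(array $state_values, array $colors, $size = '440x220')
{
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    // Scale against the largest value; fall back to 1 to avoid dividing by zero.
    $max = $state_values ? max(1, max($state_values)) : 1;

    $codes = '';
    $data = '';
    foreach ($state_values as $state => $value) {
        $codes .= strtoupper($state);
        $data .= $alphabet[(int) round(61 * max(0, $value) / $max)];
    }

    $params = array(
        'cht'  => 't',
        'chtm' => 'usa',
        'chs'  => $size,
        'chld' => $codes,
        'chd'  => 's:' . $data,
        'chco' => implode(',', $colors),
        'chf'  => 'bg,s,eaf7fe',
    );
    return 'http://chart.apis.google.com/chart?' . http_build_query($params, '', '&');
}

// Example with invented numbers: California darkest, New York lightest.
$map_url = map_chart_url(
    array('CA' => 10, 'TX' => 5, 'NY' => 0),
    array('f5f5f5', 'edf0d4', '13390a')
);
```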