Speed up your PHP with XHProf


I'll admit it, I'm doing performance-intensive code in PHP. I started my career writing demos in hand-coded assembler, but the need for development speed has pushed me towards using a scripting language. Part of making fast progress is reducing dependencies, which has meant sticking with PHP for the whole system, even for back-end analysis where another language might be more expected.

That's left me desperate to get more information on where the time is going when things slow down. My first weapon of choice is a simple pair of timing functions that I wrap around code to tell how long a whole block is taking. That's quick and easy, (and I've included my code below) but it doesn't give you any information about which parts of it are taking all the time.

For that you need a profiler, and if you do a search for php profiling, almost every result talks about XDebug. Unfortunately, as a profiler, it's a great debugger. You need to edit php.ini and restart the server, or pass in a URL input, you can only profile an entire script rather than just portions, and once you have generated the file you need to transfer them to one of several klunky desktop applications to explore the results.

After abandoning XDebug as too unwieldy, I spent some time searching for other solutions. I finally came across XHProf, and I'm loving it. It was developed as an internal tool for Facebook and open-sourced in March, and you can tell it's been written by people who actually use it. It took a little bit of fiddling to install it, I couldn't get it going through PECL and ended up downloading the source and manually compiling. It was a dream to use after that. It let me trigger the profling programatically around the code I cared about, and then browse the results through a web interface.

There's a couple of caveats, it's missing a few advanced features I'm used to from advanced desktop profilers like detailed information on the full call stack for functions rather than the immediate parents and timing for individual lines of code rather than functions, but in practice it's got all the features I need to diagnose performance problems. I was able to speed up my IMAP email importing dramatically, largely by removing the use of a global variable in an inner loop, it turned out to be far faster to pass the object as a function argument! That's the sort of problem that would have taken me far longer to find without XHProf.

Here's the primitive timing functions I mentioned at the start:

$g_start_time = 0;
$g_end_time = 0;

function pete_start_timer()
    global $g_start_time;
    list($usec, $sec) = explode(' ',microtime());
    $g_start_time = ((float)$usec + (float)$sec);

function pete_end_timer($dolog=true)
    global $g_start_time;
    global $g_end_time;
    list($usec, $sec) = explode(' ',microtime());
    $g_end_time = ((float)$usec + (float)$sec);
    $duration = ($g_end_time – $g_start_time);

    if ($dolog)
        $durationstring = 'pete_timer: %01.4f sec';
        error_log(sprintf($durationstring, $duration));
    return $duration;

Sewers and startups

Photo by Elsie Esq.

I had a chance to chat with Matt Mullenweg yesterday, and he focused on something I've been struggling with; building to last. It also got me thinking about plumbing.

Joseph Bazalgette is one of my engineering heros. He built London's first sewers in the 19th century, and started by estimating how large they'd need to be to cope with the current population. He then said "Well, we're only going to be doing this once and there's always the unforeseen" and doubled the diameter! Thanks to his foresight and the beautiful workmanship of the bricklayers, those same sewers are still serving Londoners today, despite a population many times larger.

The biggest enemy of early-stage startups is time. We can't afford premature scalation, because before we've finished building a system robust enough to handle millions of active users we'll have run out of money. That means we end up accumulating technical debt as we struggle to get customers and revenue with the least possible amount of code.

The danger is we end up succesful, but so deeply mired in technical debt that we spend all our time paying interest rather than making meaningful progress with the product (see the last decade of Windows). As Vernor Vinge evokes so well, there's a good chance some of our code will be in the lower layers of the stack essentially forever. It's a deep engineering sin to inflict shoddy sewers on future generations.

Matt's key insight was "When you're in the red, time is working against you. Once you're profitable, time is on your side". Getting to even Ramen profitability changes everything, and gives you the ability to build for the long term.

When I joined Apple back in 2003, the central build farm for all projects had both PowerPC and x86 Darwin boxes, and our code had to compile on both. Steve was playing a long game, years before the Intel switch he was obviously planning for it, (though I only caught the significance in retrospect).

Looking at WordPress, you can see the same combination of long-term planning sustained by profitability. A lot of focus in the startup world is on exits, but I'll be ecstatic if I'm still helping build Mailana in 20 years time. Seeing Matt's dedication to building something to last gave me hope, especially as he gave practical steps to get there.