The worst interview question ever

Photo by B Rosen

This one article sums up everything that's wrong with engineering interviews. The author likes to ask potential hires to explain whether you can call delete this within a C++ member function. What's so wrong with that, you ask? It seems like fairly standard practice.

I've conducted a lot of interviews, and been on the other side of a few, and from my own experience and the research I know that poorly structured interviews like this are a terrible mechanism for predicting how well people will perform a job. Just think about this interview question for a second: how much time in your coding job do you typically spend worrying about this sort of C++ trivia, versus debugging, trying to understand legacy code, talking to other engineers, figuring out requirements, explaining your project to managers, and so on? The right answer for me is "I've no clue, it looks like a terrible idea generally, but I'd google it if needed."

These sorts of questions keep coming up for the same reason the drunk kept looking under the lamp post for his keys: they're within the comfort zone of technical specialists, even though the answers aren't useful. For a long time I did the same, even though I was frustrated with the results. Finally I received some official training at Apple, and what they taught me opened my eyes!

You can find a more detailed description here, but the most important part is "Ask about past behavior". It's the best predictor of future performance, and if you ask in the right way it's also very hard for the candidate to exaggerate or lie. You can ask something general like "Tell me about your worst project", but something more specific is even better; I'd often use "Tell me about a time you hit a graphics driver bug". Candidates will start off with a superficial overview, but if you follow up with more detailed questions (e.g. "So, did you handle talking to Nvidia?") you'll start to build a real picture of their role and behavior, and it's almost impossible to fake that level of detail.

If C++ experience is crucial, then a much better question would be "Tell me about a time you had to debug a template issue" or "Tell me about a project you implemented using reference-counted objects". Anybody who's read enough C++ books can answer the original question, but these versions will tell you who's actually spent time in the trenches.

Easier command-line arguments in PHP

Photo by Between a Rock

One of my pet peeves is that no language I've used handles command-line arguments well. Everyone falls back to C's original argv indexed array of space-separated strings, even though there are decades-old conventions about the syntax of named arguments. There are some strong third-party libraries that make it easier, and the arcane getopt(), but nothing has emerged as a standard. Since I'm writing a lot more PHP shell scripts these days, I decided to write a PHP CLI parser that met my requirements:

Specify the arguments once. Duplication of information is ugly and error-prone, so I wanted to describe the arguments in just one place.

Automated help. The usage description should be generated from the same specification that the parser uses so it stays up-to-date.

Syntax checking. I want to be able to say which arguments are required, optional, or switches, have the parser enforce that, and catch any unexpected arguments too.

Unnamed arguments. Commands like cat take a list of files with no argument name; I wanted those to be easily accessible.

Optional defaults. It makes life a lot easier if you don't have to check to see if an argument was specified in the main script, so I wanted to ensure you could set defaults for missing optional arguments.

Human-readable specification. getopt() is close to what I need, but as well as not generating a usage description, it describes the long and short arguments with a horrible mess of a string. I want the argument specification to make sense to anyone reading the code.
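For contrast, here's roughly what a set of flags looks like through PHP's built-in getopt(); the particular options here are just an illustration:

<?php
// getopt() packs everything into a terse option string: a trailing ':'
// means the flag takes a value, a bare letter is a switch. There's no
// help text, no defaults, and nothing catches unexpected arguments.
$options = getopt('f:s:r', array('file:', 'store:', 'ram'));
var_dump($options);
?>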

Here's the result, cliargs.php. To use it, specify your arguments in the form:

array(
    '<long name of argument>' => array(
        'short' => '<single letter version of argument>',
        'type' => <'switch' | 'optional' | 'required'>,
        'description' => '<help text for the argument>',
        'default' => '<value if this is an optional argument and it isn't specified>',
    ),
    …
);
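As a sketch of what a concrete specification might look like (parse_cliargs() here is a hypothetical name standing in for the package's actual parsing function, so check the readme for the real call):

<?php
// A minimal sketch: a script taking a required input file and a
// verbose switch. parse_cliargs() is a hypothetical stand-in for
// the package's real entry point.
require_once('cliargs.php');

$cliargs = array(
    'file' => array(
        'short' => 'f',
        'type' => 'required',
        'description' => 'The location of the input data file',
    ),
    'verbose' => array(
        'short' => 'v',
        'type' => 'switch',
        'description' => 'Print progress information while running',
    ),
);

$options = parse_cliargs($cliargs); // hypothetical entry point
if ($options['verbose'])
    print "Reading from {$options['file']}\n";
?>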

There's an example script in the package, and documentation in the readme.txt. The code is freely reusable with no restrictions; I'm just dreaming of a world where no one ever has to write another CLI argument parser again.

Three lessons I learnt from porting Diablo

Photo by Vizzzual

It was 1997, I'd just finished college, was really excited about getting my first job in the game industry, and I was a complete idiot. Luckily life was there to hand me a few lessons.

I'd always worked at name-badge jobs paying hourly rates, so when I was offered a whole 10,000 pounds a year, I thought it sounded amazing. It came out to around 550 pounds take-home pay a month; my rent was 400 pounds, which left me and my unemployed wife 150 pounds a month for food, transport and bills. The first lesson I learnt was to crunch the numbers on any deal, and not be distracted by a big headline figure.

The project, for Climax Inc ("Hi, I'm at Climax", not the best name), was to port Blizzard's hit game Diablo from the PC to the PlayStation 1. I'd spent years obsessively coding in my bedroom, but this was the first time I'd done any professional work, so I was very definitely a junior Junior Programmer. I kept hitting frustrating problems just using the basic tools I needed for development (I'd never even touched a debugger before) and my code was so buggy I could barely get it to run. I was painfully shy, didn't know anyone else in the company, and they all seemed too busy to help. The only person who made time to help me dig myself out of my incompetence was the bloke sitting behind me, Gary Liddon. Over the course of a couple of weeks he was incredibly patient about hand-holding me through the basics of building and debugging. It was only after the team started getting organized that someone introduced Gary as the project lead, in charge of 20 programmers and with a decades-long career in games behind him.

The second lesson I learnt was that I wanted to work with people like Gary, willing to help the whole team, rather than hunting for individual glory. I've since worked with a lot of 'rock star' programmers, and while they always look good to management, they hate sharing information or credit and end up hampering projects no matter how smart they are as individuals. Gary used his massive brain to help make us all more effective instead, and I've always tried to live up to his example.

The code itself was a mess. There were hundreds of pieces of x86 assembler scattered throughout the code base, which was a problem since we were porting to the Playstation's MIPS processor. Usually just a couple of instructions long, and in the middle of functions, these snippets were pretty puzzling. Finally one of the team figured it out; somebody had struggled with C's signed/unsigned casting rules, and so they'd fallen back on the assembler instructions they understood! The whole team had a good laugh at that, and were feeling pretty superior about it all, until Gary quietly pointed out that the programmers responsible were busy swimming in royalties like Scrooge McDuck while we were porting their game for peanuts.

The third lesson I learnt was that you don't need great code to make a great product. I take pride in my work, but there's no shame in doing what it takes to get something shipped. I've seen plenty of projects die a lingering death thanks to creeping elegance!

After 6 months of spiralling into debt I finally managed to get another job, only 2,000 pounds more in salary but in a much cheaper part of the country. Not much of my code made it into the final game, and it was a pretty miserable time of my life to be honest, but sometimes the worst projects are the best teachers.

Boosting MongoDB performance with Unix sockets

Photo by Stitch

As I've been searching for a solution to my big-data analysis problems, I've been very impressed by MongoDB's features, but even more by their astonishing level of support. After I mentioned I was having trouble running Mongo in my benchmark, Kristina from 10gen not only fixed the bug in my code, she then emailed me an optimization (using the built-in _id in my array), and after that 10gen's Mathias Stearn let me know the latest build contained some more optimizations for that path. After burning days dealing with obscure Tokyo problems, I know how much time responsive support can save, so it makes me really want to use Mongo for my work.

The only fly in the ointment was the lack of Unix domain socket support. I'm running my analysis jobs on the same machine as the database, and as you can see from my benchmark results, using a file socket rather than TCP on localhost speeds up my runs significantly on the other stores that support it. I'd already added support to Redis, so I decided to dive into the Mongo codebase and see what I could manage.

Here's a patch implementing domain sockets, including a diff and complete copies of the files I changed. Running the same benchmarks gives me a time of 35.8s vs 43.9s over TCP, and 28.9s with the RAM cache vs 31.1s on TCP. These figures are only representative of my case, large values on a single machine, but generally they demonstrate the overhead of TCP sockets even when you're using localhost. To use it yourself, specify a port number of zero, and pass the socket location (e.g. /tmp/mongo.sock) instead of the host name. I've patched the server, the command-line shell, and the PHP driver to all support file sockets this way.
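In terms of the benchmark's key/value store interface, connecting over the patched socket path looks something like this (MongoStore is a hypothetical name for whichever class wraps the Mongo connection):

<?php
// Sketch of the patch's convention: pass the socket path as the host
// and zero as the port. MongoStore is a hypothetical stand-in for the
// benchmark's Mongo wrapper class.
require_once('keyvaluestore.php');

$store = new MongoStore();

// Over TCP to the local server:
// $store->connect('localhost', 27017);

// Over a Unix domain socket with the patch applied:
$store->connect('/tmp/mongo.sock', 0);
?>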

I don't know what Mongo's policy is on community contributions, and I primarily wrote this patch to scratch my own itch, but I hope something like this will make it into the main branch. Writing the code is the easy bit of course; the real challenge is testing it across all the platforms and configurations!

How to speed up key/value database operations using a RAM cache

Photo by Olduser

In my previous post I gave raw timings for a typical analysis job on top of various key/value stores. In practice I use another trick to speed up these sorts of processes: caching values in RAM for the duration of a run and then writing them all out to the store in one go at the end. This helps performance because there's some locality in the rows I'm accessing, so it's worth keeping previously-fetched or written data in memory to reduce the amount of disk IO needed. Whether this helps you will depend on your database usage patterns, but I've found it invaluable for my analysis of very large data sets.

The way I do this is by creating a PHP associative array mapping keys to values, populating it as I fetch from the store, and delaying writes until a final storeToDisk() call at the end of the script. This is very inelegant: it means you have to watch PHP's memory usage, since the default 32MB max is easy to hit, and ensuring that final flush call is made is error-prone. The performance boost is worth it though; here are the figures using the same test as before, but with the cache enabled:

Null: 22.1s

Ram: 23.9s

Redis domain: 27.5s

Memcache: 27.9s

Redis TCP: 29.6s

Tokyo domain: 29.9s

Mongo: 31.1s

Tokyo TCP: 33.6s

MySQL: 182.9s

To run these yourself, download the PHP files and add a -r switch to enable the RAM cache, e.g.

time php fananalyze.php -f data.txt -s mongo -h localhost -p 27017 -r

They're all significantly faster than the original runs with no caching, and Redis using domain sockets comes close to the speed of no store at all, suggesting that the store is not the bottleneck for this test. In practice most of my runs are with hundreds of thousands of profiles, not 10,000, and the RAM cache becomes even more of a win, though the space used expands too! I've included the code for the cache class below:

<?php

// A key value store interface that caches the read and written values in RAM,
// as a PHP associative array, and flushes them to the supplied disk-based
// store when storeToDisk() is called

require_once('keyvaluestore.php');

class RamCacheStore implements KeyValueStore
{
    public $values;
    public $store;
   
    public function __construct($store)
    {
        $this->store = $store;
        $this->values = array();
    }

    public function connect($hostname, $port)
    {
        $this->store->connect($hostname, $port);
    }
   
    public function get($key)
    {
        if (isset($this->values[$key]))
            return $this->values[$key]['value'];

        $result = $this->store->get($key);
        $this->values[$key] = array(
            'value' => $result,
            'dirty' => false,
        );
       
        return $result;
    }

    public function set($key, &$value)
    {
        $this->values[$key] = array(
            'value' => $value,
            'dirty' => true,
        );
    }
   
    public function storeToDisk()
    {
        foreach ($this->values as $key => &$info)
        {
            if ($info['dirty'])
            {
                $this->store->set($key, $info['value']);
                $info['dirty'] = false;
            }
        }
        // Clear the reference left dangling by the foreach-by-reference loop
        unset($info);
    }
   
}

?>
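Using it is just a question of wrapping whichever disk-based store you're benchmarking. The store class and file names in this sketch are hypothetical stand-ins, but the RamCacheStore calls match the interface above:

<?php
// Sketch: wrapping a disk-based store with the RAM cache. MongoStore
// and the file names are hypothetical stand-ins for the benchmark's
// actual KeyValueStore implementations.
require_once('ramcachestore.php');

$store = new RamCacheStore(new MongoStore());
$store->connect('localhost', 27017);

$value = $store->get('somekey');  // first get() hits the disk store
$store->set('somekey', $value);   // writes stay in RAM for now

// Don't forget this final call: without it, all writes are lost.
$store->storeToDisk();
?>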