How to guess gender from a first name in PHP

Photo by Davezilla

If you've got someone's first name, it's possible to make a pretty accurate guess at their gender. Obviously there are plenty of exceptions (Sean and Francis spring to mind), but for lots of applications you don't need 100% accuracy or coverage. In my case I want a better understanding of the demographics of my users, so a figure that's within a few percent is fine.

There's a great Perl module called Text::GenderFromName that implements this idea, with accumulated wisdom dating all the way back to a 1991 awk script! I haven't found anything that fits well into my PHP projects though, so I finally bit the bullet and ported that Perl script to PHP. The result is up at

http://web.mailana.com/labs/genderfromname/

and you can get the source at

http://github.com/petewarden/genderfromname

Thanks to Eamon Daly and Jon Orwant for the original code, and apologies for the mechanical translation of Perl code to PHP. It's now painfully non-idiomatic, but it does work!

For best results you should also install the doublemetaphone PHP module, though it will function without it.
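
To give a feel for the general idea, here's a minimal sketch of a name lookup with a fallback rule. It is not the genderfromname code: the guess_gender() function, the four-entry name table and the "ends in a" rule are all made up for illustration, and the real port carries far more accumulated rules than this.

```php
<?php
// Hypothetical sketch: guess_gender() and this tiny table are illustrative
// only and are not part of the genderfromname package.
function guess_gender($first_name)
{
    // A handful of known names; a real table would hold thousands.
    $known = array(
        'john'  => 'm',
        'peter' => 'm',
        'mary'  => 'f',
        'susan' => 'f',
    );

    $name = strtolower(trim($first_name));
    if ($name === '') {
        return null;
    }
    if (isset($known[$name])) {
        return $known[$name];
    }

    // Crude fallback rule (an assumption for this sketch): treat unknown
    // names ending in 'a' as female.
    if (substr($name, -1) === 'a') {
        return 'f';
    }

    // Unknown names get no guess rather than a wrong one.
    return null;
}

echo guess_gender('Mary'), "\n";   // prints f
echo guess_gender('Peter'), "\n";  // prints m
var_dump(guess_gender('Sean'));    // NULL, since it isn't in the table
```

Returning null for names the table doesn't cover keeps the "not 100% coverage" trade-off explicit, so the caller can count unknowns separately instead of skewing the totals.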

Introducing Fan Page Analytics

Fan Page Analytics is a new project I've just launched to help answer questions about Facebook pages. Here are some examples:

Which parts of the world have the most fans of God? In the US the map pretty clearly shows the traditional Bible Belt, but looking worldwide the Philippines is pretty god-fearing too.

How does ReadWriteWeb's fan base compare to TechCrunch's? From the map, RWW is much more broadly based, whereas TC's readership is heavily concentrated in the traditional US tech centers of California, Washington and Massachusetts. I only see one venture capital fan page in RWW's top 20 most related pages, but I count 8 in TC's. On the other hand there are a couple of HR-related pages for RWW, and none for TC, which suggests a less geeky audience.

That's all fascinating, but what problem does it solve? Suppose I'm planning the next DEMO conference. Glancing at the related pages shows that Charlene Li and Fred Wilson are people my audience care about, so they should be top of my list to attend and spread the word. ReadWriteWeb and GigaOm fans are more likely to be fans of DEMO than TechCrunch readers, so I might get more bang for my buck buying ad space on those sites. Looking at the locations, CA, WA and MA are way ahead, so I can craft some Facebook ads targeted only at those areas and tied in to some of the other related interests. I can even look at some examples of users in particular locations or with shared interests to understand if they're really my target market. These appear in the right side-bar when you click on an area or a location.

This is an initial release, so expect a few bugs, and it's not yet got complete coverage of fan pages, so apologies if yours isn't there yet. Hopefully you can still have some fun uncovering things like Glenn Beck's fan base in Outer Mongolia.

What can I find out about you if I know your email address?

Photo by HerzogBR

One of the least-understood developments of the last few years is the growth of databases of personal information linked to email addresses. Rapleaf is probably the leader in this field, but even Flickr lets companies search their API for users based on an email address. I wrote a service that queries all the data sources I could find to demonstrate how much is out there:

http://web.mailana.com/labs/findbyemail/

Give it a try for yourself; you might be surprised by how much companies can discover about you once they know your email address! Many services give out at least your full name and a location, which is often enough to get your address and a phone number from a service like whitepages.com.

How to make your intensity maps interactive

Google’s Intensity Map charts are a really easy and clean way to show heat maps of geographic data. Unfortunately, there’s no way to take the next step in the user experience, and let people mouse over and click on the maps to see additional information about a particular area.

To solve that problem, I’ve written a PHP script that extracts the country and state boundaries from the maps and constructs a JavaScript function to return the state code for any point on the maps. You can see an example of this in practice at http://web.mailana.com/labs/mapclicker/ or you can download the complete source, including the boundary extractor.

The example page will show the state or country code for any point you move the mouse over on either of the maps, and if you click, will display an alert showing the name. It assumes your image has the maximum 440×220 dimensions, but you can apply scaling to the coordinates if you are using something smaller.
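
Here's a rough sketch of how the scaling and the lookup could fit together. None of these names come from the mapclicker source: scale_to_full_map(), point_in_polygon() and region_code_at() are hypothetical helpers, the two rectangular "states" are made up, and I'm assuming the boundaries are stored as polygons, which may not be how the generated function actually represents them.

```php
<?php
// Hypothetical helpers for illustration; none of these names come from the
// mapclicker source.

// Full-size Intensity Map dimensions mentioned in the post.
define('FULL_MAP_WIDTH', 440);
define('FULL_MAP_HEIGHT', 220);

// Convert a mouse position on a smaller image into full-size map coordinates.
function scale_to_full_map($x, $y, $image_width, $image_height)
{
    return array(
        $x * FULL_MAP_WIDTH / $image_width,
        $y * FULL_MAP_HEIGHT / $image_height,
    );
}

// Standard ray-casting test: count how many polygon edges a horizontal ray
// from the point crosses; an odd count means the point is inside.
function point_in_polygon($x, $y, array $polygon)
{
    $inside = false;
    $count = count($polygon);
    for ($i = 0, $j = $count - 1; $i < $count; $j = $i++) {
        list($xi, $yi) = $polygon[$i];
        list($xj, $yj) = $polygon[$j];
        // The first condition guarantees $yj != $yi before we divide.
        $crosses = (($yi > $y) !== ($yj > $y))
            && ($x < ($xj - $xi) * ($y - $yi) / ($yj - $yi) + $xi);
        if ($crosses) {
            $inside = !$inside;
        }
    }
    return $inside;
}

// Return the code of the first region containing the point, or null.
function region_code_at($x, $y, array $regions)
{
    foreach ($regions as $code => $polygon) {
        if (point_in_polygon($x, $y, $polygon)) {
            return $code;
        }
    }
    return null;
}

// Made-up rectangular "states" in 440x220 space, purely for the demo.
$regions = array(
    'AA' => array(array(0, 0), array(220, 0), array(220, 220), array(0, 220)),
    'BB' => array(array(220, 0), array(440, 0), array(440, 220), array(220, 220)),
);

// A mouse position of (50, 50) on a half-size 220x110 image maps to (100, 100).
list($x, $y) = scale_to_full_map(50, 50, 220, 110);
echo region_code_at($x, $y, $regions), "\n"; // prints AA
```

Doing the scaling first means the boundary data only ever has to exist at one resolution, whatever size the image is displayed at.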

Create amazing Flash maps with Mercator

I've been using Google's simple map visualizer to picture some of the geographic data I'm gathering, but I've been looking for something more customizable and interactive. After a lot of searching, I finally found Mercator, an open-source Flash library for creating maps. I've only been using it for a few hours, but already I'm very excited! It's not only very slick, with built-in animation and a clean look, it's also got a lot of depth, with data and graphics for an astonishing array of countries, states and cities.

The only down-side was getting started. The beginner's guide is aimed at someone with a deeper knowledge of Flash application building than me, and there was no 'hello world' example I could download, though there are several more advanced projects available. So, for anyone else out there who struggles with Flex basics, here's a complete project for Flex Builder 3 that just creates and displays a Mercator map. You can see it running at the head of this post, or over at http://web.mailana.com/labs/mapview/

Manfred's done an excellent job on Mercator; it's a poster child for the quality of open-source projects, and I'm looking forward to creating some rich visualizations on top of it.

A shell script for building MongoDB from source

Photo by Christina Matheson

I've been working a lot with MongoDB lately, including some tinkering to add domain socket support. Unfortunately those modifications mean that I can't just use a binary download when I need to install it on a new machine; I have to build from source instead. There are general instructions on building for Fedora 8, but I found that yum either had versions that were too old for some packages like scons or, as with PCRE, ended up complaining about other programs requiring older versions than Mongo asked for.

As a result, I ended up creating a shell script to handle all the building steps required. This is pretty specific to my own environment and very hackish, but if you're running into the same sort of issues trying to use RPMs for all the dependencies, hopefully this will give you some hints on building from source.

Download Makemongo

#!/bin/sh
# Builds MongoDB and its needed dependencies for Fedora 8
# By Pete Warden

# First, make sure we have git
yum install -y git

# Now grab the Mongo source from github
git clone git://github.com/mongodb/mongo.git

# Get an up-to-date version of Scons, yum installs a 0.97 version that doesn't work!
curl -L "http://downloads.sourceforge.net/project/scons/scons/1.2.0.d20090919/scons-1.2.0.d20090919.tar.gz?use_mirror=cdnetworks-us-1" > scons-1.2.0.tar.gz
tar -xf scons-1.2.0.tar.gz
cd scons-1.2.0.d20090919/
# Scons relies on Python, so make sure that's present
yum install -y python-devel
python setup.py install
cd ..

# Now install boost from yum - I do see some warnings in Mongo's config about this being out of date
yum install -y boost
yum install -y boost-devel

# Spidermonkey is needed for Javascript support, and there's no RPM
curl -O ftp://ftp.mozilla.org/pub/mozilla.org/js/js-1.7.0.tar.gz
tar zxvf js-1.7.0.tar.gz
cd js/src
make -f Makefile.ref
JS_DIST=/usr gmake -f Makefile.ref export
cd ../..

# You need an up-to-date version of PCRE, more recent than the one installed by yum
curl -O "ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-8.00.zip"
unzip pcre-8.00.zip
cd pcre-8.00
./configure --prefix=/usr
make
make install
cd ..

# With the dependencies in place, build Mongo itself
cd mongo
scons all

# On my AMIs there's old versions of PCRE libraries left lying around, so move them
mv /usr/lib64/libpcrecpp.so.0 /usr/lib64/libpcrecpp.so.0.original
ln -s /usr/lib/libpcrecpp.so.0 /usr/lib64/libpcrecpp.so.0

# Create the database location and run Mongo
mkdir -p /mnt/data/db
./mongod --dbpath=/mnt/data/db --bind_ip=/tmp/mongo.sock --port=0 &

# Now build the PHP drivers and add them to the php.ini
cd ..
git clone git://github.com/mongodb/mongo-php-driver.git
cd mongo-php-driver/
phpize
./configure
make
make install
echo "" >> /etc/php.ini
echo "extension=mongo.so" >> /etc/php.ini
/etc/init.d/httpd restart

St Mary’s Glacier

We had my parents to stay over Thanksgiving. They'd never been to Colorado before, so there was plenty to do around Boulder, but I also wanted a long weekend somewhere in the mountains. After digging around on the great Vacation Rentals by Owner site, I found a little two bedroom condo near Idaho Springs, about an hour from Denver.

We drove up I-70, then about 9 miles up a twisty mountain road to get to the little community of St Mary's Glacier. I have to admit my heart sank when I saw the building we were staying in; it looked like an apartment block from East Germany in the 1970s. That impression wasn't helped by the two abandoned cars across the street, next to a boarded-up cabin with broken windows, along with multiple for-sale signs in our building. Climbing upstairs to the condo, we saw that the door across the hall had been forced open, splintering the frame. We were pretty sure it was being used by squatters, since the door was permanently open, but towards the end of our stay we spoke to the building manager, and apparently the occupants had just lost their keys!

I felt a flood of relief once we stepped inside; the place was beautiful. You can see a full tour on Randy's website, and he's done a fantastic job decorating. There was a dining nook with windows on three sides, giving a view of the lake and mountains, and the living room was great in the evening thanks to the roaring fireplace.

The real point of the trip was outside the condo. Even though there hadn't been much snow, we were up at 10,400 feet so there was some pretty wild terrain. We had new snow-shoes to try out, so we trekked up the glacier itself, to 11,700 feet. We're fairly certain that Thor is the only Chihuahua to make it up to the top of the glacier, and we were all stunned by the views from the plateau. The only off-note was a couple of jack-asses on quad bikes who decided to ride them up the glacier despite all the sign-posts, only to be driven off by irate hikers.

If you're looking for an authentic slice of old Colorado, St Mary's Glacier is pretty fascinating, with no pretensions, lots of history and more ghost towns nearby than I could count. It's somewhere with character – if you're looking for an adventurous mountain vacation off the beaten track I'd highly recommend it. It's definitely not Aspen, but that's its charm!

The worst interview question ever

Photo by B Rosen

This one article sums up everything that's wrong with engineering interviews. The author likes to ask potential hires to explain whether you can call "delete this" within a C++ member function. What's so wrong with that, you ask? It seems like fairly standard practice.

I've conducted a lot of interviews, and been on the other side of a few, and from my own experience and the research, I know that poorly structured interviews like this are a terrible mechanism for predicting how good people will be at performing a job. Just think about this interview question for a second: how much time in your coding job do you typically spend worrying about this sort of C++ trivia versus debugging, trying to understand legacy code, talking to other engineers, figuring out requirements, explaining your project to managers, etc, etc? The right answer for me is "I've no clue, looks like a terrible idea generally, but I'd google it if needed."

The reason these sorts of questions keep coming up is the same reason the drunk kept looking under the lamp post for his keys: they're within the comfort zone of technical specialists, even though the answers aren't useful. For a long time, I did the same, even though I was frustrated with the results. Finally I received some official training at Apple, and what they taught me opened my eyes!

You can find a more detailed description here, but the most important part is "Ask about past behavior". It's the best predictor of future performance, and if you ask in the right way it's also very hard for the candidate to exaggerate or lie. You can do something general like "Tell me about your worst project", but something more specific is even better; I'd often use "Tell me about a time you hit a graphics driver bug". The candidates will start off with a superficial overview, but if you follow up with more detailed questions (e.g. "So, did you handle talking to Nvidia?") you'll start to build a real picture of their role and behavior, and it's almost impossible to fake that level of detail.

If C++ experience is crucial, then a much better question would be "Tell me about a time you had to debug a template issue" or "Tell me about a project you implemented using reference-counted objects". Anybody who's read enough C++ books can answer the original question, but these versions will tell you who's actually spent time in the trenches.

Easier command-line arguments in PHP

Photo by Between a Rock

One of my pet peeves is that no language I've used handles command-line arguments well. Everyone falls back to C's original argv indexed array of space-separated strings, even though there are decades-old conventions about the syntax of named arguments. There are some strong third-party libraries that make it easier, and the arcane getopt(), but nothing that's emerged as a standard. Since I'm doing a lot more PHP shell scripts these days, I decided to write a PHP CLI parser that met my requirements:

Specify the arguments once. Duplication of information is ugly and error-prone, so I wanted to describe the arguments in just one place.

Automated help. The usage description should be generated from the same specification that the parser uses so it stays up-to-date.

Syntax checking. I want to be able to say which arguments are required, optional or switches, and have the parser enforce that, and catch any unexpected arguments too.

Unnamed arguments. Commands like cat take a list of files with no argument name; I wanted those to be easily accessible.

Optional defaults. It makes life a lot easier if you don't have to check to see if an argument was specified in the main script, so I wanted to ensure you could set defaults for missing optional arguments.

Human-readable specification. getopt() is close to what I need, but as well as not generating a usage description, the syntax for describing the long and short arguments is a horrible mess of a string. I want the argument specification to make sense to anyone reading the code.

Here's the result, cliargs.php. To use it, specify your arguments in the form:

array(
    '<long name of argument>' => array(
        'short' => '<single letter version of argument>',
        'type' => <'switch' | 'optional' | 'required'>,
        'description' => '<help text for the argument>',
        'default' => '<value if this is an optional argument and it isn't specified>',
    ),
    …
);

There's an example script in the package, and documentation in the readme.txt. The code is freely reusable with no restrictions; I'm just dreaming of a world where no one ever writes another CLI argument parser ever again.
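
As a concrete illustration of that format, here's a filled-in specification plus a small helper that builds usage text from it. The argument names are invented, build_usage_text() is a hypothetical function of my own rather than anything in cliargs.php, and I'm assuming that keys which don't apply (such as 'default' on a switch) can simply be left out, so check the readme for what the real parser expects.

```php
<?php
// Example specification in the format described above. The argument names
// are invented for illustration.
$cli_spec = array(
    'input' => array(
        'short' => 'i',
        'type' => 'required',
        'description' => 'File to read from',
    ),
    'limit' => array(
        'short' => 'l',
        'type' => 'optional',
        'description' => 'Maximum number of rows to process',
        'default' => '100',
    ),
    'verbose' => array(
        'short' => 'v',
        'type' => 'switch',
        'description' => 'Print progress messages',
    ),
);

// Hypothetical helper, not the cliargs.php implementation: builds the usage
// text from the same array a parser would read, so the two can't drift apart.
function build_usage_text($script_name, array $spec)
{
    $lines = array("Usage: $script_name");
    foreach ($spec as $long_name => $info) {
        $line = sprintf(
            '  -%s/--%s (%s) - %s',
            $info['short'],
            $long_name,
            $info['type'],
            $info['description']
        );
        if (isset($info['default'])) {
            $line .= sprintf(' (default: %s)', $info['default']);
        }
        $lines[] = $line;
    }
    return implode("\n", $lines) . "\n";
}

echo build_usage_text('myscript.php', $cli_spec);
```

Because the help text is derived from the one array, adding a new argument means touching a single place, which is the whole point of the "specify the arguments once" requirement.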

Three lessons I learnt from porting Diablo

Photo by Vizzzual

It was 1997, I'd just finished college, was really excited about getting my first job in the game industry, and I was a complete idiot. Luckily life was there to hand me a few lessons.

I'd always worked at name-badge jobs paying hourly rates, so when I was offered a whole 10,000 pounds a year, I thought it sounded amazing. It came out to around 550 pounds take-home pay a month, my rent was 400 pounds, which left me and my unemployed wife 150 pounds a month for food, transport and bills. The first lesson I learnt was to crunch the numbers on any deal, and not be distracted by a big headline figure.

The project, for Climax Inc ("Hi, I'm at Climax", not the best name), was to port Blizzard's hit game Diablo from the PC to the Playstation 1. I'd spent years obsessively coding in my bedroom, but this was the first time I'd done any professional work, so I was very definitely a junior Junior Programmer. I kept hitting frustrating problems just using the basic tools I needed for development (I'd never even touched a debugger before) and my code was so buggy I could barely get it to run. I was painfully shy, didn't know anyone else in the company, and they all seemed too busy to help. The only person who made time to help me dig myself out of my incompetence was the bloke sitting behind me, Gary Liddon. Over the course of a couple of weeks he was incredibly patient about hand-holding me through the basics of building and debugging. It was only after the team started getting organized that someone introduced Gary as the project lead, in charge of 20 programmers and with a decades-long career in games behind him.

The second lesson I learnt was that I wanted to work with people like Gary, willing to help the whole team, rather than hunting for individual glory. I've since worked with a lot of 'rock star' programmers, and while they always look good to management, they hate sharing information or credit and end up hampering projects no matter how smart they are as individuals. Gary used his massive brain to help make us all more effective instead, and I've always tried to live up to his example.

The code itself was a mess. There were hundreds of pieces of x86 assembler scattered throughout the code base, which was a problem since we were porting to the Playstation's MIPS processor. Usually just a couple of instructions long, and in the middle of functions, these snippets were pretty puzzling. Finally one of the team figured it out; somebody had struggled with C's signed/unsigned casting rules, and so they'd fallen back on the assembler instructions they understood! The whole team had a good laugh at that, and were feeling pretty superior about it all, until Gary quietly pointed out that the programmers responsible were busy swimming in royalties like Scrooge McDuck while we were porting their game for peanuts.

The third lesson I learnt was that you don't need great code to make a great product. I take pride in my work, but there's no shame in doing what it takes to get something shipped. I've seen plenty of projects die a lingering death thanks to creeping elegance!

After 6 months of spiralling into debt I finally managed to get another job, only 2,000 pounds more in salary but in a much cheaper part of the country. Not much of my code made it into the final game, and it was a pretty miserable time of my life to be honest, but sometimes the worst projects are the best teachers.