I still remember my excitement when I discovered Google after years of struggling with awful search engines like Altavista, but every now and again it really doesn't find what I'm looking for.
I was just going to bed on Tuesday night when I remembered I had to start a job processing 500 GB of data, or it would never be done in time for a deadline. This process (merging adjacent lines of data into a single record) was painfully slow in the scripting languages I tried, so I'd written a tool in plain C to handle it. Unfortunately I'd never tried it on this size of data, and I quickly discovered an O(n^2) performance bug that ground progress to a halt. To fix it, I needed a hashmap of strings to speed up a lookup, so I googled 'c hashmap' to grab an implementation. I was surprised at the sparseness of the results, the top hit appeared to be a learning project by Eliot Back.
Before I go any further, if you need a good plain C hashmap that's been battle-tested and generally rocks, use libjudy. Don't do what I did, trying to build your own is a silly use of anyone's time! My only excuse is that I thought it would be quicker to grab something simpler than libjudy, and I'd had a martini…
I stayed up until 2am trying to get the hash map integrated, discovering some nasty performance bugs in the implementation as I did. For instance, the original code actually tried to completely fill the hash map before it reallocated, which means for a large map it often searches linearly through most of the entries if the key isn't present, since it only stopped when it found a gap. I also removed the thread primitives, and converted it over to use strings as keys, with a CRC32 hashing function.
I don't make any claims for the strength of the resulting code, but at least this version has a unit test and I've used it in anger. Thanks to Eliot for the original, here's my updated source:
Hopefully this will help out any other late-night coders like me searching for 'C hashmap'!