Boosting Redis performance with Unix sockets

Photo by SomeDriftwood

I've been searching for a way to speed up my analysis runs, and Redis's philosophy of keeping everything in RAM looks very promising for my uses. As I mentioned previously, I hit a lot of speed-bumps trying to get it working with PHP, but I finally got up and running and ran some tests. The results were good: faster than the purely disk-based Tokyo Tyrant setup I'd been relying on.

The only niggling issue was that I knew from Tokyo that Unix file (aka domain) sockets carry a lot less overhead than TCP sockets, even over localhost. Since the interface is almost identical, I decided to dive in and spend a couple of hours patching my copy of Redis 1.02 to support file sockets. The files I've changed are available at http://web.mailana.com/labs/redis_diff.zip.

My initial results using the redis-benchmark app that ships with the code show a noticeable performance boost across the board, sometimes up to 2x. Since this is an artificial benchmark it's unlikely to be quite this dramatic in real-world situations, but with Tokyo, 30%-50% increases in speed were common.

I hope these changes will get merged into newer versions of Redis; it's a comparatively small change for a big performance boost in situations when the client and server are on the same machine.

Using Redis on PHP

Photo by Bowbrick

There's a saying that pioneers tend to come back stuck full of arrows. After a couple of days trying to get Redis working with PHP, I know that feeling!

I'm successfully using Tokyo Tyrant/Cabinet for my key/value store, but for a lot of my uses disk access is a major performance bottleneck. I do lots of application-level RAM caching to work around this, but Redis's philosophy of keeping everything in main memory looked like a much lower-maintenance solution.

Getting started is simple: on OS X I was able to download the stable 1.02 source, make, and then run the default redis-server executable. Its interface is through a TCP socket, so I then grabbed the native PHP module from the project's front page and started running some tests.

The first problem I hit was that the PHP interface silently failed whenever a value longer than 1024 characters was set. Looking into the source of the module, it was using fixed-length C arrays (they were even local stack variables with pointers returned from leaf functions!) and failing to check if the passed-in arguments were longer. This took me an hour or two to figure out in unfamiliar code, so I was a bit annoyed that there weren't more 'danger! untested!' signs around the module, though the README did state it was experimental.

Happily a couple of other developers had already run into this problem, and once I brought up the issue on the mailing list, Nicolas Favre-Félix and Nasreddine Bouafif made their fork of PHPRedis available with bug fixes for this and a lot of other issues.

The next day I downloaded and ran the updated version. This time I was able to get a lot further, but on my real data runs I was seeing intermittent empty strings returned for keys which should have had values. This was tough to track down, and even when I uncovered the underlying cause it didn't make any sense. It happened seemingly at random, and I wasn't able to reproduce it in a simple test case. An email to the list didn't get any response, so the following day I heavily instrumented the PHP interface module to understand what was going wrong.

Finally I spotted a pattern. The command before the one that returned an empty string was always

SET 100
<1000 character-long value>

It turned out that some digit-counting code thought the number 1000 only had three digits, and truncated it to 100. The other 900 characters in the value remained in the buffer and were misinterpreted as a second command. That meant the real next command received a -ERR result. I coded up a fix and submitted a patch, and now it seems to be working at last.

Hitting this many problems so quickly has certainly made me hesitate to move forward using Redis in PHP. It's definitely not a well-trodden path, and while the list was able to bring me a solution to my first problem, I was left to debug the second one on my own, and a question about Unix domain sockets versus TCP was left unanswered as well. If you are looking at Redis yourself in PHP, make sure you're mentally prepared for something pretty experimental, and don't count on much hand-holding from the developer community.

Of course, the same goes for almost any key/value store right now; it's the wild west out there compared to the stability of the SQL world. My next stop will be MongoDB, to see if having a well-supported company behind the product improves the experience.

Nonsensical Infographics

Nonsensical Infographic 1 by Chad Hagan

20×200 is an awesome concept: an online gallery selling limited editions of works by new artists, starting at $20 each. While I strive to follow Tufte and make all my visualizations tell a clear story, I'm aware they sometimes turn out more pretty than functional, so I'm in love with Chad Hagan's 'Nonsensical Infographic' series on there. Now I just need to convert them into Flash animations to make them even more beautiful and confusing.

Nonsensical Infographic 2 by Chad Hagan

Amazing scams are earning $1000+ CPMs for priceline.com

Photo by Toasty Ken

I'm a cheerful pessimist about human nature; I generally expect the worst but don't let it get me down. This report from the US Senate is beyond the pale though; it really makes me mad. It details how companies like Affinion, Webloyalty and Vertrue pay immense amounts of money (CPM rates of up to $2650!) to well-known firms like priceline.com and 1-800-flowers to get links inserted into their checkout process. These links look like discount offers, but clicking on them passes your credit card information to the scammers and lets them set up a recurring monthly payment on your account without asking for permission, hoping you won't notice, at least for a while.

How much money is there to be made here? The report estimates just those three firms have earned over $1.4 billion so far! And how much of a scam is it? Vertrue estimates 98% of their call volume is cancellation requests, and Webloyalty admit that 90% of their members have no idea they're enrolled.

As an entrepreneur I know how much over-regulation can hurt startups and economic growth, but scams like these drive people down that path. As an industry we need to have enough sense to avoid crazily short-sighted schemes like these if we want to have a long-term future. All three companies are owned by big-name private equity firms, and big-name websites are hosting their ads. Everyone involved should be ashamed of themselves, and nobody else should touch them with a barge pole. Sadly, $1.4 billion is very persuasive…

Get new users for $2 each with Facebook Ads

Photo by Intermayer

This is one of those posts I hesitate about writing, because it's tempting to hoard an advantage like this, but sharing always seems to benefit me more in the long run. I'm able to get new users for my (in-testing, very unfinished) Facebook app for as little as $2 each using Facebook Ads. Here's my most successful ad so far:

[Screenshot of the Facebook ad]

It's short and simple, and around 0.07% of the people who see it click on it. Naively I started off expecting click rates of around 1%, but since then I've talked to people with more experience in the ad world, and outside of search ads mine is actually pretty respectable. It's also cheap: I set my cost-per-click bid to 50 cents, but actually ended up paying 37 cents each.

This is only the start of my funnel: the landing page is the install dialog for my Facebook app. The only thing I have control over there is the app description and logo, and currently only 30% of visitors click accept to install it. That means my cost per installation is around $1.

After they've made it through that screen, they're finally on a page I control. Here I ask them to give me their email address, accept extended permissions and authorize me to access their Twitter account. I lose between 50% and 70% of users there, bumping my final cost per true user to between $2 and $3 each.

So what are the secrets to achieving similar results?

Land in Facebook. I have a massive advantage in that I've moved my service over to run as a Facebook app. It's low friction for users when they're staying within the same site; I doubt you could achieve the same CPC for external pages. I'm now a hostage to Facebook's whims, of course, but for me the gain in user trust far outweighs the risks.

Start small. I'm still spending only $15 a day spread over several campaigns; that gives me enough data to tell what's working and to refine my ads and landing pages before I ramp up to collect larger numbers of users. It's also a great way of flushing out bugs and scaling issues while annoying a relatively small number of users.

Test, test, test! I'm terrible at writing ad copy, really, really bad, and my first versions had awful click-through rates around 0.01%. I was able to use Facebook's statistics panel to tell which ads were the least-worst, and spot the patterns. In my case the shorter ones worked much better, as did the ones that focused on a single feature, which is how I ended up with the one above. I'm also constantly trying new versions of the landing page and sign-up flow to measure how I can improve the rest of the funnel.

Foreigners are cheap. There must be a lot less competition for UK and Commonwealth Facebook views, because I'm able to get CPCs of 37 cents if I target the English-speaking non-US countries, versus around 60 cents in the US. If you're in the testing phase, you could get representative data for almost half the price by using non-Americans as guinea pigs.

I still think there's a lot of room for improvement in my funnel, so I'm hopeful I can keep driving the cost down even if the ad market overall becomes more expensive with competition. I'm also not doing much with the targeting possibilities beyond picking countries; I think localized ads could get a strong response, and I need to run a census of my users to understand which demographics the service appeals to most, and then target them.

Eat mistakes, not jobs

Photo by Garretc

Andy Kessler's talk yesterday got me thinking hard about why I found his argument so unconvincing. He focused on how innovation will destroy jobs, the way container ships put all the stevedores out of work. I think he's missing a completely different outcome of innovation, and one that excites me a lot more.

Stevedores were performing a process that achieved the results we were after, they weren't dropping half the boxes into the ocean as they unloaded, so containerization just made the process more efficient.

Where Andy went off the rails was in applying that model to worlds like education. We are really, really bad at teaching our kids; enormous numbers of them don't even make it through high school. It's as if we're losing half the cargo every time we unload a ship. Innovation in education gives us the chance to achieve better results with the existing resources, giving our teachers tools so they leave fewer kids behind. It's about effectiveness, not efficiency, because we're falling so far short of our goals right now.

Would we expect a school district that increased its students' overall GPA to then fire some teachers to save money and return to the old GPA, since we lived with that before? Of course not, we'd celebrate the achievement and try to replicate it elsewhere.

What really excites me about technology innovation is that we can help people do important things that weren't possible before. MIT's OpenCourseWare is an awesome example: the lectures and materials work as an accelerator and multiplier to traditional learning methods, helping students all over the world get better results. There's no wave of professors being fired; if anything it's relieving them of mundane and routine introductory lectures so they can focus on the value-added personal teaching instead.

I love increasing productivity because it lets people do tricky jobs much better, which I find a lot more satisfying than automating people out of a job. I'm much happier preventing screw-ups than eating people!

Andy Kessler’s keynote at Defrag stunk


Andy Kessler just gave the opening keynote speech at Defrag '09, and I really hated it. The title was Be Soylent, Eat People, and since I'm fascinated by the topic of productivity and job replacement I was looking forward to a thoughtful analysis of a complex topic. Instead it felt like a rant by an undergraduate who'd just read Atlas Shrugged for the first time. He laid out a taxonomy of 'unproductive' jobs, which he generally classified as servers as opposed to creators, and then split those servers into 'sloppers', 'sponges', 'slimers' and 'thieves'.

What gobsmacked me was his seeming contention that basically anyone who wasn't a programmer was a parasite. He mentioned a lot of jobs that should be largely automated, from the uncontentious idea of stevedores being replaced by container ships, to the eyebrow-raising example of librarians, and finally to the jaw-dropping idea that teachers are on the way out!

He seemed to be taking an uncontroversial idea, that there are buggy-whip making jobs that will be replaced by new processes, and taking it to ridiculous and offensive extremes. He used doctors as an example of a 'sponge' profession where artificial barriers to entry kept the incumbents charging high fees and gouging their customers. I'm extremely sympathetic to Adam Smith's quote 'People of the same trade seldom meet together, even for merriment and diversion, but the conversation ends in a conspiracy against the public, or in some contrivance to raise prices', but we tried unregulated doctors for most of the nineteenth century here in the US, and it didn't work so well.

All of Andy's ideas are controversial extrapolations of accepted ones, but he gave no evidence that any of his assertions actually hold. All the talk did was annoy me without offering any enlightenment; I'd love to engage with his ideas, but there was nothing to hang a debate on, just pure opinion.

Never trust a hippy


This is a tricky post to write, because some of my best friends are hippies, I've been accused of being a hippy myself, and I live in Boulder. But after reading this article about a first-time entrepreneur's messy breakup with a business partner, I couldn't resist.

When it comes to business, pay no attention to what a potential partner says; judge them on what they do. This is especially important if they're charismatic and overtly spiritual, because what they say will be both flattering and very appealing, and you'll be tempted to bend over backwards for them. I'm speaking from painful personal experience: my two worst business outcomes were situations where I really liked a partner and stopped thinking critically about what they were offering.

I followed a charismatic hippy manager into his new startup for no equity and worked like a dog for a year. He replaced my friends (he'd needed all our resumes to get the initial contract) with cheap college interns, compressed the schedule and played a lot of other nasty tricks until I finally snapped when a colleague was reprimanded for being late on a Sunday. I'd spent many evenings with the guy and his wife and kids before 'our' startup launched, I really liked him, and he'd painted a beautiful vision of a family-friendly workplace with a great culture. My mistake was that I'd failed to push for any tangible evidence he was serious about his promises. Trust but verify.

The sad thing is, I don't think he was faking the beliefs he kept talking about, but he was able to use them to convince himself that they justified whatever was most convenient for him. During the nightmare he often invoked providing for his family as a reason to cut salaries and hoard the benefits of success, which sounds great until you saw that it meant a second home for him while employees struggled to afford healthcare for their kids.

Since then I've been much more comfortable with 'coin-operated machines', as a former partner described himself. I find someone who's up-front and honest about their motivations is a lot easier to deal with than anyone who claims they're acting in your best interests.

On Hacker News, a commenter pointed out that Steve Jobs is a hippy, which is true, but I don't think it's possible to find someone who's more blunt and straightforward in his reactions than Steve! All I want is honesty and trust, and I find that's a lot easier to achieve with someone who's unafraid to admit selfish behavior than anyone who's worried about preserving a virtuous self-image.

This is one of the hardest posts I've had to write; I'm admitting a strong prejudice based on a small sample size, and I got a lot of flak when I posted my original comment on HN. In the spirit of openness I'm trying to be honest about what my biases are and how I arrived at them, even if they aren't particularly flattering. I look forward to the comments!

Getting Tokyo Tyrant to work with files larger than 2GB

Photo by Gen Kanai

I use Tokyo Tyrant/Cabinet as the key-value database for Mailana, and after some initial hiccups I've been very happy with its performance. Last night though it stopped working in the middle of preparing several hundred nightly emails, and I wanted to document the problem and the fix to help anyone else who hits this.

After a bit of investigation, I noticed that the Tyrant server kept dying with "File size limit exceeded". My casket.tch hash database file had grown to 2GB, and running on a 32 bit EC2 server Tokyo couldn't cope with anything larger. There's a standard called Large File Support on Linux that allows you to access >2GB files, but it requires a few things to work:

– A modern version of Linux. I'm on 2.6, so it has support for LFS built in.

– A modern file system that supports large files. I'm on XFS, so that was also ok.

– You need to recompile your program to use the 64 bit versions of the file operations. Happily Tokyo was already using the correct off_t type for file offsets, rather than int, so I was able to add the -D_FILE_OFFSET_BITS=64 compile flag to the configure scripts in both Cabinet and Tyrant, rebuild them both, and they then ran with 64 bit file offsets on a 32 bit system.

There was one other quirk I discovered. By default Tokyo only uses a 32 bit index for the hash database, so you also need to pass in the l option at runtime to cope with the larger files, e.g.:

/usr/local/bin/ttserver -host /sqlvol/tokyo.sock -port 0 -le /sqlvol/casket.tch#opts=l

After making those changes, I was able to restart my server and run the daily email updates again. The meta-data for my database seemed to have been corrupted by the issue, but all my data integrity checks passed, so I patched around the problem. Specifically, in tchdb.c:tchdbopenimpl() the file size returned from fstat() didn't match the one stored in the meta-data header, so I skipped the check:

sbuf.st_size < hdb->fsiz

Plug and Play Tech Center spam

I don't usually post spam, but for anyone out there who gets an email like this and googles it, no, I don't think it's that dream investor you've been waiting for. The fact they can't even figure out my first name is a strong sign, and I'm not the only one getting these.

From: Nickolas Turner <nturner@plugandplaytechcenter.com>

Subject: Funding Opportunity through Plug and Play Tech Center

Dear Mailana,

Are you looking for funding? Please contact Alireza@plugandplaytechcenter.com
to get in touch with our seed and early stage venture arm, as well as our
partners.

Best of luck in your ventures.

Regards,

Nick Turner

Business Relationship Associate

Plug and Play Tech Center

(650) 207-7001