Five short links

Fiveflower
Photo by Jannis Andrija Schnitzer

On being wrong in Paris – A great general meditation on the slippery nature of facts, but it's the specific example that really resonated with me. We tend to think of places as having clear boundaries, but depending on who I was talking to, I'd describe my old house as in "Los Angeles", "near Thousand Oaks", or "Simi Valley". Technically I wasn't in LA, but the psychological boundaries aren't that neat.

The devil in the daguerreotype details – The detail you can see in this old photograph is amazing, and I love how they delve into the capture method. I was disappointed there was nothing on the role of lenses as a limiting factor on resolution, though; I'd love to know more about that.

Katta – A truly distributed version of Lucene, designed for very large data sets. I haven't used it myself yet, but I'm now very curious.

HBase vs Cassandra – An old but fair comparison of the two technologies. This mirrored the evaluation I went through when picking the backend database for Jetpac, and I ended up in the same place.

It's cheaper to keep 'em – Your strategy is sometimes pre-determined by which numbers you're paying attention to. If you start off with the assumption that your job is to get new users as cheaply and as quickly as possible, you'll never realize how important retaining existing customers can be.

Sad Alliance

A friend inspired me to dig around in my digital attic and resurrect a video of one of my live VJ performances. It's playing off the music of Richie Hawtin and Pete Namlook, and was created on the fly using my home-brewed software, a MIDI controller, and a live camera feedback loop. There are no clips or pre-recorded footage; everything's my own response to the audio as it's happening.

Lessons from a Cassandra disaster

Disaster
Photo by Earthworm

Yesterday one of my nightmares came true; our backend went down for seven hours! I'd received an email from Amazon warning me that one of the instances in our Cassandra cluster was having reliability issues and would be shut down soon, so I had to replace it with a new node. I'm pretty new to Cassandra and I'd never done that in production, so I was nervous. Rightly so, as it turned out.

It began simply enough: I created a new server using a DataStax AMI, gave it the cluster name, pointed it at one of the original nodes as a seed, and set 'bootstrapping' to true. It seemed to do the right thing, connecting to the cluster, picking a new token, and streaming down data from the existing servers. After about an hour it appeared to complete, but the state shown by nodetool ring was still Joining, so it never became part of the cluster. After researching this on the web without any clear results, I popped over to the #cassandra IRC channel and asked for advice. I was running 0.8.1 on the original nodes and 0.8.8 on the new one, since that was the only DataStax AMI available, so the only suggestion I got was to upgrade all the nodes to a recent version and try again.
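
For reference, the settings involved are just a handful of lines in cassandra.yaml. Here's a rough sketch of what I mean, assuming the 0.8-era option names and an /etc/cassandra config path; the exact layout on the DataStax image may differ:

# On the new node, before starting Cassandra (option names and paths assumed)
grep -E 'cluster_name|auto_bootstrap|seeds' /etc/cassandra/cassandra.yaml
#   cluster_name: 'MyCluster'   <- placeholder; must match the existing cluster exactly
#   auto_bootstrap: true        <- tells the node to pick a token and stream in data
#   - seeds: "10.0.0.1"         <- placeholder address of one of the original nodes
# Then watch the join attempt from any node in the cluster:
nodetool -h localhost ring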

This is where things started to get tough. There's no obvious way to upgrade a DataStax image, and IRC gave me no suggestions, so I decided to try to figure out how to do it myself from the official binary releases. I took the 0.8.7 release and looked at where the equivalent files from the archive lived on disk. Some of them were in /usr/share/cassandra, others in /usr/bin, so I made backup copies of those directories on the machine I was upgrading. I then copied over the new files and tried restarting Cassandra. I hit an error, and then I made the fatal mistake of trying to restore the original /usr/bin by first moving out the updated one, thus bricking that server.

Up until now the Cassandra cluster had still been functional, but the loss of the node that the code contacted first meant we lost access to the data. Luckily I'd set things up so that the frontend was mostly independent of the backend data store, so we were still able to accept new users, but we couldn't process them or show their profiles. I considered rejigging the code so that we could limp along with two of the three nodes working, but my top priority was safeguarding the data, so I decided to focus on getting the cluster back up as quickly as I could.

I girded my loins and took another try at upgrading a second node to 0.8.7, since the version mismatch was the most likely cause of the failure-to-join issue according to IRC. I was painstaking about how I did it this time though, and after a little trial and error, it worked. Here are my steps:
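
In outline it went something like the sketch below. The service name, log location, and exact file list are assumptions based on the DataStax AMI and the 0.8.7 binary tarball, so treat this as a rough guide rather than a recipe:

# 1. Stop Cassandra on the node being upgraded (service name assumed)
sudo service cassandra stop
# 2. Back up the directories being touched - copy them, don't move them!
sudo cp -a /usr/share/cassandra /usr/share/cassandra.orig
sudo cp -a /usr/bin /usr/bin.orig
# 3. Unpack the 0.8.7 binary release and copy its jars and scripts into place,
#    leaving bin/cassandra.in.sh alone (see the gotchas below)
tar xzf apache-cassandra-0.8.7-bin.tar.gz
cd apache-cassandra-0.8.7
sudo cp lib/*.jar /usr/share/cassandra/
sudo cp bin/cassandra bin/nodetool bin/cassandra-cli /usr/bin/
# 4. Restart as root and watch the log for errors (log path assumed)
sudo service cassandra start
tail -f /var/log/cassandra/system.log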

There were a couple of gotchas. You shouldn't copy the bin/cassandra.in.sh file from the distribution, since it contains settings, like the location of the library files, that you want to retain from the DataStax AMI. And if you see this error:

ERROR 22:15:44,518 Exception encountered during startup.
java.lang.NullPointerException
        at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:606)
it means you've forgotten to run Cassandra with superuser privileges (via su or sudo)!

Finally I was able to upgrade both remaining nodes to 0.8.7 and retry adding a new node. Maddeningly, it still made it all the way through the streaming and indexing, only to pause on Joining forever! I turned back to IRC, explained what I'd been doing, and asked for suggestions. Nobody was quite sure what was going on, but a couple of people suggested turning off bootstrapping and retrying. To my great relief, it worked! It didn't even have to restream; the new node slotted nicely into the cluster within a couple of minutes (there's a rough sketch of that final change after the list below). Things were finally up and running again, but the downtime definitely gave me a few grey hairs. Here's what I took away from the experience:

Practice makes perfect. I should have set up a dummy cluster and tried a dry run of the upgrade there. It's cheap and easy to fire up extra machines for a few hours, and it would have saved a lot of pain.

Paranoia pays. I was thankful I'd been conservative in my data architecture. I'd specified three-way replication, so even if I'd bricked the second machine, no data would have been lost. I also kept all the non-recoverable data either on a separate Postgres machine or in a Cassandra table that was backed up nightly. The frontend was still able to limp along with reduced functionality when the backend data store was down. There are still lots of potential showstoppers of course, but the defence-in-depth approach worked during this crisis.

Communicate clearly. I was thankful that I'd asked around the team before making the upgrade, since there's always a chance of downtime when you have to upgrade a database server. We had no demos to give that afternoon, so the consequences were a lot less damaging than they could have been.

The Cassandra community rocks. I'm very grateful for all the help the folks on the #cassandra IRC channel gave me. I chose it for the backend because I knew there was an active community of developers who I could turn to when things went wrong, even when the documentation was sparse. There's no such thing as a mature distributed database, so having experienced gurus to turn to is essential, and Cassandra has a great bunch of folks willing to help.
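
For completeness, here's a rough sketch of that final bootstrapping change I mentioned above, again assuming the 0.8-style auto_bootstrap option, the /etc/cassandra config path, and a cassandra service name:

# On the stuck new node (option name, service name, and config path assumed)
sudo service cassandra stop
sudo sed -i 's/^auto_bootstrap: true/auto_bootstrap: false/' /etc/cassandra/cassandra.yaml
sudo service cassandra start
# Within a couple of minutes the node showed up as a normal member of the ring
nodetool -h localhost ring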

How to brick your Ubuntu EC2 server

Bricks
Photo by Mutasim Billah

sudo mv /usr/bin /usr/bin.latest - Don't do this!

What on earth possessed me to run that command? I had just attempted to upgrade to Cassandra 0.8.7 on a DataStax AMI that started out at 0.8.1. That involved manually copying files, so, trying to be careful, I made a backup of any directories I was touching, including /usr/bin. The upgrade didn't work, so I decided to roll back by swapping the updated and backed-up directories, and the command above was the first stage.
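
In hindsight a non-destructive backup, copying instead of moving, would have left the system intact even if the rollback went wrong. A minimal sketch of what I should have done instead:

# Copy, don't move - the original /usr/bin stays in place and sudo keeps working
sudo cp -a /usr/bin /usr/bin.backup
# Roll back by copying individual files over (cassandra here is just an example),
# never by swapping out whole system directories
sudo cp -a /usr/bin.backup/cassandra /usr/bin/cassandra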

As far as I can tell, there's no way to do anything useful with the machine after that. Ubuntu requires sudo before you can perform any system-related task, and that command's broken by the folder change, since it can't find /usr/bin/python. I had the default Ubuntu setup where you can't run su or ssh in as root, and running /usr/bin.latest/sudo gave a cryptic 'sudo: must be setuid root' error, possibly because of the dot in the path name? Even worse, the Cassandra data files required root permissions to access, so I couldn't copy them off.

It turned into an interesting puzzle for the Unix folks in my Twitter stream; thanks for all the ideas. I'm just glad I have three-way replication and, in the worst case, nightly backups of any non-reproducible data. The pain of losing hours that I could have spent on features makes this a memorable lesson though. Now I just have to persuade my replacement Cassandra node to get beyond "Joining" after I add it. At least this is giving me plenty to blog about!