Five short links

Five
Photo by Alan Levine

You can't sacrifice partition tolerance – A convincingly (and amusingly) argued case that you can never trade off the P in the CAP theorem.

Topic discovery with Apache Pig and Mallet – This sort of thing used to be magic, now you can assemble it from off-the-shelf components.

The insanely confusing path to legal immigration – I've almost made it through my own immigration story (knock on wood, I'll be taking my citizenship oath in May after twelve years), but it's been tough to explain to my American friends quite how convoluted the process is. This chart will help!

dancer.js – The web continues to devour the software world. Javascript can now handle both the fast rendering and the audio analysis you need for this music-responsive visualization.

Click dataset – Over 50 billion real-world HTTP requests. I'm certain there are identifiable elements in this data, but I think Arvind's right that researchers have proved this so convincingly that they won't bother to highlight them, and malicious users will never talk about it, so for some approximation of matters, it doesn't matter.

The Data Science Toolkit is now on Vagrant!

Vagrant
Picture by Jacob Haas

I have fallen in love with Vagrant over the last year. It turns an entire logical computer into a single unit of software. In simple terms, you can easily set up, run, and maintain a virtual machine image with all the frameworks and data dependencies pre-installed. You can wipe it, copy it to a different system, branch it to run experimental changes, keep multiple versions around, easily share it with other people, and quickly deploy multiple copies when you need to scale up. It's as revolutionary as the introduction of distributed source control systems: you're suddenly free to innovate because mistakes can be painlessly rolled back, and you can collaborate with other people without worrying that anything will be overwritten.

Before I discovered Vagrant, I'd attempted to do something similar with my Data Science Toolkit package, distributing a VMware image of a full Linux system with all the software and data it required pre-installed. It was a large download, and a lot of people used it, but the setup took more work than I liked. Vagrant solved a lot of the usability problems around downloading VMs, so I've been eager to create a compatible version of the DSTK image. I finally had a chance to get that working over the weekend, so you can create your own local geocoding server just by running:

vagrant box add dstk http://static.datasciencetoolkit.org/dstk_0.41.box

vagrant init dstk

vagrant up

The box itself is almost 5GB with all the address data, so the download may take a while. Once it's done, go to http://localhost:8080 and you'll see the web interface to the geocoding and unstructured data parsing functions.

I've updated the US address data using the most recent Census data from 2012, rebuilt the system around Ubuntu 12.04, and incorporated a lot of virtual memory setting changes that improve the stability of the system when it's dealing with large volumes of API calls. I've released an EC2 AMI with all these changes too, and the full instructions for setting up your own server are at http://www.datasciencetoolkit.org/developerdocs#amazon.

 

Pick the mountain, not the route

 Mountain

Photo by Mike P

For most of my life I've managed to avoid being a boss. Since I helped start Jetpac the responsibility has crept up on me, and not coincidentally so have a few grey hairs. I'm still loving my job, but managing is tough, though not always in the ways I expected.

One of my biggest surprises is how bad I was at leading a team of engineers. I'd spent a long time as a senior guy on various decent-sized teams, taking a lot of initiative and making a lot of decisions, so I thought leading would be a big but incremental step. Instead, I've actually had to unlearn a lot of what I'd picked up over the last 15 years.

In particular, I found that my enjoyment of the debate about ways to implement features went from being valuable to toxic. As a team contributor, I was used to chiming in while we all hashed out the right approach, matching wits and learning in a back-and-forth debate. For the first few months here I did the same thing, and was mystified that something just didn't seem right. The discussions seemed more stilted, and it never felt like everyone had truly bought in to the conclusions. People weren't as enthusiastic as I'd expect, and problems that we should have caught in the planning stages only became apparent much later.

It took me a while, but I finally realized I was in a different position. When I gave my opinion it carried more weight, and if I jumped in on every interesting detail I'd end up cutting the discussions short. That meant I never benefitted from the experience of the super-smart engineers I'm lucky enough to work with. I realized I have a different role, and I have to have a much lighter touch.

Instead of getting deeply involved in the implementation approaches, I've found it's worked much better to focus on the end-user goals of what we're trying to do, and on communicating those to the engineering team. An important part of that is asking questions about how they think different approaches will meet particular goals. "Do you think this will get more people exploring this feature?" "Will that get more people entering recommendations?" The key difference from my previous approach is that I give them ownership of the way they reach those goals. As long as they're able to meet them, I don't care how they get there! The team take pride in their work, so I've never had to worry about code quality, and the end-user results have been amazing. We've ended up with some very successful innovations that I'd never have dreamt up in a million years!

If you're helping lead a team, think about what you truly care about. I bet it's outcomes you want, and you'll need to step back from your own preferences on the details if you want a team of creative people to achieve them. Point them at the right mountain, make sure you're giving them good crampons, maps, and guide books, but let them pick a route up themselves!

Five short links

Fiveknives
Photo by Alex Loach

Passing data from server to Javascript on page load – A strong treatment of a grubby little subject that anyone who writes a non-trivial web app has to think about. We have a much more ad hoc version of this, and I'd probably stick to a whitelist of known operations rather than passing in a function name as a string, but I like the approach.

Vaurien, the Chaos TCP proxy – I'm itching to use this, without any pressing justification. There's just something very appealing about throwing glitches and noise into any system and seeing what happens.

How food shapes our cities – This gave me a sense of wonder at how far we've come so quickly, with just a couple of centuries (or these days a few hundred miles and a border or two) separating us from desperately unreliable food supplies.

dstk_excel – Despite its issues, I love github's new search, it helped me discover this Excel interface to my Data Science Toolkit! I love people sometimes.

Heather Arthur – And then sometimes people suck. It's actually good to see this get some attention. Being respectful about other people's code didn't come naturally to me. It took a good first lead programmer to point out that while I was being snobbish about the original Diablo code I was working on porting, the original engineers were rolling in money like Scrooge McDuck, so who was the idiot?

 

How to debug Javascript errors on iOS

Error
Photo by Nick J Webb

There are lots of advantages to developing for iOS devices in Javascript, either as a mobile website or through a native app that hosts a UIWebView. Debuggability is definitely not one of them though! You'll find yourself flying blind when you need to track down errors, especially compared to the awesome state of browser debuggers. There are techniques that can help though, so I wanted to give a quick overview of what we've ended up doing for Jetpac.

Local logging

If you're targeting Mobile Safari, it's comparatively easy to see your error messages when you're debugging: just enable the debug console in the settings. It gets tricky with a UIWebView though, and we ended up using this custom URL scheme hack (which requires some native code changes) to get log messages appearing in the device console. It's also worth knowing that you can view the console even when you didn't run the app through the debugger (for example if you've installed it through the app store) by plugging in and looking in Organizer->Devices. You can even buy apps to let you view the console natively, which should make you think twice about putting any private information you don't want other apps to access in log messages!
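As a rough illustration of the Javascript half of this kind of hack, here's a minimal sketch. It isn't the linked implementation: the jslog scheme and the buildLogUrl and logToNative names are made up for the example, and it assumes the native side intercepts the request in the UIWebView delegate's webView:shouldStartLoadWithRequest:navigationType: method, logs the decoded message, and cancels the load.

```javascript
// Hypothetical sketch: the "jslog" scheme and both function names are
// assumptions for illustration, not part of any real library.

// Encode a log message into a URL that native code can recognize.
function buildLogUrl(message) {
  return 'jslog://log?message=' + encodeURIComponent(String(message));
}

// Trigger a navigation request to the custom URL through a hidden iframe,
// so the main page itself never navigates away.
function logToNative(message, doc) {
  var frame = doc.createElement('iframe');
  frame.style.display = 'none';
  frame.src = buildLogUrl(message);
  doc.documentElement.appendChild(frame);
  // The frame is only needed to issue the request, so remove it again.
  frame.parentNode.removeChild(frame);
}

// In a page hosted by a UIWebView you would call:
//   logToNative('photo list loaded', document);
```

Passing the document in as a parameter is just a convenience so the function can be exercised outside a browser.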

Web inspector

You should check out the new iOS 6 remote debugger, which works with both Safari and UIWebView code. It's been extremely useful for digging into CSS issues, and saved our bacon when tracking down some weird script loading problems.

Catching errors in the wild

The most challenging part is getting information on problems that are happening to users with the released app. If you can't reproduce the issue locally with a device plugged in, how can you tell what went wrong?

The first step is attaching a callback to window.onerror(), which will be called whenever there's an uncaught exception. In iOS 5, you only get the error message, not the file or line, and for various reasons we've had to minify and inline code anyway, so iOS 6's addition of the line number and file name isn't very helpful. What we really need is the call stack, which just doesn't get returned in any form on Mobile Safari.

Scarily, Javascript is such a flexible language that it's possible to do a crazy level of modification of the function calling internals, enough to write user-level tracing for every function! I actually got a version of this Function.prototype hack partially running as an experiment, but the breadth of it scared me. I also realized that I didn't need every function in the call stack, I mostly just wanted to know what part of my code had triggered the problem. What I ended up doing was manually wrapping functions I know about, and outputting information about them in onerror(). It's still an extremely hacky hack, but it's been very useful as we've been tracking down tough release problems, so here's the code I ended up using:

This won't run out of the box, but it should give you an idea of what we're doing. As part of our server-side code we have an error-reporting endpoint that we post the details of any release errors to, /jserror, and that sends on an email to the team.
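To make the reporting side concrete, here's a minimal sketch of an onerror handler that posts details to an endpoint like /jserror. It's an illustration under assumptions rather than the code described above: installErrorReporter, postReport, the optional getContext hook, and the JSON payload fields are all hypothetical.

```javascript
// Hypothetical sketch: function names and payload fields are assumptions.

// Attach an onerror handler to `target` (normally `window`) that sends a
// JSON report through `send`. `getContext` is an optional hook returning
// whatever extra context you track, such as a list of wrapped function names.
function installErrorReporter(target, send, getContext) {
  target.onerror = function (message, file, line) {
    var report = {
      message: String(message),
      file: file || null, // may be missing on older versions of iOS
      line: line || null,
      context: getContext ? getContext().slice() : []
    };
    send('/jserror', JSON.stringify(report));
    // Returning false leaves the browser's default error handling in place.
    return false;
  };
}

// Browser-only transport: POST the report body to the server.
function postReport(url, body) {
  var request = new XMLHttpRequest();
  request.open('POST', url, true);
  request.setRequestHeader('Content-Type', 'application/json');
  request.send(body);
}

// In the app you would wire it up once at startup:
//   installErrorReporter(window, postReport);
```

Keeping the transport as a parameter means the handler can be tried out with a fake sender before pointing it at a real server.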

The heavy lifting happens in the wrapFunctions() call, which replaces each function in an object with a wrapper that first calls the supplied 'before' function (in our case just pushing onto the callstack), then the original function, followed by 'after'. There are no guarantees about the correctness of the code in all cases, the prototype stuff especially scares me, but it has worked in practice on our code base.
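The wrapping idea described above can be sketched in a few lines. This is a simplified illustration, not the original code: the wrapFunctions signature, the callStack array, and the app object are assumptions, and it only wraps an object's own enumerable functions, leaving prototypes alone.

```javascript
// Hypothetical sketch: the signature and names are assumptions.

// Replace each function-valued own property of `object` with a wrapper that
// calls before(label), then the original function, then after(label).
function wrapFunctions(object, objectName, before, after) {
  Object.keys(object).forEach(function (key) {
    var original = object[key];
    if (typeof original !== 'function') {
      return; // leave data properties untouched
    }
    var label = objectName + '.' + key;
    object[key] = function () {
      before(label);
      // If the original throws, `after` never runs, so the labels pushed by
      // `before` are still there when the error handler looks at them.
      var result = original.apply(this, arguments);
      after(label);
      return result;
    };
  });
}

// Example: track which wrapped functions are currently executing.
var callStack = [];
var app = {
  loadPhotos: function () { return app.parseResponse(); },
  parseResponse: function () { throw new Error('bad response'); }
};
wrapFunctions(
  app,
  'app',
  function (label) { callStack.push(label); },
  function () { callStack.pop(); }
);

try {
  app.loadPhotos();
} catch (error) {
  // In the browser an uncaught error would reach window.onerror instead.
  console.log(callStack.join(' > ')); // prints "app.loadPhotos > app.parseResponse"
  callStack.length = 0; // reset so stale entries don't leak into the next report
}
```

Skipping try/finally around the original call is deliberate here, since unwinding the stack on an exception would throw away exactly the information the error handler needs.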

I tend to use this pretty sparingly to wrap our own code, rather than jQuery or other frameworks, since most of the errors are in our functions, and I'm worried about sprinkling too much voodoo over our code base. Despite those caveats, it's been a massive help in tracking down our issues.

Security by silo

Silo
Photo by Trey Ratcliff

A while ago I was having drinks with a Google employee, and we started discussing privacy problems. He asked me why Buzz had received so much bad press for its email analysis when Facebook and other social networks had been doing the same thing for years. He also pressed me on why the iPhone tracking story had become such a big issue.

People have a mental model of what devices and services are for, and get freaked out when someone changes the rules. Nobody understands constantly-changing space-shuttle-control-panel privacy settings within services, but everyone knows that LinkedIn is for business relationships, and Facebook is for friends. Users try to protect their privacy by limiting information to sites that serve the audiences they want it to reach.

When Google changed from an email and search provider to a service that could broadcast semi-public updates to your friends, it became unclear where information you'd previously shared would end up. When Apple switched from a phone and computer builder to something that followed your movements, that crossing of boundaries was the real problem. Nobody would have blinked an eye at the idea of a Garmin device keeping a file showing where you'd been.

If you're worried about how users will react to something innovative you're trying, think about how they understand your purpose. Why did they sign up for you in the first place? Ignore the grand vision in your head, what do they think you do? If what you're doing makes sense for that goal, you'll be surprised at how generous and supportive they can be, even for potentially scary applications. If you're working towards something they don't expect, if you're moving outside of the silo they think you're in, you may be in trouble!

Five short links

Ruffle
Photo by Philip Chapman-Bell

The Normal Well-Tempered Mind – I never knew the AI community had a favorite philosopher, but I can see why Daniel C Dennett is it. There are so many ideas in this conversation that made me think about how our minds work in a very different light. Even better is his disclaimer: "Everything I just said is very speculative. I'd be thrilled if 20 percent of it was right." That's an attitude I'll try hard to emulate.

Space Station Challenge – Figure out how to eke more power out of the solar panels by carefully changing their positions over an orbit. It's all the constraints that make this coding challenge so much fun.

Understand the favicon – A purist would be appalled, but the hacker within me loves how we're learning to push the limits of what's possible thanks to a deep understanding of platform quirks. Like the space station challenge, a complex but ultimately understandable set of constraints makes fertile ground for artful programming. Check out this beautiful subversion of browsers' text rendering engines if you're into that sort of thing.

Pulse Tech Talk 2 – One of the best things about living in San Francisco is the plethora of great tech talks on your doorstep. Check out AirBnB's series too, they have some mind-blowing speakers.

Love and other conspiracies of the X-Files – I have a confession – I've watched all nine seasons, and I'm gearing up to rewatch them soon. They're not all good in a conventional sense, but almost every episode is interesting, and Josh captures some of the roots of why they could be so compelling.

Does reality improve when your numbers do?

Hockeystick
Photo by Judy and Ed

I had a tough meeting with an advisor this week. I was proudly showing off how we've managed to triple the amount of time that first-time users spend on Jetpac, when he interrupted. He wanted to know why he should care. It forced me to quickly run him backwards through our decision-making process, looking at why we'd chosen that as one of the numbers we wanted to improve. We'd started there because we noticed that our most successful users, those who enjoy the app enough to keep coming back, tend to interact with the app a lot on their initial visit. Users who take more actions spend more time on the app, the correlation has always been strong in our case, so time was a good approximation of how much they were interacting. That had become the goal, and I had been so focused that it took me a moment to reconstruct how we'd got there.

The dangerous part was that there were lots of ways we could keep users on the app longer without improving the experience at all, or even making it worse! Luckily we have a lot of different methods of understanding how the experience is holding up, from surveys, crowd-sourced user tests, and contacts with power users, but it's still a risk. 

When I was in college, a lecturer who was a grizzled engineering veteran warned us "You'll start off wanting to measure what you value, but you'll end up valuing what you can measure". You need to have fresh eyes looking at how you're evaluating your own progress, not only so you avoid the more obvious problem of vanity metrics, but also so you don't follow your numbers down a rabbit hole. Any measure is only a projected shadow of reality. When somebody asks "So what?", you always need to be able to point to something in the outside world that gets better when the metric does!

What should a lead engineer code on?

Lead
Photo by Cindy Cornett Seigel

If you're a programmer who's been thrust into management, you'll probably want to keep coding. It's the only way to truly understand what's happening inside the engineering team, and nobody wants to become a pointy-haired boss. Your non-programming responsibilities will take a lot of your time though, so how can you pick the right tasks to take on? I've worked with several outstanding lead engineers at Apple and elsewhere, and here's what I've noticed about what their coding responsibilities look like.

Boring

The only way to motivate good hackers is to give them something interesting and challenging to work on. As a greybeard engineer, you've probably gone through your career fighting for the chance to work on tough, rewarding problems, so your reflex will be to jump in on the most daunting and fun tasks. If you're a good manager, you'll stop yourself! Look for tasks that nobody else wants to take on instead. You shouldn't need the motivation yourself, leading the team should be enough, and you'll be able to offer your engineers a more rewarding bunch of work. It also builds respect for you in the team if they can see you're willing to sacrifice something meaningful for their benefit.

Ubiquitous

In the short-lived Police Squad, Johnny Shoeshine always supplied the 'word on the street' for all sorts of implausible topics. Being a lead is a lot like that! You need to know the nitty-gritty details of what's happening in the code base, and understand intimately how it's evolving so you can offer meaningful advice and head off potential problems early. The only way to do that is to touch as much of the code base as possible as often as possible. That means picking tasks that are cross-module, whether it's integrating multiple parts of the code, or a service that's used everywhere.

Non-blocking

The sad reality of a manager's life is that you're unpredictably called away from your day to day duties, especially when deadlines are looming. That can be disastrous if other team members are relying on you to deliver code so they can make progress, or if bugs are going unfixed because you're unavailable. You need to find something that can be worked on incrementally in small chunks, and doesn't prevent others from making progress if you do get waylaid for a week.

Following this philosophy, one of the things I've ended up building is the activity log analysis system. It's not something anybody else wants to work on, we need to record events almost everywhere in the code so it touches every module, and it doesn't stop us shipping if improvements get delayed.

If you're a lead, give boring a chance, you'll be amazed at how effective an approach it can be!

How I learned to stop meddling

Maryworth

I ran across Fred Wilson's latest post this morning, and I have something to confess. I'm a meddler. If I see someone struggling with a task I know well, I have a strong urge to jump in and 'help'. This isn't always a bad thing, in the past it's helped me train up more junior folks, and experienced folks could always tell me to go take a hike.

That's all changed since I've become a CTO. Even though it's a small team, I'm a 'boss', which means that people are prone to humoring me more. It took me a while to realize, but no matter how diplomatic I think I am, my guys don't feel as comfortable telling me to bugger off.

Over the last couple of months, I've had to learn a new style of interacting with them. Instead of giving 'helpful' suggestions on the best approach to solving a problem, I'll lay out the goals and some thoughts at the start, and then step back and let them find their own path to an implementation. I'm always available to answer questions and give advice when they ask for it, and we'll often do an informal post-mortem on what did and didn't work at the end of the sprint, but otherwise I try to give them the freedom to code their own way.

I'm lucky enough to be working with a bunch of very smart folks, so the results have been impressive, the solutions have been much more imaginative and effective than they were. It's been humbling to see how strong a negative effect my frequent interventions had, but thinking back on my own career it makes sense. "Voice and choice" were the keys to the jobs I loved. If I'd been involved in planning my own work, and then made decisions about how to tackle it, it turned from being a servile task I was grudgingly performing for someone else, into my project that I worked extra hard on because I truly felt ownership. I would even go out of my way to work in areas that were difficult and unpopular because those were the ones where I had the most freedom. Nobody wanted to interfere with my work on video format conversion code in Motion, for fear they'd be pulled into the quagmire too!

The liberating thing has been how much it has freed me up to work on other vital parts of my job, but that's a subject for another post. If any of this is sounding familiar to you, try really giving your team voice and choice, you'll be amazed at the results!