Five short links

Handprints
Photo by Ryan Somma

DocHive – Transforming scanned documents into data. A lot of the building blocks for this already exist as open source, but the hard part has always been building something that non-technical people can use, so I'm looking forward to seeing what a journalist-driven approach will produce.

How I helped create a flawed mental health system – There are a lot more homeless people sleeping on my block in San Francisco than there were even just two years ago. I'm driven to distraction by the problems they bring, but this personal story reminded me that they're all some parent's son.

Can you parse HTML using regular expressions? – An unlikely title for some of the funniest writing I've read in months.

Forest Monitoring for Action – A great project analyzing satellite photos to produce data about ecological damage around the world. I ran across this at the SF Open Data meetup, which is well worth attending if this sort of thing floats your boat.

Data visualization tools – A nicely presented and well-curated collection.

Why you should try UserTesting.com

Humancannonball
Photo by Zen Sutherland

If you're building a website or app you need to be using UserTesting.com, a service that crowd-sources QA. I don't say that about many services, and I have no connection with the company (a co-worker actually discovered them), but they've transformed how we do testing. We used to have to stalk coffee shops and pester friends-of-friends to find people who'd never seen Jetpac before and were willing to spend half an hour of their lives being recorded while they checked it out. It meant the whole process took a lot of valuable time, so we'd only do it a few times a month. This made life tough for the engineering team as the app grew more complex. We have unit tests, automated Selenium tests, and internal QA, but because we're so dependent on data caching and crunching, a lot of things only go wrong when a completely new user first logs into the system.

These are the steps to getting a test running:

- Specify what kind of users you need. In our case we look for people between 15 and 40 years old, with over 100 friends on Facebook, who've never used Jetpac before, and who have an iPad with iOS 5 or greater.

- Write a list of tasks you want them to perform. For us, this is simply opening up the app, signing in with Facebook, and using various features.

- Prepare a list of questions you'd like them to answer at the end. We ask for their overall rating of the app, as well as questions about how easy particular features are to find and use.

Once you've prepared those, you have a template that you can re-use repeatedly, so new tests can be started with just a few seconds of effort. The final step is paying! It does cost $39 per user, so it's not something you want to overuse, but it saves so much development time that it's well worth it for us.

It normally takes an hour or two for our three-user test batches to be completed, and at the end we're emailed links to screencasts of each tester using the app. Since we're on the iPad, the videos are taken using a webcam pointing at the device on a desk, which sounds hacky but works surprisingly well. All of the users so far have been great about giving a running commentary on what they're seeing and thinking as they go through the app, which has been invaluable as product feedback. It's actually often better than the feedback we get from being in the room with users, since they're a lot more self-conscious then!

The whole process is a pleasure, with a lot of thoughtful touches throughout the interface, like the option to play back the videos at double speed. The support staff have been very helpful too, especially Matt and Holly, who offered to refund two tests when I accidentally cc-ed them on an unhappy email about the bugs we were hitting in our product.

The best thing about discovering UserTesting.com has been how it changes our development process. We can suddenly get way more information than we could before about how real users are experiencing the app in the wild. It has dramatically lowered the barrier to running full-blown user tests, which means we do a lot more of them, catch bugs faster, and can fix them more easily. I don't want to sound like too much of an infomercial, but it's really been a godsend to us, and I highly recommend you check them out too.

Strange UIWebView script caching problems

Hieroglyphics
Photo by Clio20

I've just spent several days tracking down a serious but hard-to-reproduce bug, so I wanted to leave a trail of Googleable breadcrumbs for anyone else who's hitting similar symptoms.

As some background, Jetpac's iPad app uses a UIWebView to host a complex single-page web application. There are a lot of independent scripts that we normally minify down into a handful of compressed files in production. Over the last few weeks, a significant percentage of our new users have had the app hang on them the first time they loaded it. We couldn't reproduce their problems in-house, which made it tough to debug what was going wrong.

From logging, it seemed like the JavaScript code that sets up our app was failing, so the interface never appeared. The strange thing was that it was rarely the same error, and often the error locations and line numbers wouldn't match the known file contents, even after we switched to non-minified files. Eventually we narrowed it down to the text content of some of our <script> tags being pulled from a different <script> tag elsewhere in the file, seemingly at random!

That's going to be hard to swallow, so here's the evidence to back up what we were seeing:

We had client-side logging statements within each script's content, describing what code was being executed at what time, combined with <script> onload handlers that logged what src had just been processed. Normal operation would look like this:

Executing module storage.js

Loaded script with src 'https://www.jetpac.com/js/modules/storage.js'

Executing module profile.js

Loaded script with src 'https://www.jetpac.com/js/modules/profile.js'

Executing module nudges.js

Loaded script with src 'https://www.jetpac.com/js/modules/nudges.js'

In the error case, we'd see something like this:

Executing module storage.js

Loaded script with src 'https://www.jetpac.com/js/modules/storage.js'

Executing module profile.js

Loaded script with src 'https://www.jetpac.com/js/modules/profile.js'

Executing module storage.js

Loaded script with src 'https://www.jetpac.com/js/modules/nudges.js'

Notice that the third script thinks it's loading nudges.js, but the content comes from storage.js!

OK, so maybe the Jetpac server is sending the wrong content? We were able to confirm through the access logs that the file with the bogus content (nudges.js in the example above) was never requested from the server. We saw the same pattern every time we managed to reproduce this, and could never reproduce it with the same code in a browser.

As a clincher, we were able to confirm that the content of the bogus files was incorrect using the iOS 6 web inspector.

The downside is that we can't trigger the problem often enough to create reliable reproduction steps or a test app, so we can't chase down the underlying cause much further. It has prompted us to change our cache control headers, since it seems like something is going wrong with the iOS caching, and the logging has also given us a fairly reliable method of spotting when this error has happened after the fact. Since it is so intermittent, we're triggering a page reload when we detect we've lost our marbles. This generally fixes the problem, since it does seem so timing-dependent, though the hackiness of the workaround doesn't leave me with a happy feeling!
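To give a flavor of what that detect-and-reload workaround looks like, here's a rough sketch rather than our production code. The __executedModules bookkeeping and the function names are invented for illustration, and it assumes each script normally executes before its onload handler fires:

// Each module sets a flag at the top of its file when its code actually runs.
window.__executedModules = window.__executedModules || {};
function moduleExecuted(name) {
  console.log('Executing module ' + name);
  window.__executedModules[name] = true;
}

// Called from the onload handler of every <script> tag, for example:
// <script src="/js/modules/nudges.js" onload="scriptLoaded(this.src);"></script>
function scriptLoaded(src) {
  console.log("Loaded script with src '" + src + "'");
  var name = src.split('/').pop();
  if (!window.__executedModules[name]) {
    // The tag claims it loaded this file, but the file's code never ran,
    // so the cache has probably handed us someone else's content.
    window.location.reload();
  }
}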

If you think you're hitting the same issue, my bet is you aren't! It's pretty rare even for us, but if you want to confirm it, try adding logging like this to your script tags, and log inside each .js file to keep track of which one you think is loading:

<script src="foo.js" onload="console.log('loaded foo.js');"/>

In foo.js:

console.log('executing foo.js');

Comparing the stream of log statements will tell you if things are going wrong. You'd expect every 'executing foo.js' to be followed by a 'loaded foo.js' in the logs, unless you're using defer or async attributes.
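If you're collecting those log lines somewhere, a quick post-hoc check over the stream looks something like this. It's only a sketch, and it assumes logLines is an array of strings in the exact formats shown above, with no defer or async in play:

// Returns the src of the first script whose content came from a different
// module than its tag claims, or null if everything pairs up correctly.
function findMismatchedScript(logLines) {
  var lastExecuted = null;
  for (var i = 0; i < logLines.length; i += 1) {
    var executing = logLines[i].match(/^Executing module (.+)$/);
    var loaded = logLines[i].match(/^Loaded script with src '(.+)'$/);
    if (executing) {
      lastExecuted = executing[1];
    } else if (loaded) {
      var loadedName = loaded[1].split('/').pop();
      if (lastExecuted && lastExecuted !== loadedName) {
        return loaded[1];  // This tag's content ran as a different module.
      }
      lastExecuted = null;
    }
  }
  return null;
}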

Things users don’t care about

Yawning
Photo by DJ Badly

How long you spent on it.

How hard it was to implement.

How clean your architecture is.

How extensible it is.

How well it runs on your machine.

How great it will be once all their friends are on it.

How amazing the next version will be.

Whose fault the problems are.

What you think they should be interested in.

What you expected.

What you were promised.

How important this is to you.


I have to keep relearning these lessons. Finding an experience that people love is far more precious and rare than most of us realize.

Five short links

Pentagon
Photo by Sanchtv

How Zappos' user agreement failed in court – A lot of people have an almost-mystical belief in the power of terms of service and license agreements. Courts generally look for some minimal evidence that people are reading and understanding what they're agreeing to before they'll enforce those terms. They also don't look kindly on agreements that allow one party to change the terms at any time, without notice or assent. The law may be an ass, but there are humans at the controls and you can't get too far from common sense before they'll intervene.

Free Foundation Trilogy Audio Books – Every now and again I have to stop and be amazed at the cornucopia of art that the internet has made instantly available to me. I remember struggling to piece together book series from libraries and stores, in a way that the web has made obsolete. I was reminded of that when I ran across this eight-hour epic production by one of my favorite childhood authors, though I'm slightly afraid to listen in case Asimov hasn't aged well.

The Evil Bit RFC – The perfect antidote to endless angels-on-a-pinhead debates around web specifications (*cough* webfinger *cough*). It's way more fun to sketch the rules of a perfect world than to wrestle with the tradeoffs of an ugly one.

Unwise Microwave Experiments – Who knew you could use your microwave to create lava? I'll be trying these shortly before I move out of my current apartment.

Mind your nanoseconds – It's pretty thrilling to know that we're now doing things so fast in computing that the speed of light is becoming a significant barrier, and Admiral Grace Hopper does an amazing job turning very abstract ideas into something very concrete.

What can small startups learn from their event data?

Snowhearts
Photo by Lovestruck

There are lots of examples of big companies A/B testing their way to greater success, but it's hard to figure out how to get started with event data as an early-stage company. For one thing, you probably don't have very much data to go on! I'm going to talk about a few of the things I've learned at Jetpac as we've built something from nothing.

Sample size matters

We're still focused on getting a user experience people love, and a value proposition they understand, before we attack distribution. That means we have hundreds of users a day, not thousands or tens of thousands. My initial approach was to try a feature experiment, look at the results after a couple of days, and then decide which variant was more successful. I rapidly discovered that this wasn't working, as a week later the statistics on a feature's usage might be much worse. Looking at measures like the number of slides viewed, it became clear how big the natural day-to-day variation was, often plus or minus 50%!

It's easy to forget basic statistics, as I did, but sample size is really, really important. There are robust methods to figure out the confidence of your results, but for my applications I've found anything less than a hundred samples is useless, a few hundred becomes indicative, and over a thousand is reliable.
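To make those thresholds concrete, here's a back-of-the-envelope sketch using the usual normal approximation for a proportion. The 30% conversion figure and the 'viewed at least ten slides' framing are invented purely for illustration:

// Rough 95% confidence margin for an observed proportion.
function marginOfError(proportion, sampleSize) {
  var standardError = Math.sqrt((proportion * (1 - proportion)) / sampleSize);
  return 1.96 * standardError;
}

// Suppose 30% of the users in a test group viewed at least ten slides.
[50, 100, 500, 1000].forEach(function (sampleSize) {
  var margin = 100 * marginOfError(0.3, sampleSize);
  console.log(sampleSize + ' users: 30% +/- ' + margin.toFixed(1) + '%');
});
// 50 users: 30% +/- 12.7%
// 100 users: 30% +/- 9.0%
// 500 users: 30% +/- 4.0%
// 1000 users: 30% +/- 2.8%

With only a few dozen users the error bars are wide enough to swamp most realistic feature effects, which is why those small early samples kept misleading me.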

This doesn't often come up because optimizing based on event tracking doesn't usually happen until you have large numbers of users, so it's easy to gather big samples quickly. For us, it's meant that we can only run a limited set of experiments, so we have to be very careful about how we choose the most important hypotheses. It also means that we tend to try out new features on 100% of our users, and compare against historical data, since that's the fastest way to gather samples and time is our scarcest resource!

What people do doesn't always reveal what they want to do

One of the most productive outcomes of having a rich set of data about how our users behave is that we've learned to argue about data, rather than opinions. When somebody has a product idea, we can dig into the existing data and see what evidence there is to support it. A lot of the time this is effective, but the approach has some subtle flaws. We still try to prototype some of the ideas on users, even when existing behavior seems to rule against them, and a few of them work despite the data. Sometimes we've figured out that what the old data was really showing is that people didn't use a similar feature because its placement in the interface was poor, or they didn't understand what the text meant, or the icon was unattractive.

People are funny beasts, and looking at how they're using an application and saying "they want to do X and Y, but Z isn't popular" is helpful, but not sufficient. Sometimes a small variation on Z will make a big difference; they may actually want to do it but be discouraged by the way it's been presented. At a big company there's probably a lot of institutional knowledge about what's worked in the past, but you won't have that as a small startup.

There's no substitute for talking to users

So if data's not the whole answer, what can you do? I've been lucky enough to work with a very talented user experience designer and one of his super-powers is sitting down and watching new users try out the app, and talking to them about what they're thinking. A classic example of how this can work is mis-swiping. Our whole app is based around touch-swiping through hundreds of travel photos your friends have shared with you, and he noticed that a lot of people seemed to be having trouble moving to the next picture. He mentioned this at a standup, and I was intrigued. I queried our activity data, and discovered that over 8% of swipes that were started in the app weren't seen as vigorous enough to result in a slide advance! This matched what he was seeing, so we redesigned the swipe gesture to be more sensitive. After we made the changes, we videoed a couple more new users and saw that they had a lot more success. A few days later I had a big enough sample size to be confident that the percentage of mis-swipes had dropped to just 2%!
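The query behind that 8% figure was nothing sophisticated, by the way. Here's the gist of it as a sketch; the event names and record shapes are invented, since the details will depend entirely on how you log things:

// events is an ordered array of tracking records for one user session,
// e.g. { type: 'swipe_started' } or { type: 'slide_advanced' }.
function misSwipeRate(events) {
  var swipesStarted = 0;
  var slidesAdvanced = 0;
  events.forEach(function (event) {
    if (event.type === 'swipe_started') {
      swipesStarted += 1;
    } else if (event.type === 'slide_advanced') {
      slidesAdvanced += 1;
    }
  });
  if (swipesStarted === 0) {
    return 0;
  }
  // Fraction of swipes that never turned into a slide advance.
  return (swipesStarted - slidesAdvanced) / swipesStarted;
}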

Guerrilla user-testing

User testing sounds awesome, but don't you need a lab to do it properly? I'm sure that would help, but we've managed to get a long way with some simple approaches that anyone can adopt. Bryce's killer technique is tilting a laptop screen down so it's looking at an iPad on the desk, and then having the user play with the app while he records video of their hands along with the audio. I know bringing folks into the office can be very time-consuming though, so we've also ended up getting a lot of value out of in-app surveys from Qualaroo, and simple user tests from UserTesting.com. The videos you get out of the latter can be incredibly useful, and it's a godsend being able to put in a request and within an hour have several fresh users work through any part of your application you want tested.

Knowing in real time how people are using Jetpac has been amazingly helpful, and it's been an incredible learning experience. Even if you're a small fry, why not dive into your event data yourself?