There are lots of examples of big companies A/B testing their way to greater success, but it's hard to figure out how to get started with event data as an early-stage company. For one thing, you probably don't have very much data to go on! I'm going to talk about a few of the things I've learned at Jetpac as we've built something from nothing.
Sample size matters
We're still focused on getting a user experience people love, and a value proposition they understand, before we attack distribution. That means we have hundreds of users a day, not thousands or tens of thousands. My initial approach was to run a feature experiment, look at the results after a couple of days, and then decide whether it was more successful. I rapidly discovered that this wasn't working, as a week later the statistics on a feature's usage might be much worse. Looking at measures like the number of slides viewed, it became clear how big the natural day-to-day variation was, often plus or minus 50%!
It's easy to forget basic statistics like I did, but the sample size is really, really important. There are robust methods to figure out the confidence of your results, but for my applications I've found anything less than a hundred is useless, a few hundred becomes indicative, and over a thousand is reliable.
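To make those rough thresholds concrete, here's a small sketch (my own illustration, not anything Jetpac-specific) of the normal-approximation confidence interval for a measured proportion. The exact numbers are assumptions for the example, but they show why a sample under a hundred tells you very little:

```python
import math

def proportion_ci(successes, n, z=1.96):
    """95% confidence interval for a proportion, using the normal approximation."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# The same measured 40% rate at three different sample sizes:
for n in (50, 500, 5000):
    low, high = proportion_ci(int(0.4 * n), n)
    print(f"n={n}: 40% with a margin of +/- {100 * (high - low) / 2:.1f} points")
```

At n=50 the margin is around 14 percentage points either way, which easily swallows the kind of difference most feature tweaks produce; by n=5000 it's down to about a point and a half.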
This doesn't often come up because optimizing based on event tracking doesn't usually happen until you have large numbers of users, so it's easy to gather big samples quickly. For us, it's meant that we can only run a limited set of experiments, so we have to be very careful about how we choose the most important hypotheses. It also means that we tend to try out new features on 100% of our users, and compare against historical data, since that's the fastest way to gather samples and time is our scarcest resource!
What people do doesn't always reveal what they want to do
One of the most productive outcomes of having a rich set of data about how our users behave is that we've learned to argue about data, rather than opinions. When somebody has a product idea, we can dig into the existing data and see what evidence there is to support it. A lot of the time this is effective, but the approach has some subtle flaws. We still try to prototype some of the ideas on users, even when existing behavior seems to rule against them, and a few of them work despite the data. Sometimes we've figured out that what the old data was really showing was that people didn't use a similar feature because its placement in the interface was poor, or they didn't understand what the text meant, or the icon was unattractive.
People are funny beasts, and looking at how they're using an application and saying, "they want to do X and Y, but Z isn't popular" is helpful, but not sufficient. Sometimes a small variation on "Z" will make a big difference; people may actually want to do it, but be discouraged by the way it's been presented. At a big company there's probably a lot of institutional knowledge about what's worked in the past, but you won't have that as a small startup.
There's no substitute for talking to users
So if data's not the whole answer, what can you do? I've been lucky enough to work with a very talented user experience designer and one of his super-powers is sitting down and watching new users try out the app, and talking to them about what they're thinking. A classic example of how this can work is mis-swiping. Our whole app is based around touch-swiping through hundreds of travel photos your friends have shared with you, and he noticed that a lot of people seemed to be having trouble moving to the next picture. He mentioned this at a standup, and I was intrigued. I queried our activity data, and discovered that over 8% of swipes that were started in the app weren't seen as vigorous enough to result in a slide advance! This matched what he was seeing, so we redesigned the swipe gesture to be more sensitive. After we made the changes, we videoed a couple more new users and saw that they had a lot more success. A few days later I had a big enough sample size to be confident that the percentage of mis-swipes had dropped to just 2%!
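For a change like the 8%-to-2% drop in mis-swipes, a quick two-proportion z-test is one way to check that the improvement isn't just day-to-day noise. This is a generic sketch with made-up sample counts, not our actual numbers:

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Z statistic for the difference between two proportions, pooled variance."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical counts: 80 mis-swipes in 1000 attempts before the fix,
# 20 in 1000 attempts after.
z = two_proportion_z(80, 1000, 20, 1000)
print(f"z = {z:.1f}")  # anything beyond about 1.96 is significant at the 95% level
```

With a thousand swipes on each side the z value comes out well past 1.96, which matches the intuition that a few hundred to a thousand samples is where results start to become trustworthy.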
User testing sounds awesome, but don't you need a lab to do it properly? I'm sure that would help, but we've managed to get a long way with some simple approaches that anyone can adopt. Bryce's killer technique is tilting a laptop screen down so it's looking at an iPad on the desk, and then having the user play with the app while he records the video of their hands and the audio. I know bringing folks into the office can be very time-consuming though, so we've also ended up getting a lot of value out of in-app surveys from Qualaroo, and simple user tests from UserTesting.com. The videos you get out of the latter can be incredibly useful, and it's a godsend being able to put in a request and within an hour have several fresh users work through any part of your application you want tested.
Knowing in real-time how people are using Jetpac has been amazingly helpful, and it's been an incredible learning experience. Even if you're a small fry, why not dive into your event data yourself?