How to track iOS memory crashes

February 11, 2013 By Pete Warden in Uncategorized 2 Comments

I love being able to use HTML5 content within Jetpac, but hosting it in Apple's UIWebView component can use a lot of memory. That matters because iOS apps crash when they run out of memory, and to make things worse they crash so hard that you don't even get a report! The process is killed, even low-level exit handlers don't get run, our code is shut down with no chance to do anything.

We do sometimes see low memory warnings, but these aren't as useful as you might think. They can occur in fairly benign circumstances and be cured by Javascript running a round of garbage collection, which means they aren't great predictors of crashes. They also don't always fire before there's a memory exhaustion crash, so we can't rely on taking preventative measures in those handlers.

To help understand what's going on, I've added low-level OS memory tracking instrumentation to help us track the free memory situation over time. I've combined it with our home-brewed Javascript function tracing to get quite a fine-grained view of which operations are using the most space, and we found some fascinating issues, like simple Canvas drawing operations appearing to leak a full image's worth of memory every time!

It's important for us to understand how widespread the crashes are in the wild though, and without crash reports we can't keep track of how well we're doing with our fixes. I've been talking to the folks at Crittercism, who we use and love for our general crash reporting, and they don't yet have a solution, but I did have a brain wave that I'm trying.

We have no chance to run code if the app crashes hard, but we do if the user deliberately quits by pressing the home button. We have an in-house activity log server, so if we fire off an event when a user starts up the app, and one when they deliberately close it, we can estimate how many times it crashed. We get great reports from Crittercism for normal crashes, so by subtracting those we can figure out roughly how many users are affected by this. The numbers will be a bit biased by lost connections (since we need to communicate with our log server to record an app close), but it will be far better than nothing.

I've submitted a new version of the app with the logging included, so I should have a better idea of how this works in practice within the next few weeks. Here's hoping it helps!

Five short links

February 8, 2013 By Pete Warden in Uncategorized Leave a comment

Mural by Monte Thrasher

Heads by Monte Thrasher – Normally my short link images are side-notes, but the pentagonal helmet image led me to discover what I think is my favorite mural ever. Check out Twiggy, the world's ugliest dog, and the inflatable skull, and much more.

Stately – US state outlines as a font, in their correct positions. I love the explosion of font hacks we've seen recently!

Carmen Geocoder – The MapBox folks have done some great detective work to find open data sources for this project. I'm looking forward to seeing how this might work together with the Data Science Toolkit.

R Language interface to the DSTK – On that topic, it was great to see Ryan Elmore release an interface to the toolkit in R.

Big Data – Beyond the Hype – An opinionated and thought-provoking exploration of where the data world is headed.

The dignity of customer service

February 6, 2013 By Pete Warden in Uncategorized Leave a comment

Photo by Ricky Brigante

When I started my first job as a supermarket clerk, I dreaded going in. I needed the money to feed my weed and D&D habit though, so I gritted my teeth and dragged myself to work every Saturday. After a few weeks something strange happened – I found myself enjoying my time behind the checkout!

I'd grown up with the idea of any kind of service job as shameful and embarassing. In Britain, there were still a lot of holdover attitudes from the Downton Abbey days of servants and lords. Somebody working in a job where you had to do things for people off the street risked losing face and descended down the class hierarchy. The usual defense was surliness; "You're looking down on me? I'll show you I'm not at your beck and call!".

I was lucky enough to be at Tescos, which in the early 90's was the scrappy upstart of the supermarket world, and their killer advantage was a different approach to customer service. As my supervisor patiently explained to me:

"The women you'll see have just spent an hour dragging screaming toddlers around the store after a full day of work. They'll likely be in a foul mood by the time they reach your checkout, but don't take it personally. They're so focused on their own worries, they don't even see you! If they lash out, just listen to them patiently, and let them know they're being heard. They take their cues off you, so if you react calmly instead of being upset, most of them calm down too. And if you take the high ground and smile sweetly, you don't give the few nasty ones the satisfaction of getting to you. This is the hardest part of your job, so take pride in doing it well."

It was never an easy job but I found there was a real dignity in it, once I treated good customer service as something I could take internal pride in, rather than being shameful and servile.

I've been thinking about this a lot after reading Andrew Sullivan's posts on the subject. He's made the same journey from Britain to the US, and shares my joy in the American dedication to good customer service. A lot of his readers don't agree, complaining that it's soul-violating to be forced to "treat callers like royalty". I'm not trying to romanticize any entry-level job, but a professional attitude to customer interactions was the best armor I had against the emotional assaults that the general public inflicts. It gave me good boundaries so I could approach awkward customers as a problem to be handled, not a reflection on my own self-worth.

This became especially important once I moved to Manchester, and to pay for college spent a year working at an infamous chain called Kwik Save. It was known for its "No Frills" brand, and everything about the store lived up to the slogan. Even box cutters were precious items given to honored employees, while the rest of us improvised using keys or pens to poke through the sticky tape. Management also embraced a distinctly old-fashioned approach to customer service – "The people who shop here are scum, we have to treat them like scum" as the manager Mr Albinson put it!

I kept to my Tesco training as much as I could, but I found being in an environment of bad customer service was far more soul-destroying than my old job. Employees would argue and even yell at shoppers, and they'd get sucked into all sorts of petty disputes. Everyone left work a lot more upset than they ever did at Tescos. Losing the shield of professionalism made the inevitable friction with customers far more soul-destroying than it had to be.

That all means that articles like Timothy Noah's leave me hopping mad. A close member of my family is a long-time Pret employee, with multiple awards for great customer service (which are good chunks of cash, and shared with the whole team), and he's not been brainwashed by a cultish corporation. He's a good guy working a tough job, and part of it is treating customers extremely well. Sure, Pret employees may not be allowed to have an off day, but only in the same way that they aren't allowed to drop the sandwiches on the floor. Doctors, lawyers, and anyone who has to deal with the public has a work persona they need to adopt to effectively do their job. It's patronizing to assume that clerks and servers aren't making the same kind of tradeoffs as people in more prestigious professions.

Expecting everyone who wants to give you money to deal with your emotional baggage is a luxury few of us can afford. There are a lot of genuine problems out there, like the awful working conditions that many service employees have to suffer, but most companies that care about employees appearing happy have figured out that treating them decently is a big help. Don't take away the dignity of great customer service givers by assuming they're silently suffering as they smile, and need your protection. They're secure on their side of the professional barrier, and the most helpful thing you can do is give them the respect they deserve.

How do analytics really work at a small startup?

February 4, 2013 By Pete Warden in Uncategorized Leave a comment

I was lucky enough to spend a few hours today with my friend Kevin Gates, one of the creators of Google's internal business intelligence systems, and it turned out to be a very thought-provoking chat. His mind was somewhat boggled that we were so data-obsessed at such an early stage in our life. Most people running analytics work at a large company and have a big stream of users to run experiments on. Our sample sizes are much smaller, which makes even conceptually simple approaches like A/B tests problematic. Just waiting long enough to get a statistically-significant results becomes a big bottleneck.

We've found ways around a lot of the technical issues, for example focusing on pre/post testing rather than A/B to speed up the process, but there's a bigger philosophical question. Is it even worth focusing on data when you only have tens of thousands of users?

The key for us is that we're using the information we get primarily for decision-making (should we build out feature X?) rather than optimization (how can we improve feature X?). Our quest is to understand what users are doing and what they want. Everything we're looking at should be actionable, should answer a product question we're wrestling with. To help answer that, I sketched out a diagram of how the information flows through our tools to the team:

The silhouettes show where people are looking at the results of our data crunching. The primary things that everyone on our team religiously watches are the daily report emails, and the UserTesting.com videos that show ordinary people using new features of our app. The daily reports are built on top of our analytics database, which is a Postgres machine with a homebrewed web UI to create, store, and regularly run reports on the event logs it holds. We built this when our requirements expanded beyond KissMetrics more funnel-focused UI, but we still use their web interface for some of our needs. Qualaroo is an awesome offshoot of KissMetrics that we use for in-app surveys, and we also refer to MailChimp's Mandrill dashboard and Urban Airship's statistics to understand how well our emails and push notifications are working. We have to use AppAnnie to keep track of our iOS download numbers and reviews over time.

We also have about twenty key statistics that we automatically add to a 'State of the App' Google Docs spreadsheet every day. This isn't something we constantly refer to, but it is crucial when we want to understand trends over weeks or months.

Over the last 18 months we've experimented with a lot of different approaches and sources of data, but these are the ones that have proved their worth in practice. It doesn't look the same as a large company approach to analytics, but this flow has been incredibly useful in our startup environment. It has helped us to make better and faster decisions, and most importantly spot opportunities we'd never have seen otherwise. If you're a small company and are feel like you're too early to start on analytics, you may be surprised by how easy it is to get started and how much you get out of it. Give simple services like KissMetrics a try, and I bet you'll end up hooked!

How good are our geocoders?

February 3, 2013 By Pete Warden in Uncategorized Leave a comment

Photo by Oatsy 40

My last post was a quick rant about the need for a decent open geocoder, but what's wrong with the ones we have? I've created a command-line tool to explore their quality: https://github.com/petewarden/geocodetest.

As a first pass, I pulled together a list of six addresses, some from my past and a few from spreadsheets users have uploaded to OpenHeatMap. The tool runs through the list (or any file of addresses you give it) and geocodes them through DSTK and Nominatim, returning a CSV of whether the locations are within 100m of Google's result. Run the script with -h to see all the options. Here are the results, produced by running ./geocodetest.rb -i testinput.txt

dstk,google,nominatim,address
Y,Y,Y,2543 Graystone Place, Simi Valley, CA 93065
Y,Y,N,400 Duboce Ave, #208, San Francisco CA 94117
Y,Y,N,11 Meadow Lane, Over, Cambridge, CB24 5NF, UK
N,Y,N,VIC 3184, Australia
N,Y,Y,Lindsay Crescent, Cape Town, South Africa
N,Y,N,3875 wilshire blvd	los angeles	CA

The first three are standard test cases for me, so it's not be a massive surprise that my DSTK (based on Schuyler Erle and GeoIQ's original work) works better than Nominatim for two of them. It does highlight one of the reasons I've struggled to use Nominatim though – it's not good at coping with alternative address forms. This makes it quite brittle, especially around addresses like the UK where there are multiple common permutations of village, city, and county names. Nominatim doesn't return any results for #2 or #3 at all, when I'd hope for at least a town-level approximation.

The Australian postal code is about 30 km from Google's result, whereas the open GeoNames data in the DSTK gets me to within 400m of Google. Nominatim does much better on the SA address, since I haven't imported OSM data into the DSTK for anywhere but the UK. I did have to correct the original user-entered spelling of 'Cresent' first though, and I'd love to see an open geocoder that was robust to this sort of common mistake. The last address is another sloppy one, but we should be able to cope with that one too!

Part of the reason there hasn't been more progress on open geocoders is that the problems are not very visible. I hope having an easy test harness changes that, and while this first pass is far from scientific, it's already inspired me to put in several fixes to my own code. I'm a big fan of the effort that's been put into the Nominatim project (I'm using their OSM loading code myself) I'm just disappointed that the results haven't been good enough to build services like OpenHeatMap on top of. I'll be expanding this tool to cover more addresses and so build a better 'map' of how we're doing, and what remains to be done. I'm excited by the opportunities to make progress here, I'll be busy working more on my own efforts and I can't wait to hear other folks thoughts too.

Why is open geocoding important?

February 1, 2013 By Pete Warden in Uncategorized Leave a comment

Photo by Werner Kunz

A few years ago I had what I thought was a simple problem. I had a bunch of place names, and I needed to turn them into latitude and longitude coordinates. To my surprise, it turned out to be extremely hard. Google has an excellent geocoder, but you're only allowed to use it for data you're displaying on Google maps, and there are rate limits and charges if you use it in bulk. Yahoo has an excellent array of geo APIs with much better conditions, but there are still rate limits and their future was in doubt even then!

So, I ended up hacking up my own very basic solution based on open data. It turned out to be a fascinating problem, one you could spend a lifetime on, trying to draw a usable, detailed picture of the world from freely available data. I bulked up the underlying data and algorithms, and it became the core of the Data Science Toolkit. Turning addresses into coordinates may sound like a strange obsession, but it has become my white whale.

There are some folks who agree that this is an important problem, but I've been surprised there aren't more. Placenames describe our world, and we need an open and democratic way for machines to interpret them. Almost any application that uses locations needs to do this operation, and right now we have no alternative to commercial systems.

What are the practical impacts of this? We've got no control over what our neighborhoods are called, or how they're defined. We can't fix problems in the data that impact us, like correcting the location of our address so that delivery drivers can find us. We can't build applications that take in large amounts of address data unless we can afford high fees, which cuts out a whole bunch of interesting projects.

This is on my mind because I'm making another attack on improving the DSTK solution. I've already added a lot of international postal codes thanks to GeoNames, but next I want to combine the public domain SimpleGeo point-of-interest dump with OpenStreetMap data to see if I can synthesize more addressable ranges for at least some more countries. That will be an interesting challenge, but if I get some usable it opens the door to adding more coverage through any open data set that combines street addresses and coordinates. I can't wait to see where this takes me!

Five short links

January 31, 2013 By Pete Warden in Uncategorized Leave a comment

Photo by Alan Levine

You can't sacrifice partition tolerance – A convincingly (and amusingly) argued case that you can never trade off the P in the CAP theorem.

Topic discovery with Apache Pig and Mallet – This sort of thing used to be magic, now you can assemble it from off-the-shelf components.

The insanely confusing path to legal immigration – I've almost made it through my immigration story, knock on wood I'll be taking my citizenship oath in May after twelve years, but it's been tough to explain to my American friends quite how convoluted the process is. This chart will help!

dancer.js – The web continues to devour the software world. Javascript can now handle both the fast rendering and the audio analysis you need for this music-responsive visualization.

Click dataset – Over 50 billion real-world HTTP requests. I'm certain there are identifiable elements in this data, but I think Arvind's right that researchers have proved this so convincingly that they won't bother to highlight them, and malicious users will never talk about it, so for some approximation of matters, it doesn't matter.

The Data Science Toolkit is now on Vagrant!

January 28, 2013 By Pete Warden in Uncategorized 1 Comment

Picture by Jacob Haas

I have fallen in love with Vagrant over the last year, it turns an entire logical computer as a single unit of software. In simple terms, you can easily set up, run, and maintain a virtual machine image with all the frameworks and data dependencies pre-installed. You can wipe it, copy it to a different system, branch it to run experimental changes, keep multiple versions around, easily share it with other people, and quickly deploy multiple copies when you need to scale up. It's as revolutionary as the introduction of distributed source control systems, you're suddenly free to innovate because mistakes can be painlessly rolled back, and you can collaborate other people without worrying that anything will be overwritten.

Before I discovered Vagrant, I'd attempted to do something similar with my Data Science Toolkit package, distributing a VMware image of a full linux system with all the software and data it required pre-installed. It was a large download, and a lot of people used it, but the setup took more work than I liked. Vagrant solved a lot of the usability problems around downloading VMs, so I've been eager to create a compatible version of the DSTK image. I finally had a chance to get that working over the weekend, so you can create your own local geocoding server just by running:

vagrant box add dstk http://static.datasciencetoolkit.org/dstk_0.41.box

vagrant init

The box itself is almost 5GB with all the address data, so the download may take a while. Once it's done go to http://localhost:8080 and you'll see the web interface to the geocoding and unstructured data parsing functions.

I've updated the US address data using the most recent Census data from 2012, rebuilt the system around Ubuntu 12.04, and incorporated a lot of virtual memory setting changes that improve the stability of the system when it's dealing with large volumes of API calls. I've released an EC2 AMI with all these changes too, and the full instructions for setting up your own server are at http://www.datasciencetoolkit.org/developerdocs#amazon.

Pick the mountain, not the route

January 25, 2013 By Pete Warden in Uncategorized Leave a comment

Photo by Mike P

For most of my life I've managed to avoid being a boss. Since I helped start Jetpac the responsibility has crept up on me, and not coincidentally so have a few grey hairs. I'm still loving my job, but managing is tough, but not always in the ways I expected.

One of my biggest surprises is how bad I was at leading a team of engineers. I'd spent a long time as a senior guy on various decent-sized teams, taking a lot of initiative and making a lot of decisions, so I thought leading would be a big but incremental step. Instead, I've actually had to unlearn a lot of what I'd picked up over the last 15 years.

In particular, I found that my enjoyment of the debate about ways to implement features went from being valuable to toxic. As a team contributor, I was used to chiming in while we all hashed out the right approach, matching wits and learning in a back-and-forth debate. For the first few months here I did the same thing, and was mystified that something just didn't seem right. The discussions seemed more stilted, and it never felt like everyone had truly bought in to the conclusions. People weren't as enthusiastic as I'd expect, and problems that we should have caught in the planning stages only became apparent much later.

It took me a while, but I finally realized I was in a different position. When I gave my opinion it carried more weight, and if I jumped in on every interesting detail I'd end up cutting the discussions short. That meant I never benefitted from the experience of the super-smart engineers I'm lucky enough to work with. I realized I have a different role, and I have to have a much lighter touch.

Instead of getting deeply involved in the implementation approaches, I've found it's worked much better to focus on the end-user goals of what we're trying to do, and communicating those to the engineering team. An important part of that is asking questions about how they think different approaches will meet particular goals. "Do you think this will get more people exploring this feature?". "Will that get more people entering recommendations?". The key difference from my previous approach is that I give them ownership of the way they reach those goals. As long as they're able to meet them, I don't care how they get there! The team take pride in their work, so I've never had to worry about code quality, and the end-user results have been amazing. We've ended up with some very successful innovations that I'd never have dreamt up in a million years!

If you're helping lead a team, think about what you truly care about. I bet it's outcomes you want, and you'll need to step back from your own preferences on the details if you want a team of creative people to achieve them. Point them at the right mountain, make sure you're giving them good crampons, maps, and guide books, but let them pick a route up themselves!

Five short links

January 24, 2013 By Pete Warden in Uncategorized Leave a comment

Photo by Alex Loach

Passing data from server to Javascript on page load – A strong treatment of a grubby little subject that anyone who writes a non-trivial web app has to think about. We have a much more ad hoc version of this, and I'd probably stick to a whitelist of known operations rather than passing in a function name as a string, but I like the approach.

Vaurien, the Chaos TCP proxy – I'm itching to use this, without any pressing justification. There's just something very appealing about throwing glitches and noise into any system and seeing what happens.

How food shapes our cities – This gave me a sense of wonder at how far we've come so quickly, with just a couple of centuries (or these days a few hundred miles and a border or two) separating us from desperately unreliable food supplies.

dstk_excel – Despite its issues, I love github's new search, it helped me discover this Excel interface to my Data Science Toolkit! I love people sometimes.

Heather Arthur – And then sometimes people suck. It's actually good to see this get some attention, being respectful about other people's didn't come naturally to me. It took a good first lead programmer to point out that while I was being snobbish about the original Diablo code I was working on porting, the original engineers were rolling in money like Scrooge McDuck, so who was the idiot?

	Anonymous on De-ICE Disco at the Googl…
	Pete on De-ICE Disco at the Googl…
	De-ICE Disco at the… on Join me at the Tesla Protests…
	De-ICE Disco at the… on Join me at the Tesla Protests…
	Deepseek or alternat… on Why does a Local AI Voice Agen…

Pete Warden's blog

Ever tried. Ever failed. No matter. Try Again. Fail again. Fail better.

How to track iOS memory crashes

Five short links

The dignity of customer service

How do analytics really work at a small startup?

How good are our geocoders?

Why is open geocoding important?

Five short links

The Data Science Toolkit is now on Vagrant!

Pick the mountain, not the route

Five short links