Deep Learning is Eating Software

[Image: Pac-Man. Photo by John Watson]

When I had a drink with Andrej Karpathy a couple of weeks ago, we got to talking about where we thought machine learning was going over the next few years. Andrej threw out the phrase “Software 2.0”, and I was instantly jealous because it captured the process I see happening every day across hundreds of projects. I held my tongue until he got his blog post out there, but now I want to expand my thoughts on this too.

The pattern is that there’s an existing software project doing data processing with explicit programming logic, and the team charged with maintaining it finds it can replace that logic with a deep-learning-based solution. I can only point to examples within Alphabet that we’ve made public, like upgrading search ranking, reducing data center energy usage, improving language translation, and playing Go, but these aren’t rare exceptions internally. What I see is that almost any data processing system with non-trivial logic can be improved significantly by applying modern machine learning.

This might sound less than dramatic when put in those terms, but it’s a radical change in how we build software. Instead of writing and maintaining intricate, layered tangles of logic, the developer has to become a teacher: a curator of training data and an analyst of results. This is very, very different from the programming I was taught in school, but what gets me most excited is that it should be far more accessible than traditional coding once the tooling catches up.

The essence of the process is providing a lot of examples of inputs together with the outputs you expect for each one. This doesn’t require the same technical skills as traditional programming, but it does need deep knowledge of the problem domain. That means motivated users of the software will be able to play a much more direct role in building it than has ever been possible. In essence, the users are writing their own user stories and feeding them into the machinery to build what they want.
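To make that concrete, here’s a minimal sketch of what this kind of teaching-by-example looks like in practice. It’s a toy of my own, not code from any of the projects above: the Celsius-to-Fahrenheit task, the use of Keras, and every name in it are just illustrative assumptions. The point is that the program contains no conversion logic at all, only example pairs and a model that gets fit to them.

```python
# A toy "Software 2.0" program: no explicit formula, just examples.
import numpy as np
import tensorflow as tf

# The examples: inputs and the outputs we expect (Celsius -> Fahrenheit pairs).
celsius = np.array([[-40.0], [-10.0], [0.0], [8.0], [15.0], [22.0], [38.0]], dtype=np.float32)
fahrenheit = np.array([[-40.0], [14.0], [32.0], [46.4], [59.0], [71.6], [100.4]], dtype=np.float32)

# A tiny model stands in for the hand-written rule f = c * 1.8 + 32.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.Adam(0.1), loss="mean_squared_error")

# "Teaching" replaces coding: fit the model to the example pairs.
model.fit(celsius, fahrenheit, epochs=500, verbose=0)

# "Analysing the results" replaces debugging: check an input we didn't train on.
print(model.predict(np.array([[100.0]], dtype=np.float32)))  # should land close to 212
```

The developer’s job in this sketch is exactly the one described above: choosing good examples, running the training, and checking whether the predictions look right.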

Andrej focuses on areas like audio and speech recognition in his post, but I’m actually arguing that there will be an impact across many more domains. The classic paper “Machine Learning: The High-Interest Credit Card of Technical Debt” identifies a very common pattern where machine learning systems become embedded in deep stacks of software. What I’m seeing is that this problem is increasingly being solved by replacing the whole stack with a deep learning model! Taking the analogy to its breaking point, this is like consolidating all your debts into a single loan with lower payments. A single model is far easier to improve than a set of deeply interconnected modules, and maintenance becomes far simpler. For many large systems there’s no one person who can claim to understand what they’re actually doing anyway, so there’s no real loss in debuggability or control.

I know this will all sound like more deep learning hype, and if I wasn’t in the position of seeing the process happening every day, I’d find it hard to swallow too, but this is real. Bill Gates is supposed to have said “Most people overestimate what they can do in one year and underestimate what they can do in ten years”, and this is how I feel about the replacement of traditional software with deep learning. There will be a long ramp-up as knowledge diffuses through the developer community, but in ten years I predict most software jobs won’t involve programming. As Andrej memorably puts it, “[deep learning] is better than you”!