Deep Learning is Eating Software

Photo by John Watson

When I had a drink with Andrej Karpathy a couple of weeks ago, we got to talking about where we thought machine learning was going over the next few years. Andrej threw out the phrase “Software 2.0”, and I was instantly jealous because it captured the process I see happening every day across hundreds of projects. I held my tongue until he got his blog post out there, but now I want to expand my thoughts on this too.

The pattern is that there’s an existing software project doing data processing using explicit programming logic, and the team charged with maintaining it finds they can replace it with a deep-learning-based solution. I can only point to examples within Alphabet that we’ve made public, like upgrading search ranking, data center energy usage, language translation, and mastering Go, but these aren’t rare exceptions internally. What I see is that almost any data processing system with non-trivial logic can be improved significantly by applying modern machine learning.

This might sound less than dramatic when put in those terms, but it’s a radical change in how we build software. Instead of writing and maintaining intricate, layered tangles of logic, the developer has to become a teacher, a curator of training data and an analyst of results. This is very, very different than the programming I was taught in school, but what gets me most excited is that it should be far more accessible than traditional coding, once the tooling catches up.

The essence of the process is providing a lot of examples of inputs, and what you expect for the outputs. This doesn’t require the same technical skills as traditional programming, but it does need a deep knowledge of the problem domain. That means motivated users of the software will be able to play much more of a direct role in building it than has ever been possible. In essence, the users are writing their own user stories and feeding them into the machinery to build what they want.
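To make the examples-in, expected-outputs-in workflow concrete, here’s a minimal sketch in Python. The task, the data, and the use of scikit-learn are all my own invented illustration, not anything from a real project: a hand-written rule gets replaced by a model trained purely from labeled (input, expected output) pairs, which is the curator role described above.

```python
# Illustrative sketch only: a toy message-flagging task, assuming
# scikit-learn is installed. The data is invented for this example.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# The "Software 1.0" version: explicit, hand-maintained logic.
def flag_v1(message):
    return "free" in message.lower() or "winner" in message.lower()

# The "Software 2.0" version: no logic, just examples of inputs
# paired with the outputs we expect. Curating this list *is* the
# programming work.
examples = [
    ("free money, click now", 1),
    ("you are a winner", 1),
    ("claim your free prize", 1),
    ("lunch at noon tomorrow?", 0),
    ("meeting moved to 3pm", 0),
    ("draft attached for review", 0),
]
texts, labels = zip(*examples)

# The pipeline learns the decision logic from the examples.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Improving this system means improving the examples, not the code.
print(model.predict(["claim your free prize"])[0])
```

The point of the sketch is the shape of the work: to change `flag_v1` you edit logic, but to change `model` you edit the `examples` list, which needs domain knowledge rather than programming skill.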

Andrej focuses on areas like audio and speech recognition in his post, but I’m actually arguing that there will be an impact across many more domains. The classic “Machine Learning: The High-Interest Credit Card of Technical Debt” identifies a very common pattern where machine learning systems become embedded in deep stacks of software. What I’m seeing is that the problem is increasingly solved by replacing the whole stack with a deep learning model! Taking the analogy to its breaking point, this is like consolidating all your debts into a single loan with lower payments. A single model is far easier to improve than a set of deeply interconnected modules, and the maintenance becomes far easier. For many large systems there’s no one person who can claim to understand what they’re actually doing anyway, so there’s no real loss in debuggability or control.

I know this will all sound like more deep learning hype, and if I wasn’t in the position of seeing the process happening every day I’d find it hard to swallow too, but this is real. Bill Gates is supposed to have said “Most people overestimate what they can do in one year and underestimate what they can do in ten years”, and this is how I feel about the replacement of traditional software with deep learning. There will be a long ramp-up as knowledge diffuses through the developer community, but in ten years I predict most software jobs won’t involve programming. As Andrej memorably puts it, “[deep learning] is better than you”!

9 responses

  1. To me it seems like a case of “if all you have is a hammer, everything looks like a nail”. Of course, DL will be increasingly more important at companies like Google, Tesla and other “companies at scale”. But most lines of code are being written for cases where one is not in a position to be “a teacher”. These software projects are very small and very specific. How will DL work there?

  2. It’s the combination of traditional programming skills with machine learning skills that will have the most impact in the short to mid term. Once ML tools and meta-tools mature this will shift more towards ML.

  3. I spent over 20 years in the semiconductor industry. The EDA tools used by designers seem like a disruption opportunity to me – very high per-seat pricing, hard to use, convoluted & constantly changing tool chains, … I could go on.

  4. I think within 5-10 years, kids will be training their ML pets at age 5, and it will continue throughout one’s education. Programmers will continue to operate in dark corners.

  5. “For many large systems there’s no one person who can claim to understand what they’re actually doing anyway, so there’s no real loss in debuggability or control”. Is this Software Engineering surrender?

  6. This article gets across the big points well. For example, the part on the need for domain expertise is spot on: “The essence of the process is providing a lot of examples of inputs, and what you expect for the outputs. This doesn’t require the same technical skills as traditional programming, but it does need a deep knowledge of the problem domain.”

  7. Pingback: Are deep neural nets "Software 2.0"? - Michael S. Chimenti's Bioinformatics Blog
