Yesterday a friend emailed, asking “What’s going on with deep learning? I keep hearing about more and more companies offering it, is it something real or just a fad?”. A couple of years ago I was very skeptical of the hype that had emerged around the whole approach, but then I tried it, and was impressed by the results I got. I still try to emphasize that neural networks are not magic, but here’s why I think they’re worth getting excited about.
They work really, really well
Neural networks have been the technology-of-the-future since the 1950s, with massive theoretical potential but lacklustre results in practice. The big turning point in public perception came when a deep learning approach won the equivalent of the World Cup for computer vision in 2012. Just look at the results table: the SuperVision team of Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton absolutely trounced their closest competitors. It wasn’t a fluke. Here’s a good overview of a whole bunch of other tasks where the approach is either beating more traditional approaches or providing comparable results. I can back this up with my own experience, and deep networks have consistently won highly-competitive Kaggle competitions too.
I’m focused on computer vision, but deep neural networks have already become the dominant approach in speech recognition, and they’re showing a lot of promise for making sense of text too. There’s no other technique that applies to so many different areas, and that means that any improvements in one field have a good chance of applying to other problems too. People who learn how to work with deep neural nets can keep re-using that skill across a lot of different domains, so it’s starting to look like a valuable foundational skill for practical coders rather than a niche one for specialized academics. From a research perspective that breadth makes the approach worth investing in too, because it shows a lot of promise for tackling a wide range of topics.
With neural networks you’re not telling a computer what to do, you’re telling it what problem to solve. I try to describe what this means in practice in my post about becoming a computer trainer, but the key point is that the development process is a lot more efficient once you hand over implementation decisions to the machine. Instead of a human with a notebook trying to decide whether to look for corners or edges to help spot objects in images, the algorithm looks at a massive number of examples and decides for itself which features are going to be useful. This is the kind of radical change that artificial intelligence has been promising for decades, but has seldom managed to deliver until now.
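To make that concrete, here is a minimal sketch of what handing the decision to the machine can look like. It is a toy of my own invention, not anything from the ImageNet work: a single logistic unit (far from a deep network) is shown labelled four-pixel strips, some containing a dark-to-bright step and some a flat grey, and has to work out for itself which weights separate them. The strip layout, noise level, learning rate, and every function name are assumptions chosen for illustration.

```python
import math
import random


def make_patch(is_edge, rng):
    """Return a 4-pixel strip: a dark-to-bright step, or a flat grey level."""
    base = [0.0, 0.0, 1.0, 1.0] if is_edge else [rng.random()] * 4
    # A little Gaussian noise so no two examples are identical.
    return [p + rng.gauss(0.0, 0.05) for p in base]


def make_dataset(n, rng):
    """Build n (patch, label) pairs, alternating flat (0) and edge (1)."""
    return [(make_patch(i % 2 == 1, rng), i % 2) for i in range(n)]


def predict(weights, bias, patch):
    """Probability that the patch contains an edge (logistic unit)."""
    score = sum(w * x for w, x in zip(weights, patch)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def train(data, steps=500, lr=0.5):
    """Full-batch gradient descent on the logistic loss, starting from zeros."""
    weights, bias = [0.0] * 4, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * 4, 0.0
        for patch, label in data:
            err = predict(weights, bias, patch) - label
            for j, x in enumerate(patch):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


def accuracy(weights, bias, data):
    """Fraction of patches whose predicted class matches the label."""
    hits = sum((predict(weights, bias, p) > 0.5) == (y == 1) for p, y in data)
    return hits / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable
weights, bias = train(make_dataset(200, rng))
test_accuracy = accuracy(weights, bias, make_dataset(200, rng))
print("learned weights:", [round(w, 2) for w in weights])
print("held-out accuracy:", test_accuracy)
```

Nobody tells the program to compare the left pixels with the right ones, yet the learned weights end up lower on the left and higher on the right, which is roughly the edge detector a person might have written by hand. A real deep network does the same kind of thing across many stacked layers and millions of images.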
There’s lots of room for improvement
Even though the Krizhevsky approach won the 2012 ImageNet competition, nobody can claim to fully understand why it works so well, or which design decisions and parameters are most important. It’s a fantastic trial-and-error solution that works in practice, but we’re a long way from understanding how it works in theory. That means that we can expect to see speed and result improvements as researchers gain a better understanding of why it’s effective, and how it can be optimized. As one of my friends put it, a whole generation of graduate students are being sacrificed to this effort, but they’re doing it because the potential payoff is so big.
I don’t want you to just jump on the bandwagon, but deep learning is a genuine advance, and people are right to be excited about it. I don’t doubt that we’re going to see plenty of other approaches trying to improve on its results, and it’s not going to be the last word in machine learning. But it has been a big leap forward for the field, and promises a lot more in years to come.