When I chat to people about deep learning, they often ask me what I think its future is. Is it a fad, or something more enduring? What new opportunities are going to appear? I don’t have a crystal ball, but I have now spent a lot of time implementing deep neural networks for vision, and I’m also old enough to have worked through a couple of technology cycles. I’m going to make some ‘sophisticated wild-ass guesses’ about where I think things will head.
Deep learning eats the world
I strongly believe that neural networks have finally grown up enough to fulfil their decades-old promise. All applications that process natural data (speech recognition, natural language processing, computer vision) will come to rely on deep learning. This is already happening on the research side, but it will take a while to percolate fully through to the commercial sector.
Training and running a model will require different tools
Right now, experimenting with new network architectures and training models is done with the same tools we use to run the models to generate predictions. To me, trained neural networks look a lot like compiled programs in a very limited assembler language. They’re essentially just massive lists of weights, with a description of the order to execute them in. I don’t see any reason why the tools we use to develop networks, to change, iterate, debug, and train them, should also be the ones that execute them in production, with its very different requirements around interoperability with existing systems and performance constraints. I also think we’ll end up with a small number of research-oriented folks who develop models, and a much wider group of developers who apply them with less understanding of what’s going on inside the black box.
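To make that concrete, here’s a minimal sketch of what I mean by a ‘compiled’ network. The layer types and shapes are made up for illustration; the point is that executing the weight list needs none of the training machinery:

```python
import numpy as np

# A 'compiled' network: just an ordered list of (operation, weights) pairs.
# The layer types and shapes here are invented for illustration.
network = [
    ("fully_connected", np.random.randn(256, 64).astype(np.float32)),
    ("relu", None),
    ("fully_connected", np.random.randn(64, 10).astype(np.float32)),
]

def run_network(network, x):
    """Execute the weight list in order; no training machinery required."""
    for op, weights in network:
        if op == "fully_connected":
            x = np.dot(x, weights)
        elif op == "relu":
            x = np.maximum(x, 0.0)
    return x

prediction = run_network(network, np.random.randn(1, 256).astype(np.float32))
```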
Traditional approaches will fight back
Deep learning isn’t the end of computer science, it’s just the current top dog. Millions of man-years have gone into researching other approaches to computer vision, for example, and my bet is that once researchers have absorbed some of the lessons behind deep learning’s success (e.g. using massive numbers of training images, and letting the algorithm pick the features), we’ll see better versions of the old algorithms emerge. We might even see hybrids; for example, I’m using an SVM as the final layer of my network to enable fast retraining on embedded devices.
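Here’s a rough sketch of that hybrid idea. The feature extractor below is a placeholder; in a real system it would be the frozen convolutional layers of a trained network, with the cheap-to-retrain classifier being scikit-learn’s LinearSVC:

```python
import numpy as np
from sklearn.svm import LinearSVC

def extract_features(images):
    # Placeholder: in a real hybrid, this would run the frozen
    # convolutional layers of a pre-trained network over each image.
    return images.reshape(len(images), -1)

# Retraining just the final classifier is cheap, because the expensive
# convolutional features only have to be computed once per image.
train_images = np.random.rand(100, 32, 32)
train_labels = np.random.randint(0, 2, size=100)

classifier = LinearSVC()
classifier.fit(extract_features(train_images), train_labels)

predictions = classifier.predict(extract_features(np.random.rand(5, 32, 32)))
```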
There will be a gold-rush around production-ready tools
Deep learning eating the world means rapidly growing demand for new solutions as it spreads from research into production. The tools will need to fit into legacy ecosystems, so things like integration with OpenCV and Hadoop will become very important. As networks get used at large scale, the power and performance costs of running them will matter a lot more than the raw training speed that current researchers are focused on. Developers will want to be able to port their networks between frameworks, so they can use the one with the right tradeoffs for their requirements, rather than being bound to whatever system they trained the model on, as they are right now.
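One plausible shape for that portability is a framework-neutral weight dump. Here’s a sketch using NumPy’s .npz container, with made-up layer names:

```python
import numpy as np

# Hypothetical weights exported from whatever framework trained the model.
weights = {
    "conv1": np.random.randn(32, 3, 5, 5).astype(np.float32),
    "fc1": np.random.randn(512, 10).astype(np.float32),
}

# Save them to a framework-neutral container...
np.savez("model_weights.npz", **weights)

# ...and reload them anywhere, independent of the training system.
data = np.load("model_weights.npz")
restored = {name: data[name] for name in data.files}
assert all(np.array_equal(weights[k], restored[k]) for k in weights)
```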
What does it all mean?
With all these assumptions, here’s where I think we’re headed. Researchers will focus on expanding and improving the current crop of training-focused libraries and IDEs (Caffe, Theano). Other developers will start producing solutions that can be used more widely. They’ll be able to compete on ease-of-use, performance (not just raw speed, but also power consumption and hardware costs), and which environments they run in (language integration, distributed systems support via Hadoop or Spark, embedded devices).
One of the ways to improve performance is with specialized hardware, but there are some serious obstacles to overcome first. One of them is that the algorithms themselves are in flux; I think there will be a lot of changes over the next few years, which makes solidifying them into chips hard. Another is that almost all the time in production systems is spent doing massive matrix multiplies, which existing GPUs happen to be great at parallelizing, and even SIMD instructions on ordinary CPUs are highly effective at giving good performance. If deep networks need to be run as part of larger systems, the latency involved in transferring data between the CPU and specialized hardware will kill speed, just as it does with a lot of attempts to use GPUs in real-world programs. Finally, a lot of the interest seems to be around encoding the training process into a chip, but in my future only a small part of the technology world trains new models; everyone else is just executing off-the-shelf networks. With all that said, I’m still fascinated by the idea of a new hardware approach to the problem. Since I see neural networks as programs, building chips to run them is very tempting. I’m just wary that it may be five or ten years before they make commercial sense, not two or three.
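To illustrate the matrix-multiply point: evaluating a fully connected layer is a single np.dot call, which NumPy hands off to a BLAS library that already exploits SIMD and multiple cores. The sizes below are illustrative:

```python
import time
import numpy as np

# One fully connected layer evaluation is a single big matrix multiply.
batch = np.random.rand(64, 4096).astype(np.float32)
layer_weights = np.random.rand(4096, 4096).astype(np.float32)

start = time.time()
for _ in range(10):
    activations = np.dot(batch, layer_weights)  # dispatches to optimized BLAS
print("10 layer evaluations took %.3f seconds" % (time.time() - start))
```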
Anyway, I hope this gives you some idea of how things look from my vantage point. I’m excited to be involved with a technology that’s got so much room to grow, and I can’t wait to see where it goes from here!