What does the future hold for deep learning?

Photo by Pierre J.

When I chat to people about deep learning, they often ask me what I think its future is. Is it a fad, or something more enduring? What new opportunities are going to appear? I don’t have a crystal ball, but I have now spent a lot of time implementing deep neural networks for vision, and I’m also old enough to have worked through a couple of technology cycles. I’m going to make some ‘sophisticated wild-ass guesses’ about where I think things will head.

Deep learning eats the world

I strongly believe that neural networks have finally grown up enough to fulfil their decades-old promise. All applications that process natural data (speech recognition, natural language processing, computer vision) will come to rely on them. This is already happening on the research side, but it will take a while to percolate fully through to the commercial sector.

Training and running a model will require different tools

Right now, experimenting with new network architectures and training models is done with the same tools we use to run those models to generate predictions. To me, trained neural networks look a lot like compiled programs in a very limited assembler language. They’re essentially just massive lists of weights with a description of the order to execute them in. I don’t see any reason why the tools we use to develop them, which we use to change, iterate on, debug, and train networks, should also be used to execute them in production, with its very different requirements around performance and interoperability with existing systems. I also think we’ll end up with a small number of research-oriented folks who develop models, and a wider group of developers who apply them with less understanding of what’s going on inside the black box.
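To make the ‘compiled program’ analogy a bit more concrete, here’s a minimal sketch in Python of what a stripped-down runtime might look like. The JSON layout, the layer fields, and the `predict` function are all hypothetical, invented for this illustration rather than taken from any real framework, and the weights are made-up values.

```python
import json

# Hypothetical serialized model: an ordered list of layers, each holding
# only weights, biases, and the name of an activation. Values are made up.
MODEL_JSON = """
[
  {"weights": [[1.0, -1.0], [0.5, 0.5]], "bias": [0.0, -0.25], "activation": "relu"},
  {"weights": [[2.0, 1.0]], "bias": [0.5], "activation": "identity"}
]
"""

# The only "instructions" this tiny runtime understands.
ACTIVATIONS = {
    "relu": lambda x: max(0.0, x),
    "identity": lambda x: x,
}


def run_layer(layer, inputs):
    """Apply one fully connected layer: weighted sum, bias, activation."""
    activation = ACTIVATIONS[layer["activation"]]
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(layer["weights"], layer["bias"])
    ]


def predict(model, inputs):
    """Execute the layers in the order the file lists them."""
    values = list(inputs)
    for layer in model:
        values = run_layer(layer, values)
    return values


model = json.loads(MODEL_JSON)
output = predict(model, [3.0, 1.0])
print(output)
```

Nothing in this runtime knows about gradients, datasets, or solvers, which is the point: executing a model needs far less machinery than training one.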

Traditional approaches will fight back

Deep learning isn’t the end of computer science; it’s just the current top dog. Millions of man-years have gone into researching other approaches to computer vision, for example, and my bet is that once researchers have absorbed some of the lessons behind deep learning’s success (e.g. using massive numbers of training images, and letting the algorithm pick the features), we’ll see better versions of the old algorithms emerge. We might even see hybrids; for example, I’m using an SVM as the final layer of my network to enable fast retraining on embedded devices.
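As a rough illustration of that kind of hybrid, the sketch below freezes a feature extractor and retrains only a linear classifier on top of it. Everything here is an assumption made for the example: `penultimate_features` is a hypothetical stand-in for the output of a pre-trained network’s penultimate layer, the ‘images’ are four-pixel toy lists, and the classifier is trained with plain subgradient steps on the unregularized hinge loss, which is a simplified stand-in for a proper linear SVM solver and not a description of any particular production setup.

```python
def penultimate_features(image):
    """Hypothetical frozen network: maps an image to a fixed feature vector.

    A real system would run the pre-trained layers here; this toy version
    just computes two fixed statistics so the example stays self-contained.
    """
    mean = sum(image) / len(image)
    spread = max(image) - min(image)
    return [mean, spread]


def train_hinge_classifier(features, labels, epochs=100, lr=1.0):
    """Fit a linear classifier by subgradient steps on the hinge loss.

    Labels must be +1 or -1. There is no regularization term, so this is
    a simplified relative of a linear SVM, not a full solver.
    """
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin: nudge the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def classify(w, b, x):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy training set: two "bright" images (+1) and two "dark" ones (-1).
train_images = [
    [0.9, 0.8, 1.0, 0.9],
    [0.8, 0.7, 0.9, 0.8],
    [0.1, 0.2, 0.0, 0.1],
    [0.2, 0.3, 0.1, 0.2],
]
train_labels = [1, 1, -1, -1]

# Only the final classifier is trained; the feature extractor never changes.
train_features = [penultimate_features(img) for img in train_images]
w, b = train_hinge_classifier(train_features, train_labels)

predictions = [classify(w, b, f) for f in train_features]
new_prediction = classify(w, b, penultimate_features([0.7, 0.9, 0.8, 0.8]))
print(predictions, new_prediction)
```

Because only a handful of weights in the last stage are updated, retraining like this is cheap enough to consider on a small device, while the expensive feature layers stay untouched.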

There will be a gold-rush around production-ready tools

Deep learning eating the world means rapidly growing demand for new solutions, as it spreads from research into production. The tools will need to fit into legacy ecosystems, so things like integration with OpenCV and Hadoop will become very important. As they get used at large scale, the power and performance costs of running the networks will become a lot more important, as opposed to the raw speed of training that current researchers are focused on. Developers will want to be able to port their networks between frameworks, so they can use the one that has the right tradeoffs for their requirements, rather than being bound to whatever system they trained the model on as they are right now.

What does it all mean?

With all these assumptions, here’s where I think we’re headed. Researchers will focus on expanding and improving the current crop of training-focused libraries and IDEs (Caffe, Theano). Other developers will start producing solutions that can be used more widely. They’ll be able to compete on ease-of-use, performance (not just raw speed, but also power consumption and hardware costs), and which environments they run in (language integration, distributed systems support via Hadoop or Spark, embedded devices).

One of the ways to improve performance is with specialized hardware, but there are some serious obstacles to overcome first. One of them is that the algorithms themselves are in flux. I think there will be a lot of changes over the next few years, which makes solidifying them into chips hard. Another is that almost all the time in production systems is spent doing massive matrix multiplies, which existing GPUs happen to be great at parallelizing. Even SIMD instructions on ordinary CPUs are highly effective at giving good performance. If deep networks need to be run as part of larger systems, the latency involved in transferring data between the CPU and specialized hardware will kill speed, just as it does with a lot of attempts to use GPUs in real-world programs. Finally, a lot of the interest seems to be around encoding the training process into a chip, but in my future, only a small part of the technology world trains new models; everyone else is just executing off-the-shelf networks. With that all said, I’m still fascinated by the idea of a new hardware approach to the problem. Since I see neural networks as programs, building chips to run them is very tempting. I’m just wary that it may be five or ten years before they make commercial sense, not two or three.
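The latency argument comes down to simple arithmetic: offloading only pays off when the time saved on computation is larger than the cost of moving the data there and back. Here’s a small sketch of that break-even check in Python. The function name and all of the timings are hypothetical, chosen purely to illustrate the shape of the trade-off, not measured on any real hardware.

```python
def offload_is_faster(cpu_ms, accel_ms, transfer_ms_each_way):
    """Return True when accelerator time plus a round trip beats the CPU alone."""
    return accel_ms + 2 * transfer_ms_each_way < cpu_ms


# Hypothetical timings in milliseconds, for illustration only.
# A small network called from inside a larger program: transfer dominates.
small_job = offload_is_faster(cpu_ms=2.0, accel_ms=0.2, transfer_ms_each_way=1.5)

# A large batch where the computation dwarfs the transfer cost.
large_job = offload_is_faster(cpu_ms=200.0, accel_ms=20.0, transfer_ms_each_way=1.5)

print(small_job, large_job)
```

With these made-up numbers, the small job is slower on the accelerator even though the chip itself is ten times faster, which is the pattern that tends to bite when a network is one step in a bigger pipeline.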

Anyway, I hope this gives you some idea of how things look from my vantage point. I’m excited to be involved with a technology that’s got so much room to grow, and I can’t wait to see where it goes from here!

4 responses

  2. “… in my future, only a small part of the technology world trains new models, everyone else is just executing off-the-shelf networks.”

    That’s one view and it’s a good one from the cloud-services point of view, but consider this: Do you want your domestic robot to be able to learn on the job? If so, where does that learning capability reside? Do you really want to teach how you organise your towels? Indeed, can a centrally-learned system distinguish between everyone’s own airing cupboards?

    In my future (;-) this makes local learning a big desirable. Sure, I want to deliver factory-trained robots that know how to fold towels, but I want them to have local learning capability so I can teach them where to put mine away. The same applies to many situations where on-the-job training is necessary.

    This creates a need for efficient local learning, whichever platform the learning is on. Artificial Learning’s bet is new hardware for local learning offers overwhelming advantages.

    Everything else you said makes a lot of sense! Cheers –Pete Newman http://www.artificiallearning.com

  3. I agree with you on the need for on-the-job learning; it’s just that in my imagined future it’s implemented using a fairly thin layer on top of pre-trained models. The ‘Decaf’ approach works really well in practice for learning entirely new categories by using a network trained on the original 1,000 ImageNet classes and running an SVM on the penultimate layer. For the towel example I’d imagine a lot of work at the factory teaching it how to recognize towels and some typical cupboards, and then a fairly simple training stage on delivery where you show it how you like yours arranged, with that high-level logic happening after all the heavy neural network lifting.

    I could well be proven wrong of course, I’ll be following how things progress over the next few years!
