I’m coming up to a year at Google now, and I’ve been continuing to have an amazing time with the deep learning team here. Deep networks are not a silver bullet for all AI problems, but they do mean we are moving from a cottage industry of bespoke machine learning specialists hand-carving algorithms for each new problem, to mass production where general software engineers can get good results by applying the same off-the-shelf approaches to a lot of different areas. If you have a good object recognition network architecture, you can get damn fine results on scene recognition, location estimation, and a whole host of other tasks using the same model, just by varying the training data, or even by retraining only the top layer.
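To make that last point concrete, here’s a minimal sketch of what retraining the top layer can look like. The tools and numbers are purely my own illustration (PyTorch, a torchvision model, a made-up count of scene labels), not anything from a particular Google system: freeze the pretrained features, swap in a new classifier for your labels, and train only that.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Hypothetical example: reuse a pretrained image-recognition network for
# scene recognition by retraining only the top layer.
model = models.resnet18(weights="DEFAULT")
for param in model.parameters():
    param.requires_grad = False              # keep the learned features fixed

num_scene_classes = 10                       # hypothetical number of scene labels
model.fc = nn.Linear(model.fc.in_features, num_scene_classes)

optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One step over a batch from your new training set."""
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```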
The tools aren’t particularly easy to use right now, which makes deep learning seem very intimidating, but work like Andrej Karpathy’s ConvNetJS shows that the code can be expressed in much more understandable ways. As the libraries and documentation mature, we’ll see tools that let any software engineer create their own deep learning solution simply by building a training set that expresses their problem and feeding it into an automated system. I imagine there will be separate approaches for the big areas of images, speech, and natural language, but we’re at the point where we can produce semantically meaningful intermediate representations from all those kinds of real-world data, and then straightforwardly train against those. Anyway, enough of my excited ramblings, I mostly wanted to share some interesting deep learning articles I’ve seen recently.
How Google Translate Squeezes Deep Learning onto a Phone – I’ve been lucky enough to work with the former WordLens team to get their amazing augmented reality visual translator running deep neural networks for character recognition, directly on the device. It was nice to see the technology from the WordLens and Jetpac acquisitions come together with all of the experience and smarts of the wider Google teams to make something this fun.
Composing Music with Recurrent Neural Networks – Mozart’s job is still safe for a while based on the final results, but it’s a great demonstration of how it’s getting easier for non-specialists to start working with neural networks. It also has the best explanation of LSTMs that I’ve seen!
gemmlowp – Benoit Jacob, of Eigen fame, has been doing a fantastic job of optimizing the kind of eight-bit matrix multiply routines that I find essential for running networks on device. Even better, because Google’s very supportive of open source, we’ve been able to release it publicly on GitHub. It’s been a great project to collaborate on, and I’m happy that we’ve been able to share the results.
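If you’re wondering what an eight-bit matrix multiply actually involves, here’s a toy NumPy sketch of the underlying idea (just the concept, not gemmlowp’s real API or its hand-tuned kernels): map float matrices onto uint8 values with a scale and zero point, accumulate the products in 32-bit integers, then rescale back to float.

```python
import numpy as np

def quantize(x):
    """Map a float array onto uint8 with a linear scale and zero point."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0
    zero_point = int(round(-lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def quantized_matmul(a, b):
    """Eight-bit matmul: integer accumulation, then undo the scales."""
    qa, sa, za = quantize(a)
    qb, sb, zb = quantize(b)
    acc = (qa.astype(np.int32) - za) @ (qb.astype(np.int32) - zb)
    return acc * (sa * sb)

a = np.random.randn(64, 128).astype(np.float32)
b = np.random.randn(128, 32).astype(np.float32)
print(np.abs(quantized_matmul(a, b) - a @ b).max())  # small quantization error
```

The payoff is that eight-bit values mean a quarter of the memory traffic of 32-bit floats, which matters a lot on phones.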
Visualizing GoogLeNet Classes – I love that we’re still at the stage where we don’t really know how these networks work under the hood, and investigations like these are great ways of exploring what these strange mechanisms we’ve created are actually doing.
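If you want to poke at this yourself, the basic recipe behind these class visualizations is gradient ascent on the input image: start from noise and repeatedly nudge the pixels to increase one class’s score. Here’s a bare-bones sketch using PyTorch and torchvision’s GoogLeNet reimplementation (my own choice of tools for illustration, and real results need a lot more regularization to look as good as the linked post).

```python
import torch
import torchvision.models as models

# Load a pretrained GoogLeNet and freeze it; we only optimize the image.
model = models.googlenet(weights="DEFAULT").eval()
for p in model.parameters():
    p.requires_grad_(False)

target_class = 130                     # an arbitrary ImageNet class index
img = torch.randn(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([img], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    score = model(img)[0, target_class]
    # Maximize the class score, with a mild L2 penalty to keep pixels bounded.
    loss = -score + 1e-4 * img.pow(2).sum()
    loss.backward()
    optimizer.step()
```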
How a Driverless Car Sees the World – Yes, I have drunk the Google Kool-Aid, but I’ve long thought this is one of the coolest projects happening right now. This is a great rundown of some of the engineering challenges it’s facing, including wheelchairs doing donuts in the road.