How to Quantize Neural Networks with TensorFlow


Picture by Jaebum Joo

I’m pleased to say that we’ve been able to release a first version of TensorFlow’s quantized eight bit support. I was pushing hard to get it in before the Embedded Vision Summit, because it’s especially important for low-power and mobile devices, so it’s exciting to get it out there. All this documentation will be appearing on the main TensorFlow site also, but since I’ve talked so much about why eight-bit is important here, I wanted to give an overview of what we’ve released in this post too.

When modern neural networks were being developed, the biggest challenge was getting them to work at all! That meant that accuracy and speed during training were the top priorities. Using floating point arithmetic was the easiest way to preserve accuracy, and GPUs were well-equipped to accelerate those calculations, so it’s natural that not much attention was paid to other numerical formats.

These days, we actually have a lot of models being deployed in commercial applications. The computation demands of training grow with the number of researchers, but the cycles needed for inference expand in proportion to users. That means pure inference efficiency has become a burning issue for a lot of teams.

That is where quantization comes in. It’s an umbrella term that covers a lot of different techniques to store numbers and perform calculations on them in more compact formats than 32-bit floating point. I am going to focus on eight-bit fixed point, for reasons I’ll go into more detail on later.

Why does Quantization Work?

Training neural networks is done by applying many tiny nudges to the weights, and these small increments typically need floating point precision to work (though there are research efforts to use quantized representations here too).

Taking a pre-trained model and running inference is very different. One of the magical qualities of deep networks is that they tend to cope very well with high levels of noise in their inputs. If you think about recognizing an object in a photo you’ve just taken, the network has to ignore all the CCD noise, lighting changes, and other non-essential differences between it and the training examples it’s seen before, and focus on the important similarities instead. This ability means that they seem to treat low-precision calculations as just another source of noise, and still produce accurate results even with numerical formats that hold less information.

Why Quantize?

Neural network models can take up a lot of space on disk, with the original AlexNet being over 200 MB in float format for example. Almost all of that size is taken up with the weights for the neural connections, since there are often many millions of these in a single model. Because they’re all slightly different floating point numbers, simple compression formats like zip don’t compress them well. They are arranged in large layers though, and within each layer the weights tend to be normally distributed within a certain range, for example -3.0 to 6.0.

The simplest motivation for quantization is to shrink file sizes by storing the min and max for each layer, and then compressing each float value to an eight-bit integer representing the closest real number in a linear set of 256 within the range. For example, with the -3.0 to 6.0 range, a 0 byte would represent -3.0, a 255 would stand for 6.0, and 128 would represent about 1.5. I’ll go into the exact calculations later, since there are some subtleties, but this means you can get the benefit of a file on disk that’s shrunk by 75%, and then convert back to float after loading so that your existing floating-point code can work without any changes.
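To make that storage scheme concrete, here is a minimal Python sketch of the idea. The function names and the sample weights are my own for illustration, it deliberately ignores the rounding subtleties covered later, and it is not the code TensorFlow uses.

```python
def quantize(values, lo, hi):
    """Map floats in [lo, hi] to the nearest of 256 evenly spaced levels."""
    scale = 255.0 / (hi - lo)
    codes = []
    for v in values:
        clamped = min(max(v, lo), hi)  # keep out-of-range values inside the range
        codes.append(int((clamped - lo) * scale + 0.5))
    return bytes(codes)


def dequantize(codes, lo, hi):
    """Recover approximate floats from the eight-bit codes."""
    step = (hi - lo) / 255.0
    return [lo + c * step for c in codes]


# Hypothetical weights for one layer, spanning the -3.0 to 6.0 example range.
weights = [-3.0, -1.2, 0.0, 1.5, 4.25, 6.0]
lo, hi = min(weights), max(weights)

packed = quantize(weights, lo, hi)       # one byte per weight
restored = dequantize(packed, lo, hi)    # floats again, for existing float code
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(len(packed), "bytes instead of", 4 * len(weights), "for 32-bit floats")
print("largest round-trip error:", max_error)
```

The round-trip error is bounded by half of one step, which is (6.0 - -3.0) / 255 / 2 for this range.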

Another reason to quantize is to reduce the computational resources you need to do the inference calculations, by running them entirely with eight-bit inputs and outputs. This is a lot more difficult since it requires changes everywhere you do calculations, but offers a lot of potential rewards. Fetching eight-bit values only requires 25% of the memory bandwidth of floats, so you’ll make much better use of caches and avoid bottlenecking on RAM access. You can also typically use SIMD operations that do many more operations per clock cycle. In some cases you’ll have a DSP chip available that can accelerate eight-bit calculations too, which can offer a lot of advantages.

Moving calculations over to eight bit will help you run your models faster, and use less power (which is especially important on mobile devices). It also opens the door to a lot of embedded systems that can’t run floating point code efficiently, so it can enable a lot of applications in the IoT world.

Why Not Train in Lower Precision Directly?

There have been some experiments training at lower bit depths, but the results seem to indicate that you need higher than eight bit to handle the back propagation and gradients. That makes implementing the training more complicated, and so starting with inference made sense. We also already have a lot of float models that we use and know well, so being able to convert them directly is very convenient.

How Can You Quantize Your Models?

TensorFlow has production-grade support for eight-bit calculations built in. It also has a process for converting many models trained in floating-point over to equivalent graphs using quantized calculations for inference. For example, here’s how you can translate the latest GoogLeNet model into a version that uses eight-bit computations:

curl -o /tmp/inceptionv3.tgz
tar xzf /tmp/inceptionv3.tgz -C /tmp/
bazel build tensorflow/contrib/quantization/tools:quantize_graph
bazel-bin/tensorflow/contrib/quantization/tools/quantize_graph \
--input=/tmp/classify_image_graph_def.pb \
--output_node_names="softmax" --output=/tmp/quantized_graph.pb \

This will produce a new model that runs the same operations as the original, but with eight bit calculations internally, and all weights quantized as well. If you look at the file size, you’ll see it’s about a quarter of the original (23MB versus 91MB). You can still run this model using exactly the same inputs and outputs though, and you should get equivalent results. Here’s an example:

bazel build tensorflow/examples/label_image:label_image
bazel-bin/tensorflow/examples/label_image/label_image \
--input_graph=/tmp/quantized_graph.pb \
--input_width=299 \
--input_height=299 \
--mean_value=128 \
--std_value=128 \
--input_layer_name="Mul:0" \

You’ll see that this runs the newly-quantized graph, and outputs a very similar answer to the original.

You can run the same process on your own models saved out as GraphDefs, with the input and output names adapted to those your network requires. I recommend that you run them through the freeze_graph script first, to convert checkpoints into constants stored in the file.

How Does the Quantization Process Work?

We’ve implemented quantization by writing equivalent eight-bit versions of operations that are commonly used during inference. These include convolution, matrix multiplication, activation functions, pooling operations and concatenation. The conversion script first replaces all the individual ops it knows about with quantized equivalents. These are small sub-graphs that have conversion functions before and after to move the data between float and eight-bit. Below is an example of what they look like. First here’s the original Relu operation, with float inputs and outputs:


Then, this is the equivalent converted subgraph, still with float inputs and outputs, but with internal conversions so the calculations are done in eight bit.


The min and max operations actually look at the values in the input float tensor, and then feed them into the Quantize operation that converts the tensor into eight bits. There are more details on how the quantized representation works later on.

Once the individual operations have been converted, the next stage is to remove unnecessary conversions to and from float. If there are consecutive sequences of operations that all have quantized equivalents, then there will be a lot of adjacent Dequantize/Quantize ops. This stage spots that pattern, recognizes that they cancel each other out, and removes them, like this:


Applied on a large scale to models where all of the operations have quantized equivalents, this gives a graph where all of the tensor calculations are done in eight bit, without having to convert to float.
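As a rough illustration of that cleanup stage, here is a toy Python pass over a straight-line chain of op names. Real graphs carry tensors plus the min and max ranges alongside each op, so treat this as a sketch of the pattern matching only; the function name and the example chain are my own, not part of the conversion script.

```python
def strip_redundant_conversions(ops):
    """Drop every Dequantize that is immediately followed by a Quantize."""
    result = []
    for op in ops:
        if op == "Quantize" and result and result[-1] == "Dequantize":
            result.pop()  # the pair cancels out, so neither op is kept
        else:
            result.append(op)
    return result


# Two converted ops in a row, each wrapped in its own float conversions.
chain = [
    "Quantize", "QuantizedConv2D", "Dequantize",
    "Quantize", "QuantizedRelu", "Dequantize",
]
optimized = strip_redundant_conversions(chain)
print(optimized)
# ['Quantize', 'QuantizedConv2D', 'QuantizedRelu', 'Dequantize']
```

Only the conversions at the very start and end of the chain survive, which is the effect described above.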

What Representation is Used for Quantized Tensors?

We approach converting floating-point arrays of numbers into eight-bit representations as a compression problem. We know that the weights and activation tensors in trained neural network models tend to have values that are distributed across comparatively small ranges (for example you might have -15 to +15 for weights, -500 to 1000 for activations on an image model, though the exact numbers will vary). We also know from experiment that neural nets tend to be very robust in the face of noise, and so the noise-like error produced by quantizing down to a small set of values will not hurt the precision of the overall results very much. We also want to pick a representation that’s easy to perform calculations on, especially the large matrix multiplications that form the bulk of the work that’s needed to run a model.

These led us to pick a representation that has two floats to store the overall minimum and maximum values that are represented by the lowest and highest quantized value. Each entry in the quantized array represents a float value in that range, distributed linearly between the minimum and maximum. For example, if we have minimum = -10.0, and maximum = 30.0, and an eight-bit array, here’s what the quantized values represent:

Quantized | Float
   0      | -10.0
 255      |  30.0
 128      |  10.0

The advantages of this format are that it can represent arbitrary magnitudes of ranges, they don’t have to be symmetrical, it can represent signed and unsigned values, and the linear spread makes doing multiplications straightforward. There are alternatives like Song Han’s code books that can use lower bit depths by non-linearly distributing the float values across the representation, but these tend to be more expensive to calculate on.
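To show why the linear spread keeps multiplication simple, here is a small Python sketch of a dot product computed from the eight-bit codes. Each real value is `lo + step * code`, so the product of two values expands into four terms whose sums only involve integers. The helper names and sample vectors are assumptions for illustration, not TensorFlow or gemmlowp code.

```python
def encode(values, lo, hi):
    """Quantize floats in [lo, hi] to integer codes in 0..255."""
    scale = 255.0 / (hi - lo)
    return [int((min(max(v, lo), hi) - lo) * scale + 0.5) for v in values]


def quantized_dot(qa, a_lo, a_hi, qb, b_lo, b_hi):
    """Dot product of two quantized vectors using integer accumulation."""
    step_a = (a_hi - a_lo) / 255.0
    step_b = (b_hi - b_lo) / 255.0
    n = len(qa)
    sum_a = sum(qa)                               # integer sum of codes
    sum_b = sum(qb)
    sum_ab = sum(x * y for x, y in zip(qa, qb))   # integer multiply-accumulate
    # Expansion of sum((a_lo + step_a*qa) * (b_lo + step_b*qb)).
    return (n * a_lo * b_lo
            + a_lo * step_b * sum_b
            + b_lo * step_a * sum_a
            + step_a * step_b * sum_ab)


a = [-10.0, 0.0, 10.0, 30.0]   # stored with range -10.0 to 30.0
b = [0.5, -1.0, 2.0, 1.0]      # stored with range -1.0 to 2.0
qa = encode(a, -10.0, 30.0)
qb = encode(b, -1.0, 2.0)

exact = sum(x * y for x, y in zip(a, b))
approx = quantized_dot(qa, -10.0, 30.0, qb, -1.0, 2.0)
print("float result:", exact, "quantized result:", approx)
```

The expensive part, the sum of code products, never touches floating point; the float ranges are only applied once at the end.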

The advantage of having a strong and clear definition of the quantized format is that it’s always possible to convert back and forth from float for operations that aren’t quantization-ready, or to inspect the tensors for debugging purposes. One implementation detail in TensorFlow that we’re hoping to improve in the future is that the minimum and maximum float values need to be passed as separate tensors to the one holding the quantized values, so graphs can get a bit dense!

How do we Determine Ranges?

The nice thing about the minimum and maximum ranges is that they can often be pre-calculated. Weight parameters are constants known at load time, so their ranges can also be stored as constants. We often know the ranges for inputs (for example, images are usually RGB values in the range 0.0 to 255.0), and many activation functions have known ranges too. This can avoid having to analyze the outputs of an operation to determine the range, which we need to do for math ops like convolution or matrix multiplication which produce 32-bit accumulated results from 8-bit inputs.

If you’re doing any kind of arithmetic on 8-bit inputs, you’ll naturally start to accumulate results that have more than 8 bits of precision. If you add two 8 bit values, the result needs 9 bits. If you multiply two 8 bit numbers, you get 16 bits in the output. If you total up a series of 8-bit multiplications, like we do for matrix multiplication, the results grow beyond 16 bits, with the accumulator typically needing at least 20 to 25 bits, depending on how long the dot products involved are.
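The bit counts above are easy to check with a few lines of Python. This sketch assumes unsigned 8-bit inputs and worst-case values everywhere; the function name is my own.

```python
def accumulator_bits(dot_length, input_bits=8):
    """Bits needed to hold a dot product of `dot_length` worst-case products."""
    max_input = 2 ** input_bits - 1          # 255 for eight-bit inputs
    max_total = dot_length * max_input * max_input
    return max_total.bit_length()


print((255 + 255).bit_length())   # adding two 8-bit values: 9
print(accumulator_bits(1))        # one 8-bit multiply: 16
print(accumulator_bits(16))       # dot product of length 16: 20
print(accumulator_bits(512))      # dot product of length 512: 25
```

Each doubling of the dot product length adds roughly one more bit, which is why a 32-bit accumulator is a comfortable choice.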

This can be an issue for our quantization approach, since we need to take an output that’s much wider than 8 bits and shrink it down to feed into the next operation. One way to do it for matrix multiplies would be to calculate the largest and smallest possible output values, assuming all of the input values were at extremes. This is safe, since we know mathematically that no results can fall outside this range, but in practice most weights and activation values are much more evenly distributed. This means that the actual range of values we see is much smaller than the theoretical one, so if we used the larger bounds we’d be wasting a lot of our 8 bits on numbers that never appeared. Instead, we use the QuantizeDownAndShrinkRange operator to take a 32-bit accumulated tensor, analyze it to understand the actual ranges used, and rescale so that the 8-bit output tensor uses that range effectively. There are strategies that involve observing the actual minimums and maximums encountered with large sets of training data, and hard-coding those to avoid analyzing the buffer for ranges every time, but we don’t currently include that optimization.
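Here is a simplified Python sketch of that shrinking step, to show the shape of the calculation. It assumes each accumulator is an integer whose real value is `acc * unit`, and it is only an illustration of the idea; the function name, the `unit` parameter, and the sample numbers are mine, not the actual QuantizeDownAndShrinkRange kernel.

```python
def shrink_range(accumulators, unit):
    """Rescale wide integer accumulators to 0..255 using their observed range.

    Returns the eight-bit codes plus the float minimum and maximum they span.
    """
    lo_acc, hi_acc = min(accumulators), max(accumulators)
    span = max(hi_acc - lo_acc, 1)  # avoid dividing by zero for constant tensors
    codes = [((acc - lo_acc) * 255 + span // 2) // span for acc in accumulators]
    return codes, lo_acc * unit, hi_acc * unit


# Hypothetical 32-bit results; the theoretical range would be far wider.
accumulators = [-12000, 0, 3500, 48000]
codes, out_min, out_max = shrink_range(accumulators, unit=0.001)
print(codes, out_min, out_max)
```

Because the output range is taken from the values that actually occur, all 256 codes are spent on numbers the next operation will really see.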

How is the Rounding Done?

One of the hardest and most subtle problems we hit during quantization was the accumulation of biases. As I mentioned above, neural networks are very resilient to noise, but unless you’re very careful with rounding it’s easy to introduce biases in a single direction that build up during computation and wreck the final accuracy. You can see the final formula in the code, but the important part was that we needed to subtract the rounded version of the minimum from the rounded version of the float input value, rather than subtracting float minimum from the input and then rounding.
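The two orderings are easy to compare in a short Python sketch. This is my own illustration of the difference described above, with made-up function names, and not the formula from the TensorFlow source. It uses an explicit round-half-up helper because Python’s built-in `round` rounds halves to the nearest even number.

```python
import math


def round_half_up(x):
    return math.floor(x + 0.5)


def round_then_subtract(x, lo, hi):
    """Round the scaled input and the scaled minimum separately, then subtract."""
    scale = 255.0 / (hi - lo)
    return round_half_up(x * scale) - round_half_up(lo * scale)


def subtract_then_round(x, lo, hi):
    """Subtract the float minimum first, then round once."""
    scale = 255.0 / (hi - lo)
    return round_half_up((x - lo) * scale)


lo, hi = -10.0, 30.0
print(round_then_subtract(0.1, lo, hi))   # 65
print(subtract_then_round(0.1, lo, hi))   # 64
```

Both versions agree at the ends of the range, but they can land on neighboring codes in between, which is the kind of one-step difference that can add up over millions of values.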

What’s Next?

We’ve found that we can get extremely good performance on mobile and embedded devices by using eight-bit arithmetic rather than floating-point. You can see the framework we use to optimize matrix multiplications at gemmlowp. We still need to apply all the lessons we’ve learned to the TensorFlow ops to get maximum performance on mobile, but we’re actively working on that. Right now, this quantized implementation is a reasonably fast and accurate reference implementation that we’re hoping will enable wider support for our eight-bit models on a wider variety of devices.

If you’re interested, I highly recommend digging through the quantization code in TensorFlow, especially looking at the kernels that implement quantized ops. These all include reference implementations that we’re hoping will help portability to new hardware devices.

We also hope that this demonstration will encourage the community to explore what’s possible with low-precision neural networks. Thanks to everyone who helped put the quantization support together, it’s been great getting this out there!

How to break into machine learning


Photo by Erich Ferdinand

An engineer recently asked me how she could turn an interest in machine learning into a full-time job. This can be a daunting prospect, because the whole field has until recently been very separate from traditional engineering, with only a few specialists at large companies using it in production, often far from traditional product teams. I took a very random path to focusing on deep learning full time, but so did most of the people I work with. It’s not clear that there is one good route, but I wanted to share the advice I had to offer in case it’s helpful to others.

Become a Designated Machine Learner

Every manager should point at one member of their team and say “You are now our machine learning expert”. If your manager doesn’t do that for you, announce it yourself to anyone who will listen. This may sound like madness, but machine learning is rapidly invading almost every product area, so whether you’re in games or enterprise software, your group needs to at least stay up to date with what’s happening with the technology. If you aren’t, then your competitors are!

You may have to fight your own imposter syndrome, but becoming the go-to person for everyone’s questions about machine learning is a fantastic way to teach yourself the essentials. You’ll have to say “Good question, let me go figure that out” a lot at first, but every expert I know does the same! Even if you don’t end up building anything in production, at least you’ll be able to point at relevant research and experiments if you decide to change to a new position.

Enter Competitions

I have been a massive fan of Kaggle since it got off the ground. If your job’s not offering you the opportunities in machine learning you want, then joining that community is a great way to teach yourself a lot of practical skills. If you look through the forums, a lot of the contestants will describe exactly how they solved old competitions, so I would recommend following a few of their recipes to get started. Once you’re able to do that, pick a new contest that’s similar to one of those, and start playing around with all of the different options to see how you can improve the results. Most of machine learning is the software equivalent of banging on the side of the TV set until it works, so don’t be discouraged if you have trouble seeing an underlying theory behind all your tweaking!

Find a Community

As I mentioned above, the most frustrating thing about machine learning is how arbitrary it all is. I’m lucky enough to be at a large company surrounded by people I can talk to about things like why my model isn’t learning, but most engineers don’t have that luxury. That’s another advantage of Kaggle: from what I’ve seen, their forums offer a lot of support and encouragement. I would also look out for real-world meetups where you can swap stories and commiserate. If you can’t find something related to your field, try starting a mailing list or group yourself, or propose a session at a conference.

There is a long tradition of mentorship in machine learning, especially around deep learning, but I think we should be doing a much better job of capturing all that oral tradition. As someone who was recently an outsider myself, I want to see the field democratized. I think the reliance on word-of-mouth is more about poor written communication than anything inherent in the subject.

Write Documentation

On that topic, my TensorFlow for Poets post came out of work I was doing to help myself understand how to reliably retrain the top layer of a deep network. I didn’t know how before I started, but by carefully documenting the process and making sure I could reproduce it consistently, I learned a lot about how it all works. I also got a lot of helpful feedback as I shared drafts of the guide with colleagues.

One interesting thing about human nature is that people are a lot more willing to correct somebody else’s mistaken ideas than they are to propose their own. As long as you’re happy to keep eating humble pie, that means writing up your own tentative understanding and getting it reviewed is a lot more effective way of getting others to share their knowledge than asking flat out! That’s another reason I try to do documentation, purely for the corrections.


Unless you’re doing a degree at a recognized university, I personally don’t recommend going for a credential in machine learning. I do love courses like the Udacity Deep Learning program, but for the content not as a resumé builder. Having practical experience, even just on competitions like Kaggle, will be a lot more helpful in interviews.

As an engineer, I also find many machine learning research papers hard to get much benefit from. They tend to assume a lot of prior knowledge from the academic world, and prefer presenting their ideas in math rather than code. They can be useful once you’re experienced, but don’t worry if you’re left baffled by them at first.

Anyway, I hope some of these ideas are useful. Definitely read them with a skeptical eye, nobody really knows anything in this field, and I’ll be interested to hear what other suggestions people have!

Nano-computers are coming!


Photo by Steve Jurvetson

A few days ago I got an email from a journalist asking about the Starshot project. Of course he was looking for my much-more famous namesake Pete Worden, but I’ve been fascinated by the effort too. Its whole foundation is that we’ll soon be able to miniaturize space probes down to a few grams and have them function on tiny amounts of power. Over the past few years I’ve come to realize that’s the future of computing.

Imagine having a self-contained system that costs a few cents, is only a couple of millimeters wide, with a self-contained battery, processor, and basic CCD image sensor. Using modern deep learning techniques, you could train it to recognize crop pests or diseases on leaves and then scatter a few thousand across a field. Or sprinkle them through a jungle to help spot endangered wildlife. They could be spread over our bridges to spot corrosion before it gets started, or for any of the Semantic Sensor uses I’ve talked about before.

I know how useful these systems will be once they exist, but there are some major engineering challenges to solve before we get there. That’s why I’m excited to be going to the Embedded Vision Summit in a couple of weeks. Jeff Bier has gathered together a fantastic group of developers and industry leaders who are working on making this future happen. We’ll also have a strong presence from the TensorFlow team, to show how important embedded devices are to us. Jeff Dean will be keynoting and I’ll be discussing the nitty-gritty of using the framework on tiny devices.

If you’re intrigued by the idea of these “nano-computers”, and want to find out more (or even better if you’re already working on them like several folks I know!) I highly recommend joining me at the Summit in Santa Clara, May 2nd to 4th.

Hiking Montara Mountain


I finished off Firewatch yesterday, and it made me nostalgic for the days when I’d hike almost every weekend. I realized that part of it was because I don’t know enough of the local trails in San Francisco, so I decided to explore the wonderful Bay Area Hiker site for nearby hikes that would get me into the wilderness without taking up the whole day.

I ended up choosing the Montara Mountain trail, and I’m very glad I did! It’s just outside of underrated Pacifica (which I’m always surprised isn’t the Malibu of San Francisco) and I was especially excited to get a closer look at the vast Peninsula Watershed area that’s currently closed to the public.

The trail guide from BA Hiker was excellent, despite dating from 2003. There was lots of room to park at the trailhead, possibly due to the $6 fee, and very clear signage for the trail that included distances. After the rains we’ve had this winter, the wildflowers were starting to blossom.

It was great seeing my old friend from Los Angeles, Ceanothus (or wild lilac) with a full set of blossoms too.


The wet weather made life very pleasant for a banana slug I encountered slithering across the trail as well.


The trailbed was in great condition. There were obviously some good crews taking care of the swales and drainages, so it was all very hikeable despite El Niño. A bridge on the Brooks Falls trail that forms part of the return loop was washed out though, so I made it an out and back. It was a seven mile trip with 1,600 feet of elevation gain, with most of the outward part a steady uphill slog with one or two steeper sections. The views from the higher sections make the effort worthwhile though.


I caught a glimpse of the Watershed where the trail finished, blocked by a gate and fence, but by then it was starting to rain a little so I headed back down quickly.


It was a great hike, taking a little under three hours despite how little I’ve hiked recently, and the trailhead’s only thirty minutes from central San Francisco, so it’s convenient enough that I hope I’ll be able to fit it in even on busy weekends. Despite being so close to the city, once I got past the first mile it felt very wild, so I got a refreshing taste of nature as well. I’m looking forward to many more trips, and maybe a few more explorations of other nearby hikes on BA Hiker, since this one was so much fun!

TensorFlow for Poets

When I first started investigating the world of deep learning, I found it very hard to get started. There wasn’t much documentation, and what existed was aimed at academic researchers who already knew a lot of the jargon and background. Thankfully that has changed over the last few years, with a lot more guides and tutorials appearing.

I always loved EC2 for Poets though, and I haven’t seen anything for deep learning that’s aimed at as wide an audience. EC2 for Poets is an explanation of cloud computing that removes a lot of the unnecessary mystery by walking anyone with basic computing knowledge step-by-step through building a simple application on the platform. In the same spirit, I want to show how anyone with a Mac laptop and the ability to use the Terminal can create their own image classifier using TensorFlow, without having to do any coding.

I feel very lucky to be a part of building TensorFlow, because it’s a great opportunity to bring the power of deep learning to a mass audience. I look around and see so many applications that could benefit from the technology by understanding the images, speech, or text their users enter. The frustrating part is that deep learning is still seen as a very hard topic for product engineers to grasp. That’s true at the cutting edge of research, but otherwise it’s mostly a holdover from the early days. There’s already a lot of great documentation on the TensorFlow site, but to demonstrate how easy it can be for general software engineers to pick up I’m going to present a walk-through that takes you from a clean OS X laptop all the way to classifying your own categories of images. You’ll find written instructions in this post, along with a screencast showing exactly what I’m doing.


It’s possible to get TensorFlow running natively on OS X, but there’s less standardization around how the development tools like Python are installed which makes it hard to give one-size-fits-all instructions. To make life easier, I’m going to use the free Docker container system, which will allow me to install a Linux virtual machine that runs on my MacBook Pro. The advantage is that I can start from a known system image, and so the instructions are a lot more likely to work for everyone.

Installing Docker

There’s full documentation on installing Docker at, and it’s likely to be updated over time, but I will run through exactly what steps I took to get it running here.

  • I went to in my browser.
  • Step one of the instructions sent me to download the Docker Toolbox.
  • On the Toolbox page, I clicked on the Mac download button.
  • That downloaded a DockerToolbox-1.10.2.pkg file.
  • I ran that downloaded pkg to install the Toolbox.
  • At the end of the install process, I chose the Docker Quickstart Terminal.
  • That opened up a new terminal window and ran through an installation script.
  • At the end of the script, I saw ASCII art of a whale and I was left at a prompt.
  • I went back to step one of the instructions, and ran the suggested command in the terminal:
    docker run hello-world
  • This gave me output confirming my installation of Docker had worked:
    Hello from Docker.
    This message shows that your installation appears to be working correctly.

Installing TensorFlow

Now I’ve got Docker installed and running, I can get a Linux virtual machine with TensorFlow pre-installed running. We create daily development images, and ones for every major release. Because the example code I’m going to use came in after the last versioned release, 0.7.1, we’ll have to do some extra work below to update the source code using git, but once 0.8 comes out you could replace the ‘0.7.1’ below with the 0.8.0 instead, and skip the ‘Updating the Code’ section. The Docker section in the TensorFlow documentation has more information.

To download and run the TensorFlow docker image, use this command from the terminal:

docker run -it

This will show a series of download and extraction steps. These are the different components of the TensorFlow image being assembled. It needs to download roughly a gigabyte of data, so it can take a while on a slow network connection.

Once that’s complete, you’ll find yourself in a new terminal. This is now actually the shell for the Linux virtual machine you’ve downloaded. To confirm this has been successful, run this command:

ls /tensorflow

You should see a series of directories, including a tensorflow one and some .build files, something like this:

[Screenshot: listing of the /tensorflow directory]

Optimizing Docker

Often Docker is just used for testing web apps, where computational performance isn’t that important, so the speed of the processor in the virtual machine isn’t crucial. In our example we’re going to be doing some very heavy number-crunching though, so optimizing the configuration for speed is important.

Under the hood, Docker actually uses VirtualBox to run its images, and we’ll use its control panel to manage the setup. To do that, we’ll need to take the following steps:

  • Find the VirtualBox application on your Mac. I like to use spotlight to find and open it, so I don’t have to hunt around on the file system.
  • Once VirtualBox is open, you should see a left-hand pane showing virtual machines. There should be one called ‘default’ that’s running.
  • Right-click on ‘default’ to bring up the context menu and choose ‘Close->ACPI Shutdown’. The other close options should also work, but this is the cleanest.
  • Once the shutdown is complete, ‘default’ should have the text ‘Powered off’ below it. Right click on it again and choose ‘Settings…’ from the menu.
  • Click on the ‘System’ icon, and then choose the ‘Motherboard’ tab.
  • Drag the ‘Base Memory’ slider as far as the green section goes, which is normally around 75% of your laptop’s total memory. So in my case it’s 12GB, because I have a 16GB machine.
  • Click on the ‘Processor’ tab, and set the number of processors higher than the default of 1. Most likely on a modern MacBook Pro 4 is a good setting, but use the green bar below the slider as a guide.
  • Click ‘OK’ on the settings dialog.
  • Right-click on ‘default’ and choose ‘Start->Headless Start’.

You should find that your terminal was kicked out of the Linux prompt when you stopped the ‘default’ box, but now you’ve restarted it you can run the same command to access it again:

docker run -it

The only difference is that now the virtual machine will have access to a lot more of your laptop’s computing power, and so the example should run a lot faster!

Downloading Images

The rest of this walk-through is based on the image-retraining example on the TensorFlow site. It shows you how to take your own images organized into folders by category, and use them to quickly retrain the top layer of the Inception image recognition neural network to recognize those categories. To get started, the first thing you need to do is get some example images. To begin, go to the terminal and enter the ‘exit’ command if you still see the ‘root@…’ prompt that indicates you’re still in the Linux virtual machine.

Then run the following commands to create a new folder in your home directory to hold training images, and download and extract the flower photos:

cd $HOME
mkdir tf_files
cd tf_files
curl -O
tar xzf flower_photos.tgz
open flower_photos

This should end up with a new Finder window opening, showing a set of five folders:

[Screenshot: Finder window showing the five flower folders]

This means you’ve successfully downloaded the example flower images. If you look at how they’re organized, you should be able to use the same structure with classes you care about, just replacing the folder names with the category labels you’re dealing with, and populating them with photos of those objects. There’s more guidance on that process in the tutorial.
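If you do assemble your own folders, a quick check of the layout can save a failed training run. Here is an optional Python sketch that lists each category folder and how many photos it holds. The function name and the list of file extensions are my assumptions, so adjust them to match your own images.

```python
import os


def summarize_categories(image_dir, extensions=(".jpg", ".jpeg")):
    """Return {category folder name: number of image files inside it}."""
    summary = {}
    for name in sorted(os.listdir(image_dir)):
        path = os.path.join(image_dir, name)
        if not os.path.isdir(path):
            continue  # skip loose files such as LICENSE.txt
        summary[name] = sum(
            1 for f in os.listdir(path) if f.lower().endswith(extensions)
        )
    return summary


if __name__ == "__main__":
    # Path from the download step above; skipped quietly if it is not there.
    flowers = os.path.expanduser("~/tf_files/flower_photos")
    if os.path.isdir(flowers):
        for category, count in summarize_categories(flowers).items():
            print(category, count)
```

Each folder name that shows up here is one category label, so an empty or misspelled folder is easy to spot before you start.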

Running the VM with Shared Folders

Now you’ve got some images to train with, we’re going to start up the virtual machine again, this time sharing the folder you just created with Linux so TensorFlow can access the photos:

docker run -it -v $HOME/tf_files:/tf_files

You should find yourself back in a Linux prompt. To make sure the file sharing worked, try the following command:

ls /tf_files/flower_photos

You should see a list of the flower folders, like this:

root@2c570d651d08:~# ls /tf_files/flower_photos
LICENSE.txt daisy dandelion roses sunflowers tulips

Updating the Code

For this example, we need the very latest code since it’s just been added. Unfortunately, getting it is a little involved, with some use of the source control program git. I’ll walk through the steps below.

Pulling the code requires a default email address, which you can set to anything, since we’re not planning on pushing any changes back.

git config --global user.email "you@example.com"
git config --global user.name "Your Name"

Now you should be able to pull the latest source.

cd /tensorflow/
git pull origin master

You’ll find yourself in a vim window. Just type ‘:quit’ to exit.

You should now have fully up-to-date code. We want to sync it to a version we know works though, so we’ll run this command:

git checkout 6d46c0b370836698a3195a6d73398f15fa44bcb2
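If you’d like to confirm the checkout landed where you expect before kicking off a long build, something like the sketch below should do it. The function name at_expected_commit is hypothetical; it only relies on git’s standard rev-parse command, and it assumes a git recent enough to support the -C option.

```shell
# Hypothetical helper: succeed only if the repository's HEAD is the
# commit we expect. Fails if the folder is not a git repository.
at_expected_commit() {
    repo_dir="$1"
    expected="$2"
    actual=$(git -C "$repo_dir" rev-parse HEAD) || return 1
    [ "$actual" = "$expected" ]
}

# Example call inside the virtual machine:
# at_expected_commit /tensorflow 6d46c0b370836698a3195a6d73398f15fa44bcb2 \
#     && echo "synced to the known-good version"
```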

Building the Code

If that worked, the next step is to compile the code. You may notice there are some optimization flags in the command that help speed it up on processors with AVX, which almost all modern OS X machines have.

cd /tensorflow/
bazel build -c opt --copt=-mavx tensorflow/examples/image_retraining:retrain

This part can take five to ten minutes, depending on the speed of your machine, as it’s compiling the full source code for TensorFlow. Don’t worry if you see a lot of warnings; this is normal (though we’re working on reducing them going forward).

Running the Code

I can now run the retraining process using this command:

bazel-bin/tensorflow/examples/image_retraining/retrain \
--bottleneck_dir=/tf_files/bottlenecks \
--model_dir=/tf_files/inception \
--output_graph=/tf_files/retrained_graph.pb \
--output_labels=/tf_files/retrained_labels.txt \
--image_dir /tf_files/flower_photos

You’ll see a message about downloading the Inception model, and then a long series of messages about creating bottlenecks. There are around 3,700 photos in total to process, and my machine does around 200 a minute, so it takes around twenty minutes in total. If you want to know more about what’s happening under the hood while you wait, you can check out the tutorial for a detailed explanation.
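To get a rough feel for how long the bottleneck stage will take on your own photo set, you can redo the same arithmetic with your numbers. This is only a back-of-the-envelope sketch; the function name is hypothetical, and the photos-per-minute figure is whatever you observe on your machine.

```shell
# Hypothetical helper: minutes needed to cache bottlenecks, computed as
# photos divided by photos-per-minute, rounded up to a whole minute.
estimate_bottleneck_minutes() {
    photos="$1"
    per_minute="$2"
    echo $(( (photos + per_minute - 1) / per_minute ))
}

# The numbers from my run: about 3,700 photos at about 200 a minute.
estimate_bottleneck_minutes 3700 200   # prints 19
```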

I’ve changed the default /tmp destination for things like the output graph and cached bottlenecks to the shared /tf_files folder, so that the results will be accessible from OS X and will be retained between different runs of the virtual machine.

Once the bottlenecks are cached, it will then go into the training process, which takes another five minutes or so on my laptop. At the end, you should see the last output line giving the final estimated accuracy, which should be around 90%. That means you’ve trained your classifier to guess the right flower species nine times out of ten when shown a photo!
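Once training finishes, it’s worth checking that the two files named in the retrain command actually landed in the shared folder. Here is a minimal sketch; check_retrain_outputs is a made-up name, and the file names are simply the ones passed to --output_graph and --output_labels above.

```shell
# Hypothetical helper: report whether the graph and labels files written
# by the retrain command exist and are non-empty in the given folder.
check_retrain_outputs() {
    out_dir="$1"
    missing=0
    for name in retrained_graph.pb retrained_labels.txt; do
        if [ -s "$out_dir/$name" ]; then
            echo "found $name"
        else
            echo "missing $name"
            missing=1
        fi
    done
    # Non-zero exit status if anything was missing.
    return "$missing"
}

# Example call inside the virtual machine:
# check_retrain_outputs /tf_files
```

Because /tf_files is the shared folder, the same check should also work from OS X against $HOME/tf_files.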

Using the Classifier

The training process outputs the retrained graph into /tf_files/retrained_graph.pb, and to test it out yourself you can build another piece of sample code. The label_image example is a small C++ program that loads in a graph and applies it to a user-supplied image. Give it a try like this:

bazel build tensorflow/examples/label_image:label_image && \
bazel-bin/tensorflow/examples/label_image/label_image \
--graph=/tf_files/retrained_graph.pb \
--labels=/tf_files/retrained_labels.txt \
--output_layer=final_result \

You should see a result showing that it identified a daisy in that picture, though because the training process is random you may occasionally have a model that makes a mistake on the image. Try it with some of the other photos to get a feel for how it’s doing.
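To try several photos in one go, you could wrap the same command in a small loop. This is a sketch rather than anything shipped with TensorFlow: label_photos is a hypothetical name, it assumes label_image takes the photo path through an --image flag, and it takes the path to the binary as its first argument so you can point it at wherever your build put it.

```shell
# Hypothetical helper: run the label_image binary on each photo in turn.
# First argument is the path to the binary, the rest are photo paths.
# Assumes the binary accepts the photo through an --image flag.
label_photos() {
    label_image_bin="$1"
    shift
    for photo in "$@"; do
        echo "== $photo"
        "$label_image_bin" \
            --graph=/tf_files/retrained_graph.pb \
            --labels=/tf_files/retrained_labels.txt \
            --output_layer=final_result \
            --image="$photo"
    done
}

# Example call from the /tensorflow folder inside the virtual machine:
# label_photos bazel-bin/tensorflow/examples/label_image/label_image \
#     /tf_files/flower_photos/roses/*.jpg
```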

Next Steps

The first thing you’ll probably want to do is train a classifier for objects you care about in your application. This should be as simple as creating a new folder in your tf_files directory, putting subfolders full of photos in it, and re-running the classifier commands. You can find more detailed advice on tuning that process in the tutorial.

Finally, you’ll want to use this in your own application! The label_image example is a good template to look at if you can integrate C++ into your product, and we even support running on mobile, so check out the Android sample code if you’d like to run on a smart phone.

Thanks for working through this process with me. I hope it’s inspired you to think about how you can use deep learning to help your users, and I can’t wait to see what you build!

How to Build an App if You’re Not a Developer


I often hear from friends who have an idea for an app, but aren’t software engineers. They want to know how they make progress without having to learn a whole new set of technical skills or fund a development team. They know I’ve worked at Apple and Google, and built my own app for Jetpac, so they’re hoping I can offer some guidance.

Happily there’s actually a lot you can do before you have to dive deep into engineering, so here’s my step by step guide. This is based on the process Cathrine, Julian, Chris and I followed at Jetpac, so it’s actually the same process I recommend even if you do have engineers!


The hardest part of the development process is figuring out what your app should do. This may be hard to believe when you’re staring at a mountain of technical challenges, but understanding in detail how your app should behave is essential to getting it built. Changing the requirements once you’ve partially built it will cost you a lot more time than you expect, so trying to get as much feedback from users as early as possible is key.

The quickest way to start is to begin a new Powerpoint or Keynote slide deck. On the first slide, put a rough draft of the first screen a user will see. Don’t worry about making it pretty, just put in words for all the buttons a user can press, and any welcome text. If you want to get fancy, download blank iPhone graphics from Apple and put your content inside those frames. On the second slide, put what you expect the user to see after they’ve taken their next action. This can be a whole new screen, or just some change in the first screen. Keep doing that until you have at least one example ‘workflow’ showing the screens someone might see if they use the app for one session.

Now comes the most painful part. Find someone who you’re hoping might want to use the finished app, who’s part of your target audience. Try to make sure they’re not a close friend, and if you can, don’t reveal it’s your app; what you want is as honest an opinion as possible. Start off by asking them if they’d be interested in downloading an app for ‘X’, where ‘X’ is your short description (e.g. Instagram could be ‘an app for taking artistic photos and sharing them’). If they say no, or seem unenthusiastic, you’ve either got a problem with how you’re describing the app, they’re not actually part of your target market, or you need to rethink what your app does. If you can’t pass that basic test with at least one person, you will not get any downloads!

Assuming you’re at the point where your description has them interested, show them the first screen, and then walk them through the day in the life of the user like you’re telling a story. Have them ask you questions as you go about anything they find confusing, and make notes. At the end, ask them if they could see themselves using the app.

Once you’ve done this with a few people, go back to your description and Powerpoint slides, and try to address the problems that came up with new approaches. Then go back and do it all again with new people!

Don’t expect to get positive answers to any of these questions at first! It’s almost certain that you’ll have to keep repeating this process for weeks or months until you’ve truly understood what your users are looking for. Don’t feel like you’re being dumb; everyone has to go through this pain, and in fact not having an engineering background helps you because you’re not tempted to spend time writing real code that’s solving the wrong problems. Learn as much as you can from your users as early as possible, and you’ll get to a successful app much faster.


This is a trick that Cathrine and Julian came up with, so I can’t claim the credit, but it worked very well while we were prototyping Jetpac. The app was all about showing people gorgeous photos, so once we’d got through the slideshow phase, we needed a prototype that didn’t require much engineering, but looked really good, or we wouldn’t be able to gauge user reactions very well.

PDFs can contain links to other pages, and so by creating a series of screens as individual pages and having button images link to different ones, you can fake up a very attractive simulation of your app where users themselves can actually tap to make things happen. There are obviously a lot of limitations, but if you create the PDF and then run it inside a PDF viewer that supports full-screen and links on a phone, it works surprisingly well.

How much visual design effort you want to put into this stage depends on the audience for your app. If it’s a utility and doesn’t have to be pretty, then you can mock up the PDF yourself even if you don’t possess any artistic skills. Otherwise, you’ll need help from a graphic designer. The good part is that you will have a good set of requirements from the Powerpoint process. You can hand over the outlines of the screens you need and then ask for what you need improved visually.

This is the first step where you may need to spend money on a professional. You can try to get away with a cousin who knows Photoshop, but you’re likely to get what you pay for. My recommendation is to either accept that it will be ugly and do it all yourself, or hire a proper freelance graphic designer and be prepared to pay their usual rates.

Once you have a PDF running on a phone, try handing it over to potential users and watch what they do. One approach we used was to put the phone flat on a desk and have another phone in a clamp recording video from above. We’d ask the person to describe what they were seeing and thinking as they tried to navigate the app, and then all watch the results afterwards to understand what did and didn’t work, and what confused people.

This is another stage you should spend as much time on as possible. Fixing problems now is far, far cheaper and faster than once decisions are baked into code.

Mobile Website

Wait a second, isn’t this a guide to building apps, not websites? You’re right, but I actually recommend prototyping using the mobile web as an intermediate stage to help you design your product. It’s much easier and cheaper to find web developers and designers, there are much better design tools, you can actually do a lot of things yourself with minimal technical skills, and the development environment lets you get things done much faster. There are also very few technical things that you can’t do from a website on your phone. You can even take photos, grab GPS locations, and run advanced WebGL graphics in a mobile browser these days, and most of these features work across both iOS and Android, so you don’t have to develop different apps for both operating systems. The main downsides are that you don’t get native buttons and other UI elements, and things like page loading and animations can easily look bad.

Depending on what your app needs to do, you can try a variety of different approaches to development. If it’s a fairly simple set of content that you want people to be able to browse and search, you can even use an off-the-shelf website builder like Wix, Squarespace or WordPress that has good mobile templates, and just create the pages you need yourself. For anything else, you’ll need some engineers and designers to help.

The good news is that you should have a very clear idea of what you need after going through all the prototyping, so you can present a project with a very well-defined scope to any teams you’re evaluating. Having a good set of requirements will help them come up with realistic cost and time estimates, and greatly increases the odds that it will actually be completed on schedule and within the budget. Hiring and managing an engineering team needs a whole different article (or maybe even a book) to do it justice, but the key points to remember are that changes in requirements have way more impact than you can possibly believe, and you should expect to see work in progress at regular intervals; don’t let them ‘go dark’ for too long.

There will be two main areas of engineering effort. The backend is all of the cloud-based work you’ll do on servers, using something like Amazon EC2, and which holds all of the shared data for your app. For example, this is where all the photos for Instagram are stored, and all user account information. The frontend is the user interface that people see, so it includes building all the buttons, text and screens that make up the visible part of the app, and the Javascript code that uses an API to store and retrieve information from the backend servers.

Again, try to get whatever you have into as many users’ hands as possible, to catch any problems and improve things as early (and cheaply) as you can. I’m a big fan of, since they were able to get the app to users almost instantly, and get us a 20 minute screencast video of the testers using the app and describing what they saw and thought, all for around $40 a session. The feedback we got from that was invaluable.


Once you’ve got a basic mobile website working, you can use the PhoneGap tool to wrap it in native code so it can be downloaded from the app store and installed just like a fully-native app. This may seem like a cheat, but it’s possible to polish a mobile website until it feels very smooth and native. We were even featured by Apple, despite our app using this approach, since we worked hard to make everything feel ‘native’. It does require a lot of engineering and design attention to detail to get to that level though.

Native Development

I would only consider native development once you’ve started to get real traction with the faster and cheaper approaches I’ve outlined above. It’s still a major engineering effort to support two different operating systems, development will be slower, and you’ll need more specialized engineers and designers to handle the work. You’ll also be a lot slower at shipping updates, it’s tougher to get statistics on how people are using your product, and techniques like A/B testing of changes are much harder to do.

Anyway, I hope you find this guide useful. If there’s one thing I want you to take away from this, it’s that you can make a lot of progress without writing a line of code. The first and hardest work you’ll do on your app is figuring out what users actually want!

Five Deep Links

surferlarge – When I’m up to my neck in debugging obscure numerical bugs, it’s nice to remind myself again why I’m working in this area. Transferring styles from paintings to images is one of those magical results that I would never have guessed I’d see in decades, but here it is! I’ll keep checking back whenever wrestling with my code gets too tricky.

Turkey + Dinner Plates = Thanksgiving – Last week I gave a talk to some journalists about some of my team’s work at Google. It was intimidating to be on a roster with Geoff Hinton and other legends, but I was glad to be able to lift the veil a little bit.

RankBrain – Talking of lifting the veil, I’m excited we’ve been able to reveal how we’re using deep learning in search ranking.

Neural Network DSPs – CEVA are doing interesting work on running neural networks on low-power embedded devices, which I think will form the foundation of semantic sensors over the next few years.

Neural Networks with Few Multiplications – An interesting approach to speeding up neural networks by approximating the math.

