When I first started investigating the world of deep learning, I found it very hard to get started. There wasn’t much documentation, and what existed was aimed at academic researchers who already knew a lot of the jargon and background. Thankfully that has changed over the last few years, with a lot more guides and tutorials appearing.
I always loved EC2 for Poets though, and I haven’t seen anything for deep learning that’s aimed at as wide an audience. EC2 for Poets is an explanation of cloud computing that removes a lot of the unnecessary mystery by walking anyone with basic computing knowledge step-by-step through building a simple application on the platform. In the same spirit, I want to show how anyone with a Mac laptop and the ability to use the Terminal can create their own image classifier using TensorFlow, without having to do any coding.
I feel very lucky to be a part of building TensorFlow, because it’s a great opportunity to bring the power of deep learning to a mass audience. I look around and see so many applications that could benefit from the technology by understanding the images, speech, or text their users enter. The frustrating part is that deep learning is still seen as a very hard topic for product engineers to grasp. That’s true at the cutting edge of research, but otherwise it’s mostly a holdover from the early days. There’s already a lot of great documentation on the TensorFlow site, but to demonstrate how easy it can be for general software engineers to pick up I’m going to present a walk-through that takes you from a clean OS X laptop all the way to classifying your own categories of images. You’ll find written instructions in this post, along with a screencast showing exactly what I’m doing.
[Update – TensorFlow for Poets is now an official Google Codelab! It has the same content, but should be kept up to date as TensorFlow evolves, so I would recommend following the directions there.]
Docker
It’s possible to get TensorFlow running natively on OS X, but there’s less standardization around how the development tools like Python are installed, which makes it hard to give one-size-fits-all instructions. To make life easier, I’m going to use the free Docker container system, which will allow me to install a Linux virtual machine that runs on my MacBook Pro. The advantage is that I can start from a known system image, and so the instructions are a lot more likely to work for everyone.
Installing Docker
There’s full documentation on installing Docker at docker.com, and it’s likely to be updated over time, but I will run through exactly what steps I took to get it running here.
- I went to docs.docker.com/mac/ in my browser.
- Step one of the instructions sent me to download the Docker Toolbox.
- On the Toolbox page, I clicked on the Mac download button.
- That downloaded a DockerToolbox-1.10.2.pkg file.
- I ran that downloaded pkg to install the Toolbox.
- At the end of the install process, I chose the Docker Quickstart Terminal.
- That opened up a new terminal window and ran through an installation script.
- At the end of the script, I saw ASCII art of a whale and I was left at a prompt.
- I went back to step one of the instructions, and ran the suggested command in the terminal:
docker run hello-world
- This gave me output confirming my installation of Docker had worked:
Hello from Docker.
This message shows that your installation appears to be working correctly.
Installing TensorFlow
Now that I’ve got Docker installed and running, I can start a Linux virtual machine with TensorFlow pre-installed. We create daily development images, and ones for every major release. Because the example code I’m going to use only came in after the last versioned release, 0.7.1, we’ll have to do some extra work below to update the source code using git. Once 0.8 comes out, you could replace the ‘0.7.1’ below with ‘0.8.0’ instead, and skip the ‘Updating the Code’ section. The Docker section in the TensorFlow documentation has more information.
To download and run the TensorFlow docker image, use this command from the terminal:
docker run -it b.gcr.io/tensorflow/tensorflow:0.7.1-devel
This will show a series of download and extraction steps. These are the different components of the TensorFlow image being assembled. It needs to download roughly a gigabyte of data, so it can take a while on a slow network connection.
Once that’s complete, you’ll find yourself in a new terminal. This is now actually the shell for the Linux virtual machine you’ve downloaded. To confirm this has been successful, run this command:
ls /tensorflow
You should see a series of directories, including a tensorflow one and some .build files.
Optimizing Docker
Often Docker is just used for testing web apps, where computational performance isn’t that important, so the speed of the processor in the virtual machine isn’t crucial. In our example we’re going to be doing some very heavy number-crunching though, so optimizing the configuration for speed is important.
Under the hood, Docker actually uses VirtualBox to run its images, and we’ll use its control panel to manage the setup. To do that, we’ll need to take the following steps:
- Find the VirtualBox application on your Mac. I like to use Spotlight to find and open it, so I don’t have to hunt around on the file system.
- Once VirtualBox is open, you should see a left-hand pane showing virtual machines. There should be one called ‘default’ that’s running.
- Right-click on ‘default’ to bring up the context menu and choose ‘Close->ACPI Shutdown’. The other close options should also work, but this is the cleanest.
- Once the shutdown is complete, ‘default’ should have the text ‘Powered off’ below it. Right click on it again and choose ‘Settings…’ from the menu.
- Click on the ‘System’ icon, and then choose the ‘Motherboard’ tab.
- Drag the ‘Base Memory’ slider as far as the green section goes, which is normally around 75% of your laptop’s total memory. In my case that’s 12GB, because I have a 16GB machine.
- Click on the ‘Processor’ tab, and set the number of processors higher than the default of 1. Most likely on a modern MacBook Pro 4 is a good setting, but use the green bar below the slider as a guide.
- Click ‘OK’ on the settings dialog.
- Right-click on ‘default’ and choose ‘Start->Headless Start’.
You should find that your terminal was kicked out of the Linux prompt when you stopped the ‘default’ box, but now that you’ve restarted it, you can run the same command to access it again:
docker run -it b.gcr.io/tensorflow/tensorflow:0.7.1-devel
The only difference is that now the virtual machine will have access to a lot more of your laptop’s computing power, and so the example should run a lot faster!
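As a quick sanity check, once you’re back at the Linux prompt you can ask the VM how many processors and how much memory it now sees. These are standard Linux tools rather than anything TensorFlow-specific, and the numbers should match what you set in VirtualBox:

```shell
# Run inside the Linux VM shell after restarting 'default'.
# The processor count should match the value you set in VirtualBox.
getconf _NPROCESSORS_ONLN
# MemTotal should be close to the base memory you chose (reported in kB).
grep MemTotal /proc/meminfo 2>/dev/null || true
```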
Downloading Images
The rest of this walk-through is based on the image-retraining example on the TensorFlow site. It shows you how to take your own images organized into folders by category, and use them to quickly retrain the top layer of the Inception image recognition neural network to recognize those categories. To get started, the first thing you need to do is get some example images. To begin, go to the terminal and enter the ‘exit’ command if you still see the ‘root@…’ prompt that indicates you’re still in the Linux virtual machine.
Then run the following commands to create a new folder in your home directory to hold training images, and download and extract the flower photos:
cd $HOME
mkdir tf_files
cd tf_files
curl -O http://download.tensorflow.org/example_images/flower_photos.tgz
tar xzf flower_photos.tgz
open flower_photos
This should end with a new Finder window opening, showing a set of five folders.
This means you’ve successfully downloaded the example flower images. If you look at how they’re organized, you should be able to use the same structure with classes you care about, just replacing the folder names with the category labels you’re dealing with, and populating them with photos of those objects. There’s more guidance on that process in the tutorial.
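As a minimal sketch of that layout, here’s how you might set up folders for two hypothetical categories (‘cats’ and ‘dogs’ are placeholder labels; substitute your own):

```shell
# Build a training folder with the same layout as flower_photos:
# one subfolder per category, named with the label you want to train.
base=$(mktemp -d)/my_photos
mkdir -p "$base/cats" "$base/dogs"
# Copy your own JPEGs into each category folder, for example:
# cp ~/Pictures/cats/*.jpg "$base/cats/"
ls "$base"
```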
Running the VM with Shared Folders
Now you’ve got some images to train with, we’re going to start up the virtual machine again, this time sharing the folder you just created with Linux so TensorFlow can access the photos:
docker run -it -v $HOME/tf_files:/tf_files b.gcr.io/tensorflow/tensorflow:0.7.1-devel
You should find yourself back in a Linux prompt. To make sure the file sharing worked, try the following command:
ls /tf_files/flower_photos
You should see a list of the flower folders, like this:
root@2c570d651d08:~# ls /tf_files/flower_photos
LICENSE.txt  daisy  dandelion  roses  sunflowers  tulips
root@2c570d651d08:~#
Updating the Code
For this example, we need the very latest code since it’s just been added. Unfortunately getting it is a little involved, with some use of the source control program git. I’ll walk through the steps below.
Pulling the code requires a default email address, which you can set to anything, since we’re not planning on pushing any changes back.
git config --global user.email "you@example.com"
git config --global user.name "Your Name"
Now you should be able to pull the latest source.
cd /tensorflow/
git pull origin master
You’ll find yourself in a vim window. Just type ‘:quit’ to exit.
You should now have fully up-to-date code. We want to sync it to a version we know works though, so we’ll run this command:
git checkout 6d46c0b370836698a3195a6d73398f15fa44bcb2
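If you want to confirm the checkout worked, you can ask git which commit you’re now on (run this inside /tensorflow; the fallback just keeps the command harmless if you’re not inside a git checkout):

```shell
# Print the current commit hash; it should match the one checked out above.
git rev-parse HEAD 2>/dev/null || echo "not inside a git checkout"
```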
Building the Code
If that worked, the next step is to compile the code. You may notice there are some optimization flags in the command that help speed it up on processors with AVX, which almost all modern OS X machines have.
cd /tensorflow/
bazel build -c opt --copt=-mavx tensorflow/examples/image_retraining:retrain
This part can take five to ten minutes, depending on the speed of your machine, as it’s compiling the full source code for TensorFlow. Don’t worry if you see a lot of warnings, this is normal (though we’re working on reducing them going forward).
Running the Code
I can now run the retraining process using this command:
bazel-bin/tensorflow/examples/image_retraining/retrain \
  --bottleneck_dir=/tf_files/bottlenecks \
  --model_dir=/tf_files/inception \
  --output_graph=/tf_files/retrained_graph.pb \
  --output_labels=/tf_files/retrained_labels.txt \
  --image_dir /tf_files/flower_photos
You’ll see a message about downloading the Inception model, and then a long series of messages about creating bottlenecks. There are around 3,700 photos in total to process, and my machine does around 200 a minute, so it takes around twenty minutes in total. If you want to know more about what’s happening under the hood while you wait, you can check out the tutorial for a detailed explanation.
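If you’d like a rough estimate for your own dataset, you can count the photos and divide by the throughput. This sketch assumes the flower_photos folder from the download step, and the ~200 photos/minute figure from my machine; it just prints zero if the folder isn’t there:

```shell
# Count the training photos and estimate the bottleneck-caching time,
# assuming roughly 200 photos processed per minute.
photo_count=$(find /tf_files/flower_photos -name '*.jpg' 2>/dev/null | wc -l)
echo "$photo_count photos, roughly $((photo_count / 200)) minutes of bottleneck caching"
```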
I’ve changed the default /tmp destination for things like the output graph and cached bottlenecks to the shared /tf_files folder, so that the results will be accessible from OS X and will be retained between different runs of the virtual machine.
Once the bottlenecks are cached, it will then go into the training process, which takes another five minutes or so on my laptop. At the end, you should see the last output line giving the final estimated accuracy, which should be around 90%. That means you’ve trained your classifier to guess the right flower species nine times out of ten when shown a photo!
Using the Classifier
The training process outputs the retrained graph into /tf_files/retrained_graph.pb, and to test it out yourself you can build another piece of sample code. The label_image example is a small C++ program that loads in a graph and applies it to a user-supplied image. Give it a try like this:
bazel build tensorflow/examples/label_image:label_image && \
bazel-bin/tensorflow/examples/label_image/label_image \
  --graph=/tf_files/retrained_graph.pb \
  --labels=/tf_files/retrained_labels.txt \
  --output_layer=final_result \
  --image=/tf_files/flower_photos/daisy/21652746_cc379e0eea_m.jpg
You should see a result showing that it identified a daisy in that picture, though because the training process is random you may occasionally have a model that makes a mistake on the image. Try it with some of the other photos to get a feel for how it’s doing.
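To try a whole folder at once, you can loop the same command over every photo in a category. This sketch is written as a dry run that just prints each command (drop the leading echo to actually execute), assuming the paths from the steps above:

```shell
# Dry run: print a label_image command for each daisy photo.
# Remove the leading 'echo' to run the classifier for real.
for image in /tf_files/flower_photos/daisy/*.jpg; do
  echo bazel-bin/tensorflow/examples/label_image/label_image \
    --graph=/tf_files/retrained_graph.pb \
    --labels=/tf_files/retrained_labels.txt \
    --output_layer=final_result \
    --image="$image"
done
```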
Next Steps
The first thing you’ll probably want to do is train a classifier for objects you care about in your application. This should be as simple as creating a new folder in your tf_files directory, putting subfolders full of photos in it, and re-running the classifier commands. You can find more detailed advice on tuning that process in the tutorial.
Finally, you’ll want to use this in your own application! The label_image example is a good template to look at if you can integrate C++ into your product, and we even support running on mobile, so check out the Android sample code if you’d like to run on a smartphone.
Thanks for working through this process with me, I hope it’s inspired you to think about how you can use deep learning to help your users, and I can’t wait to see what you build!
I found that I needed to run the following commands (from [1]) in order to build the Android example linked to.
sudo dpkg --add-architecture i386
sudo apt-get -qqy update
sudo apt-get -qqy install libncurses5:i386 libstdc++6:i386 zlib1g:i386
[1] https://github.com/bazelbuild/bazel/issues/392
Pete –
This is nice. Here is a link I gave out at OpenAI to a ready-to-go Ubuntu VM with TensorFlow and the GUI loaded up and running. It is probably simpler for some folks than what you have here.
http://goo.gl/forms/yTZYGv72AI
Enjoy,
Dave W
This is really an interesting topic, thanks for sharing. I know this classifies images, but what about searching for similar images? Is TensorFlow able to search for similar images?
It’s not perfect, but you can use the bottleneck output as an embedding vector, and use the distance between two bottlenecks as a rough measure of the images’ semantic similarity.
Prior to running code – ignore the warnings
Getting a successful build – then trying to run code (retraining process)
Getting error below – timeout at container level where the docker file is stored on the cloud?
bash: bazel-bin/tensorflow/examples/image_retraining/retrain: No such file or directory
Have a cloud account short on days – so using this method (thanks).
Sorry you’re hitting problems! Can you make sure you’re in the correct folder (you should be in /tensorflow). If you are, can you file a bug on https://github.com/tensorflow/tensorflow and mention @petewarden so I see it? Thanks.
Very interesting blog indeed. Thanks a lot.
I ran into problems when I tried to run the command
bazel-bin/tensorflow/examples/image_retraining/retrain \ (and the following lines with the options).
The terminal tells me that this is an “Illegal instruction”.
Any comment on what I did wrong?
many thanks
Peter
Sorry you’re running into problems! I believe you should be able to remove the ‘--copt=-mavx’ part from the build command you ran just before this. This will run things more slowly, but should work on a wider range of machines.
Thank you so much for doing these kinds of tutorials. I’m a developer and when I experiment with new technologies I appreciate someone putting in effort for it to just work. Hope this is the second of many more posts like it!
My local docker setup is a little different, and I was getting a lot of this kind of error:
INFO: From Compiling tensorflow/core/kernels/cwise_op_mul.cc:
gcc: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See for instructions.
ERROR: /tensorflow/tensorflow/core/BUILD:358:1: C++ compilation of rule '//tensorflow/core:kernel_lib' failed: gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections … (remaining 79 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 4.
Target //tensorflow/examples/image_retraining:retrain failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 2925.538s, Critical Path: 2891.84s
They were different each time, with the commonality being gcc exiting with status 4. I didn’t increase the memory of the VM, however I was able to build successfully with this command: bazel build -c opt --local_resources 2048,.5,1.0 tensorflow/examples/image_retraining:retrain
Thought I’d leave that for future readers with the same problem.
While using the docker image for Tensorflow on Mac, I found that the size of the virtual machine grew quite quickly, from about 3 GB to about 6 GB in a few days, while what I did in Tensorflow was just to run some tutorials within a Jupyter notebook. I don’t understand why the virtual machine size grows so quickly.
Thank you for your tutorial, first of all. I got an error during training on my own images.
Category has no images – validation
Is it because I don’t have enough images? Or should each label folder have the same number of images?
I am getting the following error after downloading the tensor flow docker image.
➜ tensorflow git:(master) docker run -it b.gcr.io/tensorflow/tensorflow:0.7.1-devel
Unable to find image 'b.gcr.io/tensorflow/tensorflow:0.7.1-devel' locally
0.7.1-devel: Pulling from tensorflow/tensorflow
a64038a0eeaa: Extracting 65.69 MB/65.69 MB
2ec6e7edf8a8: Download complete
0a5fb6c3c94b: Download complete
a3ed95caeb02: Download complete
ed54cceb1f9b: Retrying in 1 second
219efdf5ea33: Download complete
147d151c1430: Download complete
48862cd9a16f: Download complete
aa5acc667d66: Download complete
348cc777900e: Retrying in 1 second
0872ed622c8c: Download complete
a718d6248904: Download complete
2b705ad62ec4: Downloading 178.1 MB/178.1 MB
1f90a0184aa3: Waiting
00278cb3e438: Waiting
docker: write /mnt/sda1/var/lib/docker/tmp/GetImageBlob883289376: read-only file system.
See 'docker run --help'.
Thank you for the tutorial. However, I would like to know how to classify more than one image at a time, please.
Thanks for the tutorial
I have successfully retrained the Inception model on my own data and it’s working perfectly.
I need to know whether there is any way to get the bounding region, its coordinates, or a heat map of the object.
I am using it for ALPR.
Hi! I have successfully retrained the Inception model on my own data too, but I can’t test it, because the original label_image.py doesn’t work with the Inception model. Could you share your version of label_image.py for the Inception model?
Many thanks!
Hi Pete,
Thanks for the tutorial. I’m curious what steps I should take to improve this classifier. I am building an image recognition system as my bachelor project and would like to improve the model without creating a whole neural network, as I don’t have a workstation for training a deep neural network.
Hello and thanks for the tutorial, it was very helpful.
Does it make sense to train or test with higher resolution images, for better results ?
If so, what should I change to make it happen ?
Thanks,
lef
Thanks very much for the tutorial, I’ve successfully run the retraining natively in OS X, without Docker.
Once the model is retrained is it correct that it is a new final softmax layer on the graph, and the previous layers are unchanged? If so is there a method to merge the newly created output layer with the original output layer to simply add the newly trained categories to the model?
There isn’t, but if you look at the graph I think you may find the old output layer is still there, and you can just query both of them when you fetch your outputs.
Great tutorial! It was very informative.
Do you know of any tutorials that use both images and additional data? For example, I want to train a model that uses both flower image and growing region as my input. (ie, the North Sunflower and Southern Sunflower look the same, but North Sunflower only grows in Region 1 and Southern Sunflower only grows in Region 3).
All of the examples I have found train only on images. Do you know of any examples that use image + other data?
thanks again
Hey, I want to use the stochastic gradient descent algorithm as a learning method. How should I proceed? Should I modify the retrain.py file?
Hi Pete. Thanks a lot for this tutorial. Everything is working fine!
Now I’m trying to use tensorflow serving to give me the predictions. I saw all the tutorials on the official site and nothing works for this example.
Do you know how to generate a correct tensorflow serving graph from your example?
if you have some codes example or something like that….. it will be very nice!
Thanks a lot!
I have trained Inception using the instructions, but now I don’t know how to access it again!
Do I need to retrain the model from scratch every time?
If do_distort_images is False, will the model rescale the input images to 299×299? Do I need to normalize the pixel values of the input images to [0, 1) or [-1, 1)?
Hi,
I am a beginner. I followed this very good tutorial, which does exactly what I want, and
I obtained the file “retrained_graph.pb”.
My final project is to use an inception V3 Core ML Model file in Xcode.
Tensorflow .pb is not recognized but .h5 Keras model is. The final conversion to Core Ml Model is done with CoreMlTools.
I do not see how to translate the .pb file directly into a .h5 Keras model.
All the solutions lead me to redo a training phase with Keras instead of tensorflow.
What is the solution ?
I really like this blog. Thanks a lot.
I got a problem when I run the program.
When I try to run the following command in my terminal:
bazel build -c opt --copt=-mavx tensorflow/examples/image_retraining:retrain
the terminal gives me this error:
no such package ‘tensorflow/examples/tensorflow/examples/image_retraining’: BUILD file not found on package path.
How should I fix this?
Thanks,
Di
How can I do these things on Linux, with TensorFlow pre-installed in an Anaconda env?
I got an error when trying to use your NN in TensorFlow Serving: Loading servable: {name: inception version: 1} failed: Not found: Could not find meta graph def matching supplied tags: { serve }.
When I try to use saved_model_cli.py to explore the NN, I get empty tag-sets.
(tensorflow) E:\KERAS\CLASSIFIER_1\tensorflow-for-poets-2\tf_files>python saved_model_cli.py show --dir E:\KERAS\CLASSIFIER_1\tensorflow-for-poets-2\tf_files\saved_model
The given SavedModel contains the following tag-sets:
Need help,
please…