Cross-compiling TensorFlow for the Raspberry Pi

Photo by oatsy40

I love the Raspberry Pi because it’s such a great platform for software to interact with the physical world. TensorFlow makes it possible to turn messy, chaotic sensor data from cameras and microphones into useful information, so running models on the Pi has enabled some fascinating applications, from predicting train times and sorting trash to helping robots see and even avoiding traffic tickets!

It’s never been easy to get TensorFlow installed on a Pi though. I had created a makefile script that let you build the C++ part from scratch, but it took several hours to complete and didn’t support Python. Sam Abrahams, an external contributor, did an amazing job maintaining a Python pip wheel for major releases, but building it required you to add swap space on a USB device for your Pi, and took even longer to compile than the makefile approach. Snips managed to get TensorFlow cross-compiling for Rust, but it wasn’t clear how to apply this to other languages.

Plenty of people on the team are Pi enthusiasts, and happily Eugene Brevdo dived in to investigate how we could improve the situation. We knew we wanted something that could run as part of TensorFlow’s Jenkins continuous integration system, which meant building a completely automatic solution that would run with no user intervention. Since having a Pi plugged into a machine to run something like the makefile build would be hard to maintain, we did try using a hosted server from Mythic Beasts. Eugene got the makefile build going after a few hiccups, but the Python version required more RAM than was available, and we couldn’t plug in a USB drive remotely!

Cross compiling, building on an x86 Linux machine but targeting the Pi, looked a lot more maintainable, but also more complex. Thankfully we had the Snips example to give us some pointers, a kindly stranger had provided a solution to a crash that blocked me last time I tried it, and Eugene managed to get an initial version working.

I was able to take his work, abstract it into a Docker container for full reproducibility, and now we have nightly builds running as part of our main Jenkins project. If you just want to try it out for Python 2.7, run:

sudo apt-get install libblas-dev liblapack-dev python-dev \
libatlas-base-dev gfortran python-setuptools
sudo pip2 install \
http://ci.tensorflow.org/view/Nightly/job/nightly-pi/lastSuccessfulBuild/artifact/output-artifacts/tensorflow-1.4.0-cp27-none-any.whl

This can take quite a while to complete, largely because the SciPy compilation seems to be extremely slow. Once it’s done, you’ll be able to run TensorFlow in Python 2. If you get an error about the .whl file not being found at that URL, the version number may have changed. To find the correct name, go to http://ci.tensorflow.org/view/Nightly/job/nightly-pi/lastSuccessfulBuild/artifact/output-artifacts/ and you should see the new version listed.
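Once the install finishes, a quick sanity check (not part of the official instructions, just a simple way to confirm the package imports cleanly) is to print the version from the command line:

python2 -c "import tensorflow as tf; print(tf.__version__)"

If everything worked, this should print the nightly version number, such as 1.4.0.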

For Python 3.4 support, you’ll need to use a different wheel and pip instead of pip2, like this:

sudo apt-get install libblas-dev liblapack-dev python-dev \
 libatlas-base-dev gfortran python-setuptools
sudo pip install \
 http://ci.tensorflow.org/view/Nightly/job/nightly-pi-python3/lastSuccessfulBuild/artifact/output-artifacts/tensorflow-1.4.0-cp34-none-any.whl

If you’re running Python 3.5, you can use the same wheel but with a slight change to the file name, since that encodes the version. You will see a couple of warnings every time you import tensorflow, but it should work correctly.

sudo apt-get install libblas-dev liblapack-dev python-dev \
 libatlas-base-dev gfortran python-setuptools
curl -O http://ci.tensorflow.org/view/Nightly/job/nightly-pi-python3/lastSuccessfulBuild/artifact/output-artifacts/tensorflow-1.4.0-cp34-none-any.whl
mv tensorflow-1.4.0-cp34-none-any.whl tensorflow-1.4.0-cp35-none-any.whl
sudo pip install tensorflow-1.4.0-cp35-none-any.whl

If you have a Pi Zero or One that you want to use TensorFlow on, you’ll need an alternative wheel that doesn’t include NEON instructions. It’s a lot slower than the wheel above, which is optimized for the Pi Two and later, so I don’t recommend using it on newer devices. Here are the commands for Python 2.7:

sudo apt-get install libblas-dev liblapack-dev python-dev \
libatlas-base-dev gfortran python-setuptools
sudo pip2 install \
http://ci.tensorflow.org/view/Nightly/job/nightly-pi-zero/lastSuccessfulBuild/artifact/output-artifacts/tensorflow-1.4.0rc1-cp27-none-any.whl

Here is the Python 3.4 version for the Pi Zero:

sudo apt-get install libblas-dev liblapack-dev python-dev \
 libatlas-base-dev gfortran python-setuptools 
sudo pip install \
 http://ci.tensorflow.org/view/Nightly/job/nightly-pi-zero-python3/lastSuccessfulBuild/artifact/output-artifacts/tensorflow-1.4.0-cp34-none-any.whl

And here are the Python 3.5 instructions:

sudo apt-get install libblas-dev liblapack-dev python-dev \
 libatlas-base-dev gfortran python-setuptools
curl -O http://ci.tensorflow.org/view/Nightly/job/nightly-pi-zero-python3/lastSuccessfulBuild/artifact/output-artifacts/tensorflow-1.4.0-cp34-none-any.whl
mv tensorflow-1.4.0-cp34-none-any.whl tensorflow-1.4.0-cp35-none-any.whl
sudo pip install tensorflow-1.4.0-cp35-none-any.whl

I’ve found the scipy compilation on Pi Zeros/Ones is so slow (many hours) that it’s infeasible to wait for it to complete. Instead, I’ve found myself pressing Control-C to cancel when it’s in the middle of a scipy-related compile step, and then re-running the install with the ‘--no-deps’ flag so pip skips building dependencies. This is extremely hacky, but since scipy is only needed for testing purposes you should end up with a workable copy of TensorFlow, provided all the other dependencies completed.
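As a rough sketch of that workaround on a Pi Zero with Python 2.7 (using the wheel URL from above; the exact file name may have changed by the time you try it):

# Start the install, then press Control-C when it gets stuck in a scipy build step.
sudo pip2 install \
http://ci.tensorflow.org/view/Nightly/job/nightly-pi-zero/lastSuccessfulBuild/artifact/output-artifacts/tensorflow-1.4.0rc1-cp27-none-any.whl
# Re-run with --no-deps so pip installs the TensorFlow wheel without rebuilding its dependencies.
sudo pip2 install --no-deps \
http://ci.tensorflow.org/view/Nightly/job/nightly-pi-zero/lastSuccessfulBuild/artifact/output-artifacts/tensorflow-1.4.0rc1-cp27-none-any.whl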

If you want to build your own copy of the wheels, you can run this line from within the TensorFlow source root on a Linux machine with Docker installed to build for the Pi Two or Three with Python 2.7:

tensorflow/tools/ci_build/ci_build.sh PI tensorflow/tools/ci_build/pi/build_raspberry_pi.sh

For Python 3.4:

CI_DOCKER_EXTRA_PARAMS="-e CI_BUILD_PYTHON=python3 -e CROSSTOOL_PYTHON_INCLUDE_PATH=/usr/include/python3.4" tensorflow/tools/ci_build/ci_build.sh PI-PYTHON3 tensorflow/tools/ci_build/pi/build_raspberry_pi.sh

For Python 2.7 on the Pi Zero:

tensorflow/tools/ci_build/ci_build.sh PI tensorflow/tools/ci_build/pi/build_raspberry_pi.sh PI_ONE

For Python 3.4 on the Pi Zero:

CI_DOCKER_EXTRA_PARAMS="-e CI_BUILD_PYTHON=python3 -e CROSSTOOL_PYTHON_INCLUDE_PATH=/usr/include/python3.4" tensorflow/tools/ci_build/ci_build.sh PI-PYTHON3 tensorflow/tools/ci_build/pi/build_raspberry_pi.sh PI_ONE

(Note: the Docker files are currently broken because they were upgraded to use Ubuntu 16.04, and the Python cross toolchain fails to install on that version. There should be a fix visible in TensorFlow’s GitHub within the next few days, but for now you can locally change Dockerfile.pi, etc., to use 14.04 instead.)
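If you want to apply that local workaround before the fix lands, something along these lines should work, assuming the Dockerfile declares its base image as ubuntu:16.04 (check the file first, since the exact contents may differ, and repeat for the Python 3 Dockerfile if you’re using that):

# Swap the base image back to 14.04 so the Python cross toolchain installs cleanly.
sed -i 's/ubuntu:16.04/ubuntu:14.04/' tensorflow/tools/ci_build/Dockerfile.pi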

This is all still experimental, so please do file bugs with feedback if these don’t work for you. I’m hoping we will be able to provide official stable Pi binaries for each major release in the future, like we do for Android and iOS, so knowing how well things are working is important to me. I’m also always excited to hear about cool new applications you find for TensorFlow on the Pi, so do let me know what you build too!

21 responses

  1. Pingback: tensor flow learning notes | Electronics DIY

  2. Pingback: TensorFlow On The Pi – Curated SQL

  3. Pingback: Data Science Weekly – Issue 196 | A bunch of data

  4. Installing the latest wheel from ci.tensorflow.org on an RPi 3 doesn’t seem to involve any SciPy. Most of the time seems to be spent downloading and building NumPy. Total install time was around 30 min.

  5. Hi, thanks for the great article and for posting your solution to building TensorFlow for the RPi. I am trying to tweak this solution to cross-compile for arm64 (aarch64) + Linux (from an x86_64 server running Ubuntu 16.04) and have been struggling for a few days to get it to work. The Docker container gets built and pip.sh creates new .whl packages, but they are for x86_64 (filenames: tensorflow-1.3.0-cp27-none-linux_x86_64.whl, tensorflow-1.3.0-cp27-cp27mu-manylinux1_x86_64.whl, tensorflow-1.3.0-cp27-cp27mu-linux_x86_64.whl).

    Here’s a quick summary of what I’ve tried (all in the ci_build folder):
    1) created install/install_aarch64_toolchain.sh and changed all references to armhf -> arm64

    2) replaced Dockerfile.pi with Dockerfile.aarch64, and
    a) changed all references to armhf -> arm64
    b) replaced reference to install_pi_toolchain.sh with install_aarch64_toolchain.sh

    (and of course, I’m passing Dockerfile.aarch64 as a command line parameter to ci_build.sh, and am building a cpu Docker container)

    3) created ./aarch64/build_aarch64.sh and
    a) pointed CROSSTOOL_CC at a local installation of aarch64-linux-gnu-gcc-4.9 instead of downloading and building the arm-rpi compiler:
    CROSSTOOL_CC=/usr/bin/aarch64-linux-gnu-gcc
    b) changed TARGET in the first make statement to ARMV8
    c) hard-coded PI_COPTS for Neon on the arm64:
    PI_COPTS='--copt=-march=arm64 --copt=-mfpu=neon-vfpv4
    --copt=-U__GCC_HAVE_SYNC_COMPARE_AND_SWAP_1
    --copt=-U__GCC_HAVE_SYNC_COMPARE_AND_SWAP_2
    --copt=-U__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8'
    echo "Building for the AArch64, with NEON acceleration"
    d) changed the bazel build line to suit the arm64 target:
    bazel build -c opt ${PI_COPTS} \
    --config=monolithic \
    --copt=-funsafe-math-optimizations --copt=-ftree-vectorize \
    --copt=-fomit-frame-pointer --fat_apk_cpu=arm64-v8a \
    --compiler aarch64-linux-gnu-gcc \
    --verbose_failures \
    //tensorflow/tools/benchmark:benchmark_model \
    //tensorflow/tools/pip_package:build_pip_package

    Aside: I also commented out all pip3 package installs, as I only need Python 2.7 tensorflow for now

    Inspecting the built Docker container, I find no aarch64 gcc compilers in /usr/bin, just a bunch of x86_64-linux-gnu-gcc related ones, so I’m suspecting this is the issue but haven’t yet been able to track down why. I had no experience with bazel before this, and I suspect the issue lies somewhere there. I found this /tensorflow/third_party/toolchains/cpus/arm/CROSSTOOL.tpl file which has some lines about the toolchain, but I don’t know if/how this is referenced in the ci_build. Any help or suggestions you can give would be really appreciated! Thanks again for putting this together. I think including an arm64 nightly build would be awesome 😉

  6. Well, I finally got it working. I had no idea what I didn’t know about Bazel… so I spent a good chunk of time reading up on that tool. Thanks again for putting together the RPi build and writing it up!

    • Hi Michael,
      Any details about how you got it working? I’m also looking at trying the same thing, and notice in `build_raspberry_pi.sh` there is a download for `https://github.com/raspberrypi/tools/archive/0e906ebc527eab1cdbf7adabff5b474da9562e9f`, downloading what seem to be cross-compile tools for the Raspberry Pi.
      Did you find a generic equivalent for arm64? Or is this compatible?
      Thanks
      D

  7. Hi, after I got it installed, how can I run a model using it? I’m having some problems using it. I can build any of the pi_examples.

  8. Hi Pete,

    Thanks for setting this up!
    Unfortunately I am having an issue with your nightly package (the latest Jenkins successful build #111) in combination with the TensorFlow speech commands example.
    To make sure it is not some installation problem of mine, I tried from scratch from a clean Raspbian Stretch Lite image for the Pi. Steps followed (on a Pi 3):

    – installed the packages you listed for Python 2.7
    – also installed virtualenv
    – downloaded said wheel (build #111)
    – made a virtualenv and activated it
    – pip install the .whl

    To get the speech example code, I then grabbed the 1.4.0 source release of TensorFlow from GitHub and threw away all except the examples folder.

    Now, if I run "python tensorflow/examples/speech_commands/label_wav.py", I get the following errors:

    python examples/speech_commands/label_wav.py
    Traceback (most recent call last):
    File "examples/speech_commands/label_wav.py", line 40, in
    from tensorflow.contrib.framework.python.ops import audio_ops as contrib_audio
    File "/home/pi/venv/tensorflow/local/lib/python2.7/site-packages/tensorflow/contrib/__init__.py", line 81, in
    from tensorflow.contrib.eager.python import tfe as eager
    File "/home/pi/venv/tensorflow/local/lib/python2.7/site-packages/tensorflow/contrib/eager/python/tfe.py", line 75, in
    from tensorflow.contrib.eager.python.datasets import Iterator
    File "/home/pi/venv/tensorflow/local/lib/python2.7/site-packages/tensorflow/contrib/eager/python/datasets.py", line 23, in
    from tensorflow.contrib.data.python.ops import prefetching_ops
    File "/home/pi/venv/tensorflow/local/lib/python2.7/site-packages/tensorflow/contrib/data/python/ops/prefetching_ops.py", line 25, in
    resource_loader.get_path_to_datafile("../../_prefetching_ops.so"))
    File "/home/pi/venv/tensorflow/local/lib/python2.7/site-packages/tensorflow/contrib/util/loader.py", line 55, in load_op_library
    ret = load_library.load_op_library(path)
    File "/home/pi/venv/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
    File "/home/pi/venv/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
    tensorflow.python.framework.errors_impl.NotFoundError: /home/pi/venv/tensorflow/local/lib/python2.7/site-packages/tensorflow/contrib/data/python/ops/../../_prefetching_ops.so: undefined symbol: _ZN6google8protobuf8internal26fixed_address_empty_stringE

    Some Googling yields that this could have something to do with a mismatch between the GCC versions used to build Tensorflow and Protobuf, but since I have compiled neither of them myself it seems I cannot fix this. Do you have any additional environment set up on the Pi on which the nightly is tested?

    Thanks a bunch!

    • Sorry you’re hitting this issue! I took a look and something funky is happening with the contrib imports. I’ve been able to work around this by replacing the line:

      from tensorflow.contrib.framework.python.ops import audio_ops as contrib_audio

      with:

      from tensorflow.python.ops.gen_audio_ops import *

      Let me know if that helps.

      • I have the same issue. Installing TensorFlow is so frustrating. I have spent 4 days straight and all I end up with is obscure errors.

  9. Got the solution. If you install tensorflow-1.4.0rc1-cp27-none-any.whl from nightly-pi, the audio part is supported in that and is working fine 🙂

  10. Pingback: How to build Image classifier Robot using Raspberry PI, with DeepLearning – Royyak AI

  11. Pingback: How to build Image classifier Robot using Raspberry Pi, with Deep Learning - Royyak AI

  12. Thanks for this. What is the status of this for future versions of TF? You mentioned “I’m hoping we will be able to provide official stable Pi binaries for each major release in the future” – is that closer to happening? Thanks,
    John
