Cross-compiling TensorFlow for the Raspberry Pi

Photo by oatsy40

I love the Raspberry Pi because it’s such a great platform for software to interact with the physical world. TensorFlow makes it possible to turn messy, chaotic sensor data from cameras and microphones into useful information, so running models on the Pi has enabled some fascinating applications, from predicting train times and sorting trash to helping robots see, and even avoiding traffic tickets!

It’s never been easy to get TensorFlow installed on a Pi though. I had created a makefile script that let you build the C++ part from scratch, but it took several hours to complete and didn’t support Python. Sam Abrahams, an external contributor, did an amazing job maintaining a Python pip wheel for major releases, but building it required you to add swap space on a USB device for your Pi, and took even longer to compile than the makefile approach. Snips managed to get TensorFlow cross-compiling for Rust, but it wasn’t clear how to apply this to other languages.

Plenty of people on the team are Pi enthusiasts, and happily Eugene Brevdo dived in to investigate how we could improve the situation. We knew we wanted to have something that could be run as part of TensorFlow’s Jenkins continuous integration system, which meant building a completely automatic solution that would run with no user intervention. Since having a Pi plugged into a machine to run something like the makefile build would be hard to maintain, we did try using a hosted server from Mythic Beasts. Eugene got the makefile build going after a few hiccups, but the Python version required more RAM than was available, and we couldn’t plug in a USB drive remotely!

Cross-compiling, that is, building on an x86 Linux machine while targeting the Pi, looked a lot more maintainable, but also more complex. Thankfully we had the Snips example to give us some pointers, a kind stranger had provided a solution to a crash that blocked me last time I tried it, and Eugene managed to get an initial version working.

I was able to take his work, abstract it into a Docker container for full reproducibility, and now we have nightly builds running as part of our main Jenkins project. If you just want to try it out for Python 2.7, run:

sudo apt-get install libblas-dev liblapack-dev python-dev \
 libatlas-base-dev gfortran python-setuptools
sudo pip2 install \
 http://ci.tensorflow.org/view/Nightly/job/nightly-pi/lastSuccessfulBuild/artifact/output-artifacts/tensorflow-1.4.0-cp27-none-any.whl

This can take quite a while to complete, largely because the SciPy compilation is extremely slow. Once it’s done, you’ll be able to run TensorFlow in Python 2. If you get an error about the .whl file not being found at that URL, the version number may have changed. To find the correct name, go to http://ci.tensorflow.org/view/Nightly/job/nightly-pi/lastSuccessfulBuild/artifact/output-artifacts/ and you should see the new version listed.
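Incidentally, the wheel filename follows the standard Python wheel naming scheme (PEP 427), so the TensorFlow version and the Python it targets can be read straight out of the name. Here’s a minimal sketch of splitting one apart; the filename is just an example string, so substitute whatever the CI server currently lists:

```shell
# Wheel names follow {name}-{version}-{python tag}-{abi tag}-{platform}.whl
wheel="tensorflow-1.4.0-cp27-none-any.whl"
# Strip the .whl suffix and split the remainder on hyphens.
IFS=- read -r name version pytag abitag platform <<< "${wheel%.whl}"
echo "package: $name, version: $version, python: $pytag"
# prints "package: tensorflow, version: 1.4.0, python: cp27"
```

The `cp27` tag is why the wheel only installs under Python 2.7, and it’s the part that changes between the Python 2, 3.4, and 3.5 instructions below.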

For Python 3.4 support, you’ll need to use a different wheel and pip instead of pip2, like this:

sudo apt-get install libblas-dev liblapack-dev python-dev \
 libatlas-base-dev gfortran python-setuptools
sudo pip install \
 http://ci.tensorflow.org/view/Nightly/job/nightly-pi-python3/lastSuccessfulBuild/artifact/output-artifacts/tensorflow-1.4.0-cp34-none-any.whl

If you’re running Python 3.5, you can use the same wheel with a slight change to the file name, since that encodes the Python version and pip refuses to install a wheel whose tag doesn’t match the running interpreter. You will see a couple of warnings every time you import tensorflow, but it should work correctly.

sudo apt-get install libblas-dev liblapack-dev python-dev \
 libatlas-base-dev gfortran python-setuptools
curl -O http://ci.tensorflow.org/view/Nightly/job/nightly-pi-python3/lastSuccessfulBuild/artifact/output-artifacts/tensorflow-1.4.0-cp34-none-any.whl
mv tensorflow-1.4.0-cp34-none-any.whl tensorflow-1.4.0-cp35-none-any.whl
sudo pip install tensorflow-1.4.0-cp35-none-any.whl

If you have a Pi Zero or One that you want to use TensorFlow on, you’ll need to use an alternative wheel that doesn’t include NEON instructions. This is a lot slower than the one above that’s optimized for the Pi Two and above, so I don’t recommend you use it on newer devices. Here are the commands for Python 2.7:

sudo apt-get install libblas-dev liblapack-dev python-dev \
 libatlas-base-dev gfortran python-setuptools
sudo pip2 install \
 http://ci.tensorflow.org/view/Nightly/job/nightly-pi-zero/lastSuccessfulBuild/artifact/output-artifacts/tensorflow-1.4.0rc1-cp27-none-any.whl

Here is the Python 3.4 version for the Pi Zero:

sudo apt-get install libblas-dev liblapack-dev python-dev \
 libatlas-base-dev gfortran python-setuptools
sudo pip install \
 http://ci.tensorflow.org/view/Nightly/job/nightly-pi-zero-python3/lastSuccessfulBuild/artifact/output-artifacts/tensorflow-1.4.0-cp34-none-any.whl

And here are the Python 3.5 instructions:

sudo apt-get install libblas-dev liblapack-dev python-dev \
 libatlas-base-dev gfortran python-setuptools
curl -O http://ci.tensorflow.org/view/Nightly/job/nightly-pi-zero-python3/lastSuccessfulBuild/artifact/output-artifacts/tensorflow-1.4.0-cp34-none-any.whl
mv tensorflow-1.4.0-cp34-none-any.whl tensorflow-1.4.0-cp35-none-any.whl
sudo pip install tensorflow-1.4.0-cp35-none-any.whl

I’ve found the SciPy compilation on Pi Zeros and Ones is so slow (many hours) that it’s not feasible to wait for it to complete. Instead, I’ve found myself pressing Ctrl-C to cancel when it’s in the middle of a SciPy-related compile step, and then re-running the install with the --no-deps flag to skip building dependencies. This is extremely hacky, but since SciPy is only needed for testing purposes you should have a workable copy of TensorFlow at the end, provided all the other dependencies completed.
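As a sketch, that workaround looks like the following; the wheel name here is the Python 2.7 Pi Zero one from above, so substitute whichever wheel you actually downloaded:

```shell
# First run: pip starts resolving and building dependencies.
# Press Ctrl-C while it's in the middle of the slow SciPy compile.
sudo pip2 install tensorflow-1.4.0rc1-cp27-none-any.whl
# Second run: --no-deps tells pip to skip dependency resolution
# entirely, so only the TensorFlow wheel itself is installed.
sudo pip2 install --no-deps tensorflow-1.4.0rc1-cp27-none-any.whl
```

Any dependencies that did finish installing before the interrupt (NumPy in particular) will still be in place, which is what makes the result usable.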

If you want to build your own copy of the wheels, you can run this line from within the TensorFlow source root on a Linux machine with Docker installed to build for the Pi Two or Three with Python 2.7:

tensorflow/tools/ci_build/ci_build.sh PI tensorflow/tools/ci_build/pi/build_raspberry_pi.sh

For Python 3.4:

CI_DOCKER_EXTRA_PARAMS="-e CI_BUILD_PYTHON=python3 -e CROSSTOOL_PYTHON_INCLUDE_PATH=/usr/include/python3.4" tensorflow/tools/ci_build/ci_build.sh PI-PYTHON3 tensorflow/tools/ci_build/pi/build_raspberry_pi.sh

For Python 2.7 on the Pi Zero:

tensorflow/tools/ci_build/ci_build.sh PI tensorflow/tools/ci_build/pi/build_raspberry_pi.sh PI_ONE

For Python 3.4 on the Pi Zero:

CI_DOCKER_EXTRA_PARAMS="-e CI_BUILD_PYTHON=python3 -e CROSSTOOL_PYTHON_INCLUDE_PATH=/usr/include/python3.4" tensorflow/tools/ci_build/ci_build.sh PI-PYTHON3 tensorflow/tools/ci_build/pi/build_raspberry_pi.sh PI_ONE

This is all still experimental, so please do file bugs with feedback if these don’t work for you. I’m hoping we will be able to provide official stable Pi binaries for each major release in the future, like we do for Android and iOS, so knowing how well things are working is important to me. I’m also always excited to hear about cool new applications you find for TensorFlow on the Pi, so do let me know what you build too!

10 responses

  4. Installing the latest wheel from ci.tensorflow.org on an RPi 3 doesn’t seem to involve any SciPy. Most of the time seems to be spent downloading and building NumPy. Total install time was around 30 min.

  5. Hi, thanks for the great article and for posting your solution to building tensorflow for the RPi. I am trying to tweak this solution to cross-compile for arm64 (aarch64) + Linux (from a x86_64 server running Ubuntu 16.04) and have been struggling for a few days to get it to work. The Docker container gets built and pip.sh creates new .whl packages, but they are for x86_64 (filenames: tensorflow-1.3.0-cp27-none-linux_x86_64.whl, tensorflow-1.3.0-cp27-cp27mu-manylinux1_x86_64.whl, tensorflow-1.3.0-cp27-cp27mu-linux_x86_64.whl)

    Here’s a quick summary of what I’ve tried (all in the ci_build folder):
    1) created install/install_aarch64_toolchain.sh and changed all references to armhf -> arm64

    2) replaced Dockerfile.pi with Dockerfile.aarch64, and
    a) changed all references to armhf -> arm64
    b) replaced reference to install_pi_toolchain.sh with install_aarch64_toolchain.sh

    (and of course, I’m passing Dockerfile.aarch64 as a command line parameter to ci_build.sh, and am building a cpu Docker container)

    3) created ./aarch64/build_aarch64.sh and
    a) pointed CROSSTOOL_CC at a local installation of aarch64-linux-gnu-gcc-4.9 instead of downloading and building the arm-rpi compiler:
    CROSSTOOL_CC=/usr/bin/aarch64-linux-gnu-gcc
    b) changed TARGET in the first make statement to ARMV8
    c) hard-coded PI_COPTS for Neon on the arm64:
    PI_COPTS='--copt=-march=arm64 --copt=-mfpu=neon-vfpv4
    --copt=-U__GCC_HAVE_SYNC_COMPARE_AND_SWAP_1
    --copt=-U__GCC_HAVE_SYNC_COMPARE_AND_SWAP_2
    --copt=-U__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8'
    echo “Building for the AArch64, with NEON acceleration”
    d) changed the bazel build line to suit the arm64 target:
    bazel build -c opt ${PI_COPTS} \
    --config=monolithic \
    --copt=-funsafe-math-optimizations --copt=-ftree-vectorize \
    --copt=-fomit-frame-pointer --fat_apk_cpu=arm64-v8a \
    --compiler aarch64-linux-gnu-gcc \
    --verbose_failures \
    //tensorflow/tools/benchmark:benchmark_model \
    //tensorflow/tools/pip_package:build_pip_package

    Aside: I also commented out all pip3 package installs, as I only need Python 2.7 tensorflow for now

    Inspecting the built Docker container, I find no aarch64 gcc compilers in /usr/bin, just a bunch of x86_64-linux-gnu-gcc related ones, so I’m suspecting this is the issue but haven’t yet been able to track down why. I have no experience with bazel before this, and I suspect the issue lies somewhere there. I found this /tensorflow/third_party/toolchains/cpus/arm/CROSSTOOL.tpl file which has some lines about the toolchain, but I don’t know if/how this is referenced in the ci_build. Any help or suggestions you can give would be really appreciated! Thanks again for putting this together. I think including an arm64 nightly build would be awesome 😉

  6. Well I finally got it working. I had no idea what I didn’t know about Bazel…so I spent a good chunk of time learning up on that tool. Thanks again for putting together the RPi build and writing it up!

  7. Hi, after I got it installed, how can I run a model with it? I’m running into some problems using it, and I can’t build any of the pi_examples.
