My colleague Yangqing Jia, creator of Caffe, recently spent some free time getting the framework running on Nvidia’s Jetson board. If you haven’t heard of the Jetson, it’s a small development board that includes Nvidia’s TK1 mobile GPU chip. The TK1 is starting to appear in high-end tablets, and has 192 cores so it’s great for running computational tasks like deep learning. The Jetson’s a great way to get a taste of what we’ll be able to do on mobile devices in the future, and it runs Ubuntu so it’s also an easy environment to develop for.
Caffe comes with a pre-built ‘Alexnet’ model, a version of the Imagenet-winning architecture that recognizes 1,000 different kinds of objects. Using this as a benchmark, the Jetson can analyze an image in just 34ms! Based on this table I’m estimating it’s drawing somewhere around 10 or 11 watts, so it’s power-intensive for a mobile device but not too crazy.
Yangqing passed along his instructions, and I’ve checked them on my own Jetson, so here’s what you need to do to get Caffe up and running.
The first step once you’ve unboxed your Jetson is logging in. You can attach a monitor and keyboard, but I prefer just plugging it into a local router and ssh-ing in. elinux.org/Jetson/Remote_Access has more details, but it should show up as tegra-ubuntu.local on your local network, and the username is ubuntu:
ssh ubuntu@tegra-ubuntu.local
The default password is ubuntu. Next we need to run Nvidia’s installer that comes with the device, and reboot.
sudo shutdown -r now
Once the board has rebooted, you can log back in and continue installing all the packages you’ll need for Caffe.
ssh ubuntu@tegra-ubuntu.local
sudo add-apt-repository universe
sudo apt-get update
sudo apt-get install libprotobuf-dev protobuf-compiler gfortran \
libboost-dev cmake libleveldb-dev libsnappy-dev \
libboost-thread-dev libboost-system-dev \
libatlas-base-dev libhdf5-serial-dev libgflags-dev \
libgoogle-glog-dev liblmdb-dev gcc-4.7 g++-4.7
You’ll need the Cuda SDK to build and run GPU programs, and elinux.org/Tegra/Installing_CUDA has a good general guide. The summary is that you’ll need to register as an Nvidia developer, download the Cuda 6.0 for ARM package to your local machine from a logged-in browser, and then copy it over to the Jetson from there.
scp ~/Downloads/cuda-repo-l4t-r19.2_6.0-42_armhf.deb ubuntu@tegra-ubuntu.local:
Then back on the ssh connection to your Tegra, run these Cuda installation steps.
sudo dpkg -i cuda-repo-l4t-r19.2_6.0-42_armhf.deb
sudo apt-get update
sudo apt-get install cuda-toolkit-6-0
sudo usermod -a -G video $USER
echo "# Add CUDA bin & library paths:" >> ~/.bashrc
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
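The exports above only take effect in new shells, so it may be worth reloading ~/.bashrc and confirming that the toolkit is visible before going further. Here is a minimal sketch of such a check. The helper name check_tool is made up for illustration, and the only assumption is that the CUDA compiler is called nvcc.

```shell
# Hypothetical helper: report whether a command can be found on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

# In an interactive session, reload the new settings first with: source ~/.bashrc
check_tool nvcc
```

If this reports nvcc as missing, logging out and back in (or re-checking the lines appended to ~/.bashrc) is a reasonable first thing to try.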
If everything’s installed correctly, running ‘nvcc -V’ should give you a compiler version message. Now you need to grab the Tegra versions of OpenCV. On your main machine, download developer.nvidia.com/rdp/assets/opencv-run-tegra-k1 and developer.nvidia.com/rdp/assets/opencv-dev-tegra-k1 from your logged-in browser and copy them over to the Jetson.
scp ~/Downloads/libopencv4tegra* ubuntu@tegra-ubuntu.local:
On the Jetson, install those packages.
sudo dpkg -i libopencv4tegra_*_armhf.deb
sudo dpkg -i libopencv4tegra-dev_*_armhf.deb
We need to download and install Caffe. Yangqing has put in a few recent tweaks and fixes so at the moment you’ll need to grab the dev branch, but those should soon be rolled into master.
sudo apt-get install -y git
git clone https://github.com/BVLC/caffe.git
cd caffe && git checkout dev
cp Makefile.config.example Makefile.config
sed -i "s/# CUSTOM_CXX := g++/CUSTOM_CXX := g++-4.7/" Makefile.config
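If you want to see what that sed line does before trusting it with the real config, the same substitution can be tried on a throwaway file. This is only a sketch: the helper active_cxx is a made-up name, the sample line is taken from the sed pattern itself, and sed -i is assumed to be the GNU version that ships with Ubuntu.

```shell
# Hypothetical helper: print the active (uncommented) CUSTOM_CXX setting in a config file.
active_cxx() {
    sed -n 's/^CUSTOM_CXX := //p' "$1"
}

# Try the substitution on a temporary file rather than the real Makefile.config.
tmp=$(mktemp)
echo "# CUSTOM_CXX := g++" > "$tmp"
sed -i "s/# CUSTOM_CXX := g++/CUSTOM_CXX := g++-4.7/" "$tmp"
active_cxx "$tmp"   # prints the compiler that make would pick up
rm -f "$tmp"
```

Running active_cxx Makefile.config inside the caffe directory afterwards should print the same compiler name if the edit was applied.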
We have to use gcc version 4.7 because nvcc hits some problems with the default 4.8, but otherwise we’re using a pretty standard setup. You should be able to kick off the build.
make -j 8 all
Once that’s complete, you should check things are working properly by running Caffe’s test suite. This can take quite a while to finish, but hopefully it should report a clean bill of health.
make -j 8 runtest
Finally you can run Caffe’s benchmarking code to measure performance.
build/tools/caffe time --model=models/bvlc_alexnet/deploy.prototxt --gpu=0
This should take about 30 seconds, and output a set of statistics. It’s running 50 iterations of the recognition pipeline, and each one is analyzing 10 different crops of the input image, so look at the ‘Average Forward pass’ time and divide by 10 to get the timing per recognition result. I see 337.86 ms as the average, so it takes about 34 ms for each image. You can also try leaving off the --gpu=0 flag to see the CPU results; in my case that’s about 585 ms, so you can see how much Cuda helps!
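The divide-by-10 step is simple enough to do in your head, but if you are comparing several runs a tiny helper can save some typing. This is a sketch only: per_image_ms is a made-up name, and the numbers are typed in by hand from the benchmark output rather than parsed from it.

```shell
# Hypothetical helper: divide an average forward-pass time (ms) by the number of
# crops per iteration to get the time per recognition result.
per_image_ms() {
    awk -v total="$1" -v crops="$2" 'BEGIN { printf "%.1f\n", total / crops }'
}

per_image_ms 337.86 10   # GPU figure quoted above; prints 33.8
```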