I loved the original Raspberry Pi. It was a great platform for running deep neural networks, especially with its fully-programmable GPU. I was excited when the new Pi 2 was released, because it was even more powerful for the same low price. Unfortunately, I heard back from early users that the GPU code I had been using no longer worked: the device just crashed when the example program was run.
I ordered a Pi 2, and this weekend I was finally able to devote a few hours to debugging the problem. The bad news is that I wasn’t able to figure out why the GPU code is failing. The good news is that the CPU is so much improved on the Pi 2 that the example network runs even faster without the GPU, in 3.2 seconds!
I’ve checked in my changes, and you can see full directions in the README, but the summary is that by using Eigen and gcc 4.8, the matrix calculations run very fast as NEON code on the CPU. One of my favorite parts of joining Google has been getting to hang out with open-source heroes, and I’ve come to know Benoit Jacob, the founder of the Eigen project, and Benoit Steiner, one of its top contributors. I knew they’d been doing amazing work improving ARM performance, so I was hopeful that the latest version would be a big step forward. I was pleased to discover that the top-of-tree version is almost 25% faster than the last stable release from January!
Let me know how you get on if you do dive in. I’ve had a lot of fun with this, and I hope you do too!