2017 May « Pete Warden's blog

crown-gecko-967335_960_720

I gave a talk last week at the Embedded Vision Summit, and one question that came up was how to run neural networks trained in TensorFlow on tiny, power-efficient CPUs like the EFM32. This microcontroller is based on an ARM M4 design, and uses about 2.5 milliwatts when running at 40 MHz. It’s not much to work with, but it does at least have 256KB of RAM and 1024KB of flash program memory. This is more than enough to run some simple neural networks, especially for tasks like audio hotword detection that don’t need ultra-high accuracy to be useful. It is a tricky task though, since it requires a combination of ML and embedded hardware expertise. Here’s how I would tackle it:

Make sure you can train a model on the desktop that achieves the accuracy you need for your product, ignoring embedded constraints. Usually the hardest part here is getting enough relevant training data, since for good results you’ll need something that reflects the actual data you’ll be seeing on the device.

Once you have a proof of concept model working, try to shrink it down to fit the constraints of your device. This will involve:

Making sure the number of weights is small enough to fit in RAM. You can compress them down to eight bit in most cases to help. The tensorflow/tools/graph_transform/summarize_graph tool will give you estimates on the number of weights.
Reduce the size of the fully-connected and convolutional layers so that the number of compute operations is small enough to run within your device’s budget. The tensorflow/tools/benchmark utility with –show_flops will give you an estimate of this number. For example, I might guess that an M4 could do 2 MFLOPs/second, and so aim for a model that fits in that limit.

When you have a model that looks like it should fit, write a custom runtime to execute it. Right now I wouldn’t recommend using the standard TensorFlow runtime on an embedded device because the binary size overhead of a general interpreter is too large. Instead, manually write a piece of code that loads your weights and does the matrix multiplies. Usually your model will be comparatively simple, just a couple of fully-connected layers for example, so hopefully the amount of coding won’t be overwhelming. I am hoping we have a better solution and some examples for this use case in the future though, since I think it’s an area where neural network solutions are going to make a massive impact!

	Ideal Dataset Size f… on How many images do you need to…
	How to set up Raspbe… on Why has the Internet of Things…
	Thomas on Launching Moonshine Micro
	bouquetsweetly69036a… on Meet Fiona and Abby
	softlysuitcb91a8b8b1 on Meet Fiona and Abby

Pete Warden's blog

Ever tried. Ever failed. No matter. Try Again. Fail again. Fail better.

Monthly Archives: May 2017

Running TensorFlow Graphs on Microcontrollers

How the TensorFlow team handles open source support