2016 December « Pete Warden's blog

Photo by Stephen D. Strowes

One of the most interesting things about neural networks for me is that they’re programs you can do meaningful computation on. The most obvious example of that is automatic differentiation, but even after you’ve trained a model there are lots of other interesting transformations you can apply. These can be as simple as trimming parts of the graph that aren’t needed for just running inference, all the way to folding batch normalization nodes into precalculated weights, turning constant sub expressions into single nodes, or rewriting calculations in eight bit.

Many of these operations have been available as piecemeal Python scripts inside the TensorFlow codebase, but I’ve spent some time rewriting them into what I hope is a much cleaner and easier to extend C++ Graph Transform Tool. As well as a set of predefined operations based on what we commonly need ourselves, I’ve tried to create a simple set of matching operators and other utilities to encourage contributors to create and share their own rewriting passes.

I think there’s a lot of potential for computing on compute graphs, so I’m excited to hear what you can come up with! Do cc me (@petewarden) on github too with any issues you encounter.

Picture by Torley

A few months ago I returned to my home town of Cambridge to attend the first ARM Research Summit. What was special about this conference was that it focused on introducing external researchers to each other, rather than pushing ARM’s own agenda. They had invited a broad range of people they worked with, from academic researchers to driver engineers, and all we had in common was that we spent a lot of time working on the ARM platform. This turned out fantastically for me at least, because it meant I had the chance to learn from experts in fields I knew nothing about. As such, it left my mind spinning a little, and so this post is a bit unusual! I’m trying to clarify gut feelings about the future with some actual evidence, so please bear with me as I work through my reasoning.

One of my favorite talks was on energy harvesting by James Myers. This table leapt out at me (apologies to James if I copied any of his figures incorrectly):

Energy harvesting rules of thumb:

Human vibration – 4µW/cm2
Industrial vibration – 100µW/cm2
Human temperature difference – 25µW/cm2
Industrial temperature difference – 1 to 10 mW/cm2
Indoor light – 10µW/cm2
Outdoor light – 10mW/cm2
GSM RF – 0.1µW/cm2
Wifi RF – 0.001µW/cm2

What this means in plain English is that you can expect to harvest four micro-watts (millionths of a watt or µW) for every square centimeter of a device relying on human vibration. A solar panel in sunlight could gather ten milliwatts (thousandths of a watt or mW) for every square centimeter. If you think about an old incandescent bulb, that burns forty watts, and even a modern cell phone probably uses a watt or so when it’s being actively used, so the power you can get from energy harvesting is clearly not enough for most applications. My previous post on smartphone energy consumption shows that even running an accelerometer takes over one twenty milliwatts, so clearly it’s hard to build devices that rely on these levels of power.

Why does that matter? I’m convinced that smart sensors are going to be massively important in the future, and that vision can’t work if they require batteries. I believe that we’ll be throwing tiny cheap devices up in the air like confetti to scatter around all the environments we care about, and they’ll result in a world we can intelligently interact with in unprecedented ways. Imagine knowing exactly where pests are in a crop field so that a robot can manually remove them rather than indiscriminately spraying pesticides, or having stickers on every piece of machinery in a factory that listen to the sounds and report when something needs maintenance.

These sort of applications will only work if the devices can last for years unattended. We can already build tiny chips that do these sort of things, but we can’t build batteries that can power them for anywhere near that long, and that’s unlikely to change soon.

Can the cloud come to our rescue? I’m a software guy, but everything I see in the hardware world shows that transmitting signals continuously takes a lot of energy. Even with a protocol like BLE sending data just a foot draws more than 10 milliwatts. There seems to be an enduring relationship between power usage and the distance you’re sending the data, with register access cheaper than SRAM, which is far cheaper than DRAM, which beats radio transmission.

That’s why I believe our only hope for long-lived smart sensors is driving down the energy used by local compute to the point at which harvesting gives enough power to run useful applications. The good news is that existing hardware like DSPs can perform a multiply-add for just low double-digit picojoules, and can access local SRAM to avoid the costs of DRAM. If you do the back of the envelope calculations, a small image network like Inception V1 takes about 1.5 billion multiply-adds, so 20 picojoules * 1.5 billion gives a rough energy cost of 30 millijoules per prediction (or 30 milliwatts at 1 prediction per second). This is already an order of magnitude less energy than the equivalent work done on a general-purpose CPU, so it’s a good proof that it’s possible to dramatically reduce computational costs, even though it’s still too high for energy harvesting to work.

That’s where another recurrent theme of the ARM research conference started to seem very relevant. I didn’t realize how much of a problem keeping results reliable is as the components continue to shrink. Increasingly large parts of the design are devoted to avoiding problems like Rowhammer, where accesses to adjacent DRAM rows can flip bits, as Onur Mutlu explained. It’s not just memory that faces problems like these, CPUs also need to be over-engineered to avoid errors introduced by current leakage and weirder quantum-level effects.

I was actually very excited when I learned this, because one of the great properties of neural networks is that they’re very resilient in the face of random noise. If we’re going to be leaving an increasing amount of performance on the table to preserve absolute reliability for traditional computing applications, that opens the door for specialized hardware without those guarantees that will be able to offer increasingly better energy consumption. Again, I’m a software engineer so I don’t know exactly what kinds of designs are possible, but I’m hoping that by relaxing constraints on the hardware the chip creators will be able to come up with order-of-magnitude improvements, based on what I heard at the conference.

If we can drive computational energy costs down into the femtojoules per multiply-add, then the world of ambient sensors will explode. As I was writing, I ran across a new startup that’s using deep learning and microphones to predict problems with machinery, but just imagine when those, along with seismic, fire, and all sorts of other sensors are scattered everywhere, too simple to record data but smart enough to alert people when special conditions occur. I can’t wait to see how this process unfolds, but I’m betting unreliable electronics will be a key factor in making it possible.

	ademidun on Leave a trail of breadcru…
	oi0841oi on Understanding the Raspberry Pi…
	Pete Warden on Understanding the Raspberry Pi…
	oi0841oi on Understanding the Raspberry Pi…
	Alan on Understanding the Raspberry Pi…

Pete Warden's blog

Ever tried. Ever failed. No matter. Try Again. Fail again. Fail better.

Monthly Archives: December 2016

Rewriting TensorFlow Graphs with the GTT

AI and Unreliable Electronics (*batteries not included)