I was at the Marine Robotics Forum at Woods Hole Oceanographic Institute last week, and it set a lot of wheels turning in my head. As a newcomer to the field, I was blown away by some of the demonstrations like the Swarm Diver underwater drones, the ChemYak data-gathering robot, Drone Tugs, and WaveGliders. Operating underwater means that the constraints are very different from terrestrial robots, for instance it’s almost impossible to send appreciable amounts of data wirelessly more than a few hundred meters. I was struck by one constraint that was missing though – nobody had to worry about privacy in the deep sea.
One of the reasons I’m most excited about TinyML is that adding machine learning to microphones and cameras will let us create smart sensors. I think about these as small, cheap, self-contained components that do a single job like providing a voice interface, detecting when a person is nearby (like an infra-red motion sensor that won’t be triggered accidentally by animals or trees moving in the wind), or spotting different hand gestures. It’s possible to build these using audio and image sensors combined with machine learning models, but only expose a “black box” interface to system builders that raises a pin high when there’s a person in view for example, or has an array of pins corresponding to different commands or gestures. This standardization into easy to use modules is the only way we’ll be able to get mass adoption. There’s no reason they can’t be as simple to wire into your design as a button or temperature sensor, and just as cheap.
I see this future as inevitable in a technical sense, probably sooner rather than later. I’m convinced that engineers will end up building these devices because they’re starting to become possible to create. They’ll also be very useful, so I expect they’ll see widespread adoption across all sorts of consumer, automotive, and industrial devices. The reason privacy has been on my mind is that if this does happen, our world will suddenly contain many more microphones and cameras. We won’t actually be using them to record sound or video for playback, in the traditional sense, they’ll just be implementation details of higher-level sensors. This isn’t an easy story to explain to the general public though, and even for engineers on a technical level it may be hard to guarantee that there’s no data leakage.
I believe we have an opportunity right now to engineer-in privacy at a hardware level, and set the technical expectation that we should design our systems to be resistant to abuse from the very start. The nice thing about bundling the audio and image sensors together with the machine learning logic into a single component is that we have the ability to constrain the interface. If we truly do just have a single pin output that indicates if a person is present, and there’s no other way (like Bluetooth or WiFi) to smuggle information out, then we should be able to offer strong promises that it’s just not possible to leak pictures. The same for speech interfaces, if it’s only able to recognize certain commands then we should be able to guarantee that it can’t be used to record conversations.
I’m a software guy, so I can’t pretend to know all the challenges that might face us in trying to implement something like this in hardware, but I’m given some hope that it’s possible from the recent emergence of cryptography devices for microcontrollers. There will always be ways determined attackers may be able to defeat particular approaches, but what’s important is making sure there are significant barriers. Here are some of the properties I’d expect we’d need:
– No ability to rewrite the internal program or ML model on the device once it has been created.
– No radio or other output interfaces exposed.
– As simple an interface as possible for the application. For example for a person detector we might specify a single output pin that updates no faster than once per second, to make it harder to sneak out information even if the internal program is somehow compromised.
– As little memory on the device as possible. If it can’t store more than one frame of 320×320 video, or a few seconds of audio, then you won’t be able to effectively use it as a recording device even if other protections fail.
I would also expect we’d need some kind of independent testing and certification to ensure that these standards are being met. If we could agree as an industry on the best approach, then maybe we can have a conversation with the rest of the world about what we see coming, and how these guarantees may help. Establishing trust early on and promoting a certification that consumers can look for would be a lot better than scrambling to reassure people after they’ve been unpleasantly surprised by a privacy hack.
These are just my personal thoughts, not any kind of official position, and I’m posting this in the hope I’ll hear from others with deep experience in hardware, privacy, and machine learning about how to improve how we approach this new wave of challenges. How do you think we can best safeguard our users?