Ok, I thought about leaving this as a one-word blog post, but even though I can categorically state that it isn’t happening, the fact that this question comes up regularly in my everyday life, and that I worked on always-on audio when I was at Google, makes me want to expand on this a bit.
A good starting point is this BBC article from 2016 asking “Is your smartphone listening to you?”, which includes the common anecdote of an ad that seems like it was triggered by a recent conversation, an investigation into the technical possibility that it could be happening, and denials from Google, Facebook, and Amazon that what users suspect is actually occurring.
I worked for years on the infrastructure Google uses for the machine learning models to recognize speech triggers like “Hey Google”, so if you trust me you can take my word that we didn’t have the capability to do what people are concerned about. Even if you don’t trust me, there are public papers from Google and Apple that go into detail about how the always-on system in Android and iOS phones works.
The summary is that in order to run even when most of the phone (including the CPU) is powered down, the microphone data has to be processed by a subsystem that is extremely constrained, because to avoid draining the battery it can only consume something like ten milliwatts. For comparison, a Cortex-A processor used for the main CPU (or application processor) can easily burn a watt or more. To run at such low power, this subsystem has a lot less memory and compute than the application processor, often only a few hundred kilobytes of RAM and a clock frequency in the low hundreds of megahertz. This makes running full speech recognition, or even listening for more than a few keywords, impractical from an engineering perspective. The Google research teams have managed some minor miracles like squeezing “Now Playing” onto the Pixel’s always-on subsystem, listening out for when music is playing and waking up the application processor to identify it, but it took incredible ingenuity to fit that into the memory budget available.
Even though the article states the security researchers built a proof of concept app that didn’t use much power, they don’t link to any code or power measurements. Since regular Android developers can’t run apps on the always-on subsystem (it’s restricted to phone manufacturers), their app must have been running on the application processor, and I’m willing to bet a lot of money you’d notice your battery draining fast if the main CPU was awake for long periods.
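To put rough numbers on that gap, here's a back-of-envelope sketch in Python. The ten milliwatt and one watt figures are the estimates from above; the 15 watt-hour battery, the 16 kHz 16-bit mono audio format, and the 256 KB of RAM are illustrative assumptions I've picked for the arithmetic, not measurements from any particular phone.

```python
# Back-of-envelope comparison of the always-on subsystem and the
# application processor. Every constant is an illustrative assumption,
# not a measured figure for any real device.

BATTERY_WH = 15.0        # assumed battery capacity in watt-hours
ALWAYS_ON_W = 0.010      # roughly ten milliwatts, as estimated in the text
APP_PROCESSOR_W = 1.0    # roughly one watt, as estimated in the text

SAMPLE_RATE_HZ = 16_000  # assumed sample rate for speech audio
BYTES_PER_SAMPLE = 2     # assumed 16-bit mono PCM
RAM_BYTES = 256 * 1024   # assumed stand-in for "a few hundred kilobytes"


def hours_of_runtime(battery_wh: float, draw_w: float) -> float:
    """Hours until the battery is empty if this were the only power draw."""
    return battery_wh / draw_w


def seconds_of_audio(ram_bytes: int, sample_rate_hz: int, bytes_per_sample: int) -> float:
    """Seconds of raw audio that fit in RAM if nothing else were stored."""
    return ram_bytes / (sample_rate_hz * bytes_per_sample)


if __name__ == "__main__":
    always_on = hours_of_runtime(BATTERY_WH, ALWAYS_ON_W)
    app_proc = hours_of_runtime(BATTERY_WH, APP_PROCESSOR_W)
    audio = seconds_of_audio(RAM_BYTES, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE)
    print(f"Always-on subsystem alone: {always_on:.0f} hours")
    print(f"Application processor alone: {app_proc:.0f} hours")
    print(f"Raw audio that fits in RAM: {audio:.1f} seconds")
```

Under those assumptions the always-on budget would take about 1,500 hours to drain the battery by itself, against about 15 hours at one watt, and 256 KB would hold only around eight seconds of raw audio before leaving any room for a model.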
So, I would have been directly involved in any code that did the kind of conversational spying that many people incorrectly suspect is happening, and I’m in a good position to categorically say it isn’t. Why should you trust me though? Or to put it another way, how can an everyday user verify my statement? The BBC article is a bit unsatisfying, because they have security researchers create a proof of concept for an app that listens to conversations, and then state that the companies involved deny that they are doing this. Even if you have faith in the big tech firms involved, I know from my own experience that their engineers can make mistakes and leak information accidentally. My knowledge is also aging: technology keeps improving, and running full speech recognition on an always-on chip won’t always be out of reach.
That gap, the fact that we have to trust the word of phone manufacturers that they aren’t spying on us and that there’s no good way for a third party to verify that promise, is what I’ll be focusing on in my research. I believe it should be possible to build voice interfaces and other devices with microphones and cameras in such a way that someone like Underwriters Laboratories or Consumer Reports can test their privacy guarantees. I’ve already explored some technical solutions in the past, but I think it’s important to gather a coalition of people interested in the broader questions. With that in mind, if you are a researcher or engineer in either academia or industry who’s interested in this area, drop me an email at firstname.lastname@example.org. I’m hoping we can organize some kind of symposium and discussion groups to figure out the best practices. I believe that we as computer scientists can do better than just asking the public to blindly trust corporations to do the right thing, so let’s figure out how!