Why ML interfaces will be more like pets than machines

cyborg_dog Photo by Dave Parker

When I talk to people about what’s happening in deep learning, I often find it hard to get across why I’m so excited. If you look at a lot of the examples in isolation, they just seem like incremental progress over existing features, like better search for photos or smarter email auto-replies. Those are great, of course, but what strikes me when I look ahead is how the new capabilities build on each other as they’re combined. I believe that they will totally change the way we interact with technology, moving from the push-button model we’ve had since the industrial revolution to something that’s more like a collaboration with our tools. It’s not a perfect analogy, but the most useful parallel I can think of is how our relationship with pets differs from our interactions with machines.

To make what I’m saying more concrete, imagine a completely made-up device for helping around the house (I have no idea if anyone’s building something like this, so don’t take it as any kind of prediction, but I’d love one if anybody does get round to it!). It’s a small indoor drone that assists with the housework, with cleaning attachments and a grabbing arm. I’ve used some advanced rendering technology to visualize a mockup below:

mopbot

Ignoring all the other questions this raises (why can’t I pick up my own socks?), here are some of the behaviors I’d want from something like this:

  • It only runs when I’m not home.
  • It learns where I like to put certain items.
  • It can scan and organize my paper receipts and mail.
  • It will help me find mislaid items.
  • It can be summoned with a voice command, or when it hears an accident.

Here are the best approaches I can think of to meet those requirements without using deep learning:

  • It only runs when I’m not home.
    • Run on a fixed schedule I program in.
  • It learns where I like to put certain items.
    • Puts items in fixed locations.
  • It can scan and organize my paper receipts and mail.
    • Can OCR receipts, but identifying them in the clutter is hard.
  • It will help me find mislaid items.
    • Not possible.
  • It can be summoned with a voice command, or when it hears an accident.
    • Difficult and hard to generalize.

These limitations are part of the reason nothing like this has been released. Now, let’s look at how these challenges can be met with current deep learning technology:

  • It only runs when I’m not home.
    • Person detection.
  • It learns where I like to put certain items.
    • Object classification.
  • It can scan and organize my paper receipts and mail.
    • Object classification and OCR.
  • It will help me find mislaid items.
    • Natural language processing and object classification.
  • It can be summoned with a voice command, or when it hears an accident.
    • Higher-quality voice and audio recognition.

The most important point about all these capabilities is that, for the first time, they are starting to work reliably enough to be useful, though there will still be plenty of mistakes. For this application we’re actually asking the device to understand a lot about us and the world around it, and to make decisions on its own. I believe we’re at a point where that’s now possible, but the fallibility of these systems deeply changes how we’ll need to interact with products. We’ll benefit as devices become more autonomous, but it also means we’ll need to tolerate more mistakes and find ways to give feedback so they can learn smarter behaviors over time.

This is why the only analogy that I can think of for what’s coming is our pets. They don’t always do what we want, but (sometimes) they learn, and even when they don’t, they bring so much that we’re happy to have them in our lives. This is very different from our relationship with machines. There we’re always deciding what needs to happen based on our own observations of the world, and then instructing our tools to do exactly as we order. Any deviation from the behavior we specify is usually a serious bug, but there’s no easy way to teach changes; we usually have to build a whole new version. Machines will also carry out any order, no matter how little sense it might make. Everything from a Spinning Jenny to a desktop GUI relies on the same implicit command-and-control division of labor between people and tools.

Ever since we started building complex machines this is how our world has worked, but the advances in deep learning are going to take us in a different direction. Of course, tools that are more like agents aren’t a new idea, and there have been some notable failures in the past.

clippy Photo by Rhonda Oglesby

So what’s different? I believe machine learning is now able to do a much better job of understanding user behavior and the surrounding world, and so we won’t be in the uncanny valley that Clippy was stuck in, aggressively misunderstanding people’s intent and then never learning from their evident frustration. He’s a good reminder of the dangers that lurk along the path of autonomy though. To help think about how future interfaces will develop, here are a few key areas where I see them differing from the current state of the art.

Fallible versus Foolproof

The world is messy, and so any device that’s trying to make sense of it will need to interpret unclear data and make the best decisions it can. There still need to be hard limits around anything to do with safety, but deep learning products will need to be designed with inevitable mistakes in mind. The cost of any mistakes will have to be much less than the value of the benefits the product brings, but part of that cost can be mitigated by design, for example by making actions easy to cancel, or by pausing to request confirmation when there’s uncertainty.
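To make that design idea a bit more concrete, here is a minimal Python sketch of one way a device could turn a model's confidence into behavior. Everything in it is an assumption made up for illustration: the action names, the two thresholds, and the list of forbidden actions are hypothetical, and a real product would need to tune and test them.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACT = "act"        # confident enough to act (the action should stay easy to cancel)
    ASK = "ask"        # uncertain: pause and request confirmation
    SKIP = "skip"      # too unsure to be worth interrupting anyone
    REFUSE = "refuse"  # blocked by a hard safety limit


@dataclass
class Prediction:
    action: str        # hypothetical action name, e.g. "put_away_socks"
    confidence: float  # model score in [0, 1]


# Hypothetical values, chosen only for illustration.
ACT_THRESHOLD = 0.9
ASK_THRESHOLD = 0.6
FORBIDDEN_ACTIONS = {"move_kitchen_knife"}


def decide(prediction: Prediction) -> Decision:
    """Map a model prediction to a behavior, checking safety limits first."""
    # Hard safety limits never depend on how confident the model is.
    if prediction.action in FORBIDDEN_ACTIONS:
        return Decision.REFUSE
    if prediction.confidence >= ACT_THRESHOLD:
        return Decision.ACT
    if prediction.confidence >= ASK_THRESHOLD:
        return Decision.ASK
    return Decision.SKIP
```

With these made-up numbers, `decide(Prediction("put_away_socks", 0.7))` falls in the middle band, so the device would ask before acting. The point of the sketch is only that uncertainty becomes an explicit input to the interface, while safety limits sit outside the learned part entirely.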

Learning versus Hardcoded

One of the hardest problems when you work with complex deep learning models is how to run a quality assurance process, and it only gets tougher once systems can learn after they’re deployed. There’s no substitute for real-world testing, but the whole process of evaluating products will need to be revamped to cope with more flexible and unpredictable responses. Tay, Microsoft’s chatbot, is another cautionary tale of what can go wrong with uncontrolled learning.

Attentive versus Ignorant

Traditional tools wait to be told what to do by their owner, and don’t have any concept of common sense. Even if the house is burning down around it, your television won’t try to wake you up. Future products will have a much richer sense of what’s happening in the world around them, and will be expected to respond in sensible ways to all sorts of situations outside of their main function. This is vital for smart devices to become truly useful but vastly expands the “surface” of their interfaces, making designs based around flow charts impossible.

I definitely don’t have all the answers for how we’ll deal with this new breed of interfaces, but I do know that we need some new ways of thinking about them. Personally I’d much rather spend time with pets than machines, so I hope that I am right about where we’re headed!