
The art of not pissing people off with voice-operated systems

November 18, 2011

In one of Robert Heinlein’s novels, Maureen Johnson (Lazarus Long’s mother) wakes up in a strange hotel next to a dead person. She calls the concierge, and, not impressed by his erudition, threatens to come take it out of his hide. “You’re welcome to try,” says the concierge; “I am bolted to the floor in the third sub-basement”. She has, unawares, been talking to a computer.

Although the story takes place in the distant future, the voice-operated computer system still manages to do what similar systems do today: piss off customers. But the mode of this pissing-off has changed over the years. Early systems were annoying simply because they didn’t recognize voices accurately, sometimes with comical results. Heinlein’s heroine is upset because the hotel system can’t perform extremely high-level functions that would challenge a human concierge. Today we’re somewhere in the middle: computer systems have gotten spectacularly better at recognizing speech, but they grate on our nerves from their location in the uncanny valley.

“The Uncanny Valley” generally refers to computer graphics or robotics portraying human faces that are almost, but not quite, human. It’s creepy, the way all unliving things that pretend to be alive* are creepy. I propose that it can also apply to voice-operated systems.

When I dial into a voice-operated system, it engages in several programmed pleasantries in an apparent attempt to put me at ease:

Unliving Thing (UT), speaking in a female voice: “Please state your sixteen-digit credit card number.”
Me: (says number as clearly as I can)
UT: “OK. When you hear what you want me to do, just say it back to me.”

Stop right there. Put yourself in my place, scriptor of the Unliving Thing. The moment I hear a computer-selected recording (and I totally recognize this woman’s voice from three systems so far) I know it’s going to be a long haul before I get to talk to a person. That’s fine if I called to check a balance or report a stolen card – I don’t need a person for that.  But if I am calling for a more complicated interaction – something I know a computer isn’t going to be able to handle – the computer pretending to be a person becomes an obstacle that I know I will need to wait out and maneuver around. So stop trying to butter me up.

The result is that by the time I actually do get to talk to a human, I’m already forcing myself to be calm.  This is not a good way to begin any conversation, and I wonder if it increases stress for the human representatives on the other end of five levels of computer gatekeeping. It isn’t their fault; they’re trying to be as helpful as they can, picking up the wreckage of the customer’s mood to address the problem.

Corporations can’t, by the way, put “talk to a person” in the first menu; that would have the effect of routing almost all traffic around the computer and overloading the human operators.  Many people have not adjusted to the idea that for basic information, they should seek out a computer rather than a human operator. But many are catching on.

So here are three suggestions:

  1. Stop using recorded human voice clips selected by a computer.  People have human voices. A recorded human voice makes some part of my brain irrationally expect human-level understanding.  A good voice synthesizer sets my unconscious expectations at the “I’m dealing with a computer” level.
  2. Focus on what computers are good at, which is efficiency. People say “OK”. People refer to themselves in the first person. Computers are things and I damn well know I’m talking to a computer.  Putting all those human-touch flourishes on the system doesn’t put me at ease; it only widens the gulf between my unconscious and conscious expectations. Let humans engage in pleasantries.  (Actually the word “OK” might be OK, as long as it isn’t in a human voice.  But computers referring to themselves in the first person should wait for when they start doing it on their own.)
  3. Give people some hint where they are in the menu structure.  Perhaps something like: “Most transactions can be efficiently handled in the following three menus.  Please listen to all options.”

I titled this post “The art…” but this kind of process development is part art, part science, with a lot of usability and A/B testing.  It’s the kind of thing industry is very good at, once it knows it needs to do it.

* Like John Boehner. So close to human, yet so far, it creeps me out every time I see a picture of it. The android Sonny in “I, Robot” was way more human than that thing will ever be.

  1. November 18, 2011 at 13:57 | #1

    Way back when, IBM were demonstrating speaker independent voice recognition (for MS-DOS commands) at the Hanover Trade Fair on a PC.

    With moderate success.

    Until I leaned over and said into the demo mike “See colon format yes yes” ;-)

  2. Chas, PE SE
    November 20, 2011 at 15:59 | #2

    “To Sail Beyond the Sunset”. Also in that book, when one asked for ‘Telephone’ a simulacrum of a human head popped out of the wall.

    Agree with your characterization of computer voices, and how people interact with them. If the menu said, “for an operator press ’0′” in the first menu, practically everyone would press ’0′ and ask what the address was, for the fax number, what the hours were, etc.

    Friend of mine was using an early voice operation system. It didn’t speak Texan. As his frustration built, I looked at him and said, “Rotate the pod, HAL”. He broke up.
