Sound Communication: The Holdcom Blog

Voice Recognition and Telephony


In our October Newsletter, Holdcom featured a clip from the critically acclaimed sketch show Burnistoun about two Scottish gentlemen struggling with an elevator voice-recognition system. The sketch played off the myth that voice recognition systems (or the average English speaker) cannot decipher thick Scottish accents.

However, a recent study suggests that gender, rather than accent, may contribute to voice recognition errors:

“Researchers up at the University of Edinburgh have determined that the male voice is harder for voice recognition software to pick up and understand than its female counterpart. This conclusion was reached after telephone conversation recordings were run through a battery of tests, which revealed that men seem to say "umm" and "err" more often, while also identifying that the greatest difficulties arise with words that sound similar and can arise in the same context, such as "him" and "them." Equally troubling is the first word in a sentence, as it comes without context and therefore doesn't benefit from any predictive assistance.” Engadget.com

Pitch and frequency range, as well as speaking habits, affect whether a voice recognition system understands a caller. To make matters worse, many phone systems that use voice activation do not provide a touchtone option to bypass the system – usually the caller must say “operator.” Even when touchtone options are available, escape options usually aren’t offered until the end of a long stream of directions, which is double in length because every prompt offers both “please press one” and “please say one.”

Voice recognition has faced severe neglect since its inception in 1982. According to “Rest in Peas: The Unrecognized Death of Speech Recognition” by Robert Fortner, “The agency [DARPA] financed investigations into conversational speech recognition but shifted priorities and money, after accuracy plateaued [80%].” Not only that, researchers claim “we need artificial intelligence if computers are going to understand us,” yet so far developers, “rather than challeng[ing] a cherished belief, [are scaling] it back until it fades away.”

What does this mean for the audio marketing industry? If “sticking to a few topics, like numbers…approaches 100% accuracy,” could a system be developed where professionally recorded announcements, voice recognition, and IVR structuring work in conjunction? Or is voice recognition too risky for a customer-based, human-interaction market?
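
To make the idea concrete, here is a minimal sketch of how one menu step in such a system might behave. It assumes the recognizer is limited to a tiny vocabulary (spoken digits plus “operator”), that key presses and spoken words are handled the same way, and that the escape to an operator is available at every step. The menu entries and function names are hypothetical and do not come from any real IVR product.

```python
# Hypothetical sketch of a single IVR menu step. Nothing here is a real
# telephony API; the menu and names are made up for illustration.

# Small, fixed vocabulary: the kind of narrow topic where recognition does best.
SPOKEN_DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

# Assumed menu: key -> destination (each would have a recorded announcement).
MENU = {"1": "billing", "2": "support"}


def normalize_input(raw):
    """Turn a key press or a recognized word into a single menu token."""
    token = raw.strip().lower()
    # Spoken digits are mapped to the same tokens as touchtone keys.
    return SPOKEN_DIGITS.get(token, token)


def route(raw, menu):
    """Decide what to do with one caller input."""
    choice = normalize_input(raw)
    # The escape works at any point, by key ("0") or by voice ("operator").
    if choice in ("0", "operator"):
        return "transfer_to_operator"
    # Anything unrecognized (for example "umm") replays the menu.
    return menu.get(choice, "replay_menu")


if __name__ == "__main__":
    for caller_input in ["one", "2", "Operator", "umm"]:
        print(caller_input, "->", route(caller_input, MENU))
```

Because a spoken “one” and a pressed 1 end up as the same token, the recorded prompts and the menu structure stay identical whichever way the caller chooses to respond.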
