Hey Siri, Tell Me About Advanced Speech Recognition



Advanced Speech Recognition: Challenges and Opportunities

Voice continues to be a key application for businesses and consumers, but service providers still struggle to balance the cost of voice interactions against customer experience and business productivity. The industry faces two main challenges:

1. Costly and Complex Solutions Limit Service Innovation and Mass Market Outreach

The traditional approach to deploying speech recognition applications requires external ASR (Automatic Speech Recognition) servers, either in-network or in the cloud, that use high-performance speech engines to process the entire gamut of speech interactions, from simple keyword or small-vocabulary recognition to natural language and long-form transcription. The cost of using these external ASR servers for every interaction, even wake-word detection, has until recently made many applications prohibitively expensive, limiting innovation and mass market penetration.

2. Performance and Latency Issues Impact User Experience

Sending all media files to an external recognizer for processing adds unnecessary latency, particularly if the speech inputs have to be transmitted over the Internet to public cloud servers. Although machines are getting smarter and learning better, errors in keyword recognition, media processing, learning, or analysis can still trigger the wrong calls to action. The result is either a poor user experience or, for business-critical applications, a substantial loss of time and money. Processing all speech media in the cloud also raises multiple privacy issues.

Figure 2. Challenges with Existing In-Call External Advanced Speech Recognition Solutions

To exploit new market opportunities and extend speech to many more applications, CSPs need to adopt new approaches to overcome these challenges. The traditional approach of using natural language processing technology for every interaction may be too costly in license and hardware requirements, and too complex for many interactions.

CSPs should consider a new approach that:

Is device-independent, not reliant on a specific phone or smart speaker platform from a single manufacturer
Offers a lower total cost of ownership by supporting multiple services cost-effectively
Delivers fast processing
Is able to process speech in long phone conversations without visual cues
Overcomes call quality issues that are unique to each link of a telephony interaction
Enables service providers to maintain their brand value
Provides data for analytics to help CSPs better serve customers and enhance the value of services
Allows dynamic selection of speech technologies to match specific use cases
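
The last point, dynamic selection of speech technologies, can be illustrated with a minimal Python sketch. It is not taken from any vendor's product: the engine names, the interaction labels, and the 200-phrase threshold are all hypothetical, chosen only to show how simple interactions such as wake words might be handled locally while long-form speech is still sent to an external ASR server.

```python
from dataclasses import dataclass
from enum import Enum


class Engine(Enum):
    # Hypothetical engine tiers, ordered from cheapest to most expensive.
    LOCAL_KEYWORD = "local keyword spotter"
    LOCAL_SMALL_VOCAB = "local small-vocabulary recognizer"
    EXTERNAL_ASR = "external ASR server"


@dataclass(frozen=True)
class Interaction:
    kind: str             # hypothetical labels: "wake_word", "menu_choice", "dictation"
    vocabulary_size: int  # number of distinct phrases the application expects


# Assumed cut-off: above this many phrases, a small-vocabulary
# recognizer is treated as insufficient. The value is illustrative only.
SMALL_VOCAB_LIMIT = 200


def select_engine(interaction: Interaction) -> Engine:
    """Pick the cheapest engine tier assumed able to handle the interaction."""
    if interaction.kind == "wake_word":
        return Engine.LOCAL_KEYWORD
    if interaction.vocabulary_size <= SMALL_VOCAB_LIMIT:
        return Engine.LOCAL_SMALL_VOCAB
    return Engine.EXTERNAL_ASR


if __name__ == "__main__":
    for item in (
        Interaction("wake_word", 1),
        Interaction("menu_choice", 12),
        Interaction("dictation", 50_000),
    ):
        print(f"{item.kind}: {select_engine(item).value}")
```

In this sketch only the dictation request would leave the network for an external recognizer. A real deployment would presumably base the decision on more than vocabulary size, for example call quality, language, or privacy policy.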