Voice remains a key application for businesses and consumers, but balancing the cost of voice interactions against customer experience and business productivity is still a major challenge for service providers. The industry faces two main challenges:
1. Costly and Complex Solutions Limit Service Innovation and Mass Market Outreach
The traditional approach to deploying speech recognition applications relies on external ASR (Automatic Speech Recognition) servers, either in-network or in the cloud, that run high-performance speech engines to handle the entire range of speech interactions, from simple keyword or small-vocabulary recognition to natural language understanding and long-form transcription. Until recently, the cost of using these external ASR servers for every interaction requirement, even wake-word detection, made many applications prohibitively expensive, limiting innovation and mass-market penetration.
2. Performance and Latency Issues Impact User Experience
Sending all media to an external recognizer adds unnecessary latency, particularly when the speech input must travel over the Internet to public cloud servers. Although machines are getting smarter and learning better, errors in keyword recognition, media processing, learning, or analysis, and in the calls to action they trigger, can start the wrong processes. The result is either a poor user experience or, for business-critical applications, a substantial loss of time and money. Processing all speech media in the cloud also raises several privacy concerns.
Figure 2. Challenges with Existing In-Call External Advanced Speech Recognition Solutions
To exploit new market opportunities and extend speech to many more applications, CSPs need new approaches that overcome these challenges. The traditional approach of applying natural language processing technology to every interaction may be too costly in license and hardware requirements, and needlessly complex for many simple interactions.
CSPs should consider a new approach that: