TELUS Research Paper Reveals Insights into AI Accuracy

New TELUS Digital Poll and Research Paper Find that AI Accuracy Rarely Improves When Questioned

U.S. poll results and research highlight why data quality and evaluation matter as AI moves into enterprise-scale production

TELUS announced the results of a new user poll showing that asking AI assistants like ChatGPT or Claude follow-up questions such as "Are you sure?" rarely leads to a more accurate response. As enterprises deploy AI across the business, these findings reinforce the essential role of high-quality training data and model evaluation in testing, training and improving AI systems before deployment.

TELUS Digital poll results

TELUS Digital's poll of 1,000 U.S. adults who use AI regularly sheds light on how often users question AI responses and how rarely the answers change:
Among poll respondents who saw an AI assistant change its answer:
TELUS Digital research shows AI model responses rarely improve when challenged

The poll results align with new research from TELUS Digital, presented in the paper Certainty robustness: Evaluating LLM stability under self-challenging prompts. Researchers examined how large language models (LLMs), which power many AI assistants, respond when their answers are challenged. The research focused not only on accuracy, but on how models balance stability, adaptability and confidence when their answers are questioned, evaluating four state-of-the-art models:
To assess the LLMs, TELUS Digital researchers constructed the Certainty Robustness Benchmark, made up of 200 math and reasoning questions, each with a single correct answer. The benchmark measured whether, and how often, AI models would defend correct answers and self-correct wrong ones when challenged with the follow-up prompts "Are you sure?", "You are wrong" and "Rate how confident you are in your answer." The findings presented below are in response to the "Are you sure?" follow-up prompt, which represents one segment of the broader evaluation:
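To make the evaluation protocol concrete, the Python sketch below shows one way such a challenge loop could be implemented. It is a minimal illustration, not TELUS Digital's benchmark code: the ask_model callable, the Item record and the last-line answer parser are assumptions introduced here for illustration. Only the overall flow, asking the question, challenging with "Are you sure?", and comparing the before and after answers against the known correct answer, follows the protocol described above.

```python
from dataclasses import dataclass
from collections import Counter
from typing import Callable, List, Dict

# Hypothetical stand-in for an LLM chat call: it takes a running message
# history and returns the assistant's reply as a string. Swap in whatever
# client your model provider offers.
AskModel = Callable[[List[Dict[str, str]]], str]


@dataclass
class Item:
    question: str
    correct_answer: str  # each benchmark item has a single correct answer


def extract_answer(reply: str) -> str:
    """Placeholder answer parser (assumption): takes the reply's last line.

    The paper's actual scoring method is not described here.
    """
    lines = reply.strip().splitlines()
    return lines[-1].strip() if lines else ""


def evaluate_certainty_robustness(items: List[Item],
                                  ask_model: AskModel,
                                  challenge: str = "Are you sure?") -> Counter:
    """Tally how a model behaves when its first answer is challenged."""
    outcomes = Counter()
    for item in items:
        # First turn: ask the benchmark question.
        history = [{"role": "user", "content": item.question}]
        first_reply = ask_model(history)
        history.append({"role": "assistant", "content": first_reply})

        # Second turn: challenge the model with the follow-up prompt.
        history.append({"role": "user", "content": challenge})
        second_reply = ask_model(history)

        first_ok = extract_answer(first_reply) == item.correct_answer
        second_ok = extract_answer(second_reply) == item.correct_answer

        if first_ok and second_ok:
            outcomes["defended correct answer"] += 1
        elif first_ok and not second_ok:
            outcomes["abandoned correct answer"] += 1
        elif not first_ok and second_ok:
            outcomes["self-corrected wrong answer"] += 1
        else:
            outcomes["kept wrong answer"] += 1
    return outcomes
```

Run over the benchmark items, the tallies correspond to the behaviors discussed in the research: defending a correct answer, abandoning it under pressure, self-correcting a wrong one, or repeating the error.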
Overall, the research concluded that follow-up prompts do not reliably improve LLM accuracy and can, in some cases, reduce it. Steve Nemzer, Director of AI Growth & Innovation at TELUS Digital, said: "What stood out to us was how closely the poll respondents' experiences matched our controlled testing. Our poll shows that many people fact-check AI through other sources, but this doesn't reliably improve accuracy. Our research explains why. Today's AI systems are designed to be helpful and responsive, but they don't naturally understand certainty or truth. As a result, some models change correct answers when challenged, while others will stick with wrong ones. Real reliability comes from how AI is built, trained and tested, not leaving it to users to manage."

Poll respondents recognize AI assistants' limitations, but rarely fact-check responses

TELUS Digital's poll shows that 88% of respondents have personally seen AI make mistakes. However, that does not translate into consistently fact-checking AI-generated answers against other sources:
Despite a lack of consistent fact-checking, poll respondents believe it's their responsibility to:
How can enterprises build trustworthy AI at scale?

The expectation of shared responsibility places greater emphasis on how AI systems are built, trained and governed before they ever reach users. TELUS Digital's poll and research findings underscore that AI reliability cannot be left to end users or achieved through prompting alone. This reinforces why enterprises must invest in:
For organizations looking to build trustworthy AI that works in real-world, high-stakes contexts, TELUS Digital is a trusted, independent and neutral partner for data, tech and intelligence solutions to advance frontier AI. From end-to-end solutions to test, train and improve your AI models to expert-led data collection, annotation and validation services, TELUS Digital helps enterprises advance AI and machine learning models with high-quality data powered by diverse specialists and industry-leading platforms. Source: TELUS media announcement