
Amazon AI Assistant Stumbles Over English Dialects
ITHACA, N.Y. – A new Cornell University study reveals Amazon’s AI shopping assistant, Rufus, gives vague or incorrect responses to users writing in some English dialects, such as African American English (AAE), especially when prompts contain typos.
The paper introduces a framework to evaluate chatbots for harms that occur when AI systems perform worse for users who speak or write in different dialects. The study has implications for the increasing number of online platforms that are incorporating chatbots based on large language models to provide services to users, the researchers said.
“Currently, chatbots may provide lower-quality responses to users who write in dialects. However, this doesn’t have to be the case,” said lead author Emma Harvey, a Ph.D. student at Cornell Tech. “If we train large language models to be robust to common dialectical features that exist outside of so-called Standard American English, we could see more equitable behavior.”
“Chatbots are increasingly used for high-stakes tasks, from education to government services,” said Allison Koenecke, assistant professor and co-author of the study. “We wanted to study whether users who speak and write differently, across dialects and formality levels, have comparable experiences with chatbots trained mostly on ‘standard’ American English.”
To test their framework, the researchers audited Amazon Rufus, a chatbot in the Amazon shopping app. They used a tool called MultiVALUE to convert standard English prompts into five widely spoken dialects: AAE, Chicano English, Appalachian English, Indian English and Singaporean English. The researchers also modified these prompts to reflect real-world use by adding typos, removing punctuation and changing capitalization.
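For readers curious what the "real-world" modifications might look like in practice, the following is a minimal sketch in Python, not the researchers' actual code. The function names, the adjacent-letter-swap typo model and the sample prompt are all assumptions made for illustration; the dialect conversion step is not shown.

```python
import random
import string


def add_typo(prompt: str, rng: random.Random) -> str:
    """Swap one randomly chosen pair of adjacent, differing letters.

    This is a hypothetical typo model chosen for illustration only.
    """
    positions = [
        i for i in range(len(prompt) - 1)
        if prompt[i].isalpha() and prompt[i + 1].isalpha()
        and prompt[i] != prompt[i + 1]
    ]
    if not positions:
        return prompt  # nothing to swap, return the prompt unchanged
    i = rng.choice(positions)
    return prompt[:i] + prompt[i + 1] + prompt[i] + prompt[i + 2:]


def remove_punctuation(prompt: str) -> str:
    """Delete every ASCII punctuation character from the prompt."""
    return prompt.translate(str.maketrans("", "", string.punctuation))


def change_capitalization(prompt: str) -> str:
    """Lowercase the whole prompt (one simple way to alter capitalization)."""
    return prompt.lower()


def perturb(prompt: str, seed: int = 0) -> dict:
    """Return the original prompt alongside three perturbed variants."""
    rng = random.Random(seed)  # seeded so the typo is reproducible
    return {
        "original": prompt,
        "typo": add_typo(prompt, rng),
        "no_punctuation": remove_punctuation(prompt),
        "lowercase": change_capitalization(prompt),
    }


if __name__ == "__main__":
    # Hypothetical shopping question, not taken from the study.
    for name, text in perturb("Is this jacket machine washable?").items():
        print(f"{name}: {text}")
```

In an audit of this kind, each variant would be sent to the chatbot and the responses compared against those for the unmodified prompt.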
The team found Rufus more often gave low-quality answers that were vague or incorrect when prompted in dialects rather than in Standard American English (SAE). The gap widened when prompts included typos.
“Part of this underperformance stems from specific grammatical rules,” said Koenecke. “This has serious implications for widely used chatbots like Rufus, which likely underperform for a large portion of users.”
Overall, the authors advocate for dialect-aware AI auditing. They also urge developers to design systems that embrace linguistic diversity.
The study was supported by grants from Apple Inc. and Renaissance Philanthropy.
https://doi.org/10.1145/3715275.3732137