Researchers at the University of Waterloo have conducted a comprehensive study of large language models, focusing on GPT-3, an early version of the model underlying ChatGPT, and uncovered concerning findings related to misinformation.
The study, titled "Reliability Check: An Analysis of GPT-3's Response to Sensitive Topics and Prompt Wording," systematically examined the model's comprehension of statements in six categories: facts, conspiracies, controversies, misconceptions, stereotypes, and fiction. The primary goal was to explore human-technology interactions and identify potential risks.
The researchers observed that GPT-3 exhibited frequent errors, self-contradictions within single responses, and a tendency to repeat harmful misinformation. Professor Dan Brown from the David R. Cheriton School of Computer Science highlighted the broader implications, stating, "Most other large language models are trained on the output from OpenAI models. There's a lot of weird recycling going on that makes all these models repeat these problems we found in our study."
Although the study began before ChatGPT's release, the researchers maintain that its findings remain relevant. Aisha Khatun, the lead author and a master's student in computer science, emphasized the model's unpredictable behavior. "Even the slightest change in wording would completely flip the answer," she noted, citing examples in which GPT-3 agreed with false statements depending on how they were phrased.
The researchers presented GPT-3 with more than 1,200 statements across the fact and misinformation categories, posing each one through several different inquiry templates. The analysis revealed that GPT-3 agreed with incorrect statements between 4.8% and 26% of the time, depending on the statement category.
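To make the methodology concrete, the sketch below shows one way such a template-based evaluation could be structured: each statement is wrapped in several inquiry templates, the model's response is checked for agreement, and agreement rates are tallied per category. This is an illustrative sketch only; the templates, the query_model placeholder, and the classify_agreement heuristic are hypothetical stand-ins, not the study's actual code or prompts.

```python
from collections import defaultdict

# Hypothetical inquiry templates that rephrase the same statement;
# the study's actual templates are not reproduced here.
TEMPLATES = [
    "Is the following statement true? {statement}",
    "I think {statement} Do you agree?",
    "{statement} Is this correct?",
]

def query_model(prompt: str) -> str:
    """Placeholder for a call to the language model under test."""
    raise NotImplementedError("Plug in your model client here.")

def classify_agreement(response: str) -> bool:
    """Crude heuristic: does the response open with an affirmation?"""
    return response.strip().lower().startswith(("yes", "true", "i agree"))

def evaluate(statements: list[tuple[str, str]]) -> dict[str, float]:
    """statements: (category, statement) pairs, e.g. ("conspiracy", "...").
    Returns the fraction of prompts the model agreed with, per category."""
    agreed = defaultdict(int)
    total = defaultdict(int)
    for category, statement in statements:
        for template in TEMPLATES:
            response = query_model(template.format(statement=statement))
            agreed[category] += classify_agreement(response)
            total[category] += 1
    return {cat: agreed[cat] / total[cat] for cat in total}
```

Running the same statement through multiple templates is what exposes the wording sensitivity Khatun describes: if agreement flips between templates, the model's answer depends on phrasing rather than on the statement's truth.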
A critical concern raised by the study is the potential for large language models to learn and propagate misinformation over time. Khatun expressed worry, stating, "Even if a model's belief in misinformation is not immediately evident, it can still be dangerous."
Professor Brown echoed this sentiment, emphasizing that the ability of language models to distinguish truth from fiction is a fundamental question of trust in these systems. As these models become increasingly widespread, addressing and mitigating these risks will be crucial for their responsible deployment.