Large language models (LLMs) like GPT-4 and Claude are celebrated for their ability to write essays, solve complex equations, and synthesize vast amounts of data at incredible speeds. Yet, despite these impressive capabilities, they sometimes stumble on surprisingly simple tasks, such as counting how many times the letter "r" appears in the word "strawberry." The correct answer is three, but these models often report two.
This discrepancy highlights a fundamental limitation of LLMs: they don't "think" or process information like humans do. Although they can generate coherent text and perform intricate calculations, they lack an understanding of basic concepts like letters and syllables.
Most LLMs are built on a deep learning architecture known as the transformer. These models don't read text the way humans do. Instead, they break the input into tokens, which can be whole words, syllables, or even individual characters, depending on the model's tokenizer. From that point on, the model works with those tokens rather than with the actual letters that make up each word, which are abstracted away.
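To make this concrete, here is a minimal sketch using OpenAI's open-source tiktoken tokenizer (chosen purely for illustration; it must be installed separately, and other models use different tokenizers that split words differently). It shows that a word reaches the model as a handful of integer IDs rather than as a sequence of letters.

```python
# A minimal sketch of tokenization, assuming the `tiktoken` package is installed
# (pip install tiktoken). The exact split varies from tokenizer to tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # an encoding used by several OpenAI models
token_ids = enc.encode("strawberry")

print(token_ids)  # a short list of integer IDs, not ten separate letters
for token_id in token_ids:
    # decode_single_token_bytes reveals the text fragment each ID stands for
    print(token_id, enc.decode_single_token_bytes(token_id))
```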
"LLMs are based on this transformer architecture, which notably is not actually reading text," explained Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, in an interview with TechCrunch. "When it sees the word ‘the,’ it has this one encoding of what ‘the’ means, but it does not know about ‘T,’ ‘H,’ ‘E.’"
This means that when you type a prompt, the model converts the text into numerical representations rather than processing the letters directly. The AI might recognize that "strawberry" is composed of "straw" and "berry," but it has no direct representation of the individual letters or the order in which they appear. As a result, it can struggle with tasks that require reading the text at a more granular level, such as counting letters.
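By contrast, the letter-counting question is trivial once the characters themselves are visible. The short sketch below assumes the "straw" plus "berry" split mentioned above; the point is that the count is recoverable here only because the pieces are still readable strings, whereas the model works with opaque IDs and learned vectors.

```python
# Counting letters is a one-liner when the characters are visible...
word = "strawberry"
print(word.count("r"))  # prints 3

# ...but a model that sees only subword pieces has no such direct view.
# Using the "straw" + "berry" split mentioned above (the actual split
# depends on the tokenizer):
pieces = ["straw", "berry"]
# The count is recoverable here only because the pieces are still strings;
# inside the model they are integer IDs and embedding vectors, where
# spelling is not explicitly represented.
print(sum(piece.count("r") for piece in pieces))  # also prints 3
```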
This issue is deeply embedded in the architecture of these models, making it difficult to rectify. The very design that allows LLMs to excel at complex tasks also limits their ability to perform simple, seemingly trivial ones, like counting letters in a word.
The failures of LLMs in these areas serve as a reminder that, despite their advanced capabilities, these AI systems are not sentient beings. They don't possess human-like understanding or reasoning. Instead, they operate based on patterns and associations learned from vast amounts of data, which sometimes leads to errors in tasks that humans find elementary.
As AI continues to evolve, addressing these limitations will be crucial, especially as LLMs are increasingly integrated into everyday applications. For now, though, the fact that they can stumble on something as simple as counting letters serves as a humbling reminder of the gap that still exists between artificial and human intelligence.