New MIT study shows what you already knew about AI: it doesn't actually understand anything

The OpenAI logo is being displayed on a smartphone with an AI brain visible in the background, in this photo illustration taken in Brussels, Belgium, on January 2, 2024. (Photo illustration by Jonathan Raa/NurPhoto via Getty Images)
(Image credit: MIT)

The latest generative AI models are capable of astonishing, magical human-like output. But do they actually understand anything? That'll be a big, fat no according to the latest study from MIT (via Techspot).

More specifically, the key question is whether the large language models (LLMs) at the core of the most powerful chatbots are capable of constructing accurate internal models of the world. And the answer the MIT researchers largely came up with is no, they can't.

To find out, the MIT team developed new metrics for testing AI that go beyond simple measures of accuracy in responses and hinge on what are known as deterministic finite automata, or DFAs.

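A DFA describes a problem as a sequence of states, with a fixed set of rules governing how you can move from one state to the next. Among other tasks, the researchers chose navigating the streets of New York City, where each intersection is a state and the street layout sets the rules.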
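To make the idea concrete, here is a minimal sketch of a DFA for a made-up four-intersection street grid. The intersection names, the `TRANSITIONS` table and the `follow` function are all invented for illustration and are not taken from the MIT paper.

```python
# Toy DFA: states are intersections, inputs are turns, and the transition
# table is the rule set. Every name here is hypothetical.
TRANSITIONS = {
    ("A", "east"): "B",
    ("A", "south"): "C",
    ("B", "south"): "D",
    ("C", "east"): "D",
}


def follow(start, moves, transitions=TRANSITIONS):
    """Return the final intersection, or None if any move breaks the rules."""
    state = start
    for move in moves:
        key = (state, move)
        if key not in transitions:
            return None  # no such street from this intersection
        state = transitions[key]
    return state


print(follow("A", ["east", "south"]))  # a legal route ending at D
print(follow("A", ["north"]))          # an illegal move, so None
```

A system that had truly learned this world would, in effect, carry the whole transition table around in its head, not just a list of routes that happened to work.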

The MIT team found some generative AI models are capable of very accurate turn-by-turn driving directions in New York City, but only in ideal circumstances. When researchers closed some streets and added detours, performance plummeted. In fact, the internal maps the LLMs implicitly built during training were full of nonexistent streets and other inconsistencies.

“I was surprised by how quickly the performance deteriorated as soon as we added a detour. If we close just 1 percent of the possible streets, accuracy immediately plummets from nearly 100 percent to just 67 percent,” says lead author on the research paper, Keyon Vafa.
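A rough way to picture why a detour hurts so much is to compare a memorized route with a planner that actually holds the map. The sketch below uses an invented four-intersection map and a standard breadth-first search; it is an analogy for the finding, not the researchers' evaluation method, and every name in it is hypothetical.

```python
from collections import deque

# Hypothetical map: each intersection lists the intersections it leads to.
STREETS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}
MEMORIZED_ROUTE = ["A", "B", "D"]  # a route that worked before any closures


def route_is_valid(route, streets):
    """Check that every consecutive pair in the route is a real street."""
    return all(b in streets[a] for a, b in zip(route, route[1:]))


def shortest_route(streets, start, goal):
    """Breadth-first search over the map; returns a path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in streets[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Close the street from A to B and see what survives.
closed = {a: [b for b in nbrs if (a, b) != ("A", "B")]
          for a, nbrs in STREETS.items()}

print(route_is_valid(MEMORIZED_ROUTE, STREETS))  # the old route works here
print(route_is_valid(MEMORIZED_ROUTE, closed))   # and fails after the closure
print(shortest_route(closed, "A", "D"))          # the planner finds a detour
```

The memorized route breaks the moment one street disappears, while anything working from a correct map simply routes around the closure.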

The core lesson here is that the remarkable accuracy of LLMs in certain contexts can be misleading. "Often, we see these models do impressive things and think they must have understood something about the world. I hope we can convince people that this is a question to think very carefully about, and we don’t have to rely on our own intuitions to answer it," says senior paper author Ashesh Rambachan.

More broadly, this research is a reminder of what's really going on with the latest LLMs. All they are actually doing is predicting what word to put next in a sequence based on having scraped, indexed and correlated gargantuan quantities of text. Reasoning and understanding are not inherent parts of that process.
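For a feel of what "predicting the next word" means at its very simplest, here is a toy predictor that just counts which word most often follows another. Real LLMs use large neural networks rather than a lookup table of word pairs, so treat this as a deliberately crude analogy; the corpus and the function names are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it and how often."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up training text.
corpus = ("turn left on broadway then turn left on canal "
          "then turn right on bowery")
model = train_bigrams(corpus)

print(predict_next(model, "turn"))    # "left" follows "turn" most often
print(predict_next(model, "harlem"))  # never seen in training, so None
```

The predictor gives sensible-looking directions for anything resembling its training text, yet it has no map at all, which is the gap the MIT metrics are designed to expose.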

What this new MIT research showed is that LLMs can do remarkably well without actually understanding any rules. At the same time, that accuracy can break down rapidly in the face of real-world variables.

Of course, this won't entirely come as news to anyone familiar with using chatbots. We've all experienced how quickly a cogent interaction with a chatbot can degrade into hallucination or just borderline gibberish following a certain kind of interrogative prodding.

But this MIT study is useful for crystallizing that anecdotal experience into a more formal explanation. We all knew that chatbots just predict words. But the incredible accuracy of some of the responses can sometimes begin to convince you that something magical might just be happening.

This latest study is a reminder that it's almost certainly not. Well, not unless incredibly accurate but ultimately mindless word prediction is your idea of magic.

Jeremy Laird
Hardware writer

Jeremy has been writing about technology and PCs since the 90nm Netburst era (Google it!) and enjoys nothing more than a serious dissertation on the finer points of monitor input lag and overshoot followed by a forensic examination of advanced lithography. Or maybe he just likes machines that go “ping!” He also has a thing for tennis and cars.