New MIT study shows what you already knew about AI: it doesn't actually understand anything

The latest generative AI models are capable of astonishing, magical human-like output. But do they actually understand anything? That'll be a big, fat no according to the latest study from MIT (via Techspot).

More specifically, the key question is whether the large language models (LLMs) at the core of the most powerful chatbots are capable of constructing accurate internal models of the world. And the answer the MIT researchers largely came up with is no, they can't.

To find out, the MIT team developed new metrics for testing AI that go beyond simple measures of accuracy in responses and hinge on what are known as deterministic finite automata, or DFAs.

A DFA models a problem as a set of states, with fixed rules that determine which state follows each step. Among other tasks, the researchers chose navigating the streets of New York City.
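
To make the idea concrete, here is a minimal Python sketch of a DFA for a toy street grid. The intersections, the move names and the `follow` helper are all made up for illustration and are not taken from the MIT paper.

```python
# Toy DFA: states are intersections, inputs are moves, and the
# transition table says where each move leads. All names are hypothetical.
TRANSITIONS = {
    ("A", "east"): "B",
    ("B", "south"): "C",
    ("A", "south"): "D",
    ("D", "east"): "C",
}


def follow(start, moves):
    """Return the final intersection, or None if a move is not allowed."""
    state = start
    for move in moves:
        key = (state, move)
        if key not in TRANSITIONS:
            return None  # no such street from this intersection
        state = TRANSITIONS[key]
    return state


print(follow("A", ["east", "south"]))  # C
print(follow("A", ["north"]))          # None
```

The point of the formalism is that the rules are fully known, so any route a model proposes can be checked step by step against them.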

The MIT team found some generative AI models are capable of very accurate turn-by-turn driving directions in New York City, but only in ideal circumstances. When researchers closed some streets and added detours, performance plummeted. In fact, the internal maps the LLMs had implicitly built during training were full of nonexistent streets and other inconsistencies.

“I was surprised by how quickly the performance deteriorated as soon as we added a detour. If we close just 1 percent of the possible streets, accuracy immediately plummets from nearly 100 percent to just 67 percent,” says lead author on the research paper, Keyon Vafa.
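
The detour problem can be pictured with a toy example. The sketch below is not the researchers' method. It assumes a hypothetical four-intersection map, a "memorized" route that works while every street is open, and a simple breadth-first planner that consults the actual map. Closing one street breaks the memorized route, while the planner finds the detour.

```python
from collections import deque

# Hypothetical toy map: intersection -> {move: next intersection}.
STREETS = {
    "A": {"east": "B", "south": "D"},
    "B": {"south": "C"},
    "D": {"east": "C"},
    "C": {},
}


def route_is_valid(streets, start, moves, goal):
    """Check that every move exists on the map and the route ends at goal."""
    state = start
    for move in moves:
        if move not in streets[state]:
            return False
        state = streets[state][move]
    return state == goal


def plan(streets, start, goal):
    """Breadth-first search for a shortest list of moves, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, moves = queue.popleft()
        if state == goal:
            return moves
        for move, nxt in streets[state].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, moves + [move]))
    return None


memorized = ["east", "south"]  # a route that works on the open map

# Close one street: remove the A -> B connection.
closed = {k: dict(v) for k, v in STREETS.items()}
del closed["A"]["east"]

print(route_is_valid(STREETS, "A", memorized, "C"))  # True
print(route_is_valid(closed, "A", memorized, "C"))   # False
print(plan(closed, "A", "C"))                        # ['south', 'east']
```

A system that has only memorized good routes behaves like `memorized` here, whereas one with an accurate map can replan the way `plan` does.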

The core lesson here is that the remarkable accuracy of LLMs in certain contexts can be misleading. "Often, we see these models do impressive things and think they must have understood something about the world. I hope we can convince people that this is a question to think very carefully about, and we don’t have to rely on our own intuitions to answer it," says senior paper author Ashesh Rambachan.

More broadly, this research is a reminder of what's really going on with the latest LLMs. All they are actually doing is predicting what word to put next in a sequence based on having scraped, indexed and correlated gargantuan quantities of text. Reasoning and understanding are not inherent parts of that process.

What this new MIT research showed is that LLMs can do remarkably well without actually understanding any rules. At the same time, that accuracy can break down rapidly in the face of real-world variables.

Of course, this won't entirely come as news to anyone familiar with using chatbots. We've all experienced how quickly a cogent interaction with a chatbot can degrade into hallucination or just borderline gibberish following a certain kind of interrogative prodding.

But this MIT study is useful for crystallizing that anecdotal experience into a more formal explanation. We all knew that chatbots just predict words. But the incredible accuracy of some of the responses can sometimes begin to convince you that something magical might just be happening.

This latest study is a reminder that it's almost certainly not. Well, not unless incredibly accurate but ultimately mindless word prediction is your idea of magic.

Jeremy Laird
Hardware writer

Jeremy has been writing about technology and PCs since the 90nm Netburst era (Google it!) and enjoys nothing more than a serious dissertation on the finer points of monitor input lag and overshoot followed by a forensic examination of advanced lithography. Or maybe he just likes machines that go “ping!” He also has a thing for tennis and cars.
