New ChatGPT bot is out, promises to hallucinate less

ChatGPT logo on a trippy green background.
(Image credit: MirageC / Getty/ OpenAI)

OpenAI is bringing us GPT-4, the next evolution of everyone's favourite chatbot, ChatGPT. On top of a more advanced language model that "exhibits human-level performance on various professional and academic tests", the new version accepts image inputs and promises more stringent refusal behaviour to stop it from fulfilling your untoward requests.

The accompanying GPT-4 Technical Report (PDF) warns, however, that the new model still has a relatively high capacity for what the researchers are calling "hallucinations". Which sounds totally safe.

What the researchers mean when they refer to hallucinations is that the new ChatGPT model, much like the previous version, has the tendency to "produce content that is nonsensical or untruthful in relation to certain sources." 

The researchers do make it clear, though, that "GPT-4 was trained to reduce the model’s tendency to hallucinate by leveraging data from prior models such as ChatGPT." Not only are they training it on its own fumbles, then, but they've also been training it through human evaluation.

"We collected real-world data that had been flagged as not being factual, reviewed it, and created a ’factual’ set for it where it was possible to do so. We used this to assess model generations in relation to the ’factual’ set, and facilitate human evaluations."

The process appears to have helped significantly when it comes to closed topics, though the chatbot is still having trouble when it comes to the broader strokes. As the paper notes, GPT-4 scores 29 percentage points higher than GPT-3.5 at avoiding 'closed-domain' hallucinations, but only 19 percentage points higher at avoiding 'open-domain' hallucinations.

ITNEXT explains the difference between open- and closed-domain, in that "Closed-domain QA is a type of QA system that provides answers based on a limited set of information within a specific domain or knowledge base." Open-domain QA systems instead "provide answers based on a vast array of information available on the internet, and is best suited for specific, limited information needs."

So yeah, we're still likely to see GPT-4 straight up lying to us about stuff.

Of course, users are going to be upset about the chatbot feeding them false information, though this isn't the biggest problem. One of the main issues is "overreliance". The tendency to hallucinate "can be particularly harmful as models become increasingly convincing and believable, leading to overreliance on them by users," the paper says.

"Counterintuitively, hallucinations can become more dangerous as models become more truthful, as users build trust in the model when it provides truthful information in areas where they have some familiarity." It's natural for us to trust a source if it's been accurate before, but even a broken clock is right twice a day, as they say.

Overreliance becomes particularly problematic when the chatbot is integrated into automated systems that help us make decisions within society. This can cause a feedback loop that can lead to "degradation of overall information quality."

"It’s crucial to recognize that the model isn’t always accurate in admitting its limitations, as evidenced by its tendency to hallucinate."

Issues aside, the devs seem pretty optimistic about the new model, at least according to the GPT-4 overview on the OpenAI site.

"We found and fixed some bugs and improved our theoretical foundations. As a result, our GPT-4 training run was (for us at least!) unprecedentedly stable".

We'll see about that when it starts up with the gaslighting again, though the meltdowns we've been hearing about are mostly coming through Bing's ChatGPT integration.

GPT-4 is available right now for ChatGPT Plus users, though even paying customers should expect the service to be "severely capacity constrained".

Katie Wickens
Hardware Writer

Screw sports, Katie would rather watch Intel, AMD and Nvidia go at it. Having been obsessed with computers and graphics for three long decades, she took Game Art and Design up to Masters level at uni, and has been rambling about games, tech and science—rather sarcastically—for four years since. She can be found admiring technological advancements, scrambling for scintillating Raspberry Pi projects, preaching cybersecurity awareness, sighing over semiconductors, and gawping at the latest GPU upgrades. Right now she's waiting patiently for her chance to upload her consciousness into the cloud.
