Look out, OpenAI's latest chatbot hallucinates less and might even count to three

OpenAI logo displayed on a phone screen and ChatGPT website displayed on a laptop screen are seen in this illustration photo taken in Krakow, Poland on December 5, 2022.
(Image credit: Jakub Porzycki/NurPhoto via Getty Images)

OpenAI has unleashed yet another new chatbot on us poor, unsuspecting humans. We give you o1, a chatbot designed for more advanced reasoning that's claimed to be better at things like coding, math and generally solving multistep problems.

Perhaps the most significant change from previous OpenAI LLMs is a shift from mimicking patterns found in text training data to a focus on more direct problem solving, courtesy of reinforcement learning. The net result is said to be a more consistent, accurate chatbot.

“We have noticed that this model hallucinates less,” OpenAI’s research lead, Jerry Tworek, told The Verge. Of course, "hallucinates less" doesn't mean no hallucinations at all. “We can’t say we solved hallucinations,” Tworek says. Ah.

Still, o1 is said to use something akin to a “chain of thought” that's similar to how we humans process problems, step-by-step. That contributes to much higher claimed performance in tasks like coding and math.

Apparently, o1 scored 83% in a qualifying exam for the International Mathematical Olympiad, far better than the rather feeble 13% notched up by GPT-4o. It has also performed well in coding competitions, and OpenAI says an imminent further update will enable it to match PhD students “in challenging benchmark tasks in physics, chemistry and biology.”

However, despite these advances, or perhaps because of them, this new bot is actually worse by some measures. It has fewer facts about the world at its fingertips and it can't browse the web or process images. It's also currently slower to respond and spit out answers than GPT-4o.

Of course, one immediate question that follows from all this is whether this new chatbot still suffers from any of the surprising limitations of previous bots. Can o1, for instance, even count to three?

Apparently, yes, it can. GPT-4o can be flummoxed when ordered to count the number of "r's" in the word "strawberry", only managing to count to two. But o1 gets all the way to three.
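
For reference, the task that trips up GPT-4o is trivial in ordinary code. Here's a minimal Python sketch; the helper name `count_letter` is our own invention for illustration, not anything OpenAI has published:

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


# The "strawberry" test described above.
r_count = count_letter("strawberry", "r")
print(r_count)
```

Run it and you get three, the same answer o1 is said to reach.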

That step-change in counting ability, however, doesn't come cheap. Developer access costs $15 per 1 million input tokens and $60 per 1 million output tokens. That's three times and four times the price of GPT-4o, respectively.
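
To get a feel for what those rates mean per request, here's a rough Python sketch. The o1 prices are the ones quoted above. The GPT-4o prices are not stated directly, so they are back-calculated from the "three times and four times" multiples, and the request size (2,000 tokens in, 1,000 out) is a made-up example:

```python
# Prices in US dollars per 1 million tokens.
O1_INPUT_PRICE = 15.00   # quoted above
O1_OUTPUT_PRICE = 60.00  # quoted above

# Assumption: derived from the stated 3x and 4x multiples, not quoted directly.
GPT4O_INPUT_PRICE = O1_INPUT_PRICE / 3
GPT4O_OUTPUT_PRICE = O1_OUTPUT_PRICE / 4


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Return the dollar cost of one request at per-million-token prices."""
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


# Hypothetical request: 2,000 input tokens and 1,000 output tokens.
o1_cost = request_cost(2_000, 1_000, O1_INPUT_PRICE, O1_OUTPUT_PRICE)
gpt4o_cost = request_cost(2_000, 1_000, GPT4O_INPUT_PRICE, GPT4O_OUTPUT_PRICE)

print(f"o1:     ${o1_cost:.3f}")
print(f"GPT-4o: ${gpt4o_cost:.3f}")
```

For this example request that works out to roughly nine cents on o1 against two and a half cents on GPT-4o. The overall multiple depends on the input-to-output mix, so it lands somewhere between three and four times.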

ChatGPT Plus and Team users reportedly already have access to the initial version of the bot, known as o1-preview. Meanwhile, in future a version called o1-mini will be made available for free, though OpenAI hasn't put a date on that.

All told, it certainly sounds like a bot capable of more reliable responses, along with more practical reasoning, is a step towards something both more useful in the real world and closer to general or human-like intelligence.

That, indeed, is OpenAI's plan. “We have been spending many months working on reasoning because we think this is actually the critical breakthrough,” OpenAI’s chief research officer, Bob McGrew, says. “Fundamentally, this is a new modality for models in order to be able to solve the really hard problems that it takes in order to progress towards human-like levels of intelligence.”

Anyway, if it really can count to three, colour me impressed. And as a routine precaution it goes without saying that I for one welcome, well, you know the rest.

Jeremy Laird
Hardware writer

Jeremy has been writing about technology and PCs since the 90nm Netburst era (Google it!) and enjoys nothing more than a serious dissertation on the finer points of monitor input lag and overshoot followed by a forensic examination of advanced lithography. Or maybe he just likes machines that go “ping!” He also has a thing for tennis and cars.