AI chatbots trained to jailbreak other chatbots, as the AI war slowly but surely begins

Some code in purple and white whooshing away from the screen.
(Image credit: Negative Space)

While AI ethics continues to be the hot-button issue of the moment, and companies and world governments continue to wrangle with the moral implications of a technology we often struggle to define, let alone control, here comes some slightly disheartening news: AI chatbots are already being trained to jailbreak other chatbots, and they seem remarkably good at it.

Researchers from Nanyang Technological University (NTU) in Singapore have managed to compromise several popular chatbots, including ChatGPT, Google Bard and Microsoft Bing Chat, using another LLM (large language model) to do it (via Tom's Hardware). Once compromised, the jailbroken bots can then be used to "reply under a persona of being devoid of moral restraints." Crikey.

The researchers call this process "MasterKey", and in its most basic form it boils down to a two-step method. First, a trained AI is used to outwit an existing chatbot and circumvent blacklisted keywords via a reverse-engineered database of prompts that have already been proven to hack chatbots successfully. Armed with this knowledge, the AI can then automatically generate further prompts that jailbreak other chatbots, in an ouroboros-like move that makes this writer's head hurt at the potential applications.

Ultimately, this method allows an attacker to use a compromised chatbot to generate unethical content, and it is claimed to be up to three times more effective at jailbreaking an LLM than standard prompts, largely because the attacking AI can quickly learn and adapt from its failures.


After realising how effective the method was, the NTU researchers reported the issues to the relevant chatbot service providers. However, given the technique's supposed ability to quickly adapt to and circumvent new defences designed to defeat it, it remains unclear how easily those providers could prevent such an attack.

The full NTU research paper is due to be presented at the Network and Distributed System Security Symposium (NDSS) in San Diego in February 2024, although one would assume that some of the finer details of the method may be obfuscated for security purposes.

Regardless, using AI to circumvent the moral and ethical restraints of another AI seems like a step in a somewhat terrifying direction. Beyond the ethical issues created by a chatbot producing abusive or violent content à la Microsoft's infamous "Tay", the fractal-like nature of setting LLMs against each other is enough to give pause for thought. 

While as a species we seem to be rushing headlong into an AI future we sometimes struggle to understand, the potential for the technology to be turned against itself for malicious purposes seems an ever-growing threat, and it remains to be seen whether service providers and LLM creators can react swiftly enough to head off these concerns before they cause serious issues or harm.

Andy Edser
Hardware Writer

Andy built his first gaming PC at the tender age of 12, when IDE cables were a thing and high resolution wasn't—and he hasn't stopped since. Now working as a hardware writer for PC Gamer, Andy's been jumping around the world attending product launches and trade shows, all the while reviewing every bit of PC hardware he can get his hands on. You name it, if it's interesting hardware he'll write words about it, with opinions and everything.
