This new AI can mimic human voices from only a 3-second audio sample

Your foster parents are dead
(Image credit: Tri-Star Pictures)

Humanity has taken yet another step toward the inevitable war against the machines (which we will lose) with the creation of Vall-E, an AI developed by a team of researchers at Microsoft that can produce high-quality replications of a human voice from only a few seconds of sample audio.

Vall-E isn't the first AI-powered voice tool (xVASynth, for instance, has been kicking around for a couple of years now), but it promises to exceed them all in terms of pure capability. In a paper available on arXiv, the preprint server hosted by Cornell University (via Windows Central), the Vall-E researchers say that most current text-to-speech systems are limited by their reliance on "high-quality clean data" in order to accurately synthesize high-quality speech.

"Large-scale data crawled from the Internet cannot meet the requirement, and always lead to performance degradation," the paper states. "Because the training data is relatively small, current TTS systems still suffer from poor generalization. Speaker similarity and speech naturalness decline dramatically for unseen speakers in the zero-shot scenario."

("Zero-shot scenario" in this case essentially means the ability of the AI to recreate voices without being specifically trained on them.)

Vall-E, on the other hand, is trained with a much larger and more diverse data set: 60,000 hours of English-language speech drawn from more than 7,000 unique speakers, all of it transcribed by speech recognition software. The data being fed to the AI contains "more noisy speech and inaccurate transcriptions" than that used by other text-to-speech systems, but researchers believe the sheer scale of the input, and its diversity, make it much more flexible, adaptable, and—this is the big one—natural than its predecessors.

"Experiment results show that Vall-E significantly outperforms the state-of-the-art zero-shot TTS system in terms of speech naturalness and speaker similarity," states the paper, which is filled with numbers, equations, diagrams, and other such complexities. "In addition, we find VALL-E could preserve the speaker’s emotion and acoustic environment of the acoustic prompt in synthesis."

(Image credit: Vall-E)

You can actually hear Vall-E in action on GitHub, where the research team has shared a brief breakdown of how it all works, along with dozens of samples of inputs and outputs. The quality varies: Some of the voices are notably robotic, while others sound quite human. But as a sort of first-pass tech demo, it's impressive. Imagine where this technology will be in a year, or two or five, as systems improve and the voice training dataset expands even further.

Which is, of course, why it's a problem. Dall-E, the AI art generator, is facing pushback over privacy and ownership concerns, and the ChatGPT bot is convincing enough that it was recently banned by the New York City Department of Education. Vall-E has the potential to be even more worrying because of its possible use in scam marketing calls or to reinforce deepfake videos. That may sound a bit hand-wringy, but as our executive editor Tyler Wilde said at the start of the year, this stuff isn't going away, and it's vital that we recognize the issues and regulate the creation and use of AI systems before potential problems turn into real (and real big) ones.

The Vall-E research team addressed those "broader impacts" in the conclusion of its paper. "Since VALL-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker," the team wrote. "To mitigate such risks, it is possible to build a detection model to discriminate whether an audio clip was synthesized by VALL-E. We will also put Microsoft AI Principles into practice when further developing the models."

In case you need further evidence that on-the-fly voice mimicry leads to bad places:

(Embedded YouTube video)
Andy Chalk
US News Lead

Andy has been gaming on PCs from the very beginning, starting as a youngster with text adventures and primitive action games on a cassette-based TRS80. From there he graduated to the glory days of Sierra Online adventures and Microprose sims, ran a local BBS, learned how to build PCs, and developed a longstanding love of RPGs, immersive sims, and shooters. He began writing videogame news in 2007 for The Escapist and somehow managed to avoid getting fired until 2014, when he joined the storied ranks of PC Gamer. He covers all aspects of the industry, from new game announcements and patch notes to legal disputes, Twitch beefs, esports, and Henry Cavill. Lots of Henry Cavill.
