Microsoft's latest speech generator is so good it's afraid to release it to the public

This thing we made is so brilliant, we can't risk releasing it to the general public. That, basically, is what Microsoft says about its latest speech generator, VALL-E 2. So, does that reflect genuine concerns? Or is it a clever marketing ruse designed to get some viral traction and online chins wagging?

If it is all completely genuine, what does it say about Microsoft that it's knowingly creating AI tools too dangerous to release? It's a conundrum, to be sure.

Anyway, here are the basic facts of the situation. Microsoft says in a recent blog post (via Extremetech) that its latest neural codec language model for speech synthesis, known as VALL-E 2, achieves "human parity for the first time".

More specifically, "VALL-E 2 can generate accurate, natural speech in the exact voice of the original speaker, comparable to human performance." Now, to some extent, this is nothing new. What's remarkable is the incredible speed with which VALL-E 2 can achieve this or, to put it another way, the incredibly limited sample or prompt it needs to pull off the feat.

VALL-E 2 can accurately mimic a specific person's voice based on a sample just a few seconds long. It pulls that trick off by using a huge training library that maps the variations in pronunciation, intonation and cadence captured in the model to the sample, then spits out what appears to be totally convincing synthesised speech.

Microsoft's blog post has a range of example audio clips demonstrating how well VALL-E 2 (and indeed its predecessor, VALL-E) can turn a short sample of between three and 10 seconds into convincing synthesised speech that's often indistinguishable from a real human voice.

It's a process known as zero-shot text-to-speech synthesis, or zero-shot TTS for short. Again, the approach is nothing new; it's the accuracy and shortness of the sample audio that's novel.

VALL-E 2

Microsoft claims VALL-E 2 is the first speech generator to achieve "human parity". (Image credit: Microsoft)

Of course, the idea of weaponising such tools to create fake content for nefarious purposes is likewise nothing new. But VALL-E 2's capabilities do seem to take the threat to a whole new level. Which is why the "Ethics Statement" appended to the blog post makes it clear that Microsoft currently has no intention of releasing VALL-E 2 to the public.

"VALL-E 2 is purely a research project. Currently, we have no plans to incorporate VALL-E 2 into a product or expand access to the public," the statement says, adding that "it may carry potential risks in the misuse of the model, such as spoofing voice identification or impersonating a specific speaker. We conducted the experiments under the assumption that the user agrees to be the target speaker in speech synthesis. If the model is generalized to unseen speakers in the real world, it should include a protocol to ensure that the speaker approves the use of their voice and a synthesized speech detection model."

Microsoft expressed similar concerns regarding its VASA-1, which can turn a still image of a person into convincing motion video. "It is not intended to create content that is used to mislead or deceive. However, like other related content generation techniques, it could still potentially be misused for impersonating humans," Microsoft said of VASA-1.

An obvious observation, perhaps, is that the problems that come with such models aren't exactly a surprise. You don't have to succeed in making the perfect speech synthesis model to imagine what might go wrong if such a tool were released to the public.

So, it's easy enough to see the problem coming, but Microsoft pressed ahead anyway. Now it claims to have achieved its aims, only to decide it's not fit for public consumption.

It does rather beg the question of what other tools it is developing that it must know in advance are too problematic for general release. And then you inevitably wonder what Microsoft's aim is in all this.

There's also the inevitable genie-and-bottle conundrum. Microsoft has made this tool and it's hard to imagine how it or something very similar doesn't eventually end up out in the wild. In short, the ethics of it all are rather confusing. Where it all ends is still anyone's guess.

Jeremy Laird
Hardware writer

Jeremy has been writing about technology and PCs since the 90nm Netburst era (Google it!) and enjoys nothing more than a serious dissertation on the finer points of monitor input lag and overshoot followed by a forensic examination of advanced lithography. Or maybe he just likes machines that go “ping!” He also has a thing for tennis and cars.
