A leaked document indicates Runway's Gen-3 AI video generation tool may have been trained on YouTube videos and copyrighted content without permission

Here's a question that can throw a generative AI company into a twist: "What content has been used to train your models?" While some opt to dodge the question, and others bullishly front out the issue entirely, the question of whether an AI company has scraped content for its own business purposes without permission is a thorny one. 

At best, you're likely to get a mealy-mouthed explanation of "curated datasets", and at worst, a polemic about whether everything on the internet is essentially fair game.

Now a document obtained by 404media appears to show that part of the data used to train Runway's latest AI video generation tool, Gen-3, may have come from the YouTube channels of thousands of popular media companies, including Pixar, Netflix, Disney and Sony.

While 404media doesn't go into details as to how the document was obtained, nor could it verify that every video mentioned within was used to train Gen-3, it's potentially an insight into the sort of practices that an AI company might use to scrape copyrighted material to train its models.

A former Runway employee spoke to 404media about the methodology involved. The 14 spreadsheets contained within the leaked document are said to feature terms like "beach" or "rain", with the names of Runway employees next to them. 

According to the source, the names belonged to employees tasked with finding videos or channels related to these keywords. They would then use a YouTube video downloader tool, routed through a proxy, to scrape the videos from the site without being blocked by Google.

It's not just YouTube content that looks to have been scraped, either. One spreadsheet contains 14 links to non-YouTube sources, including a link to a website dedicated to streaming popular cartoons and animated movies that has had thousands of copyright complaints logged against it.

Essentially, pirated media looks to have been at least under consideration for training data, if not directly scraped and used.

404media actually went one step further, and attempted to use Gen-3 to generate video using prompts that contained keywords based on the terms found in the spreadsheet, and was able to create clips that looked to be very much in the same style as the associated content.

Runway was itself part-funded by Google, among others, so scraping content from creators on Google's own platform without permission, if true, is likely to land it in significant hot water, never mind the potential wider legal repercussions.

Still, thorny as the question of AI content theft is, the model itself appears to have problems of its own. Ars Technica recently tried creating some videos with Gen-3 Alpha, and it gave a cat a pair of human hands. I'm not sure what content was used to train that particular version of the model, but whatever the methodology, it could do with some work.

Andy Edser
Hardware Writer
