OpenAI says it's 'impossible' to create ChatGPT without copyrighted content, as if that's somehow a good excuse

The OpenAI logo is being displayed on a smartphone with an AI brain visible in the background, in this photo illustration taken in Brussels, Belgium, on January 2, 2024. (Photo illustration by Jonathan Raa/NurPhoto via Getty Images)
(Image credit: Getty Images)

Just a couple of weeks after being sued by the New York Times over allegations that it copied and used "millions" of copyrighted news articles to train its large language models, OpenAI has told the UK's House of Lords communications and digital select committee (via The Guardian) that it has to use copyrighted materials to build its systems because otherwise, they just won't work.

Large language models (LLMs), which form the basis of AI systems like OpenAI's ChatGPT chatbot, are trained on massive amounts of data harvested from online sources in order to "learn" how to function. That becomes a problem when questions of copyright come into play. The Times' lawsuit, for instance, says Microsoft and OpenAI "seek to free-ride on The Times' massive investment in its journalism by using it to build substitutive products without permission or payment."

The Times isn't the only one taking issue with that approach: A group of 17 authors including John Grisham and George RR Martin filed suit against OpenAI in 2023, accusing it of "systematic theft on a mass scale."

In its presentation to the House of Lords, OpenAI doesn't deny the use of copyrighted materials, but instead says it's all fair use—and anyway, it simply has no choice. "Because copyright today covers virtually every sort of human expression—including blog posts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today's leading AI models without using copyrighted materials," it wrote.

"Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today's citizens."

I don't find it a particularly compelling argument. If I, for instance, got busted knocking over a bank, I don't think it would carry much weight with the cops if I told them that it was the only way to provide myself with the money that meets the needs of me. That is admittedly a bit simplistic, and it's possible that OpenAI's lawyers will be able to successfully argue that using copyrighted materials without permission to train its LLMs falls within the confines of fair use. But to my ear the justification for using copyrighted works without a green light from the original creator ultimately boils down to, "But we really, really wanted to."

Fair use is central to OpenAI's position that the use of copyrighted materials doesn't actually break any rules. It said in its filing with the House of Lords that "OpenAI complies with the requirements of all applicable laws, including copyright laws," and went deeper on that point in an update released today.

"Training AI models using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents," OpenAI wrote. "We view this principle as fair to creators, necessary for innovators, and critical for US competitiveness.

"The principle that training AI models is permitted as a fair use is supported by a wide range of academics, library associations, civil society groups, startups, leading US companies, creators, authors, and others that recently submitted comments to the US Copyright Office. Other regions and countries, including the European Union, Japan, Singapore, and Israel also have laws that permit training models on copyrighted content—an advantage for AI innovation, advancement, and investment."

OpenAI also drew a hard line against the New York Times' lawsuit in the update, essentially accusing the Times of ambushing it in the midst of partnership negotiations. Perhaps taking a lesson from Twitter, which accused Media Matters of manipulating "inorganic combinations of advertisements and content" in order to make pro-Nazi ads appear next to posts by major advertisers, OpenAI also said the Times "manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate" its content and style, a central element of complaints against AI.

"Even when using such prompts, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts," OpenAI wrote.

OpenAI said in its House of Lords filing that it is "continuing to develop additional mechanisms to empower rightsholders to opt out of training," and is pursuing deals with various agencies like the one it signed with the Associated Press in 2023 that it hopes will "yield additional partnerships soon." But to me that lands like a "forgiveness instead of permission" approach: OpenAI is already scraping this stuff anyway, so agencies and outlets might as well sign some kind of deal before a court rules that AI companies can do whatever they want.

Andy Chalk
US News Lead

Andy has been gaming on PCs from the very beginning, starting as a youngster with text adventures and primitive action games on a cassette-based TRS-80. From there he graduated to the glory days of Sierra Online adventures and Microprose sims, ran a local BBS, learned how to build PCs, and developed a longstanding love of RPGs, immersive sims, and shooters. He began writing videogame news in 2007 for The Escapist and somehow managed to avoid getting fired until 2014, when he joined the storied ranks of PC Gamer. He covers all aspects of the industry, from new game announcements and patch notes to legal disputes, Twitch beefs, esports, and Henry Cavill. Lots of Henry Cavill.
