Here’s a question that can embarrass a generative AI company: “What content was used to train your models?” While some companies deflect the question and others refuse to answer it outright, whether an AI company scraped content without permission for its own business purposes is a thorny issue.
At best, you’ll probably get a vague reference to “selected data sets”; at worst, a polemic about how everything on the internet is fundamentally fair game.
Now, a document obtained by 404 Media suggests that some of the data used to train Runway’s latest AI video-generation tool, Gen-3, may have come from the YouTube channels of thousands of popular media companies, including Pixar, Netflix, Disney, and Sony.
While 404 Media doesn’t say how it obtained the document, and can’t verify that every video referenced in it was used to train Gen-3, the document offers insight into the kinds of practices an AI company might use to acquire copyrighted material for training its models.
A former Runway employee spoke to 404 Media about the methodology used. The leaked document allegedly includes 14 spreadsheets containing terms like “beach” or “rain,” with the names of Runway employees listed next to them.
According to the source, the people named were allegedly employees tasked with finding videos or channels matching those keywords. They would then run a YouTube video downloader through a proxy server to pull videos from the site without being blocked by Google.
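For illustration only, a keyword-driven download workflow of the kind described could be sketched with the publicly available yt-dlp library, which supports routing requests through a proxy. The keywords, proxy address, and output path below are hypothetical placeholders, not details from the leaked document, and this is not presented as Runway’s actual tooling.

```python
# Illustrative sketch only: keywords, proxy URL, and output template are
# hypothetical placeholders, not values taken from the leaked document.
import yt_dlp

KEYWORDS = ["beach", "rain"]  # example terms like those reportedly in the spreadsheets
PROXY = "http://proxy.example.com:8080"  # hypothetical proxy server

ydl_opts = {
    "proxy": PROXY,                             # route traffic through the proxy
    "format": "mp4",                            # request an mp4 download
    "outtmpl": "downloads/%(title)s.%(ext)s",   # where to save each clip
    "quiet": True,
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    for keyword in KEYWORDS:
        # "ytsearch5:<term>" tells yt-dlp to fetch the top 5 search results
        ydl.download([f"ytsearch5:{keyword}"])
```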
It appears that it wasn’t just YouTube content that was scraped. One spreadsheet contains 14 links to non-YouTube sources, including a link to a website dedicated to streaming popular cartoons and animated movies that has drawn thousands of copyright complaints.
In general, pirated media seems to be at least considered in the creation of training data, if not directly collected and used.
404 Media took it a step further and used Gen-3 to generate videos from prompts based on keywords found in the spreadsheets, producing clips that looked strikingly similar to the content associated with those terms.
Runway itself is partially funded by Google, among others, so scraping content from Google’s platforms without creators’ permission, if true, would likely land the company in serious trouble, to say nothing of the potential for wider legal repercussions.
Still, thorny as the issue of AI content theft is, the model itself has problems of its own. Ars Technica recently tried making videos with Gen-3 Alpha, and it gave a cat a pair of human hands. I’m not sure what content was used to train this particular version of the model, but whatever the methodology, it clearly still needs work.