AI Video Startup Runway Reportedly Used YouTube Videos Without Permission
Runway, an AI video startup, allegedly trained its Gen-3 model on “thousands” of YouTube videos and pirated movies without obtaining permission. The allegation comes from 404 Media, which says it obtained internal Runway spreadsheets detailing the sources of the company’s training data.
Unauthorized Data Collection
According to the report, Runway targeted content from high-profile YouTube channels, including those owned by Disney, Netflix, Pixar, and various popular media outlets. A former Runway employee said the company flagged the videos it wanted and downloaded them using open-source proxy software to avoid detection.
Specifics of the Training Data
The spreadsheets contained keywords like “astronaut,” “fairy,” and “rainbow,” with annotations about the quality and type of videos found. For instance, the keyword “superhero” had a note stating, “Lots of movie clips.” Additionally, channels related to Unreal Engine, filmmaker Josh Neuman, and a Call of Duty fan page were noted for providing “high movement” training videos.
One spreadsheet listed nearly 4,000 YouTube channels, including CBS New York, AMC Theatres, Pixar, Disney Plus, and the Monterey Bay Aquarium. The company also compiled a list of videos from piracy sites, including unauthorized archives of Studio Ghibli films and other popular content.
Evidence of Unauthorized Use
404 Media tested Runway’s video generator by prompting it with the names of popular YouTubers listed in the spreadsheets. The outputs closely resembled those creators’ videos, while older versions of the AI model produced unrelated results. After 404 Media contacted Runway, the tool stopped generating these specific results, which could suggest a deliberate attempt to avoid detection.
Industry Response
A YouTube spokesperson reiterated the company’s stance that using its videos for AI training without permission is a “clear violation” of its terms of service. Runway did not respond to requests for comment before the publication of the report.
Legal and Ethical Implications
The report highlights the unresolved conflict between AI training data practices and intellectual property rights. While some companies, like OpenAI, are moving towards licensing deals for training data, others appear to be using publicly available content without permission. The practice raises legal and ethical questions as the race for AI development continues.