AI video startup Runway reportedly trained on ‘thousands’ of YouTube videos without permission


AI company Runway reportedly scraped “thousands” of YouTube videos and pirated versions of copyrighted movies without permission. 404 Media obtained internal spreadsheets suggesting that the AI video startup trained its Gen-3 model using YouTube content from channels like Disney, Netflix, Pixar and popular media outlets.

A former Runway employee told the publication that the company used a spreadsheet to record lists of videos it wanted in its database. It would then download them undetected, using open-source proxy software to cover its tracks. One sheet lists simple keywords like astronaut, fairy and rainbow, with notes indicating whether the company had found relevant high-quality videos to train on. For example, the entry for the term “superhero” carries the note “Multiple movie clips.” (Indeed.)

Other notes show that Runway flagged YouTube channels for Unreal Engine, filmmaker Josh Neuman and a Call of Duty fan page as good sources for “high-action” training videos.

“The channels on this chart were a company-wide effort to find quality videos to model,” the former employee told 404 Media. “This was then used as an input to a huge web crawler that downloaded all the videos from all these channels using proxies to avoid being blocked by Google.”

Screenshot of the Runway AI homepage.

Runway

A list of nearly 4,000 YouTube channels, compiled in one of the spreadsheets, flagged “recommended channels” from CBS New York, AMC Theaters, Pixar, Disney Plus, Disney CD and the Monterey Bay Aquarium. (Because no AI model would be complete without an otter.)

In addition, Runway reportedly compiled a separate list of videos from piracy sites. A spreadsheet titled “Non-YouTube Source” includes 14 links to sources such as an unauthorized online archive of Studio Ghibli movies, anime and movie piracy sites, a fan site that showcases Xbox game videos, and kisscartoon.sh, an animation streaming site.

In what could be seen as damning confirmation that the company used the training data, 404 Media found that prompting the video generator with the names of popular YouTubers listed in the spreadsheet produced results bearing an uncanny resemblance to them. Crucially, entering the same names into Runway’s older Gen-2 model, which was trained before the alleged data in the spreadsheets was collected, produced “unrelated” results, such as generic men in suits. Additionally, after the publication contacted Runway to ask about the YouTubers’ likenesses appearing in the results, the AI tool stopped generating them altogether.

“I hope that by sharing this information, people will have a better understanding of the scale of these companies and what they do to make ‘cool’ videos,” the former employee told 404 Media.

When contacted for comment, a YouTube representative pointed Engadget to an interview its CEO Neal Mohan gave Bloomberg in April. In that interview, Mohan described training on its videos as a “flagrant violation” of its terms. “Our previous comments about this still stand,” YouTube spokesman Jack Mason wrote to Engadget.

Runway did not respond to a request for comment by the time of publication.

At least some AI companies seem to be racing to normalize their tools and establish market leadership before users and courts learn how their sausage is made. Training with permission through licensing deals is one thing, and that is another tactic that companies like OpenAI have recently adopted. But it is a far sketchier (if not illegal) proposition to treat the entire internet, copyrighted material and all, as up for grabs in a relentless race for profit and dominance.

404 Media’s excellent report is worth reading.


