In the latest example of a troubling industry pattern, NVIDIA has reportedly scraped copyrighted content for AI training. On Monday, Samantha Cole of 404 Media reported that the $2.4 trillion company asked employees to download videos from YouTube, Netflix and other datasets to develop commercial AI projects. The graphics card maker is among the tech companies that seem to have adopted a “move fast and break things” ethos as they race to dominate this feverish, too often embarrassing AI gold rush.
The training was reportedly meant to develop models for products such as its Omniverse 3D world generator, self-driving car systems and “digital human” efforts.
NVIDIA defended its practice in an email to Engadget. A company spokesperson said its research was “in full compliance with the letter and spirit of copyright law”, while also arguing that IP laws protect specific expressions “but not ideas, information or data”. The company equated the practice to a person’s right to “learn facts, ideas, data or information from another source and use it to form their own expression.” Human, computer… what’s the difference?
YouTube disagrees. Spokesperson Jack Malon pointed us to a Bloomberg story from April in which CEO Neal Mohan is quoted as saying that using YouTube to train AI models would be a “flagrant violation” of its terms. “Our previous comment still stands,” YouTube’s policy communications manager wrote to Engadget.
Mohan’s April quote was a response to reports that OpenAI had trained its Sora text-to-video generator on YouTube videos without permission. A report last month suggested that the startup Runway AI followed suit.
NVIDIA employees who raised ethical and legal concerns about the practice were reportedly told by their managers that it had already been given the green light at the highest levels of the company. “It’s an executive decision,” responded Ming-Yu Liu, NVIDIA’s vice president of research. “We have umbrella approval for all data.” Others at the company have allegedly described the practice as an “open legal issue” that they will address down the road.
It all sounds a lot like Facebook’s (Meta’s) old “move fast and break things” motto, which has succeeded admirably in breaking quite a few things, including the privacy of millions of people.
In addition to YouTube and Netflix videos, NVIDIA reportedly instructed employees to train on the MovieNet movie trailer database, internal libraries of video game footage, and the GitHub-hosted video datasets WebVid (since taken down) and InternVid-10M. The latter is a dataset of 10 million YouTube video IDs.
Some of NVIDIA’s alleged training data is marked as appropriate for academic (or otherwise non-commercial) use only. HD-VG-130M, a library of 130 million YouTube videos, includes a usage license stating that it is for academic research only. NVIDIA reportedly brushed aside concerns about the academic-only terms, insisting that the datasets were fair game for its commercial AI products.
To avoid detection and bans from YouTube, NVIDIA reportedly downloaded content using virtual machines (VMs) with rotating IP addresses. In response to an employee’s suggestion to use a third-party IP address rotator tool, another NVIDIA employee wrote, “We’re on [Amazon Web Services] and restarting a [virtual machine] instance gives a new public IP. So that’s not a problem so far.”
404 Media’s full report on NVIDIA’s practices is worth reading.