OpenAI and Google reportedly used transcriptions of YouTube videos to train their AI models

OpenAI and Google have trained their AI models on text transcribed from YouTube videos, which may infringe on creators’ copyrights, according to a New York Times report. The report, which describes the lengths to which OpenAI, Google and Meta are going to increase the amount of data they can feed into their AI, cites multiple people with knowledge of the companies’ practices. It comes days after an interview in which YouTube CEO Neal Mohan discussed OpenAI’s alleged use of YouTube videos to train its new text-to-video generator, Sora.

According to the NYT, OpenAI used its Whisper speech recognition tool to transcribe over a million hours of YouTube videos, which were then used to train GPT-4. It was previously reported that OpenAI used YouTube videos and podcasts to train two AI systems. OpenAI president Greg Brockman is reportedly among the people on the team involved. According to Google’s rules, “unauthorized ripping or downloading of YouTube content” is not allowed, Google spokesman Matt Bryant told the NYT. He also said the company was unaware of any such use by OpenAI.

However, the report claims that there are people at Google who knew but didn’t take action against OpenAI, because Google itself uses YouTube videos to train its AI models. Google told the NYT it does so only with videos from creators who have agreed to participate in an experimental program. Engadget has reached out to Google and OpenAI for comment.

The NYT report also claims that Google changed its privacy policy in June 2022 to more broadly cover the use of publicly available content, including Google Docs and Google Sheets, to train its AI models and products. Bryant told the NYT that this is only done with the permission of users who opt in to Google’s experimental features, and that the company “has not begun training for additional data types based on this language change.”
