OpenAI And Google Reportedly Used Transcriptions Of YouTube Videos To Train Their AI Models

OpenAI and Google reportedly trained their AI models on text transcribed from YouTube videos, potentially infringing creators’ copyrights, according to The New York Times. The report, which describes how OpenAI, Google and Meta have gone to great lengths to maximize the amount of data they can feed their AI, cites a number of people familiar with the companies’ practices. It comes days after YouTube CEO Neal Mohan said in an interview that OpenAI’s alleged use of YouTube videos to train its new text-to-video generator, Sora, would violate the platform’s policies.

According to The New York Times, OpenAI used its Whisper speech recognition tool to transcribe more than 1 million hours of YouTube videos to train GPT-4. It was previously reported that OpenAI had used YouTube videos and podcasts to train two AI systems. OpenAI president Greg Brockman was reportedly a member of the team involved. Google’s rules do not allow “unauthorized scraping or downloading of YouTube content,” Google spokesperson Matt Bryant told The New York Times, adding that the company was unaware of any such use by OpenAI.

However, the report alleges that people within Google knew what OpenAI was doing but did not take action because Google was also using YouTube videos to train its own AI models. Google told The New York Times that it only uses videos from creators who have agreed to participate in an experimental program. Engadget has reached out to Google and OpenAI for comment.

The New York Times report also claims that Google adjusted its privacy policy in June 2022 to more broadly cover its use of publicly available content, including Google Docs and Google Sheets, to train its AI models and products. Bryant told The New York Times that this is done only with the permission of users who have opted in to Google’s experimental features, and that the company has not begun training on additional types of data based on this language change.

Author

GC Journalist

As the in-house writer for GallantCEO.com, I prefer to remain anonymous. I do not seek anything from my writing beyond the self-gratification of writing for a good cause such as this.