a million hours of YouTube videos to train GP…

by admin
In artificial intelligence, the shortage of high-quality training data is a growing challenge. OpenAI reportedly took a controversial approach to overcome this obstacle: transcribing more than a million hours of YouTube videos and using them to train its language model GPT-4. Many experts question whether this violates the copyright of the people who wrote and shot those videos.

OpenAI: One million hours of YouTube videos to train GPT-4

Desperate for training data, OpenAI reportedly developed the audio transcription model Whisper to transcribe a vast number of YouTube videos. According to the New York Times (via The Verge), OpenAI was aware of the dubious legality of this practice, but considered it “fair use”. The New York newspaper, which is suing the company for copyright infringement, reports that OpenAI president Greg Brockman was personally involved in collecting the videos used.

Spokeswoman Lindsay Held stated that OpenAI curates “unique” datasets for each model, using “numerous sources including publicly available data and non-public data partnerships”. But the article reveals that OpenAI had run out of useful data in 2021, prompting it to consider transcribing YouTube videos, podcasts, and audiobooks after going through other resources such as GitHub code and Quizlet content.

Google commented that it has “seen unconfirmed reports” about GPT-4 being trained on YouTube content. However, it explained that, to train its own Gemini AI model, it collects transcripts from YouTube in accordance with its agreements with creators, while prohibiting the “scraping or unauthorized downloading of content”.

Between AI and copyright

OpenAI’s choice to transcribe and use YouTube videos to train GPT-4 raises legal and ethical questions. Although the company considers this to be “fair use,” the practice may violate both copyright law and YouTube’s usage policies.

As AI companies look for ways to address the shortage of training data (besides OpenAI and Google, there are also Meta and others), it remains to be seen how intellectual property rights and privacy will be managed. It is a topic that, we are sure, will come up again.

Source

The New York Times
