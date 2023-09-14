To further develop the large language models on which its AI chatbot ChatGPT is based, the AI company OpenAI, like its competitors, needs more and better data. The data must be created by humans, because artificial intelligences trained on AI-generated data degenerate. Experts even speak of “digital mad cow disease”.

To get fresh, human-made content, OpenAI launched a web crawler a few weeks ago that scours the Internet for useful data, similar to what Google does for its search engine. But this confronts everyone who publishes text on the Internet with the problem that has for many months plagued creative people whose work was used, without their being asked, to train image generators: Should you simply hand over the results of your own creativity for the development of commercial AI?

Technically, it is possible to “lock out” crawlers, and media outlets such as the New York Times and the Süddeutsche Zeitung are already doing so. The article discusses the reasons for doing likewise, and how AI companies could reward their data suppliers.
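For readers wondering what locking out a crawler can look like in practice, here is a minimal sketch. It assumes the site uses a `robots.txt` file and that the crawler identifies itself as `GPTBot`, the user-agent token OpenAI documents for its crawler. The rules, the example URL, and the helper `is_allowed` are illustrative and are not taken from any publisher's actual configuration. Note that `robots.txt` is a convention crawlers choose to honor, not an enforced barrier.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block the assumed "GPTBot" user agent everywhere,
# while leaving the site open to all other crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/some-story"  # hypothetical URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", url))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", url))
```

A site that wanted to exclude only part of its content could replace `Disallow: /` with a narrower path; the check itself stays the same.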


