Date: 2023-07-23

Chatting with friends online is a treat for artificial intelligence. Tech companies are fighting over our data.

In the film “The Matrix”, people serve as batteries for the machines. In our reality, far removed from science fiction, people are not eaten by machines. Nevertheless, artificial intelligence (AI) is already being fed with something human today.

When we chat or talk about music, art or our hobbies, we generate data. AI is trained on that data. Human communication, in the form of our online conversations, has intrinsic value.

AI is polluting the internet and making itself stupider

This human-made data is becoming more and more valuable. The reason: AI is flooding the internet with empty phrases and harming itself as a result. The quality of the data available for training AI is steadily declining. At the same time, the AI race is heating up: Google, Microsoft, Meta and ChatGPT inventor OpenAI all want to be number one in the market.

The proportion of AI-generated content is currently growing rapidly on internet forums and social media. This inevitably leads to AI being fed more and more data that has already been created by AI.

“Just as we polluted the oceans with plastic waste and the atmosphere with CO₂, we are now about to fill the internet with blah.” According to a study by the Universities of Oxford and Cambridge, AI cannot learn from itself: it becomes forgetful and at some point produces only empty phrases.

Thilo Stadelmann, Professor of Artificial Intelligence at the Zurich University of Applied Sciences, explains: “If AI were to feed itself with AI-generated data in its own loop, it would not learn anything and would simply get stuck in its behavior. That behavior, that ability, is something AI can only get from humans.”

Deep conversations more valuable than chats

That’s why the big tech companies want to avoid feeding their AI with data that has already been generated by AI. So they are investing millions in technology that determines whether texts were really written by humans and not by chatbots.

Image caption: Tech companies need human-made data as raw material to train AI. It is taken from online conversations, for example. (Image: IMAGO/NurPhoto/Jaap Arriens)

It is nothing new that the big tech companies collect a lot of data. According to Martina Arioli, tech expert and self-employed lawyer, a new dimension is now being added: “Until now, it was all about personalized advertising: the user data was made available to third parties. Now human exchange has value in itself.”

According to AI expert Thilo Stadelmann, however, this human data is not always of high quality. People could spread false information or nonsense on the internet. Forums in which experts talk about music or art are more valuable for an AI than forums in which everyone just chats about everything.

Can AI be fed with our data?

Data from real people is therefore indispensable for tech companies. But who actually owns this data? According to Stadelmann, we find an answer in the small print, the terms and conditions of the tech companies. For example, the small print of the free version of ChatGPT states that the company OpenAI can do whatever it wants with the data.

AI training and data protection

Lawyer Martina Arioli is familiar with data protection and technology law: “Tech companies must clearly inform users if they use their data to train AI. This was recently not the case with Google, for example. Google adjusted its data protection declaration on July 1 and also explained there that all data that users have put online can be used as training data for all of its AI models.”

Data protection – but upside down?

Conversely, the big tech companies don’t want other companies to access their data to train AI. For example, the short message service Twitter and the forum Reddit have restricted access, after thousands of AI companies had been automatically reading data from these online platforms.

The arms race in the field of AI is therefore likely to continue. Real human communication is becoming ever scarcer and more valuable. So data from real people is becoming the new gold.

