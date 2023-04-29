Many millions of users, enormous amounts of data: ChatGPT is a challenge to data protection and has so far been subject to little control. Although there is a first ban in Italy, most authorities are still waiting for answers. In an interview with MIT Technology Review, Marit Hansen, data protection officer for the state of Schleswig-Holstein, explains how the process is coordinated, how she herself assesses ChatGPT and whether the EU General Data Protection Regulation could stop ChatGPT in Europe.

Ms. Hansen, as the top data protection officer in the federal state of Schleswig-Holstein, you deal with large language models, or LLMs for short. Do we even know what is currently in these models and where all the information comes from?

At the moment, for almost no model is it disclosed what was fed into it. The same goes for how the models were further trained and manually adjusted by people, and what they were trained not to do. We still have a lot of question marks. Of course, this means that we as data protection officers have to understand this before we can judge it, for example with regard to whether the sources of the training data were legitimate. Is there any legal basis for using this data at all? This is a central question from a data protection point of view, and it must be answered quickly in any case.

Offers for the European market

In your opinion, should an LLM like ChatGPT even have been launched in Europe?

The EU data protection requirements apply to the European market, and therefore also to ChatGPT, which is offered in Europe, even if the manufacturer OpenAI does not have a branch here. If personal data is processed, OpenAI, like everyone else who processes such data, must comply with the General Data Protection Regulation (GDPR). For example: ensure there is a legal basis. Fulfill the information requirements. Enable data subjects to exercise their rights of access, rectification and, under certain circumstances, erasure. Ensure the security of processing. Implement data protection by design and by default. If a high risk is to be expected, carry out a data protection impact assessment. These are all criteria of which I would say: whoever brings an offering onto the market in Europe has implemented them, and the answers to our questions are already in the drawer.

How do you check whether OpenAI complies with all of this?

OpenAI does not yet have a branch in Europe. So the question arises: who is responsible? Normally there is exactly one lead authority, which is determined by the location of the branch. In many cases, such as Facebook or Google, this is the data protection regulator in Ireland. That does not apply to OpenAI, which has no branch in the EU. Instead, all data protection supervisory authorities are equally responsible. In the case of ChatGPT, there were so many inquiries that we, as state commissioners for data protection, considered a timely examination to be important. That is why we are now doing this across Germany, coordinated among various state data protection officers, and we are also coordinating at the European level.

In Italy, the data protection authority has already acted and even banned ChatGPT. How did the Italian colleagues justify that? And is that something that could also be interesting for Germany from your perspective?

The Italian colleagues focused on certain points, such as the legal bases, information obligations, data subject rights and the protection of children, and found that answers they would have expected from a data protection point of view were missing. We consider the Italian colleagues' assessment to have been understandable at the time. Now we want to ask more questions and obtain information, also because things are constantly changing. It is already clear that the situation with GPT-4 is different today than it was when Italy intervened.

“Then the information must be on the table”

What deadlines have you set for OpenAI?

The deadline in the Schleswig-Holstein cover letter is June 7th. That would be exactly six weeks after the request was sent. In my opinion, that is fair for examining an offer from the USA. Fortunately there is e-mail, so not everything has to go by post. And OpenAI immediately confirmed that the request had arrived. I assume that the deadline will be met. These are lengthy questions, spanning six pages. So we are expecting the answers at the beginning of June. Maybe we will have follow-up questions, maybe it will take a while, so it may run into the summer. But then the information must be on the table, and it will then be evaluated. Our examination is open-ended. Similar requests are on the way or in preparation from my colleagues in Hesse, Rhineland-Palatinate, Baden-Württemberg and other states.

From a data protection point of view, there seem to be two sets of problems. Personal data may have been used when training the models. And when using the systems, OpenAI collects a lot of data. And no one knows what happens to it. What’s the bigger problem?

I can’t assess that at the moment. Training would theoretically also be possible without personal data. In addition, however, new personal data is collected through use. We receive many inquiries from data subjects who are concerned about using ChatGPT for advice or counseling. In such dialogues, things quickly become personal for many people. There are people who open up and type in many, perhaps even intimate, details. How is that evaluated? What happens to it? Of course, as the data protection supervisory authority, we want to know that very precisely.