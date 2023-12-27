2023 ends as it began, with a charge of copyright infringement against a generative AI. This time the target is bigger than in January: at the beginning of the year it was Getty that took Stable Diffusion to court; now it is the New York Times suing OpenAI (and therefore Microsoft) over ChatGPT.

The famous American newspaper has explained that it intends to defend its copyright, and said online (here) that millions of its articles were allegedly used to train Sam Altman’s popular chatbot, which now (paradoxically) competes with it as a reliable source of information.

The reasons for the lawsuit

According to the NYT, the two companies (OpenAI has by now become a branch of Microsoft) allegedly exploited its content without permission to create their AI, including very well-known (and very profitable) products such as ChatGPT and Copilot. The lawsuit, which could have significant repercussions for the world of information, not least in light of the recent agreement between Apple and some publishers to train its AI on news, followed months of commercial negotiations between the three companies, which did not lead to any agreement. As the news spread, New York Times shares rose 0.25% on the stock market, while Microsoft’s lost 0.2%.

It’s not the first time this has happened (and it probably won’t be the last), because this training method, that is, reading millions and millions of pages online and absorbing them, is the main one for more or less all AI. And here the first problem arises, as we have often explained on Italian Tech: who owns the original sources from which ChatGPT, Copilot and other products from OpenAI and Microsoft learned to do what they do? According to the New York Times, they belong to the New York Times, which should be paid for this type of use. Or at least be told that all this is happening.

The other problem is more general and concerns the use made of this material (the content of journalists’ articles, photos, images, drawings and works of art shown online): collected in huge databases, it is usually made available for free, as long as it is not used for profit (this is the concept of Fair Use and it is explained here). Which is definitely the opposite of what OpenAI and Microsoft are doing with their products.

The many dark sides of AI

In the text of the complaint (which can be read here), the New York Times points out that the LLMs of OpenAI and Microsoft “can produce text that cites Times content word for word, faithfully summarizes it, and imitates its expressive style,” which is something that “undermines and damages” the Times’ relationship with readers, thereby depriving the newspaper of “subscriptions, licensing contracts, advertising and revenue”.

The American newspaper said it is ready to seek “billions of dollars in compensation” for damages both already suffered and potential, and it demands that OpenAI destroy all LLMs created (also) with its articles. More generally, it follows in the wake of other outlets that are trying to close the gates to these prying eyes, such as BBC, CNN and Reuters. At the same time, others are moving in the opposite direction: Politico, Business Insider and Associated Press have struck agreements with OpenAI to use its AI tools.

It should be remembered that the alleged (but very probable) copyright violations are just one of the many problems of artificial intelligence, which are emerging forcefully just over a year after the debut of ChatGPT on the market: there is a huge issue linked to discrimination, racism and poor representation of minorities; there are strong doubts about the safety of data entrusted to these AIs (including ChatGPT); and recently the fear has also emerged that Laion, one of the databases on which artificial intelligences are trained, also contains thousands of child pornographic images.

