Meta could do it, and it did: its new artificial intelligence is trained on data from posts and images on Facebook and Instagram. While, in Italy and beyond, “I do not authorize” posts and chain letters copied and pasted without any understanding of what they were about kept multiplying, in Menlo Park they were working to make their large language model, Llama 2, more intelligent and more extensive. Llama 2 is the engine behind some of the innovations presented at the latest Meta Connect, last Wednesday.
Audio, video, text
The most significant is perhaps the Meta AI chatbot, among the first artificial intelligence tools aimed at consumers, currently available in beta only to English-speaking users. It was born from the union of Llama 2 and Emu, an AI model specialized in image generation, and is, in the company’s words, “an advanced conversational assistant available on WhatsApp, Messenger and Instagram and coming to Ray-Ban Meta and Quest 3”. Starting from a prompt, Meta AI can generate photorealistic images to share with friends in just a few seconds, as well as audio and, of course, text. It has access to real-time web information through a partnership with Microsoft’s Bing search engine.
No private posts
But where does Meta AI get its data from? From what is posted on two of the most popular social networks in the world: Facebook, first in the ranking with over three billion users, and Instagram, fourth, though arguably second, since the two in between are YouTube and WhatsApp, the latter being, strictly speaking, a chat platform. However, Meta’s president of Global Affairs, Nick Clegg, was keen to stress to Reuters that private posts shared only with family and friends were not used to train Llama 2. In a commendable effort at transparency, he added that private chats on WhatsApp and Messenger were not used either, and that the company took additional steps to filter private details out of the public datasets used in training. “We tried to exclude datasets that have a strong preponderance of personal information,” he said, explaining that the “vast majority” of the data used for training was publicly available. He cited LinkedIn as an example of a website whose content Meta deliberately chose not to use because of privacy concerns.
Forehand and backhand
When it comes to copyright, the issue is even more complex. Clegg acknowledged that using copyrighted material to train AI may fall outside what is called “fair use,” the doctrine that allows limited use of protected works for purposes such as comment, research and parody. He said: “We think it is [fair use], but I strongly suspect there will be several lawsuits.”
The issue is of fundamental importance for the development of artificial intelligence, and lately it has been increasingly debated, with newspapers such as the New York Times banning the use of their content, and specific agreements between big names, such as the six-year deal between OpenAI and Shutterstock for images or the more recent one between Google and Universal for music. At the last Italian Tech Week, the CEO of OpenAI, Sam Altman, when asked about the topic, dismissed it somewhat hastily: “We want to create models where the people who participate are remunerated, where there is an advantage for everyone”. But publishing a post on Facebook will hardly be enough to make that really happen. Today the social network’s terms of service are very clear: “When you share, publish or upload content covered by intellectual property rights on or in connection with our products, you grant us a non-exclusive, transferable, sub-licensable, royalty-free and worldwide license to host, use, distribute, modify, perform, copy or publicly display, translate, and create derivative works of your content.” Creating derivative works of your content: that is exactly what Meta AI does.