As estimates of the economic potential of generative artificial intelligence keep growing (it is expected to become a 1.3 trillion dollar market by 2030), the companies behind ChatGPT, Stable Diffusion and other systems increasingly take refuge in secrecy. At first glance, not disclosing detailed information on a model's size, the hardware used, the training techniques or the data employed is an understandable choice: why should the giants that have collectively spent hundreds of millions of dollars to build incredibly powerful and sophisticated tools reveal their specific characteristics to everyone, competitors included, at the risk of having rivals steal what they consider their secret recipe?
If such a calculation makes sense from an economic standpoint, from a social standpoint the question we should be asking is very different: can we afford to know so little about tools that are playing an increasingly important role in society and are entrusted with delicate, crucially important tasks in education, healthcare, work, surveillance and beyond? Simply put, can we trust tools that are as powerful as they are opaque?
According to a group of Stanford researchers specializing in artificial intelligence, the answer is no. And it is precisely to gauge the transparency of the main companies in the sector that they created the Foundation Model Transparency Index ("foundation model" is a term that often refers to generative AI): an index that evaluates ten of the main new-generation models. These include Google's PaLM 2 (which powers the Bard conversational AI), OpenAI's GPT-4 (which powers the premium version of ChatGPT), Meta's Llama 2, Stability AI's Stable Diffusion 2, Amazon's Titan Text, and others.
To assess the transparency of these systems, the researchers relied on one hundred different indicators, examining what each company discloses about the data used for training (and how it was collected), the computational power required (and therefore the energy consumed), the system's potential uses, the hardware employed, risk mitigation, the working conditions of the human moderators who manually refine the results, and more.
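As a rough illustration of how an index of this kind can be aggregated, the sketch below computes per-domain and overall scores from yes/no indicators. The domain names echo those mentioned above, but the indicator names, the example answers and the scoring logic are illustrative assumptions, not the researchers' actual methodology or data.

```python
# Illustrative sketch only: indicator names and answers are hypothetical,
# not taken from the actual Foundation Model Transparency Index.
from collections import defaultdict

# Each entry is (domain, indicator, disclosed?) for a single model.
indicators = [
    ("data", "training data sources disclosed", True),
    ("data", "data collection method disclosed", False),
    ("compute", "hardware used disclosed", True),
    ("compute", "energy consumption disclosed", False),
    ("labor", "moderator working conditions disclosed", False),
    ("usage", "intended and prohibited uses disclosed", True),
]

def transparency_scores(indicators):
    """Return per-domain and overall scores as fractions of indicators met."""
    met, total = defaultdict(int), defaultdict(int)
    for domain, _name, disclosed in indicators:
        total[domain] += 1
        met[domain] += int(disclosed)
    per_domain = {d: met[d] / total[d] for d in total}
    overall = sum(met.values()) / sum(total.values())
    return per_domain, overall

per_domain, overall = transparency_scores(indicators)
print(per_domain)                  # e.g. {'data': 0.5, 'compute': 0.5, 'labor': 0.0, 'usage': 1.0}
print(f"overall: {overall:.0%}")   # e.g. overall: 50%
```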
The ranking
Large differences also emerge when the individual parameters are taken into account: Meta's Llama 2, for example, gets full marks for the accessibility of its system, as do Stable Diffusion 2 and BigScience's Bloomz. Bloomz is also the only system to achieve a passing grade in a crucial area such as data collection, while four of the ten models score a merciless zero in this field. The worst average results come in three specific areas: the working conditions of human moderators, the computational power required by the systems, and their impact, measured by the number of users of the algorithm or the domains in which it is deployed.
Overall, the main divide is between open-source systems such as Llama 2, Bloomz and Stable Diffusion 2 (which occupy three of the top four positions) and all the other, closed systems, which, by not allowing direct access to the model, are inevitably even more opaque.
But why is it important to have detailed information on how these algorithms work? One important example is sustainability: training and using generative artificial intelligence consumes a great deal of energy, to the point that these models are estimated to account for as much as 3.5% of global emissions by 2030. Knowing how much computational power a system requires, what hardware it uses and perhaps even which energy sources power it could help users steer towards more sustainable artificial intelligences (an aspect on which a previous, similar Stanford study had focused).
Discrimination
Then there is the very delicate issue of so-called algorithmic discrimination: because of insufficiently inclusive training data, generative AI systems are often prone to various kinds of bias. A major model like OpenAI's Dall-E 3, for example, recently depicted a white doctor treating Black children even though the prompt (the user's instruction) asked for exactly the opposite. ChatGPT, for its part, has often fallen into the most hackneyed gender stereotypes (the nurse is always a woman, the lawyer always a man, and so on). Having access to the data, and knowing which risks the manufacturers anticipate and what actions they take to mitigate them, would allow society to better understand why and how these problems occur.
Finally, the question of potential uses should not be underestimated. Greater transparency would let us know whether ChatGPT is being used in medical or educational settings, or to evaluate our professional performance.
Artificial intelligence has enormous potential, but it raises just as many doubts and fears. Given the increasingly important role it will play in our future, we cannot settle for the narrative and the evaluations, inevitably biased, of the very companies that produce these systems. To assess their overall impact autonomously and independently, we first need to be able to open their black box.