Date: 2023-08-08

Imagine the following, by now well-known, scenario: a particularly advanced artificial intelligence is given the command to maximize the production of paper clips. Interpreting its objective literally, this artificial intelligence consumes all the resources of planet Earth in order to produce as many paper clips as possible, unintentionally causing human extinction along the way.

In short, this is the famous paradox devised by the philosopher Nick Bostrom in his book “Superintelligence” (published in Italy by Bompiani): a provocative thought experiment meant to show that the risks of developing an artificial superintelligence do not necessarily stem from its possible “evil” or from a desire to rebel against humanity, but can also arise from the ambiguity of human language and from the dangers of interpreting our commands literally.

Can machines do what we intend, rather than what we say?

As Melanie Mitchell, Professor of Complexity at the Santa Fe Institute, has written, “we want machines to do what we intend, not necessarily what we are told.” Is it possible to achieve this, making artificial intelligences able to contextualize and weigh our commands and to interpret them as we humans would?

Leaving aside for a moment the fact that the advent of an artificial superintelligence is, at least for the foreseeable future, a science-fiction scenario, the solution that has been repeatedly proposed (for example by OpenAI CEO Sam Altman, or in the open letter from the Future of Life Institute) runs through what is known as “AI alignment”: aligning artificial intelligence with human values.

According to this thesis, giving machines our values would allow them to interpret commands correctly: rather than merely “maximizing the objective function” (that is, completing the assigned task as efficiently as possible), they would work out for themselves what we really want and which limits and constraints to respect (for example, not destroying the planet to produce an exorbitant number of paper clips).
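
The difference between maximizing an objective literally and maximizing it within limits can be shown with a toy sketch. Everything in it is hypothetical (the function names, the resource units, the size of the reserve), and the limit is written in by hand, whereas the hope behind alignment is that the machine would work out such limits on its own.

```python
# Toy illustration only: all names and numbers are hypothetical.

def literal_maximizer(resources: int, cost_per_clip: int = 1) -> dict:
    """Maximize paper clips and nothing else: spend every resource available."""
    clips = resources // cost_per_clip
    return {"clips": clips, "resources_left": resources - clips * cost_per_clip}


def constrained_maximizer(resources: int, reserve: int, cost_per_clip: int = 1) -> dict:
    """Maximize paper clips, but leave `reserve` units of resources untouched."""
    usable = max(resources - reserve, 0)
    clips = usable // cost_per_clip
    return {"clips": clips, "resources_left": resources - clips * cost_per_clip}


literal = literal_maximizer(resources=100)
constrained = constrained_maximizer(resources=100, reserve=80)
print("literal:    ", literal)
print("constrained:", constrained)
```

The literal version makes more paper clips and leaves nothing behind; the constrained version makes fewer and keeps the reserve intact. The hard part, which this sketch does not address, is knowing what the reserve should be without being told.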

How human values could avert AI’s most apocalyptic scenarios

Beyond the most apocalyptic scenarios, building human values into artificial intelligences could also prove useful in the case of autonomous weapons (which are now becoming a reality). At present, these weapons systems can independently identify and strike the target they have been given, but they cannot weigh the ethical considerations that a human soldier would (hopefully) take into account, deciding for example to abort the mission if there is a high risk of civilian casualties.

Is it possible to give artificial intelligences human values? Some attempts, such as building the principles of moral philosophy into machines or training large language models (such as ChatGPT) on a large body of ethical judgments, have so far had little success. The most promising route may instead be the training technique known as “inverse reinforcement learning.”

Put simply, this technique does not give the machine a goal to be maximized at all costs. Instead, the machine observes how humans carry out a task and infers from their behavior what they are trying to achieve, and therefore the best way to do it. It has been used, for example, to train self-driving cars and to teach artificial intelligences to play video games properly (rather than simply exploiting every bug they find to their advantage).
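
The idea of learning a goal from observed behavior, usually called inverse reinforcement learning in the research literature, can be sketched in a few lines. What follows is a deliberately simplified, single-decision version of one common variant (maximum-entropy inverse reinforcement learning), not the method used by any particular system. The actions, their features and the demonstrations are all invented for illustration.

```python
import math

# Hypothetical actions, each described by two features: (progress, risk).
ACTIONS = {
    "drive_carefully": (0.6, 0.1),
    "speed": (1.0, 0.9),
    "stay_parked": (0.0, 0.0),
}

# Hypothetical demonstrations: what human drivers were observed to do.
DEMOS = ["drive_carefully"] * 8 + ["stay_parked"] + ["speed"]


def action_probs(weights):
    """Softmax policy: actions with a higher inferred reward are chosen more often."""
    scores = {a: sum(w * f for w, f in zip(weights, feats)) for a, feats in ACTIONS.items()}
    top = max(scores.values())  # subtracted for numerical stability
    exps = {a: math.exp(s - top) for a, s in scores.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def infer_reward_weights(demos, steps=2000, lr=0.5):
    """Gradient ascent on the likelihood of the demonstrations.

    The gradient is the gap between the average features of what humans did
    and the average features the current reward estimate would produce.
    """
    weights = [0.0, 0.0]
    observed = [sum(ACTIONS[a][i] for a in demos) / len(demos) for i in range(2)]
    for _ in range(steps):
        probs = action_probs(weights)
        expected = [sum(p * ACTIONS[a][i] for a, p in probs.items()) for i in range(2)]
        weights = [w + lr * (o - e) for w, o, e in zip(weights, observed, expected)]
    return weights


weights = infer_reward_weights(DEMOS)
probs = action_probs(weights)
print("inferred weights (progress, risk):", [round(w, 2) for w in weights])
print("most likely action:", max(probs, key=probs.get))
```

Nobody tells the program that risk is bad. It arrives at a positive weight on progress and a negative weight on risk only because that combination best explains the choices it observed, which is the point of the technique. It also shows the limit of the approach: the machine can only learn values that are visible in the behavior it is shown.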

“I think this technique underestimates the challenge we face,” Melanie Mitchell wrote. “Ethical notions like kindness or good behavior are far more complex and context-sensitive than anything that inverse reinforcement learning has been able to master to date. Let’s take for example the notion of ‘sincerity’, a value that we certainly would like our artificial intelligence systems to possess. In reality, one of the main problems that current large language models have is precisely their inability to distinguish the true from the false.” In other words, we may be putting the cart before the horse.

Which values should we give an artificial intelligence?

But even if it were possible to align artificial intelligences with our values, other problems would remain. First of all: which values? Values and their application vary across individuals, cultures, political positions and religions, and they also change from generation to generation. As Stanford professor Stefano Ermon explained, “the problem is defining exactly what these values are: I think people will have very different opinions about them, and that is the main challenge.”

Not only that: even giving an artificial intelligence a basic value such as the aforementioned sincerity presents several obstacles. We human beings, for example, sometimes decide not to be sincere for very good reasons, perhaps to protect someone or to avoid offending them. Teaching an AI to master these nuances would be harder still.

“It may not be possible to fully align artificial intelligences,” said Yoshua Bengio, one of the world’s leading experts on deep learning. “There are a lot of values that are innate that we won’t be able to achieve with machine learning. It is therefore not at all clear whether we will be able to align the machines perfectly. I think the best way to put it is to say: ‘we will do our best.’”

