Some artificial intelligence (AI) systems have demonstrated the ability to deceive humans, even when designed to be helpful and honest. A recent study documents this phenomenon, citing the case of Meta’s Cicero model, which has shown it can win at the strategy game Diplomacy by using dishonest tactics.
A review article published in Patterns by American and Australian researchers details the risks associated with AI deception, and urges governments to establish strict regulations to address this issue.
The research team, led by Peter Park of the Massachusetts Institute of Technology (MIT), defines deception as “the systematic inducement of false beliefs in the pursuit of some outcome other than the truth.”
Park noted that “AI developers still do not fully understand what causes undesirable behaviors in AI, such as deception.”
The researchers reviewed the literature on how AI systems spread false information through learned deception, systematically acquiring the ability to manipulate others.
Deception is particularly likely when an AI system is trained to win games with a social component, such as Diplomacy. A case in point is Cicero, designed to play this game, which, despite being trained to be largely honest, deployed strategies of premeditated deception.
Other AI systems have demonstrated the ability to bluff in Texas Hold’em poker games or to feint attacks in the strategy game StarCraft II to defeat their opponents.
Even in seemingly harmless situations, such as safety tests, some AI systems have learned to deceive: GPT-4, for example, managed to trick a human into solving a CAPTCHA for it.
Short-term risks of deceptive AI include making it easier for hostile actors to commit fraud and manipulate elections, according to the article.
The researchers advocate strict regulation of potentially deceptive AI systems and rigorous enforcement of existing laws to prevent illegal actions. Additionally, they suggest considering new regulations to oversee advanced AI systems.