Date: 2022-08-26
Original title: The Digital Brain Research Institute helped host the IEEE CoG 2022 football AI competition, and its baseline model ranked second in both tracks

Recently, the IEEE Conference on Games (CoG), a top international conference in the field of artificial intelligence, was held in China for the first time. The Shanghai Digital Brain Research Institute (hereinafter the “Digital Brain Research Institute”), which focuses on decision-making intelligence, partnered with the Institute of Automation of the Chinese Academy of Sciences on the conference's most popular event, the football AI competition, and provided the competition's baseline model. In the end, that baseline model finished second only to the champion, NetEase, in both the 5vs5 track and the 11vs11 track.

The baseline model was trained with a multi-agent training framework that the Digital Brain Research Institute developed on top of its large decision-making model. Besides football, the framework also covers the StarCraft Multi-Agent Challenge, multi-agent MuJoCo robot environments, the Multi-Agent Particle World, and other environments. As a result, the model's generalization ability has improved greatly, which also suggests that the Institute's core technology, the large decision-making model, has ample room to serve many decision-making scenarios with a single large model in the future.

The Institute is committed to becoming a top research institution and commercial enterprise leading the global development of intelligent decision-making. Its co-founder and dean is Wang Jun, a world-renowned scientist in decision-making intelligence and a professor in the Department of Computer Science at UCL. The Institute turns research results into advanced productivity, helping industrial customers make more scientific, efficient and intelligent decisions. It has gathered a large number of top international research and business talents in intelligent decision-making and provides solutions for communications, consumer retail, games and entertainment, energy and chemicals, and other fields, promoting the digital and intelligent upgrading of those industries.

The IEEE CoG football multi-agent competition is a challenging and engaging event. It attracted more than 100 teams from China and abroad, including strong research teams from top universities and research institutions such as Tsinghua University and the Chinese Academy of Sciences, as well as professional teams from Internet giants such as NetEase and ByteDance. The baseline model, trained with the Digital Brain Research Institute's self-developed multi-agent decision-making framework, took part in such a competition for the first time and still finished second in a strong field, demonstrating the Institute's technical advantages and engineering strength in game AI.

In essence, the problem is a two-player zero-sum game with a multi-agent cooperative game nested inside it: at the strategic level, each side must coordinate its own players while defeating the opponent. Because both sides decide simultaneously, an agent cannot know exactly what actions the other agents are about to take. Effective decision-making therefore requires, on the one hand, that teammates cooperate with one another, for example in off-the-ball runs and passing, and, on the other hand, that they observe and predict the opponent's behavior and respond in time with interceptions, blocks, breakthroughs, and counterattacks. At the macro level, a football AI also needs to strike a good balance between offensive and defensive strategies. The cooperation and competition involved are quite complex, which makes this one of the problems that currently challenge the world's top AI research teams.

The Google Research Football environment used in this competition is a reinforcement learning environment built by extending the open-source football game Gameplay Football. It simulates a real football match, including standard rules such as fouls, corner kicks, penalties, throw-ins, and offsides. Each match in this competition lasts 3,000 environment steps. Although there is no home-and-away distinction, no substitutes, and no extra time, the matches are just as exciting. The competition follows Swiss-system rules: agents are paired against each other over successive rounds until the strongest agent is determined.
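The article does not describe how the organizers paired agents in each round. As a rough illustration of the Swiss-system idea only, the sketch below pairs agents with similar scores and avoids rematches where possible. The function name `swiss_pairings`, the score table, and the tie-breaking rule are all assumptions made for this example.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def swiss_pairings(
    scores: Dict[str, float],
    played: Set[FrozenSet[str]],
) -> List[Tuple[str, str]]:
    """Pair agents with similar scores, avoiding rematches where possible.

    `scores` maps a (hypothetical) agent name to its points so far;
    `played` holds frozensets of agent pairs that have already met.
    With an odd number of agents, the last one is left unpaired (a bye).
    """
    # Rank by score, best first; ties are broken by name for determinism.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    pairs: List[Tuple[str, str]] = []
    while len(ranked) >= 2:
        first = ranked.pop(0)
        # Prefer the highest-ranked agent not yet faced; otherwise allow a rematch.
        idx = next(
            (i for i, other in enumerate(ranked)
             if frozenset((first, other)) not in played),
            0,
        )
        pairs.append((first, ranked.pop(idx)))
    return pairs


if __name__ == "__main__":
    table = {"a": 2.0, "b": 2.0, "c": 1.0, "d": 0.0}
    print(swiss_pairings(table, {frozenset(("a", "b"))}))
```

Real Swiss-system tournaments usually add further tie-break rules; this greedy version only shows the core principle that agents with similar records meet each other.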

In the early stage of training the football model, the Digital Brain Research Institute team manually adjusted the reward settings and selected specific opponents so that the agent acquired basic skills such as passing and shooting. Next, the team used self-play and population-based training to keep improving the agent's overall strength, producing the first-round models.

Track One (Football 5vs5)

Before the second round, the team made targeted adjustments to some key hyperparameters, such as the discount factor gamma, which governs the balance between the agent's long-term and short-term rewards, and the size of the action entropy penalty, which governs the balance between exploitation and exploration. Besides further stabilizing the algorithm, these adjustments improved training efficiency and helped train strategies with more diverse styles. To the population the team added not only strong strategies trained earlier but also some rule-based agents, and playing against different opponents produced new styles and more well-rounded strategies. For example, against opponents with a pressing style the agent can learn a global scoring strategy, and against opponents who are good at long passes it can learn a compact defensive strategy. In addition, to obtain smoother cooperation, the team greatly reduced the use of action masks in the later stages of training: action masks can significantly speed up convergence early in training, but they limit further optimization once the model is relatively mature.

Because there are more agents on the pitch, the 11vs11 setting is far more complicated than 5vs5.
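The action masks and the entropy term mentioned for the 5vs5 training can be illustrated with a small sketch. It is not the team's implementation: `masked_policy` and `entropy` are hypothetical helpers, and the four-action example is invented. It only shows why a mask narrows what the policy can try, which helps early on and restricts a mature model later.

```python
import math
from typing import List


def masked_policy(logits: List[float], mask: List[bool]) -> List[float]:
    """Softmax over `logits` that assigns zero probability to masked-out actions."""
    if not any(mask):
        raise ValueError("at least one action must be allowed")
    # Subtract the largest allowed logit for numerical stability.
    peak = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of an action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


if __name__ == "__main__":
    logits = [0.0, 0.0, 0.0, 0.0]
    free = masked_policy(logits, [True, True, True, True])
    restricted = masked_policy(logits, [True, True, False, False])
    # The mask removes two actions, so the policy has less room to explore.
    print(free, entropy(free))
    print(restricted, entropy(restricted))
```

In policy-gradient training an entropy term of this kind is typically scaled by a coefficient and added to the objective; a larger coefficient pushes the agent toward exploration, a smaller one toward exploitation.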

Track Two (Football 11vs11)

Training initialized directly from random weights hit a wall against the strongest built-in AI, with the win rate plateauing at about 50%. The team therefore turned to imitation learning, imitating models such as WeKick and SaltyFish, which ranked at the top of the 2020 Kaggle football single-agent competition and left abundant match data. That competition only required controlling one active player on the field, whereas this one requires controlling all the players. So, after obtaining an imitation-learned model for the active player, the team combined it with the model previously trained from random initialization, which supplied the actions of the other players, with fairly satisfactory results: in direct matches against the strongest built-in AI, the combination quickly reached a win rate of nearly 100%. This mix of imitation learning and reinforcement learning models can also be iterated several times to improve the model further.

In the first round of the main competition, drawing on the 5vs5 experience, the team hand-designed different auxiliary reward functions for different training goals and let the models train through self-play and population-based training. This produced three main models with different styles, focused on offense, defense, and cooperation respectively.

In the second round of the main competition, besides benefiting from better-suited hyperparameters, the team adopted a role-based reward function so that players in different positions are further differentiated and each performs its own duties. For opponent selection, given the large amount of training involved, the team used prioritized training, that is, playing more often against stronger opponents, combined with a Top-K method that filters out weaker opponents directly.
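The prioritized opponent selection with Top-K filtering can be sketched as follows. The article gives no formula, so everything here is an assumption made for illustration: the opponent names, the use of our win rate as the measure of opponent strength, and the weight `1 - win_rate`.

```python
import random
from typing import Dict, List


def top_k_opponents(win_rates: Dict[str, float], k: int) -> List[str]:
    """Keep the k opponents we beat least often, i.e. the strongest ones."""
    return sorted(win_rates, key=lambda name: (win_rates[name], name))[:k]


def sampling_weights(win_rates: Dict[str, float], k: int) -> Dict[str, float]:
    """Normalized sampling weights over the Top-K opponents."""
    kept = top_k_opponents(win_rates, k)
    # Assumed priority rule: the lower our win rate, the higher the weight.
    # The small constant keeps every kept opponent selectable.
    raw = {name: (1.0 - win_rates[name]) + 1e-3 for name in kept}
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}


def sample_opponent(win_rates: Dict[str, float], k: int, rng: random.Random) -> str:
    """Draw the next training opponent according to the priority weights."""
    weights = sampling_weights(win_rates, k)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    # Hypothetical win rates of the current model against a pool of opponents.
    pool = {"rule_bot": 0.95, "model_v1": 0.80, "model_v2": 0.50, "model_v3": 0.30}
    print(sampling_weights(pool, k=2))
    print(sample_opponent(pool, k=2, rng=random.Random(0)))
```

With this pool and `k=2`, the two weakest opponents are filtered out entirely and the strongest remaining one receives the largest share of training matches.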
In this way, the team trained a strategy with a new style, one that shoots well and has even learned to confuse opponents with feints. Further iterations of this model became the team's final submission.

According to the project team, over the course of training the football multi-agent model the team gained in-depth insight into algorithm selection, metric monitoring, system design, and the design of state features, action masks, and reward functions, among other aspects; researchers from Peking University also contributed ideas for exploration. During the competition, the team used the heterogeneous computing cluster of the HUAWEI CLOUD ModelArts platform, which greatly improved training efficiency.

On the algorithm side, the technical team built on models such as MAPPO and HAPPO to develop A2PO, an algorithm that focuses more on cooperation. Its update achieves a tighter monotonic improvement bound and has better convergence properties. In addition to the Google Research Football environment, the team verified A2PO on the StarCraft Multi-Agent Challenge, multi-agent MuJoCo, and the Multi-Agent Particle World, where it outperformed existing algorithms. The related papers will be published in the near future, so stay tuned.
Finishing second with a baseline model in its first international competition demonstrates the Digital Brain Research Institute's technical advantages and engineering strength in game AI. Game AI is one of the Institute's core research areas and has produced many results: team members developed the world's first bridge bidding AI and the industry's first general solution for zero-sum games, which covers football, StarCraft, bridge, chess, six crown chess, and many other scenarios. This result has brought the team affirmation and encouragement. The Institute will continue to iterate on its decision-making frameworks and algorithm models and apply them to more practical scenarios, such as game AI design and debugging, intelligent analysis of real football and basketball matches (for example, training plan scheduling and match tactics), and cooperation among industrial robots. Such methods are expected to extend to more complex and challenging fields in the future, further accelerating the digital and intelligent transformation of traditional industries and creating greater practical value.

