IJTCS | Track Schedule: Multi-Agent Reinforcement Learning

2021-03-03 Center on Frontiers of Computing Studies, Peking University

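The first International Joint Conference on Theoretical Computer Science (IJTCS) will be held online on August 17-21, 2020. It is jointly hosted by Peking University, the China Society for Industrial and Applied Mathematics (CSIAM), the China Computer Federation (CCF), and the ACM China Council, and organized by the Center on Frontiers of Computing Studies, Peking University.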

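The theme of the conference is "Recent Advances and Key Problems in Theoretical Computer Science". The conference has seven tracks, covering algorithmic game theory, blockchain technology, multi-agent reinforcement learning, machine learning theory, quantum computing, machine learning and formal methods, and algorithms and complexity. It also features a forum for young PhDs, a forum for women scholars, and an undergraduate research forum, bringing together well-known experts and scholars from China and abroad to focus on frontier problems in theoretical computer science. Information will be updated continuously; please stay tuned!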

This issue introduces the Multi-Agent Reinforcement Learning track.

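Multi-agent reinforcement learning (MARL) is a research area that has emerged in recent years. It combines game theory with deep reinforcement learning and aims to solve collective decision-making problems over complex state and action spaces, with broad application prospects in game AI, industrial robotics, social prediction, and other areas. Researchers in China have made considerable progress on problems such as convergence theory for multi-agent algorithms, algorithms for learning multi-agent communication mechanisms, and large-scale multi-agent systems, and are advancing MARL research together with researchers worldwide. The IJTCS MARL Track will focus on frontier topics including multi-agent communication algorithms, world-model-based reinforcement learning algorithms, multi-agent policy evaluation, and solution concepts in multi-agent reinforcement learning. We hope to explore the future directions of MARL together with the research community.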

Online Search and Pursuit-Evasion in Robotics

In search and pursuit-evasion problems, a team of mobile entities is asked to seek a set of fixed objects or to capture another team of moving objects in an environment. The search strategy or motion planning plays a key role in any scenario. In this talk, we briefly introduce several exploration and search models in an unknown environment and propose a number of challenging algorithmic problems.

A Distance Function to Nash Equilibrium

Nash equilibrium has long been a desired solution concept in economics and game-theoretic studies. Although the related complexity literature has closed the door on efficiently computing an exact equilibrium, approximation methods are still sought after in various application fields, such as online marketing, crowdsourcing, and the sharing economy. In this paper, we present a new approach for obtaining an approximate Nash equilibrium in any N-player normal-form zero-sum game with discrete action spaces, which can also be applied to any general N-player game after some pre-processing. Our approach defines a new measure of the distance between the players' current joint strategy profile and that of a Nash equilibrium. This transforms the task of finding an equilibrium into one of finding a global minimizer. We solve it with a gradient descent algorithm and prove the convergence of our algorithm under moderate assumptions. We then compare our algorithm with baselines in experiments, showing consistent and significant improvement in approximate Nash equilibrium computation as well as robustness as the game size increases.
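The abstract does not say which distance measure the talk defines, so the sketch below uses a standard stand-in, exploitability (also called NashConv), to illustrate recasting equilibrium computation as minimization. It assumes a two-player zero-sum matrix game and replaces the talk's gradient descent with a brute-force grid search; the names `nash_conv` and `grid_minimize` and the matching-pennies payoffs are illustrative choices, not taken from the talk.

```python
def nash_conv(A, x, y):
    """Exploitability of the profile (x, y) in a two-player zero-sum matrix game.

    A[i][j] is the row player's payoff; the column player receives -A[i][j].
    The value is non-negative and equals zero exactly at a Nash equilibrium.
    """
    rows, cols = len(A), len(A[0])
    # Row player's payoff for each pure action against the column mix y.
    row_payoffs = [sum(A[i][j] * y[j] for j in range(cols)) for i in range(rows)]
    # Row player's payoff when the column player answers the row mix x with pure action j.
    col_payoffs = [sum(x[i] * A[i][j] for i in range(rows)) for j in range(cols)]
    # Sum of both players' best-response gains; the current value x^T A y cancels out.
    return max(row_payoffs) - min(col_payoffs)


def grid_minimize(A, steps=20):
    """Brute-force search over mixed strategies of a 2x2 game (an assumption for
    illustration; this is NOT the gradient descent method from the talk)."""
    grid = [k / steps for k in range(steps + 1)]
    candidates = (
        (nash_conv(A, [p, 1 - p], [q, 1 - q]), p, q) for p in grid for q in grid
    )
    return min(candidates)


matching_pennies = [[1, -1], [-1, 1]]
distance, p, q = grid_minimize(matching_pennies)
print(distance, p, q)
```

In matching pennies the distance works out to |2p - 1| + |2q - 1|, so its only global minimizer is the uniform mixed strategy for both players, which is also the game's unique Nash equilibrium.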

Model-based Multi-Agent Reinforcement Learning

Multi-agent reinforcement learning (MARL) typically suffers from low sample efficiency due to useless multi-agent exploration in the state and joint action space. In single-agent RL tasks, there has been increasing interest in building an environment dynamics model and performing model-based RL to improve sample efficiency. In this talk, I will attempt to build model-based methods to achieve sample-efficient MARL. First, I will discuss several important settings of model-based MARL tasks and their key challenges. Then I will delve into the decentralized model-based MARL setting, which can be used with almost all decentralized model-free MARL methods. A theoretical bound on the policy value discrepancy will be derived, based on which an efficient decentralized model-based MARL algorithm will be introduced. I will then show preliminary experimental results. The talk will close with a discussion of the feasibility and challenges of model-based MARL.

Solution Concepts in Multi-agent Reinforcement Learning

Nash equilibrium has long been a well-studied solution concept in game theory. Naturally, multi-agent reinforcement learning algorithms usually set Nash equilibrium as the learning objective. However, in many situations, other solution concepts such as Stackelberg equilibrium and correlated equilibrium have the potential to perform better than Nash equilibrium. In this talk, we will discuss two MARL algorithms, bi-level actor-critic (Bi-AC) and signal instructed coordination (SIC), which aim to solve for Stackelberg and correlated equilibria, respectively.
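To make the contrast between solution concepts concrete, here is a minimal sketch that compares the pure-strategy Nash equilibrium with the pure-commitment Stackelberg outcome in a one-shot 2x2 bimatrix game. It is not Bi-AC or SIC. The payoff numbers and the function names `pure_nash` and `pure_stackelberg` are assumptions made for illustration, and ties in the follower's best response are broken by taking the first action.

```python
def pure_nash(leader, follower):
    """Return all pure-strategy Nash equilibria (i, j) of a bimatrix game."""
    rows, cols = len(leader), len(leader[0])
    equilibria = []
    for i in range(rows):
        for j in range(cols):
            # Neither player can gain by deviating unilaterally.
            leader_ok = all(leader[i][j] >= leader[k][j] for k in range(rows))
            follower_ok = all(follower[i][j] >= follower[i][k] for k in range(cols))
            if leader_ok and follower_ok:
                equilibria.append((i, j))
    return equilibria


def pure_stackelberg(leader, follower):
    """Leader commits to a pure action; the follower best-responds to it."""
    best = None
    for i in range(len(leader)):
        # Follower's best response to action i (first action wins ties).
        j = max(range(len(follower[i])), key=lambda k: follower[i][k])
        if best is None or leader[i][j] > leader[best[0]][best[1]]:
            best = (i, j)
    return best


# Illustrative payoffs: rows are leader actions (U, D), columns follower actions (L, R).
leader_payoff = [[2, 4], [1, 3]]
follower_payoff = [[1, 0], [0, 2]]

nash = pure_nash(leader_payoff, follower_payoff)
stackelberg = pure_stackelberg(leader_payoff, follower_payoff)
print(nash, stackelberg)
```

In this game U strictly dominates D for the leader, so the only pure Nash equilibrium is (U, L), where the leader earns 2. By committing to D first, the leader induces the follower to play R and earns 3, and the follower's payoff also rises from 1 to 2.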

Learning Multi-Agent Cooperation

Cooperation is a widespread phenomenon in nature, from viruses, bacteria, and social amoebae to insect societies, social animals, and humans. It is also crucially important to enable agents to learn to cooperate in multi-agent environments for many applications, e.g., autonomous driving, multi-robot control, traffic light control, smart grid control, network optimization, etc. In this talk, I will focus on the latest reinforcement learning methods for multi-agent cooperation via joint policy learning, communication, agent modeling, etc.

An Overview of Game-Based AI Competitions---From a Perspective of AI Evaluation

Intelligence exists when we measure it! A game-based AI competition makes our conception of intelligence explicit, and holding such competitions has recently become popular at AI conferences such as AAAI and IJCAI. With clear and precise problem definitions, a unified platform environment, a fair performance assessment mechanism, open data sets, and benchmarks, game-based AI competitions have attracted many researchers and thus accelerated the development of artificial intelligence technology.

A new trend in game-based competitions is to host a competition over a long period on an online platform, which encourages researchers and AI enthusiasts to work continuously on a task and share information at any time. The platform also lets us test the learning ability of bots. This trend raises the problem of evaluating an enormous number of bots quickly and fairly.

By collecting and analyzing various competitions, this paper finds that both the games used in the competitions and the techniques used in the matches are becoming more complex. Judging a match becomes more time-consuming and sometimes yields results that involve randomness. These problems, combined with an increase in the number of participants, have created a need for organizers to improve the competition process so that it produces fair results on time.

An emerging AI evaluation method based on MCTS (Monte Carlo Tree Search) is worthy of attention. Hopefully, this method can measure the intelligence level of a bot quantitatively and possibly compare bots created for different games. Beyond that, measuring a bot's cooperative ability in a multi-agent system (three or more agents) remains an open problem.

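Registration for the conference is now officially open to the public! Each participant can register for free to watch the online talks, or pay a fee to discuss the talks further with the speakers and take part in more of the conference.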

Registration deadline: 23:59, August 15, 2020


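*Student registration: after registering on the website, send a photo of the page of your student ID that shows your personal and school information to IJTCS@pku.edu.cn, with the email subject in the format "Student Registration + name".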

John Hopcroft

Foreign Member of the Chinese Academy of Sciences; Visiting Chair Professor, Peking University

Zhang Pingwen (張平文)

Member of the Chinese Academy of Sciences; President of CSIAM; Professor, Peking University

Conference website:

https://econcs.pku.edu.cn/ijtcs2020/IJTCS2020.html

Registration link:

https://econcs.pku.edu.cn/ijtcs2020/Registration.htm

For sponsorship, partnership, and other inquiries, please contact: IJTCS@pku.edu.cn

