Title: Safe, Stable, and Robust Multi-Agent Reinforcement Learning for Connected Autonomous Vehicles
Student: Songyang Han
Major Advisor: Dr. Fei Miao
Associate Advisors: Dr. Caiwen Ding, Dr. Jinbo Bi
Review Committee Members: Dr. Dongjin Song, Dr. Yufeng Wu
Date/Time: Tuesday, April 4, 2023, 10:00 am
Location: WebEx Online & In Person
Meeting room: HBL1102
Meeting link: https://uconn-cmr.webex.com/uconn-cmr/j.php?MTID=m7e81d9999c4c1880da4c5ab204a2021e
Meeting number: 2621 854 1526
Password: MRcbwvyF534
Join by video system: Dial 26218541526@uconn-cmr.webex.com
You can also dial 173.243.2.68 and enter your meeting number.
Join by phone: +1-415-655-0002 US Toll
Access code: 2621 854 1526
Abstract:
With the development of sensing and communication technologies in networked cyber-physical systems (CPSs), multi-agent reinforcement learning (MARL)-based methodologies are being integrated into the control process of physical systems and demonstrate prominent performance in a wide array of CPS domains, such as connected autonomous vehicles (CAVs). However, it remains challenging to take advantage of shared information in MARL to improve the safety of CAVs and the efficiency of traffic flow under dynamic and uncertain environments. It is also challenging to mathematically characterize the performance improvement that communication and cooperation capabilities bring to CAVs.

To address these challenges, we first design an information-sharing-based MARL framework for CAVs that exploits the extra information available at decision time to improve traffic efficiency and safety, using two new techniques: the truncated Q-function and safe action mapping. The truncated Q-function utilizes the shared information from neighboring CAVs so that the joint state and action spaces of the Q-function do not grow with the size of the CAV system. The safe action mapping provides a provable safety guarantee, based on control barrier functions, for both training and execution.

Second, we propose a Shapley value-based reward reallocation to motivate stable cooperation among autonomous vehicles. We prove that the Shapley value-based reward reallocation of MARL is stable and efficient: agents will stay within the coalition (the cooperating group) and communicate and cooperate with other coalition members to optimize the coalition-level objective.

Finally, we study the fundamental properties of MARL under state uncertainties. We prove that an optimal agent policy and a robust Nash equilibrium do not always exist for a State-Adversarial Markov Game (SAMG). Instead, we define a new solution concept, the robust agent policy, for the proposed SAMG under adversarial state perturbations, in which agents maximize the worst-case expected state value.
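The truncated Q-function's key property is that its input size is fixed by the number of neighbors considered, not by the total number of CAVs. The following is a minimal sketch of that idea only, not the dissertation's actual architecture: the function name, the choice of k, and the zero-padding convention are illustrative assumptions, and selecting the k nearest neighbors is assumed to happen upstream.

```python
import numpy as np

def truncated_q_input(ego_state, neighbor_states, k=3, state_dim=4):
    """Build the fixed-size input of a truncated Q-function: the ego
    state concatenated with the states shared by up to k neighboring
    CAVs, zero-padded when fewer neighbors are present. The resulting
    vector has length (k + 1) * state_dim no matter how many vehicles
    are on the road, so the Q-network's input layer never grows.
    (Illustrative sketch; names and padding convention are assumptions.)"""
    padded = np.zeros((k, state_dim))
    take = min(k, len(neighbor_states))
    if take:
        padded[:take] = np.asarray(neighbor_states, dtype=float)[:take]
    return np.concatenate([np.asarray(ego_state, dtype=float), padded.ravel()])
```

Whether two neighbors or twenty are in range, the Q-network sees the same input dimension, which is what keeps the joint state-action space from scaling with fleet size.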
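Safe action mapping projects the policy's nominal action onto a set of actions certified safe by a control barrier function (CBF). The sketch below is a deliberately simple scalar instance, not the dissertation's method: for the system x_dot = u with barrier h(x) = x - x_min, the CBF condition h_dot + alpha * h >= 0 reduces to a lower bound on u, so the "projection" is just a clip (the general multi-dimensional case is typically a quadratic program; all names and constants here are assumptions).

```python
def safe_action(u_nominal, x, x_min=1.0, alpha=0.5):
    """Map a (possibly unsafe) RL action to a safe one for the scalar
    system x_dot = u with barrier h(x) = x - x_min. The CBF condition
    h_dot + alpha * h >= 0 becomes u >= -alpha * (x - x_min): the set
    of safe actions is a half-line, so projecting the nominal action
    onto it is a one-sided clip. (Illustrative toy instance only.)"""
    u_lower = -alpha * (x - x_min)
    return max(u_nominal, u_lower)
```

Because the mapping is applied to every action, during both training rollouts and execution, the agent never issues an action that violates the barrier condition, which is the source of the provable guarantee.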
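The Shapley value reallocates a coalition's total reward by averaging each agent's marginal contribution over all join orders; its efficiency property (the shares sum exactly to the grand-coalition value) is what underlies the stability claim. Below is a minimal exact computation on a hypothetical 3-vehicle coalition game with made-up reward numbers, purely to illustrate the mechanism; it is not the dissertation's reallocation scheme.

```python
from itertools import permutations

def shapley_values(players, v):
    """Exact Shapley values: average each player's marginal
    contribution v(S + {p}) - v(S) over all orderings of players."""
    values = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            values[p] += v(with_p) - v(coalition)
            coalition = with_p
    return {p: total / len(perms) for p, total in values.items()}

# Hypothetical characteristic function for three vehicles a, b, c:
# v(S) is the total reward coalition S earns (numbers are illustrative).
def v(coalition):
    table = {
        frozenset(): 0.0,
        frozenset({"a"}): 1.0,
        frozenset({"b"}): 1.0,
        frozenset({"c"}): 2.0,
        frozenset({"a", "b"}): 4.0,
        frozenset({"a", "c"}): 5.0,
        frozenset({"b", "c"}): 5.0,
        frozenset({"a", "b", "c"}): 9.0,
    }
    return table[frozenset(coalition)]

phi = shapley_values(["a", "b", "c"], v)
# Efficiency: the reallocated rewards sum to the grand-coalition value,
# so no reward is created or destroyed by the reallocation.
assert abs(sum(phi.values()) - v({"a", "b", "c"})) < 1e-9
```

Vehicle c, which contributes more at the margin in this toy game, receives a larger share; intuitively, rewarding marginal contribution is what removes the incentive to defect from the coalition.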