Reinforcement Learning for Zone Based Multiagent Pathfinding under Uncertainty

Jiajing Ling, Tarun Gupta, Akshat Kumar

PosterID: 11

We present a new framework for multiagent path finding (MAPF), the problem in which multiple agents must find paths from their respective source nodes to their destination nodes in a graph. Most existing approaches assume that all agents move at a fixed speed and that a single node accommodates only a single agent. Motivated by emerging applications of autonomous vehicles, such as drone traffic management, we present zone-based path finding (ZBPF), in which agents move among zones (e.g., geofenced airblocks for drones) and each movement takes an uncertain amount of travel time. Furthermore, each zone can accommodate multiple agents, up to its capacity. We also develop a 3D simulator for ZBPF on the ml-agents platform of the Unity3D game engine, which provides a clean interface between the simulation environment and learning algorithms. We develop a novel formulation of the ZBPF problem using difference-of-convex functions (DC) programming. The resulting approach uses samples from the simulator to optimize agent policies. We also present a multiagent credit assignment scheme that helps our learning approach converge faster. Empirical results on a number of 2D and 3D instances show that our approach can effectively minimize congestion in zones while ensuring that agents reach their final destinations.
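To make the zone-based setting concrete, the following is a minimal Python sketch of the problem ingredients named in the abstract: zones with capacities, agents following paths through zones, and uncertain travel time. It is not the authors' simulator, DC formulation, or learning method. The names `Agent` and `simulate`, the uniform integer travel-time model, and the way congestion is counted (agents above capacity, summed over time steps) are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    path: list          # zone names from source to destination (hypothetical fixed route)
    idx: int = 0        # index of the zone the agent currently occupies
    remaining: int = 0  # time steps left before the agent leaves that zone


def simulate(agents, capacity, min_t, max_t, seed=0, max_steps=1000):
    """Return (total capacity violations, steps until every agent has arrived).

    Assumption: the time spent in each zone is drawn uniformly from
    the integers in [min_t, max_t]; the source does not specify a distribution.
    """
    rng = random.Random(seed)
    for a in agents:
        a.idx = 0
        a.remaining = rng.randint(min_t, max_t)

    violations = 0
    for step in range(max_steps):
        # Count agents per zone; agents at their destination have left the system.
        occupancy = {}
        for a in agents:
            if a.idx < len(a.path) - 1:
                zone = a.path[a.idx]
                occupancy[zone] = occupancy.get(zone, 0) + 1

        # Congestion: number of agents above each zone's capacity at this step.
        violations += sum(max(0, n - capacity[z]) for z, n in occupancy.items())

        if not occupancy:
            return violations, step

        # Advance every agent that is still en route.
        for a in agents:
            if a.idx < len(a.path) - 1:
                a.remaining -= 1
                if a.remaining <= 0:
                    a.idx += 1
                    a.remaining = rng.randint(min_t, max_t)

    return violations, max_steps


if __name__ == "__main__":
    fleet = [Agent(["A", "B", "C"]), Agent(["A", "B", "C"])]
    print(simulate(fleet, {"A": 1, "B": 1, "C": 1}, min_t=1, max_t=3))
```

In this sketch the routes are fixed; a learning approach such as the one described above would instead choose each agent's next zone so as to keep the congestion count low while still reaching the destination.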

Session Aus1: Non-deterministic & Probabilistic Planning

Canb 10/27/2020, 20:00 – 21:00; 10/30/2020, 11:00 – 12:00

Paris 10/27/2020, 10:00 – 11:00; 10/30/2020, 01:00 – 02:00

NYC 10/27/2020, 05:00 – 06:00; 10/29/2020, 20:00 – 21:00

LA 10/27/2020, 02:00 – 03:00; 10/29/2020, 17:00 – 18:00

Stochastic Fairness and Language-Theoretic Fairness in Planning in Nondeterministic Domains

Benjamin Aminof, Giuseppe De Giacomo, Sasha Rubin

Multi-Tier Automated Planning for Adaptive Behavior

Daniel Ciolek, Nicolás D'Ippolito, Alberto Pozanco, Sebastian Sardiña

Reinforcement Learning for Zone Based Multiagent Pathfinding under Uncertainty

Jiajing Ling, Tarun Gupta, Akshat Kumar

POMDP + Information-Decay: Incorporating Defender's Behaviour in Autonomous Penetration Testing

Jonathon Schwartz, Hanna Kurniawati, Edwin El-Mahassni