Guidelines for Action Space Definition in Reinforcement Learning-Based Traffic Signal Control Systems

Maxime Treca, Julian Garbiso, Dominique Barth

PosterID: 61
Traffic signal control is an urban planning tool with important economic, social, and environmental implications. Reinforcement learning applied to traffic signal control (RL-TSC) has shown promising results compared to existing methods. While previous works in the RL-TSC literature have focused on optimizing state and reward definitions, the impact of the agent's action space definition remains largely unexplored. Indeed, typical RL-TSC models feature either phase-based controllers, which determine a signal duration in one go, or step-based controllers, which can interactively decide to extend a phase duration, without comparing their respective merits. In this paper, we provide guidelines for optimally defining RL-TSC actions by comparing different action types on a simulated network featuring different traffic demand patterns. Our results show that an agent's performance and convergence speed both increase with its interaction frequency with the environment. However, certain methods with lower observation frequencies, which can be achieved with realistic sensing technologies, perform reasonably close to higher-frequency ones in all scenarios, and even outperform them under specific traffic conditions.
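The phase-based versus step-based distinction drawn in the abstract can be sketched in a few lines of Python. This is a hypothetical illustration only: the class names, action sets, and durations below are assumptions for exposition and are not taken from the paper.

```python
# Illustrative sketch of the two RL-TSC action-space styles contrasted
# in the abstract. All names and numeric values are hypothetical.

class PhaseBasedController:
    """Commits to a full green-phase duration in one decision,
    so the agent interacts with the environment once per phase."""
    ACTIONS = [10, 20, 30, 40]  # candidate phase durations (seconds)

    def act(self, action_index):
        # The chosen duration is fixed for the entire phase.
        return self.ACTIONS[action_index]


class StepBasedController:
    """Re-decides every few seconds whether to extend the current
    phase, yielding a much higher interaction frequency."""
    STEP = 5  # decision interval in seconds (illustrative)

    def __init__(self):
        self.elapsed = 0  # time spent in the current phase

    def act(self, extend):
        # extend=True keeps the current phase for one more step;
        # extend=False switches phase and resets the timer.
        if extend:
            self.elapsed += self.STEP
        else:
            self.elapsed = 0
        return self.elapsed
```

The trade-off studied in the paper follows directly from this structure: the step-based agent observes and acts far more often, while the phase-based agent needs fewer (and thus more realistically obtainable) observations per cycle.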

Session E3: Planning with Uncertainty
Canberra: 10/29/2020, 01:00 – 02:00 and 10/30/2020, 21:00 – 22:00
Paris: 10/28/2020, 15:00 – 16:00 and 10/30/2020, 11:00 – 12:00
NYC: 10/28/2020, 10:00 – 11:00 and 10/30/2020, 06:00 – 07:00
LA: 10/28/2020, 07:00 – 08:00 and 10/30/2020, 03:00 – 04:00