Jonathan Ferrer-Mestres, Thomas G. Dietterich, Olivier Buffet, Iadine Chadès |
PosterID:
37
PDF
Slides
Poster
BibTeX
|
Markov Decision Processes (MDPs) are employed to model sequential decision-making problems under uncertainty. Traditionally, algorithms to solve MDPs have focused on solving large state or action spaces. With increasing applications of MDPs to human-operated domains such as conservation of biodiversity and health, developing easy-to-interpret solutions is of paramount importance to increase uptake of MDP policies. Here, we define the problem of solving $K$-MDPs, i.e., given an original MDP and a constraint on the number of states ($K$), generate a reduced state space MDP that minimizes the difference between the original optimal MDP value function and the reduced optimal $K$-MDP value function. Building on existing non-transitive and transitive approximate state abstraction functions, we propose a family of three algorithms based on binary search with sub-optimality bounded polynomially in a precision parameter: \KILP{}, \Qd{} and \ad{}. We compare these algorithms to a greedy algorithm (\greedy) and clustering approach (\kmeans). On randomly generated MDPs and two computational sustainability MDPs, \ad{} outperformed all algorithms when it could find a feasible solution. While numerous state abstraction problems have been proposed in the literature, we believe this is the first time that the general problem of solving $K$-MDPs is suggested. We hope that our work will generate future research aiming at increasing the interpretability of MDP policies in human-operated domains. |
Canb | 10/28/2020, 11:00 – 12:15 |
10/29/2020, 20:00 – 21:15 |
Paris | 10/28/2020, 01:00 – 02:15 |
10/29/2020, 10:00 – 11:15 |
NYC | 10/27/2020, 20:00 – 21:15 |
10/29/2020, 05:00 – 06:15 |
LA | 10/27/2020, 17:00 – 18:15 |
10/29/2020, 02:00 – 03:15 |
Solving K-MDPs
Jonathan Ferrer-Mestres, Thomas G. Dietterich, Olivier Buffet, Iadine Chadès
Optimal and Heuristic Approaches for Constrained Flight Planning under Weather Uncertainty
Florian Geißer, Guillaume Povéda, Felipe Trevizan, Manon Bondouy, Florent Teichteil-Königsbuch, Sylvie Thiébaux
We Mind Your Well-Being: Preventing Depression in Uncertain Social Networks by Sequential Interventions
Aye Phyu Phyu Aung, Xinrun Wang, Bo An, Xiaoli Li
Learning Domain-Independent Planning Heuristics with Hypergraph Networks
William Shen, Felipe Trevizan, Sylvie Thiébaux
Deep Reinforcement Learning Approach to Solve Dynamic Vehicle Routing Problem with Stochastic Customers
Waldy Joe, Hoong Chuin Lau