Monte Carlo Information-Oriented Planning

Thomas, Vincent; Hutin, G&#233;r&#233;my; Buffet, Olivier

doi:10.3233/FAIA200368

Abstract

In this article, we discuss how to solve information-gathering problems expressed as ρ-POMDPs, an extension of Partially Observable Markov Decision Processes (POMDPs) whose reward ρ depends on the belief state. Point-based approaches used for solving POMDPs have been extended to solving ρ-POMDPs as belief MDPs when its reward ρ is convex in B or when it is Lipschitz-continuous. In the present paper, we build on the POMCP algorithm to propose a Monte Carlo Tree Search for ρ-POMDPs, aiming for an efficient on-line planner which can be used for any ρ function. Adaptations are required due to the belief-dependent rewards to (i) propagate more than one state at a time, and (ii) prevent biases in value estimates. An asymptotic convergence proof to Ïţ-optimal values is given when ρ is continuous. Experiments are conducted to analyze the algorithms at hand and show that they outperform myopic approaches.

This website uses cookies

This website uses cookies