POMDP solving: what rewards do you really expect at execution?

Chanel, Caroline Ponzoni Carvalho; Farges, Jean-Loup; Teichteil-Königsbuch, Florent; Infantes, Guillaume

doi:10.3233/978-1-60750-676-8-50

Abstract

Partially Observable Markov Decision Processes (POMDPs) have gained increasing interest in many research communities, owing to significant improvements in their optimization algorithms and in computer capabilities. Yet most research focuses on optimizing either average accumulated rewards (AI planning) or direct entropy (active perception), and neither criterion matches the rewards actually gathered at execution. Indeed, the first criterion averages linearly over all belief states, so it does not make the best use of the information carried by different observations, while the second discards rewards entirely. Motivated by simple demonstrative examples, we therefore study an additive combination of these two criteria to get the best of both reward gathering and information acquisition at execution. We then compare our criterion with classical ones on a realistic multi-target recognition and tracking mission, and highlight the need to consider new hybrid non-linear criteria.
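To make the contrast between the two criteria concrete, here is a minimal sketch of a belief-dependent reward that adds an entropy term to the classical expected reward. It assumes discrete states, a reward table `r(s, a)`, and a hypothetical weight `alpha`; the function names and the exact form (expected reward plus `alpha` times the negative Shannon entropy `sum_s b(s) log b(s)`) are illustrative assumptions, not the authors' precise formulation.

```python
import math
from typing import Dict, Hashable, Tuple

Belief = Dict[Hashable, float]                 # b(s): probability of each state
RewardTable = Dict[Tuple[Hashable, Hashable], float]  # r(s, a)


def expected_reward(belief: Belief, reward: RewardTable, action: Hashable) -> float:
    """Classical POMDP criterion: r(s, a) averaged linearly over the belief."""
    return sum(p * reward[(s, action)] for s, p in belief.items())


def neg_entropy(belief: Belief) -> float:
    """Negative Shannon entropy sum_s b(s) log b(s).

    Equals 0 for a certain belief and is negative for uncertain ones,
    so larger values mean more information about the hidden state.
    """
    return sum(p * math.log(p) for p in belief.values() if p > 0.0)


def mixed_reward(belief: Belief, reward: RewardTable, action: Hashable,
                 alpha: float) -> float:
    """Hypothetical additive criterion: expected reward + alpha * negative entropy."""
    return expected_reward(belief, reward, action) + alpha * neg_entropy(belief)


if __name__ == "__main__":
    # Assumed toy problem: two hidden states, one action worth 10 only in s0.
    r = {("s0", "a"): 10.0, ("s1", "a"): 0.0}
    uncertain = {"s0": 0.5, "s1": 0.5}
    certain = {"s0": 1.0, "s1": 0.0}
    for name, b in [("uncertain", uncertain), ("certain", certain)]:
        print(name, expected_reward(b, r, "a"), mixed_reward(b, r, "a", alpha=1.0))
```

The expected-reward term is linear in the belief, so on its own it places no extra value on reaching a more certain belief; the entropy term penalizes uncertain beliefs, and `alpha` (an assumed tuning parameter) trades one objective against the other.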
