Partially Observable Markov Decision Processes (POMDPs) have attracted increasing interest in many research communities, owing to significant improvements in their optimization algorithms and in computing power. Yet most research focuses on optimizing either the average accumulated reward (AI planning) or the direct entropy (active perception), neither of which matches the rewards actually gathered at execution time. Indeed, the first criterion averages linearly over all belief states, so it does not exploit the information gained from different observations, while the second discards rewards entirely. Motivated by simple demonstrative examples, we therefore study an additive combination of these two criteria, aiming to get the best of both reward gathering and information acquisition at execution time. We then compare our criterion with the classical ones on a realistic multi-target recognition and tracking mission, and highlight the need to consider new hybrid non-linear criteria.