This paper addresses a vectorial form of Markov Decision Processes (MDPs) in order to solve MDPs with unknown rewards. Our method for finding optimal strategies reduces the computation to the determination of two separate polytopes: the first is the set of admissible vector-valued value functions, and the second is the set of admissible weight vectors. Unknown weight vectors are discovered through interaction with an agent holding a set of preferences. Contrary to most existing algorithms for reward-uncertain MDPs, our approach does not require user interaction during the generation of optimal policies. Instead, we use a variant of approximate value iteration on vector-valued MDPs, based on classifying advantages, which allows us to approximate the set of non-dominated policies regardless of user preferences. Since any agent's optimal policy belongs to this set, we propose an algorithm that discovers within this set an approximately optimal policy according to the user's priorities, while interactively narrowing the weight polytope.
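The idea of separating the vector-valued value polytope from the weight polytope can be illustrated with a minimal sketch. The toy MDP below, its transition and reward numbers, and the grid over weight vectors are all illustrative assumptions, not the paper's construction: deterministic policies are evaluated with vector-valued policy evaluation, the non-dominated set is approximated by sweeping candidate weight vectors, and a user's (initially unknown) weight vector then selects a policy from that set.

```python
import itertools
import numpy as np

# Toy 2-state, 2-action MDP with 2-dimensional vector rewards.
# All numbers here are illustrative assumptions, not from the paper.
n_states, n_actions, dim = 2, 2, 2
gamma = 0.9
# P[s][a] = transition distribution over next states
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.1, 0.9]]])
# R[s][a] = vector-valued reward
R = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.5, 0.5], [0.2, 0.8]]])

def evaluate(policy, iters=500):
    """Iterative policy evaluation with vector-valued returns."""
    V = np.zeros((n_states, dim))
    for _ in range(iters):
        V = np.array([R[s][policy[s]] + gamma * P[s][policy[s]] @ V
                      for s in range(n_states)])
    return V

# Enumerate deterministic stationary policies and record the vector value
# of each at a fixed initial state (state 0).
policies = list(itertools.product(range(n_actions), repeat=n_states))
values = {pi: evaluate(pi)[0] for pi in policies}

# Approximate the non-dominated set: keep every policy that is optimal for
# some weight vector w >= 0 with components summing to 1 (a coarse grid
# stands in for the weight polytope here).
nondominated = set()
for w1 in np.linspace(0.0, 1.0, 101):
    w = np.array([w1, 1.0 - w1])
    nondominated.add(max(policies, key=lambda pi: w @ values[pi]))

# Once the user's weight vector is known, the optimal policy is found
# inside the precomputed non-dominated set -- no further policy
# generation is needed.
w_user = np.array([0.7, 0.3])
best = max(nondominated, key=lambda pi: w_user @ values[pi])
```

Note that the expensive step (computing the non-dominated set) runs once, independently of any user; only the cheap final selection depends on the user's weights, which matches the interaction pattern described above.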