Solving MDPs with Unknown Rewards Using Nondominated Vector-Valued Functions

Alizadeh, Pegah; Chevaleyre, Yann; Lévy, François

doi:10.3233/978-1-61499-682-8-15

Abstract

This paper addresses a vectorial form of Markov Decision Processes (MDPs) in order to solve MDPs with unknown rewards. Our method for finding optimal strategies reduces the computation to the determination of two separate polytopes: the set of admissible vector-valued functions and the set of admissible weight vectors. Unknown weight vectors are discovered according to an agent with a set of preferences. Contrary to most existing algorithms for reward-uncertain MDPs, our approach does not require interaction with the user while optimal policies are generated. Instead, we use a variant of approximate value iteration on vector-valued MDPs, based on classifying advantages, which allows us to approximate the set of non-dominated policies regardless of user preferences. Since any agent's optimal policy comes from this set, we propose an algorithm that discovers in this set an approximately optimal policy according to user priorities while interactively narrowing the weight polytope.
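The following is a minimal illustrative sketch, not the paper's algorithm. It assumes that each candidate policy is summarized by a value vector, that the unknown reward corresponds to a non-negative weight vector, and that a policy's scalar value is the dot product of the two. Under those assumptions, a plain Pareto filter gives a superset of the policies that can be optimal for some weight vector, and once a weight vector is known the best policy is found by scalarization. All function names and the sample numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def scalarize(value_vec: Sequence[float], weights: Sequence[float]) -> float:
    """Scalar value of a policy: dot product of value vector and weights."""
    return sum(v * w for v, w in zip(value_vec, weights))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_nondominated(value_vecs: List[Vector]) -> List[Vector]:
    """Keep vectors that no other vector Pareto-dominates.

    Assumption: weights are non-negative, so a Pareto-dominated vector
    is never needed to achieve the optimum.
    """
    return [
        v for v in value_vecs
        if not any(dominates(u, v) for u in value_vecs if u != v)
    ]


def best_policy_index(value_vecs: List[Vector], weights: Sequence[float]) -> int:
    """Index of the vector with the highest scalarized value."""
    return max(range(len(value_vecs)), key=lambda i: scalarize(value_vecs[i], weights))


# Hypothetical value vectors for four candidate policies.
candidates = [(1.0, 0.0), (0.0, 1.0), (0.4, 0.4), (0.3, 0.3)]
kept = pareto_nondominated(candidates)
choice = best_policy_index(kept, (0.8, 0.2))
```

In this sketch `(0.4, 0.4)` survives the Pareto filter even though no non-negative weight vector makes it strictly best, which is why the filter should be read only as a coarse superset. The paper's notion of non-domination is defined relative to the weight polytope and is approximated there by a value-iteration variant rather than by enumeration.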
