This paper addresses the vectorial form of Markov Decision Processes (MDPs) to solve MDPs with unknown rewards. Our method for finding optimal strategies reduces the computation to the determination of two separate polytopes: the set of admissible vector-valued functions and the set of admissible weight vectors. The unknown weight vector is elicited from an agent's stated preferences. Contrary to most existing algorithms for reward-uncertain MDPs, our approach does not require interaction with the user while generating optimal policies. Instead, we use a variant of approximate value iteration on vector-valued MDPs based on classifying advantages, which allows us to approximate the set of non-dominated policies independently of user preferences. Since any agent's optimal policy belongs to this set, we propose an algorithm that discovers, within this set, an approximately optimal policy according to the user's priorities while interactively narrowing the weight polytope.
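The sketch below is a minimal illustration of the two-polytope idea on a toy problem, not the paper's algorithm: instead of approximate value iteration with advantage classification, it enumerates deterministic policies of a tiny randomly generated MDP, Pareto-filters their value vectors to approximate the non-dominated set, and narrows the weight polytope with linear constraints derived from hypothetical preference comparisons. All sizes, names, and the sampled data are illustrative assumptions.

```python
import numpy as np
from itertools import product

# Toy vector-reward MDP (all sizes and data are illustrative assumptions):
# S states, A actions, d reward dimensions; weights w live in the simplex.
S, A, d, gamma = 3, 2, 2, 0.95
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, :] = transition distribution
R = rng.uniform(0.0, 1.0, size=(S, A, d))    # R[s, a, :] = vector-valued reward

def policy_value(pi):
    """Vector value at state 0, solved component-wise from (I - gamma*P_pi) V = r_pi."""
    P_pi = np.stack([P[s, pi[s]] for s in range(S)])       # S x S
    r_pi = np.stack([R[s, pi[s]] for s in range(S)])       # S x d
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)    # S x d
    return V[0]

def non_dominated(values):
    """Pareto filter: keep indices of vectors not dominated component-wise."""
    keep = []
    for i, v in enumerate(values):
        dominated = any(np.all(u >= v) and np.any(u > v)
                        for j, u in enumerate(values) if j != i)
        if not dominated:
            keep.append(i)
    return keep

# Enumerate deterministic policies (feasible only for tiny MDPs).
policies = list(product(range(A), repeat=S))
values = [policy_value(pi) for pi in policies]
front = non_dominated(values)   # approximate set of non-dominated policies

# Interactive narrowing of the weight polytope: each answered comparison
# "policy i is preferred to policy j" adds the constraint w.(V_i - V_j) >= 0.
constraints = []                # each entry c requires c.w >= 0

def consistent(w):
    return all(np.dot(c, w) >= 0 for c in constraints)

# Hypothetical answer: the user prefers the first frontier policy to the second.
if len(front) >= 2:
    constraints.append(values[front[0]] - values[front[1]])

# Pick, among non-dominated policies, the one maximizing the scalarized value
# for a weight still consistent with the answers (here: a sampled witness).
w = None
for _ in range(1000):
    cand = rng.dirichlet(np.ones(d))
    if consistent(cand):
        w = cand
        break
if w is not None:
    best = max(front, key=lambda i: float(np.dot(w, values[i])))
    print("chosen policy:", policies[best], "value vector:", values[best])
```

In this sketch, the Pareto front plays the role of the polytope of admissible vector-valued functions, while the constraints collected from comparisons carve out the admissible weight polytope; the paper's contribution is computing the former approximately without user interaction and querying the user only in the second phase.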