As a guest user you are not logged in or recognized by your IP address. You have
access to the Front Matter, Abstracts, Author Index, Subject Index and the full
text of Open Access publications.
Many Reinforcement Learning algorithms assume a Markov reward function to guarantee optimality. However, not all reward functions are Markov. This paper proposes a framework for mapping non-Markov reward functions into equivalent Markov ones by learning specialized reward automata, Reward Machines. Unlike the general practice of learning Reward Machines, we do not require a set of high-level propositional symbols from which to learn. Rather, we learn hidden triggers, directly from data, that construct them. We demonstrate the importance of learning Reward Machines over their Deterministic Finite-State Automata counterparts given their ability to model reward dependencies. We formalize this distinction in our learning objective. Our mapping process is constructed as an Integer Linear Programming problem. We prove that our mappings form a suitable proxy for maximizing reward expectations. We empirically validate our approach by learning black-box, non-Markov reward functions in the Officeworld domain. Additionally, we demonstrate the effectiveness of learning reward dependencies in a new domain, Breakfastworld.
This website uses cookies
We use cookies to provide you with the best possible experience. They also allow us to analyze user behavior in order to constantly improve the website for you. Info about the privacy policy of IOS Press.
This website uses cookies
We use cookies to provide you with the best possible experience. They also allow us to analyze user behavior in order to constantly improve the website for you. Info about the privacy policy of IOS Press.