Training reinforcement learning (RL) agents using scalar reward signals is often infeasible when an environment has sparse and non-Markovian rewards. Moreover, handcrafting these reward functions before training is prone to misspecification. We learn non-Markovian finite task specifications as finite-state ‘task automata’ from episodes of agent experience within environments with unknown dynamics. First, we learn a product MDP, a model composed of the specification’s automaton and the environment’s MDP (both initially unknown), by treating it as a partially observable MDP and employing a hidden Markov model learning algorithm. Second, we efficiently distil the task automaton (assumed to be a deterministic finite automaton) from the learnt product MDP. Our automaton enables a task to be decomposed into sub-tasks, so an RL agent can later synthesise an optimal policy more efficiently. It is also an interpretable encoding of high-level task features, so a human can verify that the agent’s learnt tasks have no misspecifications. Finally, we also take steps towards ensuring that the automaton is environment-agnostic, making it well-suited for use in transfer learning.
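The second step described above, distilling a deterministic finite automaton from the learnt product MDP, can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: it assumes the product MDP's transition structure has already been recovered as pairs of product states `(s, q)` (environment state `s`, automaton state `q`) together with a labelling function mapping environment states to propositions, and it projects those transitions onto `(q, label) → q'` while checking determinism.

```python
# Hypothetical sketch: distilling a DFA from a learnt product MDP.
# A product state pairs an environment state s with an automaton state q.
# The task automaton must update deterministically on the label of the
# state reached, so we project product transitions onto (q, label) -> q'.

def distil_dfa(product_transitions, labelling):
    """product_transitions: iterable of ((s, q), (s_next, q_next)) pairs
    observed in the learnt product MDP.
    labelling: maps an environment state to the proposition it emits.
    Returns the DFA transition function as a dict {(q, label): q_next};
    raises if the projection is non-deterministic (e.g. model noise)."""
    delta = {}
    for (s, q), (s_next, q_next) in product_transitions:
        key = (q, labelling[s_next])
        if key in delta and delta[key] != q_next:
            raise ValueError(f"non-deterministic projection at {key}")
        delta[key] = q_next
    return delta

# Toy "reach the goal" task: automaton state 0 (in progress), 1 (done).
transitions = [(("a", 0), ("a", 0)),   # stay in an empty cell
               (("a", 0), ("g", 1)),   # reach the goal, automaton advances
               (("g", 1), ("g", 1))]   # done state absorbs
labelling = {"a": "empty", "g": "goal"}
dfa = distil_dfa(transitions, labelling)
```

Here `dfa` maps `(0, "goal")` to `1` and `(0, "empty")` to `0`, recovering the two-state automaton for the toy task; the determinism check reflects the abstract's assumption that the specification is a deterministic finite automaton.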