Uncertainty Propagation for Efficient Exploration in Reinforcement Learning

Hans, Alexander; Udluft, Steffen

doi:10.3233/978-1-60750-606-5-361

Abstract

Reinforcement learning aims to derive an optimal policy for an often initially unknown environment. In the case of an unknown environment, exploration is used to acquire knowledge about it. In that context the well-known exploration-exploitation dilemma arises: when should one stop exploring and instead exploit the knowledge already gathered? In this paper we propose an uncertainty-based exploration method. We use uncertainty propagation to obtain the Q-function's uncertainty and then use that uncertainty in combination with the Q-values to guide exploration toward promising states that have so far been insufficiently explored. A parameter controls the weight given to the uncertainty during action selection. We evaluate one variant of the algorithm using full covariance matrices and two variants using an approximation, and demonstrate their functionality on two benchmark problems.
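
The action-selection idea described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the uncertainty enters as an additive bonus, so that each action is scored by its Q-value plus a weighted uncertainty estimate, and it takes the uncertainties as given rather than computing them by uncertainty propagation. The names `select_action`, `q_values`, `q_sigmas`, and `weight` are hypothetical.

```python
from typing import Sequence


def select_action(q_values: Sequence[float],
                  q_sigmas: Sequence[float],
                  weight: float) -> int:
    """Return the index of the action with the highest Q + weight * sigma.

    Assumption (not from the paper): uncertainty acts as an additive bonus.
    weight = 0 gives the purely greedy choice; a larger weight favors
    actions whose Q-value estimate is still uncertain.
    """
    if len(q_values) != len(q_sigmas):
        raise ValueError("q_values and q_sigmas must have the same length")
    if not q_values:
        raise ValueError("at least one action is required")
    # Score every action by its value estimate plus the weighted uncertainty.
    scores = [q + weight * s for q, s in zip(q_values, q_sigmas)]
    # Pick the index of the best score (first index wins on ties).
    return max(range(len(scores)), key=scores.__getitem__)


# Illustrative numbers only: action 0 has the best estimate,
# action 1 is slightly worse but far less certain.
q = [1.0, 0.8, 0.5]
sigma = [0.05, 0.4, 0.1]

greedy_action = select_action(q, sigma, weight=0.0)
exploring_action = select_action(q, sigma, weight=1.0)
```

With these made-up numbers, a weight of zero selects the action with the highest Q-value, while a weight of one shifts the choice to the less explored action, which is the kind of trade-off the weighting parameter is meant to control.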
