As a guest user you are not logged in or recognized by your IP address. You have
access to the Front Matter, Abstracts, Author Index, Subject Index and the full
text of Open Access publications.
Offline Reinforcement Learning (RL) is an important research domain for real-world applications because it can avert expensive and dangerous online exploration. Offline RL is prone to extrapolation errors caused by the distribution shift between offline datasets and states visited by behavior policy. Existing offline RL methods constrain the policy to offline behavior to prevent extrapolation errors. But these methods limit the generalization potential of agents in Out-Of-Distribution (OOD) regions and cannot effectively evaluate OOD generalization behavior. To improve the generalization of the policy in OOD regions while avoiding extrapolation errors, we propose an Energy-Based Policy Optimization (EBPO) method for OOD generalization. An energy function based on the distribution of offline data is proposed for the evaluation of OOD generalization behavior, instead of relying on model discrepancies to constrain the policy. The way of quantifying exploration behavior in terms of energy values can balance the return and risk. To improve the stability of generalization and solve the problem of sparse reward in complex environment, episodic memory is applied to store successful experiences that can improve sample efficiency. Extensive experiments on the D4RL datasets demonstrate that EBPO outperforms the state-of-the-art methods and achieves robust performance on challenging tasks that require OOD generalization.
This website uses cookies
We use cookies to provide you with the best possible experience. They also allow us to analyze user behavior in order to constantly improve the website for you. Info about the privacy policy of IOS Press.
This website uses cookies
We use cookies to provide you with the best possible experience. They also allow us to analyze user behavior in order to constantly improve the website for you. Info about the privacy policy of IOS Press.