Offline reinforcement learning (RL) aims to learn effective policies from previously recorded data, without further interaction with the environment, which is often costly or risky. Model-based algorithms, which first construct a model of the environment and then learn a policy under that model, have become a promising approach. However, most existing methods are overly conservative in avoiding the out-of-distribution error induced by model-generated samples, which instead leads to poor performance. In this work, we propose a novel model-based offline RL method, named Simple Double Validation (SDV). The main idea of SDV is to introduce an additional guidance model that helps the agent judge the rationality of states, combined with an advantage weighting factor that prevents suboptimal samples from misleading the models. In this way, the agent is guided toward more favourable states while making reliable decisions. We evaluate SDV on widely studied offline RL benchmarks and demonstrate state-of-the-art performance. Our work also introduces the ideas of double validation and model advantage weighting into model-based offline RL, providing new insights for future research.
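The abstract does not give the concrete update rule, so the following is only a minimal sketch of how a double-validation weighting scheme of this general kind could be implemented. The names guidance_model, q_net, and value_net, the AWR-style exponential advantage weight, and all shapes are assumptions for illustration, not the paper's actual formulation.

import torch

def double_validation_weights(states, actions, next_states,
                              guidance_model, q_net, value_net,
                              beta=1.0, max_weight=20.0):
    """Per-sample weights for model-generated transitions (hypothetical).

    Assumed interfaces, not from the paper:
      - guidance_model(s) -> plausibility score in [0, 1] per state,
        acting as a second check on model-generated states.
      - q_net(s, a) and value_net(s) give Q(s, a) and V(s), so the
        advantage A(s, a) = Q(s, a) - V(s) can down-weight samples
        that stem from suboptimal behavior in the data.
    """
    with torch.no_grad():
        # First validation: does the guidance model consider the
        # generated next states plausible (in-distribution)?
        plausibility = guidance_model(next_states)            # shape (B,)
        # Second signal: AWR-style exponential advantage weight,
        # clamped for numerical stability.
        advantage = q_net(states, actions) - value_net(states)
        adv_weight = torch.exp(beta * advantage).clamp(max=max_weight)
    # Combined weight to scale the per-sample policy/value loss.
    return plausibility * adv_weight

Under these assumptions, such weights would multiply the per-sample Bellman or policy loss on model rollout batches, so that transitions leading to implausible states or reflecting low-advantage behavior contribute less to the update.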