Offline reinforcement learning (RL) aims to learn effective policies from previously recorded data, without further interaction with the environment, which is often costly or risky. Model-based algorithms, which first construct a model of the environment and then learn a policy under that model, have become a promising approach. However, most existing methods are overly conservative in avoiding the out-of-distribution error induced by model-generated samples, which instead leads to poor performance. In this work, we propose a novel model-based offline RL method, named Simple Double Validation (SDV). The main idea of SDV is to introduce an additional guidance model that helps the agent judge the rationality of states, combined with an advantage weighting factor that prevents suboptimal samples from misleading the models. In this way, the agent is guided toward more favourable states while making reliable decisions. We evaluate SDV on widely studied offline RL benchmarks and demonstrate state-of-the-art performance. Our work also introduces the ideas of double validation and model advantage weighting into model-based offline RL, providing new insights for future research.
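The abstract does not give the concrete update rule, so the following is only a minimal sketch of how a double-validation weighting scheme of this general kind could be implemented. The names guidance_model, q_net, and value_net, the AWR-style exponential advantage weight, and all shapes are assumptions for illustration, not the paper's actual formulation.

import torch

def double_validation_weights(states, actions, next_states,
                              guidance_model, q_net, value_net,
                              beta=1.0, max_weight=20.0):
    """Per-sample weights for model-generated transitions (hypothetical).

    Assumed interfaces, not from the paper:
      - guidance_model(s) -> plausibility score in [0, 1] per state,
        acting as a second check on model-generated states.
      - q_net(s, a) and value_net(s) give Q(s, a) and V(s), so the
        advantage A(s, a) = Q(s, a) - V(s) can down-weight samples
        that stem from suboptimal behavior in the data.
    """
    with torch.no_grad():
        # First validation: does the guidance model consider the
        # generated next states plausible (in-distribution)?
        plausibility = guidance_model(next_states)            # shape (B,)
        # Second signal: AWR-style exponential advantage weight,
        # clamped for numerical stability.
        advantage = q_net(states, actions) - value_net(states)
        adv_weight = torch.exp(beta * advantage).clamp(max=max_weight)
    # Combined weight to scale the per-sample policy/value loss.
    return plausibility * adv_weight

Under these assumptions, such weights would multiply the per-sample Bellman or policy loss on model rollout batches, so that transitions leading to implausible states or reflecting low-advantage behavior contribute less to the update.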