Least-Squares Policy Iteration (LSPI) [3] is an approximate reinforcement learning technique capable of training policies over large, continuous state spaces. Unfortunately, its computational requirements scale poorly with the number of system agents. Prior work has addressed this problem, notably the Coordinated Reinforcement Learning (CRL) approach of Guestrin et al. [1], but CRL requires prior knowledge of the learning system, such as the interagent dependencies and the form of the Q-function. We demonstrate a hybrid gradient-ascent/LSPI approach that uses LSPI to efficiently train multi-agent policies. Its computational requirements scale as O(N), where N is the number of system agents, and it does not share CRL's prior-knowledge requirements. Finally, we demonstrate our algorithm on a standard multi-agent network control problem [1].
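For reference, standard LSPI [3] alternates a least-squares policy-evaluation step (LSTDQ) with greedy policy improvement over a linear Q-function Q(s, a) ≈ φ(s, a)ᵀw. The sketch below illustrates that baseline procedure only, not the paper's hybrid gradient-ascent variant; the function names, the `samples` format, and the small ridge term are illustrative assumptions.

```python
import numpy as np

def lstdq(samples, phi, gamma, policy, k):
    # LSTDQ: solve A w = b for the weights of a linear Q-function
    # Q(s, a) ~ phi(s, a) @ w, evaluated under the given policy.
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    # Small ridge term (illustrative) keeps A invertible on sparse data.
    return np.linalg.solve(A + 1e-6 * np.eye(k), b)

def lspi(samples, phi, gamma, k, actions, n_iters=20):
    # Policy iteration: alternate LSTDQ evaluation with greedy improvement.
    w = np.zeros(k)
    for _ in range(n_iters):
        policy = lambda s, w=w: max(actions, key=lambda a: phi(s, a) @ w)
        w = lstdq(samples, phi, gamma, policy, k)
    return w
```

With tabular one-hot features this recovers the exact Q-values of a small MDP; the multi-agent scaling problem the abstract targets arises because the joint action set (and hence the feature dimension k) grows exponentially with the number of agents.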