

Multi-agent reinforcement learning commonly uses a global team reward signal to represent overall collaborative performance. Value decomposition factorizes this global signal into estimated individual value functions per agent, enabling efficient training. In sparse-reward environments, however, agents struggle to assess whether their actions contribute to the team goal, which slows convergence and degrades overall performance. We present IPERS, an Individual Prioritized Experience Replay algorithm with Subgoals for Sparse Reward Multi-Agent Reinforcement Learning. IPERS integrates joint action decomposition with prioritized experience replay while maintaining invariance between the global and individual loss gradients. Subgoals act as intermediate objectives that break a complex task into simpler steps with dense feedback, providing intrinsic rewards that guide the agents. This facilitates learning coordinated policies in challenging collaborative environments with sparse rewards. Experimental evaluations in both the SMAC and GRF environments show that IPERS adapts rapidly to diverse multi-agent tasks and achieves significant improvements in win rate and convergence performance over state-of-the-art algorithms.
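
The abstract does not specify an implementation, so the sketch below is only a minimal, hypothetical Python illustration of two of the named ingredients: prioritized experience replay driven by per-agent (individual) TD errors, and a dense intrinsic bonus for reaching subgoals added on top of the sparse team reward. The class and function names, the mean aggregation of individual TD errors into a single priority, and the proportional-priority scheme are all assumptions for illustration, not the authors' method.

```python
"""Minimal sketch (not the IPERS reference code) of individual prioritized
replay and subgoal-shaped rewards. All names and design choices here are
illustrative assumptions."""
import numpy as np


class IndividualPrioritizedReplay:
    """Proportional prioritized replay whose priorities come from per-agent TD errors."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha                      # how strongly priorities skew sampling
        self.eps = eps                          # keeps every transition sampleable
        self.buffer = []                        # stored joint transitions
        self.priorities = np.zeros(capacity)    # one priority per slot
        self.pos = 0

    def add(self, transition, td_errors_per_agent):
        # Aggregate individual TD errors into one priority (mean is an assumption).
        priority = (np.abs(td_errors_per_agent).mean() + self.eps) ** self.alpha
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        p = self.priorities[: len(self.buffer)]
        probs = p / p.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform sampling.
        weights = (len(self.buffer) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.buffer[i] for i in idx], idx, weights

    def update(self, idx, td_errors_per_agent_batch):
        # Refresh priorities after the learner recomputes per-agent TD errors.
        for i, errs in zip(idx, td_errors_per_agent_batch):
            self.priorities[i] = (np.abs(errs).mean() + self.eps) ** self.alpha


def shaped_reward(team_reward, subgoals_hit, bonus=0.1):
    """Sparse team reward plus a dense intrinsic bonus for each subgoal reached."""
    return team_reward + bonus * sum(subgoals_hit)
```

One simple instantiation of "individual" prioritization is shown here as a single shared buffer whose priorities aggregate per-agent errors; maintaining a separate priority per agent is an equally plausible reading of the abstract.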