One design strategy for developing intelligent agents is to create N distinct behaviors, each of which works effectively in particular tasks and circumstances. At each time step during task execution, the agent, or bandit, chooses which of the N behaviors to use. Traditional bandit algorithms for making this selection often (1) assume the environment is stationary, (2) focus on asymptotic performance, and (3) do not incorporate external information that is available to the agent. Each of these simplifications limits these algorithms, often rendering them ineffective in practice. In this paper, we propose a new bandit algorithm, called AlegAATr, as a step toward overcoming these deficiencies. AlegAATr leverages a technique called Assumption-Alignment Tracking (AAT), proposed previously in the robotics literature, to predict the performance of each behavior in each situation. It then uses these predictions to decide which behavior to use at any given time. We demonstrate the effectiveness of AlegAATr in selecting behaviors in three problem domains: repeated games, ad hoc teamwork, and a human-robot pick-and-place task.
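To make the selection loop described above concrete, the following is a minimal sketch, not the authors' implementation, of a prediction-driven behavior selector: at each time step, the agent scores each of its N behaviors with a per-situation performance predictor and picks the best. The `predict_performance` callable here is a hypothetical stand-in for AAT's predictions; the actual AlegAATr algorithm is defined in the paper itself.

```python
import random
from typing import Callable, List


def select_behavior(
    predict_performance: Callable[[int, object], float],
    n_behaviors: int,
    state: object,
) -> int:
    """Pick the behavior with the highest predicted performance in `state`.

    `predict_performance(i, state)` is assumed to return a scalar estimate
    of how well behavior i will perform in the current situation (a stand-in
    for AAT's predictions, which the abstract does not spell out).
    """
    scores: List[float] = [predict_performance(i, state) for i in range(n_behaviors)]
    best = max(scores)
    # Break ties randomly so the agent does not systematically favor low indices.
    candidates = [i for i, s in enumerate(scores) if s == best]
    return random.choice(candidates)


if __name__ == "__main__":
    # Toy predictor that always rates behavior 2 highest, regardless of state.
    toy_predictor = lambda i, state: [0.1, 0.4, 0.9][i]
    print(select_behavior(toy_predictor, n_behaviors=3, state=None))  # -> 2
```

Note the contrast with a traditional bandit: a stationary-environment algorithm would score behaviors from accumulated reward statistics alone, whereas the predictor here can condition on the current situation, which is the kind of external information the abstract argues traditional methods leave unused.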