Budgeted Multi–Armed Bandit in Continuous Action Space

Trov&#242;, Francesco; Paladino, Stefano; Restelli, Marcello; Gatti, Nicola

doi:10.3233/978-1-61499-672-9-560

Abstract

Multi–Armed Bandits (MABs) have been widely considered in the last decade to model settings in which an agent wants to learn the action providing the highest expected reward among a fixed set of available actions during the operational life of a system. Classical techniques provide solutions that minimize the regret due to learning in settings where selecting an arm has no cost. Though, in many real world applications the learner has to pay some cost for pulling each arm and the learning process is constrained by a fixed budget B. This problem is addressed in the literature as the Budgeted MAB (BMAB). In this paper, for the first time, we study the problem of Budgeted Continuous–Armed Bandit (BCAB), where the set of the possible actions consists in a continuous set (e.g., a range of prices) and the learner suffers from a random reward and cost at each round. We provide a novel algorithm, named B–Zoom, which suffers a regret of , where d is the Zooming dimension of the problem. Finally, we provide an empirical analysis showing that, despite a lower average performance, the proposed approach is more robust to adverse settings as compared to existing algorithms designed for BMAB.

Contact

IOS Press Copyright 2024

Contact

IOS Press Copyright 2024

This website uses cookies

This website uses cookies