On Bayesian Upper Confidence Bounds for Bandit Problems - Télécom Paris
Conference paper · Year: 2012

On Bayesian Upper Confidence Bounds for Bandit Problems

Abstract

Stochastic bandit problems have been analyzed from two different perspectives: a frequentist view, where the parameter is a deterministic unknown quantity, and a Bayesian approach, where the parameter is drawn from a prior distribution. We show in this paper that methods derived from this second perspective prove optimal when evaluated using the frequentist cumulative regret as a measure of performance. We give a general formulation for a class of Bayesian index policies that rely on quantiles of the posterior distribution. For binary bandits, we prove that the corresponding algorithm, termed Bayes-UCB, satisfies finite-time regret bounds that imply its asymptotic optimality. More generally, Bayes-UCB appears as a unifying framework for several variants of the UCB algorithm addressing different bandit problems (parametric multi-armed bandits, Gaussian bandits with unknown mean and variance, linear bandits). Moreover, the generality of the Bayesian approach makes it possible to address more challenging models. In particular, we show how to handle linear bandits with sparsity constraints by resorting to Gibbs sampling.
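For the binary case described above, the Bayes-UCB index of an arm is an upper quantile of its Beta posterior. The following is a minimal illustrative sketch, not the authors' code: it assumes a uniform Beta(1, 1) prior, uses SciPy's `beta.ppf` for posterior quantiles, and takes the quantile level 1 - 1/(t (log n)^c) with c = 0 by default; the function names `bayes_ucb_index` and `bayes_ucb` are my own.

```python
import math
import random

from scipy.stats import beta  # quantiles (ppf) of the Beta distribution


def bayes_ucb_index(successes, failures, t, horizon, c=0):
    """Index of one arm: the 1 - 1/(t (log horizon)^c) upper quantile of its
    Beta(1 + successes, 1 + failures) posterior under a uniform prior.
    c=0 is an assumption for this sketch; the horizon only matters for c > 0."""
    level = 1.0 - 1.0 / (t * max(math.log(horizon), 1.0) ** c)
    return beta.ppf(level, 1 + successes, 1 + failures)


def bayes_ucb(arm_means, horizon, c=0, seed=0):
    """Play Bernoulli arms with the Bayes-UCB index policy;
    return how many times each arm was pulled."""
    rng = random.Random(seed)
    k = len(arm_means)
    succ, fail = [0] * k, [0] * k
    for t in range(1, horizon + 1):
        # Pull the arm whose posterior quantile (its index) is highest.
        a = max(range(k),
                key=lambda i: bayes_ucb_index(succ[i], fail[i], t, horizon, c))
        reward = 1 if rng.random() < arm_means[a] else 0
        succ[a] += reward
        fail[a] += 1 - reward
    return [succ[i] + fail[i] for i in range(k)]
```

Because the index is a posterior quantile rather than a closed-form bonus, the same recipe carries over to the other models mentioned in the abstract by swapping in the appropriate posterior, which is what makes Bayes-UCB a unifying framework.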
No file deposited

Dates and versions

hal-02286440 , version 1 (13-09-2019)

Identifiers

  • HAL Id : hal-02286440 , version 1

Cite

Emilie Kaufmann, Olivier Cappé, Aurélien Garivier. On Bayesian Upper Confidence Bounds for Bandit Problems. International Conference on Artificial Intelligence and Statistics (AISTATS), Apr 2012, La Palma, Canary Islands, Spain. pp.592-600. ⟨hal-02286440⟩