BelMan: An Information-Geometric Approach to Stochastic Bandits

Debabrota Basu 1 Pierre Senellart 2, 3 Stephane Bressan 4
2 VALDA - Value from Data
DI-ENS - Département d'informatique de l'École normale supérieure, Inria de Paris
3 DIG - Data, Intelligence and Graphs
LTCI - Laboratoire Traitement et Communication de l'Information
Abstract : We propose a Bayesian information-geometric approach to the exploration-exploitation trade-off in stochastic multi-armed bandits. The uncertainty on reward generation and belief is represented using the manifold of joint distributions of rewards and beliefs. Accumulated information is summarised by the barycentre of joint distributions, the pseudobelief-reward. While the pseudobelief-reward facilitates information accumulation through exploration, another mechanism is needed to increase exploitation by gradually focusing on higher rewards, the pseudobelief-focal-reward. Our resulting algorithm, BelMan, alternates between projection of the pseudobelief-focal-reward onto belief-reward distributions to choose the arm to play, and projection of the updated belief-reward distributions onto the pseudobelief-focal-reward. We theoretically prove BelMan to be asymptotically optimal and to incur a sublinear regret growth. We instantiate BelMan to stochastic bandits with Bernoulli and exponential rewards, and to a real-life application of scheduling queueing bandits. Comparative evaluation with the state of the art shows that BelMan is not only competitive for Bernoulli bandits but in many cases also outperforms other approaches for exponential and queueing bandits.
Document type :
Conference papers
Complete list of metadatas

Cited literature [45 references]  Display  Hide  Download

https://hal.inria.fr/hal-02195539
Contributor : Pierre Senellart <>
Submitted on : Friday, July 26, 2019 - 1:42:07 PM
Last modification on : Monday, August 5, 2019 - 10:21:42 AM

File

main.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02195539, version 1

Citation

Debabrota Basu, Pierre Senellart, Stephane Bressan. BelMan: An Information-Geometric Approach to Stochastic Bandits. ECML/PKDD - The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Sep 2019, Würzburg, Germany. ⟨hal-02195539⟩

Share

Metrics

Record views

101

Files downloads

347