Information Complexity in Bandit Subset Selection

Abstract:

We consider the problem of efficiently exploring the arms of a stochastic bandit to identify the best subset of a specified size. Under the PAC and the fixed-budget formulations, we derive improved bounds by using KL-divergence-based confidence intervals. Whereas the application of a similar idea in the regret setting has yielded bounds in terms of the KL-divergence between the arms, our bounds in the pure-exploration setting involve the "Chernoff information" between the arms. In addition to introducing this novel quantity to the bandits literature, we contribute a comparison between strategies based on uniform and adaptive sampling for pure-exploration problems, finding evidence in favor of the latter.
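For readers unfamiliar with the quantity named in the abstract, the following is a minimal illustrative sketch (not taken from the paper) of how the Chernoff information between two Bernoulli arms can be computed: it is the common value d(z*, x) = d(z*, y), where d is the Bernoulli KL-divergence and z* is the unique point between the two means at which the divergences to each mean coincide. The function names bernoulli_kl and chernoff_information are hypothetical.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL-divergence d(p, q) between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def chernoff_information(x, y, tol=1e-10):
    """Chernoff information between Bernoulli(x) and Bernoulli(y).

    Locates z* in (min(x, y), max(x, y)) with d(z*, x) = d(z*, y) by
    bisection, using the fact that d(z, min) - d(z, max) is increasing
    in z on that interval.
    """
    if x == y:
        return 0.0
    lo, hi = min(x, y), max(x, y)
    while hi - lo > tol:
        z = (lo + hi) / 2
        if bernoulli_kl(z, min(x, y)) < bernoulli_kl(z, max(x, y)):
            lo = z  # equalizing point lies to the right
        else:
            hi = z  # equalizing point lies to the left
    z = (lo + hi) / 2
    return bernoulli_kl(z, x)

# Example: two arms with means 0.5 and 0.6.
# The result (roughly 0.005) is smaller than either one-sided KL-divergence.
print(chernoff_information(0.5, 0.6))
```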

Complete list of metadata

https://hal.telecom-paristech.fr/hal-02288406
Contributor: Telecomparis Hal
Submitted on: Saturday, September 14, 2019 - 6:46:53 PM
Last modification on: Thursday, October 17, 2019 - 12:37:03 PM

Identifiers

  • HAL Id: hal-02288406, version 1

Citation

Emilie Kaufmann, Shivaram Kalyanakrishnan. Information Complexity in Bandit Subset Selection. Conference on Learning Theory, Jun 2013, Princeton, United States. ⟨hal-02288406⟩
