Sampling and empirical risk minimization - Télécom Paris
Journal article, Statistics, Year: 2016

Sampling and empirical risk minimization

Abstract

In certain situations that will undoubtedly become more and more common in the Big Data era, the available datasets are so massive that computing statistics over the full samples is hardly feasible, if not infeasible. A natural approach in this context consists in using survey schemes and substituting the ‘full data’ statistics with their counterparts based on the resulting random samples of manageable size. The main purpose of this paper is to investigate the impact of survey sampling on statistical learning methods based on empirical risk minimization through the standard binary classification problem, considered here as a case in point. Precisely, we prove that, in the presence of auxiliary information, appropriate use of optimally coupled Poisson survey plans may not affect the learning rates much, while possibly significantly reducing the number of terms that must be averaged to compute the empirical risk functional with overwhelming probability. These striking results are next shown to extend to more general sampling schemes by means of a coupling technique originally introduced by Hájek [Asymptotic theory of rejective sampling with varying probabilities from a finite population. Ann Math Stat. 1964;35(4):1491–1523].
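To make the idea in the abstract concrete, the following is a minimal sketch of empirical risk minimization on a Poisson sample, where each sampled loss is reweighted by the inverse of its inclusion probability (a Horvitz-Thompson estimate of the full-data risk). It is an illustration under stated assumptions, not the paper's procedure: the synthetic data, the threshold classifiers, the auxiliary variable `aux`, and the way inclusion probabilities are derived from it are all hypothetical choices, and no optimal coupling of survey plans is attempted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification data (an assumption for illustration only).
n = 100_000
x = rng.normal(size=n)
y = np.where(x + 0.5 * rng.normal(size=n) > 0, 1, -1)


def zero_one_losses(threshold):
    """0-1 loss of the classifier sign(x - threshold) on every data point."""
    predictions = np.where(x > threshold, 1, -1)
    return (predictions != y).astype(float)


def poisson_sample(inclusion_probs, rng):
    """Poisson sampling: unit i is included independently with probability pi_i."""
    return rng.random(inclusion_probs.shape[0]) < inclusion_probs


def ht_risk(losses, inclusion_probs, sampled):
    """Horvitz-Thompson estimate of the full-data average loss."""
    weighted = losses[sampled] / inclusion_probs[sampled]
    return weighted.sum() / losses.shape[0]


# Hypothetical auxiliary information: points near the decision boundary
# are assumed more informative, so they get larger inclusion probabilities.
aux = 1.0 / (1.0 + np.abs(x))
expected_size = 5_000
pi = np.clip(expected_size * aux / aux.sum(), 1e-6, 1.0)

sampled = poisson_sample(pi, rng)
sample_size = int(sampled.sum())

# Empirical risk minimization over a small grid of candidate thresholds,
# once on the full data and once on the weighted Poisson sample.
thresholds = np.linspace(-1.0, 1.0, 41)
full_risks = np.array([zero_one_losses(t).mean() for t in thresholds])
ht_risks = np.array([ht_risk(zero_one_losses(t), pi, sampled) for t in thresholds])

best_full = int(np.argmin(full_risks))
best_ht = int(np.argmin(ht_risks))

# Excess full-data risk incurred by minimizing the sampled risk instead.
excess_risk = full_risks[best_ht] - full_risks[best_full]

print(f"sample size: {sample_size} of {n}")
print(f"threshold (full data): {thresholds[best_full]:.2f}")
print(f"threshold (Poisson sample): {thresholds[best_ht]:.2f}")
print(f"excess risk of sampled minimizer: {excess_risk:.4f}")
```

Because inclusion is decided independently for each unit, the sample size is random and only its expectation is fixed by `expected_size`. The inverse-probability weights keep the sampled risk unbiased for the full-data risk at each fixed threshold, which is what lets the sampled minimizer stand in for the full-data one.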
No file deposited

Dates and versions

hal-02107516, version 1 (23-04-2019)

Identifiers

Cite

Stéphan Clémençon, Patrice Bertail, Emilie Chautru. Sampling and empirical risk minimization. Statistics, 2016, 51 (1), pp.30-42. ⟨10.1080/02331888.2016.1259810⟩. ⟨hal-02107516⟩
234 views
1 download
