On Medians of (Randomized) Pairwise Means

Pierre Laforgue; Stéphan Clémençon; Patrice Bertail

Proceedings (conference papers). Year: 2019

Pierre Laforgue

Role: Author

Centre Jacques-Petit - Archives, Textes et Science des Textes

Stéphan Clémençon

Role: Author
PersonId: 174491
IdHAL: stephan-clemencon
ORCID: 0000-0002-5879-9500
IdRef: 08905203X

Laboratoire Traitement et Communication de l'Information

Département Images, Données, Signal

Signal, Statistique et Apprentissage

Patrice Bertail

Role: Author
PersonId: 17670
IdHAL: patrice-bertail
ORCID: 0000-0002-6011-3432
IdRef: 034681280

Modélisation aléatoire de Paris X

Abstract

Tournament procedures, recently introduced in Lugosi & Mendelson (2016), offer an appealing alternative, at least from a theoretical perspective, to the principle of Empirical Risk Minimization in machine learning. Statistical learning by Median-of-Means (MoM) essentially consists of segmenting the training data into blocks of equal size and comparing the statistical performance of every pair of candidate decision rules on each data block: the rule that performs better on the majority of the blocks is declared the winner. In the context of nonparametric regression, functions that have won all their duels have been shown to outperform empirical risk minimizers with respect to the mean squared error under minimal assumptions, while exhibiting robustness properties. The purpose of this paper is to extend this approach, in particular to address learning problems in which the performance criterion takes the form of an expectation over pairs of observations rather than over a single observation, as may be the case in pairwise ranking, clustering or metric learning. Precisely, it is proved here that the bounds achieved by MoM are essentially preserved when the blocks are built by independent sampling-without-replacement schemes instead of a simple segmentation. These results are then extended to situations where the risk is related to a pairwise loss function and its empirical counterpart takes the form of a U-statistic. Beyond the theoretical results guaranteeing the performance of the proposed learning/estimation methods, numerical experiments provide empirical evidence of their relevance in practice.
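The MoM ingredients described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`blocks`, `mom_segmented`, `mom_randomized`, `variance_kernel`, `u_statistic`, `median_of_u_statistics`, `duel`) are hypothetical, leftover points that do not fill a whole block are simply dropped, and a duel is decided by a strict majority of blocks. The sketch contrasts the segmented MoM estimate with a randomized variant in which each block is drawn independently by sampling without replacement, and shows the pairwise analogue, where each block contributes an order-2 U-statistic built from within-block pairs only.

```python
import random
import statistics
from itertools import combinations
from typing import Callable, List, Sequence


def blocks(x: Sequence[float], k: int) -> List[Sequence[float]]:
    """Segment x into k consecutive blocks of equal size (assumption: leftover points are dropped)."""
    b = len(x) // k
    if b == 0:
        raise ValueError("need at least k observations")
    return [x[i * b:(i + 1) * b] for i in range(k)]


def mom_segmented(x: Sequence[float], k: int) -> float:
    """Median-of-Means: median of the k block means of a simple segmentation."""
    return statistics.median([statistics.fmean(blk) for blk in blocks(x, k)])


def mom_randomized(x: Sequence[float], k: int, b: int, rng: random.Random) -> float:
    """Median of randomized means: each of the k blocks (size b) is drawn
    independently of the others by sampling without replacement from x."""
    population = list(x)
    means = [statistics.fmean(rng.sample(population, b)) for _ in range(k)]
    return statistics.median(means)


def variance_kernel(a: float, c: float) -> float:
    """Symmetric kernel whose U-statistic is the unbiased sample variance."""
    return (a - c) ** 2 / 2


def u_statistic(block: Sequence[float], h: Callable[[float, float], float]) -> float:
    """Order-2 U-statistic: average of the symmetric kernel h over all pairs i < j."""
    if len(block) < 2:
        raise ValueError("an order-2 U-statistic needs at least two points")
    pairs = list(combinations(block, 2))
    return sum(h(a, c) for a, c in pairs) / len(pairs)


def median_of_u_statistics(x: Sequence[float], k: int,
                           h: Callable[[float, float], float]) -> float:
    """Median of block-wise U-statistics: only pairs within the same block are used."""
    return statistics.median([u_statistic(blk, h) for blk in blocks(x, k)])


def duel(loss_f: Sequence[float], loss_g: Sequence[float], k: int) -> bool:
    """Hypothetical tournament duel: candidate f beats candidate g if its block-average
    loss is strictly smaller on a strict majority of the k blocks (same data for both)."""
    wins = sum(
        statistics.fmean(lf) < statistics.fmean(lg)
        for lf, lg in zip(blocks(loss_f, k), blocks(loss_g, k))
    )
    return wins > k / 2


if __name__ == "__main__":
    x = [1.0] * 19 + [1e6]  # one gross outlier among 20 observations
    print(statistics.fmean(x))  # empirical mean, dragged far away from 1
    print(mom_segmented(x, k=5))  # 1.0: only one of the five blocks is contaminated
    print(mom_randomized(x, k=5, b=4, rng=random.Random(0)))
    print(median_of_u_statistics(x, k=5, h=variance_kernel))  # 0.0
    loss_f = [(xi - 0.0) ** 2 for xi in x]  # squared losses of the constant predictor 0
    loss_g = [(xi - 1e3) ** 2 for xi in x]  # squared losses of the constant predictor 1000
    print(duel(loss_f, loss_g, k=5))  # True: f wins on four of the five blocks
```

With `variance_kernel`, each block U-statistic is the unbiased sample variance of that block, so `median_of_u_statistics` acts as a robust variance estimate. In every variant, taking the median rather than the mean of the block estimates is what limits the influence of a few contaminated blocks.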

Subjects

Mathematics [math] / Probability [math.PR] / Statistics [math.ST] / Machine Learning [stat.ML]

Main file

clemencon19a.pdf (381.7 KB)

Origin: files produced by the author(s)

https://telecom-paris.hal.science/hal-02463910

Submitted on: Sunday, February 2, 2020, 13:05:01

Last modified on: Thursday, April 4, 2024, 03:10:25

Long-term archiving on: Sunday, May 3, 2020, 13:00:37

Dates and versions

hal-02463910, version 1 (02-02-2020)

Identifiers

HAL Id: hal-02463910, version 1

Cite

Pierre Laforgue, Stéphan Clémençon, Patrice Bertail. On Medians of (Randomized) Pairwise Means. Proceedings of Machine Learning Research, vol. 97, 2019. ⟨hal-02463910⟩

Collections

INSTITUT-TELECOM CNRS UNIV-FCOMTE PARISTECH MODALX LTCI IDS S2A IP_PARIS UNIV-PARIS-LUMIERES UNIV-PARIS-NANTERRE ELLIADD
