On U-processes and clustering performance

Stéphan Clémençon

Book chapter, Year: 2011

Stéphan Clémençon

Role: Author
PersonId: 174491
IdHAL: stephan-clemencon
ORCID: 0000-0002-5879-9500
IdRef: 08905203X

Laboratoire Traitement et Communication de l'Information

Département Images, Données, Signal

Abstract

Many clustering techniques optimize empirical criteria that take the form of a U-statistic of degree two. Given a measure of dissimilarity between pairs of observations, the goal is to minimize the within-cluster point scatter over a class of partitions of the feature space. The purpose of this paper is to define a general statistical framework, relying on the theory of U-processes, for studying the performance of such clustering methods. In this setup, under adequate assumptions on the complexity of the subsets forming the partition candidates, the excess of clustering risk is proved to be of order $O_{\mathbb{P}}(1/\sqrt{n})$. Based on recent results on the tail behavior of degenerate U-processes, it is also shown how to establish tighter rate bounds. Model selection issues, in particular the choice of the number of clusters forming the data partition, are also considered.
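
To make the criterion in the abstract concrete, the following is a minimal sketch of an empirical within-cluster point scatter written as a U-statistic of degree two: an average, over all pairs of observations, of the dissimilarity restricted to pairs that fall in the same cell of the partition. The function names, the `2 / (n (n - 1))` normalization, the squared Euclidean dissimilarity, and the toy data are illustrative assumptions; the abstract does not state the exact formula used in the paper.

```python
from itertools import combinations


def within_cluster_scatter(points, labels, dissimilarity):
    """Empirical within-cluster point scatter (hypothetical helper).

    Averages dissimilarity(x_i, x_j) over all pairs i < j, counting only
    pairs whose two points carry the same cluster label. The factor
    2 / (n * (n - 1)) is the usual degree-two U-statistic normalization
    (an assumption here, not taken from the paper).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two observations")
    total = 0.0
    for i, j in combinations(range(n), 2):
        if labels[i] == labels[j]:
            total += dissimilarity(points[i], points[j])
    return 2.0 * total / (n * (n - 1))


def squared_euclidean(x, y):
    """Illustrative dissimilarity measure between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


# Toy data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good_partition = [0, 0, 1, 1]  # groups nearby points together
bad_partition = [0, 1, 0, 1]   # pairs each point with a distant one

scatter_good = within_cluster_scatter(points, good_partition, squared_euclidean)
scatter_bad = within_cluster_scatter(points, bad_partition, squared_euclidean)
```

On this toy sample the partition that groups nearby points yields the smaller scatter, which is the quantity a clustering method of this kind would seek to minimize over its class of candidate partitions.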

Subject areas

Mathematics [math]; Machine Learning [stat.ML]; Statistics [math.ST]; Probability [math.PR]

Main file

4202-on-u-processes-and-clustering-performance.pdf (191.52 KB)

Origin: Files produced by the author(s)

https://telecom-paris.hal.science/hal-02107349

Submitted on: Tuesday, 23 April 2019, 16:13:22

Last modified on: Wednesday, 24 April 2024, 13:02:19

Dates and versions

hal-02107349, version 1 (23-04-2019)

Identifiers

HAL Id: hal-02107349, version 1

Cite

Stéphan Clémençon. On U-processes and clustering performance. On U-processes and clustering performance, 2011. ⟨hal-02107349⟩

Collections

INSTITUT-TELECOM, PARISTECH, LTCI, IDS, S2A
