On Tree-based Methods for Similarity Learning

Abstract : In many situations, the choice of an adequate similarity measure or metric on the feature space largely determines the performance of machine learning methods. Automatically building such measures is the specific purpose of metric/similarity learning. In [21], similarity learning is formulated as a pairwise bipartite ranking problem: ideally, the larger the probability that two observations in the feature space belong to the same class (or share the same label), the higher the similarity measure between them. From this perspective, the ROC curve is an appropriate performance criterion, and the goal of this article is to extend recursive tree-based ROC optimization techniques in order to propose efficient similarity learning algorithms. The validity of such iterative partitioning procedures in the pairwise setting is established by means of results from the theory of U-processes. From a practical angle, we discuss at length how to implement them by means of splitting rules specifically tailored to the similarity learning task. Beyond these theoretical/methodological contributions, numerical experiments provide strong empirical evidence of the performance of the algorithmic approaches we propose.
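
The pairwise bipartite ranking view described in the abstract can be illustrated with a small sketch. It is not the tree-based algorithm of the paper; it only shows how a candidate similarity function could be scored by the area under the ROC curve computed over pairs, where a pair counts as positive when both observations share the same label. The function names (pairwise_scores, pairwise_auc, neg_euclidean) and the toy data are assumptions made for illustration.

```python
from itertools import combinations
import math


def pairwise_scores(X, y, similarity):
    """Return (score, same_class) for every unordered pair of observations."""
    return [
        (similarity(X[i], X[j]), y[i] == y[j])
        for i, j in combinations(range(len(X)), 2)
    ]


def pairwise_auc(X, y, similarity):
    """Empirical AUC of a similarity function over pairs.

    This is the fraction of (positive pair, negative pair) couples in which
    the same-class pair receives the higher similarity; ties count 1/2.
    """
    scored = pairwise_scores(X, y, similarity)
    pos = [s for s, same in scored if same]
    neg = [s for s, same in scored if not same]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def neg_euclidean(a, b):
    """Hypothetical baseline similarity: negative Euclidean distance."""
    return -math.dist(a, b)


# Toy data (made up): two well-separated classes in the plane.
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0)]
y = [0, 0, 1, 1]

auc = pairwise_auc(X, y, neg_euclidean)
constant_auc = pairwise_auc(X, y, lambda a, b: 0.0)
```

On this toy sample every same-class pair is closer than every different-class pair, so the baseline similarity ranks all pairs correctly, while a constant similarity carries no ranking information. The double loop over positive and negative pairs is quadratic in the number of pairs and is meant for small illustrations only.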

Cited literature : 22 references

https://hal.telecom-paristech.fr/hal-02461801
Contributor : Stephan Clémençon
Submitted on : Thursday, January 30, 2020 - 11:00:09 PM
Last modification on : Sunday, February 2, 2020 - 1:13:37 AM

File

1906.09243.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02461801, version 1

Citation

Stéphan Clémençon, Robin Vogel. On Tree-based Methods for Similarity Learning. LOD 2019, 2019, Siena, Italy. ⟨hal-02461801⟩
