Hog and Subband power distribution image features for acoustic scene classification

Victor Bisot 1, 2, Slim Essid 1, 2, Gael Richard 1, 2
1 S2A - Signal, Statistique et Apprentissage
2 LTCI - Laboratoire Traitement et Communication de l'Information
Abstract:

Acoustic scene classification is a difficult problem, mostly because of the high density of events occurring concurrently in audio scenes. To capture the occurrences of these events, we propose to use the Subband Power Distribution (SPD) as a feature. We extract it by computing the histogram of amplitude values in each frequency band of a spectrogram image, which allows us to model the density of events in each frequency band. Our method is evaluated on a large acoustic scene dataset using support vector machines. Combining the SPD with the histogram of oriented gradients (HOG) outperforms previous methods. For further improvement, we also use an approximation of the earth mover's distance kernel, the so-called Sinkhorn kernel, to compare histograms in a more suitable way; it improves the results for most feature configurations. The best configuration reaches an F1 score of 92.8%.
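
The two histogram operations named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the number of bins, the normalisation of the spectrogram to [0, 1], the ground cost between bins, the regularisation strength `reg`, and the kernel form exp(-gamma * distance) are all hypothetical choices made for the example. The Sinkhorn iteration itself is the standard entropy-regularised approximation of the earth mover's distance.

```python
import numpy as np


def subband_power_distribution(spec, n_bins=10, value_range=(0.0, 1.0)):
    """Histogram of amplitude values in each frequency band.

    spec: array of shape (n_bands, n_frames), assumed (for this sketch)
          to be already scaled into value_range.
    Returns an array of shape (n_bands, n_bins); each row sums to 1.
    """
    lo, hi = value_range
    spec = np.clip(np.asarray(spec, dtype=float), lo, hi)
    edges = np.linspace(lo, hi, n_bins + 1)
    spd = np.empty((spec.shape[0], n_bins))
    for b, band in enumerate(spec):
        counts, _ = np.histogram(band, bins=edges)
        spd[b] = counts / max(counts.sum(), 1)
    return spd


def bin_cost(n_bins):
    """Assumed ground cost: absolute bin-index difference, scaled to [0, 1]."""
    idx = np.arange(n_bins, dtype=float)
    return np.abs(idx[:, None] - idx[None, :]) / max(n_bins - 1, 1)


def sinkhorn_distance(r, c, cost, reg=10.0, n_iter=200):
    """Entropy-regularised approximation of the earth mover's distance
    between two histograms r and c that each sum to 1."""
    r = np.asarray(r, dtype=float)
    c = np.asarray(c, dtype=float)
    K = np.exp(-reg * cost)          # strictly positive Gibbs kernel
    v = np.ones_like(c)
    for _ in range(n_iter):
        u = r / (K @ v)              # rescale rows towards marginal r
        v = c / (K.T @ u)            # rescale columns towards marginal c
    # Cost of the transport plan diag(u) K diag(v)
    return float(np.sum(u * ((K * cost) @ v)))


def sinkhorn_kernel(spd_a, spd_b, gamma=1.0, reg=10.0):
    """Assumed kernel form: exp(-gamma * sum of per-band Sinkhorn distances)."""
    cost = bin_cost(spd_a.shape[1])
    total = sum(sinkhorn_distance(a, b, cost, reg) for a, b in zip(spd_a, spd_b))
    return float(np.exp(-gamma * total))
```

A kernel value computed this way for every pair of recordings could be passed to a support vector machine as a precomputed kernel matrix; whether such a matrix is positive semi-definite depends on the parameters and is not guaranteed by this sketch.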

https://hal.telecom-paristech.fr/hal-02287266
Contributor: Telecomparis Hal
Submitted on: Friday, September 13, 2019 - 4:47:03 PM
Last modification on: Sunday, September 15, 2019 - 1:11:32 AM

Identifiers

  • HAL Id: hal-02287266, version 1

Citation

Victor Bisot, Slim Essid, Gael Richard. Hog and Subband power distribution image features for acoustic scene classification. EUSIPCO, Sep 2015, Nice, France. pp.719-723. ⟨hal-02287266⟩