The intelligent voice system for the iberSPEECH-RTVE 2018 speaker diarization challenge

Abbas Khosravani; Cornelius Glackin; Nazim Dugan; Gérard Chollet; Nigel Cannings

doi:10.21437/IberSPEECH.2018-48

Conference paper, Year: 2018

Abbas Khosravani

Role: Author

Cornelius Glackin

Role: Author

Nazim Dugan

Role: Author

Gérard Chollet

Role: Author
PersonId : 176991
IdHAL : gerard-chollet
ORCID : 0000-0003-4245-146X
IdRef : 078020824

Institut Polytechnique de Paris

Département Electronique et Physique

ARMEDIA

Nigel Cannings

Role: Author

Abstract

This paper describes the Intelligent Voice (IV) speaker diarization system for the IberSPEECH-RTVE 2018 speaker diarization challenge. We developed a new speaker diarization system that builds on the success of deep neural network based speaker embeddings in speaker verification. Compared with acoustic features such as MFCCs, deep neural network embeddings are much better at discerning speaker identities, especially for speech acquired without constraints on recording equipment and environment. We perform spectral clustering on our proposed CNN-LSTM-based speaker embeddings to find homogeneous segments and generate speaker log-likelihoods for each frame. An HMM is then used to refine the speaker posterior probabilities by limiting the probability of switching between speakers from one frame to the next. We present results obtained on the development set (dev2) as well as the evaluation set …
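The HMM refinement step mentioned in the abstract can be illustrated with a minimal sketch. It assumes one HMM state per speaker, a single hypothetical `stay_prob` parameter for the self-transition probability, and Viterbi decoding to pick the most likely speaker sequence. The abstract does not give the transition values or the inference procedure actually used, so the function name, the parameter, and the toy log-likelihoods below are illustrative assumptions rather than the authors' implementation.

```python
import math


def smooth_speaker_path(frame_loglik, stay_prob=0.9):
    """Return the most likely speaker index per frame (Viterbi decoding).

    frame_loglik: list of frames; each frame is a list holding one
        log-likelihood per speaker.
    stay_prob: assumed probability of keeping the same speaker from one
        frame to the next (hypothetical value, not taken from the paper).
    """
    n_speakers = len(frame_loglik[0])
    if n_speakers == 1:
        return [0] * len(frame_loglik)

    log_stay = math.log(stay_prob)
    # The remaining probability mass is shared equally by the other speakers.
    log_switch = math.log((1.0 - stay_prob) / (n_speakers - 1))

    # Uniform prior over speakers at the first frame (constant term dropped).
    score = list(frame_loglik[0])
    backpointers = []

    for obs in frame_loglik[1:]:
        new_score, ptr = [], []
        for j in range(n_speakers):
            # Best predecessor state for speaker j at this frame.
            best_i = max(
                range(n_speakers),
                key=lambda i: score[i] + (log_stay if i == j else log_switch),
            )
            trans = log_stay if best_i == j else log_switch
            new_score.append(score[best_i] + trans + obs[j])
            ptr.append(best_i)
        score = new_score
        backpointers.append(ptr)

    # Trace the best path back from the final frame.
    state = max(range(n_speakers), key=lambda i: score[i])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy input: speaker 0 for seven frames with one ambiguous frame, then speaker 1.
toy_loglik = (
    [[-0.1, -3.0]] * 3
    + [[-1.2, -0.9]]
    + [[-0.1, -3.0]] * 3
    + [[-3.0, -0.1]] * 4
)
raw_labels = [max(range(2), key=lambda k: frame[k]) for frame in toy_loglik]
smoothed_labels = smooth_speaker_path(toy_loglik, stay_prob=0.95)
```

In this toy input the single ambiguous frame flips the frame-wise argmax to speaker 1, while the smoothed path keeps speaker 0 there because two switches cost more than the small likelihood gain. The sustained change at the end is still followed. A forward-backward pass over the same transition model would return refined per-frame posteriors instead of a single best path.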

Keywords

Speaker diarization; CNN; LSTM; Speaker embedding

Domains

Computer science [cs]; Neural networks [cs.NE]; Signal and image processing [eess.SP]

https://telecom-paris.hal.science/hal-02287998

Submitted on: Friday, 13 September 2019, 17:34:34

Last modified on: Thursday, 21 December 2023, 11:35:05

Dates and versions

hal-02287998 , version 1 (13-09-2019)

Identifiers

HAL Id : hal-02287998 , version 1
DOI : 10.21437/IberSPEECH.2018-48

Cite

Abbas Khosravani, Cornelius Glackin, Nazim Dugan, Gérard Chollet, Nigel Cannings. The intelligent voice system for the iberSPEECH-RTVE 2018 speaker diarization challenge. IberSPEECH 2018, Nov 2018, Barcelona, Spain. pp.231-235, ⟨10.21437/IberSPEECH.2018-48⟩. ⟨hal-02287998⟩

Collections

INSTITUT-TELECOM TELECOM-SUDPARIS PARISTECH UNIV-PARIS-SACLAY
