TIMIT and NTIMIT Phone Recognition Using Convolutional Neural Networks

Abstract :

This paper presents a novel application of convolutional neural networks to phone recognition, evaluated on both the TIMIT and NTIMIT speech corpora. The phonetic transcriptions of these corpora were used to label spectrogram segments for training the network. A sliding window extracted fixed-size images from the spectrograms produced for the TIMIT and NTIMIT utterances, and each image was assigned to its phone class by parsing the corresponding phone transcription. The GoogLeNet convolutional neural network was implemented and trained using mini-batch stochastic gradient descent. After training, phonetic rescoring mapped each phone set to the smaller standard set, i.e. the 61-phone set was mapped to the 39-phone set. Benchmark results on both datasets are presented for comparison with other state-of-the-art approaches. The results show that this convolutional neural network approach is particularly well suited to speech data affected by network noise and distortion, as demonstrated by the state-of-the-art benchmark results on NTIMIT.
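The window labelling and 61-to-39 phone folding described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes transcriptions in the TIMIT `.phn` layout (start sample, end sample, phone label per line), works on sample indices rather than spectrogram pixels, and labels each window by the phone under its centre, a rule the abstract does not specify. The function names `parse_phn`, `label_windows` and `fold` are hypothetical, and the fold table is only a partial subset of the commonly used mapping.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int, str]  # (start_sample, end_sample, phone)

# Partial, illustrative subset of the usual 61-to-39 folding
# (assumption: the full table, including how "q" is handled, is omitted).
FOLD_61_TO_39: Dict[str, str] = {
    "ao": "aa", "ax": "ah", "ax-h": "ah", "axr": "er", "hv": "hh",
    "ix": "ih", "el": "l", "em": "m", "en": "n", "nx": "n",
    "eng": "ng", "zh": "sh", "ux": "uw",
    "pcl": "sil", "tcl": "sil", "kcl": "sil", "bcl": "sil",
    "dcl": "sil", "gcl": "sil", "h#": "sil", "pau": "sil", "epi": "sil",
}


def parse_phn(text: str) -> List[Segment]:
    """Parse .phn-style text: one 'start end phone' triple per line."""
    segments = []
    for line in text.strip().splitlines():
        start, end, phone = line.split()
        segments.append((int(start), int(end), phone))
    return segments


def label_windows(segments: List[Segment], window: int, hop: int) -> List[Tuple[int, str]]:
    """Slide a fixed-size window over the utterance and label each
    position with the phone whose interval contains the window centre
    (hypothetical labelling rule)."""
    total = segments[-1][1]
    labels = []
    start = 0
    while start + window <= total:
        centre = start + window // 2
        for seg_start, seg_end, phone in segments:
            if seg_start <= centre < seg_end:
                labels.append((start, phone))
                break
        start += hop
    return labels


def fold(phone: str) -> str:
    """Map a 61-set label to its 39-set class; unlisted labels pass through."""
    return FOLD_61_TO_39.get(phone, phone)
```

For example, `label_windows(parse_phn("0 100 h#\n100 200 sh\n200 300 ix"), window=100, hop=100)` yields one label per window, and applying `fold` to each label collapses closures and silences into a single `sil` class.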

https://hal.telecom-paristech.fr/hal-02287997
Contributor : Telecomparis Hal
Submitted on : Friday, September 13, 2019 - 5:34:32 PM
Last modification on : Thursday, October 17, 2019 - 12:37:01 PM

Identifiers

  • HAL Id : hal-02287997, version 1

Citation

Cornelius Glackin, Julie Wall, Gérard Chollet, Nazim Dugan, Nigel Cannings. TIMIT and NTIMIT Phone Recognition Using Convolutional Neural Networks. Pattern Recognition Applications and Methods, Springer, pp.89-100, 2019. ⟨hal-02287997⟩
