Speech Emotion Recognition Using Deep Convolutional Neural Networks Improved by the Fast Continuous Wavelet Transform

Van Zwol, Björn E.; Langezaal, Mathijs A.; Arts, Lukas P.A.; Gatt, Albert; Van Den Broek, Egon L.

doi:10.3233/AISE230012

Authors

Björn E. Van Zwol, Mathijs A. Langezaal, Lukas P.A. Arts, Albert Gatt, Egon L. Van Den Broek

Pages

63 - 72

DOI

10.3233/AISE230012

Category

Research Article

Series

Ambient Intelligence and Smart Environments

Ebook

Volume 32: Workshop Proceedings of the 19th International Conference on Intelligent Environments (IE2023)

Abstract

The fast Continuous Wavelet Transform (fCWT) is used to improve Speech Emotion Recognition (SER) with Deep Convolutional Neural Networks (DCNNs). While being computationally efficient, the fCWT’s time-frequency analysis overcomes the resolution limitations of traditional methods (e.g., the Short-Time Fourier Transform). fCWT-based DCNNs are compared to state-of-the-art DCNN SER systems. Comparing different wavelet parameters, we also provide an empirical strategy for balancing temporal and spectral features in speech signals. We suggest that this strategy is of generic interest for non-stationary signal processing where large amounts of data are available. fCWT’s potential for improving SER accuracy in real-time applications is confirmed. In parallel, the variance across the cross-validation folds confirmed deep learning’s vulnerability on limited (non-big) data sets.
