An Emotion Estimation from Human Speech Using Speech Recognition and Speech Synthesize

Kurematsu, Masaki; Ohashi, Marina; Kinosita, Orimi; Hakura, Jun; Fujita, Hamido

doi:10.3233/978-1-58603-916-5-278

Abstract

To improve the estimation of emotion in speech, we propose three new approaches. The first approach uses more synthetic speech samples than our previous work. We label the emotion in these samples based on human evaluation and use them to build classifiers. The second approach adds several statistical values to our previous approach. The additional statistics are the quartiles, the range, the interquartile range, the upper and lower halves of the interquartile range, and the coefficient of the regression formula. We assume that these values capture new aspects of speech features. The third approach uses phonemic and syllabic features to estimate emotion in speech. In this paper, a phonemic feature is a feature obtained from each phoneme in an utterance by frequency analysis, and a syllabic feature is a feature obtained from each syllable in an utterance by frequency analysis. We use speech recognition to obtain the phonemes and then derive the syllables from those phonemes. Experimental results show that phonemic and syllabic features are more useful than the fundamental frequency and power for estimating anger, disgust, fear, and sadness. The results also show that the additional statistics contribute little to emotion estimation; further analysis of the classifiers is needed to evaluate the contribution of these statistics. Several tasks remain for future work. First, we will use the fundamental frequency and power together with phonemic and syllabic features. Second, we will modify our approach based on an analysis of the experimental results. Third, we will apply our approach in real time.
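The abstract names the additional statistics but does not define how they are computed. The following is a minimal sketch of one plausible reading, computed over a per-frame contour such as fundamental frequency or power. It assumes that the quartiles come from Python's `statistics.quantiles` with the inclusive method and that "the coefficient of the regression formula" means the least-squares slope against frame index. Those two choices, the helper names `contour_stats` and `segment_stats`, and the hypothetical `(label, start, end)` segment format standing in for phoneme or syllable boundaries from a recognizer are all our assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from statistics import mean, quantiles
from typing import List, Sequence, Tuple


@dataclass
class ContourStats:
    """Summary statistics of one feature contour (e.g. F0 or power per frame)."""
    q1: float
    median: float
    q3: float
    value_range: float     # max - min
    iqr: float             # q3 - q1
    lower_half_iqr: float  # median - q1
    upper_half_iqr: float  # q3 - median
    slope: float           # assumed: least-squares slope vs. frame index


def regression_slope(values: Sequence[float]) -> float:
    """Least-squares slope of `values` regressed on frame index 0..n-1."""
    n = len(values)
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    return sxy / sxx


def contour_stats(values: Sequence[float]) -> ContourStats:
    """Compute the quartile-based statistics and regression slope of a contour."""
    if len(values) < 2:
        raise ValueError("need at least two frames")
    # 'inclusive' treats the data as the whole population, so the quartiles
    # interpolate between observed values instead of extrapolating beyond them.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return ContourStats(
        q1=q1,
        median=q2,
        q3=q3,
        value_range=max(values) - min(values),
        iqr=q3 - q1,
        lower_half_iqr=q2 - q1,
        upper_half_iqr=q3 - q2,
        slope=regression_slope(values),
    )


def segment_stats(
    frames: Sequence[float], segments: Sequence[Tuple[str, int, int]]
) -> List[Tuple[str, ContourStats]]:
    """Apply contour_stats to each hypothetical (label, start, end) frame span,
    e.g. phoneme or syllable boundaries (end index exclusive)."""
    return [(label, contour_stats(frames[start:end])) for label, start, end in segments]
```

For example, `contour_stats([100, 110, 120, 130, 140])` gives quartiles of 110, 120, and 130, a range of 40, an interquartile range of 20 split into lower and upper halves of 10 each, and a slope of 10 per frame. Computing the statistics per segment rather than over the whole utterance is what would turn them into phonemic or syllabic features under this reading.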
