Reliable pause detection plays an important role not only in speech recognition and speech coding but also in biometrics, for detecting stress in a speaker's emotional state caused by uncomfortable situations, and in interactive dialog systems, for making human-machine interaction more natural. Most recordings used in practical applications are made under adverse conditions, yet few algorithms have been proposed to handle noisy conditions. This paper proposes two methods for pause (non-speech activity) detection in spontaneous speech recordings made in noisy environments. The input signal is transformed into log spectral energy and divided into specific frequency bands. Each band is smoothed and tracked by dynamically adjusted thresholds based on noise energy estimation. The thresholds are adapted to account for the dynamic changes of the speech signal under environmental noise. The proposed methods run in real time and require neither a priori knowledge of the SNR nor a priori threshold values. Experimental results show that their performance is comparable with that of standard VADs.
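
The processing chain described above (log spectral energy, frequency bands, smoothing, noise-tracking thresholds) can be illustrated with a minimal sketch. It is not the paper's two methods, whose details the abstract does not give. The function names, the four equal-width bands, the exponential smoothing, the majority vote across bands, the assumption that the recording starts with noise only, and every constant are hypothetical choices made for illustration. In particular, the fixed `margin` is a simplification; the proposed methods are stated to need no a priori threshold values.

```python
import cmath
import math


def band_log_energies(frame, n_bands):
    """Return log10 energy in each of n_bands equal-width bands of one frame."""
    n = len(frame)
    half = n // 2
    # Naive DFT power spectrum (stdlib only; an FFT would be used in practice).
    power = []
    for k in range(half):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        power.append(abs(coeff) ** 2)
    width = half // n_bands
    return [math.log10(sum(power[b * width:(b + 1) * width]) + 1e-12)
            for b in range(n_bands)]


def detect_pauses(signal, frame_len=64, n_bands=4,
                  smooth=0.5, noise_rate=0.05, margin=1.0):
    """Return one bool per frame; True marks a frame judged to be a pause."""
    smoothed = noise = None
    decisions = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        energies = band_log_energies(signal[start:start + frame_len], n_bands)
        if smoothed is None:
            # Assumption: the recording starts with noise only.
            smoothed, noise = energies[:], energies[:]
        else:
            # Exponential smoothing of each band's log energy.
            smoothed = [smooth * s + (1 - smooth) * e
                        for s, e in zip(smoothed, energies)]
        # Per-band threshold = current noise estimate + margin (log10 units).
        active = sum(s > nz + margin for s, nz in zip(smoothed, noise))
        is_pause = active < n_bands / 2
        if is_pause:
            # Adapt the noise estimate, and so the thresholds, during pauses only.
            noise = [(1 - noise_rate) * nz + noise_rate * s
                     for nz, s in zip(noise, smoothed)]
        decisions.append(is_pause)
    return decisions
```

Updating the noise estimate only during detected pauses keeps speech energy from inflating the thresholds, at the cost of slower recovery if the background noise level jumps while speech is present.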