Successful pause detection plays an important role not only in speech recognition and speech coding but also in biometrics, for detecting stress in a speaker's emotional state caused by uncomfortable situations, and in interactive dialog systems, for making human-machine interaction more natural. Most recordings used in practical applications are made under adverse conditions, yet few algorithms have been proposed to handle noisy conditions. This paper proposes two methods for pause (non-speech activity) detection in spontaneous speech recordings made in noisy environments. The input signal is transformed into log spectral energy and divided into specific frequency bands. Each band is smoothed and tracked by dynamically adjusted thresholds based on noise energy estimation. The thresholds adapt to the dynamic changes of the speech signal under environmental noise. The proposed methods run in real time and require neither a priori knowledge of the SNR nor a priori threshold values. Experimental results show that their performance is comparable with that of standard voice activity detectors (VADs).
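The general idea described above (band-wise log spectral energy, smoothing, and thresholds that follow a running noise estimate) can be illustrated with a minimal sketch. This is not the authors' algorithm: the equal-width bands, the naive DFT, the exponential smoothing, the minimum-following noise floor, and every parameter value (`frame_len`, `n_bands`, `smooth`, `margin`, `rise`) are assumptions chosen only for illustration, and the function names are hypothetical.

```python
import cmath
import math


def band_log_energies(frame, n_bands):
    """Log10 energy of one frame in n_bands equal-width bands (naive DFT).

    Equal-width bands are an assumption; the abstract does not say how
    the frequency bands are chosen.
    """
    n = len(frame)
    power = []
    for k in range(1, n // 2 + 1):  # skip the DC bin
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
        power.append(abs(s) ** 2)
    width = len(power) // n_bands
    return [math.log10(sum(power[b * width:(b + 1) * width]) + 1e-12)
            for b in range(n_bands)]


def detect_pauses(signal, frame_len=64, n_bands=4,
                  smooth=0.7, margin=1.0, rise=0.02):
    """Return one boolean per frame: True where the frame looks like a pause.

    Hypothetical scheme: each band's log energy is exponentially smoothed,
    a per-band noise floor drops immediately with the smoothed energy and
    creeps upward slowly, and the threshold is noise floor + margin.
    """
    smoothed = None
    noise = None
    flags = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        energies = band_log_energies(signal[start:start + frame_len], n_bands)
        if smoothed is None:
            # Assumption: the first frame contains background noise only.
            smoothed = energies[:]
            noise = energies[:]
        else:
            smoothed = [smooth * s + (1 - smooth) * e
                        for s, e in zip(smoothed, energies)]
            noise = [s if s < nz else nz + rise * (s - nz)
                     for s, nz in zip(smoothed, noise)]
        # A frame counts as a pause when no band exceeds its threshold.
        flags.append(all(s <= nz + margin for s, nz in zip(smoothed, noise)))
    return flags
```

Because the thresholds are derived from the running noise estimate rather than fixed in advance, a sketch of this kind needs neither a known SNR nor preset threshold values, which is the property the abstract emphasizes. The smoothing also introduces a short delay before a pause is reported after speech ends.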