Speaker attribution and labeling of single-channel, multi-speaker audio files is an area of active research, since the underlying problems have not yet been solved satisfactorily. This holds especially for non-standard voices and speech, such as those of children and speech-impaired speakers. Speaker labeling of pathological speech could enable the development of computer-assisted diagnosis and treatment systems and is thus a desirable research goal. In this manuscript we investigate whether embeddings of audio signals into arbitrary vector spaces, computed on time- and frequency-band-based segments, can be applied to the diarization of pathological speech. We focus on modifying an existing embedding estimator so that it can be used for diarization. This is mainly done by clustering the time- and frequency-band-dependent vectors and then assigning a speaker label to each time segment through a majority vote over all frequency-dependent vectors of that segment. The approach is evaluated on recordings of interviews between aphasia patients and language therapists. We demonstrate general applicability, with error rates close to those previously achieved in diarizing children's speech. Additionally, we propose enhancing the processing pipelines with smoothing and a more sophisticated, energy-based voting scheme.
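The abstract does not specify the clustering algorithm or the exact voting and smoothing rules, so the following Python sketch only illustrates the general idea under stated assumptions. The cluster label of every time/frequency-band vector is taken as given (produced by some clustering step, not shown), `energies[t][f]` is assumed to hold the energy of band `f` in time segment `t`, and the function names and the sliding-window mode filter used for smoothing are hypothetical choices, not the authors' implementation.

```python
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence


def majority_vote(labels: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Assign one speaker label per time segment.

    labels[t][f] is the (assumed, precomputed) cluster label of the
    embedding for time segment t and frequency band f. The segment gets
    the most frequent label; ties go to the label encountered first.
    """
    return [Counter(bands).most_common(1)[0][0] for bands in labels]


def energy_weighted_vote(
    labels: Sequence[Sequence[Hashable]],
    energies: Sequence[Sequence[float]],
) -> List[Hashable]:
    """Hypothetical energy-based variant: each band's vote counts with
    its energy, so high-energy bands dominate the segment's label."""
    result = []
    for bands, band_energy in zip(labels, energies):
        weight = defaultdict(float)
        for label, energy in zip(bands, band_energy):
            weight[label] += energy
        # max() returns the first label reaching the highest total weight
        result.append(max(weight, key=weight.get))
    return result


def mode_smooth(seq: Sequence[Hashable], window: int = 3) -> List[Hashable]:
    """Hypothetical smoothing: replace each label by the most frequent
    label in a centered window (truncated at the edges). In a tie the
    current label is kept, so isolated flips are removed but balanced
    boundaries stay put."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = []
    for i, current in enumerate(seq):
        counts = Counter(seq[max(0, i - half): i + half + 1])
        best = max(counts.values())
        if counts[current] == best:
            out.append(current)
        else:
            out.append(next(lab for lab, c in counts.items() if c == best))
    return out


if __name__ == "__main__":
    # Toy example: 4 time segments x 3 frequency bands, two clusters A/B.
    labels = [["A", "A", "B"], ["B", "B", "A"], ["A", "B", "B"], ["A", "A", "A"]]
    energies = [[1.0, 1.0, 1.0], [0.1, 0.1, 5.0], [4.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
    print(majority_vote(labels))                    # ['A', 'B', 'B', 'A']
    print(energy_weighted_vote(labels, energies))   # ['A', 'A', 'A', 'A']
    print(mode_smooth(["A", "A", "B", "A", "A"]))   # ['A', 'A', 'A', 'A', 'A']
```

In the toy example, segment 1 is labeled `B` by the plain majority vote but `A` by the energy-weighted vote, because its single `A` band carries most of the energy. The sketch assumes every time segment has at least one frequency band; an empty segment would raise an error in both voting functions.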
This website uses cookies
We use cookies to provide you with the best possible experience. They also allow us to analyze user behavior in order to constantly improve the website for you. Info about the privacy policy of IOS Press.