Speaker attribution and labeling of single-channel, multi-speaker audio files is an area of active research, since the underlying problems have not yet been solved satisfactorily. This holds especially for non-standard voices and speech, such as those of children and impaired speakers. Speaker labeling of pathological speech could enable the development of computer-assisted diagnosis and treatment systems and is thus a desirable research goal. In this manuscript we investigate the applicability of embeddings of audio signals, in the form of time- and frequency-band-based segments, into arbitrary vector spaces to the diarization of pathological speech. We focus on modifying an existing embedding estimator so that it can be used for diarization. This is mainly done by clustering the time- and frequency-band-dependent vectors and subsequently performing a majority vote over all frequency-dependent vectors of the same time segment to assign a speaker label. The result is evaluated on recordings of interviews between aphasia patients and language therapists. We demonstrate general applicability, with error rates close to those previously achieved in diarizing children's speech. Additionally, we propose enhancing the processing pipeline with smoothing and a more sophisticated, energy-based voting scheme.
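The clustering and majority-vote step described above can be illustrated with a minimal sketch. It is not the implementation evaluated in the manuscript: the abstract does not name a clustering algorithm, so plain k-means with a farthest-point initialization is assumed here, and the function names `kmeans` and `diarize` as well as the nested-list layout `embeddings[t][f]` are hypothetical choices made for illustration only.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k=2, iters=20):
    """Plain k-means (an assumed stand-in clusterer); returns one label per vector."""
    # Deterministic farthest-point initialization: start from the first vector,
    # then repeatedly add the vector farthest from all centers chosen so far.
    centers = [list(vectors[0])]
    while len(centers) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(v, centers[c])) for v in vectors]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def diarize(embeddings, k=2):
    """Assign one speaker label per time segment.

    embeddings[t][f] is the embedding vector for time segment t and
    frequency band f (hypothetical layout).
    """
    # Cluster all time-frequency vectors jointly, ignoring their position.
    flat = [vec for segment in embeddings for vec in segment]
    labels = kmeans(flat, k)
    result, start = [], 0
    for segment in embeddings:
        votes = labels[start:start + len(segment)]
        start += len(segment)
        # Majority vote over the frequency bands of this time segment.
        result.append(Counter(votes).most_common(1)[0][0])
    return result
```

Because the vote is taken per time segment, a single frequency band that falls into the wrong cluster does not change the segment's label as long as most bands agree. The cluster indices themselves are arbitrary and would still need to be mapped to actual speakers.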