Back to Speech Science – Towards a Collaborative ASR Community of the 21st Century

Lee, Chin-Hui

Abstract

We present a historical perspective on the development of automatic speech recognition (ASR) technologies and discuss the role speech science has played in the past and is likely to assume in the future. First, we introduce the prevailing data-driven, pattern recognition approach to ASR. Then we show that some speech knowledge sources could be integrated into ASR to enhance the capabilities and overcome many of the limitations of current ASR systems. To promote wide applicability of knowledge integration, four major issues need to be addressed: (1) the need for an ASR paradigm that facilitates easy knowledge integration; (2) an objective evaluation methodology that supports assessing the quality and robustness of existing knowledge sources and developing new ones; (3) the need to enhance ASR capabilities beyond those of state-of-the-art systems; and (4) an open, plug-'n'-play software development and common evaluation platform to lower ASR entry barriers and promote research collaboration. Finally, to address these issues, we propose a new paradigm that combines data-driven and knowledge-driven approaches to ASR. Under the new framework, we expect that researchers from diverse areas in speech production, perception, analysis, coding, synthesis and recognition can work collaboratively towards establishing an ASR Community of the 21st Century.
