An MPEG-7 Architecture with a Blind Signal Processing Based Front-End for Spoken Document Retrieval

Iacchelli, Fausto; Tummarello, Giovanni; Squartini, Stefano; Piazza, Francesco

Abstract

Metadata extracted from multimedia content or live sensing is set to play a major role in intelligent, multimodal interaction between humans and computers. Such metadata generally needs to be structured and encoded according to widely agreed standards, which is fundamental to enabling interoperability and to building complex applications as a mesh of heterogeneous services and components. To this end, the MPEG-7 standard for multimedia metadata and the tools developed within the Semantic Web initiative provide today's basic framework. Their application to real-world problems, however, is complicated by the fact that data are often captured under difficult live conditions. It is therefore of primary importance to enhance the quality of the observable signals before the metadata extraction algorithms are applied. For audio in particular, signals captured in real environments must be separated and deconvolved under blind conditions. This work addresses a real-world assisted living scenario involving multimedia metadata, using a combination of Blind Signal Processing and MPEG-7 based metadata techniques. In this example, an array of microphones captures speech signals and, thanks to MPEG-7 technologies, the user can select multimedia content to be played.
