The most recent research on both human-human conversational interaction and human-computer conversational interaction with agents is marked by a multimodal perspective. On the one hand, this approach underlines the co-occurrence and synergy of different languages and channels; on the other, it highlights the need for joint, coordinated action between the various subjects (attuning and mutual tuning in). In a similar way, recent research on human-computer interaction points out the need to consider vocal interaction from a multicomponential perspective, both as a multilayer phenomenon in itself and as one component of wider interactive patterns. Just as communicative action is seen in its comprehensiveness and multicomponentiality, so the vocal act needs to be seen as a complex event. Research on models for the analysis of new interfaces outlines a way beyond the distinction traceable in the majority of studies, where conversational action is split into its factors and each factor is analyzed separately: conversation analysis, or content analysis, or suprasegmental analysis. The purpose of this chapter is to contribute to the creation of an analysis model that allows for the complexity of the vocal act and for its being-in-context within the interactive flow, thus applying to the vocal act the multicomponential focus that qualifies Embodied Conversational Agents (ECAs). Two levels of analysis thus emerge: a vertical, morphological analysis and a horizontal, sequential analysis.
Two kinds of vocal interaction are examined here according to the proposed model: a human face-to-face conversation and an interaction between an ECA and its user.