It is tempting to think of speech perception as a single, perhaps highly specific, processing module defined by a single set of constraints, such as the spectro-temporal resolution of the underlying analysis. We present a series of experiments that exploit a duplex stimulus to support our argument that speech perception requires analysis at different spectro-temporal scales. Auditory scene analysis, which is essential for segregating target speech sounds from competing background noise, requires analysis and processing at a fine spectral and temporal scale in order to exploit features such as pitch differences between target and competing sounds or small differences in the onset times of the elements making up an auditory scene. It is therefore not surprising that this analysis is carried out on a high-resolution representation. Speech pattern matching, on the other hand, requires substantial generalisation so that the acoustic speech signal can be mapped onto invariant representations: the pattern matching should, for instance, be independent of pitch and should discount fine differences in formant trajectories imposed by coarticulation or speaker differences. We show that frequency-modulated sine tones (chirps), presented where the normal formant transitions between vowels and nasals would be expected, change the speech percept independently of their slope, even though the chirps are clearly segregated into a separate (duplex) percept and differences between the chirps can be identified. While our data are consistent with the view that there are specific representations or processing modules for different auditory analysis tasks, we do not believe that they support the case for a specific biological module that uses speech gestures as its underlying representation. We argue instead that the different behaviour is consistent with the different processing requirements of different auditory tasks: high-resolution processing is necessary for segregating speech from background noise, whereas a low-resolution representation is much better suited to speech pattern matching.
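
For concreteness, a chirp of the kind described above can be synthesised as a linearly frequency-modulated sine. The sketch below (Python with NumPy/SciPy) is purely illustrative: the sampling rate, sweep frequencies, and 50 ms duration are assumptions chosen to resemble a typical formant transition, not the actual stimulus parameters used in the experiments.

    import numpy as np
    from scipy.signal import chirp

    # Illustrative parameters only; not the experimental stimulus values.
    fs = 44100                 # sampling rate (Hz)
    dur = 0.05                 # 50 ms, roughly the duration of a formant transition
    t = np.linspace(0.0, dur, int(fs * dur), endpoint=False)

    # A linearly frequency-modulated sine sweeping upward; swapping f0 and f1
    # reverses the slope while leaving the frequency region unchanged.
    rising = chirp(t, f0=1000.0, f1=1800.0, t1=dur, method='linear')
    falling = chirp(t, f0=1800.0, f1=1000.0, t1=dur, method='linear')

Presenting either sweep in the temporal position of the original transition yields stimuli that differ only in slope, which is what allows the slope-independence of the speech percept to be tested.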