Speech is generally considered an efficient means of communication between humans, and will hopefully play the same role in future communication between humans and machines. This efficiency is achieved via a balancing act involving at least the following elements: (1) the lexical and grammatical structure of the message, (2) the way this message is articulated, leading to a dynamic acoustic signal, (3) the characteristics of the communication channel between speaker and listener, and (4) the way this speech signal is perceived and interpreted by the listener. This chapter concentrates on the dynamic spectro-temporal characteristics of natural speech and on the way such natural speech signals, or simplified speech-like signals, are perceived. Dynamic speech signal characteristics are studied both in carefully designed test sentences and in large, annotated, and searchable speech corpora containing a variety of speech. From actual spectro-temporal measurements we try to model vowel and consonant reduction, coarticulation, the effects of word stress and speaking rate on formant contours, contextual durational variability, prominence, etc. On the perception side, the more speech-like the signal is (on a continuum from a tone sweep to a multi-formant /ba/-like stimulus), the less sensitive listeners appear to be, in terms of just noticeable differences, to dynamic speech characteristics such as formant transitions. It also became clear that the context (both local and wider) in which speech fragments and speech-like stimuli are presented plays an important role in listeners' performance. Likewise, the listener's actual task (be it same-different paired comparison, ABX discrimination (X being either A or B), or phoneme or word identification) substantially influences performance.