Abstract
Speech, music, and other complex sounds are usually characterized by their pitch, timbre, loudness, forms of modulation, and onset/offset instants. These descriptions of sound quality are closely related to the instantaneous spectral properties of the sound waves. Physiological, psychoacoustical, and computational studies reveal that the central auditory system has developed elegant mechanisms to extract and represent this spectro-temporal information. For instance, the primary auditory cortex (AI) employs a multiscale representation in which the dynamic spectrum is represented repeatedly at various degrees of spectral and temporal resolution. This is accomplished by cells whose responses are selective to a range of spectro-temporal parameters, such as the local bandwidth and asymmetry of spectral peaks and their onset and offset transition rates. In this article, we review the experimental methods developed to investigate and interpret these responses. These include, in particular, the use of rippled-spectrum stimuli (the acoustic analogue of visual gratings) to characterize the spectral and dynamic properties of auditory response fields, and the derivation of computational methods to predict cortical cell responses to arbitrary complex spectra. Parallels between auditory and visual analysis of sensory inputs are also discussed, together with possible applications of these findings to sound analysis and recognition systems.
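As a concrete illustration of the rippled-spectrum stimuli mentioned above, the sketch below generates the spectro-temporal envelope of a moving ripple: a sinusoidal modulation along the logarithmic frequency axis that drifts in time, the acoustic analogue of a drifting visual grating. This is a minimal sketch, not code from the article; the function name and all parameter values (ripple density, velocity, depth, axis sampling) are illustrative assumptions.

```python
import numpy as np

def ripple_envelope(omega=1.0, w=4.0, delta_a=0.9, phi=0.0,
                    octaves=5.0, n_chan=128, dur=1.0, fs_env=100.0):
    """Spectro-temporal envelope of a moving ripple stimulus (illustrative sketch).

    omega   : ripple density in cycles/octave (spectral modulation)
    w       : ripple velocity in Hz (temporal modulation; sign sets drift direction)
    delta_a : modulation depth, 0..1
    phi     : starting phase in radians
    Returns (x, t, S) with S[i, j] = 1 + delta_a * sin(2*pi*(w*t[j] + omega*x[i]) + phi),
    where x is the log-frequency axis in octaves and t is time in seconds.
    """
    x = np.linspace(0.0, octaves, n_chan)      # log-frequency axis (octaves above base)
    t = np.arange(0.0, dur, 1.0 / fs_env)      # time axis (envelope frames)
    T, X = np.meshgrid(t, x)                   # both shaped (n_chan, n_frames)
    S = 1.0 + delta_a * np.sin(2.0 * np.pi * (w * T + omega * X) + phi)
    return x, t, S

# Envelope with 1 cycle/octave spectral density drifting at 4 Hz:
x, t, S = ripple_envelope(omega=1.0, w=4.0)
```

Sweeping `omega` and `w` over a grid of such ripples, and measuring the response at each combination, is the ripple-analysis idea used to map spectral and dynamic response fields.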