ICMC Proceedings 1995

Sound Analysis, Comparison and Modification Based on a Perceptual Model of Timbre

Christopher James Langmead
Dartmouth College, May 22, 1995
langmead@dartmouth.edu

Abstract: A software tool for the analysis, comparison and modification of timbre is presented. The software is an implementation of a perceptual model of timbre. The preprocessor to the software, based on the MQ sinusoidal analysis algorithm, simulates the effects of cochlear analysis, including auditory masking. The output of the preprocessor is a time and frequency analysis organized into spectral tracks. The software represents properties of tracks as curves. The current feature set of the model includes spectral onset asynchrony, spectral onset peak asynchrony, spectral onset peak envelope, spectral persistence, amplitude envelope, mean-frequency envelope, spectral envelope, spectral density, and harmonicity.

Introduction

While no definitive model of timbre exists, experimental evidence from a number of disciplines suggests some constraints on the development of a model. We know, for example, that the cochlea decomposes acoustic signals into time-varying spectra [Pickles 1988]. Various detectors (e.g., onset, frequency modulation) and rate encoders (e.g., FM rate) have been found in post-cochlear regions of the auditory system [e.g., Abeles and Goldstein 1972, Mendelson and Cynader 1985, Schreiner and Urbas 1986]. Other research has demonstrated the brain's sensitivity to certain types of features, such as spectral envelope, onset transient properties and formants [e.g., Helmholtz 1954, Saldanha and Corso 1964, Risset 1966, Slawson 1968, Grey 1975, Bregman 1990]. These discoveries suggest that the brain perceives timbre by extracting various features from the post-cochlear representation by means of detectors. Still, the exact set of features is unknown.

Recently, a software tool (PAST, the Perceptual Analysis Synthesis Tool) has been developed that allows feature sets to be tested during the development of a perceptual model of timbre.

The Software

The intended input to PAST is an isolated instance of a single timbre. No attempt is made to segregate sound sources. PAST provides four basic tools: an analysis tool, a timbral comparison tool, a timbral morphing tool and a timbre modification tool.

Signal Representation

The preprocessor to PAST is a program called Lemur [Fitz 1992]. Lemur performs spectral analysis on soundfiles based on a modification of the MQ sinusoidal analysis algorithm [Quatieri and McAulay 1985]. The output of a Lemur analysis is a spectrogram-like representation organized into spectral tracks. Each track has an initial phase, frequency and amplitude, and its frequency and amplitude change over time.

Feature Extraction¹

Features are extracted by making various measurements of the Lemur file. All features are represented as curves. The amplitude envelope is a record of the change in amplitude over time. The density envelope is a record of the number of tracks over time. The amplitude and density envelopes are used to calculate the end of the onset transient. The end of the onset is currently defined as the time of the peak amplitude or the peak density in the first 50 ms of the sound, whichever comes later.

Onset Asynchrony refers to the shape created by the start times of partials during the onset. Onset Peak Position Asynchrony refers to the shape created by the times at which partials reach their peak amplitudes during the onset. Onset Peak Envelope is the shape created by the peak amplitudes during the onset. Harmonicity is a measure of the sound's fit to a harmonic series as a function of time. The current measurement of harmonicity is the scaled mean distance of a given frame to a harmonic template. The next measurement is the change of mean frequency as a function of time. This roughly measures the change in formant regions in the timbre. The spectral persistence measurement is created by sorting all tracks by frequency and creating a morphology based on the length of each track. The final measurement is a spectral envelope. The spectral envelope is calculated by keeping a running average of each spectral bin's energy.

Timbre Comparison

Two timbres are compared by numerically measuring the similarity of the shapes of corresponding curves. The current similarity measurement is the OCD metric [Polansky 1987]. Essentially, the OCD metric compares how often the two morphologies move in the same direction. Values are scaled to lie in the range [0, 1]; higher values indicate greater dissimilarity, or distance, between two curves.
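Two of the steps described above can be sketched in a few lines of Python: locating the end of the onset from the amplitude and density envelopes, and comparing two curves by the direction in which they move. This is a minimal illustration under stated assumptions, not the code used in PAST. The frame layout (a list of frames, each a list of (frequency, amplitude) pairs), the use of summed track amplitudes as the amplitude envelope, the 10 ms frame spacing, and all function names are hypothetical. The distance function counts direction disagreements over all pairs of points, in the spirit of the OCD metric; the exact formulation in [Polansky 1987] may differ.

```python
from itertools import combinations


def envelopes(frames):
    """Return (amplitude, density) envelopes, one value per analysis frame.

    `frames` is a hypothetical stand-in for a Lemur analysis: a list of
    frames, each a list of (frequency_hz, amplitude) pairs, one per active
    track. Summing the track amplitudes is an assumption made here.
    """
    amplitude = [sum(a for _, a in frame) for frame in frames]
    density = [len(frame) for frame in frames]
    return amplitude, density


def onset_end_ms(amplitude, density, frame_ms=10, window_ms=50):
    """Time (ms) of the later of the amplitude and density peaks in the window."""
    n = min(len(amplitude), len(density), window_ms // frame_ms + 1)
    if n == 0:
        raise ValueError("envelopes must not be empty")
    amp_peak = max(range(n), key=lambda i: amplitude[i])  # first maximum
    den_peak = max(range(n), key=lambda i: density[i])    # first maximum
    return max(amp_peak, den_peak) * frame_ms


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def direction_distance(curve_a, curve_b):
    """Fraction of point pairs (i < j) on which the two curves disagree in
    direction (up, down or level). 0.0 means the same shape, 1.0 the opposite.
    Both curves must already be sampled at the same number of points.
    """
    if len(curve_a) != len(curve_b):
        raise ValueError("curves must have the same number of points")
    pairs = list(combinations(range(len(curve_a)), 2))
    if not pairs:
        return 0.0
    disagreements = sum(
        sign(curve_a[j] - curve_a[i]) != sign(curve_b[j] - curve_b[i])
        for i, j in pairs
    )
    return disagreements / len(pairs)
```

Because the comparison uses only the sign of each change, two curves with the same contour but different absolute values are at distance 0. Curves of different lengths would have to be resampled to a common length first; how PAST handles this is not stated here.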
If all nine features are used in a timbre comparison, nine separate similarity measurements are generated. Since a single number representing the total distance between two timbres is usually what is desired, these nine numbers must be combined in some way. A "city-block" metric is used to calculate the overall distance.

¹ The measurements used in extracting these features are part of ongoing research. See [Langmead 1995] for more details or contact the author.
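The combination of per-feature distances into one number can be sketched as follows. A city-block combination is simply the sum of the individual distances; the optional per-feature weights are a hypothetical extension (they default to 1), and the function and feature names are illustrative rather than taken from PAST.

```python
def city_block(feature_distances, weights=None):
    """Combine per-feature distances into a single overall distance.

    `feature_distances` maps a feature name to its curve distance in [0, 1].
    `weights` (hypothetical) optionally maps feature names to multipliers;
    features without an entry are weighted 1.0.
    """
    total = 0.0
    for name, distance in feature_distances.items():
        if not 0.0 <= distance <= 1.0:
            raise ValueError(f"distance for {name!r} must lie in [0, 1]")
        weight = 1.0 if weights is None else weights.get(name, 1.0)
        total += weight * distance
    return total


# Illustrative values only; these are not measurements from the paper.
distances = {
    "amplitude_envelope": 0.25,
    "spectral_density": 0.5,
    "harmonicity": 0.125,
}
overall = city_block(distances)
```

With unit weights and nine features, the overall distance ranges from 0 to 9; dividing by the number of features would rescale it to [0, 1] if a normalized value is preferred.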

Modification²

There are two ways to modify a sound using PAST. The first is to morph a feature from one sound onto another sound. The software allows the user to specify a source and a target timbre. The user can then select the features to be included in the morph. The user can also specify a time-varying transformation index. This index determines the degree to which the target timbre's features are applied to the source timbre. Currently, a linear interpolation function is used to create time-varying transformations.

The second way to modify a timbre is to draw a new curve for a feature. The user can edit the curves on the screen. Selected dimensions will be included in the modification. Once again, a time-varying transformation index can be specified. The new analysis file can be written to disk and resynthesized with Lemur.

Testing the Timbre Model

The current feature set has been tested in a number of ways. The numerical similarity measurements have been used to create multidimensional scaling (MDS) timbre spaces. The resulting timbre spaces are often logically ordered. The curves themselves have been used as input to an ART-2 [Carpenter and Grossberg 1987] self-organizing neural network. The trained network has shown success in timbre recognition. The details of both of these tests are described in [Langmead 1995].

The results of timbre modification have been mixed. Certain feature transformations, like amplitude envelope, produce predictable and logical results. Other transformations, like harmonicity, produce sonically interesting but unpredictable results. Time-varying transformations are also problematic. In general, when a transformation moves completely from the source timbre to the target timbre along some feature, the resulting sound never seems to reach its target. The most likely explanation for this is that the interpolation function is linear. A non-linear function may be needed for better results.
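The feature morph described under Modification, with its time-varying transformation index and linear interpolation, can be sketched as below. This is a minimal sketch that assumes both feature curves are sampled at the same frames and that the index runs from 0 (all source) to 1 (all target). The function names and the optional `warp` argument are hypothetical; `warp` only illustrates where a non-linear function could be substituted, and no particular choice is endorsed by the results above.

```python
def linear_index(n_frames, start=0.0, end=1.0):
    """Transformation index that ramps linearly from `start` to `end`."""
    if n_frames < 1:
        raise ValueError("n_frames must be at least 1")
    if n_frames == 1:
        return [start]
    step = (end - start) / (n_frames - 1)
    return [start + step * i for i in range(n_frames)]


def morph_curve(source, target, index, warp=None):
    """Interpolate frame by frame from a source curve toward a target curve.

    index[i] = 0 keeps the source value and index[i] = 1 gives the target
    value. `warp` (hypothetical) reshapes the index before it is applied;
    the default leaves the interpolation linear.
    """
    if not (len(source) == len(target) == len(index)):
        raise ValueError("source, target and index must have equal length")
    if warp is None:
        warp = lambda k: k
    morphed = []
    for s, t, k in zip(source, target, index):
        k = min(1.0, max(0.0, warp(k)))  # keep the index inside [0, 1]
        morphed.append((1.0 - k) * s + k * t)
    return morphed


# A full source-to-target morph of a three-frame amplitude envelope.
ramp = linear_index(3)
linear = morph_curve([0.0, 0.0, 0.0], [10.0, 10.0, 10.0], ramp)
squared = morph_curve([0.0, 0.0, 0.0], [10.0, 10.0, 10.0], ramp,
                      warp=lambda k: k * k)
```

Passing a different `warp` changes how quickly the morph approaches the target without touching the rest of the procedure, which makes it easy to compare alternatives to the linear function by ear.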
Conclusions and Future Work

The model presented in this paper is a first (small) step towards creating a unified model of timbre perception. PAST should facilitate the development of such a model. The specific features implemented in the current model are only a few of the possible features that could be measured. Refining the feature set, and the algorithms used to measure and modify the features, is the main goal of this research. Such refinement might also include replacing the FFT-based Lemur analysis with a similarly organized wavelet-based [Grossmann and Morlet 1984] analysis. Related goals include the determination of a proper weighting scheme for individual features. It is likely that certain features will prove to be more important than others. Finally, the ultimate goal is the resynthesis of sounds based only on their feature set representation.

PAST and a copy of [Langmead 1995] are available by ftp at music.dartmouth.edu or from the author's web page, http://music.dartmouth.edu/-langmead1

² The algorithms used in modifying individual features are part of ongoing research. See [Langmead 1995] for more details or contact the author.

References

Abeles, M., and Goldstein, M. H. 1972. "Response of Single Units in the Primary Auditory Cortex of the Cat to Tones and to Tone Pairs." Brain Res. 42: 337-352.

Braida, L. D., and Durlach, N. I. 1988. "Peripheral and Central Factors in Intensity Perception." Auditory Function: Neurobiological Bases of Hearing, pp. 559-583. Wiley, New York.

Bregman, A. S. 1990. Auditory Scene Analysis. The MIT Press, Cambridge, Massachusetts.

Carpenter, G. A., and Grossberg, S. 1987. "ART-2: Self-Organization of Stable Category Recognition Codes for Analog Input Patterns." Applied Optics 26: 4919-4930.

Fitz, K. 1992. "Time and Frequency Scale Modification of Audio Signals Using an Extended Sinusoidal Model." M.S. thesis, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign.

Grey, J. M. 1975. An Exploration of Musical Timbre. Center for Computer Research in Music and Acoustics, Department of Music Report No. STAN-M-2, Stanford University, February 1975.

Grossmann, A., and Morlet, J. 1984. "Decomposition of Hardy Functions into Square Integrable Wavelets of Constant Shape." SIAM Journal of Mathematical Analysis 15: 723-736.

Helmholtz, H. von. 1859. On the Sensations of Tone as a Physiological Basis for the Theory of Music. (Second English edition: translated by A. J. Ellis, 1885.) Reprinted by Dover Publications, 1954.

Langmead, C. J. 1995. "A Theoretical Model of Timbre Perception Based on Morphological Representations of Time-Varying Spectra." M.A. thesis, Department of Electro-Acoustic Music, Dartmouth College, Hanover, NH.

Mendelson, J. R., and Cynader, M. S. 1985. "Sensitivity of Cat Primary Auditory Cortex (AI) Neurons to the Direction and Rate of Frequency Modulation." Brain Res. pp. 275-296.

Pickles, J. O. 1988. An Introduction to the Physiology of Hearing. Academic Press, London, England.

Polansky, L. 1987. "Morphological Metrics: An Introduction to a Theory of Formal Distances." Proceedings of the 1987 International Computer Music Conference, pp. 197-207.

Quatieri, T. F., and McAulay, R. J. 1985. "Speech Analysis/Synthesis Based on a Sinusoidal Representation." Technical Report 693, Lincoln Laboratory, M.I.T.

Risset, J. C. 1966. "Computer Study of Trumpet Tones." Murray Hill, NJ: Bell Telephone Laboratories.

Saldanha, E., and Corso, J. F. 1964. "Timbre Cues and the Identification of Musical Instruments." Journal of the Acoustical Society of America 36: 2021-2026.

Schreiner, C. E., and Urbas, J. V. 1986. "Representations of Amplitude Modulation in the Auditory Cortex of the Cat. I. The Anterior Auditory Field (AAF)." Hearing Res. 21: 227-241.

Slawson, A. W. 1968. "Vowel Quality and Musical Timbre as Functions of Spectrum Envelope and Fundamental Frequency." Journal of the Acoustical Society of America 43: 87-101.