Timbre Representation of a Single Musical Instrument

Hugo B. de Paula, Mauricio A. Loureiro, Hani C. Yehia
CEFALA - Centro de Estudos da Fala, Acústica, Linguagem e Artes
Universidade Federal de Minas Gerais

Abstract

In order to map the spectral characteristics of the great variety of sounds a single musical instrument may produce, different notes were performed and sampled at several intensity levels across the whole range of a clarinet. Time-varying amplitude and frequency curves of the partials were measured by the Discrete Fourier Transform. A limited set of orthogonal spectral bases was derived by Principal Component Analysis techniques. These bases defined spectral sub-spaces capable of representing all tested sounds and of grouping them, which were validated by Mean Opinion Score auditory tests of similarity. Sub-spaces involving larger groups of notes were used to compare the sounds according to the distance metrics of the representation.

1 Introduction

Representation of a musical instrument involves the estimation of the physical parameters that contribute to the perception of the pitches, intensity levels and timbres of all sounds the instrument is capable of producing. Of these attributes, timbre poses the greatest challenge to measuring and specifying the parameters involved in its perception, due to its inherently multidimensional nature. Time-varying intensity and pitch levels can be classified on one-dimensional soft/loud and low/high scales and can hence be expressed quantitatively by the traditional music-notation system. Timbre, on the other hand, is not so easily scaled. It is perceived through the interaction of a variety of static and dynamic properties of sound, grouped in a complex set of auditory attributes.
Due to the multidimensionality of this attribute, identifying the contribution of each of these competing factors has been the main subject of psychoacoustic research on timbre perception. The introduction of the notion of a "similarity rate" for hearing-judgment responses, together with Multidimensional Scaling (MDS) techniques, allowed the reduction of this multidimensionality and made it possible to investigate the complex structure of this attribute, motivating the first studies of musical timbre based on perceptual data (Plomp 1970; Wessel 1979). In one of the classic studies of musical timbre, Grey (1975) measured subjective judgments of similarity between pairs of timbres from 16 different musical instruments, submitted them to MDS and built a three-dimensional Timbre Space, in which multidimensional "timbre values" of different instruments are positioned according to their similarity/dissimilarity. Besides geometrically mapping the concept of acoustic similarity, that study also showed the method's capacity to provide a psychological quantification of a relatively complex structure from quite simple data: similarity/dissimilarity responses between pairs of distinct timbres. More recent studies were able to relate measurable physical parameters to the dimensions shared by the timbres represented in these spaces, combining quantitative models of perceptual relationships with psychophysical explanations of the identified parameters (Hajda et al. 1997; Misdariis et al. 1998). The possibility of establishing correlations between purely perceptual factors related to timbre and acoustic measurements extracted directly from the sound directed research on musical timbre towards more objective quantitative approaches. For a historical review of the development of research on musical timbre, see McAdams et al. (1995).
A technique commonly used in research on musical timbre is Principal Component Analysis (PCA), which also builds multidimensional data representations. However, while MDS representations relate variables implicit in data obtained from similarity judgments, PCA uses mathematical methods to manipulate the variance of measured (acoustic) data. Recent work applying PCA to time-varying amplitude and frequency curves of harmonic components has produced similar results on similar sets of sounds (Cosi, De Poli, and Lauzzana 1994; Sandell and Martens 1995; Charbonneau, Hourdin, and Moussa 1997; De Poli and Prandoni 1997; Rochebois and Charbonneau 1997; Baliello, De Poli, and Nobili 1998; Beauchamp and Horner 1998). The above studies have mostly approached comparisons between isolated notes of different musical instruments outside any musical context. Instead of focusing on the perceptual mechanism that discriminates one musical instrument from another, this study investigates methods to represent the acoustic parameters that contribute to perceptual discrimination within the timbre palette produced

by a single musical instrument, or even along the extent of a single note.

2 Spectral Parameter Estimation

The variety of sonorities produced by the instrument is represented by spectral sub-spaces generated from parameters extracted from sound samples of as many different timbres as possible, performed along the instrument's entire pitch range. The timbre set used in this study was limited to the sound palette commonly produced on musical instruments in traditional classical Western music performance, excluding sonorities produced on the instrument in the context of other musical traditions, as well as those regularly used in contemporary music known as "extended techniques." In order to facilitate the estimation of spectral parameters, only the sustained part of relatively long sounds was considered. The exclusion of attack, decay and transitions between consecutive notes from this analysis limited the study to the slow variation of musical timbre that commonly occurs along longer notes during a musical performance. Due to the dependence of timbre on intensity, different timbre patterns of the same note were sampled by specifying intensity levels. Four different timbres were sampled for each note by asking the player to perform each note at four different intensity levels, with no intentional dynamic variation. Four levels were defined: pianissimo (pp), mezzo-piano (mp), mezzo-forte (mf) and fortissimo (ff). The performer was asked to establish the lowest and highest levels as soft and as loud as possible, respectively, within the range of timbres commonly used in Western classical music. Intermediate levels were to be defined by comparison with the lowest and highest limits.
Samples were obtained through high-quality recordings (sampling rate of 44.1 kHz and resolution of 16 bits/sample) of all notes of the two lowest registers of a B-flat clarinet, ranging from D3 (147 Hz) through A5 (880 Hz), played at the four intensity levels defined above, with an average duration of 3 seconds. The amplitude curves of the harmonic components were estimated according to McAulay and Quatieri's (1986) method. This method searches for maximum amplitude values ("peak detection") of a Fourier Transform and establishes a correspondence between the closest peak values in adjacent analysis frames ("peak continuation"), associating these values with instantaneous frequency and amplitude values of harmonic components (Serra 1997). It was assumed that amplitude and frequency do not vary abruptly along the entire duration of the sound. Amplitude curves were smoothed by a low-pass filter with a cut-off frequency of 10 Hz and limited by a threshold of 60 dB below the maximum amplitude value. Despite the limitation to sounds played without intentional variation, spectral fluctuations occur in the sustained part of the sound and are more accentuated in higher harmonics (see Fig. 1). These fluctuations were captured by using a relatively short time frame length of 1024 samples (23.2 ms) and a frame shift of 512 samples (11.6 ms).

3 Principal Component Spectral Basis

The high level of correlation of spectral parameters in both the frequency and time domains, a common characteristic of the spectral distribution of musical instrument sounds, allowed an efficient data reduction using Principal Component Analysis (PCA) (Johnson and Wichern 1998). Applied to a set of multidimensional variables, PCA calculates an orthogonal basis determined by the directions of maximum variance of the analyzed data.
The projections of the original data on this basis, denominated principal components (PCs), describe trajectories that accumulate the maximum variance of the data in decreasing order. This allows an approximate representation of the data using only a reduced number of basis dimensions. The original spectra can be reconstructed by adding the basis vectors, properly scaled by the amplitudes of the corresponding trajectories.

4 Spectral Basis of a Single Note

At first, a set of orthogonal spectral bases associated with amplitude envelopes was calculated for each sound. These envelopes, combined with the corresponding basis, were able to render the sound with great precision. After that, spectral sub-spaces were built to represent the spectral distributions of all possible sounds of a single note, by calculating a spectral basis using as input data the concatenation of four samples of this note, pp, mp, mf and ff, as defined in Section 2. Samples were normalized in amplitude and duration, with 75 time frames each, equivalent to 870 ms, taken from the center of the note. Table 1 shows the cumulative variance explained by the first five principal components in each individual sound of the note Bb3 (233 Hz), compared to the variance obtained when PCA is applied to all executions of this note. The first component alone explains no less than 74% of the total variance for every isolated sound, but only 68.7% if PCA is calculated for all four sounds. A reconstruction of 99% is achieved with 3 PCs for every isolated sound, but 5 PCs are needed when PCA is applied to all four sounds.

Number of PCs        1     2     3     4     5
Bb3 pp             87.3  99.8  99.9  100   100
Bb3 mp             96.3  99.7  99.9  99.9  100
Bb3 mf             76.1  96.2  99.2  99.6  99.8
Bb3 ff             74.4  97.2  98.6  99.5  99.7
Bb3 pp-mp-mf-ff    68.7  89.5  94.4  97.2  99.0

Table 1: Cumulative variance of the first 5 PCs for PCA bases of each isolated sound of Bb3 (233 Hz) and of all four sounds of this note.
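The basis calculation and the reconstruction-by-scaled-bases described above can be sketched with a short PCA routine. This is a minimal sketch, assuming the input is a (time frames × harmonics) amplitude matrix as in Section 4; the function names are illustrative, not the authors' code:

```python
import numpy as np

def pca_spectral_basis(frames, n_pc=5):
    """PCA of a (time-frames x harmonics) amplitude matrix.

    Returns the first n_pc spectral basis vectors, the PC trajectories
    (projections of each frame on the basis), the mean spectrum, and
    the cumulative fraction of variance explained per added component.
    """
    mean = frames.mean(axis=0)
    centered = frames - mean
    # SVD of the centered data: rows of Vt are the principal directions
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2
    cum_var = np.cumsum(var) / var.sum()
    basis = Vt[:n_pc]              # orthogonal spectral basis
    traj = centered @ basis.T      # time trajectories (the PCs)
    return basis, traj, mean, cum_var[:n_pc]

def reconstruct(basis, traj, mean):
    """Rebuild the spectra by adding the bases scaled by the trajectories."""
    return traj @ basis + mean
```

Concatenating the pp, mp, mf and ff frame matrices row-wise before the call would yield the shared single-note sub-space of this section; applying it to one sound alone gives the per-sound bases of Table 1's upper rows.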

Auditory tests

Mean Opinion Score auditory tests of similarity were performed to verify the perceptible loss of this representation. Original notes were compared in pairs to sounds resynthesized using different numbers of PCs. Participants were asked to rate the similarity between sounds from 1 to 5: (1) no relation to the original sound was identified; (2) the synthesized sample was very distorted, but similarity to the original sound was still recognizable; (3) identification of the original sound was evident; (4) discrimination was perceived, but no identification was possible; (5) the pair was identical. The reconstruction of any tested sound using 5 PCs was rated no less than 4.2, just above "perceived discrimination without identification", with a variance of 2%. Previous studies tested the same discrimination within musical contexts (de Paula 2000), in which excerpts of notes extracted from recorded musical performances were resynthesized using different numbers of PCs. With adequate amplitude normalization and 6 ms fade-in and fade-out overlapping, the resynthesized sounds were reinserted into the same recordings and submitted to auditory tests. With 5 PCs, all pairs were rated 5. Based on these results, 5 PCs were considered capable of reconstructing the sounds without any perceptible loss of timbre characteristics, allowing a data reduction rate of 64:1.

5 Spectral Basis of a Group of Notes

A set of spectral bases of a group of notes was then calculated by applying PCA to the concatenation of the sounds of each note. At first, four contiguous notes were chosen due to the perceptual similarity of their timbre: A3 (220 Hz), Bb3 (233 Hz), B3 (247 Hz) and C4 (262 Hz). The spectral basis thus obtained constitutes a timbre space for these four notes, where each sound occupies a unique position according to its spectral configuration.
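The group sub-space of this section might be sketched by stacking the normalized frame matrices of all sounds before the PCA, so that every sound is expressed on one shared basis. Representing each sound by the time-average of its trajectory, as below, is an illustrative choice for locating a sound in the sub-space, not necessarily the paper's exact procedure:

```python
import numpy as np

def group_subspace(frame_mats, n_pc=5):
    """Shared spectral sub-space for several sounds.

    frame_mats: list of (frames x harmonics) matrices, one per sound,
    already normalized in amplitude and duration. All sounds are
    concatenated in time before the PCA, so they share one basis.
    Each sound's position is the mean of its PC trajectory
    (an illustrative positioning choice).
    """
    stacked = np.vstack(frame_mats)
    mean = stacked.mean(axis=0)
    _, _, Vt = np.linalg.svd(stacked - mean, full_matrices=False)
    basis = Vt[:n_pc]
    positions = np.array([((m - mean) @ basis.T).mean(axis=0)
                          for m in frame_mats])
    return basis, positions
```

With this construction, sounds with similar spectral envelopes land near one another in the sub-space, which is the property the four contiguous notes above exploit.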
A comparison between the amplitude curves of the harmonics of the original sounds and their reconstructions generated from this spectral sub-space shows that the model is effective in representing the harmonics of larger amplitude. Fig. 1 compares the amplitude curves of the 1st, 3rd, 5th and 7th harmonics of the original Bb3 with their resynthesized versions calculated with 5 PCs of the spectral sub-space of these four contiguous notes (A3 - C4). Sets of spectral bases for larger groups of notes at all four intensity levels were then calculated, in order to represent a larger variety of the spectral distributions the instrument is capable of producing. As expected, the more notes involved, the less efficient the representation becomes. Table 2 shows the cumulative variance explained in the reconstruction of the above-mentioned notes (A3 - C4) and of the 32 notes (128 sounds) of the entire low register of the instrument, from D3 to A5.

[Figure 1: two panels of amplitude-vs-time curves for the 1st, 3rd, 5th and 7th harmonics; top panel "B flat 3 forte (original)", bottom panel "Reconstruction with 5 PCs".]

Fig. 1: 1st, 3rd, 5th and 7th harmonics of the original Bb3 ff (top) and its resynthesized version (bottom), using the spectral sub-space of the notes A3 (220 Hz), Bb3 (233 Hz), B3 (246.9 Hz) and C4 (261.6 Hz) calculated with 5 principal components.

Number of PCs              1     2     3     4     5
A3 - C4                  56.4  80.4  90.3  94.2  96.9
low register D3 - Ab4    59.2  77.9  87.2  92.8  96.7

Table 2: Cumulative variance of the first 5 PCs for 2 different PCA bases: A3 (220 Hz) - C4 (262 Hz); the entire low register, D3 (147 Hz) - Ab4 (415 Hz).
The construction of spectral sub-spaces involving all possible sounds produced by the instrument made possible a compact representation of the whole timbre palette of the instrument, allowing its classification according to the distance metrics of the PCA timbre space by clustering analysis techniques. Fig. 2 shows a cluster of 36 sounds in that space, which includes the four notes exemplified above. Note that intensity-level differentiation spreads the sounds more strongly than pitch. The 1st PC is directly related to intensity level. Lower-pitched sounds tend to be grouped together in regions of higher values of the 2nd and 3rd PCs, while higher-pitched sounds present lower values of these PCs. This relation shall be further analyzed with clustering techniques.
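As an illustration of how such a classification over the distance metrics of the timbre space could proceed, the following is a textbook single-linkage agglomerative clustering sketch over hypothetical PC-space coordinates; the paper does not specify which clustering algorithm it uses:

```python
import numpy as np

def single_linkage_clusters(coords, n_clusters):
    """Greedy single-linkage agglomerative clustering in PC space.

    coords: (n_sounds x n_pc) positions of the sounds in the timbre
    sub-space. Repeatedly merges the two clusters whose closest
    members are nearest (Euclidean distance), until n_clusters
    clusters remain. Returns lists of sound indices.
    """
    clusters = [[i] for i in range(len(coords))]
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(coords[i] - coords[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)
    return clusters
```

Run on the PC coordinates of all 128 sounds, a procedure of this kind would group sounds that are close in the representation, which is what the intensity- and pitch-related groupings in Fig. 2 suggest.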

[Figure 2: a three-dimensional scatter plot, "Clarinet's Timbre sub-space", of sound samples positioned by their principal components; the visible axes are labeled 1st PC and 2nd PC.]

Figure 2: The clarinet timbre space constructed using 128 samples (36 notes). Only 8 notes are represented in the plot for better clarity. The letters p, mp, mf and f represent piano, mezzo-piano, mezzo-forte and forte respectively.

7 Conclusion

Focused on the timbre of a single instrument, this study investigates methods for representing the variety of sonorities produced by one musical instrument, sharing the questions raised by recent studies that investigate musical performance starting from objective measurements of the different acoustic parameters that contribute to the conveyance and perception of musical expressiveness. Auditory tests of discrimination showed the effectiveness of the PCA representation model. Contiguous notes presented individual spectral bases with similar characteristics and were mapped to close sub-spaces. This similarity allowed expansion of the size of these sub-spaces, facilitating the representation of larger groups of notes. These expanded sub-spaces were capable of reconstructing the sounds with their dynamic variation of intensity levels, allowing classification of the timbre palette of the instrument and making possible descriptive comparison of the dynamic variation of the timbres. Such a spatial representation of timbre trajectories may serve as a useful tool for visual monitoring of timbre differentiation in musical performance as well as in music-teaching settings.

8 Acknowledgments

This work was supported in part by CAPES (Brazilian Higher Education funding agency) and by CNPq (National Council for Scientific and Technological Development), Brazil.

References

Baliello, S., G. De Poli, and R. Nobili. (1998).
The Color of Music: Spectral Characterization of Musical Sounds Filtered by a Cochlear Model. Journal of New Music Research 27 (4).
Beauchamp, J. W., and A. Horner. (1998). Spectral Modelling and Timbre Hybridisation Programs for Computer Music. Organised Sound 2 (3):253-258.
Charbonneau, G., C. Hourdin, and T. Moussa. (1997). A Multidimensional Scaling Analysis of Musical Instruments' Time-Varying Spectra. Computer Music Journal 21 (2):40-55.
Cosi, P., G. De Poli, and G. Lauzzana. (1994). Auditory Modelling and Self-Organizing Neural Networks for Timbre Classification. Journal of New Music Research 23:71-98.
de Paula, H. B. (2000). Análise e Re-síntese de Som Natural de Clarineta Utilizando Análise por Componentes Principais. Master's dissertation, Department of Electrical Engineering, Universidade Federal de Minas Gerais, Belo Horizonte.
De Poli, G., and P. Prandoni. (1997). Sonological Models for Timbre Characterization. Journal of New Music Research 26:170-197.
Grey, J. M. (1975). An Exploration of Musical Timbre. Thesis, Center for Computer Research in Music and Acoustics, Dept. of Music, Stanford University, Stanford, Calif.
Hajda, J. M., R. A. Kendall, E. C. Carterette, and M. L. Harshberger. (1997). Methodological Issues in Timbre Research. In Perception and Cognition of Music, edited by I. Deliège and J. Sloboda. Hove: Psychology Press.
Johnson, R., and D. W. Wichern. (1998). Applied Multivariate Statistical Analysis. Upper Saddle River, New Jersey.
McAdams, S., S. Winsberg, S. Donnadieu, G. De Soete, and J. Krimphoff. (1995). Perceptual Scaling of Synthesized Musical Timbres: Common Dimensions, Specificities and Latent Subject Classes. Psychological Research 58:177-192.
McAulay, R. J., and T. F. Quatieri. (1986). Speech Analysis/Synthesis Based on a Sinusoidal Representation. IEEE Transactions on Acoustics, Speech, and Signal Processing 34 (4):744-754.
Misdariis, N. R., B. K. Smith, D. Pressnitzer, P. Susini, and S. McAdams. (1998).
Validation of a Multidimensional Distance Model for Perceptual Dissimilarities Among Musical Timbres. Paper read at the Proceedings of the 16th International Congress on Acoustics, Woodbury, New York.
Plomp, R. (1970). Timbre as a Multidimensional Attribute of Complex Tones. In Frequency Analysis and Periodicity Detection in Hearing, edited by R. Plomp and G. F. Smoorenburg. Leiden: A. W. Sijthoff.
Rochebois, T., and G. Charbonneau. (1997). Cross-Synthesis Using Interverted Principal Harmonic Sub-Spaces. In Music, Gestalt and Computing: Studies in Cognitive and Systematic Musicology, edited by M. Leman. Berlin-Heidelberg: Springer Verlag.
Sandell, G. J., and W. Martens. (1995). Perceptual Evaluation of Principal-Component-Based Synthesis of Musical Timbres. Journal of the Audio Engineering Society 43 (12):1013-1028.
Serra, X. (1997). Musical Sound Modeling with Sinusoids plus Noise. In Musical Signal Processing, edited by A. Piccialli, C. Roads and S. T. Pope. Swets & Zeitlinger Publishers.
Wessel, D. L. (1979). Timbre Space as a Musical Control Structure. In Foundations of Computer Music, edited by C. Roads and J. Strawn. Cambridge, Massachusetts: MIT Press.

Proceedings ICMC 2004