SYNTHESIS AND PERCEPTUAL MANIPULATION OF PERCUSSIVE SOUNDS

Mitsuko Aramaki & Richard Kronland-Martinet & Thierry Voinier & Solvi Ystad
CNRS-Laboratoire de Mecanique et d'Acoustique
31, chemin Joseph Aiguier
13402 Marseille Cedex 20
France

ABSTRACT

In this article we describe the design and use of a hybrid synthesis model simulating most percussive sounds. This model, based on a combination of physical and perceptual considerations, is implemented in real time and mainly controlled by an electronic drum interface. Thanks to the relevance of the synthesis parameters obtained from the analysis of natural sounds, it is possible to simulate different geometric structures, materials and excitation types by varying control parameters related, for instance, to the inharmonicity, the spectral richness or the damping of the vibrating modes. In addition, an original approach to the tuning of percussive sounds, based on chord generation theory, is proposed. Due to the high number of parameters, mapping strategies have to be developed to pilot this model. Some possible strategies are then discussed, although we designed the user interface so that performers can choose their own mapping strategy.

1. INTRODUCTION

Synthesis of impact sounds has already been addressed by several authors. Most of the proposed models are based on the physics of vibrating structures, like for example the so-called modal synthesis [4] [1] [5]. Nevertheless, modal synthesis is not always suitable for complex sounds, such as sounds containing a high density of mixed modes. Other approaches have also been proposed using algorithmic techniques based on digital signal processing. Cook [4] proposed, for example, a granular synthesis approach based on a wavelet decomposition of sounds.

The sound synthesis model proposed here takes into account both physical and perceptual aspects of the sounds. The physical parameters are extracted through a time-scale analysis of natural sounds. The analysis is beyond the scope of this paper, and we refer the reader to a more theoretical paper for details on this method [2]. Due to the short duration of the sound, the spectral content of impact sounds is generally broadband. This effect increases for complex structures because of the high density of modes together with their fast damping. This spectral behavior led to the design of a model based on a subtractive synthesis approach.

Many subjective tests have shown the existence of perceptual clues allowing the source of an impact sound to be identified merely by listening [7] [9]. Moreover, these tests have brought to the fore some correlations between physical attributes (the nature of the material, dimensions of the structure) and perceptual attributes (perceived material, perceived dimensions). Hence it has been shown that the perception of the material mainly correlates with the damping coefficients of the spectral components contained in the sound. This damping is frequency dependent, and high-frequency modes are generally more heavily damped than low-frequency modes. To take this fundamental sound behavior into account from a synthesis point of view, a "time-varying" filtering technique has been chosen. As far as the size and shape of the vibrating object are concerned, it is well known that these attributes are mainly perceived through the pitch of the generated sound together with the richness of the corresponding spectrum. The perception of the pitch mainly correlates with the vibrating modes [3].
For complex structures, the modal density generally increases with frequency, so that high-frequency modes overlap and become indiscernible. Under such conditions, the human ear determines the pitch of the sound from emergent spectral components with consistent frequency ratios. When a complex percussive sound contains several harmonic or inharmonic series, several pitches can generally be heard. The dominant pitch then depends on the frequencies and the amplitudes of the spectral components, due to a so-called dominant frequency region [10] in which the ear is most sensitive to pitch (see section 3 for more details). To take this aspect of the perception of impact sounds into account, we have chosen to use the theory of musical harmony as a basis for the tuning of the complex sounds. This means that sounds with a large number of spectral components are constructed like musical chords, where the root of the chord, its type (major or minor) as well as its inversions can be chosen by the performer.

The large number of parameters available through such a model necessitates a control strategy. This strategy (generally called mapping) is of great importance for the expressive capabilities of the instrument, and inevitably influences the way it can be used in a musical context [6]. In this paper we mention some examples of possible strategies. However, because of the strong interplay between mapping and composition, the choice of the strategy should be left to the composer.

Figure 1. Sound synthesis model.

2. THEORETICAL SYNTHESIS MODEL

The synthesis model we propose, presented in Figure 1, is an extension of the one proposed in [8] in the context of simulating the effect produced on piano tones by the soundboard of the instrument. This model is based on a time-varying subtractive synthesis process that acts on a noisy input signal. The model reproduces two main contributions characterizing the perceived material and the perceived dimensions of the structure. We decided to model these two contributions (material and structure dimensions) separately, even though they cannot be totally disconnected from a physical point of view. We believe this separation gives an easier and more intuitive way of controlling the sounds. Another important aspect of the model is the re-synthesis of natural sounds, meaning that one can also reproduce a given impact sound from a perceptual point of view. This can be done by extracting the parameters of the synthesis model from the analysis of natural sounds [2].

Modeling the material contribution

The simulation of the damping, characteristic of the perceived material, is made by a filtering process whose coefficients vary with time. Nevertheless, it is assumed that this variation is small enough for the filter to be considered stationary over a short time interval. The input signal is chosen to be a stochastic process (generally a white noise), which provides a broadband spectrum from an energetic point of view. The "time-varying filter" controls the frequency-dependent damping of the sounds, making higher components decrease faster than lower ones. The filter used for the model is generally a lowpass filter with a gain and a cutoff frequency that decrease with time. At this stage, the synthesized sounds already contain the main perceptual features of an impacted material. In particular, the sounds present some wooden characteristics when the damping is strong, while they present some metallic characteristics when the damping is weak. Hence, this "material" model is adequate to reproduce perceptual effects corresponding to impacted materials, even though the technique does not simulate modes.

Modeling the "vibrating" object

The addition of a few spectral components to the initial white noise improves the sounds by providing a subjective notion of the size and shape of the sounding object. These spectral components mainly correspond to the most prominent modes. They can simply be generated by adding a sum of sinusoids to the white noise input signal. Note that another method for generating resonances, based on physical modeling, namely the banded digital waveguide approach, has been proposed in [2].

Modeling the excitator

To control the bandwidth of the generated spectrum, we use a bandpass filter. The response of this filter is strongly related to both the strength of the impact and the characteristics of the collision between the excitator and the resonator.
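To make the structure of Figure 1 more concrete, here is a minimal offline sketch of the processing chain in Python/numpy (the actual system described later runs in real time in MAX/MSP). The mode frequencies, decay rates, filter orders and cutoff laws are illustrative assumptions, not values taken from the model; the band-pass excitation filter and the one-pole time-varying low-pass are deliberately simplistic stand-ins for the filters actually used.

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 44100                                  # sampling rate (Hz)
dur = 1.0                                   # sound duration (s)
n = int(fs * dur)
t = np.arange(n) / fs

# Object: white noise (broadband) plus a few emergent modes (sinusoids)
rng = np.random.default_rng(0)
x = rng.standard_normal(n)
for f in (220.0, 445.0, 680.0):             # hypothetical mode frequencies (Hz)
    x += 0.5 * np.sin(2 * np.pi * f * t)

# Excitation: static band-pass filter shaping the excited bandwidth
sos = butter(2, [80.0, 8000.0], btype="bandpass", fs=fs, output="sos")
x = sosfilt(sos, x)

# Material: time-varying one-pole low-pass, gain and cutoff decreasing in time
y = np.zeros(n)
state = 0.0
for i in range(n):
    fc = 6000.0 * np.exp(-3.0 * t[i])       # cutoff frequency decays with time
    gain = np.exp(-6.0 * t[i])              # global damping (strong -> "wooden")
    a = np.exp(-2.0 * np.pi * fc / fs)      # one-pole filter coefficient
    state = (1.0 - a) * x[i] + a * state    # low-pass recursion
    y[i] = gain * state

y /= np.abs(y).max()                        # normalize the result
```

Increasing the decay rates of gain and fc pushes the result toward a "wooden" impact, while weak damping yields the longer, brighter ring associated with metal.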
3. TUNING

In this section we discuss the problem of tuning the pitch of impact sounds composed of multiple components. Even though we want to design an intuitive tool for musicians rather than a complete impact sound tuning system, pitch tuning is not a trivial task. Indeed, complex sounds often evoke several spectral pitches, as when listening to a musical chord. This is due to the fact that our hearing system tends to associate spectral components having consistent frequency ratios. Moreover, the perceived pitch of a series of spectral components, either harmonic or inharmonic, is not necessarily given by the frequency of the first component of the series. As Terhardt explains [10], complex tones elicit both spectral and virtual pitches. Spectral pitches correspond to the frequencies of spectral peaks contained in the sound spectrum, while virtual pitches are deduced by the auditory system from the upper partials in the Fourier spectrum, leading to pitches which do not correspond to any spectral component. A well-known example is the auditory generation of the missing fundamental of a harmonic series of pure tones. In addition, due to the presence of a dominant frequency region situated around 700 Hz, in which the ear is most sensitive to pitch, the perceived pitch depends on both the frequencies and the amplitudes of the spectral components. Hence, the pitch of complex tones with low fundamental frequencies (under 500 Hz) depends on higher partials, while the pitch of tones with high fundamental frequencies is rather determined by the fundamental frequency, since it lies in the dominance region.

With all these aspects of pitch perception in mind, we have decided to base our tuning model on chord generation theory. Even though the aim is to be able to play with the tuning of the synthesis model in a musical context, we would also like to use such a tool in future research to study the relationship between pitch perception and the spectral components of complex sounds. The theory of harmony represents an interesting basis for tuning the sounds. Moreover, since it has been designed to represent the organization of tones in music, we believe that it probably includes fundamental principles of the perception of pitches. As a start, we have developed a basic tool which will easily be extended in the future. Hence a complex tone can be constructed and generated like a musical chord by directly acting on its spectral structure, as sketched below.
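As an illustration of how such a chord-based spectral skeleton can be produced, here is a hedged sketch mapping a root note, a chord type (major or minor) and an inversion to four fundamental frequencies in equal temperament. The interval tables and the function name are our own illustrative choices; the actual chord generator (section 4) offers more options, such as 7th and 9th harmonizations and diminished chords.

```python
import numpy as np

def chord_fundamentals(root_midi, chord_type="major", inversion=0):
    """Return four fundamental frequencies (Hz) for a chord, 12-TET tuning.

    root_midi : MIDI note number of the chord root (e.g. 60 for C4)
    chord_type: "major" or "minor"
    inversion : 0..2 in this simplified sketch (the instrument allows four)
    """
    triad = {"major": [0, 4, 7], "minor": [0, 3, 7]}[chord_type]
    notes = [root_midi + i for i in triad]
    for k in range(inversion % 3):      # raise the lowest notes by an octave
        notes[k] += 12
    notes.sort()
    notes.append(notes[0] + 12)         # double the bass note -> four notes
    return [440.0 * 2 ** ((n - 69) / 12) for n in notes]

# Example: first inversion of a C minor chord
print(chord_fundamentals(60, "minor", inversion=1))
```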

This makes it possible to compare the pitch perception of a complex sound that constitutes a specific chord with the pitch perception of the same chord played on a traditional musical instrument such as a piano or a guitar. The amplitudes of the spectral components together with the degree of inharmonicity of the spectral series make it possible to play with the perceived pitch as a function of the dominance region. In the next sections, we shall see how the real-time model is implemented and adapted to give the user access to these parameters.

Figure 2. Real-time implementation of the model. The note generator element is described in Figure 3.
Figure 3. Note generator.

4. REAL-TIME IMPLEMENTATION

The real-time implementation, using the MAX/MSP software, is based on the structure of the synthesis model: the "object" element, devoted to the simulation of the emergent modes; the "material" element, simulating the damping of the sounds; and the "excitation" element (Figure 2). A Gaussian white noise generator provides the broadband spectrum. As discussed in the previous section, the emergent modes follow the principles of the theory of harmony to generate the desired complex tone. Chords are composed of four notes, which implies the use of four separate note generators, each of which is based on additive synthesis of ten components. Nevertheless, this approach suffers from a lack of correlation between the stochastic part of the sound (generated by the white noise) and the deterministic part (generated by the sinusoids), making the sounds unrealistic. To overcome this drawback, we additionally generate the spectral peaks with narrow bands of the initial white noise, making the two parts better correlated. Hence, a more or less fuzzy pitch can be obtained by acting on the relative gain of each part (Figure 3).

To simulate the damping, we control the evolution of the spectrum through 24 frequency bands corresponding to the first critical bands of hearing, known as the Bark bands. A time-varying gain allows damping control for each Bark band (envelope generator). The last element of the model is the excitation filter, for which a bandpass filter is generally sufficient. As they are linear elements, the "time-varying filter" and the excitation filter are permuted to simplify the implementation. Hence, each Bark band filter is followed by a static gain adjustment (for the excitation tuning) and a time-varying gain (for the damping). The sum of the outputs of the "time-varying filters" then goes into a dynamic filter, which is a one-pole lowpass filter. The cutoff frequency of this lowpass filter depends on the force sensed by the trigger control interface (the MIDI velocity). In this way, we can imitate the well-known non-linear effect that leads to an increase in spectral width as a function of the force of the impact. Even if it is possible to control each parameter independently, the number of parameters is obviously too high for a musical application. A higher control level over these parameters is required and will be described in the next section. As this synthesis model simulates percussive sounds, a drum interface is a natural choice for piloting the model.
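To illustrate the note generator of Figure 3, the following sketch blends, for each partial, a pure sinusoid with a narrow band of filtered white noise centred on the same frequency; the crossfade between the two parts controls how "fuzzy" the resulting pitch is. The relative bandwidth, the number of components and the amplitude law are illustrative assumptions, not the values used in the MAX/MSP patch.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def note_generator(f0, partial_ratios, amps, purity, dur=1.0, fs=44100, seed=0):
    """One note of the chord: each partial is a mix of a sinusoid and a
    narrow band of noise at the same frequency.

    purity = 1.0 -> pure sinusoids (sharp pitch)
    purity = 0.0 -> narrow-band noise only (fuzzy pitch)
    """
    n = int(fs * dur)
    t = np.arange(n) / fs
    noise = np.random.default_rng(seed).standard_normal(n)
    out = np.zeros(n)
    for r, a in zip(partial_ratios, amps):
        f = r * f0
        if f >= fs / 2:                      # skip partials above Nyquist
            continue
        sine = np.sin(2 * np.pi * f * t)
        bw = 0.02 * f                        # assumed relative bandwidth
        sos = butter(2, [f - bw, f + bw], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, noise)
        band /= np.abs(band).max() + 1e-12   # normalize the noisy partial
        out += a * (purity * sine + (1.0 - purity) * band)
    return out

# Example: a harmonic series of ten components with 1/k amplitudes
ratios = np.arange(1, 11)
sig = note_generator(220.0, ratios, 1.0 / ratios, purity=0.7)
```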
5. SYNTHESIS PARAMETER CONTROL

As presented in the previous section, the number of parameters is high and mapping strategies have to be used to control them more intuitively.

Control of the oscillator bank parameters

Concerning the oscillator bank parameters, we have to control eighty values (forty frequencies and forty amplitudes). Each series of ten components (dedicated to one oscillator bank) is defined by a fundamental frequency (fundamental mode) and an inharmonicity law which fixes its relationship to the nine other spectral components. Hence, all forty components are defined by four fundamental frequencies and an inharmonicity law. This law is determined either by drawing a curve representing the frequency ratio f_k/f_0 of each spectral component in the series as a function of the fundamental frequency, or by adjusting the parameters of one of the following predefined presets: harmonic, linear or piano-string inharmonicity. For example, when the signal is harmonic, the spectral components are integer multiples of the fundamental frequency and the inharmonicity curve is a straight line where the frequency of the k-th component equals k times the frequency of the fundamental component (ratio = k). To generate an inharmonic sound, the inharmonicity can be chosen either individually for each frequency component or globally.
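As an illustration of these inharmonicity presets, the following sketch computes the frequency ratios f_k/f_0 for a harmonic series and for piano-string-like inharmonicity. We assume the classical stiff-string formula f_k = k f_0 sqrt(1 + B k^2) for the latter, with B an inharmonicity coefficient, and a uniformly stretched ratio for the "linear" preset; these formulas and coefficient values are our assumptions, not necessarily the exact presets of the instrument.

```python
import numpy as np

def partial_ratios(n_partials=10, preset="harmonic", B=1e-3, stretch=0.0):
    """Frequency ratios f_k / f_0 for one series of partials.

    preset:
      "harmonic"     -> ratio = k
      "linear"       -> ratio = k * (1 + stretch), uniformly stretched series
      "piano-string" -> ratio = k * sqrt(1 + B k^2), stiff-string inharmonicity
    """
    k = np.arange(1, n_partials + 1, dtype=float)
    if preset == "harmonic":
        return k
    if preset == "linear":
        return k * (1.0 + stretch)
    if preset == "piano-string":
        return k * np.sqrt(1.0 + B * k ** 2)
    raise ValueError(f"unknown preset: {preset}")

# Example: ten partial frequencies of an inharmonic series on A2
f0 = 110.0
freqs = f0 * partial_ratios(preset="piano-string", B=4e-4)
```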

The four fundamental frequencies influencing the global pitch are defined using the chord generator (cf. section 3). In practice, the player can choose whether he or she wants to construct the complex sound with one single pitch or with several pitches forming a specific chord. When a single pitch is chosen, the player selects the fundamental note and the spectrum of the sound consists of one harmonic or inharmonic series of ten spectral components. When the chord case is chosen, the player selects the root of the chord, the type (major or minor), the harmonization (4th, 7th, 9th, diminished, ...) and finally the inversion (four possible inversions). Thus, the spectrum of the generated sound consists of four harmonic or inharmonic series of ten spectral components each. The user can also act on the amplitudes of the spectral components. They are chosen either by drawing a global, continuous curve shaping the spectrum or by means of predefined presets. The amplitudes of the spectral components together with the degree of inharmonicity of the spectral series make it possible to play with the perceived pitch as a function of the dominance region.

Control of the damping parameters

The twenty-four values characteristic of the damping can be set graphically (determination of the twenty-four values on the Bark scale). Otherwise, they can be parameterized by a damping law α(ω) which can be written:

α(ω) = e^(a1 + a2 ω)    (1)

This damping law is a function of the frequency and can be directly estimated from physical considerations. We chose to model it by an exponential function so that the damping control is reduced to only two parameters: {a1, a2}.
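A minimal sketch of how such a two-parameter law could drive the twenty-four Bark-band damping envelopes is given below. The Bark-band centre frequencies are approximated with Traunmüller's inverse Bark formula, and the mapping of α to a per-band exponential amplitude envelope e^(−α t) is our assumption about how the envelope generators use these values; the numerical settings are illustrative only.

```python
import numpy as np

def bark_center_frequencies(n_bands=24):
    """Approximate centre frequencies (Hz) of the first n Bark bands,
    using Traunmueller's inverse Bark formula."""
    z = np.arange(n_bands) + 0.5                 # band centres on the Bark scale
    return 1960.0 * (z + 0.53) / (26.28 - z)

def damping_envelopes(a1, a2, dur=1.0, fs=44100, n_bands=24):
    """Per-Bark-band time-varying gains derived from the exponential
    damping law alpha(omega) = exp(a1 + a2 * omega) of Eq. (1)."""
    t = np.arange(int(fs * dur)) / fs
    omega = 2.0 * np.pi * bark_center_frequencies(n_bands)
    alpha = np.exp(a1 + a2 * omega)              # one damping value per band
    # amplitude envelope of each band: exponential decay exp(-alpha * t)
    return np.exp(-np.outer(alpha, t))           # shape: (n_bands, n_samples)

# Example: weak damping ("metallic") vs strong damping ("wooden") settings
metal = damping_envelopes(a1=0.0, a2=5e-5)
wood  = damping_envelopes(a1=2.0, a2=3e-4)
```

In this picture, {a1, a2} locate a sound in the "material space" discussed below: small values give long, metallic decays, larger values give short, wooden ones.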
2002. "Investigating the Perception of the Size, Shape and Material of Damped and Free Vibrating Plates", Univ of Sheffield, Dpt. of Comp. Science, England, Technical Report CS-02-10. [10] Terhardt E., Stoll G. and Seewann M., 1982. "Pitch of Complex Signals According to Virtual Pitch Theory: Tests, Examples, and Predictions". J Acoust. Soc. Am., vol. 71, pp. 671-678. This damping law is a function of the frequency and can be directly estimated from physical considerations. We chose to model it by an exponential function so that the damping control is reduced to only 2 parameters: {ai, a2}. In this section, we have proposed some specific controls of the synthesis parameters. The parameters of the "object" element are controlled by a chord generator which is defined by four note values and an inharmonicity law. The control of the "time-varying filters" is reduced to 2 parameters characterizing the damping law. As the damping values are strongly characteristic of the nature of the perceived material, the bidimensional space defined by {a1, a2} can be considered as a "material space" where some specific zones could be representative of different materials. 6. CONCLUSION In this paper we have presented an efficient hybrid synthesis technique for impact sounds. The model, based on both physical and perceptual considerations, only takes into account two basic timbre descriptors correlated to the damping and the emergent vibrating modes. This is a first step toward a better understanding of the nature of sounds, and especially toward minimalist description of timbre. Thanks to a real-time implementation of the model, we have been able to experience its accuracy, allowing the generation of a wide variety of impact sounds. The system has been used in a musical context by a drum like MIDI