ICMC Proceedings 1999

Sound Modeling of Transient and Sustained Musical Sounds

Ystad Sølvi, Guillemain Philippe and Kronland-Martinet Richard
CNRS - Laboratoire de Mécanique et d'Acoustique
31, Chemin Joseph Aiguier, 13402 Marseille, France

Abstract

This paper describes a general sound synthesis model based on the combination of physical and signal models. The model allows the synthesis of both transient and sustained sounds while taking into account both physical and perceptive criteria. Moreover, it can be used to resynthesize natural sounds by following the procedure described in this paper.

I - Introduction

In this paper we give the step-by-step "recipe" for constructing a sound model adapted to "source-resonant" instruments. The model consists of two main parts: the "excitator", represented by a signal model, and the "resonator", represented by a waveguide model. The combination of these models leads to a very general synthesis process which is well adapted to the transient and sustained sounds generated by most musical instruments. Moreover, one can use this model to resynthesize a given natural sound or to simulate a virtual instrument.

The theoretical linear wave equation in the medium (the vibrating material which constitutes the instrument) is a good starting point for constructing the resonator part of the "instrument". Its solution indicates the theoretical behavior of the resonant modes (frequency, damping factor). We have calculated the response at one point of a one-dimensional mechanical and/or acoustical system excited by an impulse. To compare the theoretical results with the real case, the sound of the instrument excited by an impulse should then be recorded and analyzed using time-frequency analysis.
The theoretical knowledge of the behavior of the modes makes it possible to construct a matched analysis method for extracting the amplitude and frequency modulation laws of each component of the signal. The experimental values of the modes can then be used to construct a waveguide model simulating the medium in which the waves propagate.

For sustained sounds the source is extracted from the signal by deconvolution. It is then divided into a deterministic part and a stochastic part by an adaptive filtering method (LMS), and these parts are modeled independently. The evolution of the deterministic part of the signal as a function of the dynamic level of the sound often indicates a nonlinear behavior. This part is therefore modeled using waveshaping synthesis. The tristimulus criterion makes it possible to find how the index of distortion should be varied to obtain a spectral evolution of the synthetic signal which, from a perceptive point of view, is similar to that of the real signal. A relation between the control exerted by the player (for example the input pressure for wind instruments) and the index of distortion can then be found. The stochastic part of the source signal is modeled by constructing filters that give the synthetic signal the same power spectral density and the same probability density as the real signal.

Finally, the waveguide model and the signal model can be associated to make a very general sound simulator which can rather easily be driven by an interface indicating the note played (fundamental frequency) and the playing control (air pressure, bow pressure, etc.). This hybrid model has been used to resynthesize guitar and flute sounds, demonstrating that it can accurately generate both sustained and transient musical sounds.

II - The hybrid model

To elaborate a sound synthesis model of a musical instrument one has to consider its resonator, which constitutes the vibrating part of the instrument (air column, vibrating plate, strings, etc.)
and its excitator (hammer, air jet, etc.) with which the player creates the sound. The resonator is generally a bounded medium in which a system of stationary waves can be established. It is excited by the excitator system, which also plays the role of an energy generator for sustained sounds. The waves generated this way create the partials which give the sound its richness. To be complete, we could also consider a third part, whose role is to adapt the impedance between the propagation medium and the air and which radiates the energy. However, we shall suppose that, up to a linear filtering accounting for this last part, our model consists of only two elements, namely the resonator and the excitator.

From a physical point of view the excitator and the resonator can generally not be separated, since they interact. It is however convenient and often sufficient to assume that they are independent. As a consequence, source-resonant models can simulate a great number of instruments.

A natural model for the resonator is the so-called waveguide model. Such a choice is easy to justify: its definition incorporates filters which directly describe the propagation in a bounded medium. Such a model also allows the construction of interfaces closely related to real instruments.

As far as the excitator is concerned, the model has to take into account non-linear phenomena which are sometimes due to the interaction between the excitator and the resonator, like the interaction between string and bow or hammer. These phenomena give the sound a brightness which depends on the intensity of the excitation. Such physical phenomena are difficult to describe, and we have therefore chosen to use signal models to simulate the perceptive effect produced by the excitation signal. The parameters of these models can then be estimated using perceptive criteria. By combining this synthesis model, which represents the excitation, with the waveguide model, which represents the resonator, we obtain what we have called a "hybrid" model.

II.1 - Modeling the resonator

The study of the theoretical linear wave equation in the medium gives us an indication of the behavior of the modes in the resonator. By calculating the response at one point of a one-dimensional mechanical or acoustical system excited by an impulse, we find that it consists of a sum of exponentially damped sinusoidal functions [Sneddon, 1951]. To simulate this behavior we have chosen to use a waveguide model [Smith, 1992] consisting of a looped system with a delay line and a filter taking into account dispersion, dissipation and boundary conditions (figure 1). This system generates waves consisting of a sum of exponentially damped sinusoids [Ystad et al., 1996] and is consequently well adapted to the simulation of the behavior of the waves in the medium.
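As an illustration of the looped system described above, the following minimal sketch computes the impulse response of a delay line closed by a loop filter. It is not the filter used in this work: the one-pole low-pass filter, the function name `waveguide_impulse_response` and its parameters are assumptions chosen for brevity, whereas the actual loop filter is derived from the theoretical or measured modes.

```python
def waveguide_impulse_response(delay, loop_gain, smoothing, n_samples):
    """Impulse response of a delay line closed by a one-pole low-pass loop filter.

    delay     : delay-line length in samples (sets the round-trip time)
    loop_gain : gain below 1 applied on every round trip (dissipation)
    smoothing : one-pole coefficient in [0, 1); larger values damp high partials faster
    """
    line = [0.0] * delay      # circular buffer holding the travelling wave
    lp_state = 0.0            # memory of the loop filter
    out = []
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0          # unit impulse excitation
        idx = n % delay
        returned = line[idx]                # sample written `delay` steps earlier
        # Loop filter: one-pole low-pass with unit gain at zero frequency.
        lp_state = (1.0 - smoothing) * returned + smoothing * lp_state
        y = x + loop_gain * lp_state        # input plus filtered feedback
        line[idx] = y
        out.append(y)
    return out


if __name__ == "__main__":
    h = waveguide_impulse_response(delay=50, loop_gain=0.98, smoothing=0.2, n_samples=4000)
    print(h[0], h[50])
```

With a delay of 50 samples the partials lie close to multiples of the sampling rate divided by 50, and the low-pass filter makes the higher partials decay faster than the lower ones.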
Figure 1: Waveguide model used to simulate the resonant part of the instrument. [Diagram labels: input, output, filter.]

II.2 - Modeling the excitation

As already mentioned, the excitator and the resonator can generally not be separated, since they interact. It is however convenient and often sufficient to assume that they are independent. Since the resonator is considered as a linear system, the source can be identified by deconvolution. This is a legitimate operation because the transfer function of the physical waveguide model is an all-pole filter: its inverse is an all-zero filter, which is always stable. In the flute case, for example, the source signal resulting from this operation contains both spectral lines and broadband noise. To synthesize this signal, it is convenient to split it into two contributions [Serra, 1989]: a deterministic component and a stochastic component. This is done using an adaptive filtering method [Widrow, 1985] [Ystad, 1998].

II.2.1 - The deterministic contribution

By considering the spectral evolution of the deterministic contribution, we generally find that it behaves non-linearly, since its spectral components evolve differently as the dynamic level changes. To model this non-linear behavior, we have chosen to use waveshaping synthesis [Arfib, 1979] [Lebrun, 1979]. This method consists in "distorting" a sinusoidal function of variable amplitude $I(t)$, called the index of distortion, by a non-linear function $\gamma$. The generated signal $s(t)$ can then be written:

$s(t) = \gamma\big(I(t)\cos(\omega_0 t)\big)$

where $\omega_0$ denotes the angular frequency of the sinusoid.

II.2.2 - The stochastic contribution

To model the stochastic part of the source signal, one supposes the process to be stationary and ergodic. The noise can then be described by its power spectral density and its probability density function. From a perceptive point of view the coloring of the noise is mainly related to the power spectral density. By linear filtering of a white noise one can then generate a noise corresponding to the instrument to be simulated.
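For the deterministic contribution, the waveshaping relation s(t) = γ(I(t) cos(ω0 t)) can be sketched as follows. The choice of building γ as a weighted sum of Chebyshev polynomials is an assumption borrowed from standard waveshaping practice (with an index of 1, the weights are exactly the harmonic amplitudes); the function names and the spectrum `[1.0, 0.5, 0.25]` are hypothetical and are not taken from the measurements reported here.

```python
import math


def chebyshev(k, x):
    """Chebyshev polynomial of the first kind T_k(x), computed by recurrence."""
    t_prev, t_cur = 1.0, x
    if k == 0:
        return t_prev
    for _ in range(k - 1):
        t_prev, t_cur = t_cur, 2.0 * x * t_cur - t_prev
    return t_cur


def make_shaper(harmonic_amps):
    """Distortion function gamma(x) = sum_k a_k T_k(x), with harmonic_amps[k-1] = a_k."""
    def gamma(x):
        return sum(a * chebyshev(k, x) for k, a in enumerate(harmonic_amps, start=1))
    return gamma


def waveshape(gamma, index, f0, fs, n_samples):
    """s[n] = gamma(index * cos(2 pi f0 n / fs)) for a constant index of distortion."""
    return [gamma(index * math.cos(2.0 * math.pi * f0 * n / fs)) for n in range(n_samples)]


def harmonic_amplitude(signal, k, f0, fs):
    """Amplitude of harmonic k, by projection on cos and sin at k * f0.

    Exact only when the signal spans a whole number of periods of f0.
    """
    w = 2.0 * math.pi * k * f0 / fs
    re = sum(s * math.cos(w * n) for n, s in enumerate(signal))
    im = sum(s * math.sin(w * n) for n, s in enumerate(signal))
    return 2.0 * math.hypot(re, im) / len(signal)


if __name__ == "__main__":
    gamma = make_shaper([1.0, 0.5, 0.25])
    for index in (1.0, 0.5):
        s = waveshape(gamma, index, f0=100.0, fs=8000.0, n_samples=800)
        print(index, [round(harmonic_amplitude(s, k, 100.0, 8000.0), 4) for k in (1, 2, 3)])
```

Lowering the index weakens the upper harmonics more than the fundamental, which is the brightness effect that the index of distortion is meant to control.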
II.3 - Combining the models: the hybrid model

By combining the source model with the physical model simulating the behavior of the waves during propagation in the medium, a very general sound model can be constructed. Figure 2 shows how the two models can be combined and driven by an appropriate interface. Here we have supposed that the source and the resonator are uncoupled. Such models can be driven either by completely new interfaces or by traditional instruments fitted with sensors, like the flute interface described in these proceedings [Ystad et al., 1999].

Figure 2: A hybrid model and its interface. [Diagram labels: driving interface, physical control, interpretation, state; excitator (signal model), resonator (waveguide model), sound.]
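Under the uncoupled assumption, the hybrid model is a feed-forward chain: the interface sets the parameters, the source is computed first, and the result is fed into the resonator. The sketch below shows this chain with deliberately crude stand-ins, which are assumptions and not the models of this paper: the source is a plain sinusoid whose amplitude follows the playing level, and the loop filter of the resonator is reduced to a constant gain. All function names are hypothetical.

```python
import math


def sine_source(period, amplitude, n_samples):
    """Stand-in excitator: a sinusoid with the given period in samples."""
    return [amplitude * math.cos(2.0 * math.pi * n / period) for n in range(n_samples)]


def resonator(source, delay, loop_gain):
    """Stand-in resonator: y[n] = x[n] + loop_gain * y[n - delay]."""
    out = []
    for n, x in enumerate(source):
        fed_back = out[n - delay] if n >= delay else 0.0
        out.append(x + loop_gain * fed_back)
    return out


def hybrid_model(period, level, n_samples, loop_gain=0.9):
    """Interface layer: the note sets the delay length, the level sets the source."""
    source = sine_source(period, level, n_samples)   # excitator (signal model)
    return resonator(source, period, loop_gain)      # resonator (waveguide model)


if __name__ == "__main__":
    tone = hybrid_model(period=40, level=0.5, n_samples=6000)
    print(max(tone[-80:]))
```

Because the source never depends on the resonator output, each part can be replaced independently, for instance by a waveshaping source or by a loop with a frequency-dependent filter.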

III - How to feed the model?

In this section we show how the parameters of a hybrid model can be estimated from the analysis of real sounds or from theoretical knowledge.

III.1 - Feeding the resonator model

The parameters of the resonator model are closely related to the physics of the instrument. They are linked to the eigenfrequencies and to the attenuation of each mode of the structure.

III.1.1 - From the theoretical behavior

We have addressed the modeling of the flute and of the guitar string. For that purpose, we have studied the propagation of longitudinal waves in fluids (wind instruments) and of transverse waves in solid media (string instruments) [Guillemain et al., 1997]. We give here a brief description of the results obtained in the wind instrument case.

The one-dimensional wave equation for the acoustic pressure y inside a tube, when visco-thermal losses are taken into account, can be found for example in [Kergomard, 1981]. When the source term is a point Dirac function, the solution of the equation is a sum of exponentially damped sinusoids. In this case, one can notice that the modes are nearly harmonic (with a slight deviation) and that the damping factor depends on the rank of the mode. This is an important result indicating the behavior of the modes in a tube. Nevertheless, this solution corresponds to a tube without holes in which only plane waves propagate, and therefore gives only an idea of the behavior of the modes in a wind instrument.

To calculate the filter of the waveguide model we have compared the PSD of its transfer function with the PSD of the theoretical response [Ystad et al., 1996]. We then find the discrete values of the loop filter, which depend on the damping factors of the modes and on their eigenfrequencies. The construction of the impulse response from these discrete values is based on the time-frequency representation of a transient [Ystad, 1998].
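The dependence of the loop filter on the damping factors can be made concrete with a standard relation for delay-line loops, which is not specific to this paper: if the loop delay is D samples at sampling rate fs, a mode whose amplitude decays as exp(-α t) must be multiplied by exp(-α D / fs) on each round trip. The sketch below evaluates these target gains at the mode frequencies. The function name and the mode values are hypothetical, and the phase delay of the loop filter itself is neglected.

```python
import math


def loop_gain_targets(modes, fs, delay):
    """Target loop-filter magnitude at each mode frequency.

    modes : list of (frequency_hz, damping_per_second) pairs
    fs    : sampling rate in Hz
    delay : loop delay in samples
    """
    round_trip = delay / fs                      # duration of one round trip in seconds
    return [(f, math.exp(-alpha * round_trip)) for f, alpha in modes]


if __name__ == "__main__":
    # Hypothetical modes: damping grows with the rank of the mode.
    modes = [(440.0, 2.0), (880.0, 3.5), (1320.0, 6.0)]
    for f, g in loop_gain_targets(modes, fs=44100.0, delay=100):
        print(f, g)
```

The gains are slightly below 1 and decrease with the rank of the mode, so a loop filter fitted to them has a low-pass character.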
The real values of the modes can also be obtained from the analysis of real impulsive sounds.

III.1.2 - From the analysis of real sounds

The theory showed us that the response of the wave equation to a point Dirac source is a sum of exponentially damped sinusoids. To find their parameters by analysis we need information about both the frequency content and its evolution in time. We therefore have to use time-frequency representations such as the Gabor or the wavelet transform. The knowledge the theory gave us about the behavior of the waves made it possible to construct a matched analysis technique in which the analysing functions behave like the modes [Ystad, 1998]. This gives an optimal extraction of the amplitude and frequency modulation laws associated with each component of the signal. Even when no a priori knowledge of the signal is available, time-frequency methods generally give good results for the estimation of the modulation laws. From these laws we can directly extract the real attenuation and frequency of the modes, allowing the construction of the loop filter.

In both the tube and the string cases, physical modeling gives a good description of the behavior of the sound produced by the propagation of a transient. In the guitar string case, where the source is not coupled to the resonator, the synthesized sounds are very close to real sounds. Nevertheless, for sound generating systems using sustained sources, like wind instruments or bowed strings, the input signal has to be identified and modeled separately.

III.2 - Feeding the excitation model

The stochastic part of the excitation can easily be synthesized using a filtered white noise whose amplitude is correlated to the level of the excitation.
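A minimal sketch of such a noise source is given below: Gaussian white noise is passed through an FIR filter and scaled by the excitation level. The two-tap filter is a placeholder assumption; in practice the coefficients would be fitted to the measured power spectral density of the noise part of the source. The function names are hypothetical.

```python
import math
import random


def colored_noise(fir_coeffs, level, n_samples, seed=0):
    """White Gaussian noise filtered by an FIR filter and scaled by the excitation level."""
    rng = random.Random(seed)                     # seeded for reproducible output
    white = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    out = []
    for n in range(n_samples):
        acc = 0.0
        for k, b in enumerate(fir_coeffs):        # direct-form convolution
            if n - k >= 0:
                acc += b * white[n - k]
        out.append(level * acc)
    return out


def rms(signal):
    """Root-mean-square value of a signal."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


if __name__ == "__main__":
    soft = colored_noise([0.5, 0.5], level=0.2, n_samples=20000)
    loud = colored_noise([0.5, 0.5], level=0.8, n_samples=20000)
    print(rms(soft), rms(loud))
```

The filter shapes the spectrum of the noise while the level only scales it, so the coloring and the dynamics can be adjusted separately.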
Resynthesizing the deterministic source signal with a waveshaping method is possible if one can determine the distortion function $\gamma$ and the index of distortion $I(t)$ so that the dynamic evolution of the spectrum coincides with that of the real sound. A major drawback of global synthesis techniques such as waveshaping is that the representation of signals is not complete: an arbitrary spectral evolution cannot be reconstructed merely by changing the index. The distortion function can be calculated from a given spectrum and a given value of the index of distortion [Ystad, 1998]. Since the index of distortion cannot take values greater than 1, it is convenient to calculate the distortion function so that the spectrum generated with an index equal to 1 corresponds to the "richest" sound (fortissimo).

The index can then be estimated so that the reconstructed signal satisfies perceptive criteria. Several criteria exist for characterizing the perception of the temporal evolution of a musical tone; the best known is perhaps the spectral centroid criterion [Beauchamp, 1982]. This criterion is suitable for sounds whose spectral components globally increase. In the flute case, where the spectrum changes dramatically, it is not suitable. We have therefore used another perceptive criterion, the tristimulus criterion, which considers the loudness in three different parts of the spectrum [Pollard et al., 1982]. The first group contains the fundamental component, the second group the second, third and fourth components, and the third group the rest of the components. By minimizing the difference between the tristimulus of the natural sound and the tristimulus of the synthetic sound, we find how to vary the index of distortion to obtain a spectral evolution of the synthetic signal which is similar to that of the real signal. In the flute case we found that the index of distortion should vary from 0.5 to 1 as the dynamic level of the sound varies from pianissimo to fortissimo.

III.3 - Recapitulation: the general recipe in 10 steps

The steps needed to feed the hybrid model and to resynthesize a given natural sound can be summarized as follows:

1- Record the sound produced by the instrument when excited by an impulse (short impulse on the string, transient obtained by rapidly closing a key of a wind instrument, etc.).
2- Analyse this transient sound to extract the frequency and the damping factor of each mode.
3- Construct the loop filter of the waveguide according to these data.
4- Record the sound of the instrument when played at different dynamic levels (from pianissimo to fortissimo).
5- For each of these recordings, extract the "source" signal by deconvolution with the waveguide.
6- For each source signal, separate the deterministic and the stochastic parts using adaptive filtering.
7- Calculate the power spectral density (PSD) of the noise part of the source and use it to construct the noise filter.
8- Use the deterministic signal corresponding to the fortissimo playing to construct the distortion function.
9- For each deterministic signal, calculate the tristimulus and find the variation of the index of distortion which minimizes the difference between the real and the synthetic tristimulus.
10- Play the model.

IV - Conclusion

We have presented a general hybrid synthesis model combining physical and signal models. For transient sounds a physical waveguide model has proved to be efficient.
However, for sustained sounds, the source has to be extracted and modeled by considering the deterministic and the stochastic parts separately. This model can be used to reproduce a given sound and to simulate the playing of a given instrument. The recipe given in this article can be used to simulate a great number of instruments. General-purpose software using this technique is currently being developed.

References

Sneddon, I.N. Fourier Transforms. McGraw-Hill Book Company, 1951.
Smith, J.O. Physical modeling using digital waveguides. Computer Music Journal, 1992, Vol. 16, No. 4, pp. 74-91.
Ystad, S., Guillemain, Ph., & Kronland-Martinet, R. Estimation of Parameters Corresponding to a Propagative Synthesis Model Through the Analysis of Real Sounds. Proceedings of the ICMC, Hong Kong, 1996, pp. 19-24.
Serra, X. A system for sound analysis/transformation/synthesis based on a deterministic plus stochastic decomposition. PhD thesis, Stanford University, October 1989.
Widrow, B., & Stearns, S.D. Adaptive Signal Processing. Englewood Cliffs, Prentice-Hall Inc., 1985.
Ystad, S. Sound Modeling Using a Combination of Physical and Signal Models. PhD thesis, University Aix-Marseille II, France, 1998.
Arfib, D. Digital synthesis of complex spectra by means of multiplication of non-linear distorted sine waves. Journal of the Audio Engineering Society, 1979, Vol. 27, pp. 757-768.
Lebrun, M. Digital waveshaping synthesis. Journal of the Audio Engineering Society, 1979, Vol. 27, pp. 250-266.
Ystad, S., & Voinier, Th. Design of a Flute Interface to Control Synthesis Models. Proceedings of the ICMC 1999, Beijing (China), October 1999.
Guillemain, Ph., Kronland-Martinet, R., & Ystad, S. Physical Modelling Based on the Analysis of Real Sounds. Proceedings of the ISMA, Edinburgh, 1997, Vol. 19, pp. 445-450.
Kergomard, J. Champ interne et champ externe des instruments à vent. Thèse d'État, Université Paris IV, 1981.
Beauchamp, J.W. Synthesis by Spectral Amplitude and "Brightness" Matching of Analyzed Musical Instrument Tones. Journal of the Audio Engineering Society, 1982, Vol. 30, No. 6.
Pollard, H.F., & Jansson, E.V. A Tristimulus Method for the Specification of Musical Timbre. Acustica, 1982, Vol. 51.