Additive resynthesis of sounds using continuous time-frequency analysis techniques

Ph. Guillemain *+, R. Kronland-Martinet *
* CNRS-Laboratoire de Mécanique et d'Acoustique, 31 ch. J. Aiguier, 13402 Marseille Cedex 09, France
+ DIGILOG, 21 rue Frédéric-Joliot, Pôle d'activités des Milles, 13852 Aix-en-Provence Cedex 3, France

ABSTRACT
It has long been known that additive synthesis techniques can produce a wide palette of timbres. Unfortunately, these methods require the knowledge of a huge number of parameters describing the time behavior of both the frequency and the amplitude of each elementary component. It is therefore difficult to find synthesis parameters, and even more complicated to extract them from the analysis of a real sound. This problem arises mainly because classical techniques such as the Fourier transform cannot precisely select both the frequency and the amplitude modulation law of each component. In this lecture, we address the automatic extraction of such parameters through continuous time-frequency or time-scale decompositions of real sounds, combined with specific algorithms based on the phase behavior of these decompositions. We show that this technique allows a low-cost resynthesis involving a small number of elementary cells.

1. INTRODUCTION
In order to use synthesis know-how in the creation and manipulation of audio signals derived from natural sounds, we address the problem of estimating additive synthesis parameters through linear time-frequency methods. For that purpose, we mainly focus on the following tools:
- short-time Fourier transform or wavelet transforms,
- additive synthesis with time-varying amplitudes and frequencies.
Using these techniques means that one must be able to extract, from a given sound, the parameters of an additive synthesis, such as amplitude and frequency envelopes.
Moreover, one should be able to model these envelopes by piecewise linear functions (in accordance with the hearing tolerance). It is therefore important to disentangle each component in order to avoid beats due to their interference. For that purpose, we developed a method based on the behavior of the coefficients obtained through time-frequency analysis techniques. The interest of such a parametrization lies mainly in the synthesis and/or transformation of these sounds for musical applications.

2. SOUND SYNTHESIS
Basically, synthesis techniques fall into two classes, which can be called "linear" and "global" (or non-linear) methods:
- linear methods consist in adding up elementary components to create a sound (additive synthesis), or in subtracting components from a broadband signal (subtractive synthesis). These methods are easy to implement but usually need a large number of parameters. Nevertheless, they allow the sound to be controlled intuitively, since all the parameters are independent.
- non-linear methods are powerful in the sense that they can produce a wide variety of sounds from a small number of parameters. However, each parameter acts globally on the sound, which makes it difficult to predict intuitively how the sound changes when one parameter is modified.
An attempt to identify the parameters of an FM synthesis has been described in [1], but many problems remain. This is why we decided to address resynthesis through linear techniques.

3. SOUND ANALYSIS
As for synthesis, analysis methods fall into two classes, namely parametric and non-parametric methods:
- parametric methods consist in identifying the parameters of a given model (ARMA filtering, linear prediction...). These techniques require a priori knowledge of the signal itself.
- non-parametric methods require no knowledge about the signal and can consequently always be used. However, they produce a larger amount of data than parametric methods.
For our problem, we decided to use a non-parametric technique to analyze the sound. Moreover, we used linear representations, so that the transform of a sum of signals is the sum of their transforms. The methods we used are of the time-frequency type, namely the short-time Fourier transform (in the continuous Gabor sense) and the continuous wavelet transform. We then parametrize the resulting data in order to relate them to an additive synthesis model.
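As a point of reference for the additive synthesis model that the analysis is meant to feed, the following Python sketch sums cosine partials whose amplitude and frequency follow piecewise linear breakpoint envelopes. The partial layout (one `(amp_breakpoints, freq_breakpoints)` pair per partial), the function names `interp_breakpoints` and `additive_synth`, and the example values are hypothetical illustrations, not the authors' synthesis engine; frequencies are in Hz and the phase is a simple running sum of the instantaneous frequency.

```python
import math

def interp_breakpoints(points, t):
    """Piecewise linear interpolation through (time, value) breakpoints, held constant outside."""
    if t <= points[0][0]:
        return points[0][1]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return points[-1][1]

def additive_synth(partials, duration, sr):
    """Sum of cosines whose amplitude (linear) and frequency (Hz) follow breakpoint envelopes.

    Each partial is an (amp_breakpoints, freq_breakpoints) pair; the phase is the running
    sum of 2*pi*f(t)/sr, i.e. a rectangle-rule integral of the instantaneous frequency.
    """
    n = int(round(duration * sr))
    out = [0.0] * n
    for amp_env, freq_env in partials:
        phase = 0.0
        for i in range(n):
            t = i / sr
            out[i] += interp_breakpoints(amp_env, t) * math.cos(phase)
            phase += 2.0 * math.pi * interp_breakpoints(freq_env, t) / sr
    return out

# Hypothetical example: a 440 Hz partial with a linear attack and decay,
# plus a weaker partial gliding from 880 Hz to 890 Hz.
partials = [
    ([(0.0, 0.0), (0.05, 1.0), (0.5, 0.0)], [(0.0, 440.0), (0.5, 440.0)]),
    ([(0.0, 0.0), (0.05, 0.3), (0.5, 0.0)], [(0.0, 880.0), (0.5, 890.0)]),
]
sound = additive_synth(partials, duration=0.5, sr=8000)
```

Accumulating the phase, rather than evaluating cos(2πf(t)t), keeps the waveform continuous when the frequency envelope changes, which matters for the gliding partial.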

4. HOW DOES IT WORK
Linear continuous time-frequency analysis methods associate a two-dimensional representation with a given signal. The two dimensions correspond to time and to a frequency-related parameter. For further information about such methods, we refer the reader to [2]. We will now see how to extract additive synthesis parameters from such an analysis.

Let $s(t)$ be a signal and $L(\tau,\alpha)$ a time-frequency representation of it. For example, $\alpha$ is the frequency in the short-time Fourier transform and the scale parameter in the wavelet transform. For clarity, let us assume that $\alpha$ is a frequency. $L(\tau,\alpha)$ can then be represented as a picture whose horizontal axis represents time and whose vertical axis represents frequency. This picture is the equivalent, for sound, of the musical score for music (more precisely, of a barrel-organ notation). With the short-time Fourier transform, each horizontal event corresponds to a quasi-sinusoidal component, and the "score" is a decomposition of the sound in terms of "chords of sine waves".

Under these assumptions, our problem would seem to be solved, and the parameters of an additive resynthesis would be obtained directly from the analysis data. Unfortunately, nature is a little more complicated, and problems appear when components are close with respect to the frequency selectivity of the analysis window. In this case, the time-frequency representation shows beats between the components, which prevents their separation. This effect is similar to the beating one hears when two notes have close frequencies. Another problem is that the analysis is usually performed for fixed values of the frequency (with the help of the FFT algorithm). In this case, a sine wave of given frequency usually leaks into the neighboring frequency bands unless its frequency coincides exactly with a frequency of the analysis grid.
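The beating effect described above can be reproduced numerically. The sketch below assumes a Gaussian window and evaluates the continuous Gabor transform by a direct Riemann sum, so the analysis frequency $\alpha$ need not lie on an FFT grid. It analyzes two sines 10 Hz apart at the frequency of the first one; the helper names, window widths and integration step are illustrative assumptions. With a window spread out in frequency, $|L(\tau,\omega_1)|$ oscillates at the beat period $2\pi/|\omega_2-\omega_1| = 0.1$ s; with a window ten times longer, the oscillation essentially disappears, at the cost of time resolution.

```python
import cmath
import math

def gabor(s, tau, alpha, sigma, dt):
    """Continuous Gabor transform L(tau, alpha) = int s(t) W(t - tau) exp(-i alpha (t - tau)) dt.

    W(u) = exp(-u^2 / (2 sigma^2)) is a Gaussian window (an assumption); the integral is a
    Riemann sum with step dt over |t - tau| <= 8 sigma. alpha (rad/s) can be any real
    value, not only a frequency of an FFT grid.
    """
    n = int(8 * sigma / dt)
    acc = 0j
    for j in range(-n, n + 1):
        u = j * dt
        acc += s(tau + u) * math.exp(-u * u / (2 * sigma * sigma)) * cmath.exp(-1j * alpha * u)
    return acc * dt

W1, W2 = 2 * math.pi * 440.0, 2 * math.pi * 450.0   # two components 10 Hz apart (rad/s)

def two_close_sines(t):
    return math.cos(W1 * t) + math.cos(W2 * t)

def modulus_ratio(sigma, dt=1.0 / 4000):
    """max/min of |L(tau, W1)| over one beat period 2*pi/(W2 - W1) = 0.1 s."""
    taus = [k * 0.01 for k in range(11)]
    mags = [abs(gabor(two_close_sines, tau, W1, sigma, dt)) for tau in taus]
    return max(mags) / min(mags)

wide_band = modulus_ratio(sigma=0.02)    # short window, spread out in frequency: beats
narrow_band = modulus_ratio(sigma=0.2)   # long window, narrow in frequency: no visible beats
print(wide_band, narrow_band)
```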
Fortunately, mathematics can sometimes help, and this problem can be solved under some reasonable assumptions. Let us now turn to the mathematics. Since we chose to describe the technique with the short-time Fourier transform, one can write:

$$L(\tau,\alpha) = \int s(t)\,W(t-\tau)\,e^{-i\alpha(t-\tau)}\,dt$$

In the same way, the wavelet transform with an $\omega_0$-modulated wavelet would be written as:

$$L(\tau,a) = \int s(t)\,W\!\left(\frac{t-\tau}{a}\right)\exp\!\left(-i\omega_0\,\frac{t-\tau}{a}\right)dt$$

Let us denote by $\hat W(\omega)$ the Fourier transform of $W(t)$. It is usually a real function localized around $\omega = 0$, with a unique maximum at this value. Consider the monochromatic signal:

$$s(t) = A\cos(\omega_1 t)$$

Then:

$$L(\tau,\alpha) = \frac{A}{2}\left(\hat W(\alpha-\omega_1)\,e^{i\omega_1\tau} + \hat W(\alpha+\omega_1)\,e^{-i\omega_1\tau}\right)$$

For reasons that will become clear later, let us assume that $\hat W(\alpha+\omega_1)$ is zero. Then:

$$L(\tau,\alpha) = \frac{A}{2}\,\hat W(\alpha-\omega_1)\,e^{i\omega_1\tau}$$

The frequency $\omega_1$ can be found in two ways:
- The first, well-known way is to look for the maximum of $\hat W(\alpha-\omega_1)$, reached at $\alpha = \omega_1$.
- The second way is to consider the oscillating part of $L(\tau,\alpha)$. Differentiating its phase $\Phi(\tau,\alpha)$ with respect to $\tau$ gives, for any value of $\alpha$, $\partial\Phi/\partial\tau = \omega_1$. The solution of the equation $\partial\Phi/\partial\tau = \alpha$ is therefore $\alpha = \omega_1$, and the job is done.

Let us now focus on a sum of two sines. Consider the signal:

$$s(t) = A_1\cos(\omega_1 t) + A_2\cos(\omega_2 t)$$

Under the previous assumptions, one can write:

$$L(\tau,\alpha) = \frac{A_1}{2}\,\hat W(\alpha-\omega_1)\,e^{i\omega_1\tau} + \frac{A_2}{2}\,\hat W(\alpha-\omega_2)\,e^{i\omega_2\tau}$$

Let us assume that there exists a range of $\alpha$ where $|A_1\hat W(\alpha-\omega_1)| > |A_2\hat W(\alpha-\omega_2)|$. Then, within this range, one can prove that the solution of the equation

$$\frac{1}{\tau}\int_0^{\tau}\frac{\partial\Phi}{\partial u}(u,\alpha)\,du = \alpha$$

is $\alpha = \omega_1$ when $\tau = 2k\pi/|\omega_2-\omega_1|$, $k$ an integer ($k \neq 0$). $\omega_2$ can be obtained in the same way, simply by permuting the indices. For other values of $\tau$, the estimate is not exact, but the error decreases as $1/\tau$. For interested readers, an iterative algorithm for solving this equation is described in [3].
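A numerical check of the two-sine result, under stated assumptions: a Gaussian window (for which $\hat W$ is real and positive), $A_1 = 1$, $A_2 = 0.8$, components at 100 Hz and 110 Hz, and a phase unwrapped by summing small increments $\arg(L(u+h)/L(u))$. The local phase derivative at $\tau = 0$ is biased toward $\omega_2$, whereas the phase increment averaged over one beat period $2\pi/|\omega_2-\omega_1|$ returns $\omega_1$ for several values of $\alpha$ in the range where the first component dominates, and is off again for a non-integer number of beat periods. The sketch only evaluates the left-hand side of the equation; it is not the iterative algorithm of [3], and all names and values are hypothetical.

```python
import cmath
import math

def gabor(s, tau, alpha, sigma=0.02, dt=1.0 / 2000):
    """Gaussian-window Gabor transform L(tau, alpha), Riemann sum over |t - tau| <= 8 sigma."""
    n = int(8 * sigma / dt)
    acc = 0j
    for j in range(-n, n + 1):
        u = j * dt
        acc += s(tau + u) * math.exp(-u * u / (2 * sigma * sigma)) * cmath.exp(-1j * alpha * u)
    return acc * dt

A1, A2 = 1.0, 0.8
W1, W2 = 2 * math.pi * 100.0, 2 * math.pi * 110.0

def two_sines(t):
    return A1 * math.cos(W1 * t) + A2 * math.cos(W2 * t)

def mean_phase_derivative(alpha, T, steps):
    """(1/T) * int_0^T dPhi/du du = (Phi(T, alpha) - Phi(0, alpha)) / T.

    The phase is unwrapped by summing the small increments arg(L(u + h) / L(u)),
    which is valid as long as each increment stays below pi in magnitude.
    """
    h = T / steps
    total = 0.0
    prev = gabor(two_sines, 0.0, alpha)
    for j in range(1, steps + 1):
        cur = gabor(two_sines, j * h, alpha)
        total += cmath.phase(cur / prev)
        prev = cur
    return total / T

beat = 2 * math.pi / abs(W2 - W1)              # 2*pi/|w2 - w1| = 0.1 s
local = mean_phase_derivative(W1, 1e-4, 1)     # approximately dPhi/dtau at tau = 0
at_beat = [mean_phase_derivative(a, beat, 100) for a in (W1 - 5.0, W1, W1 + 10.0)]
off_beat = mean_phase_derivative(W1, 0.75 * beat, 75)
print(local - W1, [f - W1 for f in at_beat], off_beat - W1)
```

Because the averaged phase derivative does not depend on $\alpha$ inside the dominance range, the equation's solution is simply the averaged value itself, which is why the three `at_beat` entries agree.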

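Up to now, we have estimated the frequencies $\omega_1$ and $\omega_2$, but we still do not know their amplitudes. To be more general, let us consider a signal composed of a finite sum of sines:

$$s(t) = \sum_{k=1}^{N} A_k\cos(\omega_k t)$$

Then:

$$L(\tau,\alpha) = \sum_{k=1}^{N}\frac{A_k}{2}\left(\hat W(\alpha-\omega_k)\,e^{i\omega_k\tau} + \hat W(\alpha+\omega_k)\,e^{-i\omega_k\tau}\right)$$
$$= \sum_{k=1}^{N}\frac{A_k}{2}\left(\hat W(\alpha-\omega_k)+\hat W(\alpha+\omega_k)\right)\cos(\omega_k\tau) + i\sum_{k=1}^{N}\frac{A_k}{2}\left(\hat W(\alpha-\omega_k)-\hat W(\alpha+\omega_k)\right)\sin(\omega_k\tau)$$

Let us now consider the $N$ restrictions of $L(\tau,\alpha)$ for $\alpha_p = \omega_p$, $1 \le p \le N$. Then, since $\hat W$ is real, one can write for any $p$:

$$\sum_{k=1}^{N}\frac{\hat W(\alpha_p-\omega_k)+\hat W(\alpha_p+\omega_k)}{2}\,A_k\cos(\omega_k\tau) = \mathrm{Re}\{L(\tau,\alpha_p)\}$$
$$\sum_{k=1}^{N}\frac{\hat W(\alpha_p-\omega_k)-\hat W(\alpha_p+\omega_k)}{2}\,A_k\sin(\omega_k\tau) = \mathrm{Im}\{L(\tau,\alpha_p)\}$$

These formulas correspond to two linear systems of $N$ equations with $N$ unknowns, of the form:

$$\sum_{k=1}^{N} W_{p,k}\,X_k(\tau) = L_p(\tau), \qquad \forall p,\ 1 \le p \le N$$

The elements of the first matrix are given by:

$$W_{p,k} = \frac{\hat W(\alpha_p-\omega_k)+\hat W(\alpha_p+\omega_k)}{2}$$

with $X_k(\tau) = A_k\cos(\omega_k\tau)$ and $L_p(\tau) = \mathrm{Re}\{L(\tau,\alpha_p)\}$. The elements of the second matrix are given by:

$$W_{p,k} = \frac{\hat W(\alpha_p-\omega_k)-\hat W(\alpha_p+\omega_k)}{2}$$

with $X_k(\tau) = A_k\sin(\omega_k\tau)$ and $L_p(\tau) = \mathrm{Im}\{L(\tau,\alpha_p)\}$. Provided the matrices are invertible, at any time $\tau$ each solution vector is $X(\tau) = W^{-1}L(\tau)$. Combining the two solutions into the complex vector with components $A_k\cos(\omega_k\tau) + iA_k\sin(\omega_k\tau) = A_k e^{i\omega_k\tau}$, the modulus of its $k$-th component is $A_k$.

One can generalize this algorithm to the case where the amplitudes of the components vary with time [4]. One then obtains the amplitude modulation law associated with each contribution. For synthesis, each envelope can be modeled by a piecewise affine function, obtained by a local minimization of the distance between the original and the modeled curves.

One point must be noticed: we have made no assumption on the spectral bandwidth of $\hat W(\omega)$. This means that even if several spectral lines fall within the bandwidth of $\hat W(\alpha-\omega)$, this technique can disentangle them. Moreover, even though Gabor functions are not zero for negative frequencies, and consequently lead to interference between positive and negative frequencies even for a single sine wave, this method finds the right amplitude.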
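The two linear systems can be sketched directly. The code below assumes a Gaussian window, for which $\hat W(\omega) = \sigma\sqrt{2\pi}\,e^{-\sigma^2\omega^2/2}$ is known in closed form, assumes the frequencies $\omega_k$ have already been estimated, computes the restrictions $L(\tau,\alpha_p)$ by numerical integration, and solves both systems with a small Gaussian elimination. The test signal (three components at 10, 14 and 20 Hz) and all names are hypothetical; the components are deliberately chosen so that the lines overlap strongly and $\hat W(\alpha+\omega_k)$ is not negligible. Reading the amplitudes from $2|L(\tau,\omega_k)|/\hat W(0)$ alone gives wrong values, while the linear systems recover them.

```python
import cmath
import math

SIGMA, DT = 0.02, 1.0 / 2000   # hypothetical window width (s) and integration step (s)

def W_hat(w):
    """Fourier transform of the Gaussian window exp(-t^2 / (2 SIGMA^2)): real, peaked at w = 0."""
    return SIGMA * math.sqrt(2 * math.pi) * math.exp(-0.5 * (SIGMA * w) ** 2)

def gabor(s, tau, alpha):
    """L(tau, alpha) = int s(t) W(t - tau) exp(-i alpha (t - tau)) dt, as a Riemann sum."""
    n = int(8 * SIGMA / DT)
    acc = 0j
    for j in range(-n, n + 1):
        u = j * DT
        acc += s(tau + u) * math.exp(-u * u / (2 * SIGMA * SIGMA)) * cmath.exp(-1j * alpha * u)
    return acc * DT

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting (M is a list of rows)."""
    n = len(b)
    a = [list(row) + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n + 1):
                a[r][k] -= f * a[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x

def amplitudes(s, omegas, tau):
    """Estimate A_k from the restrictions L(tau, alpha_p = omega_p), with the omegas known."""
    m_re = [[(W_hat(ap - wk) + W_hat(ap + wk)) / 2 for wk in omegas] for ap in omegas]
    m_im = [[(W_hat(ap - wk) - W_hat(ap + wk)) / 2 for wk in omegas] for ap in omegas]
    L = [gabor(s, tau, ap) for ap in omegas]
    x_cos = solve(m_re, [z.real for z in L])   # A_k cos(omega_k tau)
    x_sin = solve(m_im, [z.imag for z in L])   # A_k sin(omega_k tau)
    return [abs(complex(c, d)) for c, d in zip(x_cos, x_sin)]

OMEGAS = [2 * math.pi * f for f in (10.0, 14.0, 20.0)]   # known frequencies (rad/s)
AMPS = [1.0, 0.5, 0.8]                                  # amplitudes to recover

def three_sines(t):
    return sum(a * math.cos(w * t) for a, w in zip(AMPS, OMEGAS))

tau = 0.123
recovered = amplitudes(three_sines, OMEGAS, tau)
naive = [2 * abs(gabor(three_sines, tau, w)) / W_hat(0.0) for w in OMEGAS]
print(recovered, naive)
```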
In the case of amplitude-modulated spectral lines, one can intuitively understand what happens. If a spectral line is rapidly modulated in amplitude, these variations cannot be captured unless one chooses a window well localized in time, and consequently spread out in frequency. If the signal is composed of more than one spectral line, one is tempted to lengthen the window in order to separate the components; in this case, rapid variations of the amplitude can no longer be captured. With the technique described above, one does not need to worry about this problem.
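The time/frequency trade-off described above can be quantified for a single amplitude-modulated line. Assuming a Gaussian window and the hypothetical signal $(1 + m\cos\Omega t)\cos\omega_0 t$ with $m = 0.5$ and an 8 Hz modulation, the modulus of the restriction $L(\tau,\omega_0)$ is proportional to $1 + m\,e^{-\sigma^2\Omega^2/2}\cos\Omega\tau$ (negative-frequency terms being negligible here), so a long window, narrow in frequency, flattens the measured modulation depth. The sketch measures that depth for two window widths; names and values are illustrative.

```python
import cmath
import math

def gabor(s, tau, alpha, sigma, dt=1.0 / 4000):
    """Gaussian-window Gabor transform L(tau, alpha), Riemann sum over |t - tau| <= 8 sigma."""
    n = int(8 * sigma / dt)
    acc = 0j
    for j in range(-n, n + 1):
        u = j * dt
        acc += s(tau + u) * math.exp(-u * u / (2 * sigma * sigma)) * cmath.exp(-1j * alpha * u)
    return acc * dt

W0, OMEGA, DEPTH = 2 * math.pi * 200.0, 2 * math.pi * 8.0, 0.5   # carrier, 8 Hz AM, depth m

def am_line(t):
    """A single spectral line modulated in amplitude: (1 + m cos(Omega t)) cos(w0 t)."""
    return (1.0 + DEPTH * math.cos(OMEGA * t)) * math.cos(W0 * t)

def measured_depth(sigma):
    """(max - min) / (max + min) of |L(tau, w0)| over one modulation period (0.125 s)."""
    taus = [k * 0.125 / 16 for k in range(17)]
    mags = [abs(gabor(am_line, tau, W0, sigma)) for tau in taus]
    return (max(mags) - min(mags)) / (max(mags) + min(mags))

short_window = measured_depth(0.005)   # spread out in frequency: follows the modulation
long_window = measured_depth(0.05)     # narrow in frequency: modulation smoothed out
print(short_window, long_window)
```

With a second spectral line nearby, the short window would also pick up that line and the modulus would beat, which is the dilemma the paragraph above describes.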

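Section 4 states that each amplitude envelope is modeled by a piecewise affine function obtained by a local minimization of the distance between the original and the modeled curves, without detailing the procedure. As a stand-in, the sketch below uses a simple greedy rule, which is our assumption and not necessarily the authors' method: each segment is extended while the straight line between its end samples stays within a tolerance `tol` of every intermediate sample. The function names and the example envelope are hypothetical.

```python
import math

def chord_fits(env, i, j, tol):
    """True if the line from (i, env[i]) to (j, env[j]) stays within tol of env[i..j]."""
    for k in range(i + 1, j):
        chord = env[i] + (env[j] - env[i]) * (k - i) / (j - i)
        if abs(env[k] - chord) > tol:
            return False
    return True

def piecewise_affine(env, tol):
    """Greedy breakpoint selection: grow each segment while its chord still fits within tol."""
    breakpoints = [0]
    start = 0
    while start < len(env) - 1:
        end = start + 1
        while end + 1 < len(env) and chord_fits(env, start, end + 1, tol):
            end += 1
        breakpoints.append(end)
        start = end
    return breakpoints

def reconstruct(env, breakpoints):
    """Resample the piecewise affine model at every original sample index."""
    model = []
    for i, j in zip(breakpoints, breakpoints[1:]):
        for k in range(i, j):
            model.append(env[i] + (env[j] - env[i]) * (k - i) / (j - i))
    model.append(env[breakpoints[-1]])
    return model

# Hypothetical envelope: exponential decay sampled at 100 points.
decay = [math.exp(-k / 20) for k in range(100)]
bps = piecewise_affine(decay, tol=0.01)
model = reconstruct(decay, bps)
print(len(bps), max(abs(m - d) for m, d in zip(model, decay)))
```

Lowering `tol` adds breakpoints; by construction, the model never deviates from the envelope by more than `tol` at any sample.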
5. EXAMPLE
In order to illustrate the method, we show below the envelope corresponding to the fundamental frequency $\omega_{fond}$ of a trumpet sound sampled over 0.5 s, extracted in various situations. Figure 1 displays the modulus of the restriction of the Gabor transform for $\alpha = \omega_{fond}$, computed with a window spread out in frequency: many harmonics are merged and the modulus oscillates strongly. Figure 2 displays the modulus of the restriction of the Gabor transform for $\alpha = \omega_{fond}$, computed with a window narrow in frequency: only the fundamental frequency is captured, but the modulus is strongly smoothed. Figure 3 displays the result obtained with our method. The window used is the same as in Figure 1, but the amplitude neither oscillates nor is smoothed. Figure 4 displays the result of a piecewise affine modelization of Figure 3; in this case, only 14 breakpoints are needed.

[Fig. 1]

REFERENCES
[1] N. Delprat, Ph. Guillemain, R. Kronland-Martinet, "Parameters estimation for non-linear resynthesis methods with the help of a time-frequency analysis of natural sounds", in Proceedings of the ICMC, Glasgow, Sept. 1990, pp. 88-90.
[2] J.M. Combes, A. Grossmann, Ph. Tchamitchian (Eds.), "Wavelets, time-frequency methods and phase space", Springer-Verlag, IPTI, 1989.
[3] Ph. Guillemain, R. Kronland-Martinet, B. Martens, "Estimation of spectral lines with the help of the wavelet transform. Applications in NMR spectroscopy", in "Wavelets and Applications", Meyer (Ed.), RMA series, Masson/Springer-Verlag, 1991, pp. 38-60.
[4] Ph. Guillemain, R. Kronland-Martinet, "Parameters estimation through continuous wavelet transform for synthesis of audio-sounds", 90th AES Convention, Paris, Feb. 1991, Preprint 3009 (A-2).