# An Improved Additive Analysis Method Using Parametric Modelling of the Short-Time Fourier Transform

Skip other details (including permanent urls, DOI, citation information)This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License. Please contact mpub-help@umich.edu to use this work in a way not covered by the license. :

For more information, read Michigan Publishing's access and usage policy.

Page 297 ï~~An Improved Additive Analysis Method Using Parametric Modelling of the Short-Time Fourier Transform Ph. Depalle phdircam.fr L. Tromp trompOircam. fr IRCAM, Analysis/Synthesis Department 1 Place Igor-Stravinsky, 75004 PARIS, FRANCE Abstract A new method which improves the estimation of frequency, amplitude and phase of the partials of a sound is presented. It allows the reduction of the analysis-window size from four periods to one and a half periods. It therefore gives better accuracy in parameter determination, and has proved to remain efficient at low signal-to-noise-ratios. The basic idea consists of using a parametric modelling of the short-time Fourier transform. The method alternately estimates the complex amplitudes and the frequencies starting from the result of the classical analysis method. It uses least-squares procedures and a first-order limited expansion of the model around previous estimations. 1 Introduction The additive synthesis model represents a sound s(n) as a finite sum of L partials (sinusoids whose amplitude ak, frequency fk and phase 4k, k E [1, L], vary in time), mixed with a time-varying spectrum envelope noise b(n) [DR92]: k= L s(n) = Z:ak(n) cos(k(n)) +b(n) (1) k=1 where the phase 1k is updated as follows: (k(n) = (k(n - 1) + 27rfk(n) and 4'k(0) =k Furthermore, an analysis/synthesis system based on the additive model allows reproduction and transformation of natural sounds with a precise and independent control of the time evolution of the partials. In the analysis part, the classical method consists of extracting and selecting spectral peaks from the Short-Time Fourier Transform (STFT) [RS78]. STFT is evaluated every I samples, and is given at time nr = rI by: n=+o~o Sr(F) = E w(n)s(n) e2WiFn (2) n=-oo where.T is the normalized frequency and w is a temporal, symmetrical window of length M samples. In order to obtain a sufficient frequency resolution, the signal's window size M has to be larger than four periods of the lowest frequency. This constraint becomes a severe drawback when frequencies and/or amplitudes vary too quickly. This is particularly true for sound attacks and glissandi, where parameters are smoothed by the analysis procedure. In order to overcome this, we have developed a method which increases the temporal resolution of the additive parameters. Indeed, parameter smoothing has two implications: first it averages the parameter variation over the size of the window and secondly, it introduces a temporal uncertainty concerning the precise instant of change. The classical method implies that increasing temporal precision will decrease frequential resolution. Thus, the only way to keep intact the frequency resolution while decreasing the temporal length consists of introducing assumptions on the structure of the STFT representation. We make assumptions by developing a parametric modelling of the STFT which permits us to decrease the window size to one period and a half. This size reduction minimizes the smoothing of the parameter evolution, and has been proved to remain efficient at low signal-to-noise-ratios - SNR - (e.g. 10 Power-dB). In this paper, we first present the method to estimate amplitudes and frequencies. We then derive an iterative algorithm which implements the method. Finally, experimental results are presented on synthetic sounds and then for a recorded sound. 2 Method Henceforth we only consider variables defined at instant nr. Therefore, the subscript r will be omitted. ICMC Proceedings 1996 297 Depalle & Tromp

Page 298 ï~~2.1 The Model Let us remember that the main goal of the additive analysis is to extract the frequencies and/or the complex amplitudes of the sinusoidal components of the temporal model. Usually these parameters are extracted from the STFT of the observed signal by a rather simple geometric procedure which consists in detecting its maxima and using polynomial interpolation to improve frequency resolution [DGR93]. On the other hand, the basic idea of our method is to consider explicitly an analytical spectral model of the STFT of the noiseless L sinusoids expressed by: k=L S(F) =_ { a- e""' W (.F-1k) k 1 2(3) + akiokW(A) where W is the Fourier Transform of the analysis window w. The Spectrum S(FT) of the signal can be formu= lated in the following way: S(F) = S(F) + B(F) (4) where B(.T) is the estimation error between the true spectrum S(.T) and its model. In practice, B(F) is not only the spectrum of a random noise but also the spectrum of all the left over partials. The goal of this method is to identify the parameters for which the model S(.T) best fits the observed spectrum S(T) according to a least-squares criterion. Using a least-squares method implies making some measurements of the function to be identified. We choose to measure S(F) at N equally spaced frequencies.Fj = (j- N2-1) --, j = 1,..., N; N being the power of two immediately greater than M. This is achieved simply by performing a Radix-2 FFT algorithm. Furthermore, in order to get more values, N may be chosen as a much bigger power of two than strictly necessary. Thus, concatenating the N equations (4) for j = 1,..., N, we obtain the vector equation: S= S B (5) where the column vectors are defined according to Xi(j) = X(Fj), Xj being respectively S, S, B. Then, we obtain the best estimate S according to the mean square criterion by minimizing the power of B(.F): 2.2 Amplitude and Phase estima tion At this step, we assume that we know precisely the frequencies of the sinusoids, and we estimate their amplitude and phase using the model defined in Eq. (3). We deduce a linear structure for the estimator which takes the following expression [Tro95]: L=2L Â~(f) = Z A1H1(f) l=l (7) with the known 2L expressions for k = 1,..., L: Hk(f) = {W(f - fk)Â~+-W(f-+-fk)} Hk+L(f) = 3{W(f - fk) -W(f + fk)} and the unknown 2L parameters: (8) { Ak = -COSk Ak+L =ak. -- -sin Â~ 2 (9) We then derive: s= ta (10) where?i is a matrix of dimension N.(2L) defined by (Jt);,l = Hl(F; ), and A is the 2L parameter column vector defined by 2(I) = A1. Thus, we obtain the least-squares solution [LH74]: A =(JjH )-1I jH8 (11) Since there is no constant value in the model S (f),we center the signal in order to remove its mean value. We now give some general properties of the algorithm. The first point deals with the negative frequencies which are taken into account in the second term W(YF + fk) of the model S(F-). For values of fk close to zero, this term may have nonnegligible values for positive values of Y. This prevents us from developping our method in a more simple way which would have consisted in discarding the second term of S(FT) and measure the spectrum S(sT) only for positive frequencies. The second point concerns the influence of the shape of the window. In our study, we used the BlackmanHarris 3 - BH3 - (for its good dynamic resolution) and the rectangular window (for its good frequency resolution) [Har78]. Due to its small main lobe, rectangular window allows smaller size than the Blackman-Harris one. This algorithm gave good results even for low SNRs (10 dB). This can be interesting, for example, when tracking the release of a note which is disappearing in noise. Figure 1 and 2 illustrate the ability of our algorithm to track the amplitude evolution of the partials of a sound. Figure 1 shows the amplitude estimations of the three partials of a harmonic sound mixed with white noise. The fundamental frequency is fo = 600Hz, the amplitudes of the harmonics start at time 0 with a value of n where n is the order of the harmonic. They then change abruptly, going up and down with a slope of 14 dB per ms, Depalle & Tromp 298 ICMC Proceedings 1996

Page 299 ï~~te & 5 i 3-! 12 o0 10 20 30 40 50 60 70 80 90 time (10 represents 4.1 ms) Figure 1: Amplitude estimation on a periodic sound mixed with white noise 10-.9.8 1 1 7 a E 3 2 0 50 100 150 200 250 tUrn (10 represents 16G8m. ) Figure 2: Amplitude estimation on a periodic sound mixed with an increasing-variance noise as it is represented with dashed lines. Variance of the noise is -20 Power-dB with the power reference set to the power of a sinusoid of amplitude 1. Analysis is performed by using a BH3 window of 1.5-period size sliding at each step one eighth of a window. Figure 1 shows that the evolution of the amplitude are well tracked, and that the fluctuations of the estimation remain small. Figure 2 shows the robustness of the amplitude estimation in the presence of noise. The structure of the signal and the analysis conditions are the same than in the preceding example: a sound with three harmonics mixed with white noise. The fundamental frequency is fo - 200Hz. The amplitudes of the harmonics are going up and down as in the previous example but the amplitude pattern is repeated exactly five times. Furthermore, for each repetition, variance of the white noise is increased of 20 Power-dB (variance sequence is - 80, -60, -40, -20 and 0 power-dB). Figure 2 clearly shows that even with a strong presence of noise, amplitudes does not fluctuate very much. 2.3 Frequency estimation Let us now assume that we know the amplitudes and the phases of the L sinusoids. Eq. (3) clearly shows that the dependence of the model on the frequencies is non-linear. In order to obtain a linear formulation, we assume that we already know a rough approximation Fk of each frequency fk for k = 1,...,N. We then have to estimate the distance between the rough approximation Fk and the actual value fk, Ak = Fk - fk. In order to obtain a linear expression between S(T) and A, we start by using this particular formulation of the spectrum model S(17): k==L S(.Y) = Z {AkW(.T- fk) + AkW("+ fk)} (12) k=1 where Ak = 2e--k, and we make a first-order limited expansion of S(F). To achieve the firstorder expansion, we rewrite W(.T:Ffk) as follows: W(F fk)= W((F + Fk)Â~ k) (13) and define the three vectors f, F and A as above for S, S, and B. For each measure at frequency Fj and for each partial indexed by k, we linearize the frequency dependence by using a first-order limited expansion of the Fourier Transform of the analysis window W around F2 - Fk: W(.jfk) t W(.i T Fk)Â~-OkW'(.'F; fk) (14) Thus, rewriting S using all these results, we obtain the following expression [Tro95]: S = K+ 4 (15) where vector K is defined by: k-L K(j) = Z {AkW(.T3 - Fk) + A W(.Tj + Fk)} (16) k=1 and matrix SZ by: (fl)ek =AkW'(.tj - Fk) -AW'(. +Fk) (17) The least-squares solution leads to: f = f+ IA =.+ (Hf)H(5 K) (18) Equation (18) updates the rough frequency estimation and provides a better approximation. Once the new f has been estimated, it can be used in equation (11) to obtain more precise amplitudes; these ones can then be used in equation (18). These operations constitute the first step of an iterative algorithm which alternately improves the estimates of amplitudes and frequencies starting from the results of the classical method. It is interesting to see what happens when the frequency estimation reaches the actual value. In ICMC Proceedings 1996 299 Depalle & Tromp

Page 300 ï~~3 Conclusion 2200 2000 1-1800 r1600 ~1400 f1200 1000 8001 800 --__- '_ - ' __ - a 100 0 5 10 15 time (1 represents?? ms) Figure 3: Frequency estimation improvement We have presented a new spectral analysis method which improves the estimation of time-varying frequency, amplitude and phase of the partials of a sound. These results are performed by a parametric modelling of the short-time Fourier transform of the sound. This method has proved to remain efficient at low signal-to-noise ratios. One of the future directions will consist in improving the efficency by rewriting the algorithm from Matlab code to C language. One other direction is to study the structure of the matrixes involved in the two least-squares solutions in order to deivelop faster algorithms. Finally, we will also study more precisely the convergence properties of the frequency estimation part. References [DGR93] P. Depalle, G. Garcia, and X. Rodet. Analysis of sound for additive synthesis: tracking of partials using hidden Markov models. In Proc. of Int. Comp. Mus. Conf. (ICMC'93), pages 94-97, September 10-15 1993. [DR92] P. Depalle and X. Rodet. A new additive synthesis method using inverse Fourier transform and spectral envelopes. In Proc. of Int. Comp. Mus. Conf. (ICMC'92), pages 410-411, October 14-18 1992. [Har78] F.J. Harris. On the use of windows for harmonic analysis with the discrete Fourier transform. Proceedings of the IEEE, 66(1):51-83, January 1978. [LH74] C. L. Lawson and R. J. Hanson. Solving least squares problems. Prentice-Hall, Inc., 1974. [RS78] L. R. Rabiner and R. W. Shafer. Digital procesing of speech. Prentice-Hall, Inc., 1978. [Tro95] L. Tromp. Amelioration de l'extraction de partiels dans les signaux sonores. Rapport de DEA d'automatique et de Traitement du Signal de signal, Universite de Paris XI, July 1995. time (10 represents 9 mr) Figure 4: Amplitude evolution of the 6 first partials of a piano note this case, the vector S - K is equal to the Power Spectral Density B of the noise b(n) evaluated on the N points Y3. Assuming the whitness of the noise, B has all its elements identical. Therfore B belongs to the kernel of QH since the derivative of W is antisymmetrical. Consequently, 4 is equal to zero even if S - K is not the null vector. As an example, Figure 3 shows the improvements when the whole procedure is applied to frequency estimation. In order to test the algorithm, initial frequencies are set at 850, 1030 and 2110 Hz, when the actual values are 600, 1200, and 1800 Hz. We observe that the algorithm converges to the actual values after five iterations. The last example is performed on a recorded sound. Figure 2.4 shows the tracking of the 6 first partials of a piano note (G5 sharp) using our method. Initial values, which are displayed with dashes, are provided by the classical analysis method. The BH3 window size is 3.4ms (2.8 times the period of 1.2 ms), and the sliding step is 0.9ms (a quarter of the window size). The solid lines, representing the new algorithm result, clearly evolve more quickly at the attack-time. Depalle & Tromp 300 ICMC Proceedings 1996