An Accurate Signal Representation for Sound Resynthesis Utilizing a Time-Frequency Mapping of the DFT-Magnitude

Robert R. Höldrich
Institute of Electronic Music, Graz, Austria
Email: hoeldric@wm.mhsg.ac.at

ICMC Proceedings 1995

ABSTRACT: In the Short-Time Fourier Transform, there is a tradeoff between time and frequency resolution that is governed by the length of the analysis window. This paper presents the Modified Moving Window Method, invented by Kodera, which performs a time-frequency mapping of the DFT magnitude and yields a more concentrated energy distribution in the t-f-plane. A simplified method, leading to the "improved spectrogram", is proposed to overcome the computational complexity of the original version. The resulting analysis data are used to form the parameter set of a sinusoidal signal representation for sound resynthesis.

I. INTRODUCTION

The spectrogram, or Short-Time Fourier Transform (STFT), is a widely used tool for the time-frequency representation of signals. For resynthesis, a reasonable model for deterministic discrete signals is a sinusoidal representation of the form

$$x(n) = \sum_{l} x_l(n) = \sum_{l} a_l(n) \cos\bigl(2\pi n f_l(n) + \phi_{l,0}\bigr).$$

(McAulay, R., Quatieri, T.F.) suggested a similar model; see also (Serra, X.). In practical implementations the signal parameters $(a_l(n), f_l(n), \phi_{l,0})$ are obtained via a frame-based analysis using an STFT of length N with a time window w(n) and hop size HS:

$$X_m(k) = \sum_{n} x(n - m\,HS)\, w(n)\, e^{-j 2\pi n k / N}.$$

In each frame m, the set of frequencies, amplitudes and phases is calculated by estimating the spectral peaks of $X_m(k)$. For details of the subsequent resynthesis via a spectral track algorithm, see (Brown, J., Puckette, M.), (Höldrich, R. 1994a-c), (McAulay, R., Quatieri, T.), (Smith, J., Serra, X.).

When the spectrogram (STFT) is used as the analysis tool, there is a tradeoff between time and frequency resolution due to the analysis window used.
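As a concrete reading of the sinusoidal model, the following minimal Python sketch evaluates x(n) sample by sample exactly as the formula is written. The function name `synthesize`, the representation of each partial as a triple of two callables and a start phase, and the choice of frequency units (cycles per sample) are illustrative assumptions and are not taken from the paper.

```python
import math

def synthesize(partials, num_samples):
    """Evaluate x(n) = sum_l a_l(n) * cos(2*pi*n*f_l(n) + phi_l0) for n = 0..num_samples-1.

    Each partial is a triple (a, f, phi0): a(n) and f(n) are callables returning the
    amplitude and the frequency in cycles per sample; phi0 is the start phase in radians.
    """
    signal = []
    for n in range(num_samples):
        sample = sum(a(n) * math.cos(2 * math.pi * n * f(n) + phi0)
                     for a, f, phi0 in partials)
        signal.append(sample)
    return signal

# Hypothetical example: one stationary partial and one decaying partial.
partials = [
    (lambda n: 1.0, lambda n: 0.25, 0.0),
    (lambda n: 0.5 * math.exp(-n / 50.0), lambda n: 0.1, math.pi / 2),
]
x = synthesize(partials, 200)
```

In an analysis/resynthesis system the callables would be replaced by parameter tracks interpolated between frames; the sketch only makes the roles of the three parameters explicit.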
For non-stationary signals, the t-f-slope that can be extracted follows an arbitrary "time-frequency law" which depends more on the properties of the analysis window than on the signal itself. In this paper an analysis technique is suggested which results in a nearly perfect t-f-localization of the signal energy while being independent of the particular window length.

In Section II a time-frequency distribution invented by (Kodera, K.) is presented which performs a t-f-mapping of the spectrogram. This Modified Moving Window Method (MMWM) does not require the signal to be stationary within a frame. To reduce the drawback of numerical complexity, a simplified MMWM using overlapping analysis frames is developed in Section III. An algorithm is given which calculates the instantaneous frequencies and amplitudes of the components. In Section IV the main results are summarized and topics for further research are mentioned.

II. "SPECTROGRAM REVISITED"

In (Kodera) and (Kodera, K. et al), Kodera proposed the continuous Modified Moving Window Method (MMWM), which maps the integrated energy distribution $|s(t_0,f_0)|^2$ to its center of gravity in the t-f-plane, $(t'_g, f'_i)$. The resulting energy distribution S(t,f) is defined by

$$S(t,f) = \iint |s(t_0,f_0)|^2 \, \delta\bigl(t - t'_g(t_0,f_0)\bigr)\, \delta\bigl(f - f'_i(t_0,f_0)\bigr)\, dt_0\, df_0,$$

where the energy distribution in the interval $\Delta f$, $|s(t_0,f_0)|^2$, is obtained from the definition

$$s(t_0,f_0) = \int_{f_0-\Delta f/2}^{f_0+\Delta f/2} X(f)\, e^{j 2\pi f t_0}\, df.$$

It represents the output of an ideal bandpass filter with bandwidth $\Delta f$ moving along the frequency axis for each time $t_0$. The distribution is similar to the well-known STFT in its spectral definition:

$$\mathrm{STFT}(t_0,f_0) = e^{-j 2\pi f_0 t_0} \int W(f - f_0)\, X(f)\, e^{j 2\pi f t_0}\, df.$$

The frequency response of the analysis window, W(f), serves as the moving bandpass filter, which is not exactly bandlimited in practical implementations. The squared magnitude of $\mathrm{STFT}(t_0,f_0)$, which is the spectrogram $\mathrm{SPEC}(t_0,f_0)$, equals the integrated energy distribution:

$$\mathrm{SPEC}(t_0,f_0) = |\mathrm{STFT}(t_0,f_0)|^2 = |s(t_0,f_0)|^2.$$

Therefore,
S(t,f) can be interpreted as a mapped spectrogram. The center of gravity $(t'_g, f'_i)$ is calculated via the Gabor-Helstrom transform and the phase stationarity principle, which leads to

$$t'_g(t_0,f_0) = t_0 - \frac{1}{2\pi} \frac{\partial \arg[s(t_0,f_0)]}{\partial f_0}, \qquad f'_i(t_0,f_0) = \frac{1}{2\pi} \frac{\partial \arg[s(t_0,f_0)]}{\partial t_0}.$$

For details, see (Auger, F., Flandrin, G.) and (Kodera, K. et al). This t-f-displacement, $(t_0,f_0) \rightarrow (t'_g,f'_i)$, leads to a t-f-slope between the instantaneous frequency and the group delay time, both of which are properties of the signal itself. Therefore, the MMWM is nearly independent of the window. This holds even for multicomponent signals as long as the STFTs of the individual components do not overlap significantly in the t-f-plane.

For discrete signals, the MMWM can be calculated using the DFT of non-overlapping frames and a bandpass filter F(m):

$$s(n,k) = \sum_{m=-N/2}^{N/2-1} F(m)\, X(k+m)\, e^{j 2\pi m n / N} \quad\text{with}\quad X(k) = \sum_{n=-N/2}^{N/2-1} x(n)\, e^{-j 2\pi k n / N}.$$

To obtain the mapping point $(t'_g, f'_i)$, the partial derivatives must be approximated using the backward difference. The instantaneous frequencies $f'_i$ are calculated from the phase difference between two DFTs shifted by one sample (see also (Brown, J., Puckette, M.)). In addition, each frequency component is mapped to its temporal center of gravity, $t'_g$. Hence, the signal is not assumed to be stationary within the frame. One of the major drawbacks of the method is its enormous numerical complexity ($N(\log_2 N + 2)$ operations per sample). In the following section a simplified MMWM is proposed which overcomes this shortcoming.

III. "THE IMPROVED SPECTROGRAM" - A SIMPLIFIED MMWM

In the definition of the discrete MMWM, all $N^2$ energy contributions within a frame are used for remapping: $|s(n,k)|^2$ for $-N/2 \le n \le N/2-1$, $0 \le k \le N-1$, are mapped to $(t'_g, f'_i)$. For a simplified version one takes into account only the contributions at the frame center n=0.
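As a side illustration of the phase-difference estimate used in Section II, the following Python sketch computes one DFT bin for two frames that are one sample apart and converts the phase advance into a frequency. It is a minimal sketch under stated assumptions: the helper names `dft_bin` and `instantaneous_frequency` are hypothetical, no analysis window is applied, frequencies are expressed in cycles per sample, and the test signal is a single complex exponential so that no other component disturbs the phase.

```python
import cmath
import math

def dft_bin(x, center, N, k):
    """Bin k of the DFT of the N-sample frame centred at index `center`
    (local time index n runs from -N/2 to N/2-1; N is assumed even)."""
    return sum(x[center + n] * cmath.exp(-2j * math.pi * k * n / N)
               for n in range(-N // 2, N // 2))

def instantaneous_frequency(x, center, N, k):
    """Frequency estimate for bin k from two DFTs shifted by one sample."""
    now = dft_bin(x, center, N, k)
    before = dft_bin(x, center - 1, N, k)      # frame one sample earlier
    # Phase advance per sample, measured relative to the bin frequency k/N
    # so that the wrapped value stays small for a component near bin k.
    deviation = cmath.phase(now * before.conjugate()
                            * cmath.exp(-2j * math.pi * k / N))
    return k / N + deviation / (2 * math.pi)

# Hypothetical test signal: a complex exponential lying between bins 6 and 7.
N = 64
f_true = 0.1037                                # cycles per sample
x = [cmath.exp(2j * math.pi * f_true * n) for n in range(200)]
k_peak = round(f_true * N)                     # nearest DFT bin
f_est = instantaneous_frequency(x, 100, N, k_peak)
```

For this idealized input the estimate recovers the true frequency up to rounding, whereas the bin frequency `k_peak / N` alone is off by a fraction of a bin. With real, multicomponent signals the leakage of other components into the bin perturbs the phase, which is one reason a window with low sidelobes is used in practice.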
Using a Hanning bandpass filter F(m), one obtains the simplified distribution s'(k) within a single frame:

$$s'(k) = s(0,k) = 0.5\,X(k) + 0.25\,X(k-1) + 0.25\,X(k+1).$$

In order to analyze all signal samples with the same weighting factor, overlapping frames must be used. Without the t-f-displacement this analysis would reduce to the original spectrogram. For each bin s'(k) within the frame of length N, its center of gravity $(t'_g, f'_i)$ is calculated with

$$t'_g = -\frac{N}{2\pi}\bigl(\arg[s(0,k)] - \arg[s(0,k-1)]\bigr), \qquad f'_i = \frac{k}{N} + \frac{1}{2\pi}\bigl(\arg[s(0,k)] - \arg[s(-1,k)]\bigr).$$

With a frame hop size HS, only the energy values $|s'(k)|^2$ located near the window center, in the interval $|t'_g| \le HS/2$, are used. This analysis method suppresses those signal energies which are mapped near the frame boundaries. In contrast to the original MMWM, the perfect time mapping is omitted in the simplified version. One calculates the DFT in each frame and performs the frequency mapping, $k \rightarrow f'_i$, for those bins whose temporal center of gravity lies in the interval $|t'_g| \le HS/2$. The numerical complexity is reduced to approximately $N \log_2 N$ for each frame.

In the simplified MMWM each single sinusoidal component contributes to more than one DFT bin. The resulting frequency mapping leads to slightly different frequency values $f'_i$ due to the imperfect approximation of the phase derivatives and, for multicomponent signals, interacting window sidelobes. Therefore, one has to combine the energy contributions within a small frequency interval $\epsilon$ to form one single component for subsequent resynthesis.

Figure 1 shows a comparison between a) the original spectrogram and b) the "improved spectrogram" of a multicomponent signal (two FM components (1, 2), three switched (3-5) and one stationary (6) sinusoids). The "improved spectrogram" clearly yields a better time resolution, especially for the switched components. The resulting spectral broadenings appear solely in the corresponding frames (10-13, 26).
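The per-frame procedure of the simplified MMWM can be sketched in Python as follows. This is a hedged illustration, not the author's implementation. It assumes that the filtered spectrum is s(n,k) = sum over m of F(m) X(k+m) exp(j 2 pi m n / N) with the Hanning coefficients F(0) = 0.5 and F(+-1) = 0.25, so that s(-1,k) can be formed from the same DFT as s(0,k). The function names, the naive O(N^2) DFT (used only to avoid external dependencies), the restriction to bins 1..N/2-1, and the energy-weighted merging rule for the interval eps are all assumptions made for this example.

```python
import cmath
import math

def centered_dft(x, center, N):
    """DFT of the frame centred at index `center`, local time n = -N/2..N/2-1."""
    return [sum(x[center + n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(-N // 2, N // 2))
            for k in range(N)]

def phase_diff(a, b):
    """arg(a) - arg(b), wrapped to (-pi, pi]."""
    return cmath.phase(a * b.conjugate())

def improved_spectrogram_frame(x, center, N, hop):
    """Return (f_i, energy) for the bins whose time centre of gravity
    satisfies |t_g| <= hop/2; f_i is in cycles per sample."""
    X = centered_dft(x, center, N)
    rot = cmath.exp(2j * math.pi / N)
    # s(0,k) and s(-1,k) for the Hanning filter; bin indices wrap modulo N.
    s0 = [0.5 * X[k] + 0.25 * X[k - 1] + 0.25 * X[(k + 1) % N]
          for k in range(N)]
    s1 = [0.5 * X[k] + 0.25 * rot * X[k - 1] + 0.25 * X[(k + 1) % N] / rot
          for k in range(N)]
    kept = []
    for k in range(1, N // 2):
        t_g = -N / (2 * math.pi) * phase_diff(s0[k], s0[k - 1])  # in samples
        f_i = k / N + phase_diff(s0[k], s1[k]) / (2 * math.pi)
        if abs(t_g) <= hop / 2:
            kept.append((f_i, abs(s0[k]) ** 2))
    return kept

def merge_components(points, eps):
    """Group (frequency, energy) pairs whose neighbouring frequencies differ by
    less than eps; return (energy-weighted mean frequency, summed energy)."""
    groups = []
    for f, e in sorted(points):
        if groups and f - groups[-1]["last_f"] < eps:
            g = groups[-1]
            g["energy"] += e
            g["weighted_f"] += f * e
            g["last_f"] = f
        else:
            groups.append({"energy": e, "weighted_f": f * e, "last_f": f})
    return [(g["weighted_f"] / g["energy"], g["energy"])
            for g in groups if g["energy"] > 0.0]

# Hypothetical usage: a stationary cosine analysed with N = 64 and HS = N/4.
N, hop, center = 64, 16, 100
cosine = [math.cos(2 * math.pi * 0.1037 * n) for n in range(200)]
components = merge_components(
    improved_spectrogram_frame(cosine, center, N, hop), eps=0.005)
strongest = max(components, key=lambda c: c[1])
```

For the stationary cosine, the main-lobe bins are mapped to nearly the same frequency and a time offset close to zero, so they merge into one component. An impulse located more than HS/2 samples from the frame center is mapped to its own time position and is therefore left to a neighbouring frame. In practice the naive DFT would be replaced by an FFT.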
IV. CONCLUSION

In this paper we presented a time-frequency analysis method based on the STFT. This Modified Moving Window Method, invented by (Kodera, K.), performs a t-f-mapping of the spectrogram, yielding a more concentrated energy distribution along the curve of the instantaneous frequency. A simplified MMWM operating on overlapping signal frames reduces the computational complexity from $N(\log_2 N + 2)$ operations per sample to approximately $N \log_2 N$ operations per frame and results in a well-concentrated energy curve in the t-f-plane. The mapped energy contributions can be combined to form a set of
component parameters (amplitude and frequency) within each frame, yielding the "improved spectrogram". The output data can be fed to a spectral track algorithm, which results in an appropriate signal representation for sound resynthesis.

Fig. 1: a) the original spectrogram and b) the "improved spectrogram" of a multicomponent signal. HS = N/4. [Figure not reproduced; both panels are plotted over the frame index.]

REFERENCES

Auger, F. & Flandrin, G. (1994). La réallocation: une méthode générale d'amélioration de la lisibilité des représentations temps-fréquence bilinéaires. Proc. Colloque "Temps-Fréquence, ... et Applications", Lyon, France, 15.1-15.7.

Brown, J. & Puckette, M. (1993). A high resolution frequency determination based on phase changes of the Fourier transform. Jour. Acoust. Soc. Am. 94(2), 662-667.

Cohen, L. (1989). Time-Frequency Distributions - A Review. Proc. IEEE, Vol. 77, No. 7, pp. 941-981.

Höldrich, R. (1994a). Ein Verfahren zur Zeit-Frequenz-Analyse und Resynthese von Klangsignalen. Dissertation, Techn. University Graz, Austria.

Höldrich, R. (1994b). The Improved Spectrogram - How to Obtain an Accurate Signal Representation for Sound Resynthesis? Proc. 1. Simposio Brasileiro de Computacao e Musica.

Höldrich, R. (1994c). Frequency Analysis of Non-Stationary Signals Using a Time-Frequency Mapping of the DFT Magnitude. Proc. 5th ICSPAT, Dallas 1994.

Kodera, K. (1976). Analyse numérique de signaux géophysiques non stationnaires. Thesis, Paris.

Kodera, K., Gendrin, R. & Villedary, C. (1978). Analysis of Time-Varying Signals with Small BT Values. IEEE Trans. Acoust. Speech Signal Processing, vol. ASSP-26, No. 1, 64-76.

McAulay, R. & Quatieri, T.F. (1986).
Speech Analysis/Synthesis Based on a Sinusoidal Representation. IEEE Trans. Acoust. Speech Signal Processing, vol. ASSP-34, 744-754.

Serra, X. (1989). A System for Sound Analysis/Transformation/Synthesis Based on a Deterministic Plus Stochastic Decomposition. PhD Dissertation, Rep. STAN-M-58, Stanford University, CA.

Smith, J. & Serra, X. (1987). PARSHL: An Analysis/Synthesis Program for Non-harmonic Sounds Based on a Sinusoidal Representation. Proc. Int. Computer Music Conf., 290-297.