Page  00000001 Infinite length windows for short-time Fourier transform S. Tassart Analysis-synthesis team, Ircam, PARIS, FRANCE St ephan.Tas sart~ircam. fr, http: // Abstract 2 Theory This paper presertts all extertsiort of the Short-time Fourier trartsform (STFT) to the case of irtfirtite ratiortal wirtdows. The choice of a suitable wirtdow for the STFT is a major issue ill sigrtal artalysis. The ability to use art irtfinite impulse resportse filter as all artalysis or synthesis wirtdow operts rtew perspectives. 1 Intro duct ion The short-time Fourier trartsform (STFT) is a widelyused sigrtal processirtg tool for sourtd artalysis artd synthesis. It is commonly used, for example, iii timefrequertcy artalysis as well as irt the phase vocoder. Sirtce only a finite rtumber of operatiorts is possible iii a computer implementatiort of the STFT, irt most cases the artalysis artd the synthesis wirtdow are of finite lertgth. Usirtg such artalysis or synthesis wirtdows, usually leads to a tradeoff betweert time resolutiort, frequertcy resolutiort artd amplitude of side lobes. Differertt optimal wirtdows have beert developed for several problems: Hartr, Hammirtg, Kaiser wirtdows... Iii this paper, we propose art algorithm for computirtg, irt a firtite rtumber of operatiorts, the STFT of a sigrtal wirtdowed by a ratiortal irtfirtite lertgth wirtdow, i.e. by the impulse resportse of art ARMA filter. This work extertds the STFT theory to irtfinite lertgth wirtdows. We present orte particular result of this work cortcerrnrtg the problem of design irg rtew optimized wirtdows for specific artalysis or synthesis problems. Irtfinite lertgth wirtdows do rtot follow the same cortstraints as finite lertgth wirtdows do. For irtstartce, whert design irg optimal low pass filters, the criteria comprise mirtimizilig ripples, maximizilig the slopes, artd the flatrtess of the trartsfer furtctiort. We presertt rtew wirtdow families, artd some tradeoff criteria adapted to the problem of the trackirtg of partials for additive sourtd syrtthesis. Givert a sequertce of complex rtumbers (x,),e2 artd art artalysis wirtdow (w, )ne2, we form the STFT 1% (eJW) (see [1, 9]) by: Vn C 2, Xfl(eJW) >Z X,, )ejnrnmwr (1) 2.1 Notation We defirte here a compact set of rtotatiort irt order to highlight the key poirtts of the demortstratiorts to follow. We set respectively: * il$, a complex vector of size N defirted as ~' ('WpN,~WpN+1,..., ~WpN+N 1), * ay, a complex vector of size N defirted as I' (xpNetff Nw,..., XPN+N le J(PN+-)w), * (17 7), a symmetrical bilirtear furtctiort operatirtg ort two vectors of size NV, y artd $' Givert this bloc rtotatiort, the STFT from Eq. (1) is erttirely recasted as art irtfirtite bloc summatiort: Vk e 2, XkN(eJW) 5 34w'~- rn) (2) Followirtg (2), (5 ik~) is to be irtterpreted as the result of the short-time Fourier trartsform <~N (&W) of (x,),e2 usirtg the N-poirtt artalysis wirtdow jr. 2.2 Bloc recursion We suppose here that the causal artalysis wirtdow (wI,),eNi is the irtfirtite impulse resportse of art ARMA filter, whose trartsfer furtctiort is a ratiortal furtctiort W(z) B(z)/A(z) of order P. Irt this case, there exists a set of vectors (bq)qE[O,P1l] artd coefficiertts (a~)Pe[1,p], Up ~ 0, describirtg a bloc recursiort for the artalysis wirtdow, 8q beirtg the K~rortecker symbol: Vk eZ, Wk=)5UpW'kp+5 )bqSkq, (3) Prrl q=O

Page  00000002 Note that the complex roots (Yp)pe[1,P] of the polynomial zP - 71J a zz-P are deduced from those of _'=1P-N Testo A(z), (Ip)pPE[o,P],by the relation: y7, -z=7. The setof vectors (bq)qE[O,P-1] is determined by evaluating the N - P first values of the analysis window (w,),nEN: q Vq < P, lq= Wq - apWq,_p (4) p=1l 2.3 Bloc STFT In the first step we compute a first order linear combination coXkN +1X(k - 1)N of two successive STFT, assuming only the causality of the window (w,),neN: coXkN(ejw) + Ci1X(k-1)N(ejw) +00 +00 co( o' 0k) + CO S( lm 4k-rn) +1 Y,(Cm k-1-mrn) m=1 m=o +00 +00 = co(o4jk) CO0(W m k-mrn) +1 Ci(Wm- Ik-rn) rnm=1 rnm=1 +00 = co(0o k) + 5(CO'm + C1Wrn-1 k-rn) m=1 The linear combination of STFT has been split into one term depending only on the first values of the analysis window and another term corresponding to the infinite sum exhibiting the same linear combination transposed to bloc-windows. In a very similar way, the P-order linear combinatiorn E -o cX(k-p)N can also be split into two parts. The first part originates from the P first initial values, (CP)pE[,P-1]. The second part corresponds to an infinite sum where the linear combination of STFT is transposed into a linear combination of bloc-windows: recursion (3) is transposed in terms of the following STFT recursion: P P-1 XkN(ejW) - axP N(eJ)+ (X(k-p)N k-q) (6) p=1 q=o This last expression shows that whenever the analysis window is infinite, the STFT is computed in a finite number of steps. The benefit of the autoregressive structure of the analysis window is transposed in a vectorial autoregressive structure in the STFT. The term ELQj(bq k-q) is to be interpreted, following (2), as a finite length window STFT. Its analysis window is actually a truncated version of (wn)neN, reduced to its first PN points. Thus, (6) shall be considered as a recursive STFT, based on a PN-point analysis window with an overlap of (P - 1)N points. Note that the STFT analysis and synthesis stages are always dual from each other. For instance, the overlap-add (OLA) reconstruction method results by duality from the Fourier transform interpretation of the STFT [1]. Thus, (6) shall also be interpreted as a dual expression from an infinite response synthesis filter reconstruction method. Such an infinite response synthesis filter may help to design an oversampling filter or a long term correlation... Unfortunatly, it does not seem possible to verify the perfect reconstruction condition when both analysis and synthesis windows are infinite (the least square method proposed in [4] for the reconstruction of the STFT leads for instance to an anticausal filter). 2.4 Time-frequency tradeoff In continuous time, the time-frequency resolution of a window happens to be a tradeoff between a measure of its bandwidth D, and a measure of its duration Dt. When these measures are chosen to be the standard deviation of respectively the time density for Dt and the spectral density for D,, then the gaussian window is known to minimize the product Dt, = D, Dt which is here considered as a time-frequency criterium (see [2]). Some other time-frequency criteria (equivalent noise bandwidth, the -3dB bandwidth... ) leading to different properties and tradeoff have already been proposed in [5]. For finite length discrete windows, the chosen criterium is rather a CPU-frequency than a timefrequency tradeoff. Both are usually linked since the length of a window is proportionnal to its computational cost. For instance, Harris in [5] compares the -3dB bandwidth of analysis windows of same length. Sc X(k-p)N(ejw) P=O +00 m =P P-1 = YY p qq=O 0P=O0 -p (k-q -m-P. (5) P=O - -P 4 -pk The (cp)pe[o,P] coefficients have to be chosen so that the infinite sum disapears. If the analysis window (w,),en is infinite, but rational, then, we know from section 2.2 that it admits a set of coefficients (cp)pE[o,P] simplifying each term Zjro cprn-p to 0 for m > P (see (3)). For this set of coefficients and with the help of (4), the first part of (5) can therefore be recasted as a linear combination on (bq)qE[o,P-]. In other words the analysis window

Page  00000003 For infinite length discrete windows, we have to exponential window verifies a first order recursion, take into account a time-frequency criterium but also the computational cost. As a matter of fact, the time-frequency resolution of infinite length discrete sequences has not been widely studied from a theoritical point of view [8]. While we recognize that a similar Heisenberg uncertainty principle exists in discrete time, usual acceptations of bandwidth and duration do not meet this principle: the duration Dn can be made as small as one desires whereas the bandwidth D, remains finite. Here we propose to estimate the time-frequency resolution of discrete windows by adapting continuous-time relations. For a fast decreasing sequence, (wn)nz, W(eJW) being its Fourier transform, the kth moment of the time and frequency density exists and is defined as: 7F;F (wk) z 7FWwk IW(eJ) 2dw (7) (nk)Wn k n12 (8) nE2 The discrete duration Dn and bandwidth D are then defined as standard deviations: D2 (w2)w D" (1) D2 _I2i)n n (1), ( TI ) (2 W)n~)w \ (1), / Kw2) (1W> i.e P = 1 and Vn > 0, w = awn-1 + S6 This equation is easily recasted as a vector recursion, the vector w'o being made from the N first values of exponential window. Applying relation (6) to the last recursion leads to the following relationship, already shown in [10]: XkN(eJw) NX(k-1)N(eJw) + Y (ejw) The time-frequency resolution of the analysis window evaluated from (11) depends on a. It diverges for a in the neighbourhood of 0 and 1, and admits a minimum D,, = 1.24 at a = 0.42. The coefficient aN may also be viewed as a forgetting factor in an adaptative scheme of the STFT algorithm. This algorithm is also known as an exponential average [7]. 2.6 Discussion This section points out details which slighty differ from the common STFT. Usually, the length N of the analysis window is directly related to the time-frequency resolution since the window shape is stretched to the correct size and the N-point FFT gives N bins uniformly spaced in frequency. With the recursive STFT, N is no longer linked to time resolution. The overlap factor is commonly understood as the rate of advancement of the analysis window relative to the length of the window. In theory it corresponds to a decimation coefficient, and therefore is related to the bandwidth of the analysis window. However, for applications where this analysis stage precedes a transformation and a synthesis stage, the overlap factor must be chosen with regard to the type of transformation planned. A 75%-overlap rate STFT is quite usual for common transformations such as time-stretching [6]. This overlap factor is also a means to estimate the computational cost per unit of time of processed signal. The 75%-overlap rate STFT corresponds therefore to a 4th order filter. The effective overlap factor should rather be understood as the rate of advancement of the analysis window compared to its duration. The duration of common analysis windows (Hann, Blackman, Hamminrg...) is far less thanr their lenrgth. Onre should expect similar effective overlap factors for finite or infinite length windows. This factor is related on one hand to the duration of the analysis window and on the other hand to the number of frequency bins used for evaluating the Fourier transforms (i.e. N, the size of the FFT). The overall steps needed for designing a infinite length analysis window are summed up here: * choose the computational cost (overlap factor), We want now to replace, for instance, D, by f(D,) in such a way that the product D,, = Df(D,) could be considered as a time-frequency estimation and meet an uncertainty principle. It would seem necessary that f(D,) diverges when D, tends to zero. When D, is large enough, f(D,) r D, seems enough to fullfill the constrain. Eq. (11) gives a reasonable estimation of the time-frequency resolution of discrete windows. We conjecture this quantity to be greater than 1/2 for any discrete sequence: 2 D, tar (11) Dnw = Dn tan 2D(11) K 2/ It should be noted that the well-known bilinear transform maps a discrete-time frequency into a continuous-time frequency by means of the trigonometric tangent function; that may justify the choice for the function f. The normalization factor 3/2 is chosen in order to fit the bandwidth of the unit impulse. 2.5 Exponential window The exponential causal (single-sided) window is defined as: Vn > 0, wn = an and 0 elsewhere. This

Page  00000004 N Dn D, Dnw Rectangular 256 74 0.10 7.7 Rectangular 2048 590 0.036 22 Blackman -92dB 256 26 0.020 0.52 Blackman -92dB 2048 210 0.0030 0.62 Hann 2048 290 0.0018 0.51 Hann-Poisson(0.5) 2048 260 0.0019 0.51 A(10-4, 21.6) 2048 290 0.0018 0.51 A(1.8, 0.92) 2048 200 0.0027 0.55 Butterworth order 3 2048 570 0.0011 0.61 Butterworth order 4 2048 690 0.00098 0.67 * deduce the order of the filter, * choose N the numbers of bins of FFT, * deduce the bandwidth of the filter, * design the ARMA filter, * compute the duration of its impulse response, * evaluate the effective overlap factor. 3 Extraction of spectral peaks In the context of sound analysis, Depalle and Helie in [3] proposed an efficient method to improve the estimation of frequency, amplitude and phase of partials of a signal based on a parametric modeling of the short-time Fourier transform. Frequency estimation is highly sensitive to the analysis window shape, and nosidelobe windows were necessary to prevent false detections due to local minima. A small bandwidth improves the conditionning for the algorithm whereas a small effective duration minimizes the smoothing effect of the time variation of parameters. Unfortunately the estimation of the time-frequency resolution of the windows presented in [3] is not adapted to infinite length windows. In the following results, N gives the number of bins in the FFT and an estimation of the computational cost (also related to the filter's order), Dn, D, and Dn, are evaluated following (10), (9) and (11) as an estimation of the time-frequency characteristics of the window. Butterworth filters have been chosen in the process of filter design since they are characterized by a magnitude response that is maximally flat in the passband and monotic overall. These filters sacrifice the rolloff steepness for monotonicity, and are therefore well suited for the forementionned algorithm. The 3 firsts windows presented in the table (rectangular, Blackman, Hann) have sidelobes on the contrary of the remaining other windows. The table shows that the Butterworth filters achieve approximatively the same resolution as a large -92dB Blackman window, but it seems also that optimal nosidelobe A-windows designed in [3], or nosidelobe HannPoisson windows, have still better time-frequency characteristics. However, in every case, the bandwidth of the Butterworth filter is sharper, causing the effective overlap factor to be smaller, allowing one to decrease N and to increase the order of the filter in applications where the quality of the transformation depends on this factor (phase vocoder). 4 Conclusion This paper has demonstrated how to extend the STFT to the case of infinite analysis or synthesis windows. We have also proposed different ways to compare the characteristics of these windows to those of finite length windows. Future work comprises design of new windows and application of this extension to the STFT to other analysis or synthesis algorithms. References [1] J. B. Allen and L. R. Rabiner. A unified approach to short-time Fourier analysis and synthesis. Proc. IEEE, 65(11):1558-1564, November 1977. [2] L. Cohen. Time-frequency analysis. Signal Processing series. Prentice Hall, 1995. [3] P. Depalle and T. Helie. Extraction of spectral peak parameters using a short-time Fourier transform modeling and no sidelibe windows. In Proc. of IEEE ASSP Workshop on App. of Sig. Proc. to Audio and Acoust., Mohonk, New Paltz, USA, October 1997. [4] D. W. Griffin and J. S. Lim. Signal estimation from modified short-time Fourier transform. IEEE Trans. Acoust., Speech, and Signal Process., ASSP32(2):236-242, April 1984. [5] F. J. Harris. On the use of windows for harmonic analysis with the discrete Fourier transform. Proc. IEEE, 66(1):51-83, January 1978. [6] J. Laroche and M. Dolson. About this phasiness business. In Proc. Int. Computer Music Conf. (ICMC'97), pages 55-58, Thessaloniki, September 1997. [7] R. Miquel. Le Filtrage Numerique par microprocesseurs. Editests, 1985. [8] J. Pearl. Time, frequency, sequency and their uncertainty relations. IEEE Trans. on Info. Theory, 19:225-229, March 1973. [9] M. R. Portnoff. Time-frequency representation of digital signals and systems based on short-time Fourier analysis. IEEE Trans. Acoust., Speech, and Signal Process., ASSP-28(1):55-69, February 1980. [10] T. Saso. On short-time Fourier transform with single-sided exponential window. Signal Processing, 55:141-148, 1996.