A STREAMING OBJECT ORIENTED IMPLEMENTATION OF THE MODAL DISTRIBUTION

Thomas Lysaght
Department of Computer Science, NUI Maynooth, Co. Kildare, Ireland
Email: Tom.Lysaght

Victor Lazzarini
Department of Music, NUI Maynooth, Co. Kildare, Ireland
Email: Victor.Lazzarini

Joseph Timoney
Department of Computer Science, NUI Maynooth, Co. Kildare, Ireland
Email: Joseph.Timoney

ABSTRACT

The Modal distribution is a time-frequency distribution specifically designed to model the quasi-harmonic, multi-sinusoidal nature of music signals and belongs to the Cohen general class of time-frequency distributions. A streaming, object-oriented implementation of the Modal distribution is presented which forms the basis for designing other members of the Cohen class. Implementation of this routine in the C++ Sound Object Library provides a fully portable tool for time-frequency analysis across multiple platforms. The theoretical background to the Cohen general class is outlined, followed by an explanation of the design and implementation of the Modal distribution in the SndObj library. Suggestions for future extensions to the new Modal class and its integration with the entire library are explored.

1. INTRODUCTION

The Modal distribution was introduced by Pielemeier and Wakefield [1] as a member of the Cohen general class of time-frequency distributions [2] for the analysis of music signals. It is primarily a Wigner distribution, or more specifically, a smoothed pseudo-Wigner distribution (SPWD), with a kernel that takes account of the modes present in quasi-harmonic, multi-sinusoidal music signals. Being based on the Wigner distribution, it provides a more accurate measure of time-frequency localisation and does not suffer from the time-bandwidth trade-off inherent in spectrogram implementations.
Superior accuracy in time and frequency localisation is desirable for the analysis of music signals where, for example, time resolutions of a few milliseconds are required for onset analysis and where partials may often have broadband characteristics. One drawback of the Wigner distribution is the existence of cross terms, which amount to beats between partials and are not present in the original signal. The Modal distribution kernel is designed to minimise the effect of these cross terms for music signals.

A C++ object-oriented implementation allows the Modal distribution routine to be integrated with a variety of existing signal processing tools in the Sound Object Library [5]. The Sound Object Library is an object-oriented library for audio programming, written in C++. The library code is fully portable across Windows, Linux/Unix (with OSS/ALSA), Irix and MacOS X. The SndObj library provides more than 100 classes that can be used for time- and frequency-domain signal processing, as well as sound and MIDI input/output. With its SndThread class, it can also manage audio processing threads. Particularly important for this work is the support found in the library for partial tracking and additive synthesis, which can be used to build analysis-resynthesis programs using the Modal transform for spectral analysis. The current implementation also implements the pseudo-Wigner and smoothed pseudo-Wigner distributions, and allows for further extension to the Modal class, thus facilitating development of other members of the Cohen general class or the investigation of new time-frequency distributions through novel kernel design.

2. THEORETICAL BACKGROUND

Leon Cohen [2] proposed a general class of time-frequency distributions which are related through linear transformations. The set of all linear transformations of the Wigner distribution has come to be known as the Cohen general class. A two-dimensional kernel determines the linear transformation involved.
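Included in the Cohen general class are the spectrogram and distributions due to Rihaczek [4]. Further investigation has been carried out by O'Donovan [8]. The Cohen general class is given by:

\[ C(t,\omega) = \frac{1}{4\pi^2} \iiint s^*\!\left(u - \tfrac{\tau}{2}\right) s\!\left(u + \tfrac{\tau}{2}\right) \phi(\theta,\tau)\, e^{-j\theta t - j\tau\omega + j\theta u}\, du\, d\tau\, d\theta \tag{1} \]

where $\phi(\theta,\tau)$ is a two-dimensional kernel function which determines the distribution and its properties. $\phi(\theta,\tau)$ typically implements filtering in time or frequency or both. The Wigner distribution in terms of the signal $s(t)$ and the spectrum $S(\omega)$ is given by:

\[ W(t,\omega) = \frac{1}{2\pi} \int s^*\!\left(t - \tfrac{\tau}{2}\right) s\!\left(t + \tfrac{\tau}{2}\right) e^{-j\tau\omega}\, d\tau = \frac{1}{2\pi} \int S^*\!\left(\omega + \tfrac{\theta}{2}\right) S\!\left(\omega - \tfrac{\theta}{2}\right) e^{-jt\theta}\, d\theta \tag{2} \]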
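As a rough numerical illustration of the bilinear form of the Wigner distribution, the sketch below evaluates one time slice of a discrete, lag-limited version: for a time index n it forms the product s(n+k) s*(n-k) over lags k = -(L-1), ..., L-1 and takes a discrete Fourier transform over k. The function name `wigner_slice`, the rectangular lag window, the factor of 2 and the treatment of out-of-range samples as zero are assumptions made for this illustration only; none of this is code from the SndObj library.

```cpp
#include <cmath>
#include <complex>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical helper (illustration only, not SndObj code).
// Returns M = 2L-1 real values for time index n; bin m corresponds to an
// angular frequency of pi*m/M radians per sample, because the lag product
// s(n+k) * conj(s(n-k)) doubles the frequency of each component.
std::vector<double> wigner_slice(const std::vector<cplx>& s, long n, long L) {
    const double kPi = std::acos(-1.0);
    const long M = 2 * L - 1;
    const long N = static_cast<long>(s.size());
    std::vector<double> out(static_cast<std::size_t>(M), 0.0);

    for (long m = 0; m < M; ++m) {
        cplx acc(0.0, 0.0);
        for (long k = -(L - 1); k <= L - 1; ++k) {
            const long a = n + k;  // "future" sample index
            const long b = n - k;  // "past" sample index
            if (a < 0 || a >= N || b < 0 || b >= N) continue;  // assumed zero
            const cplx g = s[static_cast<std::size_t>(a)] *
                           std::conj(s[static_cast<std::size_t>(b)]);
            const double ang = -2.0 * kPi * static_cast<double>(k) *
                               static_cast<double>(m) / static_cast<double>(M);
            acc += g * std::polar(1.0, ang);
        }
        // Terms for +k and -k are complex conjugates, so the sum is real.
        out[static_cast<std::size_t>(m)] = 2.0 * acc.real();
    }
    return out;
}
```

For a unit-amplitude complex exponential at 5π/31 radians per sample, with L = 16 and n far enough from the signal edges that no lag falls outside it, the slice is non-zero only at bin 5. A real-valued sinusoid would additionally show a cross term between its positive- and negative-frequency components.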

For the Wigner distribution the kernel is unity. The Wigner distribution is said to be bilinear in the signal, as the signal enters twice in the calculation: it is the sum of products of the signal at past and future times, both past and future lengths of time being equal. This is in effect an autocorrelation with the lag variable, $\tau$, producing the time-relative-time or temporal autocorrelation function (TCF) given in (4). An important property of the Wigner distribution is that it is real, with $W^*(t,\omega) = W(t,\omega)$. Also, the Wigner distribution gives a clear picture of the instantaneous frequency and group delay, which is not the case for the spectrogram. These are important for resynthesis [1,7].

2.1. The Time-Relative-Time Function

The Wigner distribution is obtained by taking the Fourier transform with respect to $\tau$ of the two-dimensional autocorrelation function $b_f(t,\tau)$ given in Equation (4). The terminology used in this section is that of Poletti [6]. The temporal and spectral ranges of this function are first outlined before discussing the discrete implementation of the Wigner distribution. For a bandlimited and timelimited function:

\[ f_{hw}(t) = [f(t) \circ h(t)]\, w(t) \tag{3} \]

the corresponding time-relative-time function is given by

\[ b_{f_{hw}}(t,\tau) = f_{hw}\!\left(t + \tfrac{\tau}{2}\right) f_{hw}^*\!\left(t - \tfrac{\tau}{2}\right) = [b_f(t,\tau) \circ b_h(t,\tau)]\, b_w(t,\tau) \tag{4} \]

Here $\circ$ denotes convolution in the $t$ direction. If $h(t)$ is a $\beta$ bandlimited function, then the function $f_h(t) = f(t) \circ h(t)$ has a time-relative-time function:

\[ b_{f_h}(t,\tau) = b_f(t,\tau) \circ b_h(t,\tau) \tag{5} \]

that is $\beta$ bandlimited in $\tau$ and $2\beta$ bandlimited in $t$. Also, given the function $w(t)$ as $T$ time limited, the function $f_w(t) = f(t)\,w(t)$ has a time-relative-time function:

\[ b_{f_w}(t,\tau) = b_f(t,\tau)\, b_w(t,\tau) \tag{6} \]

that is $T$ time limited in $t$ and $2T$ time limited in $\tau$. Using a discrete Fourier transform to obtain a discrete version of the Wigner distribution from the $b_f(t,\tau)$ function, the sample rate in $t$ of the original signal must be $f_s > 4\beta$ in order to satisfy the sampling theorem, i.e., sampling is at twice the Nyquist rate. The sample rate in $\tau$ is $f_s > 2\beta$. The continuous $b_{f_{hw}}(t,\tau)$ function, sampled in $t$ at a rate of $2f_s$ and in $\tau$ at a rate of $f_s$, with $t_s = 1/f_s$, has the following discrete formulation:

\[ b_{f_{hw}}\!\left(\tfrac{k t_s}{2},\, n t_s\right) = f_{hw}\!\left(\tfrac{k t_s}{2} + \tfrac{n t_s}{2}\right) f_{hw}^*\!\left(\tfrac{k t_s}{2} - \tfrac{n t_s}{2}\right) \tag{7} \]

This function, then, has duration $2T$ in $\tau$ as shown in Figure 1.

Figure 1. Extent of the windowed time-relative-time function in the $(t,\tau)$ plane (tick marks at $T_m$ and $\pm(T - T_m)/2$).

2.2. The Wigner Distribution

The discrete Wigner distribution [3] is written as the discrete Fourier transform of (7) with respect to $n$ for each value of $k$:

\[ W_{f_{hw}}(k, m) = \sum_{n=-N}^{N-1} f_{hw}\!\left(\tfrac{k t_s}{2} + \tfrac{n t_s}{2}\right) f_{hw}^*\!\left(\tfrac{k t_s}{2} - \tfrac{n t_s}{2}\right) e^{-j 2\pi m n / 2N} \tag{8} \]

where $k, m = -N, \ldots, N-1$. By windowing the function in Equation (7) above with the window function:

\[ b_w(t,\tau) = P_{T - T_m}(t)\, g(\tau) \tag{9} \]

where $g(\tau) = 0$ for $|\tau| > T_m$ and $P_{T - T_m}(t) = 0$ for $|t| > (T - T_m)/2$, the diamond-shaped region in the $(t,\tau)$ plane in Figure 1 is limited to the rectangular region:

\[ b_{f_{hw}}(t,\tau)\, b_w(t,\tau) = 0, \quad |t| > \tfrac{T - T_m}{2}, \; |\tau| > T_m \tag{10} \]

The discrete implementation of the pseudo-Wigner distribution with a frequency smoothing window function $w(k)$ of length $M = 2L - 1$, with $w(k) = 0$ for $|k| \ge L$, is then defined by:

\[ \mathrm{PWD}\!\left(n, \tfrac{m\pi}{M}\right) = 2 \sum_{k=-L+1}^{L-1} g(n,k)\, p(k)\, e^{-j 2\pi k m / M}, \quad m = 0, \ldots, M-1 \tag{11} \]

where $p(k) = w(k)\, w^*(-k)$ and $g(n,k) = f(n+k)\, f^*(n-k)$.

2.2.1. Cross terms

Given a music signal model as follows:

\[ s(t) = \sum_{k=1}^{M} A_k\, e^{j(\omega_k t + \phi_k)} \tag{12} \]

where $k$ is the partial series index, $t$ is time, and the $k$th term in the summation represents a partial with constant amplitude $A_k$, frequency $\omega_k$, and phase $\phi_k$, the Wigner distribution is:

\[ W_s(t,\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} b_s(t,\tau)\, e^{-j\omega\tau}\, d\tau = \sum_{k=1}^{M} A_k^2\, \delta(\omega - \omega_k) + \sum_{k=1}^{M} \sum_{\substack{l=1 \\ l \ne k}}^{M} A_k A_l \cos\!\big((\omega_k - \omega_l)t + \phi_k - \phi_l\big)\, \delta\!\left(\omega - \tfrac{\omega_k + \omega_l}{2}\right) \tag{13} \]

The partials of $s(t)$ (auto terms) are given by the first term in (13). The second, double summation gives the cross terms, arising from products between partials, which lie midway between any pair of auto terms. The magnitude of each cross term is the product $A_k A_l$ of the amplitudes of auto terms $k$ and $l$; it is located at the mean frequency $(\omega_k + \omega_l)/2$ and oscillates in time at the difference frequency $\omega_k - \omega_l$ of the two auto terms. For strictly harmonic signals, the cross terms form a partial series an octave below the fundamental, so that some cross terms fall at the same frequencies as the auto terms and therefore corrupt them. Other cross terms occur at partial frequencies not in the original signal.

2.3. The Modal distribution

The modal distribution was designed to minimise these cross terms for music signals. The modal kernel consists of two different filter functions. The time-smoothing window, $h_{LP}(p)$, has the effect of smoothing the cross terms in the time direction, and the frequency-smoothing window, $g_{LP}(l)$, implements cross term suppression in cases of frequency modulation. $h_{LP}(p)$ is chosen to be a low pass filter with an upper cut-off just below the minimum frequency spacing in the distribution, this being the fundamental frequency for quasi-harmonic signals. The discrete form of the modal distribution is defined by:

\[ M(n,k) = \sum_{l=-L+1}^{L-1} R_{s,LP}(n,l)\, g_{LP}(l)\, e^{-j 2\pi k l / 2L} \tag{14} \]

where $R_{s,LP}(n,l) = \sum_{p} R_s(n-p,\, l)\, h_{LP}(p)$ is the time-smoothed temporal autocorrelation function.

3. IMPLEMENTATION

The modal distribution (MD) class Modal encapsulates all processing involved in the computation of the modal distribution and is modelled on existing time-frequency classes within the SndObj library, namely, the phase vocoder analysis (PVA) class and the spectrogram class (IFgram). These classes are derived from the FFT base class, which provides the mechanisms for short-time Fourier analysis. The FFT class itself is derived from SndObj. Figure 2 illustrates this inheritance structure.

Figure 2. Inheritance diagram for the Modal class (classes shown: SndObj, FFT, PVA, IFgram, PVS, Modal).

3.1. The Modal class

The design of the Modal class models each process needed in computing the modal distribution function. Figure 3 is a program flow diagram of the processes involved in this computation. It takes as input a sampled sound file and kernel functions as well as other necessary parameters, shown in the constructor definition that follows.

Figure 3. Stages of Modal distribution computation (blocks shown: signal input, TCF, STCF, Rotate, DFT, MD, with the $h_{LP}$ and $g_{LP}$ windows).

    Modal::Modal(Table* window, Table* swindow, SndObj* input, float scale, int fftsize, int hopsize, float sr)

For the modal distribution computation, although cross term filtering allows for FFT sub-sampling at hop periods equal to the filter length, the temporal correlation function must be computed at each sample point. Furthermore, any streaming computation of the Modal distribution needs to take account of the fact that both past and future samples are needed to compute the autocorrelation function at each sample point. Therefore, for each hop period, upon the invocation of the Modal class DoProcess() method, hopsize number of
samples are buffered in m_samplesframe, implemented as a circular array. Beginning with the first signal sample at point $-(T - T_m)/2$ (Figure 1) in the $t$ direction, the number of samples in each successive TCF frame increases in the $\tau$ direction as the number of past and future samples grows. This number keeps increasing until the number of samples in each TCF frame reaches the FFT length, $2T_m$. Up to this point each TCF computation requires an odd number of input samples. From this point onwards each TCF computation requires $2T_m$ samples. The most important attributes and methods of the Modal class are shown in Figure 4.

    class Modal : public PVA

    attributes
      float*  m_samplesframe     array holds input samples
      float** m_tcfframe         array holds TCF frames
      float*  m_stcfframe        buffer holds STCF frame
      float** m_modframe         holds MD frame
      Table*  m_stable           cross-term filter window

    methods
      TCFsamples(int overlap)    compute TCF frames
      TCFconvolv()               convolve TCF frames
      RotateRenum(…)             rotate windowed STCF
      TCFAnalysis(…)             compute FFT of STCF

Figure 4. The Modal class.

The TCF is smoothed in the time direction in order to implement cross-term suppression. With streaming it is necessary to buffer a number of TCF frames up to the smoothing filter length. A 2-D buffer, m_tcfframe, holds these frames and the smoothed result is stored in m_stcfframe. The FFT sub-sampling determines the hop period, that is, the interval between smoothing operations carried out by the TCFconvolv() method.

Before applying the FFT, each STCF frame is windowed with the frequency smoothing window and then rotated by switching the positive and negative halves of the window. Only the real coefficients of the FFT are used to compute the Modal distribution output amplitudes, because the distribution is real valued. The frequency positions are estimated in Hertz from the corresponding bin values. A modal distribution for the lower partials of a trumpet tone (F#3: 193 Hz) is shown in Figure 5.

Figure 5. Modal distribution of trumpet note F#3 (amplitude axis in dB).

4. CONCLUSIONS AND FUTURE WORK

The streaming, object-oriented implementation of the Modal distribution within the SndObj library described in this paper provides an easily accessible tool for music signal analysis and processing. The class structure of the design makes possible the integration of the Modal routine with many of the tools necessary for sound analysis and modification such as time stretching and vocoding. In particular, the support found in the library for partial tracking and additive synthesis can be used to build analysis-resynthesis programs using the Modal transform. Furthermore, this implementation provides a convenient basis on which to implement other members of the Cohen general class and novel time-frequency distributions derived from specialised kernels.

Future work will involve an analysis of the speed of computation of the Modal distribution within the library and of real-time realisations of the routine on various platforms.

5. REFERENCES

[1] Pielemeier, W. J., and Wakefield, G. "A high-resolution time-frequency representation for musical instrument signals", Journal of Acoustical Society of America, 99(4), Pt. 1, April 1996.
[2] Cohen, L. Time Frequency Analysis. Prentice-Hall, New Jersey, 1995.
[3] Mecklenbräuker, W. F. G., and Claasen, T. A. C. M. "The Wigner Distribution - A Tool for time-frequency signal analysis; part II: discrete time signals", Philips J. Res., Vol. 35, pp. 276-300, 1980.
[4] Rihaczek, A. W. "Signal Energy Distribution in Time and Frequency", IEEE Transactions Info. Theory, Vol. 14, pp. 369-374, 1968.
[5] Lazzarini, V. "The sound object library", Organised Sound 5 (1), pp. 35-49, 2000.
[6] Poletti, M. A. "The development of a discrete transform for the Wigner distribution and ambiguity function", Journal of Acoustical Society of America, 84(1), pp. 238-252, July 1988.
[7] Boudreaux-Bartels, G. F. "Time-Varying Filtering and Signal Estimation Using Wigner Distribution Synthesis Techniques", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-34, No. 3, June 1986.
[8] O'Donovan, J., and Furlong, J. "A Joint time-frequency model of auditory masking", Journal of Acoustical Society of America, 104, 1998.
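The final stage described in Section 3.1 (windowing each STCF frame with the frequency smoothing window, rotating it by swapping its two halves, transforming it, and keeping only the real part) can be sketched as follows. This is a minimal illustration under stated assumptions, not the SndObj implementation: the name `modal_slice`, the parameter `gwin`, and the frame layout (index i holds lag l = i - L, negative lags first) are hypothetical, and a naive DFT stands in for the FFT used by the library.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical helper (illustration only, not SndObj code).
// Assumed layout: frame[i] holds the smoothed TCF at lag l = i - L,
// for i = 0..2L-1, i.e. negative lags occupy the first half.
std::vector<double> modal_slice(std::vector<cplx> frame,
                                const std::vector<double>& gwin) {
    const std::size_t len = frame.size();
    if (len == 0 || len % 2 != 0 || gwin.size() != len)
        throw std::invalid_argument(
            "frame and window must share an even, non-zero length");

    // 1. Apply the frequency-smoothing window along the lag axis.
    for (std::size_t i = 0; i < len; ++i) frame[i] *= gwin[i];

    // 2. Rotate: swap the two halves so that lag 0 moves to index 0.
    std::rotate(frame.begin(),
                frame.begin() + static_cast<std::ptrdiff_t>(len / 2),
                frame.end());

    // 3. Naive DFT over the lag axis; only the real part is kept.
    const double kPi = std::acos(-1.0);
    std::vector<double> out(len);
    for (std::size_t k = 0; k < len; ++k) {
        cplx acc(0.0, 0.0);
        for (std::size_t i = 0; i < len; ++i) {
            const double ang = -2.0 * kPi * static_cast<double>(i * k) /
                               static_cast<double>(len);
            acc += frame[i] * std::polar(1.0, ang);
        }
        out[k] = acc.real();
    }
    return out;
}
```

The rotation matters because the transform treats index 0 as zero lag; without it, the lag offset appears as a phase factor that alternates the sign of successive bins, so the real parts would no longer represent the amplitudes directly.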