The Analysis and Resynthesis of Sustained Musical Signals in the Time Domain

Roy Hung¹, N.H.C. Yung², P.Y.S. Cheung³
Dept. of Electrical and Electronics Engineering, The University of Hong Kong, Hong Kong.
E-mail: rhung@hkueee.hku.hk¹, nyung@hkueee.hku.hk², cheung@hkueee.hku.hk³

Abstract
In this paper, a novel algorithm for the analysis and resynthesis of musical signals is presented. It employs polynomial interpolation techniques and is based entirely on time-domain information. For an input signal consisting of a number of periods, the algorithm seeks a set of features that are common to most or all of these periods. Any changes in the amplitudes and positions of these features across the entire input signal can then be parameterised. Furthermore, the shapes of the curves joining successive features within one cycle of oscillation are computed so that the resynthesised signal closely approximates the original. This research introduces an alternative method of manipulating sampled musical signals that concentrates on their physical structure rather than their frequency components, and the data reduction achieved is promising even at this early stage.

1 INTRODUCTION
Traditionally, the analysis and resynthesis of musical signals have been based on the so-called "perceptual model of timbre". That is, acoustical signals produced by musical instruments are analysed in terms of the function performed by the basilar membrane of the cochlea, namely separating the incoming signal into frequency bands. Fourier analysis techniques [1,2,3] fit this model perfectly, since they decompose periodic waveforms into a series of sine and cosine functions of varying amplitudes, frequencies and phases. Data reduction is achieved because only a small amount of information is needed to fully specify such functions, and synthesis is made possible by manipulating their amplitudes and/or frequencies.
In this paper, a novel algorithm is proposed to analyse acoustical signals from musical instruments whose notes can be sustained. Functionally, this class of instruments can be said to consist of two distinct parts: a nonlinear part, which interacts with the player and generates the oscillations, and a second part, which amplifies these oscillations and projects them into the surrounding space. Here, we concentrate on the former part (the excitation), because different mechanisms of excitation produce acoustical outputs that are unique to a class of instruments. The reason for working in the time domain is that, for the class of signals under consideration, each cycle of oscillation is composed of distinct states, each the result of a different physical process. To study how these processes change when playing nuances are varied, it is prudent to carry out the analysis in the time domain, where such processes can be easily identified; this is not possible in the frequency domain. In this algorithm, a period of oscillation is characterised by a sequence of features and by the curves joining them. Data reduction is achieved through interpolation, and synthesis is accomplished by manipulating these features and the shapes of the curves.

2 PRELIMINARY CONSIDERATIONS
The purpose of this preliminary study is to find polynomials suitable for interpolating acoustic waveforms. To this end, four different interpolants [4] were chosen: straight lines, Hermite polynomials, and free and clamped cubic splines. For each interpolant, the mean square error (MSE) of the reconstruction and the number of points used in the reconstruction were computed as functions of the allowed interpolation error. Three cello notes (roughly 70Hz, 140Hz and 275Hz), recorded at 22.05kHz with piezo-electric sensors placed at the bridge, were used as test signals.
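As an illustration of one of the interpolants compared here, the following is a minimal Python sketch of piecewise cubic Hermite interpolation together with a normalised MSE. It is not the authors' code: the function names are hypothetical, NumPy is assumed, the slopes at the knots must be supplied (here estimated with `np.gradient`), and the MSE is assumed to be normalised by the mean-square value of the original signal. The paper states neither its slope estimate nor its normalisation.

```python
import numpy as np


def hermite_eval(xk, yk, dk, x):
    """Evaluate a piecewise cubic Hermite interpolant at positions x.

    xk: strictly increasing knot positions (in samples); yk: values at the knots;
    dk: first derivatives (slopes) at the knots.
    """
    xk = np.asarray(xk, dtype=float)
    yk = np.asarray(yk, dtype=float)
    dk = np.asarray(dk, dtype=float)
    x = np.asarray(x, dtype=float)
    # Segment index for each x; the last knot is assigned to the last segment.
    i = np.clip(np.searchsorted(xk, x, side="right") - 1, 0, len(xk) - 2)
    h = xk[i + 1] - xk[i]
    t = (x - xk[i]) / h
    # Cubic Hermite basis functions on the unit interval.
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * yk[i] + h10 * h * dk[i] + h01 * yk[i + 1] + h11 * h * dk[i + 1]


def normalised_mse(original, approx):
    """Mean square error divided by the mean square of the original (assumed normalisation)."""
    original = np.asarray(original, dtype=float)
    approx = np.asarray(approx, dtype=float)
    return np.mean((original - approx) ** 2) / np.mean(original ** 2)


# Example: a 140Hz sinusoid sampled at 22.05kHz, keeping every 10th sample as a knot.
fs = 22050
n = np.arange(200)
sig = np.sin(2 * np.pi * 140 * n / fs)
knots = n[::10]
slopes = np.gradient(sig)[knots]  # slope estimate: an assumption, not from the paper
approx = hermite_eval(knots, sig[knots], slopes, n[: knots[-1] + 1])
print(normalised_mse(sig[: knots[-1] + 1], approx))
```

Because adjacent segments share both the value and the slope at each knot, the resulting curve has a continuous first derivative.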
In terms of the number of interpolated points used, the curves decreased most rapidly between 5% and 25% allowed error and levelled off beyond this value; this trend is characteristic of all the interpolants. At 25% allowed interpolation error, roughly 10% of the total number of samples were used for the 70Hz signal, 20% for the 140Hz signal and 35% for the 275Hz signal, with Hermite polynomials and clamped splines performing slightly better (by around 0.2%). In terms of the MSE of the reconstructed signal, Hermite polynomials and clamped cubic splines had normalised MSE values 30% (275Hz) to 50% (70Hz, 140Hz) lower than those of straight lines and free cubic splines. With interpolation errors larger than 25%, the normalised MSE of straight lines and free splines increased rapidly, while that of Hermite polynomials and clamped splines remained well behaved.

Hung et al. 206 ICMC Proceedings 1996

Summarising the above results, Hermite polynomials and clamped cubic splines were chosen as interpolants because of their superior performance in MSE and data reduction. A further factor in their favour is that they are guaranteed to generate curves with continuous first derivatives, whereas the other two methods are not.

3 DESCRIPTION OF CURRENT ALGORITHM
The present algorithm (Fig. 1) is divided into four main modules: Pre-processing, Interpolation, Data Reduction and Resynthesis. The functions performed by each module are detailed below.

3.1 PRE-PROCESSING
A portion of a sampled cello note is first extracted and divided into individual periods. The purpose of this module is threefold:
1) to generate the initial set of features, defined as all local maxima, local minima and zero crossings;
2) to remove insignificant features from this initial set;
3) to identify the features common to the periods under analysis.

3.1.1 FEATURE SELECTION
All features are selected with the following criteria:
1) Given n samples, for k = 0, 1, ..., n-1, if
$$f(x_k) > f(x_{k-1}) \quad \text{and} \quad f(x_k) > f(x_{k+1}) \tag{1a}$$
then $x_k$ is labelled a maximum;
2) Given n samples, for k = 0, 1, ..., n-1, if
$$f(x_k) < f(x_{k-1}) \quad \text{and} \quad f(x_k) < f(x_{k+1}) \tag{1b}$$
then $x_k$ is labelled a minimum;
3) Given n samples, for k = 0, 1, ..., n-1, if
$$f(x_k) > 0 \quad \text{and} \quad f(x_{k+1}) < 0 \tag{1c}$$
then $x_{k+1}$ is labelled a negative zero crossing; if
$$f(x_k) < 0 \quad \text{and} \quad f(x_{k+1}) > 0 \tag{1d}$$
then $x_{k+1}$ is labelled a positive zero crossing.

3.1.2 SPURIOUS FEATURE DELETION
The initial sets of features contain some features that are both very close to each other and differ little in amplitude.
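The labelling rules (1a) to (1d) of Section 3.1.1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `select_features` and the string labels are hypothetical, NumPy is assumed, and extrema are tested only at interior samples because (1a) and (1b) refer to both neighbours of $x_k$.

```python
import numpy as np


def select_features(f):
    """Label local maxima, minima and zero crossings of one sampled period f.

    Returns (index, label) pairs sorted by index. Extrema are tested only at
    interior samples, since (1a)/(1b) need both neighbours (an assumption about
    how the end points are treated).
    """
    f = np.asarray(f, dtype=float)
    feats = []
    for k in range(1, len(f) - 1):
        if f[k] > f[k - 1] and f[k] > f[k + 1]:
            feats.append((k, "max"))          # criterion (1a)
        elif f[k] < f[k - 1] and f[k] < f[k + 1]:
            feats.append((k, "min"))          # criterion (1b)
    for k in range(len(f) - 1):
        if f[k] > 0 and f[k + 1] < 0:
            feats.append((k + 1, "neg_zc"))   # criterion (1c): x_{k+1} is labelled
        elif f[k] < 0 and f[k + 1] > 0:
            feats.append((k + 1, "pos_zc"))   # criterion (1d)
    return sorted(feats)


print(select_features([0.5, 1.0, 0.2, -0.3, -1.0, -0.4, 0.1]))
# -> [(1, 'max'), (3, 'neg_zc'), (4, 'min'), (6, 'pos_zc')]
```

Because the criteria use strict inequalities, a sample that is exactly zero does not by itself register as a zero crossing, and a flat plateau of equal samples is not labelled as an extremum.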
In order to make the detection of common features easier, these spurious features are identified and removed from their respective sets using the following criteria:
1) A distance d is specified in number of samples such that, given n samples and $0 \le j < n-1$, $j < k \le n-1$, if
$$x_k^s - x_j^s < d$$
(the superscript s denotes a feature), then $x_j^s$ and $x_k^s$ are considered too close together.
2) A group of features deemed to be close is then examined for the difference between their amplitudes. Given a percentage deviation dev, let
$$\operatorname{mean}\{f(x_j^s), f(x_k^s)\} = \frac{f(x_j^s) + f(x_k^s)}{2} \tag{2a}$$
If
$$\left| \frac{f(x_j^s) - \operatorname{mean}\{f(x_j^s), f(x_k^s)\}}{\operatorname{mean}\{f(x_j^s), f(x_k^s)\}} \right| \times 100 < dev \tag{2b}$$
then $x_j^s$ and $x_k^s$ are considered insignificant.

3.1.3 COMMON FEATURE SELECTION
A feature is said to be common if it occurs in a number of periods equal to or greater than a given percentage of the total number of input periods. The search for such features is carried out as follows:
1) Starting with the first input period, for each feature of this period, search all subsequent periods for a feature of the same type within a distance D (in samples) of the current position;
2) If more than one suitable feature is found, the one at minimum distance from the current position is taken as the match;
3) If a match is made, the current position is updated and the search resumes on the next period; if no match is found, this period is skipped and the search resumes on the next period from the current position;
4) Steps 1-3 are repeated until all the features of all the periods have been searched.
The lists of common features are used as starting points for the interpolation of each period.

3.2 INTERPOLATION
The purpose of this step is to achieve data reduction, to extract curve parameters for the Resynthesis module, and to parameterise the curves in such a way that their shapes can be compared.
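The spurious-feature test (2a)-(2b) of Section 3.1.2 and the common-feature search of Section 3.1.3 can be sketched together as below. This is a minimal sketch, not the authors' method: the function names are hypothetical, features are compared pairwise rather than as a "group", a zero mean is simply skipped to avoid division by zero, "within a distance D" is read as `<= D`, and only features of the first period start a track.

```python
import numpy as np


def delete_spurious(feats, f, d, dev):
    """Remove pairs of features that are closer than d samples and whose
    amplitudes deviate by less than dev percent from their mean, per (2a)-(2b).

    feats: (index, label) pairs sorted by index; f: the samples of the period.
    Comparing features pairwise is our simplification of the paper's "group".
    """
    f = np.asarray(f, dtype=float)
    drop = set()
    for a in range(len(feats)):
        for b in range(a + 1, len(feats)):
            j, k = feats[a][0], feats[b][0]
            if k - j >= d:
                break  # features are sorted, so later ones are even further away
            mean = (f[j] + f[k]) / 2  # (2a)
            if mean != 0 and abs((f[j] - mean) / mean) * 100 < dev:  # (2b)
                drop.update((a, b))
    return [feat for i, feat in enumerate(feats) if i not in drop]


def track_common(periods, D, min_fraction):
    """Follow each feature of the first period through later periods (steps 1-4).

    periods: one list of (index, label) pairs per period, with indices measured
    from the start of that period. Returns (label, track) pairs, where track
    holds the matched index per period (None where the period was skipped),
    keeping only tracks found in at least min_fraction of the periods.
    """
    tracks = []
    for pos, label in periods[0]:
        track, current = [pos], pos
        for feats in periods[1:]:
            # Step 1: same-type features within D samples of the current position.
            cands = [idx for idx, lab in feats if lab == label and abs(idx - current) <= D]
            if cands:
                current = min(cands, key=lambda idx: abs(idx - current))  # step 2
                track.append(current)                                     # step 3: update
            else:
                track.append(None)                                        # step 3: skip
        found = sum(t is not None for t in track)
        if found >= min_fraction * len(periods):
            tracks.append((label, track))
    return tracks
```

Updating `current` after every match lets a feature drift gradually across periods (for example, under vibrato) while still being followed, which is the point of step 3.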
To this end, straight lines are first drawn between successive common features within a period, and the differences between these lines and the actual sample values are computed as the deviation $d(x_i)$. For a curve $f(x_i)$ of n points $x_0, x_1, \ldots, x_{n-1}$, find the straight line $p^s(x_i) = m x_i + c$ passing through $(x_0, f(x_0))$ and $(x_{n-1}, f(x_{n-1}))$, where
$$m = \frac{f(x_{n-1}) - f(x_0)}{n-1} \tag{3}$$
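The chord of (3) and the deviation it leaves, $d(x_i) = f(x_i) - p^s(x_i)$ as in (4), can be computed with a few lines of NumPy. This is a minimal sketch assuming unit sample spacing with $x_0 = 0$, so that $c = f(x_0)$; the function name is hypothetical.

```python
import numpy as np


def chord_and_deviation(f):
    """Return the chord slope m of eq. (3) and the deviation d(x_i) of eq. (4).

    Sample positions are assumed to be x_i = 0, 1, ..., n-1, so the chord
    p^s(x_i) = m*x_i + c passes through (0, f(x_0)) and (n-1, f(x_{n-1})).
    """
    f = np.asarray(f, dtype=float)
    n = len(f)
    x = np.arange(n)
    m = (f[-1] - f[0]) / (n - 1)  # eq. (3)
    c = f[0]                      # intercept, since x_0 = 0
    d = f - (m * x + c)           # eq. (4)
    return m, d


m1, d1 = chord_and_deviation([0.0, 2.0, 3.0, 6.0])
m2, d2 = chord_and_deviation([0.0, 6.0, 9.0, 18.0])  # the same curve scaled by a = 3
```

Scaling the curve by $a$ scales both the slope and the deviation by $a$ (here `m2 == 3 * m1` and `d2 == 3 * d1`), which is the same-shape property stated in (5a)-(5b).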

$$d(x_i) = f(x_i) - p^s(x_i) \tag{4}$$
Two curves are said to be of the same shape if and only if the following condition holds. For two curves $f_1(x_i)$ and $f_2(x_i)$ of n points with the same starting point $f_1(x_0) = f_2(x_0) = 0$: if $f_1(x_i) = a f_2(x_i)$, where a is a real number, for all $i = 0, 1, \ldots, n-1$, then
$$m_1 = \frac{f_1(x_{n-1})}{n-1} = \frac{a f_2(x_{n-1})}{n-1} = a m_2 \tag{5a}$$
and, since $f_1(x_0) = f_2(x_0) = 0$, $p_1^s(x_i) = a p_2^s(x_i)$; therefore
$$d_1(x_i) = f_1(x_i) - p_1^s(x_i) = a f_2(x_i) - a p_2^s(x_i) = a d_2(x_i) \tag{5b}$$
If two curves have different lengths, the interpolation algorithms can be used to resample them to the same length. Given $d_1(x_i)$, $0 \le i \le n-1$, and $d_2(x_j)$, $0 \le j \le m-1$, find $\hat d_2(x_k)$, $0 \le k \le n-1$, such that
$$\hat d_2(x_0) = d_2(x_0) \tag{6a}$$
$$\hat d_2(x_{n-1}) = d_2(x_{m-1}) \tag{6b}$$
$$\hat d_2(x_k) = d_2\left(x_{k(m-1)/(n-1)}\right), \quad 1 \le k \le n-2 \tag{6c}$$
where in (6c) $d_2$ is evaluated with the interpolant at the generally non-integer position $k(m-1)/(n-1)$.

3.3 ANALYSIS
The function of this module, listed as Data Reduction among the four modules above, is to parameterise the trajectories formed by the positions and amplitudes of the common features across the input periods using straight-line interpolants, following the procedure below:
1) Within each trajectory, find a continuous region that contains no null entries and more than 2 entries;
2) The first and last entries of this region are used as the starting points for interpolation;
3) An error percentage is calculated as
$$E(x_k) = \frac{\left| p(x_k) - f(x_k) \right|}{\left| f(x_k) \right|} \times 100\% \tag{7}$$
for all $k = 0, 1, \ldots, n-1$, where $p$ is the straight-line interpolant; if $E(x_k)$ is larger than a specified percentage, then the $x_k$ with the largest $E(x_k)$ between two interpolated points is chosen as a new point for interpolation;
4) Steps 1-3 are repeated until all trajectories have been parameterised.
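Steps 1-3 of Section 3.3 can be sketched as a recursive-refinement procedure in plain Python. This is a hedged sketch rather than the authors' code: the function names are hypothetical, null entries are represented by `None`, runs of two or fewer entries are simply kept whole (the paper does not say how they are treated), and the $f(x_k) = 0$ case of (7), which the equation leaves undefined, is handled by an arbitrary rule.

```python
def percent_error(p, v):
    """Relative error of eq. (7); the v == 0 case is our own assumption."""
    if v == 0:
        return 0.0 if p == 0 else float("inf")
    return abs(p - v) / abs(v) * 100.0


def simplify_region(values, start, end, tol):
    """Refine straight-line knots on one null-free region until all errors <= tol."""
    knots = [start, end]                                   # step 2
    while True:
        inserted = []
        for a, b in zip(knots, knots[1:]):
            worst, worst_err = None, tol
            for j in range(a + 1, b):
                p = values[a] + (values[b] - values[a]) * (j - a) / (b - a)
                e = percent_error(p, values[j])
                if e > worst_err:
                    worst, worst_err = j, e
            if worst is not None:
                inserted.append(worst)                     # step 3: worst point becomes a knot
        if not inserted:
            return knots
        knots = sorted(knots + inserted)


def simplify_trajectory(values, tol):
    """Return the indices (periods) kept as interpolation points for one trajectory."""
    knots, k, n = [], 0, len(values)
    while k < n:
        if values[k] is None:
            k += 1
            continue
        start = k
        while k < n and values[k] is not None:
            k += 1
        end = k - 1
        if end - start + 1 > 2:                            # step 1: region with > 2 entries
            knots += simplify_region(values, start, end, tol)
        else:
            knots += list(range(start, end + 1))           # assumption: short runs kept whole
    return knots
```

Each pass adds at most one knot per interval, namely the point with the largest error, and at least one new interior index whenever the tolerance is exceeded, so the refinement always terminates.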
3.4 RESYNTHESIS
The Resynthesis module takes the simplified feature trajectories from the Data Reduction module and the curve information from the Interpolation module, and reconstructs the final output according to the following steps:
1) For every segment in a period, calculate its new slope
$$\tilde m = \frac{\tilde f(x_j) - \tilde f(x_i)}{j - i}, \quad 0 \le i < j \le n-1 \tag{8}$$
where $\tilde f$ denotes the feature amplitudes taken from the simplified trajectories;
2) The amplitudes and slopes of the interpolated points are scaled according to
$$\tilde d(x_k) = \frac{\tilde m}{m}\, d(x_k) \tag{9}$$
where m is the original slope (the slopes at the interpolated points are scaled by the same factor). This scaling is chosen so that the new curve has the same shape as the original, as defined in Eq. (5);
3) If the original and reconstructed curves have different lengths, the original curve is first normalised to the new length using Eq. (6);
4) Steps 1-3 are repeated for all input periods.

4 RESULTS AND CONCLUSION
Listening tests have been carried out on the signals from both the Interpolation stage and the Resynthesis stage, with participants who had no special musical background. The results suggest a high level of subjective similarity between the reconstructed signals and the originals, although some high-frequency components are absent because of the smoothing effect of the interpolation algorithms, as can be seen in Fig. 2. More significantly, the subjective similarity between the reconstructed and resynthesised signals is also high, even when a large (30%) interpolation error is allowed in the Analysis module.

Fig. 3 shows the data reduction of the present algorithm with pre-processing applied. The number of points used is significantly reduced: for example, at 25% allowed interpolation error, only around 5% of the total number of samples were used for the 75Hz signal, roughly 8% for the 140Hz signal and around 13% for the 275Hz signal. As before, the greatest increase in data reduction occurs between 5% and 10% allowed error, and the curves gradually level off beyond 25% allowed error.
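Returning to the resynthesis steps of Section 3.4, the rebuilding of one segment between two common features can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names are hypothetical. Linear interpolation (`np.interp`) stands in for the Hermite or clamped-spline resampling of (6). The scaling (9) is applied to every sample of the deviation rather than only to the retained knot values and slopes. A flat original chord ($m = 0$) is left unscaled, since (9) is undefined there. The original segment is resampled before its slope is taken, so that the ratio $\tilde m / m$ compares slopes over the same length.

```python
import numpy as np


def resample(d, new_len):
    """Resample a curve to new_len points, keeping both end points as in (6a)-(6b).

    Linear interpolation at proportional positions is our assumption for (6c).
    """
    d = np.asarray(d, dtype=float)
    old = np.linspace(0.0, 1.0, len(d))
    new = np.linspace(0.0, 1.0, new_len)
    return np.interp(new, old, d)


def resynthesise_segment(f, y_start, y_end, new_len):
    """Rebuild one segment with new end amplitudes and a new length."""
    f = resample(f, new_len)                   # step 3: match lengths via (6)
    x = np.arange(new_len)
    m = (f[-1] - f[0]) / (new_len - 1)         # original chord slope, eq. (3)
    d = f - (m * x + f[0])                     # deviation from the chord, eq. (4)
    m_new = (y_end - y_start) / (new_len - 1)  # new slope, eq. (8)
    scale = m_new / m if m != 0 else 1.0       # eq. (9); flat chords left unscaled (assumption)
    return m_new * x + y_start + scale * d
```

When the length is unchanged, the result is simply the original segment scaled by $\tilde m / m$ and shifted to the new start amplitude, which is the same-shape relation of (5).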
The MSE of the reconstructed signals with pre-processing is higher than without pre-processing (Fig. 4), owing to the much smaller number of interpolated points used when pre-processing is applied. The general trend is an approximately linear increase from 70Hz to 275Hz, with a steeper rate of increase as the allowed interpolation error is relaxed. For example, with pre-processing, the rate of increase at 25% error is roughly 8 times that at 15%. The main conclusion to be drawn from Fig. 3 and Fig. 4 is that for low-frequency signals a high interpolation error is justifiable, since the resulting increase in MSE is small, whereas at higher frequencies there is a significant tradeoff between data reduction and the resulting MSE.

REFERENCES
[1] J-C. Risset & M. Mathews. "Analysis of Musical Instrument Tones", Physics Today, vol. 22, no. 2, 1969, pp. 23-30.
[2] J. Strawn. "Approximation and Syntactic Analysis of Amplitude and Frequency Functions for Digital Sound Synthesis", Computer Music Journal, vol. 4, no. 5, Fall 1980, pp. 671-692.
[3] M-H. Serra, D. Rubine & R. Dannenberg. "Analysis and Synthesis of Tones by Spectral Interpolation", Journal of the Audio Engineering Society, vol. 38, no. 3, March 1990, pp. 111-127.
[4] D. Kincaid & W. Cheney. Numerical Analysis: Mathematics of Scientific Computing, Brooks/Cole Publishing Company, Pacific Grove, California, USA, 1991.

[Fig. 1: Graphical representation of the analysis/resynthesis algorithm.]
[Fig. 2: (a) One cycle of the 75Hz test signal; (b) Hermite interpolation with 25% interpolation error; (c) clamped cubic spline with 25% interpolation error.]
[Fig. 3: Data reduction using Hermite polynomials and clamped splines for the 3 test signals (70Hz, 140Hz, 275Hz); x-axis: allowed interpolation error (%).]
[Fig. 4: MSE for the 3 test signals using Hermite interpolation, with and without pre-processing, at 5%, 15% and 25% interpolation error; x-axis: frequency (Hz).]