Statistical Modeling of Sound Aperiodicities

Shlomo Dubnov and Xavier Rodet
Analysis/Synthesis Team, IRCAM, Paris 75004, France

Abstract

In this work we investigate aperiodicities that occur in the sustained portion of a sound due to phase synchronous versus unsynchronous deviations of the partials. Using an additive sinusoidal model, correlation statistics and phase coupling effects are considered for various instruments. Phase coupling is shown to be an important characteristic of musical instruments. These analyses are compared to results from Higher Order Statistical analysis and are shown to be equivalent. General applications for sound synthesis are discussed and a simulation of phase fluctuations in string instruments is presented.

1 Introduction

Acoustical musical instruments that are considered to produce a well defined pitch emit waveforms which are never exactly periodic. The aperiodicities supposedly originate in some not well understood fundamental mechanism of their sound production. This effect, which for time scales shorter than 100 or 200 ms is beyond the control of the player, is expected to be typical of the particular instrument or perhaps of the instrument family. Among the many possible mechanisms of deviation from periodicity [4][9][10], we analyse two contrasting conditions which appear to be important for sustained musical instruments: phase synchronous deviations of the partials, which are equivalent to period to period variations of an otherwise unchanged waveform, versus phase unsynchronous deviations, which also modulate the shape of the signal on a period to period basis. In an earlier work we have shown that the coherence of phase fluctuations is strongly related to non-linear properties of the time series model of the signal.
These properties are measured by Higher Order Statistics (HOS) or polyspectra [6][7] and were shown to be important for the characterisation of musical instruments in the sustained portion of the sound [1]. It should be noted that the statistical property of coherence/incoherence cannot easily be revealed by other analysis methods. In order to better understand the exact meaning of HOS for musical signals, and to avoid possible sources of error such as residual noise, amplitude modulations, or effects of the decorrelating filter, we use in this work the sinusoidal signal representation, which allows a detailed look at the separate behaviour of each of the sound components.

Combining HOS results (footnote 1) with a parametric model of sound, we extend the earlier research both theoretically and practically. We statistically analyse the fluctuations of the sound parameters that occur during the sustained portion of the sound for the sinusoidal (additive) sound representation: the Second Order Statistics (SOS) of the sinusoidal model phases are compared to the HOS, specifically with respect to the detection of Quadratic Phase Coupling (QPC) (footnote 2). Moreover, for the case of string instruments, we discuss a possible source-filter model that could explain the origin of the phase fluctuations. With a better understanding of the underlying random modulation phenomena we can measure, modify and partially simulate these phenomena for the purpose of resynthesis.

Footnote 1: Since parts of this work continue research on HOS reported in earlier papers, the reader is advised to read these works in conjunction with one another. For the sake of continuity, many of the examples appearing in papers [1], [2] are used in this paper as well.

Footnote 2: It is important to note that the terms SOS and HOS are used in this work in different contexts for the time and frequency domain representations of the signal. SOS refers to the statistics of the sinusoidal model, i.e. correlations between phases, or statistics calculated in the frequency domain, whereas HOS refers to the statistics of the time domain signal. The HOS reveal themselves in the frequency domain as phase coupling phenomena.

2 Overview of the paper

The following issues will be addressed in this paper:
* Phase synchronous and unsynchronous aperiodicities.

* SOS and HOS analysis of a Sinusoidal Model of sound.
* Phase fluctuations and the phenomenon of Phase Coupling.
* Comparative analysis of different instrumental sounds.
* Source-filter model for the phase fluctuations in the violin.

Results of real and synthetic sound analyses will be detailed in the paper. Examples of sound synthesis will be demonstrated in the presentation.

3 Phase synchronous and unsynchronous aperiodicities

Among the many possible mechanisms of deviation from periodicity that may occur in the sustained portion of a pitched sound, we analyse two extreme cases:
1. application of a synchronous and proportional random modulation to the phases of each partial;
2. application of random and unsynchronous phase modulations to each phase.

In case 1 the instantaneous harmonicity relations between the partials are preserved, which corresponds to local time stretching/contraction that affects the period of the signal without otherwise altering its waveform. As demonstrated in Figure 1 for a signal constructed from 8 equal amplitude harmonic cosine functions (footnote 3), the small period variations, although not changing the basic waveform, significantly spread the spectral peaks of the partials, turning them almost into noise for partials higher than 3 or 4. For the unsynchronous case (case 2) we observe, for the same signal, that the original pulse-like shape of the waveform is completely distorted, giving visually the impression of pulses submerged in high level noise. This noise effect is also seen in the signal's spectrum. These two types of behaviour seem to be typical of different instruments, as will be shown below.

Footnote 3: i.e. a band limited pulse train.

Figure 1: Synchronously modulated harmonics. The waveform is preserved and the period is jittered. Note also the spread in bandwidth of the partials.

4 SOS and HOS in the context of sinusoidal model

In order to understand the phase behaviour of the partials of different instruments, and to observe the effect of phase synchronicity, we apply HOS analysis to a sinusoidal (additive) model. Taking the phases of a signal, derived by additive analysis, we look at the instantaneous harmonicity among groups of partials. For a triplet of harmonically related partials i, j and k = i + j, a "synchronous" phase behaviour means that the respective phases $\phi_i$, $\phi_j$, $\phi_k$ obey the relation $\phi_i + \phi_j - \phi_k = 0$, i.e. any deviations that occur in $\phi_i$ and $\phi_j$ sum up to occur identically in $\phi_k$, up to a constant additive term given by the initial phase of each partial. If a small error between the respective phases occurs, this error will either be bounded (and periodic) or propagate in time, passing through all phase values in the $[0, 2\pi]$ range. A convenient method to evaluate the effect of this phase error is to integrate it over time as a complex exponential:

$$ d_3(i,j) = \left\langle e^{\mathrm{i}(\phi_i + \phi_j - \phi_k)} \right\rangle_t \approx \frac{1}{N_F} \sum_{n=1}^{N_F} e^{\mathrm{i}(\phi_i(n) + \phi_j(n) - \phi_k(n))} \qquad (1) $$

where $\langle\cdot\rangle_t$ denotes the time average for a continuous signal, $N_F$ is the number of analysis frames from the additive analysis, and $\mathrm{i}$ in the exponent is the imaginary unit. When the exponent is identically 0, $d_3$ is identically equal to 1. If the phase difference oscillates in a limited range, the resulting $d_3$ converges to a value between 0 and 1. If the error spreads over the whole $[0, 2\pi]$ range, $d_3$ converges to 0 at a rate proportional to the rate of propagation of the error. In terms of HOS theory, one might show that equation (1) equals the third order moment of a

decorrelated (footnote 4) signal $x(t)$, which also amounts to the integral over the signal's bispectrum (footnote 5) $B(\omega_1, \omega_2)$:

$$ M_3 = \frac{1}{(2\pi)^2} \iint B(\omega_1, \omega_2)\, d\omega_1\, d\omega_2 = \langle x^3(t) \rangle = \lim_{T \to \infty} \frac{1}{T} \int_0^T x^3(t)\, dt \qquad (2) $$

Similar equations hold for the fourth order moment as well. Based on the above equivalences we can directly compare the HOS properties that are calculated as functions of time with the calculation that sums the different exponential phase errors $d_3(i,j)$ of equation (1). Naturally, equation (1) is also an approximate method to calculate the bispectrum of a sinusoidally represented signal, with $d_3(i,j) \approx B(\omega_i, \omega_j)$, $\omega_i = i\,\omega_0$, under the assumption of a harmonic signal with fundamental radial frequency $\omega_0$.

Footnote 4: i.e. a signal that has all its partial amplitudes approximately equal to 1. The differences in $x(t)$ that arise from different decorrelation methods are discussed further in the text.

Footnote 5: $B(\omega_1, \omega_2) = \langle X(\omega_1) X(\omega_2) X^*(\omega_1 + \omega_2) \rangle$

Figure 2: Non-synchronously modulated harmonics. The waveform is not preserved. Spread in bandwidth of the partials occurs.

4.1 Additive SOS and HOS analysis

In this section we will demonstrate how polyspectral (HOS) properties of the signal emerge as the result of phase coupling between the partials (or how they are destroyed when this necessary coupling is lacking). This triplewise phase coupling, also called Quadratic Phase Coupling (QPC), is compared to the pairwise correlation properties (Second Order Statistics) of the same signal. One finds, surprisingly enough, that these are complementary properties that point at different aspects of the sound's behaviour.

In order to calculate the correlation properties of the phases, a careful unwrapping must be done. This involves accurate estimation of the fundamental frequency for the unwrapping and also subtraction of the ideal estimated phase from the unwrapped values (detrending), so as to obtain the phase fluctuations. This step is necessary for actually looking at the phase variations, and it eliminates effects such as vibrato or pitch deviations. The QPC was calculated according to equation (1). The following figure presents the phase correlation and QPC for Cello and Trumpet sounds.

[Figure 3 panel title: Cello Additive Phase Correlation]

Figure 3: Cello (left) and Trumpet (right) SOS and QPC analyses. Phase SOS plotted on top, QPC on the bottom. White corresponds to 1, or high correlation/coupling. Black corresponds to 0.

As can be seen from the figures, the two instruments exhibit very different SOS and QPC behaviour. Apparently for the cello, more correlation exists between the phase fluctuations, with much less phase coupling, while for the trumpet an almost perfect coupling exists with no phase correlations. This point, seemingly paradoxical at first glance, must be properly understood. As mentioned above, the phase correlations are meaningful only for unwrapped and detrended phase signals. QPC, on the other hand, needs neither detrending nor unwrapping, since it is calculated directly as a phase difference, so that the phase increments due to the frequencies of the partials cancel out (under the assumption of an approximately harmonic signal in terms of frequency). Thus, the effective signal components in both cases (QPC and Phase SOS) are the instantaneous deviations from perfect harmonicity. Since the trumpet sound is almost perfectly harmonic, these deviations are close to zero (QPC ≈ 1) and the error signal left after unwrapping becomes a small uncorrelated noise (Phase SOS ≈ 0). For the cello, on the other hand, the deviations from harmonicity are large and out of phase. This causes the QPC to average out to zero for most of the partial pairs, while a significant correlation remains for most of the partials since these deviations are governed by a common vibrato. Actually the determining factor in the correlation analysis is the relative shift in the phase deviations, which makes the cello harmonic partials rotate in or out of phase. This is significantly different from the QPC effect, which is sensitive to the actual cancelling out of the phases for partials that are perfectly harmonic, up to some small and bounded fluctuations. For a more detailed analysis of this effect, let us take a closer look at the phases of these two signals.

4.2 A detailed look at phase fluctuations and coupling

Figure 4 describes the phase behaviour of the cello and trumpet signals. Detrended phases of partials for the two signals are presented.

[Figure 5 panels: Phases of partials 2, 4 and 6 of Cello; Phase difference (phase 2 + phase 4 - phase 6); Phases of partials 6, 8 and 14 of Cello; Phase difference (phase 6 + phase 8 - phase 14)]

Figure 5: Phase behaviour of various partials of the cello. Top: partials 2, 4, 6. Bottom: partials 6, 8, 14. The left side shows the phases of the partials and the right side shows the phase triplet difference (see text).

[Figure 6 panels: Decay of QPC for partials (2,4) of Cello; Decay of QPC for partials (6,8) of Cello]

Figure 4: Detrended phases of partials of the cello (top) and trumpet (bottom) signals. Note the difference in the absolute value of the phase fluctuations.
The trumpet deviations from harmonicity are two orders of magnitude smaller than those of the cello.

In order to take a look at the QPC behaviour, the decay of $d_3(i,j)$ is presented as a function of time, as the phase difference error develops. Figure 5 shows the phases and the phase difference for two pairs of partials of the cello: partials (2,4) (i.e. QPC between 2, 4, 6) and (6,8). Given the difference in the range of the difference function, the QPC decay follows accordingly in Figure 6.

Figure 6: QPC of the cello partials for pairs (2,4) (left) and (6,8) (right). The graph shows the decay of QPC as a function of analysis frames (time).

Comparing this to the trumpet, an astounding difference is found. Figure 7 shows the phase difference and QPC decay for partial pair (6,8) of the trumpet. Although the sum partial 14 is rather high (around 3654 Hz), it is perfectly synchronized in phase with the lower ones.

5 Kurtosis Calculation

Due to the importance of the phase coupling phenomena, we present in this section some comparative kurtosis values for various instruments. These values are compared to similar kurtosis values that were reported in earlier work [1] using the time domain calculation. The details of the calculation method for the sinusoidal model are deferred to the appendix.
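Since the kurtosis values below rest on the phase coupling measure, a minimal sketch of equation (1) and of its decay over analysis frames (Section 4.2) may be helpful at this point. It is only an illustration on synthetic phases, not the analysis code used for the figures: the function names `phase_error`, `qpc` and `qpc_decay`, the Gaussian random-walk jitter model and all numeric settings are assumptions introduced here.

```python
import numpy as np


def phase_error(phases, i, j):
    """Triplet phase error phi_i + phi_j - phi_(i+j) for every analysis frame.

    `phases` has shape (n_frames, n_partials); column p - 1 holds the
    phase of partial p in radians.
    """
    return phases[:, i - 1] + phases[:, j - 1] - phases[:, i + j - 1]


def qpc(phases, i, j):
    """Equation (1): average of exp(1j * phase error) over all frames."""
    return np.mean(np.exp(1j * phase_error(phases, i, j)))


def qpc_decay(phases, i, j):
    """Magnitude of the running average after 1, 2, ..., NF frames."""
    phasors = np.exp(1j * phase_error(phases, i, j))
    return np.abs(np.cumsum(phasors) / np.arange(1, len(phasors) + 1))


# Synthetic phase tracks (assumed jitter model: Gaussian random walks).
rng = np.random.default_rng(0)
n_frames, n_partials = 2000, 16
harmonics = np.arange(1, n_partials + 1)

# Case 1: one common jitter, scaled by the harmonic number (synchronous).
common = np.cumsum(rng.normal(0.0, 0.3, n_frames))
sync = common[:, None] * harmonics[None, :]

# Case 2: an independent jitter for every partial (unsynchronous).
unsync = np.cumsum(rng.normal(0.0, 0.3, (n_frames, n_partials)), axis=0)

print("synchronous   |d3(2,4)| =", abs(qpc(sync, 2, 4)))
print("unsynchronous |d3(2,4)| =", abs(qpc(unsync, 2, 4)))
```

Because only the combination of the three phases enters the exponential, wrapped phases can be passed in directly, in line with the remark in Section 4.1 that QPC needs neither unwrapping nor detrending. Plotting `qpc_decay(unsync, 2, 4)` against the frame index gives a curve of the kind described for the cello, while the synchronous case stays at 1.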

[Figure 7 panels: Trumpet phase difference (ph. 6 + ph. 8 - ph. 14); Decay of QPC for partials (6,8) of Trumpet]

Figure 7: Phase difference and the resulting decay in QPC of the trumpet. There is essentially no decay!

Figure 8 presents the values of kurtosis calculated in the additive model. The original kurtosis/skewness values are presented in Figure 9. Comparing the kurtosis values reveals that while for most of the instruments the values remain close in the two methods, some instruments (mainly the woodwind family) undergo greater changes. One possible explanation for this phenomenon is the influence of amplitude on the kurtosis of the decorrelated signal calculated by the time domain method. Since in that case the decorrelation is done by inverse filtering of the signal with a filter that matches the spectral envelope (LPC order 16), this decorrelation method does not amplify to magnitude 1 weak harmonics that lie between other strong harmonics, which is the case for the odd harmonics in the clarinet sounds. Thus, the time domain kurtosis for the clarinet misses most of the odd harmonics and its value is halved relative to the additive method of kurtosis calculation.

[Figure 8 panels: Additive Model Kurtosis; Additive Model Kurtosis (Zoom In). Legible instrument labels include Tpt.S, Tpt.H, Trbn1, Trbn2, FrHor, Ob.nV, T.Sax, A.Sax, Ob.Vi, C.Fag, Clari, B.Sax, Fagot, Cello, Viola, Flute.]

Figure 8: Kurtosis values calculated using the additive representation. The relative horizontal displacement (along the x-axis) of the different instruments is for clarity of reading only and has no meaning.

5.1 Subband dependence of Kurtosis values

A simple method that we found useful for characterising the kurtosis behaviour of various signals is the growth of kurtosis values as a function of the signal's subband range. This analysis can be done in the additive model representation by progressive summation of the partials in the $d_4(i,j,k)$ calculation (footnote 6). It can also be done effectively in the time domain by successive decimation of the decorrelated signal. These results (using the time domain calculation) are summarized in Figure 10 for four signals: Trumpet, French Horn, Oboe and Cello (footnote 7). An important observation evident from these graphs is that the kurtosis is not a monotonic function of the frequency band. Since polyspectra ($d_k$ functions) are complex expressions, the integral over all frequencies (sum over all partials) depends on the relative phases of the polyspectra ($d_k$) values among the different frequencies (partials). This effect, being important for determining the value of the higher order moments and specifically the kurtosis, seems to be an additional characteristic beyond the phase coupling phenomena above. It might be interesting to consider the relation between this effect and the constancy/variability property of the HOS, which plays an important part in the time series Gaussianity/linearity/non-linearity test [2][3][8].

Footnote 6: See Appendix.

Footnote 7: The respective sample names corresponding to the earlier figures are Tpt.S (S - soft), FrHor, Ob.nV (nV - no vibrato) and Cello.
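The dependence of kurtosis on the number of partials included, and on their relative phases, can be illustrated on a synthetic decorrelated signal. The sketch below is a stand-in under stated assumptions: rather than the $d_4$ sum of the appendix or the successive decimation used for Figure 10, it resynthesizes unit-amplitude harmonics one at a time and measures the sample kurtosis in the time domain. The function names `kurtosis` and `subband_kurtosis`, the fixed phase offsets and the segment length are hypothetical choices made for the example.

```python
import numpy as np


def kurtosis(x):
    """Variance-normalised fourth moment mu_4 / sigma^4 of a sampled signal."""
    x = x - np.mean(x)
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2


def subband_kurtosis(phase_offsets, n_samples=4096, cycles=64):
    """Kurtosis after summing partials 1..q, for q = 1..len(phase_offsets).

    Every partial has unit amplitude (a "decorrelated" signal) and a fixed
    phase offset. `cycles` is the number of fundamental periods in the
    analysed segment (an arbitrary choice for this sketch).
    """
    t = np.arange(n_samples) / n_samples
    x = np.zeros(n_samples)
    values = []
    for q, offset in enumerate(phase_offsets, start=1):
        x = x + np.cos(2.0 * np.pi * q * cycles * t + offset)
        values.append(kurtosis(x))
    return np.array(values)


n_partials = 8
# All offsets zero: the band limited pulse train of Section 3.
aligned = subband_kurtosis(np.zeros(n_partials))
# Arbitrary fixed offsets: same amplitudes, different relative phases.
rng = np.random.default_rng(1)
scattered = subband_kurtosis(rng.uniform(0.0, 2.0 * np.pi, n_partials))

for q in range(n_partials):
    print(q + 1, round(float(aligned[q]), 3), round(float(scattered[q]), 3))
```

A single sinusoid has kurtosis 1.5, so both curves start there. With aligned phases the value grows as partials are added, whereas with scattered offsets it can never exceed the aligned value: the second moment is the same in both cases, and the fourth moment is a sum of unit phasors that add fully coherently only when the phases line up. This is one simple way to see why the sum over partials depends on the relative phases of the individual terms.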

[Figure 9 panels: kurtosis plotted against skewness for the analysed instruments, with a zoomed-in view]

Figure 9: Kurtosis values as calculated in the time domain, with decorrelation done by inverse filtering of the spectral envelope.

[Figure 10 panel title: Kurtosis dependence on subband; curves: Trumpet, French Horn, Oboe, Cello]

Figure 10: Kurtosis as function of subband for four signals: Trumpet, French Horn, Oboe and Cello.

6 Application to synthesis and sound transformations

Audibly interesting effects are achieved by modifying the phase behaviour of a real signal. For instance, interchanging the phases of two extreme instruments results in a sound that can hardly be identified as either of the two original sounds, but whose timbral effect on the one hand and fine temporal behaviour on the other are each clearly typical of one of the originals. So, taking the phases of the trumpet with the spectral amplitudes of the cello results in a very synthetic sounding cello that lacks all the "grainy" properties of the cello sound. Conversely, applying the cello phases to the trumpet gives a result that sounds very typical of a cello in terms of the vibrato and the phase behaviour, but whose "color" is very "trumpetish".

Another important application that requires a good understanding of the phase behaviour is morphing between sinusoidally modeled sounds. In the following figure we present analyses of clarinet sounds in pp, mf and ff playing conditions.

[Figure 11 panels: three spectra plotted against frequency (Hz)]

Figure 11: Clarinet's F3 (sounding) in three different dynamics, from left to right: ff, mf, pp. One can note the difference in energy and in the "noise floor" level among the three spectra.

As seen from the spectrum, the "noise floor" for the pp sound starts as early as the eighth partial, although in terms of amplitude the contribution of at least some of the higher partials to the resulting timbre is still significant. Looking at the ff sound we observe that well defined partials exist up to the 40th partial. Trying to morph the pp and ff sounds, a close approximation to the mf spectral amplitude can be achieved by a properly weighted average of the amplitudes of the ff and pp sounds. In terms of the fine temporal behaviour, though, the situation is completely different. The frequency morphing fails due to the differences in the frequency variations (jitters). In the morphed result we observe a strong jitter of the mid-high partials that occurs due to the interpolation of the frequency behaviour of the strongly jittered partials (partials close to the noise floor) in the pp sound with the rather stable and strong ff partials. This creates an undesirable effect which was not present in the original mf sound.

7 Modeling of the phase unsynchronous phenomena

Trying to understand the origin of the very unsynchronous behaviour of the phases in string instruments, we have taken a close look at the temporal variations of the cello signal. Since such a sound could in principle be modeled by a harmonic comb excitation, the strange phase behaviour could be assumed to result from filtering an excitation of slowly varying frequency (footnote 8) by the filter of the instrument body. If the instrument's body response at the frequencies of the partials is composed of very close and narrow peaks, two situations might occur:
1. Relative phase shifts of close to π could exist between partials when the peaks of the body filter have opposite phase behaviour.
2. A jump in phase would occur for a single partial when it passes through very closely located peaks of the body filter.

The first case could be simulated by having harmonic time varying partials which move on the opposite slopes of very narrow resonators [5]. For the second case, a single time varying sinusoid successively excites a pair of very close and narrow filters, with both peaks present when the excitation is between the two filters. In terms of phase, a jump could again occur in the middle region, where descending on one filter and ascending on the other brings the phase difference close to 2π. We expected this effect, at least for the more significant aspects of the second case, to be visible in the spectrum as well. Figure 12 presents a high resolution analysis of a cello sound around its fifth partial.
Surprisingly enough, we find that instead of one peak we have two very close peaks, with their average frequency lying at the expected harmonic. The additive analysis in such a case would capture this as a single partial with widely varying phase.

Footnote 8: This could be a harmonic signal which does not have a perfectly stable f0, such as one having a vibrato.

Figure 12: The occurrence of double spectral peaks instead of a single partial in a real cello signal (recorded with a close microphone to eliminate room effects).

Simulation of this phenomenon was done using a perfectly harmonic excitation whose fundamental frequency variation was taken from the fundamental frequency analysis of the original cello, and a bank of closely spaced (30 Hz) and narrow band (BW = 10 Hz) filters, placed in the middle of the vibrato range for each of the simulated harmonics. The following result shows the phases and the QPC of the simulated signal for the partial pair (2,4).

[Figure 13 panels: Phases 2, 4 and 6 of the simulation; Decay of QPC (2,4) of the simulation]

Figure 13: Phases 2, 4, 6 and QPC decay of the simulation.

8 Extensions and Further Work

One of the future goals of this work is to formulate a general statistical model for the variations occurring in sustained sounds. The presented results suggest a model that would incorporate both

SOS and HOS information, i.e. SOS modeling of the amplitude and fundamental frequency modulations, with HOS modeling of the phase coupling and fine temporal frequency phenomena. In order to achieve this more complete model, the SOS behaviour of the amplitudes and also the cross correlations between amplitudes and frequencies must be considered.

Acknowledgments

We would like to thank Marcelo Wanderley and Dominique Virolle for their assistance in this work.

Appendix: Calculation of the Kurtosis in the Additive Model

Kurtosis of a time domain signal is defined as $\gamma_4 = \mu_4 / \sigma^4$, which is the variance normalized version of the fourth order moment $\mu_4 = E(x - Ex)^4$. The calculation of kurtosis for the Additive Model is done by calculating the sum over all partial triplets

$$ \sum_{i=1}^{Q} \sum_{j=1}^{Q} \sum_{k=1}^{Q-(i+j+k)} d_4(i,j,k) \qquad (3) $$

with $Q$ being the number of the partials and $d_4(i,j,k)$ defined by

$$ d_4(i,j,k) = \frac{1}{N_F} \sum_{n=1}^{N_F} e^{\mathrm{i}(\phi_i(n) + \phi_j(n) + \phi_k(n) - \phi_l(n))} \qquad (4) $$

with $\phi_i(n)$ being the different phase values, $l = i + j + k$, and $N_F$ the total number of analysis frames from the additive analysis. The third sum in equation (3) is limited to $Q-(i+j+k)$ to avoid summing non-existent partials higher than $Q$.

Although this expression seems very similar to the trispectral equation

$$ R_4(0,0,0) = \frac{1}{(2\pi)^3} \iiint X(\omega_1) X(\omega_2) X(\omega_3) X^*(\omega_1 + \omega_2 + \omega_3)\, d\omega_1\, d\omega_2\, d\omega_3 - 3 R_2^2(0) $$

with $X(\omega)$ being the Fourier Transform of $x(t)$ and $R_4(0,0,0)$ and $R_2(0)$ being the fourth and second order cumulants, respectively, the two differ in several respects. Since we are dealing with real signals, each positive frequency of $X(\omega)$ has its negative counterpart with negative phase. Thus, the actual kurtosis calculation takes into account both the negative and positive phase cancellations, which need not, and cannot, be calculated in the additive case since all phases are forced to be positive. Moreover, since the integration in the real signal's case is over both the positive and the negative frequencies, this adds a factor of 8 to the sum. Since the contribution of each positive and negative phase component in the real signal is also 1/2, this gives an additional factor of 1/8, and the two cancel out. Nevertheless, one must note that one important symmetry is missing: the case of triplets that have two indices equal but with opposite sign. For instance, if $j = -k$, then $\phi_i(n) + \phi_j(n) + \phi_k(n) - \phi_l(n) = 0$, since $\phi_j(n) = -\phi_k(n)$ and $\phi_l(n) = \phi_i(n)$. This means that for the 3 regions that have one of their axes negative, a factor of $Q/2$ is added. Since the definition of the fourth cumulant requires the subtraction of $3 R_2^2(0)$ (the equivalent expression given here is garbled in the source), these factors cancel out. To conclude this series of arguments, we summarize that the calculation of equation (3), which is performed by summing $d_4$, is approximately equivalent to the fourth order cumulant and thus equals Kurtosis - 3.

References

[1] S. Dubnov, N. Tishby, D. Cohen, Investigation of Frequency Jitter Effect on Higher Order Moments of Musical Sounds with Applications to Synthesis and Classification, in Proceedings of the International Computer Music Conference, Hong Kong, 1996; revised and extended version to appear in Journal of New Music Research.
[2] S. Dubnov, N. Tishby, Testing for Non linearity and Gaussianity in Musical Signals, Proceedings of Journees d'Informatique Musicale, Caen, 1996.
[3] M.J. Hinich, Testing for Gaussianity and Linearity of a Stationary Time Series, Journal of Time Series Analysis, Vol. 3, No. 3, 1982.
[4] M.E. McIntyre, R.T. Schumacher, J. Woodhouse, Aperiodicity in bowed string motion, Acustica 49, 13-32, 1981.
[5] S. McAdams, X. Rodet, The role of FM-induced AM in dynamic spectral profile analysis, in H. Duifhuis, J. Horst, H. Wit (eds.), Basic Issues in Hearing, Academic Press, London, 1988, pp. 359-369.
[6] J.M. Mendel, Tutorial on Higher-Order Statistics (Spectra) in Signal Processing and System Theory, Proceedings of the IEEE, Vol. 79, No. 3, July 1991.
[7] C.L. Nikias, J.M. Mendel, Signal Processing with Higher-Order Spectra, IEEE Signal Processing Magazine, July 1993.
[8] M.B. Priestley, Non-Linear and Non-Stationary Time Series Analysis, Academic Press, 1989.
[9] R.T. Schumacher, Analysis of aperiodicities in nearly periodic waveforms, Journal of the Acoustical Society of America, 91 (1), January 1992.
[10] P. Vettori, Fractional ARIMA Modeling of Microvariations in Additive Synthesis, Proceedings of the XI Colloquium on Musical Informatics, Bologna, 1995.