CAUSAL/ANTICAUSAL DECOMPOSITION FOR MIXED-PHASE DESCRIPTION OF BRASS AND BOWED STRING SOUNDS

Nicolas d'Alessandro, Alexis Moinet, Thomas Dubuisson, Thierry Dutoit
Faculté Polytechnique de Mons (Belgium)
Circuit Theory and Signal Processing Laboratory

ABSTRACT

In this paper we present a new method for the decomposition of musical sounds into causal and anticausal contributions. The algorithm is based on a particular separation of the zeros of the Z-Transform (ZZT) of signal frames, currently used in speech processing for the estimation of glottal source parameters. The acoustics of two particular continuous interaction instruments (CII), trumpet and violin, is discussed and the use of a mixed-phase model for synthesis is proposed. First results are presented and the relevance of the extracted causal and anticausal contributions is evaluated.

1. INTRODUCTION

It has historically been considered that only magnitude spectrum information was relevant to correctly characterize the perceptual aspects of a sounding system: energy in different frequency ranges, periodicity of the partials, noise shape, etc. Based on this assumption, it was supposed that every sound signal could effectively be approximated by the response of minimum-phase digital filters to impulse-plus-noise excitations. This approach has characterized both speech and music subtractive (or so-called source/filter) synthesis for a long time [1, 2].

However, recent research in speech processing revealed that the minimum-phase representation drops an important part of the excitation information [3]. Indeed, glottal flow waveforms present typical "divergent" characteristics, due to physical properties of the larynx, and a stable minimum-phase model cannot produce that kind of waveform. Consequently, anticausal (or maximum-phase) components have to be introduced in order to locally model divergent oscillations in a stable way [4].
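A minimal Python/NumPy sketch can illustrate this last point, under assumptions that are ours rather than the paper's: a two-pole resonator stands in for a stable causal system, and the helper name `resonator_impulse_response` and all numerical values are hypothetical. Time-reversing the decaying impulse response yields a growing, abruptly truncated oscillation with the same magnitude spectrum, so only the phase distinguishes the two.

```python
import numpy as np


def resonator_impulse_response(freq_hz, bandwidth_hz, fs, length):
    """Impulse response of the stable two-pole filter
    1 / (1 - 2 r cos(theta) z^-1 + r^2 z^-2), truncated to `length` samples."""
    r = np.exp(-np.pi * bandwidth_hz / fs)   # pole radius (< 1, hence stable)
    theta = 2.0 * np.pi * freq_hz / fs       # pole angle
    n = np.arange(length)
    return r**n * np.sin((n + 1) * theta) / np.sin(theta)


# Hypothetical values, chosen only for illustration.
fs = 16000
causal = resonator_impulse_response(200.0, 100.0, fs, 256)  # decaying oscillation
anticausal = causal[::-1]                                    # growing, truncated oscillation

# Time reversal leaves the magnitude spectrum unchanged...
mag_causal = np.abs(np.fft.rfft(causal))
mag_anticausal = np.abs(np.fft.rfft(anticausal))
same_magnitude = np.allclose(mag_causal, mag_anticausal)

# ...but moves the energy towards the end of the frame ("divergent" shape).
half = len(causal) // 2
energy_ratio = np.sum(anticausal[half:] ** 2) / np.sum(anticausal[:half] ** 2)
print(same_magnitude, energy_ratio > 1.0)
```

A magnitude-only (minimum-phase) description cannot tell these two waveforms apart, which is the motivation for keeping the phase information.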
This mixed-phase representation of the speech signal is illustrated in Figure 1.

Figure 1. Speech frame (right) as the convolution of anticausal (glottal flow derivative, left) and causal (vocal tract response, center) components.

In this paper, we describe how the mixed-phase approach can be extended to some typical continuous interaction instruments (CII): brass and bowed string. The aim of this work is to better characterize CII waveforms through a coherent magnitude and phase spectral representation. In section 2 the mixed-phase aspects of CII sounds are justified. Then a causal/anticausal decomposition method based on the zeros of the Z-Transform (ZZT) is described in section 3, and results on typical CII sounds, trumpet and violin, are presented in section 4. Finally mixed-phase synthesis issues are discussed in section 5.

2. CAUSALITY OF SUSTAINED SOUNDS

Closer observation of the physics of the vocal folds leads us to consider that some kinds of divergent oscillations can appear in vibrating systems other than the voice, and particularly in CII. Indeed, vocal fold movements can be seen as sequences of two generalized phenomena. On the one hand, an opening phase: a progressive displacement of the system from its initial state, resulting from a combination of continuous external forces and inertial reaction. On the other hand, a closing phase: a sudden return movement, appearing when the previously constrained system reaches its elastic displacement limit.

This opening/closing sequence can easily be identified in typical CII excitation mechanisms, like brass or bowed string. Analogies between vocal folds and lips at a mouthpiece are particularly clear. High pressure inside the mouth creates the same constrained displacements and quick returns of the lips [5]. Modulations are achieved by the embouchure of the musician.
Moreover, the pressure measured at the mouthpiece shows anticausal aspects similar to those of the glottal flow [6] (cf. Figure 2a).

The literature on bowed string modeling assumes that the bow-string interaction follows a periodic and distributed stick-slip scheme. Recently Serafin proposed a dynamic elasto-plastic friction model [7]. This approach gives a bristle interpretation of the friction mechanism, which is itself locally represented by a "spring and damper" system. Thus stick-slip sequences should imply anticausal oscillations (cf. Figure 2b).

3. ZZT-BASED DECOMPOSITION

In this part we describe a method that Bozkurt et al. [8] successfully applied to speech for the separation of causal

and anticausal components. The method is based on the separation of the zeros of the Z-Transform (ZZT) lying inside and outside the unit circle. Our purpose is to extend and generalize this method to typical CII sounds such as trumpet and violin.

Figure 2. Pressure at the mouthpiece of a trombone (a) and relative string-bow speed for violin (b), as functions of time, revealing particular anticausal (divergent) components [5].

3.1. Definition

The Z-Transform polynomial X(z) of a discrete-time signal x(n) of length N is defined as:

\[ X(z) = \sum_{n=0}^{N-1} x(n)\, z^{-n} = G\, z^{-N+1} \prod_{m=1}^{N-1} (z - Z_m) \tag{1} \]

where G is equal to the first non-zero value of x(n). The Zeros of the Z-Transform are the roots Z_m of X(z). They can be represented in the complex plane using either Cartesian or polar coordinates. Figure 3 illustrates the ZZT of a typical mixed-phase signal in Cartesian coordinates.

Figure 3. Cartesian representation of the zeros of the Z-Transform of a synthesized mixed-phase (both causal and anticausal contributions) signal frame.

3.2. Causal/Anticausal Separation

In section 2 we justified the use of a mixed-phase model in the production of typical CII sustained sounds. According to this model, these sounds can be represented by the filtering of a periodic and mainly anticausal source signal by some causal system. If we display the ZZT of a stable causal system in the Z-plane (complex plane), its roots lie inside the unit circle, while the Z-plane representation of an anticausal signal shows roots only outside the unit circle (cf. Figure 3). Therefore we can easily separate both contributions by sorting the ZZT of a signal into two sets, the inner roots (Z_i) and the outer roots (Z_o). Then we can rebuild the DFT of the two contributions, X_c for the causal contribution and X_a for the anticausal contribution, using equations (2) and (3) respectively:

\[ X_c(e^{j\varphi}) = G\, e^{-j\varphi (N_i - 1)} \prod_{m=1}^{N_i - 1} \left(e^{j\varphi} - Z_m^{i}\right) \tag{2} \]

\[ X_a(e^{j\varphi}) = e^{-j\varphi (N_o - 1)} \prod_{m=1}^{N_o - 1} \left(e^{j\varphi} - Z_m^{o}\right) \tag{3} \]

where Z_m^i and Z_m^o denote the roots of the inner and outer sets, which contain N_i - 1 and N_o - 1 roots respectively (so that N_i + N_o - 1 = N).

From these DFT representations, it is then straightforward to recover the time evolution of these components by applying an Inverse Fast Fourier Transform (IFFT).

3.3. Data Preparation

In order to properly achieve the ZZT-based separation described in the previous section, some preliminary conditions have to be respected. First, all analysis frames have to be centered on the Return Phase Onset (RPO) of the anticausal contribution¹. We used the algorithm described in [9] to detect the RPOs for every period in CII sound recordings. Second, the analysis window length should be about twice the fundamental period of the signal (2 T_0) at the considered RPO. Finally, proper windowing such as Blackman or Hann-Poisson should be used. Indeed the choice of the window has the strongest impact on the position of the ZZT [8].

¹ In speech analysis, this particular instant is usually named the Glottal Closure Instant (GCI). As a physical source signal could not be justified identically in CII sound production, this term had to be generalized.

4. RESULTS

4.1. Database

In order to evaluate the decomposition possibilities on typical CII sounds, a large number of recordings have been collected, targeting two instruments: trumpet and violin. Trumpet sounds were recorded at TCTS Lab. Recording equipment and conditions were formalized. Sound production techniques were commented on by the player, to allow us to highlight possible correlations. Violin sounds

are part of the database from the University of Iowa Electronic Music Studios [10]. All files have been recorded and analysed in CD quality: 16 bits / 44100 Hz.

4.2. Trumpet

4.2.1. Embouchure effect

First, two kinds of trumpet sounds were analysed. The first one was identified by the player as a lax (also opened, round) production mechanism, the second one as pressed (also closed, thin). A frame was selected in each sound and the results of the decomposition are presented in Figure 4.

Figure 4. Diagrams (a) & (c) show anticausal parts, diagrams (b) & (d) causal parts of two different trumpet sounds: the first row is the lax sound, the second row the pressed sound. Horizontal axes: time (samples).

In the time domain, anticausal and causal contributions show similarities with typical speech decomposition results (cf. Figure 1). Indeed anticausal waveforms look like truncated unstable oscillations, as described in [4]. In the same way, causal parts can be interpreted as the impulse response of a linear all-pole filter.

The difference between the two kinds of production is more obvious in the spectral domain. In Figure 5, spectral envelopes of the above-mentioned signals (lax/pressed decompositions) are presented. We can see that pressed production is characterized by a shift of the anticausal formant² to higher frequencies. In the causal part, we can see more energy in high frequencies, while the causal formant remains at the same position.

Figure 5. Spectral envelopes of anticausal (a) and causal (b) contributions, for trumpet sound production with lax (solid) and pressed (dashed) embouchure. Horizontal axes: frequency (Hz).

4.2.2. Intensity effect

Second, a longer trumpet sound, corresponding to a continuous timbre modification, was analysed. The player was asked to produce an increasing-decreasing intensity.
In order to emphasize the spectral impact of this performance, two spectrograms were computed. They show the evolution of the magnitude spectrum of the anticausal (Figure 6a) and causal (Figure 6b) parts of the sound.

² Using "formant" in this context, we are also generalizing terms coming from speech processing: glottal formant and vocal tract formants.

Figure 6. Normalized spectrograms of anticausal (a) and causal (b) contributions of a trumpet sound corresponding to an increasing-decreasing intensity.

These spectrograms illustrate that the increasing intensity performed by the player provokes a displacement of both anticausal and causal formants towards higher frequencies. In the context of our mixed-phase approach, the typical brassy effect, clearly noticeable when the trumpet is played loud, can thus be precisely characterized by a specific movement of anticausal and causal resonances.

4.3. Violin

As for the trumpet, the sound of a violin can be decomposed by the ZZT-based processing. It also shows some similarities with speech decomposition. However, as we could not collect a large and suitable expressive database for this instrument, we only validate the method here. Correlations between decomposed signal features and bowing techniques are planned as further work. Results of the decomposition of a violin frame are presented in Figure 7.
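The ZZT-based processing used for both instruments (section 3.2) can be sketched in Python/NumPy as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `sequence_from_zeros` and `zzt_decompose` are hypothetical, the gain G is assigned to the causal part, and the test frame is synthetic (a truncated decaying exponential convolved with a time-reversed one) instead of an RPO-centered, windowed frame of a recording. Each contribution is rebuilt by evaluating the product over its roots on the unit circle, in the spirit of equations (2) and (3), and then applying an inverse FFT.

```python
import numpy as np


def sequence_from_zeros(zeros):
    """Evaluate e^{-j phi K} * prod_m (e^{j phi} - Z_m) on the unit circle
    (K = number of zeros) and return the K + 1 time samples by inverse FFT."""
    k = len(zeros)
    n_fft = k + 1
    phi = 2.0 * np.pi * np.arange(n_fft) / n_fft
    on_circle = np.exp(1j * phi)
    product = np.prod(on_circle[:, None] - zeros[None, :], axis=1)
    spectrum = np.exp(-1j * phi * k) * product
    return np.fft.ifft(spectrum).real


def zzt_decompose(frame):
    """Split a frame into causal and anticausal parts from its ZZT.
    Assumes the first and last samples of the frame are non-zero."""
    frame = np.asarray(frame, dtype=float)
    gain = frame[0]                       # G, first sample of the frame
    zeros = np.roots(frame)               # roots Z_m of X(z)
    inner = zeros[np.abs(zeros) <= 1.0]   # inner set, causal contribution
    outer = zeros[np.abs(zeros) > 1.0]    # outer set, anticausal contribution
    causal = gain * sequence_from_zeros(inner)   # gain placement is an assumption
    anticausal = sequence_from_zeros(outer)
    return causal, anticausal


# Synthetic mixed-phase frame (hypothetical values): a truncated decaying
# exponential has all its zeros inside the unit circle, and its time-reversed
# version has all its zeros outside.
true_causal = 0.8 ** np.arange(16)
true_anticausal = (0.7 ** np.arange(12))[::-1]
frame = np.convolve(true_anticausal, true_causal)

causal, anticausal = zzt_decompose(frame)
rebuilt = np.convolve(anticausal, causal)
print(len(causal), len(anticausal), np.allclose(rebuilt, frame))
```

On real recordings the frame length is much larger, so root finding becomes the costly and numerically delicate step, and the windowing conditions of section 3.3 determine whether the roots separate cleanly around the unit circle.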

Figure 7. Anticausal (a) and causal (b) contributions, resulting from the decomposition of a violin sound. Horizontal axes: time (samples).

5. TOWARDS MIXED-PHASE SYNTHESIS OF INSTRUMENTAL SOUNDS

ZZT-based decomposition demonstrated that typical CII sounds could be represented as the convolution of anticausal and causal contributions. Moreover, correlations with embouchure techniques and intensity have been highlighted for trumpet. These results lead us to consider that the mixed-phase representation of CII sounds is particularly relevant for expressive synthesis.

We propose a new subtractive synthesis technique based on the mixed-phase representation of CII waveforms. The original idea is to consider the anticausal signal as the source, and the causal signal as the filter impulse response. We can show that the convolution of anticausal and causal components brings back the original signal [8]. This convolution is illustrated in Figure 8a.

Based on this assumption, both anticausal and causal components can be approximated by linear filter impulse responses, introducing two spectral models: H_a(z) for the anticausal part and H_c(z) for the causal part. In order to preserve phase information, the filter representing the anticausal component has to be anticausal itself, meaning that its impulse response is processed time-reversed [4]. Figure 8b compares the original trumpet signal with the result of this process, where filter coefficients have been estimated by LPC analysis of both anticausal and causal parts.

Figure 8. Comparison of the original trumpet sound (solid) with (a) the convolution of decomposed components, and (b) the resynthesis based on all-pole spectral models of both anticausal and causal parts (dashed). Horizontal axes: time (samples).

6. CONCLUSIONS

In this paper we have presented an efficient framework to analyse causal and anticausal components of typical continuous interaction instrument (CII) waveforms. The main algorithm of this framework has been described: the separation of causal and anticausal contributions based on the zeros of the Z-Transform (ZZT) of signal frames. Decomposition results for trumpet and violin sounds are encouraging. They allowed us to establish relevant correlations with embouchure techniques and playing intensity for trumpet. These results led us to propose a generalized causal/anticausal linear model for the synthesis of CII waveforms in the spectral domain. Extensive tests on instruments other than the trumpet, such as the violin but also the saxophone or clarinet, remain to be carried out as part of our further work. Another improvement concerns the stabilization of the ZZT decomposition.

7. REFERENCES

[1] X. Rodet and J. Barriere, "The CHANT Project: From the Synthesis of the Singing Voice to Synthesis in General," Computer Music Journal, vol. 8, no. 3, pp. 15-31, 1984.
[2] A. Huovilainen and V. Välimäki, "New Approaches to Digital Subtractive Synthesis," Proc. of ICMC'05, pp. 399-402, Barcelona, Spain, 2005.
[3] L.B. Jackson, "Non-Causal ARMA Modeling of Voiced Speech," IEEE Trans. on Acoustics, Speech and Signal Proc., vol. 37, no. 10, pp. 1606-1608, 1989.
[4] B. Doval and C. d'Alessandro, "The Voice Source as a Causal/Anticausal Linear Filter," Proc. of VOQUAL'03, ISCA Workshop, Switzerland, 2003.
[5] D.C. Copley, "A Stroboscopic Study of Lip Vibrations in a Trombone," Journal of the Acoust. Soc. Am., vol. 99, pp. 1219-1226, 1996.
[6] J.W. Beauchamp, "Analysis of Simultaneous Mouthpiece and Output Waveforms," Journal of the AES, no. 1626, pp. 1-11, 1980.
[7] S. Serafin, F. Avanzini and D. Rocchesso, "Bowed String Simulation Using an Elasto-Plastic Friction Model," Proc. of SMAC'03.
[8] B. Bozkurt, L. Couvreur and T. Dutoit, "Chirp Group Delay Analysis of Speech Signals," Speech Comm., vol. 49, no. 3, pp. 159-176, 2007.
[9] H. Kawahara, Y. Atake and P. Zolfaghari, "Accurate Vocal Event Detection Method Based on a Fixed-Point to Weighted Average Group Delay," Proc. of the ICSLP, IEEE, pp. 664-667, Beijing, China, 2000.
[10] http://theremin.music.uiowa.edu/MIS.html