Analysis/Synthesis of Sound Using a Time-Varying Linear Model

Charles W. Therrien, Roberto Cristi, and Olav E. Kjono
Naval Postgraduate School
Monterey, California 93943

Abstract

A method for sound synthesis based on a time-varying autoregressive (AR) model is described and applied to music synthesis. The model provides a stochastic representation of the sound. Its parameters are the coefficients of the difference equation describing the AR model and a time-varying gain, all of which are stored as functions of time. The method can be applied to a variety of musical instrument sounds and offers a wide range of possibilities for sound enhancement or modification.

1 Introduction

A host of synthesis methods for music, speech, and other audio signals have become practical over the last decade because of the availability of high-speed computation, large memories, and special digital signal processing (DSP) hardware. In music synthesis, sampling and FM techniques [Chowning, 73] have played a dominant role. However, other new methods based on physical modeling [Van Duyne and Smith, 1993] and sophisticated analysis/synthesis methods [Serra, 1989] (see also [McAulay and Quatieri, 1986]) have appeared.

The method described in this paper belongs to the class of analysis/synthesis methods used in audio signal processing. However, it is a new approach, based on a linear time-varying stochastic model of the signal, which to our knowledge has not previously been used for music synthesis.

2 Time-Varying AR Model

In our approach the acoustic signal is represented as the output of a time-varying autoregressive (TVAR) model of the form

$$y[n] = -a_1[n]\,y[n-1] - a_2[n]\,y[n-2] - \cdots - a_P[n]\,y[n-P] + G[n]\,w[n]$$

where $y[n]$ is the synthesized sound waveform, $n$ is the discrete time index, $\{a_i[n],\ i = 1, 2, \ldots, P\}$ are the (time-varying) model parameters for model order $P$, $G[n]$ is a time-varying gain, and $w[n]$ is the white noise excitation.
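To make the synthesis equation concrete, the following Python sketch runs the TVAR recursion sample by sample. It is an illustration under stated assumptions, not the authors' implementation: it assumes NumPy, a unit-variance Gaussian excitation $w[n]$, zero initial conditions ($y[n] = 0$ for $n < 0$), and a coefficient matrix whose row $n$ holds $[a_1[n], \ldots, a_P[n]]$. The function name `tvar_synthesize` and the two-pole usage example are hypothetical.

```python
import numpy as np


def tvar_synthesize(A, G, rng=None):
    """Run y[n] = -a_1[n] y[n-1] - ... - a_P[n] y[n-P] + G[n] w[n].

    A   : array of shape (N, P); row n holds [a_1[n], ..., a_P[n]]
    G   : array of shape (N,); time-varying gain G[n]
    rng : numpy Generator used to draw the white excitation w[n]
    Assumes unit-variance Gaussian w[n] and y[n] = 0 for n < 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    A = np.asarray(A, dtype=float)
    G = np.asarray(G, dtype=float)
    N, P = A.shape
    w = rng.standard_normal(N)          # white Gaussian excitation (assumption)
    y = np.zeros(N)
    for n in range(N):
        # Past outputs y[n-1], ..., y[n-P], with zeros before the first sample.
        past = np.array([y[n - i] if n >= i else 0.0 for i in range(1, P + 1)])
        y[n] = -A[n] @ past + G[n] * w[n]
    return y


# Hypothetical usage: a two-pole resonance gliding from 300 Hz to 600 Hz at fs = 8 kHz.
fs = 8000
N = fs                                   # one second of sound
r = 0.98                                 # pole radius < 1 keeps each frozen-time filter stable
theta = 2 * np.pi * np.linspace(300, 600, N) / fs
A = np.column_stack([-2 * r * np.cos(theta), np.full(N, r ** 2)])
y = tvar_synthesize(A, np.full(N, 0.1), rng=np.random.default_rng(1))
```

In the usage lines, $a_1[n] = -2r\cos\theta[n]$ and $a_2[n] = r^2$ make the denominator $1 + a_1 z^{-1} + a_2 z^{-2}$ factor as $(1 - re^{j\theta}z^{-1})(1 - re^{-j\theta}z^{-1})$, so the pole pair sits at radius $r$ and angle $\theta[n]$. Every call draws a fresh excitation, so each rendering of the same coefficient matrix is a different realization.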
During the analysis phase the model parameters are estimated from digitized recorded sound by writing the above equation in a state representation of the form

$$\mathbf{x}[n+1] = \mathbf{x}[n] + \mathbf{v}[n]$$
$$y[n] = -\mathbf{c}^T[n]\,\mathbf{x}[n] + G[n]\,w[n]$$

where $\mathbf{x}$ is known as the state vector, whose components are the parameters $a_i[n]$; $\mathbf{v}$ is a vector representing a random perturbation of the state vector that accounts for its time variation; and $\mathbf{c}$ is the vector of past signal values

$$\mathbf{c}[n] = \left[\,y[n-1],\ y[n-2],\ \ldots,\ y[n-P]\,\right]^T$$

The Kalman filtering equations can then be used to perform the estimation (see e.g. [Haykin, 1991] for details). The gain parameter $G[n]$ is determined separately by estimating the variance of the residual process $\alpha[n]$ from the Kalman filter. If a short window of length $L$ is used, this estimate becomes

$$G[n] = \left(\frac{1}{L}\sum_{k} \alpha^2[k]\right)^{1/2}$$

where the sum runs over the $L$ residual samples in the window. The parameters for the signal are then stored in a matrix $X$ whose rows represent the parameters as a function of time. These can be used to synthesize the signal with the TVAR equation given above.

3 Synthesis Example

The TVAR synthesis model works best on sounds that can be considered largely stochastic. However, excellent results have been achieved on sounds

Page  332 ï~~0 o.s 1 15.s 2 2.3 (a) 100 Figure 1: Spectrograms for a segment of trumpet fanfare. (a) Original sound. (b) Synthesized sound. as diverse as percussion and brass. For more general types of music the model can be combined with sine wave [McAulay and Quatieri, 1986] or ARMA modeling and used in a deterministic/stochastic decomposition [Serra, 1989]. Here we examine only the TVAR model itself. Figure 1 shows a short segment of the spectrogram corresponding to a trumpet in Copland's Fanfare for the Common Man [Jones et al.]. The digital signal was captured from CD directly on a SUN workstation at the 44.1 kHz sampling rate, downsampled to 8 kil, and processed. The original and resynthesised signals are shown in the figure. Upon listening, the quality of the synthesized sound is excellent. The representation as a TVAR model offers many possibilities for variation in the sound. Patterns and "events" in the audio waveform can be repeated, combined, elongated or compressed by operating in a corresponding way on the matrix of coefficients. The results are very realistic because every realization of an audio event is unique due to the random nature of the model excitation. In addition, the sound can be changed by modifying the model coefficients. This is most effectively done by working with the roots of the system transfer function. 4 Summary This paper introduces a new method for sound synthesis involves a time-varying AR model. Since it is purely stochastic, the method is complementary to deterministic methods such as sine wave modeling and completes the set of possibilities ranging from purely deterministic to purely stochastic models. It can also be used as one component in a deterministic/stochastic decomposition approach to synthesis. The model parameters are estimated from original digitally recorded sounds by assuming the parameters evolve in time according to a simple state model and applying the Kalman filtering equations. 
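The estimation step summarized here, and set out in Section 2, can be sketched in Python with NumPy as follows. This is a minimal illustration under assumptions, not the authors' code: the state $\mathbf{x}[n] = [a_1[n], \ldots, a_P[n]]^T$ is treated as a random walk with perturbation covariance $q\,I$, and the observation noise is given a constant variance $r$. The values of $q$ and $r$, the initial covariance, the zero initial conditions, and the trailing window used for $G[n]$ are all assumptions, since the paper does not specify them. The function name `tvar_kalman_analysis` is hypothetical.

```python
import numpy as np


def tvar_kalman_analysis(y, P, q=1e-5, r=1.0, L=64):
    """Track TVAR coefficients with a Kalman filter (illustrative sketch).

    State model (assumed random walk): x[n+1] = x[n] + v[n], Cov(v) = q I
    Observation model:                 y[n] = -c[n]^T x[n] + e[n], Var(e) = r
    with x[n] = [a_1[n], ..., a_P[n]] and c[n] = [y[n-1], ..., y[n-P]].
    Returns X, shape (N, P), row n = coefficient estimate at time n;
    G, shape (N,), windowed RMS of the innovations; alpha, shape (N,), innovations.
    """
    y = np.asarray(y, dtype=float)
    N = len(y)
    x = np.zeros(P)              # coefficient estimate, initialized at zero (assumption)
    Pcov = np.eye(P)             # initial error covariance (assumption)
    Q = q * np.eye(P)            # random-walk perturbation covariance (assumption)
    X = np.zeros((N, P))
    alpha = np.zeros(N)
    for n in range(N):
        # Past samples y[n-1], ..., y[n-P]; samples before the start are taken as zero.
        c = np.array([y[n - i] if n >= i else 0.0 for i in range(1, P + 1)])
        h = -c                    # observation row: y[n] = h . x[n] + e[n]
        Pcov = Pcov + Q           # time update: the mean is unchanged under a random walk
        alpha[n] = y[n] - h @ x   # innovation (one-step prediction error)
        s = h @ Pcov @ h + r      # innovation variance
        k = Pcov @ h / s          # Kalman gain vector
        x = x + k * alpha[n]      # measurement update of the coefficients
        Pcov = Pcov - np.outer(k, h @ Pcov)
        X[n] = x
    # G[n] = sqrt(mean of alpha^2 over the last L samples); shorter window at the start.
    csum = np.concatenate(([0.0], np.cumsum(alpha ** 2)))
    idx = np.arange(N)
    start = np.maximum(0, idx - L + 1)
    G = np.sqrt((csum[idx + 1] - csum[start]) / (idx + 1 - start))
    return X, G, alpha
```

The parameter $q$ sets the trade-off: a larger value lets the coefficient estimates follow faster spectral changes at the cost of noisier trajectories, while a smaller value averages over a longer effective window. The returned `X` and `G` have the layout that the TVAR synthesis equation needs, so resynthesis amounts to running that recursion with a fresh white-noise excitation.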
This model has been used with success in the synthesis of various musical sounds. The representation afforded by the model opens up many possibilities for introducing changes and variation in the sound.

References

[Chowning, 73] John M. Chowning. The synthesis of complex audio spectra by means of frequency modulation. Journal of the Audio Engineering Society, 21(7):526-534, 1973.

[Van Duyne and Smith, 1993] Scott A. Van Duyne and Julius O. Smith. Physical modeling with the 2-D waveguide mesh. In Proceedings of the International Computer Music Conference, pages 40-47, Tokyo, September 1993.

[Serra, 1989] Xavier Serra. A System for Sound Analysis/Transformation/Synthesis Based on a Deterministic Plus Stochastic Decomposition. PhD thesis, Stanford University, Stanford, California, October 1989 (CCRMA Report STAN-M-58).

[McAulay and Quatieri, 1986] Robert J. McAulay and Thomas F. Quatieri. Speech analysis/synthesis based on a sinusoidal representation. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-34(4):744-754, August 1986.

[Haykin, 1991] Simon Haykin. Adaptive Filter Theory, 2nd ed. Prentice Hall, Inc., Englewood Cliffs, New Jersey, 1991.

[Jones et al.] Philip Jones Brass Ensemble. Brass Splendour. London CD 411 955-2.

ICMC Proceedings 1994, Audio Signal Processing