# An Analysis-by-Synthesis Approach to Sinusoidal Modeling Applied to the Analysis and Synthesis of Musical Tones

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License. Please contact mpub-help@umich.edu to use this work in a way not covered by the license.

E. Bryan George and Mark J. T. Smith

Georgia Institute of Technology, School of Electrical Engineering, Atlanta, Georgia 30332

## Abstract

This paper presents a new approach to the analysis, synthesis and modification of musical tones based on an overlap-add sinusoidal representation. Automatic analysis of model parameters is achieved using an analysis-by-synthesis procedure which incorporates successive approximation. The paper describes this analysis procedure in detail and introduces a generalized overlap-add sinusoidal model which can modify pitched musical tones without artifacts. The synthesis model maintains the accuracy and generality of classical additive synthesis, and the analysis procedure overcomes problems encountered in the digital phase vocoder due to analysis of transient portions of tones and pitch drift. In addition, by making use of the Fast Fourier Transform (FFT) algorithm, the computational load of both analysis and synthesis is reduced significantly, and the synthesis technique achieves efficiency comparable to discrete short-time Fourier transform synthesis using the FFT.

## 1 Introduction

Much attention has been paid to the tradeoffs encountered in analysis-based sinusoidal additive synthesis, most notably to the conflict between generality and computational efficiency. While additive synthesis accurately represents the sounds of many musical instruments and can flexibly modify musical tones, the amount of computation required prohibits real-time synthesis using relatively inexpensive hardware [1]. In addition, using the digital phase vocoder (DPV) [2] to automatically determine control functions used in additive synthesis requires the further computational overhead of a non-decimated digital filter bank.
One approach to overcoming the computational problems of additive synthesis is to formulate the DPV in terms of the discrete short-time Fourier transform (DSTFT) and to implement this formulation using the FFT algorithm [3]. This strategy considerably improves the computational efficiency of both analysis and synthesis; however, in the presence of modifications DSTFT-based synthesis is not identical to classical additive synthesis, and using the DSTFT to perform modifications sometimes yields objectionable artifacts [4].

An alternative approach to analysis-based additive synthesis has recently been introduced in the context of speech analysis/synthesis [5]. In this approach, amplitude and phase parameters are derived from peaks of a high-resolution short-time Fourier representation of the original signal, then interpolated to yield amplitude and phase control functions. This technique addresses problems encountered in the DPV that arise from modeling the wideband transient (attack and release) portions of a tone using narrowband signals. Furthermore, parameter interpolation and the high-resolution nature of the analysis reduce the effects of pitch drift, which can cause modification artifacts in the DPV when multiple partials fall in a single filter band [6]. However, the amount of computation required for synthesis in this system is the same as for classical additive synthesis. While this difficulty has been recognized and an efficient implementation developed, loss in quality has been reported.

\*This work was supported by the National Science Foundation under contract DCI-8611372.

†U.S. Patent and corresponding overseas conventions pending; parties interested in licensing arrangements should contact Mr. Barry Rosenberg, Georgia Tech Office of Technology Licensing, (404) 894-7059.

ICMC 356

This paper presents a new approach to analysis-based music synthesis using an overlap-add sinusoidal model formulation. Analysis of model parameters is achieved using an automatic analysis-by-synthesis procedure which incorporates an iterative signal approximation technique [7, 8]. The proposed analysis/synthesis system is capable of modeling musical tones with high fidelity and provides the same generality as additive synthesis. In addition, the system provides flexibility of control without modification artifacts, overcomes the transient analysis problems and sensitivity to pitch drift encountered in the DPV, and does so at a computational cost in synthesis comparable to DSTFT synthesis using the FFT [9].

## 2 Synthesis Model and Analysis Procedure

The synthesis model proposed is an overlap-add sinusoidal model of the form

$$\hat{s}[n] = \sigma[n] \sum_{k=-\infty}^{\infty} w_s[n - kN_s]\, \hat{s}^k[n - kN_s],$$

where $\sigma[n]$ (estimated by lowpass filtering $|s[n]|$) represents the envelope of the musical tone, $w_s[n]$ is a complementary synthesis window, and the synthetic contributions $\hat{s}^k[n]$, composed of $J[k]$ sinusoidal components, are given by

$$\hat{s}^k[n] = \sum_{j=1}^{J[k]} A_j^k \cos(\omega_j^k n + \phi_j^k).$$

Figure 1 illustrates the overlap-add structure which produces $\hat{s}[n]$ in a single synthesis frame. The amplitudes $\{A_j^k\}$, frequencies $\{\omega_j^k\}$, and phases $\{\phi_j^k\}$ of $\hat{s}^k[n]$ are found by iteratively approximating $s[n]$ as follows: $s[n]$ is first approximated over a short interval with a single envelope-weighted sinusoid. The approximation criterion used is the window-weighted mean-square error

$$E = \sum_{n=-N_a}^{N_a} w_a[n] \left\{ s[n + kN_s] - \sigma[n + kN_s]\, A_1^k \cos(\omega n + \phi_1^k) \right\}^2,$$

minimized in terms of the amplitude and phase of the sinusoid. This approximation is repeated as $\omega$ is varied over a candidate set, and $\omega_1^k$ is chosen as the "best match" frequency; the resulting weighted sinusoid is subtracted from $s[n]$. The approximation error is then approximated with a second sinusoid in the same manner, and so on.
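The successive-approximation loop described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `fit_sinusoid` and `analyze_frame`, the exhaustive search over a caller-supplied candidate grid, and the fixed number of components are hypothetical choices made for the example. For each trial frequency, the amplitude and phase that minimize the window-weighted squared error are found from a 2×2 least-squares problem; the best-matching frequency is kept, its envelope-weighted sinusoid is subtracted, and the process repeats on the residual.

```python
import numpy as np

def fit_sinusoid(x, env, win, omega, n):
    """Least-squares amplitude/phase of env[n]*A*cos(omega*n + phi) under weights win."""
    # A*cos(omega*n + phi) = a*cos(omega*n) + b*sin(omega*n),
    # with a = A*cos(phi) and b = -A*sin(phi), so the fit is linear in (a, b).
    c = env * np.cos(omega * n)
    s = env * np.sin(omega * n)
    gram = np.array([[np.sum(win * c * c), np.sum(win * c * s)],
                     [np.sum(win * c * s), np.sum(win * s * s)]])
    rhs = np.array([np.sum(win * x * c), np.sum(win * x * s)])
    # lstsq tolerates the singular case (e.g. omega = 0, where s is all zeros).
    a, b = np.linalg.lstsq(gram, rhs, rcond=None)[0]
    model = a * c + b * s
    err = np.sum(win * (x - model) ** 2)
    return np.hypot(a, b), np.arctan2(-b, a), model, err

def analyze_frame(x, env, win, candidates, num_components):
    """Greedy analysis-by-synthesis of one frame.

    Returns a list of (amplitude, frequency, phase) tuples and the final residual.
    The stopping rule (a fixed component count) is an assumption of this sketch.
    """
    half = len(x) // 2
    n = np.arange(len(x)) - half            # time index centered on the frame
    residual = np.asarray(x, dtype=float).copy()
    params = []
    for _ in range(num_components):
        best = None
        for omega in candidates:            # try every candidate frequency
            amp, phi, model, err = fit_sinusoid(residual, env, win, omega, n)
            if best is None or err < best[0]:
                best = (err, amp, omega, phi, model)
        _, amp, omega, phi, model = best
        residual = residual - model         # close the loop: remove the best match
        params.append((amp, omega, phi))
    return params, residual
```

Because each new component is fitted to the error left by the previous ones, the loop is "closed": every stage sees exactly what the synthesis so far has failed to represent. Restricting `candidates` to frequencies near integer multiples of a fundamental estimate would be one way to impose a harmonic constraint within this sketch.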
Figure 2 illustrates analysis-by-synthesis in block-diagram form, demonstrating its "closed-loop" structure. At each stage the eligible frequencies are constrained so that component frequencies have the quasi-harmonic form

$$\omega_j^k = j\omega_0 + \Delta_j^k.$$

For pitched musical tones, the fundamental frequency estimate $\omega_0$ may be updated recursively based on the parameters of each component, thus providing fundamental frequency tracking [8, 9].

## 3 Modification Model

The main difficulty in performing time- and frequency-scale modification using the overlap-add model is a breakdown of phase coherence due to the differential frequencies $\{\Delta_j^k\}$. We have derived a refined modification model which addresses this problem by explicitly controlling phase coherence [9]. In this formulation, the synthetic contributions for time scale factor $\rho_k$ and frequency scale factor $\beta_k$ are given by

$$\hat{s}^k_{\rho_k,\beta_k}[n] = \sum_{j=0}^{J[k]} A_j^k \cos\!\left( j\beta_k\omega_0 (n + \delta_k) + \frac{\Delta_j^k}{\rho_k}\, n + \phi_j^k \right).$$

Note that as $\rho_k$ is increased, the resulting modified frequencies become more harmonic, thus preserving coherence by slowing phase evolution. Figure 3 shows the effect of coherence breakdown in a synthetic contribution and demonstrates how the refined model counteracts this effect. The time shift $\delta_k$ is determined recursively using constraints which preserve temporal phase continuity.

When a spectral envelope estimate is available, it is possible to change the fundamental frequency of a pitched tone without altering its spectral shape by performing frequency-scale modification on the "excitation" signal associated with the tone [10] and reimposing the estimated spectral envelope on the modified excitation. The authors have proposed a novel approach to modifying the excitation signal which provides pitch-scale modification without loss of bandwidth and which avoids the problem of noise amplification [9].

## 4 Computational Considerations

Since much of the computation in analysis-by-synthesis is in the form of inner products between signals and sinusoids, analysis speed may be greatly improved by performing analysis-by-synthesis in the frequency domain, using DFTs calculated via the FFT algorithm to provide approximation parameters and update the error. In addition, since the overlap-add model uses constant-amplitude, constant-frequency sinusoids, the FFT algorithm may also be used to perform synthesis, resulting in a fast implementation. Because component frequencies in the overlap-add model are less constrained than in the DSTFT, the resulting computational load is higher than that of DSTFT synthesis using the FFT, but is still well below the load required for classical additive synthesis [9].
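To illustrate why the FFT helps, the sketch below (Python with NumPy, an illustration rather than the authors' implementation) computes the window-weighted inner products of a frame with cosines and sines at every DFT bin frequency in a single transform, and synthesizes a frame of constant-amplitude, constant-frequency sinusoids with one inverse FFT. It assumes, for simplicity, that component frequencies fall exactly on the grid $2\pi m/M$ and that the time index starts at 0; the frequencies in the overlap-add model are less constrained than this. The names `windowed_inner_products` and `synthesize_frame_fft` and the parameter `fft_size` are hypothetical.

```python
import numpy as np

def windowed_inner_products(x, win, fft_size):
    """Inner products of win*x with cos and sin at omega_m = 2*pi*m/fft_size.

    Returns (cos_ip, sin_ip) for m = 0 .. fft_size//2, using the time index
    n = 0 .. len(x)-1. The frame is zero-padded when fft_size > len(x).
    """
    spectrum = np.fft.rfft(win * x, n=fft_size)
    # X[m] = sum_n w[n] x[n] (cos(omega_m n) - 1j*sin(omega_m n))
    return spectrum.real, -spectrum.imag

def synthesize_frame_fft(amps, bins, phases, fft_size):
    """Sum of amps[j]*cos(2*pi*bins[j]*n/fft_size + phases[j]), n = 0 .. fft_size-1.

    Assumes every component sits exactly on an FFT bin (a simplification).
    """
    amps = np.asarray(amps, dtype=float)
    phases = np.asarray(phases, dtype=float)
    bins = np.asarray(bins, dtype=int)
    coeffs = np.zeros(fft_size, dtype=complex)
    # add.at accumulates correctly even if two components share a bin.
    np.add.at(coeffs, bins, amps * np.exp(1j * phases))
    # ifft includes a 1/fft_size factor, so scale it back out.
    return fft_size * np.fft.ifft(coeffs).real
```

With these helpers, the per-candidate sums needed to fit an amplitude and phase become table lookups into one transform of the current residual, and a synthesis frame costs one inverse transform regardless of how many components it contains. A finer frequency grid can be obtained by increasing `fft_size`, at the cost of longer transforms.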
## 5 Conclusions

An overlap-add sinusoidal model for musical tones and a formulation appropriate for synthesis and modification of tones have been introduced, and techniques for automatic analysis and computationally efficient synthesis of tones based on this model have been discussed. An analysis/synthesis system capable of performing fixed and time-varying time-, frequency- and pitch-scale modification based on these techniques has been developed in software on a general-purpose minicomputer and tested on a variety of pitched musical tones sampled at 16 kHz. These tones are from instruments in the brass, string, single- and double-reed woodwind families, as well as the singing voice, at pitch frequencies ranging from C2 (64 Hz) to C6 (1024 Hz). In all cases accurate synthetic tones and artifact-free modified tones were produced; results will be presented at the conference and in the accompanying demonstration. The goal of future research will be to develop a keyboard-controlled prototype synthesizer based on this system which is capable of real-time operation in a performance environment.

## References

[1] J. A. Moorer, "Signal Processing Aspects of Computer Music: A Survey," Proc. IEEE, vol. 65, pp. 1108-1137, Aug. 1977.

[2] J. L. Flanagan and R. M. Golden, "Phase Vocoder," Bell Sys. Tech. J., vol. 45, pp. 1493-1509, 1966.

[3] M. R. Portnoff, "Implementation of the Digital Phase Vocoder Using the Fast Fourier Transform," IEEE Trans. on ASSP, vol. ASSP-24, pp. 243-248, June 1976.

[4] M. R. Portnoff, Time-Scale Modification of Speech Based on Short-Time Fourier Analysis. PhD thesis, Massachusetts Institute of Technology, 1978.

[5] R. J. McAulay and T. F. Quatieri, "Speech Analysis/Synthesis Based on a Sinusoidal Representation," IEEE Trans. on ASSP, vol. ASSP-34, pp. 744-754, Aug. 1986.

[6] M. Dolson, "The Phase Vocoder: A Tutorial," Computer Music Journal, vol. 10, pp. 14-27, Winter 1986.

[7] E. B. George and M. J. T. Smith, "A New Speech Coding Model Based on a Least-Squares Sinusoidal Representation," in Proc. IEEE ICASSP, pp. 1641-1644, Apr. 1987.

[8] E. B. George and M. J. T. Smith, "Perceptual Considerations in a Low Bit Rate Sinusoidal Vocoder," in Proc. IEEE IPCCC, pp. 268-275, Mar. 1990.

[9] E. B. George and M. J. T. Smith, "An Analysis-by-Synthesis Approach to Sinusoidal Modeling Applied to the Analysis and Synthesis of Musical Tones." Submitted to JAES, Dec. 1990.

[10] J. A. Moorer, "The Use of the Phase Vocoder in Computer Music Applications," JAES, vol. 26, pp. 42-45, January/February 1978.

Figure 1: Illustration of overlap-add synthesis. Synthesis frame $k$ ranges from $n = kN_s$ to $n = (k+1)N_s - 1$. (Diagram labels: the overlapping windowed contributions of frames $k$ and $k+1$; "Synthesis Frame k".)

Figure 2: Block diagram of analysis-by-synthesis as applied to sinusoidal modeling. The synthetic tone is initially zero, and the parameters of each successive component are chosen to minimize the error left after subtracting previous components. (Block labels: "Input Tone"; "Amplitude, Frequency and Phase"; "Envelope Sequence"; "Synthetic Tone".)

Figure 3: Illustration of refined modification model; (a) Synthetic contribution ($N_a = 100$); (b) Distortion of extrapolated signal due to differential frequency terms ($\rho_k = 3$); (c) Phase coherence preservation by differential frequency scaling. (Horizontal axes: Samples.)