Wavetable and FM Matching Synthesis of Musical Instrument Tones

Andrew Horner (1,2,3), James Beauchamp (2), and Lippold Haken (1)
(1) CERL Sound Group, (2) Computer Music Project, and (3) CCSR
University of Illinois at Urbana-Champaign, Urbana, IL 61801
e-mail:

ABSTRACT

Spectrum matching of musical instrument tones is a fundamental problem in computer music. This paper presents three methods for determining near-optimal parameters for synthesis of harmonic musical instrument or voice sounds using the addition of several fixed wavetables or FM modules with time-varying weights. The overall objective is to find wavetable spectra and associated amplitude envelopes which provide a close fit to an original time-varying spectrum. Techniques employed for determining the wavetable spectra include a genetic algorithm (GA) and principal component analysis (PCA). A least squares solution is used to determine the associated amplitude envelopes. In one study, we restricted the spectra to be those resulting from simple FM with fixed modulation indices and integral carrier-modulator ratios (such matches could be implemented using either real FM or wavetable synthesis). In another study, PCA was used to obtain a set of orthogonal spectra for the wavetables. A third method used the GA to select spectra from the original signal at various time points. All of the methods converge gracefully to the original as the number of tables is increased, but 3-5 wavetables frequently yield a good replica of the original sound. Comparative results using the three methods will be discussed and illustrated.

0.
INTRODUCTION

We have tested three different matching methods for additive synthesis of complex spectra: one (GA-index Wavetable) based on a genetic algorithm (Goldberg 1989; Holland 1975) for selecting fixed spectra, another (PCA Wavetable) based on principal components analysis (Dunteman 1989) for determining basis spectra, and a third (GA-FM), also based on a genetic algorithm, which determines optimized parameters for FM synthesis. The FM model consists of a single modulator driving multiple parallel carriers with individual time-varying amplitudes. Best results were obtained with invariant modulation indices and time-varying amplitudes of the individual carriers, whose frequencies were set to particular harmonics of the fundamental, which is also the modulator frequency.

Genetic algorithms have been applied to a wide array of problem domains, from stack filter design (Chu 1989) to computer-assisted composition (Horner and Goldberg 1991). The GA-based spectrum matching methods presented in this paper find parameters which can be used to perform traditional wavetable and FM synthesis. The PCA-based technique is related to that of Stapleton and Bass (1988), except that the basis functions are in the frequency domain rather than the time domain. Various speech applications have made use of this PCA approach (Stautner 1983; Zahorian and Rothenberg 1981).

1. WAVETABLE SYNTHESIS BACKGROUND

Wavetable or fixed waveform synthesis is an efficient technique for the generation of particular periodic waveforms. Prior to synthesis, one cycle of the waveform is stored in a table. The spectrum of the waveform can be any arbitrary harmonic spectrum, which is specified by the amplitudes of its harmonics. Further control is gained by using multiple wavetables with associated time-varying weights (amplitude envelopes) in the synthesis model. The time-varying weights allow the waveforms to be crossfaded and mixed with one another in various ways.
Note that the phases of the corresponding harmonics of the different wavetables must be the same to avoid inadvertent phase cancellations. In this paper we refer to the spectrum produced by a particular wavetable as its associated basis spectrum.
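The multiple-wavetable model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in this work: the table length, sample rate, harmonic amplitudes, linear crossfade envelopes, and the function names make_wavetable and synthesize are all hypothetical. Every harmonic is stored in sine phase and all tables are read with one shared phase index, so corresponding harmonics of different tables stay phase-aligned.

```python
import math

def make_wavetable(harmonic_amps, table_len=512):
    """One cycle of a waveform with the given harmonic amplitudes (all in sine phase)."""
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * n / table_len)
            for k, a in enumerate(harmonic_amps))
        for n in range(table_len)
    ]

def synthesize(tables, envelopes, freq, sample_rate, num_samples):
    """Mix several wavetables, each scaled by its own amplitude envelope.

    envelopes[i] is a function mapping time in seconds to the weight of table i.
    All tables share one phase index, so their harmonics never drift apart.
    """
    table_len = len(tables[0])
    step = freq * table_len / sample_rate   # table positions advanced per sample
    phase = 0.0
    out = []
    for n in range(num_samples):
        t = n / sample_rate
        idx = int(phase) % table_len        # truncating lookup, no interpolation
        out.append(sum(env(t) * tab[idx] for tab, env in zip(tables, envelopes)))
        phase += step
    return out

# Hypothetical example: crossfade from a brighter table to a duller one over 0.1 s.
bright = make_wavetable([1.0, 0.5, 0.33, 0.25])
dull = make_wavetable([1.0, 0.1])
dur = 0.1
signal = synthesize(
    [bright, dull],
    [lambda t: 1.0 - t / dur, lambda t: t / dur],
    freq=220.0, sample_rate=8000, num_samples=800,
)
```

A production oscillator would normally interpolate between table entries; truncation is used here only to keep the sketch short.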

2. CURRENT WORK IN SPECTRAL MATCHING

To accomplish spectral matching synthesis, we generally begin with a time-variant analysis of the original sound. This will typically result in 500 to 2000 spectral frames, which are stored in an analysis file. Next, the synthesis parameters for a "best fit" are determined with respect to the original sound and the synthesis model being used. Finally, resynthesis of the sound is performed using the matched parameters.

The principal focus of this paper is on methods for determining the synthesis parameters. For additive wavetable synthesis, this entails determining the basis spectra. For multiple-carrier FM synthesis, we must find the appropriate carrier frequencies and modulation indices. In both cases, we must also find the time-varying weights for the wavetables or FM modules.

2.1 Wavetable Matching

In multiple wavetable matching, a number of options exist for determining the basis spectra. Since each basis spectrum is completely described by its relative harmonic amplitudes, we can try to optimize these values via a genetic algorithm (GA). However, since there are generally between 20 and 40 harmonics per basis spectrum, the GA search space will be very large. While there may be clever, general ways to reduce the size of this space, we have opted to select "spectral snapshots" from the original sound's time-variant analysis. Trivially, the spectrum of each individual frame matches the original sound exactly at its corresponding time frame (and will likely be a close fit to neighboring frames), so the set of these spectra serves as a good pool from which to draw our basis spectra. When we confined the genetic algorithm to finding the best time points or indices for the basis spectra, the dimension of the search space was greatly reduced, and the GA had a much greater chance of success.
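The index-selection idea can be sketched as follows, in Python. The text does not specify the GA's population size, selection scheme, crossover, or mutation rate, so everything below is a generic, hypothetical GA (truncation selection, one-point crossover, single-index mutation) rather than the one used here. The fitness of a candidate set of frame indices is taken to be the total squared error of a per-frame least-squares fit, solved through the normal equations with a tiny ridge term (an added assumption) so that duplicate indices do not produce a singular system. The test frames are made-up data.

```python
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def frame_error(frame, basis):
    """Squared error of the least-squares fit of one frame by the basis spectra."""
    k = len(basis)
    gram = [[dot(basis[i], basis[j]) + (1e-9 if i == j else 0.0)  # tiny ridge term
             for j in range(k)] for i in range(k)]
    w = solve(gram, [dot(b, frame) for b in basis])
    fit = [sum(w[i] * basis[i][h] for i in range(k)) for h in range(len(frame))]
    return sum((x - y) ** 2 for x, y in zip(frame, fit))

def fitness(indices, frames):
    """Total matching error when the frames at `indices` serve as basis spectra."""
    basis = [frames[i] for i in indices]
    return sum(frame_error(f, basis) for f in frames)

def ga_select(frames, k, pop_size=20, generations=30, seed=0):
    """Search for k frame indices whose spectra give a low total matching error."""
    rng = random.Random(seed)
    n = len(frames)
    pop = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, frames))
        parents = pop[: pop_size // 2]            # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, k) if k > 1 else 0
            child = a[:cut] + b[cut:]             # one-point crossover
            if rng.random() < 0.3:                # mutation: replace one index
                child[rng.randrange(k)] = rng.randrange(n)
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda ind: fitness(ind, frames))

# Made-up test data: 21 frames that morph linearly between two spectra.
A = [1.0, 0.5, 0.2, 0.1]
B = [0.2, 0.8, 0.6, 0.3]
frames = [[(1 - t / 20) * a + (t / 20) * b for a, b in zip(A, B)] for t in range(21)]
best = ga_select(frames, k=2)
```

Because each individual is just k integers, the search space is far smaller than one that encodes 20 to 40 harmonic amplitudes per basis spectrum, which is the point made in the paragraph above.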
Alternatively, following the lead of speech researchers, principal components analysis (PCA) can be used to compute the basis spectra. As mentioned previously, PCA uses statistical properties of the time-variant analysis of the original sound to determine a set of best-fitting orthogonal basis spectra. Unlike the previous approach, these basis spectra are very unlikely to match the spectrum of any individual frame of the analysis. However, in a statistical sense, this method provides an optimum match for any number of wavetables.

Once we determine the basis spectra using either of the above methods, we must construct their amplitude envelopes. Thinking of the basis spectra as basis functions which represent the original time-varying spectrum immediately suggests linear least squares as a method for determining the amplitude weights. For each time frame, least squares uniquely determines the set of weights which minimizes the mean square error between the original spectrum and its match.

2.2 FM Matching

FM matching synthesis presents a more difficult problem than the wavetable case. The most obvious solution is to allow the modulation index to vary with time in an attempt to match the original spectrum. However, it is virtually impossible to adequately match the time-varying spectrum of any acoustic instrument using a single FM module (Beauchamp 1982). Much better solutions are possible using two or more FM modules in parallel.

Assuming that we know what carrier-to-modulator ratios to use for a set of modules, applying a genetic algorithm technique to determine time-varying indices is tantamount to finding the best set of indices for each time frame. We could then use least squares to determine the optimum weights for the FM spectra, which are determined using Bessel functions. However, if we must recompute these basis spectra for every time point, the procedure is at least an order of magnitude more expensive than for the wavetable case.
With 1000 time points, we are faced with a huge search space. However, the modulation indices cannot assume arbitrary values at each time point, but should vary at a fairly slow rate in order to avoid spectral discontinuities. While we can attempt to force a limit on the index changes allowed from one frame to the next, the GA optimizer is then likely to confine itself to locally optimal solutions as time progresses, due to the artificially narrow search regions imposed. In addition, spectral changes due to changing modulation index values are not especially characteristic of the dynamic spectral behavior typical of acoustic instruments.

This suggested that we simply look for a set of constant index values, and do the best we can with their associated fixed Bessel function spectra. Under this assumption, the method of least squares could be applied exactly as in the wavetable matching case to find the time-varying weights. Hence, in our final result, FM synthesis reverted to a special case of fixed waveform additive synthesis, where the waveforms are those which result from FM synthesis with particular modulation indices and carrier-to-modulator ratios.
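As an illustration of the fixed Bessel function spectra mentioned above, the following Python sketch computes the harmonic magnitudes of one simple FM module whose modulator sits at the fundamental and whose carrier sits at an integer harmonic. It assumes a sine carrier and sine modulator, for which sin(a + I sin b) expands into sidebands weighted by J_n(I); sidebands that fall at negative frequencies fold back with inverted sign. The helper names bessel_j and fm_spectrum, the numerical integration used for J_n, and the example carrier and index values are hypothetical and are not taken from this work.

```python
import math

def bessel_j(n, x, steps=200):
    """J_n(x) for integer n via (1/pi) * integral over [0, pi] of cos(n*tau - x*sin(tau))."""
    total = 0.0
    for i in range(steps):
        tau = math.pi * (i + 0.5) / steps        # midpoint rule
        total += math.cos(n * tau - x * math.sin(tau))
    return total / steps

def fm_spectrum(carrier_harmonic, index, num_harmonics):
    """Harmonic magnitudes of simple FM with the modulator at the fundamental.

    Harmonic k receives J_(k-c)(I) directly and -J_(-k-c)(I) from the sideband
    reflected through zero frequency, where c is the carrier harmonic number.
    """
    c = carrier_harmonic
    return [abs(bessel_j(k - c, index) - bessel_j(-k - c, index))
            for k in range(1, num_harmonics + 1)]

# Hypothetical set of three modules: (carrier harmonic, fixed modulation index).
modules = [(1, 1.0), (3, 2.0), (5, 1.5)]
basis = [fm_spectrum(c, idx, 8) for c, idx in modules]

# With a zero index, a module degenerates to a single harmonic at its carrier.
single = fm_spectrum(3, 0.0, 5)
```

Once the indices are fixed, each row of basis plays the same role as a wavetable basis spectrum, so the same per-frame least-squares fit yields the time-varying carrier amplitudes.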

3. RESULTS

So how many wavetables or FM modules are needed for a good match? How many for an exact match? The second question turns out to be trivial: if the number of modules equals the number of harmonics of the original sound, an exact match can be made using time-variant additive synthesis (setting the modulation indices to zero in the FM case). The first question, however, has to be answered empirically. In the matches tried to date (based on the sounds of a trumpet, a tenor voice, and a guitar), three wavetables or FM modules gave a good basic match, but five or more were often needed to capture the detailed nuances of the original tones. Results obtained by the three matching techniques were, in general, perceptually comparable for the same number of wavetables or modules. Figure 1 shows the results of our objective error measurements.

Figure 1. Comparison of average matching error vs. number of tables for three different spectrum matching techniques (curves: GA-index Wavetable, PCA Wavetable, and GA-FM; horizontal axis: NO. OF TABLES).

With GA wavetable matching, the trumpet was only remotely approximated when one basis spectrum was used. Of course, the use of more tables gave even better matches. Even more surprising, a one-basis-spectrum match to the tenor voice was quite close to the original, while with two basis spectra, the match was almost perceptually indistinguishable from the original. Although we found the guitar's decay fairly easy to capture, its attack was more elusive, regardless of the number of tables. This may be related to the guitar tone requiring a very large number of harmonics (80 or so in this case) to adequately represent its attack transient. Even so, matches using a relatively small number of wavetables clearly sound "guitar-like".
Principal-components-based matching results were generally similar in character to those found in genetic matching, although they frequently suffered from problems inherent in the underlying statistical approach. For example, the PCA match to the trumpet's release exhibited an excess of brightness. In general, PCA matches suffer from relatively large errors during the low amplitude portions of sounds. This is because most of the spectral variance occurs during the higher amplitude portions. Thus, the matching accuracies of the lower amplitude sections are sacrificed in deference to those of the higher amplitude sections. While this problem might be alleviated by using a logarithmic measure, we would then lose the additive synthesis feature which we require.
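The amplitude bias of PCA described above can be reproduced with a small Python sketch. It is a simplified stand-in under stated assumptions: the basis is computed by power iteration with deflation on the uncentered second-moment matrix of the frames (whether the mean spectrum was removed in this work is not stated), and the two spectra and their amplitudes are made-up data. Because the basis spectra are orthonormal, the least-squares weights for a frame reduce to dot products.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def pca_basis(frames, k, iters=200):
    """Top-k orthonormal basis spectra of the frame set, by power iteration."""
    h = len(frames[0])
    # Second-moment matrix over all frames (no mean removal: an assumption).
    c = [[sum(f[i] * f[j] for f in frames) for j in range(h)] for i in range(h)]
    basis = []
    for _ in range(k):
        v = [1.0 / math.sqrt(h)] * h
        for _ in range(iters):
            w = [dot(row, v) for row in c]
            for b in basis:                       # deflate components already found
                p = dot(w, b)
                w = [x - p * y for x, y in zip(w, b)]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                raise ValueError("fewer than k independent spectra in the data")
            v = [x / norm for x in w]
        basis.append(v)
    return basis

def relative_error(frame, basis):
    """Relative squared error after projecting a frame onto orthonormal basis spectra."""
    weights = [dot(frame, b) for b in basis]      # least squares reduces to dot products
    fit = [sum(w * b[i] for w, b in zip(weights, basis)) for i in range(len(frame))]
    return sum((x - y) ** 2 for x, y in zip(frame, fit)) / dot(frame, frame)

# Made-up data: ten loud frames with one spectrum, ten quiet frames with another.
A = [1.0, 0.5, 0.2, 0.1]
B = [0.2, 0.8, 0.6, 0.3]
frames = [[1.0 * a for a in A]] * 10 + [[0.1 * b for b in B]] * 10

basis = pca_basis(frames, k=1)
loud_err = relative_error(frames[0], basis)
quiet_err = relative_error(frames[-1], basis)
```

With a single basis spectrum, the loud frames dominate the variance and are matched almost exactly, while the quiet frames are left with a large relative error, which mirrors the behavior reported for the low amplitude portions of the tones.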

4. CONCLUSION

We have introduced GA- and PCA-based matching techniques for determining near-optimum parameters for resynthesizing sounds via wavetable and FM synthesis. The techniques entail spectral analysis, matching, and finally resynthesis of acoustical instrument tones. The decomposition of the matching process into tractable subproblems was central to the success of the overall process. For the cases we have tried, the quality of the resulting syntheses indicates that this is a cost-effective approach when a relatively small number of basis spectra or FM modules are used.

With this framework in place, matching can facilitate applications such as data reduction, data stretching, and synthesis-by-rule. For instance, the data reduction afforded by FM matching synthesis compared to time-variant additive synthesis is equal to the number of harmonics required divided by the number of wavetables or FM modules used in the resynthesis. When a large number of harmonics are generated by just a few tables or modules, the savings are substantial.

5. ACKNOWLEDGMENTS

This material is based upon work supported by the CERL Sound Group at the University of Illinois. The work was facilitated by the Computer Music Project at the School of Music and Symbolic Sound Corporation's Kyma Workstation. Special thanks to members of the CERL Sound Group, whose input and feedback have been invaluable in this work. These include Kurt Hebel, Carla Scaletti, Bill Walker, Kelly Fitz, and Richard Baraniuk. Thanks also to Chris Gennaula, Camille Goudeseune, Chris Kriese, Michael Hammond, and Yaz Shehab of the Computer Music Project at the School of Music for their input.

6. REFERENCES

Beauchamp, J.W. 1982. "Synthesis by Amplitude and 'Brightness' Matching of Analyzed Musical Instrument Tones", J. Audio Engr. Soc., 30(6): 396-406.
Chu, C. 1989.
"A Genetic Algorithm Approach to the Configuration of Stack Filters", Proceedings of the Third International Conference on Genetic Algorithms and Their Applications, pp. 112-120.
Dunteman, G. 1989. Principal Components Analysis. Newbury Park, CA: Sage Publications.
Goldberg, D. 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley.
Holland, J. H. 1975. Adaptation in Natural and Artificial Systems. Ann Arbor: The University of Michigan Press.
Horner, A. and Goldberg, D. 1991. "Genetic Algorithms and Computer-Assisted Music Composition", in Proc. 1991 Int. Computer Music Conf., San Francisco, CA: Int. Computer Music Assn., pp. 479-482.
Stapleton, J. and Bass, S. 1988. "Synthesis of Musical Tones Based on the Karhunen-Loeve Transform", IEEE Trans. on Acoustics, Speech, and Signal Processing, 36(3): 305-319.
Stautner, J. 1983. "Analysis and Synthesis of Music using the Auditory Transform", Master's Thesis, Department of Electrical and Computer Science, MIT, Cambridge, MA.
Zahorian, S. and Rothenberg, M. 1981. "Principal-Components Analysis for Low Redundancy Encoding of Speech Spectra", J. Acoust. Soc. Am., 69(3): 832-845.