Discrete Summation Synthesis and Hybrid Sampling-Wavetable Synthesis of Acoustic Instruments with Genetic Algorithms

San-kuen Chan, Jennifer Yuen, Andrew Horner
Department of Computer Science
The Hong Kong University of Science and Technology
Clear Water Bay, Hong Kong

Abstract

This paper introduces an automated genetic algorithm method that determines discrete summation synthesis and hybrid sampling-wavetable synthesis parameters for matching an arbitrary harmonic instrument tone. Results are given for both techniques on various instruments.

1 Introduction

Matching acoustic instrument tones with efficient synthesis methods is a fundamental problem in computer music. Generally, for a particular synthesis model, matching begins with a time-variant spectral analysis of the original sound. Next, the synthesis parameters of the model that produce a "best fit" to the analysis data are determined. Finally, the sound is resynthesized using the matched parameters. These steps are shown in Figure 1.

[Figure 1: Matching analysis/synthesis overview. Block labels: original sound, short-time spectrum analysis, time-varying harmonic spectrum, spectrum matching procedure, synthesis parameters, synthesis model, resynthesized sound.]

This paper presents an automated genetic algorithm method that determines discrete summation synthesis and hybrid sampling-wavetable synthesis parameters for matching an arbitrary acoustic instrument tone.

2 Discrete Summation Synthesis Overview

Discrete summation synthesis generates a complete set of harmonic sine waves from only a small number of them. The price of this saving is that only a subset of the possible spectra can be produced. More spectral control can be obtained by using multiple ring-modulated discrete summation modules, each with its own amplitude envelope. The ring-modulated discrete summation formula (Moorer 1976; Moorer 1977) is

\[ \sin(\theta)\,\frac{1 - a^2 - 2a^{N+1}\left[\cos((N+1)\beta) - a\cos(N\beta)\right]}{1 + a^2 - 2a\cos(\beta)} \tag{1} \]

where $\theta = 2\pi N_c f_m t$ and $\beta = 2\pi f_m t$.
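As an illustration of Equation 1, the following minimal Python sketch evaluates the closed form and compares it with a direct sum of sinusoids. It rests on two assumptions that the text does not state: that N counts the sidebands on each side of the carrier, and that the closed form equals the two-sided sum of a^|k| sin(θ + kβ) for k from -N to N, which follows from standard trigonometric identities. The function names are hypothetical.

```python
import math


def dsf_closed_form(theta, beta, a, n):
    """Closed form of Equation 1 for carrier phase theta and modulator phase beta."""
    numerator = 1.0 - a * a - 2.0 * a ** (n + 1) * (
        math.cos((n + 1) * beta) - a * math.cos(n * beta)
    )
    # For 0 <= a < 1 the denominator is at least (1 - a)^2, so it never vanishes.
    denominator = 1.0 + a * a - 2.0 * a * math.cos(beta)
    return math.sin(theta) * numerator / denominator


def dsf_explicit_sum(theta, beta, a, n):
    """Direct sum of 2n + 1 sinusoids (assumed interpretation of Equation 1)."""
    return sum(a ** abs(k) * math.sin(theta + k * beta) for k in range(-n, n + 1))


def dsf_sample(t, f_m, n_c, a, n):
    """One output sample at time t (seconds) for modulating frequency f_m (Hz)."""
    theta = 2.0 * math.pi * n_c * f_m * t
    beta = 2.0 * math.pi * f_m * t
    return dsf_closed_form(theta, beta, a, n)
```

The closed form costs a handful of trigonometric evaluations per sample regardless of N, whereas the explicit sum grows linearly with N, which is the efficiency argument behind the technique.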
Symbols $a$, $N_c$ and $f_m$ represent the amplitude decay factor, the harmonic number of the carrier frequency and the modulating frequency, respectively.

3 Hybrid Sampling-Wavetable Synthesis Overview

Hybrid sampling-wavetable synthesis can synthesize instrumental tones with complicated attacks, such as piano and plucked strings. The method uses sampling for the critical attack portion of the tone and multiple-wavetable synthesis to match the more gradually changing sustain and decay. In between, there is a short transition period during which the two methods are cross-faded. Our model also includes phase matching.

We use a genetic algorithm (GA) (Goldberg 1989) to automatically select a subset of basis spectra from the original set of time-varying analysis spectra, together with associated amplitude envelopes, so that their sum best matches the original spectrum. However, this is a huge search space. Instead, we use the simpler approach of spectral interpolation, where spectra from different parts of the original sound simply cross-fade from one to the next as time progresses. This greatly reduces the number of parameters for each spectrum, so that the GA can run efficiently and yield a near-optimal solution.

Previous work on wavetable matching (Horner et al., 1993) has set the phases to zero or to arbitrary values. However, because sampling and wavetable synthesis are mixed during the transition period, we need to align the harmonics of both synthesis methods to the same phases to prevent phase cancellation. Normally, we can extract a period of the original sound sample and perform a band-limited Discrete Fourier Transform (DFT) to obtain the phases. This works under the assumption that the harmonic frequencies are time-invariant and the phases accumulate uniformly. However, in many sounds the harmonic frequencies are time-varying, so we try to find a set of phases that minimizes the overall phase cancellation during synthesis. We choose the period from the transition period, where phase cancellation would be most prominent. Prior to the DFT, we multiply the period samples by a Hamming window to minimize the undesired effects of time-domain truncation. The phases from the DFT are then used to approximate the initial phase of each harmonic. With these initial phases, a set of wavetables is formed in which the phases of every wavetable are aligned to the same values. The $i$th table entry is given by

\[ \text{table}_i = \sum_{k=1}^{N_{har}} a_k \sin\left(\frac{2\pi k i}{\text{table-size}} + \phi_k\right) \tag{2} \]

where $1 \le i \le \text{table-size}$, $N_{har}$ is the number of harmonics, and $a_k$ and $\phi_k$ are the amplitude and initial phase of the $k$th harmonic.

4 Spectral Matching and Genetic Algorithms

The spectral matching procedure consists of two steps. In the first step, the basis spectra are determined by the genetic algorithm (discussed later). The second step determines the best amplitude envelope for each table by solving the system of equations

\[ \sum_{j=1}^{N_{tabs}} a_{k,j} w_{j,r} = b_{k,r} \tag{3} \]

where $a_{k,j}$ is the amplitude of the $k$th harmonic of the $j$th basis spectrum, $w_{j,r}$ is the unknown amplitude weight for the $j$th basis spectrum at the $r$th time frame, and $b_{k,r}$ is the amplitude of the $k$th harmonic of the analysis spectrum at the $r$th time frame. Since the number of harmonics is typically much larger than the number of modules, we cannot solve Equation 3 exactly. However, we can use least squares to find the amplitude weights that minimize the squared error

\[ \sum_{k=1}^{N_{har}} \left( \sum_{j=1}^{N_{tabs}} a_{k,j} w_{j,r} - b_{k,r} \right)^2 \tag{4} \]

for each time frame $r$. To compute the least squares solution of Equation 3, we use the normal equations (Press et al., 1989).
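The per-frame least squares fit of Equation 3 can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation: it forms the normal equations for one time frame and solves them by Gaussian elimination with partial pivoting. The function names and the small test spectra in the usage note are hypothetical.

```python
def normal_equations(basis, frame):
    """Return (A^T A, A^T b), where basis[k][j] = a_{k,j} and frame[k] = b_{k,r}."""
    n_rows, n_cols = len(basis), len(basis[0])
    ata = [
        [sum(basis[k][i] * basis[k][j] for k in range(n_rows)) for j in range(n_cols)]
        for i in range(n_cols)
    ]
    atb = [sum(basis[k][i] * frame[k] for k in range(n_rows)) for i in range(n_cols)]
    return ata, atb


def solve_linear(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("basis spectra are (nearly) linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        partial = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - partial) / aug[i][i]
    return x


def frame_weights(basis, frame):
    """Least squares amplitude weights w_{j,r} for a single time frame r."""
    ata, atb = normal_equations(basis, frame)
    return solve_linear(ata, atb)
```

For example, with four harmonics and two hypothetical basis spectra, `frame_weights([[1.0, 0.2], [0.5, 0.6], [0.25, 0.9], [0.1, 0.3]], [0.88, 0.64, 0.56, 0.20])` recovers weights close to 0.8 and 0.4, because that frame was constructed as exactly that combination. Repeating the call for every frame yields one amplitude envelope per basis spectrum.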
By taking the first derivative of Equation 4 with respect to $w_{j,r}$ and setting it to zero, we can solve for the $W$ that gives the minimum sum of squares, which satisfies the normal equations

\[ A^T A W = A^T B \tag{5} \]

where $A$, $W$ and $B$ are the matrices of the $a_{k,j}$, $w_{j,r}$ and $b_{k,r}$, respectively.

Now that we know how to find the amplitude envelopes, we use the GA to find the synthesis parameters that create the basis spectra. The GA systematically searches through the solution space, using least squares to generate a different set of amplitude envelopes for each candidate as it goes. A fitness function tells the GA how well each set of parameters matches the original tone. We use the relative spectral error between the original and synthesized spectra as the fitness function. The relative error $\varepsilon$ is defined as follows:

\[ \varepsilon = \frac{1}{N_f} \sum_{r=1}^{N_f} \sqrt{\frac{\sum_{k=1}^{N_h} \left(b_{k,r} - \hat{b}_{k,r}\right)^2}{\sum_{k=1}^{N_h} b_{k,r}^2}} \tag{6} \]

where $N_f$ is the number of time frames, $N_h$ is the number of harmonics, and $\hat{b}_{k,r}$ is the amplitude of the $k$th harmonic of the synthesized tone at the $r$th time frame:

\[ \hat{b}_{k,r} = \sum_{j=1}^{N_m} a_{k,j} w_{j,r} \tag{7} \]

with $N_m$ the number of basis spectra (modules or wavetables). When $\varepsilon = 0$, we have a perfect match, while an error of 0.1 indicates an average relative difference of 10%. The relative error may not exactly reflect the perceived quality of the synthesized tone, but it is a reasonably good measure.

5 Results

We have tested our discrete summation synthesis matching method on several instruments, including trombone and English horn. Figures 2-6 show the original and synthesized spectra for each instrument and the parameters found. We also give the psychoacoustic results for each instrument.

[Figure 2: The first 15 harmonics of the original trombone (left) and its 3-module match (right).]

Figures 7-10 show results for the hybrid sampling-wavetable model on the piano. The original piano tones had frequencies of 130.8 Hz (C3) and 523.2 Hz (C5), and the results are for 3-wavetable matches. The matches use 0.12 s and 0.09 s of samples from the original C3 and C5 tones, respectively, before cross-fading to wavetable synthesis. The matches are quite convincing.
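As a concrete illustration of the fitness measure from Section 4, the relative error of Equation 6 could be computed along the following lines. This is a minimal sketch under one explicit assumption: the per-frame ratio of squared differences is placed under a square root, which is consistent with the statement that an error of 0.1 corresponds to an average relative difference of 10%. The function names are hypothetical.

```python
import math


def synthesized_spectrum(basis, weights):
    """Equation 7 for one frame: basis[k][j] = a_{k,j}, weights[j] = w_{j,r}."""
    return [sum(a_kj * w_j for a_kj, w_j in zip(row, weights)) for row in basis]


def relative_error(original, synthesized):
    """Average over frames of the per-frame relative spectral error.

    original[r][k] and synthesized[r][k] hold the amplitude of harmonic k
    at frame r. Every original frame is assumed to have nonzero energy.
    """
    total = 0.0
    for b, b_hat in zip(original, synthesized):
        squared_difference = sum((x - y) ** 2 for x, y in zip(b, b_hat))
        energy = sum(x * x for x in b)
        # Assumption: square root of the ratio, so the result is an amplitude ratio.
        total += math.sqrt(squared_difference / energy)
    return total / len(original)
```

A GA would evaluate this quantity for each candidate parameter set, after fitting the amplitude weights by least squares, and prefer candidates with a smaller value. Scaling every harmonic of every frame by 0.9 gives an error of 0.1 under this definition.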
6 Conclusion

We have introduced two synthesis techniques that use genetic algorithms to match the synthesis parameters, and shown results for various tones. The GA provides an efficient means of matching synthesis parameters.

[Figure 3: Discrete summation synthesis parameters for the 3-module trombone match. Panels: parameters used for discrete summation synthesis; spectrum produced by each module; amplitude envelope of each module. Legible values: a1 = 0.40, a2 = 0.73, a3 = 0.13.]

[Figure 4: The first 15 harmonics of the original English horn (left) and its 3-module match (right).]

[Figure 5: Discrete summation synthesis parameters for the 3-module English horn match. Panels: parameters used for discrete summation synthesis; spectrum produced by each module; amplitude envelope of each module. Legible values: a1 = 0.76, a2 = 0.34, a3 = 0.49.]

[Figure 6: The percentage of trials in which the average listener was able to identify synthetic trombone (left) and English horn (right) tones from a test set of both real and synthetic tones.]

[Figure 7: The first 10 harmonics of the original piano C3 (left) and its 3-wavetable match (right).]

[Figure 8: Basis spectra used for the 3-wavetable match of piano C3.]

[Figure 9: The first 10 harmonics of the original piano C5 (left) and its 3-wavetable match (right).]

[Figure 10: Basis spectra used for the 3-wavetable match of piano C5.]

References

A. B. Horner, J. Beauchamp, and L. Haken, "Methods for Multiple Wavetable Synthesis of Musical Instrument Tones," Journal of the Audio Engineering Society, vol. 41, no. 5 (May 1993).

J. A. Moorer, "The Synthesis of Complex Audio Spectra by Means of Discrete Summation Formulas," Journal of the Audio Engineering Society, vol. 24, no. 9 (November 1976).

J. A. Moorer, "Signal Processing Aspects of Computer Music: A Survey," Computer Music Journal (February 1977).

D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley Publishing Company, Inc., 1989).

W. Press, B. Flannery, S. Teukolsky, and W. Vetterling, Numerical Recipes (Cambridge University Press, Cambridge, UK, 1989).

ICMC Proceedings 1996