Page  00000001 A Comparison between Local Search and Genetic Algorithm Methods for Wavetable Matching Simon Wun, Andrew Horner and Lydia Ayers Department of Computer Science, Hong Kong University of Science and Technology Abstract Picking spectral snapshots from the original tone is a successful approach to finding basis spectra in wavetable matching of musical instrument tones. Spectral snapshot selection usually requires optimization by a sophisticated technique such as a genetic algorithm (GA). This paper compares GA selection and a straightforward local search method, which first selects an initial set of basis spectra from the original tone by some simple selection strategy, and then modifies and improves them. Matching results for a range of instruments show that local search, when using the best composite results found by various simple selection strategies, is as effective as the GA. Moreover, surprisingly, starting from various initial basis spectra, local search can usually find several different matches, all with errors within 1% of the best solution, indicating that there are many near-optimal solutions in the wavetable search space. 300C 200C 1000 -1000 -200C -300C )0 10 0 S - 0 - 0 - 0 100 200 300 400 500 600 700 800 900 102 Sample no. 0 - 0 - 0 - 0 - 0 - D0 - n 60C 500 - 40C S30C 20C 100 1 5 10 Harmonic 1 INTRODUCTION Wavetable synthesis reproduces a musical instrument tone of a definite pitch by adding several enveloped static waveforms (see Fig. 1) (Serra, Rubine, and Dannenberg 1990; Horner, Beauchamp, and Haken 1993). Each wavetable stores one period of a waveform which is a sum of harmonic sinusoids with fixed frequencies and amplitudes. The spectrum of a wavetable is called its basis spectrum. Fig. 2 shows a 1024-sample wavetable and the corresponding basis spectrum. During synthesis, the wavetable samples are multiplied by an amplitude envelope, and the signals from different enveloped wavetables are added to produce the matched tone. Figure 2: A 1024-sample wavetable. (upper) Sample values. (lower) Basis spectrum. First, a spectral analysis of the original tone transforms the sound into a time-varying spectrum. Phase vocoder analysis, short-time Fourier transform for harmonic sounds, can be used (Dolson 1986). It typically divides the original tone into thousands of frames a few milliseconds long, each representing a spectral snapshot at a particular time point. The complete set of spectral snapshots can also be viewed as discrete time-varying amplitudes and frequency deviations. The target spectrum is typically limited to only some of the original frames to save computation in the later matching phase. Using 10 evenly spaced frames from the attack (that is, before the peak rms amplitude is reached) and 10 from the rest of the original tone keeps most of the perceptually important spectral information. Next, assuming we have found the basis spectra, the amplitude envelopes can be determined by solving the following system of linear equation: Figure 1: Multiple wavetable synthesis block diagram. Wavetable matching tries to find the basis spectra and amplitude envelopes that can best approximate the original tone. a1,1 ~ al,NTAB W1,1 ~ " a2,1 W2,1.* aNHAR,NTAB _ Wl,NFRM WNTAB,NFRM Proceedings ICMC 2004

Page  00000002 biFl. bl,NFRM 1 b2,1 ' bNHAR,NFRM J or AW = B (1) where NHAR = number of harmonics to match NTAB = number of wavetables NFRM = number of frames in the target spectrum akj = amplitude of the kth harmonic of the jth basis spectrum Wj,n = amplitude envelope value for the jth wavetable and the nth target frame bk,n = amplitude of the kth harmonic of the nth frame of the target spectrum. Normally, there are far more harmonics than wavetables (e.g., NHAR = 30 and NTAB = 3). In that case the match, AW = B* is the best least-squares approximate to the target, B. One metric to measure the difference between the original and matched tones is the relative spectral error as follows: * EqAllEndpts: picks the frames evenly spaced through the duration of the tone, including the end frames at the beginning and end * EqAllNoEndpts: uses equal spacing too but excludes the end frames * EqHalfEndpts: picks half the basis spectra from the attack (as defined above) and half from the rest of the tone, including the end frames * EqHalfNoEndpts: the same as EqHalfEndpts, except for excluding the end frames The simple strategies are a blind selection-we do not care whether the initial solution is good or bad. Then, a local search algorithm artificially modifies the basis spectra in ways that improve the match. Fig. 3 shows a set of three basis spectra selected from a trumpet tone and what they may be changed to after local search. Before 20 I I - - After 10 - 1 5 10 15 20 6000 r i- I I I II NFRM ZNHAR (bk - )2 N1 C= kn - k, 6 NFRM NHAR b2 n=l N k=l kn (2) 4000 -< 2000 - I I A value of c = 0.1 represents a 10% relative spectral error. Usually, the smaller the relative spectral error, the better the match. But, how do we decide the basis spectra in the first place? The current approaches to finding them broadly fall into two categories: spectral snapshot selection and artificial basis spectra generation. Spectral snapshot selection picks the basis spectra from the spectral snapshots of the original tone. Genetic algorithm (GA) selection, for example, essentially searches through the spectral snapshots combinations in an evolutionary fashion (Horner, Beauchamp, and Haken 1993). On the other hand, artificial basis spectra are often statistically generated by such techniques as principal component analysis (PCA), and can be quite different from the original spectral snapshots (Horner, Beauchamp, and Haken 1993; Sandell and Martens 1995). 2 LOCAL SEARCH METHOD Selecting basis spectra from the original spectral snapshots is intuitive, but sophisticated optimization procedures such as a GA are required to find an excellent solution. On the other hand, the search space for artificial basis spectra is infinite, in which there are potentially better solutions. Unfortunately, the chance of finding a good solution in the infinite search space is small by a one-pass statistical technique such as PCA. Therefore, we propose a local search method to combine the best of both worlds. The main idea is to start with a rough guess, and refine it through local search. The following are four simple strategies for selecting an initial set of basis spectra as the rough guess: l i l I Il I li ii iii 1 5 10 15 20 10 - 5 - 1 5 10 15 20 Harmonic no. Figure 3: Three basis spectra selected from trumpet tone before and after local search. Local search, like many other wavetable matching methods, needs an automatic evaluation metric that differentiates good matches from poor ones to compare candidate sets of basis spectra. Unfortunately, a metric that perfectly reflects human perception is still an open research problem. In this work, we used the relative spectral error metric in Equation 2 to measure the relative difference between a match and the original spectrum. Local search is an iterative improvement heuristic. It is based on the hillclimbing idea: during the search, we repeatedly transform the current solution into a close and better new solution which then becomes the current solution (for the next iteration). The search starts from an initial solution, which can actually be randomly determined, and ends when improvement is not found with any transformation. In effect, it looks for a local optimum in the neighborhood of the initial solution. Translating wavetable matching into relative spectral error minimization, we can immediately apply local search to our problem. The relative spectral error is the objective function that depends on the basis spectra. A solution can thus Proceedings ICMC 2004

Page  00000003 be represented by a variable vector of harmonic amplitudes for the basis spectra. For example, with 3 wavetables and 30 harmonics, the solution vector is as follows: a = [aial,2 * *,30a2,1. * a3,30]T where aj,k is the amplitude of the kth harmonic of the jth basis spectrum. We used Powell's method (Press et al. 1992), a common multivariable numerical optimization technique, as the transformation. It is a "zeroth-order" direction set method that requires no derivatives but instead a sequence of line minimizations for which we used golden section search. 3 RESULTS 3.1 Simple Strategies versus GA Fig. 4 shows how the four simple strategies compare to GA selection of the basis spectra. The GA consistently outperforms the simple strategies on a tenor voice tone. With 1-3 wavetables, excluding the end frames does better than including them, though EqHalfEndpts selection performs almost as well as the GA with 4-5 wavetables. The EqHalf strategies usually outperform their EqAll counterparts on the tenor. Instrument Trumpet Trombone Tuba Clarinet Oboe Bassoon Chinese dizi Chinese sheng Violin Cello Chinese erhu Guitar Chinese zheng Piano Glockenspiel Tenor voice Pitches A3, F4 G2, Ab4 C1, Eb2, Gb3 Eb3, Eb4, G5 D4, D5, D6 Db2, Bb3, C5 A3, G5 A4, A5 G3, Eb5, B6 C2, Eb3, B4 D4, G#5 A2 Gb2, Db4, Gb5 Al, C3, Eb4, Gb5 G5, G6 G3 Table 1: 39 tones tested, from 16 instruments. S7 0.5 o 0.4 CI 0.3 c c 0. 0.2 -x- GA selection -- EqAIIEndpts S.o. EqAIINoEndpts -A- EqHalfEndpts SA. EqHalfNoEndpts..... '................ 2 3 4 No. of wavetables niL No. of wavetables Figure 4: Matching results for a G3 tenor tone using simple and GA selection. Table 1 lists the 39 total tones we tested, from 16 instruments. Fig. 5 shows the average results over all of them. The simple strategies are not too bad, except EqAllEndpts with 1-2 wavetables. None of the simple strategies stands out as being consistently best, but EqHalf is better than EqAll overall. This is partly due to a bias inherent in the relative spectral error formula which uses an EqHalf target spectrum of 10 frames from the attack and 10 from the rest of the original tone. For fair comparison, we also tested with EqAll target spectra, and found that Eqall was indeed better than EqHalf 80% of the time. Figure 5: Average matching results for all test tones using simple and GA selection with EqHalf target spectra. 3.2 Simple Strategies and GA followed by Local Search This subsection compares the performance of the simple strategies and the GA when they are followed by local search. Fig. 6 shows the results for the tenor tone. The five methods mostly converge in the sense that nearly all find matches of approximately the same error level. Although the errors are about the same, the GA and simple strategy basis spectra are themselves quite different. Local search significantly improves the results of simple selection. For example, the improvement in the one- and two-wavetable matches is often greater than 8%. Local search only improves on GA selection by about 0.25%. EqAllEndpts gets stuck on a local optimum with two wavetables, though there is still a significant improvement. EqHalfNoEndpts also gets stuck with four Proceedings ICMC 2004

Page  00000004 wavetables, so that the error appears to slightly increase compared with three wavetables. No. of wavetables Figure 6: Matching results for a G3 tenor tone using simple and GA selection followed by local search. Fig. 7 shows the average results for the 39 test tones listed in Table 1. Overall, local search gives the most dramatic improvement when the number of wavetables is small. In particular, with one wavetable local search almost never gets stuck because it works very well when the number of parameters is small. Regardless of whether we start with a simple strategy or GA selection, local search results in approximately the same error level, which is close to that found by the GA before local search. Local search also sometimes gets stuck by chance. In other words, starting from an initial best does not guarantee the best final result. A practical solution to this problem is to use local search after each of the four simple strategies and keep the best solution. 4 CONCLUSION This paper has described a local search method for wavetable matching. It starts the search for artificial basis spectra using four simple spectral snapshot selection strategies, and repeatedly looks for local improvement. In effect, this locates small, promising regions of the vast search space and finds local optima. We have compared the performance of GA selection and four simple strategies (namely EqAllEndpts, EqAllNoEndpts, EqHalfEndpts, and EqHalfNoEndpts) before and after local search. Before local search, GA selection always outperformed the simple strategies, while none of the simple strategies stood out as being consistently best. After local search, the simple selection matches were dramatically improvedby up to 30%, especially when the number of wavetables was small. As a result, all five methods converged, with mean errors within a range of 1%, though occasionally local search got stuck on a bad local optimum. To avoid getting stuck on bad local optima, we took the best solution found by the four simple selection strategies followed by local search. This simpler alternative to GA optimization produced the same quality results. In fact, local search actually improved on GA selection slightly. Moreover, even starting from various initial basis spectra, local search can usually find different matches of approximately the same error level. These results suggest that the GA matches are near-optimal, and also that there are many near-optimal solutions in the wavetable search space. References Dolson, M. (1986). The phase vocoder: a tutorial. Computer Music J. 10(4), 14-27. Homer, A., J. Beauchamp, and L. Haken (1993, May). Methods for multiple wavetable synthesis of musical instrument tones. J. Audio Eng. Soc. 41(5), 336-356. Press, W. H. et al. (1992). Numerical recipes: The Art of Scientific Computing in C (Second ed.)., pp. 412-420. Cambridge, UK: Cambridge University Press. Sandell, G. J. and W. L. Martens (1995, December). Perceptual evaluation of principal-component-based synthesis of musical timbres. J. Audio Eng. Soc. 43(12), 1013-1028. Serra, M. H., D. Rubine, and R. Dannenberg (1990, March). Analysis and synthesis of tones by spectral interpolation. J. Audio Eng. Soc. 38(3), 111-128. 0.6F 0.5 - - 0.4 ~ 0.3 0.2 -x-- GA selection -B- EqAIIEndpts S.. EqAIINoEndpts -A- EqHalfEndpts A~ EqHalfNoEndpts ~. 0.1 1 2 3 4 5 No. of wavetables Figure 7: Average matching results for all test tones using simple and GA selection followed by local search. Proceedings ICMC 2004