Page  215 ï~~Genetic Algorithm Optimization of Additive Synthesis Envelope Breakpoints and Group Synthesis Parameters Andrew Homer and Ngai-Man Cheung Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong. email: homer@cs.ust.hk James Beauchamp School of Music and Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, ilinois 61820 USA. email: j-beauch@uiuc.edu ABSTRACT: This paper presents genetic algorithm-based techniques for solving two music synthesis problems: 1) picking the best set of Nbreakpoints for line segment approximation of a tone's amplitude envelopes, and 2) finding the group synthesis parameters to best match a particular musical instrument tone. Both techniques avoid problems associated with more traditional approaches and provide an effective means of data reduction. We give results for the two techniques and show matches of various instrument tones. 1.1 Piecewise Linear Approximation of Envelopes Computer music applications commonly use piecewise linear (straight line segment) approximations (PLAs) to model amplitude envelopes. Line segments require a minimal amount of data to specify, and it is easy to modify and generate them. This technique dates back to the early days of computer music, and over the years many digital synthesizers have utilized the PLA method. Additive synthesis is one of the main synthesis applications which has often used piecewise linear approximations. A closed-form solution to the line segment approximation problem does not exist because the problem is very nonlinear, moving just one breakpoint changes how well the approximation matches the original envelope over the length of the neighboring segments. Several researchers have reported fitting amplitude and frequency envelopes by hand (Risset and Mathews 1969, Grey and Moorer 1977), and at least two have proposed automated methods for reducing the approximation error below some predetermined threshold (Beauchamp 1969, Strawn 1980). However, our objective in this article is to determine the best possible approximation for a specified number of line segments. For example, if we want to design an instrument patch using a keyboard synthesizer constrained to only 5 line segments per envelope, we need an automatic procedure which tells us how to best utilize the line segments. Strawn's 1980 method (ADJUST) attempted this, but it required an initial estimate of the solution, and the solution it ultimately found was usually only slightly improved over the initial guess. This is because ADJUST used a hiUclimbing procedure to improve on an initial guess and thus was restricted to converging to some nearby local optimum. While most PLA attempts in the past, whether by hand or by automatic method, used independent breakpoints for each envelope, we have decided to focus on the problem of picking the N best interior breakpoints common to all harmonics of a sound. For amplitude envelopes, we assume zero-valued boundary breakpoints at the beginnings and ends of tones, and thus do not count them among the breakpoints we must optimize. Using common breakpoint times has a number of advantages: For one, common breakpoints immediately save a factor of two in data storage requirements over independent breakpoint times. Moreover, independent times give equally good or better results compared to common times when we compare according to equivalent memory requirements. Finally and perhaps most importantly, we can easily show that additive synthesis is mathematically equivalent to wavetable interpolation synthesis if the time-varying partial frequenciesf(e) are harmonically related (i.e.,fk(t) = k IC MC P ROC EE D I N G S 199521 215

Page  216 ï~~f(t)) and the phases are consistent (Serra et al, 1990). In most situations wavetable interpolation synthesis is much faster than additive synthesis. The next section introduces two new greedy and genetic algorithm techniques for breakpoint selection, and compares them against simple intuitive breakpoint selection procedures. As a fitness measure for evaluating how well an approximation matches the original signal, we define a relative error measure as follows: R elative = 1 k. (ak(n) - a'k(n2 1 1 Error NptsNb_ where a'k(n) is the piecewise linear approximation to the kth harmonic amplitude at the nth frame, ak(n) is the corresponding amplitude of the original signal, Noa, is the number of harmonics used, and N,3 is the number of analysis frames spanning the particular musical sound. 1.2.1 Greedy Breakpoint Selection Greedy breakpoint time selection uses an iterative procedure to determine the best breakpoint times. First, the greedy procedure enumerates all possible breakpoint times and selects the one providing the best fit to the original according to Equation 1 (giving two line segments per envelope). Next, keeping the first breakpoint time, the procedure tries all the other possible times, and, again retains the best one. The procedure continues following this pattern to find more times, adding one at each step. While the first breakpoint set, consisting of a single time value, is optimal, in successive steps the procedure does not consider changing previously found breakpoint times, but rather only augments the previously found set. The greedy method therefore only returns locally optimal solutions, instead of the best ones possible. This method also has the disadvantage that it needs to compute N-1 breakpoint approximations before it can solve for the Nth approximation, but since each iteration takes little time to compute, this is not a big problem. As demonstrated below, the greedy breakpoint selection algorithm does not generally yield as good approximations as the GA method for the same number of breakpoints. However, it does achieve results faster. Also, the greedy procedure can achieve the same amount of error if the penalty of requiring more breakpoint times is acceptable. For a few breakpoints, with a typical instrument tone having about 30 harmonics, the greedy algorithm completes in a few seconds on a Silicon Graphics Indigo workstation (this machine benchmarks at approximately 100,000 Dhrystones/second); for 20 or more breakpoints, the greedy algorithm requires about a minute. 1.2.2 Genetic Algorithm-Based Breakpoint Selection The method for piecewise linear approximation proposed in this section uses genetic algorithm optimization to select the breakpoint times. Genetic algorithms (GAs) use natural selection and evolutionary-inspired operators like crossover and mutation to optimize combinatorial problems such as breakpoint selection (Goldberg 1989). GAs work with a population of candidate solutions to efficiently examine the search space and explore its various local optima in parallel. This robust parallel exploration is well-suited to the PLA problem where it is better to avoid converging to local minima as hillclimbing does. We can use the GA to systematically explore the search space and find the best combination of breakpoint times. To use a GA, we must specify an objective fitness function and a binary encoding of the breakpoints. Equation 1 gives the error which we wish to minimize, and it defines the fitness function. Coding the breakpoint times is straightforward: We simply represent each time frame's integer value as a binary number. 216 6IC MC PROCEEDINGS 1995

Page  217 ï~~Considering its robustness in achieving near-optimal results, the genetic algorithm is an efficient iterative procedure. The speed of the GA depends directly on the number of breakpoints and the number of bits required to represent the breakpoint times. For a few breakpoints, with a typical instrument tone having about 30 harmonics, the GA completes in about a minute on a Silicon Graphics Indigo workstation; for 20 or more breakpoints, the GA requires several minutes. Skipping some of the analysis frames in the fitness function reduces the turnaround time with only a modest loss of accuracy (Homer et al. 1993). 1.3. Results for GA-Based Envelope Approximation This section gives the results of applying the equally-spaced, random, greedy, and GA procedures to determine common breakpoint times for a guitar tone (we have also tested the procedure on other instruments like the trumpet and piano). We use an equally-spaced selection procedure that selects half the breakpoints in the attack and the rest over the remainder of the tone. For random selection, we select the best breakpoint set of 100 random trials. As is shown in Figure 1, the guitar tone's harmonic amplitude envelopes all have very rapid attacks. Some of the harmonics (e.g., 1, 3, and 6) go immediately into rather long decays. Other harmonics, such as the 2nd and 7th, have "two segment decays", where very rapid initial decay curves are followed by much slower decays. In general, the higher harmonics tend to fade out faster than the lower ones, leaving the 1st, 3rd, and 6th harmonics to dominate the sound of the final decay.,,,.,.,,,,,..,.. --..... Figure 1. Amplitude envelopes for harmonics 1-5 (left) and 4-10 (right, different scale) of the 98Hz guitar tone. Figure 2 summarizes the results found for the various breakpoint selection procedures. The equally-spaced breakpoint selection procedure returned errors of more than 100% for 7 or fewer breakpoints; the strategy simply did not select breakpoints in places which properly characterized the decay pattern. The other 3 methods performed about the same as one another up to 3 breakpoints, but the GA selection method significantly outperformed the others in the range of 4-7 breakpoints. For 8 or more breakpoints, the greedy and GA methods give about the same results. Â~ -b16,......... 2 Thesam breakoint are.. use for...... eac harmoni..1 --- 1 S D4 6 O T a 9 10 11 1Z I 14 16 e1O 91 Figre 2._ Errors returned by the breakponint selection methods on the guitar, Figure 3 illustrates that the GA strategy indeed manages to find a reasonable solution with as few as 4. breakpoints. The first few breakpoints focus on the tone's decay ignoring the attack completely. Since ICMC PROCEEDINGS 199521 217

Page  218 ï~~overestimating the amplitudes can cause relative errors greater than 100% (i.e. the equally-spaced solutions for 7 or fewer breakpoints), it makes sense to conservatively build up the decay first. The first and third harmonics of the 3-breakpoint GA approximation look good and also result in a good sound, if you ignore the attack. The 4-breakpoint solution sounds very much like the original, while the 5-breakpoint version shortens and refines the initial decay of the first and third harmonics. Indeed, this 5-breakpoint version sounds quite similar to the original tone. Comparable sounding results require 10 breakpoints for the greedy method, and more than 20 breakpoints using equal-spacing or random selection. GA optimization of the breakpoint times clearly pays off with the guitar sound. original 2 breakpoints 3 breakpoints Fiur. Hamnc.- fG piceie ina aprxmtost h utr 1.4neprtto ofbreakpoint Seeto brelts nRted6Work acoi We have found that the basic genetic algorithm-based procedure determines the best shared breakpoint times for piecewise linear approximation of harmonic amplitude envelopes. The GA breakpoint selection method is also easily extended for piecewise linear approximation of frequency envelopes. Because of all the methods we have investigated the GA has consistently given superior results, we consider this method to be the best when looking for optimum solutions for specific numbers of breakpo~ints. Indeed, for hardware synthesis, where the number of breakpoints is often quite limited, the GA approach is clearly the best to use. However, when 10 or more breakpoints are available, the greedy method of breakpo~int selection also performs very well, generally converging to a solution similar to that produced by the GA. Since the greedy approximation method usually computes faster then the GA (only taking a few seconds for each iteration), it is perhaps the best method for software synthesis, where fast analysis/synthesis could be important and the possible need for a few extra breakpoints (to reduce the error to a level as low as the GA would give) would not be a great concern. While an objective fitness function is necessary to compute good PLA solutions, these results still do not tell us how many segments are needed for a "good synthesis." However, our formal listening tests showed that 12 GA-optimized breakpoints for a trumpet tone and 9 for piano and guitar tones were needed in order to cause average listeners to confuse synthetic tones for original ones at least 50% of the time. 218 2C MC P ROC E E D I N G S 1995

Page  219 ï~~2.1 Group Synthesis This section introduces an automated method for matching and reproducing musical instrument tones through a special case of wavetable synthesis where the wavetables contain disjoint sets of harmonics. Previous work has called this "group additive synthesis" (Kleczkowski 1989), but we will just call it "group synthesis," since it is really a form of wavetable synthesis, not additive synthesis. As an example, one wavetable might contain only the even numbered harmonics, while another only the odd harmonics. We will use the term basis spectrum to refer to the harmonic content of a wavetable, and for this example, the basis spectrum of the first wavetable would only contain non-zero values at the odd harmonics. Group synthesis is efficient and its parameters are easy to interpret. Since each wavetable controls a disjoint set of harmonics, modification of the wavetables or their amplitude envelopes yields predictable effects. Kleczkowski's original work did not shown how to automatically group the wavetables' harmonics. Handgrouping works okay for some simple tones like the grouping of odd and even harmonics on the clarinet, but we need an automated method when using more than two groups with more complicated spectra. Researchers have proposed several techniques to determine wavetable synthesis parameters (Stapleton and Bass 1988; Serra et al. 1990, Homer et al. 1993), but none of these directly apply to the special case of group synthesis. Eaglestone and Oates (1990) followed up Kleczkowski's work and presented a clustering scheme for group synthesis optimization based on building a tree structure of groups using an iterative greedy procedure. As a greedy procedure, the clustering scheme is prone to finding local, rather than global optima. We need a procedure with more flexibility to explore the search space if we expect to find a near-optimal solution. Figure 4 shows an outline of the matching and synthesis steps for reconstructing a tone through group synthesis. First, a short time spectrum analysis of the original sound gives the tone's frequency representation. The next step is to find the groupings, wavetables, and wavetable amplitude envelopes that give the best spectral fit. Finally, we reconstruct the tone with the determined parameters and group synthesis. This paper introduces a genetic algorithm-based method for group synthesis parameter optimization. ttm,-veryum gamueic tmm I ahnIod ix Figure 4. Overview of the group synthesis matching process. 2.2 Group Synthesis Matching with Genetic Algorithms Group synthesis matching requires optimization of two types of parameters: the wavetables' basis spectra and their time-varying amplitude envelopes. This section presents a genetic algorithm (GA) method to determine the grouping and relative amplitudes of the harmonics. We can easily compute the amplitude envelopes if we have already determined the wavetable basis spectra. We calculate them using least squares 1C MC P R O C E E D I N GS 199521 219

Page  220 ï~~as described in Homer et al. (1993). We determine the group synthesis basis spectra through a two-step process --- GA optimization followed by hillclimbing refinement. GA optimization locates the regions of good basis spectra in the search space, while hillclimbing finds the local optima in these regions. The GA uses a fitness function to guide the search for a good solution. We use the same relative error measure as we used in breakpoint optimization in Equation 1. We determine the harmonic relative amplitudes of the basis spectra from the peak RMS spectral snapshot of the original tone. We then use the GA to determine the grouping of the harmonics. We encode the parameters for the GA as shown in Figure 5. The grouping is given by Hk for 1< k N,s. If H4 is equal to J, then the kth harmonic appears in basis spectra J. Even though all the amplitudes come from the same spectral snapshot, the groups have independent amplitude envelopes which allow for dynamic spectral variations. bitstnng length haBitLength H1 f H2 H3..... Nabs = No. of Wawtab)s Nhan- No. of Harmoic haBitL.ongth =ceat( log 2 Ntabs ) Litsikng length = haBitLcngth * Nhars Figure 5. Encoding of the grouping. We use hillclimbing to refine the GA's answer. Hillclimbing works as follows: we take the grouping determined by the GA, and starting from the first harmonic, test other possible groupings by changing one harmonic at a time. For example, with a 3-wavetable match we might obtain the grouping <312211> from the GA. We test different groupings for the first harmonic, that is, <112211> and <212211>. We then select the grouping with the lowest relative error. After that, we test the second harmonic in the same way. The hillclimbing pass finishes after completing the highest harmonic. 2.3 Results for GA Group Synthesis We have applied the genetic algorithm group synthesis procedure to match trumpet, guitar and tenor voice tones, and present the trumpet results in this section. We have found excellent matches using four groups. Of course, more wavetables give even better matches. Figure 6 shows the groupings for the 4-table trumpet matches. The GA clearly classifies similarly shaped harmonics into the same groups, especially the high-amplitude harmonics. Groups 3 and 4 look similar, but group 3 has a slower initial attack and an earlier decay. Figure 7 shows the amplitude envelopes of the trumpet's original and matched 2nd and 4th harmonics. The 2-wavetable match is not adequate to model the envelopes, but the 4-wavetable match is quite excellent. Figure 8 shows the convergence of error using the GA and hillclimbing refinement. The errors decrease monotonically and exponentially with increasing number of wavetables. The results are similar to those from a related genetic algorithm wavetable matching model (Homer et al. 1993), and group synthesis offers additional control since the amplitude envelopes directly determine the shapes of the harmonics. 220 0IC MC PROCEEDINGS 1995

Page  221 ï~~Harz HarS Har3 Har4 Harl Har6 Har9 Mar 1 Mar 11 "Only the fimt 11 harmonics are shown' Figure 6. Grouping determined by the GA for a 4-wavetable trumpet 2nd Harmonic 4th Harmonic no tmo Mme Â~oe tmo tooo tse tw ems +mo use tme tse >re. tso tmo tao 2 table matcba o "Ie~c em as so 2 table mach 4 table mach O Â~ 4 - Figure 7.2nd and 4th harmonics of the trumpet in the original, and 2- and 4-wavetable group synthesis matches. I C MC P ROCE ED IN G S 1 99 521 221

Page  222 ï~~OA and hilclimbing group synthesis matches 0.5, mw etgutar 0.45 tnr...... 0.4 0.35 0.3 0.25 0.2 0.1 0.05 ---._.................................-- - - -.Â~............ 2 3 4 5 6 7 8 9 Number of Tables Figure 8. Convergence of error with increasing numbers of wavetables for the GA group synthesis matches. 3 Conclusion Genetic algorithms provide a powerful general tool for data reduction of spectral data, as we have seen for the piecewise linear approximation and group synthesis problems. We expect to find further spectral modelling applications for the GA method in future work. Acknowledgements This work was supported in part by the Hong Kong Research Grant Council's Project HKUST597,94E. References Beauchamp, J. 1969. "A Computer System for Time-Variant Harmonic Harmonic Analysis and Synthesis of Musical Tones", in Music By Computers, H. von Foerster and J. Beauchamp, eds. New York: John Wiley & Sons. Eaglestone, B. and oates, S. 1990. "Analytical Tools for Group Additive Synthesis", in Proc. 1990 Int. Computer Music Conf., San Francisco, CA: Int. Computer Music Assn., pp. 66-68. Goldberg, D. 1989. Genetic Algorithms in Search, Optimization, and Machine Learning, Reading, MA: Addison-Wesley. Grey, J. and Moorer, J. 1977. "Perceptual Evaluation of Synthesized Musical Instrument Tones", Journal of the Acoustical Society of America 62: 454-462. Homer, A., Beauchamp, J., and Haken, L. 1993. "Methods for Multiple Wavetable Synthesis", Journal of the Audio Engineering Society 41(5): 336-356. Kleczkowski, P. 1989. "Group Additive Synthesis", Computer Music Journal, 13(1): 12-20. Risset, J. and Mathews, M. 1969. "Analysis of Musical Instrument Tones", Physics Today 22(2): 23-30. Serra, M.-H., Rubine, D., and Dannenburg, R. 1990. "Analysis and Synthesis of Tones by Spectral interpolation", Journal of the Audio Engineering Society 38(3): 111-128. Stapleton, J. and Bass, 5., 1988, "Synthesis of Musical Tones Based on the Karhunen-Lobve Transform", IEEE Trans. on Acoustics, Speech, and Signal Processing, 36(3): 305-3 19. Strawn, J. 1980. "pproximation and Syntactic Analysis of Amplitude and Frequency Functions for Digital Sound Synthesis", Computer Music Journal, 4(3), pp. 3-24. 222 I C M C PROC EE D I N GS 1995