Parameter Estimation of a Plucked String Synthesis Model with Genetic Algorithm
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License. Please contact email@example.com to use this work in a way not covered by the license.
For more information, read Michigan Publishing's access and usage policy.
Janne Riionheimo (1), Vesa Välimäki (1,2)
(1) Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, Finland
(2) Pori School of Technology and Economics, Tampere University of Technology, Pori, Finland
email: firstname.lastname@example.org

Abstract
We describe a technique for estimating the control parameters of a plucked string synthesis model using a genetic algorithm. The model has been used extensively for sound synthesis, but the fine tuning of its parameters has so far been carried out with a semi-automatic method that requires some manual adjustment based on listening. This paper describes an automated method for extracting the parameters from recorded tones.

1 Introduction
Model-based sound synthesis is a powerful tool for creating natural-sounding synthetic tones by simulating the sound production mechanisms and physical behavior of real instruments. These mechanisms are often too complex to simulate in every detail, so simplified models are used for synthesis. The aim is to produce synthetic tones that are perceptually indistinguishable from those of real instruments. One workable approach to physical modeling synthesis is based on the digital waveguide theory proposed by Smith (Smith 1992). For plucked string instruments, this approach can be extended to also model the plucking style and the instrument body. A synthesis model of this kind can be used to synthesize various plucked string instruments by changing the control parameters and using different body and plucking models.

Our interest in this paper is the parameter estimation of the model. We use recorded tones as target sounds with which the synthesized tones are compared. A genetic algorithm is used to minimize the perceptual error between the tones. This paper is organized as follows. The plucked string synthesis model and the control parameters to be estimated are described in Section 2.
The parameter estimation problem and methods for solving it, particularly the genetic algorithm, are discussed in Section 3. Experiments and results are analyzed in Section 4, and conclusions are drawn in Section 5.

2 Plucked String Synthesis Model
The model used for plucked string synthesis in this study is shown in Figure 1. It is based on digital waveguide synthesis theory (Smith 1992), extended according to the commuted waveguide synthesis approach (Smith 1993) (Karjalainen, Välimäki, and Janosy 1993) to include the body modes of the instrument in the string synthesis model.

[Figure 1: The plucked string synthesis model. Branches: horizontal polarization, vertical polarization.]

Different plucking styles and body responses are stored as wavetables in memory and used to excite the two string models $S_h(z)$ and $S_v(z)$, which simulate the two polarizations of the transverse vibration of the string. A single string model $S(z)$, shown in Figure 2, consists of a lowpass filter $H(z)$ that controls the decay rate of the harmonics and a delay line $z^{-L}$ that, together with a fractional delay filter $F(z)$ (Jaffe and Smith 1983), controls the fundamental frequency. The two string models are typically slightly mistuned to produce a natural-sounding beat effect. A one-pole filter

$$H(z) = g\,\frac{1 + a}{1 + a z^{-1}} \qquad (1)$$
is used as the loop filter in the model.

[Figure 2: The basic string model.]

Parameter $g$ ($0 < g < 1$) in equation (1) determines the overall decay rate of the sound, while parameter $a$ ($-1 < a < 0$) controls the frequency-dependent decay. The excitation signal is divided between the two string models by the input mixing coefficient $m_p$. The coupling gain $g_c$ enables coupling of the two polarizations, and the output mixing coefficient $m_o$ defines the proportion of the two polarizations in the output sound. The parameters $m_p$, $g_c$, and $m_o$ all take values between 0 and 1. The transfer function of the entire model is written as

$$M(z) = m_p m_o S_h(z) + (1 - m_p)(1 - m_o) S_v(z) + m_p (1 - m_o)\, g_c\, S_h(z) S_v(z), \qquad (2)$$

where the string models $S_h(z)$ and $S_v(z)$ for the two polarizations both take the form of the individual string model

$$S(z) = \frac{1}{1 - z^{-L} F(z) H(z)}. \qquad (3)$$

A synthesis model of this kind was proposed by Karjalainen et al. (Karjalainen, Välimäki, and Tolonen 1998) and has been used extensively for sound synthesis of different plucked string instruments. Different methods for estimating the parameters have been used, but because the parameters interact, systematic methods are at the very least troublesome and probably impossible. The nine parameters used to control the synthesis model are listed in Table 1.

3 Parameter Estimation of the Synthesis Model
Parameter estimation for sound synthesis systems is a difficult problem that has usually been solved either by trial and error or by measuring the physical behavior of a real instrument and extracting the synthesis parameters from the measured data.
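The string model of equations (1) and (3) and the two-polarization structure of equation (2) in Section 2 can be sketched as a sample-by-sample recursion. The Python below is a minimal illustration under assumptions, not the authors' implementation: the function names are hypothetical, $F(z)$ is realized as first-order linear interpolation (one simple fractional delay filter among several), the loop filter's phase delay is approximated by its low-frequency value $-a/(1+a)$ when splitting the period $f_s/f_0$ into the integer delay $L$ and a fraction, and the sampling rate is assumed to be 44.1 kHz.

```python
import math


def string_model(x, f0, g, a, fs=44100.0):
    """Filter x through S(z) = 1 / (1 - z^-L F(z) H(z)), eq. (3)."""
    # Approximate low-frequency phase delay of the loop filter H(z) (assumption).
    filter_delay = -a / (1.0 + a)
    total = fs / f0 - filter_delay       # delay left for z^-L F(z), in samples
    L = int(math.floor(total))
    d = total - L                        # fractional part handled by F(z)
    b0 = g * (1.0 + a)                   # numerator of H(z), eq. (1)
    y = [0.0] * len(x)
    u_prev = 0.0                         # previous loop-filter output
    for n in range(len(x)):
        y1 = y[n - L] if n >= L else 0.0
        y2 = y[n - L - 1] if n >= L + 1 else 0.0
        w = (1.0 - d) * y1 + d * y2      # F(z): linear-interpolation fractional delay
        u = b0 * w - a * u_prev          # H(z) = b0 / (1 + a z^-1)
        u_prev = u
        y[n] = x[n] + u                  # feedback loop closes around the delay line
    return y


def plucked_model(x, p, fs=44100.0):
    """Two-polarization model of eq. (2); p maps Table 1 names to values."""
    # Horizontal string is driven by m_p times the excitation.
    yh = string_model([p["mp"] * s for s in x], p["f0h"], p["gh"], p["ah"], fs)
    # Vertical string: direct path (1 - m_p) plus coupling g_c from the horizontal one.
    xv = [(1.0 - p["mp"]) * s + p["gc"] * h for s, h in zip(x, yh)]
    yv = string_model(xv, p["f0v"], p["gv"], p["av"], fs)
    # Output mix with m_o; expanding this gives the three terms of eq. (2).
    return [p["mo"] * h + (1.0 - p["mo"]) * v for h, v in zip(yh, yv)]


def loop_gain(g, a, f, fs=44100.0):
    """|H(e^{jw})| from eq. (1): amplitude factor per trip around the loop at f Hz."""
    w = 2.0 * math.pi * f / fs
    return g * (1.0 + a) / math.sqrt(1.0 + 2.0 * a * math.cos(w) + a * a)
```

Scaling the input by $m_p$, feeding $g_c$ times the horizontal output into the vertical string, and mixing the outputs with $m_o$ reproduces the three terms of equation (2). `loop_gain` equals $g$ at DC and, for $a < 0$, decreases with frequency, which is why $g$ sets the overall decay while $a$ makes the higher partials decay faster.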
A calibration technique of this kind, based on measurements of recorded tones, was proposed for a plucked string synthesis model by Välimäki et al. (Välimäki, Huopaniemi, Karjalainen, and Janosy 1996). The fundamental frequency of a recorded tone is first estimated, and then the amplitude trajectories of the partials are analyzed using the short-time Fourier transform (STFT). The decay rate of each partial is measured, and the one-pole loop filter is designed to approximate the decay behavior.

Parameter   Effect
$f_{0h}$    fundamental frequency of the horizontal string model
$f_{0v}$    fundamental frequency of the vertical string model
$g_h$       overall gain of the horizontal string model
$a_h$       frequency-dependent gain of the horizontal string model
$g_v$       overall gain of the vertical string model
$a_v$       frequency-dependent gain of the vertical string model
$m_p$       input mixing coefficient
$m_o$       output mixing coefficient
$g_c$       coupling gain of the two polarizations
Table 1: Parameters of the synthesis model.

Another way to approach the parameter estimation problem is to consider it as a general search process in which one tries to find an optimal set of parameters in the parameter space. Each point in the parameter space corresponds to a set of parameters, and our aim is to find the set of parameters that produces the desired output sound when used with the synthesis model. Thus, one has to define a quality value for each point in the parameter space that denotes how good the sound synthesized with the corresponding parameter set is. This performance metric is usually called a fitness function or, inversely, an error function, which measures the error between the candidate solution and the desired sound. These functions assign a numerical grade to each solution, which allows all possible parameter sets to be ranked. The search process can be carried out with several methodologies, but evolutionary algorithms have performed well in the parameter estimation of synthesis models.
Vuori and Välimäki (Vuori and Välimäki 1993) tried a simulated evolution algorithm for the flute model, and Horner et al. proposed an automated system for parameter estimation of an FM synthesizer using a genetic algorithm (Horner, Beauchamp, and Haken 1993). Garcia also used a genetic algorithm to design sound synthesis techniques automatically (Garcia 1998).
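The search loop described step by step in Section 3.1 can be sketched as a minimal real-coded genetic algorithm that minimizes an error (the inverse of fitness), such as equation (6). This is an illustration under assumptions, not the authors' implementation: binary tournament selection, blend crossover, Gaussian mutation, two elite survivors, the default sizes, and the function and argument names are all choices made for this sketch.

```python
import random


def genetic_search(error, bounds, pop_size=40, generations=200, n_elite=2,
                   mutation_rate=0.1, threshold=1e-6, seed=0):
    """Minimize error(params) over real-valued genes limited by bounds."""
    rng = random.Random(seed)
    dims = len(bounds)
    # 1. Initialization: random genes within each parameter's range.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        # 2. Fitness calculation: lower error means higher fitness.
        errs = [error(ind) for ind in pop]
        order = sorted(range(pop_size), key=lambda j: errs[j])
        # Terminate when the summed deviation from the population mean is small.
        means = [sum(ind[i] for ind in pop) / pop_size for i in range(dims)]
        spread = sum(abs(ind[i] - means[i]) for ind in pop for i in range(dims))
        if spread < threshold:
            break

        def select():
            # 3. Parent selection: binary tournament favours lower error.
            j, k = rng.sample(range(pop_size), 2)
            return pop[j] if errs[j] < errs[k] else pop[k]

        # The fittest parents survive unchanged into the new generation.
        new_pop = [pop[j][:] for j in order[:n_elite]]
        while len(new_pop) < pop_size:
            p1, p2 = select(), select()
            # 4. Crossover: each gene is a random convex blend of the parents.
            child = []
            for i in range(dims):
                t = rng.random()
                child.append(t * p1[i] + (1.0 - t) * p2[i])
            # 5. Mutation: occasional Gaussian perturbation, clipped to bounds.
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    v = child[i] + rng.gauss(0.0, 0.1 * (hi - lo))
                    child[i] = min(max(v, lo), hi)
            new_pop.append(child)
        # 6. Replace the current population with the new one.
        pop = new_pop
    return min(pop, key=error)
```

For the model of Section 2, each chromosome would hold the nine parameters of Table 1, with ranges such as $0 < g < 1$, $-1 < a < 0$, and $[0, 1]$ for $m_p$, $m_o$, and $g_c$; ranges for the fundamental frequencies would have to be chosen around a pitch estimate of the target. Elitism keeps the best error non-increasing from one generation to the next.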
3.1 Genetic Algorithm
Genetic algorithms mimic natural evolution and exploit the principle of "survival of the fittest" (Mitchell 1998). They operate on a population of potential solutions, improving the characteristics of the individuals from generation to generation. Each individual, called a chromosome, is made up of an array of genes that, in our case, contain the actual parameters to be estimated. Parameters can be represented as binary or floating-point numbers. A simple genetic algorithm can be implemented as follows:
1. Initialization: Create a population of popSize individuals (chromosomes). The initial values of the genes (estimated parameters) are assigned randomly.
2. Fitness calculation: Calculate the fitness of each individual in the current population.
3. Parent selection: Select individuals from the current population for mating based on their fitness values, so that better individuals have a higher chance of being selected.
4. Crossover: Create a new population (generation) by mating the selected parents to produce offspring. The fittest parents can also survive into the new generation.
5. Mutation: Mutate a few individuals in the new population.
6. Replace the current population with the new one.
7. Repeat steps 2-6 until termination.
The algorithm is normally terminated when a specified number of generations has been produced or when the sum of the deviations among the individuals becomes smaller than a specified threshold. Several schemes are feasible for the selection, crossover, and mutation processes, depending on which chromosome representation is used. The most critical and application-dependent stage of the algorithm is the calculation of the fitness value, and care has to be taken in deciding how to rank the candidate solutions.

3.2 Fitness Calculation
There is no known objective method for measuring the similarity between two sounds that perfectly matches human perception.
A common method, used by Horner (Horner, Beauchamp, and Haken 1993) and Garcia (Garcia 1998), is to measure the squared error between the short-time spectra of the two sounds. The STFT of a signal $y(n)$ is a sequence of discrete Fourier transforms (DFTs):

$$Y_m(k) = \sum_{n=0}^{N-1} w(n)\, y(n + mH)\, e^{-j\omega_k n}, \qquad m = 0, 1, 2, \ldots \qquad (4)$$

with

$$\omega_k = \frac{2\pi k}{N}, \qquad k = 0, 1, 2, \ldots, N-1, \qquad (5)$$

where $N$ is the length of the DFT, $w(n)$ is a window function, and $H$ is the hop size, or time advance (in samples) per frame. If $o(n)$ is the output sound of the model and $t(n)$ is the desired target sound, the error of a candidate solution (the inverse of its fitness) is calculated as

$$F = \sum_{m=0}^{L-1} \sum_{k=0}^{N-1} \left( |O_m(k)| - |T_m(k)| \right)^2, \qquad (6)$$

where $O_m(k)$ and $T_m(k)$ are the STFT sequences of $o(n)$ and $t(n)$, and $L$ is the number of frames, $m = 0, 1, 2, \ldots, L-1$.

4 Experimentation and Results
To study the efficiency of the proposed method, we first tried to estimate the parameters from a sound produced by the model itself. In the first case, the same excitation signal, extracted from a real recorded tone by the method described in (Välimäki and Tolonen 1998), is used for both the target and output sounds. A more realistic case is simulated when the excitation for resynthesis is extracted from the target sound. The original and estimated parameters are shown in Table 2.

Parameter   Original   Estimated               Estimated
                       (original excitation)   (extracted excitation)
$f_{0h}$    330        330.406                 330.5598
$f_{0v}$    331        331.412                 331.5355
$g_h$       0.989      0.9888                  0.9893
$a_h$       -0.18      -0.1699                 -0.1971
$g_v$       0.991      0.9901                  0.9914
$a_v$       -0.22      -0.2270                 -0.1173
$m_p$       0.5        0.4414                  1.0000
$m_o$       0.5        0.6061                  0.9310
$g_c$       0.1        0.1474                  0.4738
Table 2: Original and estimated parameters.
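The error measure of equations (4) to (6) in Section 3.2 can be sketched directly. This is a minimal pure-Python illustration, not the authors' code: the Hann window, the default $N$ and $H$, the direct $O(N^2)$ DFT, the hypothetical function names, and the restriction to frames present in both signals are assumptions made here.

```python
import cmath
import math


def stft_magnitudes(y, N=256, H=128):
    """|Y_m(k)| of eq. (4) for every full frame, using a Hann window w(n)."""
    w = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / N) for n in range(N)]
    frames = []
    m = 0
    while m * H + N <= len(y):
        seg = [w[n] * y[m * H + n] for n in range(N)]
        # Direct DFT with w_k = 2*pi*k/N, eq. (5); an FFT would be used in practice.
        frames.append([abs(sum(seg[n] * cmath.exp(-1j * 2.0 * math.pi * k * n / N)
                               for n in range(N)))
                       for k in range(N)])
        m += 1
    return frames


def spectral_error(o, t, N=256, H=128):
    """Error F of eq. (6) between model output o(n) and target t(n)."""
    O = stft_magnitudes(o, N, H)
    T = stft_magnitudes(t, N, H)
    # Sum over the L frames present in both signals and over all N bins.
    return sum((a - b) ** 2
               for Om, Tm in zip(O, T)
               for a, b in zip(Om, Tm))
```

Because only magnitudes enter equation (6), the measure ignores phase differences within each frame, and a fitness value for the genetic algorithm can be taken as any decreasing function of $F$.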
When the original excitation signal is used for resynthesis, almost exactly the original parameters are estimated. Only small differences can be noticed in the mixing parameters $m_o$ and $m_p$ and in the coupling coefficient $g_c$. When only the mixing parameters are optimized for different values of $g_c$, the maximal fitness values turn out to be nearly equal, and the synthesized tones produced by the corresponding parameter values are also indistinguishable. That is to say, the parameters $m_p$, $m_o$, and $g_c$ are not orthogonal. Similar behavior is found when an extracted excitation is used: although the parameter values differ from the original ones, the corresponding synthesized tone is indistinguishable from the target sound.

The purpose of the two slightly mistuned string models is to simulate the beating effect. Here the beating effect is obtained by closing the direct path to the vertical polarization string ($1 - m_p$) entirely and increasing the coupling gain $g_c$. Because the vertical string model is now excited by the whole tone produced by the horizontal polarization rather than by the excitation alone, the mixing coefficient $m_o$ is adjusted to attenuate the output of the vertical string model. The gain parameters of the vertical string model thus play a different role than in the basic configuration, so they cannot be compared with the original values when assessing the efficiency of the optimization method. This functional rearrangement of the blocks of the synthesis model is caused by small dissimilarities between the excitation signals: the optimization method adapts the model to perform best with the particular excitation signal.

The convergence of the genetic algorithm when estimating the parameters for a real recorded sound is shown in Figure 3. The perceptual quality of the resulting synthesized sounds is comparable to or better than that obtained with our previous methods, and no hand tuning is required after the estimation procedure.
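The rearrangement can be read directly from equation (2). With $m_p = 1$, as estimated with the extracted excitation in Table 2, the second term vanishes and the vertical string model is driven only through the coupling path:

$$M(z) = m_o S_h(z) + (1 - m_o)\, g_c\, S_h(z) S_v(z) = S_h(z)\left[m_o + (1 - m_o)\, g_c\, S_v(z)\right].$$

Inserting the estimates $m_o = 0.9310$ and $g_c = 0.4738$ gives $(1 - m_o)\, g_c = 0.069 \times 0.4738 \approx 0.0327$, so

$$M(z) \approx S_h(z)\left[0.931 + 0.0327\, S_v(z)\right].$$

The horizontal string tone thus passes almost directly to the output, and a small portion of it is further filtered by the slightly mistuned vertical string model, which is where the beating comes from in this configuration.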
5 Conclusions and Future Work
Our method performed well in estimating the parameters of the plucked-string synthesis model, and it also accounts for the difference between the fundamental frequencies of the two string models, which was previously adjusted by hand. With this method, we are able to create parameter sets that correspond to recorded tones of real instruments with certain features. The final judgement of the quality of synthesized tones created with these parameter sets is always made by a human listener, so the most critical part of the method is the perceptual error calculation. In the future, we will incorporate more perceptual aspects into the fitness function. We are also interested in automating the arrangement of the functional blocks of the synthesis model.

References
Garcia, R. (1998). Automatic generation of sound synthesis techniques. Master's thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
Horner, A., J. Beauchamp, and L. Haken (1993). Machine Tongues XVI: Genetic algorithms and their application to FM matching synthesis. Computer Music Journal 17, 17-29.
Jaffe, D. and J. O. Smith (1983). Extension of the Karplus-Strong plucked string algorithm. Computer Music Journal 7, 43-55.
Karjalainen, M., V. Välimäki, and Z. Janosy (1993, Sept. 10-15). Towards high-quality sound synthesis of the guitar and string instruments. In Proc. 1993 Int. Computer Music Conf., Tokyo, Japan, pp. 56-63.
Karjalainen, M., V. Välimäki, and T. Tolonen (1998). Plucked-string models: From the Karplus-Strong algorithm to digital waveguides and beyond. Computer Music Journal 22, 17-32.
Mitchell, M. (1998). An Introduction to Genetic Algorithms. A Bradford Book, The MIT Press, Cambridge, Massachusetts, USA.
Smith, J. O. (1992). Physical modeling using digital waveguides. Computer Music Journal 16, 74-91.
Smith, J. O. (1993, Sept. 10-15). Efficient synthesis of stringed musical instruments. In Proc. 1993 Int. Computer Music Conf., Tokyo, Japan, pp. 64-71.
Välimäki, V., J.
Huopaniemi, M. Karjalainen, and Z. Janosy (1996). Physical modeling of plucked string instruments with application to real-time sound synthesis. Journal of the Audio Engineering Society 44, 331-353.
Välimäki, V. and T. Tolonen (1998). Development and calibration of a guitar synthesizer. Journal of the Audio Engineering Society 46, 766-788.
Vuori, J. and V. Välimäki (1993, Sept. 10-15). Parameter estimation of non-linear physical models by simulated evolution: Application to the flute model. In Proc. 1993 Int. Computer Music Conf., Tokyo, Japan, pp. 402-404.

[Figure 3: Convergence of the algorithm. Horizontal axis: generation, 0 to 300; vertical axis ticks from 0 to 120.]