Extended Nonlinear Waveshaping Analysis/Synthesis Technique

James Beauchamp(1) and Andrew Horner(1,2)
(1) Computer Music Project and (2) CERL Sound Group, University of Illinois at Urbana-Champaign, Urbana, IL 61801
e-mail:

ABSTRACT

Nonlinear synthesis can be used to capture the primary features of a musical instrument sound. In the conventional model, a sine wave of variable amplitude is used to index a single fixed wavetable. The analysis problem is to find a wavetable function (or a target spectrum from which to derive it), a sine wave amplitude $\alpha(t)$, and a post-multiplier amplitude envelope $\beta(t)$ that minimize the error between the synthetic and original spectra. In addition, we process the output with a fixed two-parameter high pass filter. We have found that appreciably better matches can be obtained using two or more nonlinear modules. A genetic algorithm is used to select the filter parameters and, for each module, the time point at which its target spectrum is taken from the original sound. The best strategy found so far is an iterative process that determines the $\alpha(t)$ and $\beta(t)$ envelopes for each module, using $\alpha(t)$ to match the time-varying spectral centroid of the input spectrum and $\beta(t)$ to minimize the spectral error of the match. On the first iteration the input spectrum is the original spectrum; on subsequent iterations it is the error spectrum. Average relative errors for the three sounds tested were approximately 20% for one module, 15% for two modules, and 10% for three modules.

0. INTRODUCTION

Several methods have been proposed for computational models of musical instrument sounds. A successful model should be capable of emulating typical musical instrument behavior over wide pitch and dynamic ranges. Ideally, it should also be able to vary timbral characteristics on a note-to-note basis, which implies that the model must allow direct manipulation of salient timbral features.
Nonlinear waveshaping is a synthesis method that can match a particular spectrum exactly at one value of its index (e.g., at one point in time) while also matching the original signal's spectral centroid ("brightness") and overall spectrum (in the least-squares sense) over the signal's duration. Traditionally, the method has used a single nonlinear module. Because a single module can incur appreciable matching error, we have extended the method to use multiple nonlinear modules. We have found that two modules often give a decided improvement in matching accuracy over one module.

1. SINGLE NONLINEARITY MATCHING METHOD

A polynomial function $F(x)$ of order $n$ is constructed such that the harmonic spectrum generated by $F(\cos(2\pi f_1 t))$ exactly matches a particular target spectrum $\{d_k(1)\}$. Thus,

$$F(\cos(2\pi f_1 t)) = \sum_{k=1}^{n} d_k(1)\cos(2\pi k f_1 t). \qquad [1a]$$

For a method of computing the polynomial coefficients of $F(x)$, please refer to [Arfib 1979; Beauchamp 1982]. With $F$ fixed, we can now vary the spectrum $\{d_k\}$ by varying the cosine amplitude $\alpha$:

$$F(\alpha\cos(2\pi f_1 t)) = \sum_{k=1}^{n} d_k(\alpha)\cos(2\pi k f_1 t). \qquad [1b]$$

Each harmonic amplitude is a function of $\alpha$, and when $\alpha = 1$ the target spectrum is achieved. However, we also want the $d_k$ to increase smoothly to the target values and beyond as $\alpha$ increases, in such a way that the spectral centroid and overall amplitude increase monotonically. Unfortunately, such monotonic behavior occurs only for a restricted class of target spectra. We know of no proof in this area, but experience shows that target spectra that roll off smoothly at a suitable rate result in "well-behaved" $d_k(\alpha)$. Most useful musical instrument spectra, however, are too bright for this to work. To fix this problem, a second-order high pass filter with response $H(j2\pi f)$ is applied after the polynomial function to enhance the higher partials, producing the spectrum $\{c_k(\alpha)\} = \{|H(j2\pi k f_1)|\, d_k(\alpha)\}$. Thus, if $\{c_k(1)\}$ is to match a particular musical instrument spectrum, the target spectrum for the nonlinearity can be calculated using

$$d_k(1) = \frac{c_k(1)}{|H(j2\pi k f_1)|}, \qquad [2a]$$

where

$$H(s) = \frac{s^2}{s^2 + 2\zeta\omega_c s + \omega_c^2}, \quad \omega_c = 2\pi f_c, \quad s = j\omega = j2\pi f. \qquad [2b]$$

Here $f_c$ and $\zeta$ are the cutoff frequency and damping factor of the filter, respectively. These parameters are optimized for the best overall match using a genetic algorithm (GA) [Horner et al 1992]. The GA is also used to select the best spectrum "snapshot" (corresponding to a best time value) from the original time-varying spectrum to serve as the target spectrum. The synthetic output signal is then given by

$$h(t) * \left[\beta(t)\, F(\alpha(t)\cos(2\pi f_1 t))\right], \qquad [3a]$$

where $h(t)$, the impulse response corresponding to $H(s)$, is convolved with the nonlinearity output, and $\beta(t)$ is a post-multiplier envelope.
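The construction of Equations 1a, 1b, 2a, and 2b can be sketched in a few lines. This is a minimal sketch, assuming the standard Chebyshev-polynomial form of the waveshaper (since $T_k(\cos\theta) = \cos(k\theta)$, choosing $F(x) = \sum_k d_k(1) T_k(x)$ satisfies Equation 1a exactly) and assuming that $d_k(\alpha)$ is obtained numerically from one period of $F(\alpha\cos\theta)$ rather than from the closed-form procedures in the cited references. The function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def waveshaper_coefs(d_target):
    """Chebyshev-series coefficients of F (Eq. 1a).

    Because T_k(cos θ) = cos(kθ), F(x) = Σ_k d_k(1) T_k(x) gives
    F(cos θ) = Σ_k d_k(1) cos(kθ). The DC coefficient is set to 0.
    """
    return np.concatenate(([0.0], np.asarray(d_target, dtype=float)))

def harmonic_amplitudes(coefs, alpha, n_samples=1024):
    """d_k(alpha), k = 1..n, from one period of F(alpha cos θ) (Eq. 1b)."""
    n = len(coefs) - 1
    theta = 2.0 * np.pi * np.arange(n_samples) / n_samples
    y = C.chebval(alpha * np.cos(theta), coefs)
    # y is real and even in θ, so the cosine amplitude of harmonic k is 2 Re(X_k) / N.
    return 2.0 * np.fft.rfft(y).real[1:n + 1] / n_samples

def highpass_mag(f, fc, zeta):
    """|H(j 2π f)| for H(s) = s^2 / (s^2 + 2 ζ ω_c s + ω_c^2) (Eq. 2b)."""
    s = 1j * 2.0 * np.pi * np.asarray(f, dtype=float)
    wc = 2.0 * np.pi * fc
    return np.abs(s**2 / (s**2 + 2.0 * zeta * wc * s + wc**2))

def nonlinearity_target(c_target, f1, fc, zeta):
    """d_k(1) = c_k(1) / |H(j 2π k f1)| (Eq. 2a)."""
    k = np.arange(1, len(c_target) + 1)
    return np.asarray(c_target, dtype=float) / highpass_mag(k * f1, fc, zeta)

coefs = waveshaper_coefs([1.0, 0.5, 0.25])
print(harmonic_amplitudes(coefs, 1.0))   # ≈ [1.0, 0.5, 0.25]: the target at alpha = 1
print(harmonic_amplitudes(coefs, 0.5))   # ≈ [0.21875, 0.125, 0.03125]
```

For this particular target, $F(x) = x^3 + x^2 + 0.25x - 0.5$, so $d_1 = 0.75\alpha^3 + 0.25\alpha$, $d_2 = 0.5\alpha^2$, $d_3 = 0.25\alpha^3$; the higher harmonics fall off faster than the fundamental as $\alpha$ decreases, which is the brightness-versus-amplitude behavior the method relies on.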
$\alpha(t)$ and $\beta(t)$ are time-varying parameters that are determined by first matching the spectral centroid of the original sound and then matching the entire spectrum in the least-squares sense at each time value [for details, see Beauchamp 1982]. In the frequency domain, Equation 3a is equivalent to

$$c'_k(t) = |H(j2\pi k f_1)|\, \beta(t)\, d_k(\alpha(t)), \qquad [3b]$$

which can be used to predict the spectral behavior of the synthesis model. Note that when $\alpha = 1$ the target spectrum is achieved and $\beta$ also equals 1; otherwise, $\beta(t)$ can be used to correct deficiencies in the match between $c'_k(t)$ and the original sound spectrum $c_k(t)$. The $\{d_k\}$ can be computed as functions of $\alpha$ and the polynomial coefficients of $F$, according to procedures given in previous articles [Arfib 1979; Beauchamp 1982]. We define the relative error between the synthesized and original signal spectra to be

$$\varepsilon_{\mathrm{rel}}(t) = \frac{\sqrt{\sum_{k=1}^{n} \left(c_k(t) - c'_k(t)\right)^2}}{\sqrt{\sum_{k=1}^{n} c_k(t)^2}}, \qquad [4a]$$

and the average relative error over the tone's duration is given by

$$\bar{\varepsilon}_{\mathrm{rel}} = \frac{1}{t_{\mathrm{dur}}} \int_{0}^{t_{\mathrm{dur}}} \varepsilon_{\mathrm{rel}}(t)\, dt. \qquad [4b]$$

We have found that for most instrument tones we cannot achieve a sufficiently small relative error with a single nonlinear module. Another problem is that the time behaviors of the individual synthesized harmonics $c'_k(t)$ are too similar. To introduce dissimilarity, two or more nonlinear modules are needed.

2. MULTIPLE NONLINEARITY MATCHING METHOD

In this model, two or more nonlinear modules, each with its own nonlinearity $F_i$, are combined, but the same high pass filter is used. Following Equation 3a, the algorithm becomes

$$h(t) * \left[\beta_1(t)\, F_1(\alpha_1(t)\cos(2\pi f_1 t)) + \beta_2(t)\, F_2(\alpha_2(t)\cos(2\pi f_1 t)) + \cdots\right], \qquad [5a]$$

which has the spectral-domain equivalent

$$c'_{mk}(t) = |H(j2\pi k f_1)| \left[\beta_1(t)\, d_{1k}(\alpha_1(t)) + \beta_2(t)\, d_{2k}(\alpha_2(t)) + \cdots + \beta_m(t)\, d_{mk}(\alpha_m(t))\right], \qquad [5b]$$

where $m$ is the number of modules. To match this model against the original time-varying spectrum, we use the following "greedy" strategy. First, the best values of $f_c$ and $\zeta$ (which determine $H(s)$), $\{d_{1k}(1)\}$, $\alpha_1(t)$, and $\beta_1(t)$ are found using the method of the previous section. Second, the first nonlinear approximation of the original $\{c_k(t)\}$,

$$c'_{1k}(t) = |H(j2\pi k f_1)|\, \beta_1(t)\, d_{1k}(\alpha_1(t)),$$

is computed and subtracted from the original. The residue thus formed becomes the basis for the next step of the analysis: the method of the previous section is applied to the residue, determining $\{d_{2k}(1)\}$, $\alpha_2(t)$, and $\beta_2(t)$. These give the nonlinear approximation to the residue, which, in turn, is added to the previous approximation. The updated approximation is then subtracted from the original to form the next residue, and so forth. Thus, on each iteration a new module is "peeled off" from the original and, it is hoped, substantially improves the match. To gauge progress, the relative error of Equations 4a and 4b is recomputed on each iteration. We have found, however, that in terms of calculated and perceptual error vs. complexity, the payoff is usually not sufficient to justify going beyond two modules.

3. RESULTS

We expect the nonlinear analysis/synthesis technique described above to work best for sounds with fairly smooth spectra and a consistent relationship between spectral centroid and dynamic level. Thus far, we have tested the technique on tones of a trumpet, an oboe, and a guitar. The resulting relative errors are given in Table 1.

Instrument      1 module   2 modules   3 modules
Trumpet (F4)    0.220      0.154       0.136
Oboe (A4)       0.179      0.131       0.115
Guitar (A2)     0.182      0.152       0.046
Average         0.194      0.146       0.099

Table 1. Average relative errors ($\bar{\varepsilon}_{\mathrm{rel}}$) for three instruments with a variable number of nonlinear modules.
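The greedy strategy of Section 2, with each module fitted by centroid matching for $\alpha(t)$ and least squares for $\beta(t)$ as in Section 1, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the GA is omitted (the filter magnitudes $|H(j2\pi k f_1)|$ and each module's snapshot frame are supplied by the caller), $\alpha(t)$ is found by a hypothetical grid search for the nearest centroid, the centroid of a signed residue is taken over absolute amplitudes, and Equation 4b is approximated by a mean over equally spaced frames. All names are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Hypothetical search grid for alpha; the paper does not specify how alpha(t) is found.
ALPHAS = np.linspace(0.0, 1.5, 301)

def d_table(d_target, n_samples=1024):
    """Rows are d_k(alpha), k = 1..n, for each alpha in ALPHAS (Eq. 1b)."""
    coefs = np.concatenate(([0.0], d_target))          # Chebyshev form of F (Eq. 1a)
    theta = 2.0 * np.pi * np.arange(n_samples) / n_samples
    n = len(d_target)
    rows = [2.0 * np.fft.rfft(C.chebval(a * np.cos(theta), coefs)).real[1:n + 1] / n_samples
            for a in ALPHAS]
    return np.array(rows)

def centroid(amps, floor=1e-12):
    """Spectral centroid in harmonic-number units; |amplitudes| used (assumption)."""
    a = np.abs(amps)
    total = a.sum()
    if total <= floor:
        return 0.0
    return float((np.arange(1, len(a) + 1) * a).sum() / total)

def fit_module(target, hmag, t_star):
    """Fit one module to `target` (frames x harmonics); returns c'_k(t) (Eq. 3b)."""
    d_target = target[t_star] / hmag                   # Eq. 2a, snapshot at frame t_star
    table = d_table(d_target) * hmag                   # |H| d_k(alpha) for every alpha
    table_cents = np.array([centroid(row) for row in table])
    approx = np.zeros_like(target)
    for t, frame in enumerate(target):
        model = table[np.argmin(np.abs(table_cents - centroid(frame)))]  # alpha(t)
        energy = model @ model
        beta = (frame @ model) / energy if energy > 0 else 0.0          # beta(t), least squares
        approx[t] = beta * model
    return approx

def relative_error(c, c_hat):
    """Eq. 4a per frame, then the frame average as a discrete Eq. 4b."""
    num = np.sqrt(((c - c_hat) ** 2).sum(axis=1))
    den = np.sqrt((c ** 2).sum(axis=1))
    eps_t = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(eps_t.mean())

def greedy_match(c, hmag, target_frames):
    """Peel off one module per entry in target_frames (Section 2)."""
    approx = np.zeros_like(c)
    errors = []
    for t_star in target_frames:
        approx = approx + fit_module(c - approx, hmag, t_star)   # fit the current residue
        errors.append(relative_error(c, approx))
    return approx, errors
```

Each call to `fit_module` sees only the current residue, so later modules are fitted to whatever the earlier ones missed; `errors` records $\bar{\varepsilon}_{\mathrm{rel}}$ after each module, mirroring the columns of Table 1.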

4. CONCLUSIONS AND OBSERVATIONS

To build a synthesis model that can handle a wide range of pitches and dynamics, one approach is to store many parameter sets to cover all the various cases. A more elegant, although not necessarily as accurate, approach is to find parameters that stretch over wide ranges of pitch and dynamic level. As long as an instrument's spectral envelope is reasonably consistent and smooth and does not change much as a function of time, we can reasonably expect that the parameters of the high pass filter need not change drastically as a function of pitch. The dual function of the principal nonlinear module, to exactly match a particular target spectrum and to provide a good rendition of the instrument's dynamic behavior at a specific pitch, could also be compromised. For example, rather than using a special nonlinear function to match a particular target spectrum within each note, a single nonlinearity could be used to give a representative fit over a group of tones. Since the intuitive spectral centroid parameter, which we call BR ("brightness"), can be used to control $\alpha$ in a one-to-one fashion, this parameter could be used creatively to produce all sorts of variants of the synthetic tones.

A second nonlinear module serves two purposes: 1) it provides a better fit to the original, and 2) it introduces dissimilarity among the various harmonic amplitude envelopes. It is, however, a more complex and probably nonintuitive module to control. It remains to be seen whether this module can be handled creatively, or whether one must rely on a memory-intensive approach to determine appropriate parameter values for it.

One area of our analysis-matching method that needs improvement is the way we compute error (see Equations 4a and 4b). The measure has been formulated as the rms difference between two spectra relative to the current rms amplitude, averaged over the duration of a tone.
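A small numeric illustration shows how an rms-relative measure like Equation 4a weighs weak upper partials. The spectra below are made up for illustration and are not data from this study: a hypothetical tone with ten partials rolling off as $1/k$ is compared with (a) a version that adds twenty spurious partials at -40 dB relative to the fundamental and (b) a version whose fundamental is 20% low.

```python
import numpy as np

def eps_rel(c, c_hat):
    """Relative spectral error of one frame (Eq. 4a)."""
    return float(np.sqrt(np.sum((c - c_hat) ** 2)) / np.sqrt(np.sum(c ** 2)))

k = np.arange(1, 31)
original = np.where(k <= 10, 1.0 / k, 0.0)   # hypothetical tone: 10 partials, 1/k rolloff

bright = original.copy()
bright[10:] = 0.01        # adds 20 spurious partials (k = 11..30) at -40 dB re. the fundamental

dull = original.copy()
dull[0] = 0.8             # fundamental 20% low, nothing else changed

print(f"spurious highs: {eps_rel(original, bright):.3f}")     # 0.036
print(f"weak fundamental: {eps_rel(original, dull):.3f}")     # 0.161
```

The twenty added partials raise $\varepsilon_{\mathrm{rel}}$ to only about 0.036, less than a quarter of the 0.161 caused by the single weak fundamental, because each contributes only $0.01^2$ to the squared sum.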
We find that this criterion fails to predict an improvement in synthesis quality when the synthetic sound contains many low-level, high-frequency components not present in the original sound. Such components contribute little to the rms error but are very important perceptually. An error criterion that takes human perceptual characteristics into account, and thus predicts human preference, would be a substantial improvement over our present criterion.

In conclusion, probably the most important results of the nonlinear analysis/synthesis technique described in this paper are that substantial data reductions can be achieved, that the parameters so derived are fairly intuitive to handle, and that they have a good chance of stretching well over broad ranges of pitch and dynamic level.

5. REFERENCES

Arfib, D. 1979. "Digital Synthesis of Complex Spectra by Means of Multiplication of Nonlinear Distorted Sine Waves." J. Audio Eng. Soc. 27: 757-768.

Beauchamp, J. W. 1982. "Synthesis by Amplitude and 'Brightness' Matching of Analyzed Musical Instrument Tones." J. Audio Eng. Soc. 30: 396-406.

Horner, A., Beauchamp, J., and Haken, L. 1992. "FM Matching Synthesis with Genetic Algorithms." Submitted to the Computer Music Journal.