A Waveguide Mesh Model of High-Frequency Violin Body Resonances

Patty Huang*, Stefania Serafin, and Julius O. Smith III
CCRMA, Stanford University, CA, USA
pph@ccrma.stanford.edu, serafin@ccrma.stanford.edu, jos@ccrma.stanford.edu

Abstract

This paper describes use of a digital waveguide mesh as a "small box reverberator" which provides certain desirable components of the violin-body frequency response. The goal is to find a mesh with resonances which are sufficiently similar statistically to those of a real violin body at high frequencies. Low-frequency modes of the violin body are modeled explicitly using a bank of resonators. The Energy Decay Relief of the high-frequency portion of the violin-body "tap response" is analyzed over a Bark frequency axis to determine the decay time for each "band of modes" in the high-frequency response. A 3D waveguide mesh with edge dimensions comparable to those of a real violin body is embedded with first-order lowpass filters to yield the desired decay time in each Bark band.

1 INTRODUCTION

An important element of a violin is its body, which filters vibrations that propagate from the string through the bridge. In real-time synthesis of a violin, there is some difficulty in modeling the body because of a tradeoff between accuracy and computational cost. If all the resonances of the body are accounted for by modeling each one with its own pair of filter poles, the computational cost is too high. On the other hand, one cannot implement too few filter poles and neglect the large number of resonances, because the complex filtering of the body contributes strongly to the characteristic timbre of the violin.

2 RELATED PRIOR WORK

An efficient implementation of plucked and struck string synthesis can be based on "commuted synthesis" [Smith, 1993, Karjalainen et al., 1993], in which the body response is read from a wavetable and injected into a waveguide string as the excitation source.
However, this solution is applicable only to linear, time-invariant (LTI) systems, a condition which plucked and struck strings generally satisfy well. In the case of a bowed string, the nonlinearity of the bow/string interaction does not allow this method to be used in a precise way, although approximations are possible [Smith, 1997].

3 THE VIOLIN BODY

The violin body acts as a resonator for the vibrations generated from the strings. The coupling of air cavity modes and top and back plate modes produces the complex filtering which contributes strongly to the characteristic timbre of the violin [Hutchins, 1998]. At lower frequencies, below 3 kHz or so, the wood modes predominate, and at higher frequencies there are more air modes than wood modes [Hirschberg et al., 1995].

The tap response of a violin body is shown in Fig. 1. This response was obtained by exciting the body vertically with an impulse hammer in an anechoic chamber, with the microphone placed approximately 30 cm directly above the midpoint of the bridge. The bottom of the figure shows the corresponding "frequency response" (see footnote 1). Note the large number of high-frequency resonances.

[Figure 1 plots: (a) tap response versus Time (ms), 0 to 100; (b) spectrum versus Frequency (kHz), 0 to 20]
Figure 1: Violin body tap response in the (a) time domain and (b) frequency domain. The "noise floor" in plot (b) begins around 14 kHz.

* Summer work at the Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, supported by the Academy of Finland.

Footnote 1: In this study, the violin frequency response we use is the Fourier transform of the tap response itself, and not the tap response deconvolved with the tap excitation signal measured by the force hammer.

4 PSYCHOACOUSTIC IMPORTANCE

At low frequencies, the ear is sensitive to the precise tuning of the resonant modes because its frequency resolution is high in this range. At higher frequencies, on the other hand, the frequency resolution of the ear is coarser, and a "reasonable" approximation of the spectral envelope shape can be perceptually equivalent, provided the time-domain characteristics (determined by resonance bandwidths and phases) are also sufficiently similar perceptually.

The violin body response is perceived under a variety of playing conditions, so it is difficult to specify a single, overall characterization of how it is perceived. For example, an individual slip of the string under the bow generates a fairly discrete pulse which excites all modes of the violin body in a manner similar to the tap used to create the data in Fig. 1(a). Normal Helmholtz motion results in a series of such "tap responses" superimposed at all integer multiples of the pitch period (see, e.g., [Smith, 1993, Smith, 1997] for how this fact can be exploited in synthesis algorithm design).

The most salient effect of the violin body is to set an "EQ" for the tone produced by the vibrating string. The uniform roll-off in the spectral envelope of a bowed-string waveform is given the characteristic magnitude profile displayed in Fig. 1(b).

Another way body resonances affect the sound is by inducing overtone modulation in the presence of vibrato. Vibrato causes each high-frequency harmonic of the tone to sweep through many modal resonances, thus causing those harmonics to modulate in amplitude and spatial origin.

Since the violin body is a 3D object, it puts out a rather complicated radiation pattern relative to a point source. For highest quality results, the directional variations in frequency response may be desired.

4.1 Bark-Warped EDR

When evaluating a synthesis model, it is helpful to use a representation for sound which corresponds well to what is sought. For this purpose, we use the Energy Decay Relief (EDR) [Jot, 1992] plotted over a Bark frequency axis [Smith and Abel, 1999], as shown in Fig. 2.
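As a rough illustration of the representation, the EDR of a measured response can be sketched as the backward-integrated short-time power spectrum, i.e., the energy remaining in each frequency bin from a given frame onward [Jot, 1992]. The sketch below is not the software used for the figures in this paper: the function name `energy_decay_relief` and its parameters are hypothetical, the frequency axis is left linear (the Bark warping of [Smith and Abel, 1999] is not reproduced), and the window length and overlap are ordinary arguments whose defaults are only assumed starting points.

```python
import numpy as np

def energy_decay_relief(x, fs, win_ms=30.0, overlap=0.5):
    """Return (edr_db, freqs_hz); edr_db has shape (frames, bins).

    Hypothetical helper: EDR[m, k] is the sum of short-time power in
    bin k over frames m, m+1, ... (backward integration in time).
    """
    x = np.asarray(x, dtype=float)
    n_win = int(round(win_ms * 1e-3 * fs))
    hop = max(1, int(round(n_win * (1.0 - overlap))))
    window = np.hanning(n_win)

    # Zero-pad so that the final partial frame is analyzed too.
    n_frames = max(1, int(np.ceil((len(x) - n_win) / hop)) + 1)
    padded = np.zeros((n_frames - 1) * hop + n_win)
    padded[:len(x)] = x

    frames = np.stack([padded[i * hop:i * hop + n_win] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Remaining energy per bin: reverse, accumulate, reverse back.
    edr = np.cumsum(power[::-1], axis=0)[::-1]
    edr_db = 10.0 * np.log10(edr + np.finfo(float).tiny)
    freqs_hz = np.fft.rfftfreq(n_win, d=1.0 / fs)
    return edr_db, freqs_hz
```

Because the accumulated quantity is non-negative power, each frequency bin of the result decays monotonically in time, which is what makes the slope in a given band easier to read than that of a raw short-time spectrum.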
In the analysis, a length 30 ms Hanning window was used with 50% overlap. This maximized frequency resolution while leaving sufficiently many analysis time frames at high frequencies.

The Energy Decay Relief at time t and frequency ω is defined as the sum of all remaining energy at that frequency from time t to infinity. It is a frequency-dependent generalization of Schroeder's Energy Decay Curve [Jot, 1992]. We prefer it over the more usual short-time Fourier power spectrum because it de-emphasizes beating decay envelopes due to closely tuned coupled modes (which occur often in acoustic measurements of resonating bodies). This facilitates estimating decay times for ensembles of resonators which are being characterized statistically.

[Figure 2 plot, titled "bodyv6t Bark Warped Energy Decay Relief"; axes: Time (frames), Frequency (Barks)]
Figure 2: Bark-warped Energy Decay Relief of data in Fig. 1(a).

4.2 Bark-Summed EDR

At high frequencies, the greatly increased resonant-mode density forces us to deal with bands of high-frequency mode distributions. It is impractical in most cases to attempt to resolve individual high-frequency modes, because they are so heavily overlapped in frequency, with many modes occurring per mode-bandwidth. To conform to the characteristics of human hearing, we form bands by summing the spectral power in each critical band of hearing, as defined by the Bark [Zwicker and Fastl, 1990] frequency scale (see footnote 2). (Matlab software for this purpose may be found via [Smith and Abel, 1999].) The results of summing power in each critical band can be seen in Fig. 3. Within each band, statistically similar mode distributions can be expected to sound musically equivalent.

[Figure 3 plot, titled "bodyv6t Summed EDR"; axes: Time (frames), Band Centers (kHz)]
Figure 3: Bark-Summed Energy Decay Relief of data in Fig. 1(a).

5 SYNTHESIS MODEL

Our body model is attached to the waveguide bowed string model described in [Serafin et al., 1999].
In the body model, second-order resonant filters model the first 13 resonances (up to about 3200 Hz), and a waveguide mesh [Van Duyne and Smith, 1993] is used to approximate the dense modes of the violin body at higher frequencies. The second-order filters simulate primarily wood modes, and the mesh simulates more air modes than wood modes. As a result, a 3D waveguide mesh provides the most accurate asymptotic mode density. However, we are also interested in whether the high-frequency modes of a 2D mesh may be sufficient psychoacoustically. The bridge velocity calculated by the bowed string model is fed to the resonant filters and waveguide mesh in parallel, and their outputs are added, as shown in Fig. 4.

[Figure 4 block diagram; surviving labels: OUTPUT; vob = string velocity at the bridge]
Figure 4: Model Structure.

Footnote 2: A more accurate choice would be the ERB scale [Moore, 1997], and we plan to compare results for the ERB in later work. The Bark scale can be viewed as a scale based on slightly overestimated critical bandwidths.

6 MODEL IDENTIFICATION

First, the low-frequency resonances are identified and extracted from the measured violin-body frequency response (see, e.g., [Karjalainen and Smith, 1996]). The residual impulse response then becomes the target of a waveguide mesh design.

6.1 Waveguide Mesh Design

The goal of mesh design is to find a mesh having an impulse response which sounds identical to the high-frequency residual obtained after removing the deterministic low-frequency resonators. Within each band, we wish to match statistics of the mesh response to those of the violin body response. We desire that they be "musically equivalent" based on psychoacoustic principles.

The relevant parameters in each band include mode spacing (in frequency), bandwidth (decay time), amplitude, and phase. These parameters may be characterized statistically in a variety of ways, such as by a simple average (mean) value in each critical band, or as a per-band distribution having, e.g., a mean and variance. There appears to be very little literature available characterizing how faithfully such distributions should be preserved in order to retain psychoacoustic equivalence. We sidestep this issue to some extent by choosing a model structure which inherently provides similar statistical mode behavior, as discussed further below.
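The per-band characterization described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `band_statistics` and the input mode list are hypothetical, the band edges are the commonly tabulated Zwicker critical-band edges, and decay time is used as the stand-in for bandwidth. It reports mode count, mean mode spacing, and the mean and variance of decay time for each band containing at least two modes.

```python
import numpy as np

# Commonly tabulated Zwicker critical-band edges in Hz (24 bands).
BARK_EDGES_HZ = np.array([
    0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
    2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000,
    15500], dtype=float)

def band_statistics(mode_freqs_hz, mode_decay_s):
    """Hypothetical helper: summarize modes per critical band.

    Returns {band_index: stats} for bands holding two or more modes.
    """
    freqs = np.asarray(mode_freqs_hz, dtype=float)
    decay = np.asarray(mode_decay_s, dtype=float)
    # digitize gives i with edges[i-1] <= f < edges[i]; shift to 0-based.
    band = np.digitize(freqs, BARK_EDGES_HZ) - 1

    stats = {}
    for b in range(len(BARK_EDGES_HZ) - 1):
        sel = band == b
        if np.count_nonzero(sel) < 2:
            continue  # spacing is undefined for fewer than two modes
        f_sorted = np.sort(freqs[sel])
        stats[b] = {
            "count": int(np.count_nonzero(sel)),
            "mean_spacing_hz": float(np.mean(np.diff(f_sorted))),
            "mean_decay_s": float(np.mean(decay[sel])),
            "var_decay_s2": float(np.var(decay[sel])),
        }
    return stats
```

Comparing such summaries for the measured residual and for a candidate mesh response is one simple way to express the "match statistics in each band" goal; which statistics matter perceptually remains the open question noted above.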
6.2 Choice of Mesh Type

There are several mesh types available, ranging from the original rectilinear mesh [Van Duyne and Smith, 1993], which is relatively simple to program, to the tetrahedral mesh [Van Duyne and Smith, 1996], which is most efficient in 3D, or the warped triangular mesh [Savioja and Välimäki, 2000], which is most accurate at high frequencies. In our project, we chose the simple rectilinear mesh on the grounds that we only need a psychoacoustically equivalent distribution of high-frequency modes, so that the greater accuracy of the triangular and warped meshes did not justify the additional programming overhead.

6.3 Choice of Mesh Geometry

By starting with mesh dimensions comparable to those of a real violin body, we may expect the mesh to have a similar mode spacing at high frequencies, where the air modes dominate [Cremer, 1984, Hirschberg et al., 1995]. Additionally, the mode phase should be sufficiently randomized in any such mesh that it should not be necessary to worry about explicitly setting it. The fine details of mode amplitude distribution within a band are similarly taken to be implied by the mode frequency and bandwidth distribution; in other words, the within-band amplitude distribution is taken to be the natural amplitude fluctuation obtained when summing a set of identical modes at center frequencies chosen randomly according to the appropriate distribution. We therefore expect the choice of proper mesh geometry to take care of this as well at high frequencies.

For greater physical realism, the 3D "air-mode mesh" can be affixed to a 2D "wooden plate" mesh which is designed to act as a "stiffened membrane". In fact, body modeling with a waveguide mesh is upward compatible with any degree of desired modeling accuracy, including natural boundary contours, f-holes, bass bars, sound post, and so on. However, in our study, we chose to use the mesh more as a "statistical spectral modeling tool" than as a physical modeling tool.
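The expectation that a box of violin-like size has densely spaced air modes at high frequencies can be checked roughly by counting the modes of an idealized cavity. The sketch below assumes a rigid-walled rectangular box, for which the textbook mode frequencies are f = (c/2) sqrt((l/Lx)^2 + (m/Ly)^2 + (n/Lz)^2); the function name `box_mode_frequencies` is hypothetical, and the box dimensions and sound speed in the usage note are the values adopted for the initial mesh in this section. A real violin cavity is neither rectangular nor rigid, so only the trend is meaningful.

```python
import numpy as np

def box_mode_frequencies(dims_m, c=344.0, f_max=10000.0):
    """Hypothetical helper: sorted mode frequencies (Hz) up to f_max
    of an idealized rigid-walled rectangular box with edges dims_m."""
    lx, ly, lz = dims_m
    # Largest mode index per axis that can still lie below f_max.
    n_max = [int(np.floor(2.0 * f_max * d / c)) for d in (lx, ly, lz)]
    l, m, n = np.meshgrid(*(np.arange(k + 1) for k in n_max),
                          indexing="ij")
    f = 0.5 * c * np.sqrt((l / lx) ** 2 + (m / ly) ** 2 + (n / lz) ** 2)
    # Drop the trivial (0, 0, 0) entry and anything above f_max.
    f = f[(f > 0.0) & (f <= f_max)]
    return np.sort(f)
```

For example, `freqs = box_mode_frequencies((0.355, 0.21, 0.03))` followed by `np.histogram(freqs, bins=np.arange(0, 10001, 1000))[0]` gives the number of idealized modes per 1 kHz band, which grows markedly toward high frequencies.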
Our 3D waveguide mesh is initially chosen to correspond to a physical box with dimensions 35.5 x 21.0 x 3.0 cm. For a sampling rate of 44.1 kHz, assuming a sound speed c = 344 m/s, these dimensions translate to a rectangular mesh that is approximately 26 x 16 x 2 samples along each edge. (Using the results of the von Neumann stability analysis [Van Duyne and Smith, 1993], a physical length $d$ is converted to samples using the formula $N_d = d f_s/(c\sqrt{3})$ for the 3D rectilinear mesh. For the 2D rectilinear mesh, the correction factor is $\sqrt{2}$, etc.)

In keeping with common practice in reverberator design, we prefer to use edge lengths which are mutually prime, since that tends to make the modal distribution more uniform. Therefore, we prefer to modify (26, 16, 2) to (27, 17, 2), or (25, 17, 2), etc.

At a sampling rate of 22.05 kHz, the desired physical dimensions translate to the mesh dimensions 13 x 8 x 1 samples (35.9 x 20.3 x 3.1 cm). With only 1 sample in the z direction, we expect that a 2D mesh might do pretty well. Designing explicitly for 2D (using $\sqrt{2}$ in place of $\sqrt{3}$) gives 16 x 9 samples (35.3 x 19.9 cm). Using larger meshes should decrease the crossover frequency where the modes become "sufficiently dense" in frequency.

6.4 Damping Determination

We need to introduce lowpass filters into the mesh so as to obtain the correct average decay time in each frequency band. This is analogous to setting the reverberation time of the mesh as a function of frequency. Our current implementation uses a one-pole reflection filter $H(z) = g/(1 - p z^{-1})$ at each boundary point of the mesh, i.e., at every node along its boundary faces. Each "wall filter" is introduced as the reflectance of a unit-delay waveguide attached to each boundary node. Its wave impedance is the same as all others in the mesh. Thus, a signal is extracted from the node, delayed one sample, passed through the one-pole filter, and finally summed back into the node according to the usual rules of lossless scattering at the node [Van Duyne and Smith, 1993].

We plan to investigate sparser filtering, but the case of "uniformly absorbing walls" has interesting parallels in reverberator design which we are still exploring. For example, it is well known that in large, irregular rooms, the reverberation time is proportional to the volume of the room divided by the total absorbing surface area (times the log of the absorption coefficient). We hope to use such formulas to compute more precisely the damping filters which yield a desired decay time in each band. Preliminary experiments indicate that the classical formulas can predict decay time from filter gain to within approximately 10%, and the error appears to be somewhat systematic, therefore amenable to further empirical correction.

After the absorbing boundaries have been determined as accurately as possible, we next hope to find useful rules for using fewer, stronger lowpass filters while retaining the desired modal decay times in each band.

7 CONCLUSIONS

In this paper, we proposed a hybrid model of the violin body which models low-frequency body resonances with second-order filters and high-frequency body resonances with a waveguide mesh. The model is relatively inexpensive for the number of resonating modes it provides, making it attractive for high quality bowed-string synthesis applications. The mesh component is especially efficient when implemented in custom VLSI.
The general technique can be used to develop synthesis models for other stringed instruments, and any instrument having a complex resonator and a nonlinear excitation which inhibits the use of commuted synthesis.

References

[Cremer, 1984] Cremer, L. (1984). The Physics of the Violin. MIT Press, Cambridge, MA.

[Hirschberg et al., 1995] Hirschberg, A., Kergomard, J., and Weinreich, G., editors (1995). Mechanics of Musical Instruments. Springer-Verlag, Berlin.

[Hutchins, 1998] Hutchins, C. M. (1998). The air and wood modes of the violin. J. Audio Eng. Soc., 46(9):751-765.

[Jot, 1992] Jot, J.-M. (1992). An analysis/synthesis approach to real-time artificial reverberation. In Proc. Int. Conf. Acoustics, Speech, and Signal Processing, San Francisco, pages II.221-II.224, New York. IEEE Press.

[Karjalainen and Smith, 1996] Karjalainen, M. and Smith, J. O. (1996). Body modeling techniques for string instrument synthesis. In Proc. 1996 Int. Computer Music Conf., Hong Kong, pages 232-239. Computer Music Association.

[Karjalainen et al., 1993] Karjalainen, M., Välimäki, V., and Janosy, Z. (1993). Towards high-quality sound synthesis of the guitar and string instruments. In Proc. 1993 Int. Computer Music Conf., Tokyo, pages 56-63. Computer Music Association. Available online at http://www.acoustics.hut.fi/~vpv/publications/icmc93-guitar.htm.

[Moore, 1997] Moore, B. C. J. (1997). An Introduction to the Psychology of Hearing. Academic Press, New York.

[Savioja and Välimäki, 2000] Savioja, L. and Välimäki, V. (2000). Reducing the dispersion error in the digital waveguide mesh using interpolation and frequency-warping techniques. IEEE Trans. Speech and Audio Processing, pages 184-194.

[Serafin et al., 1999] Serafin, S., Smith, III, J. O., and Woodhouse, J. (1999). An investigation of the impact of torsion waves and friction characteristics on the playability of virtual bowed strings. In Proc. IEEE Workshop on Appl. Signal Processing to Audio and Acoustics, New Paltz, NY, New York. IEEE Press.

[Smith, 1993] Smith, J. O. (1993). Efficient synthesis of stringed musical instruments. In Proc. 1993 Int. Computer Music Conf., Tokyo, pages 64-71. Computer Music Association. Available online at http://www-ccrma.stanford.edu/~jos/cs/.

[Smith, 1997] Smith, J. O. (1997). Nonlinear commuted synthesis of bowed strings. In Proc. 1997 Int. Computer Music Conf., Greece. Computer Music Association. Available online at http://www-ccrma.stanford.edu/~jos/ncbs/.

[Smith and Abel, 1999] Smith, J. O. and Abel, J. S. (1999). Bark and ERB bilinear transforms. IEEE Trans. Speech and Audio Processing, pages 697-708. Matlab code for the main figures is available online at http://www-ccrma.stanford.edu/~jos/bbt/.

[Van Duyne and Smith, 1993] Van Duyne, S. A. and Smith, J. O. (1993). Physical modeling with the 2-D digital waveguide mesh. In Proc. 1993 Int. Computer Music Conf., Tokyo, pages 40-47. Computer Music Association.

[Van Duyne and Smith, 1996] Van Duyne, S. A. and Smith, J. O. (1996). The 3D tetrahedral digital waveguide mesh with musical applications. In Proc. 1996 Int. Computer Music Conf., Hong Kong. Computer Music Association.

[Zwicker and Fastl, 1990] Zwicker, E. and Fastl, H. (1990). Psychoacoustics, Facts and Models. Springer Verlag, Berlin. See also later 1999 edition.