New techniques to model reverberant instrument body responses

Henri Penttinen, Matti Karjalainen, Tuomas Paatero, Hanna Järveläinen
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology
email:

Abstract

In this study we introduce new methods to design and realize computationally efficient models of musical instrument bodies. The methods take into account the time-frequency resolution of the human auditory system. First, we present a method in which the body response (or its low-frequency part) is represented by a single digital filter structure known as a Kautz filter. Second, we divide an instrument body response into low- and high-frequency parts and consider how to implement these parts more efficiently. The low-frequency region is modeled as separate modal resonances. The high-frequency part can be implemented as a reverb algorithm that is designed to match the target response both spectrally and in its temporal decay, based on auditory resolution criteria. The crossover frequency and the required spectral and temporal accuracy are determined by formal listening tests. The methods are examined through case studies with guitar and violin responses.

1 Introduction

Commuted waveguide synthesis (Smith 1992) of string instruments is an efficient yet high-quality technique because it avoids explicit modeling of the body by consolidating the body response with an excitation waveform. However, as available processing power increases, the advantages of explicit body models, such as parametric controllability, are drawing attention to advanced body modeling. Digital filters have been successfully applied to modeling the bodies of string instruments (Mathews and Kohut 1973), (Smith 1983), (Karjalainen and Smith 1996).
A model based on acoustical measurements can be implemented, for example, as an FIR filter, but the FIR order needed for an accurate model is then typically 4000 to 10000 or even higher, although orders of 1000–2000 already yield useful results. The computational load can be reduced by using other filter structures, such as IIR filters and frequency-warped structures (Schüssler and Winkelnkemper 1970), (Strube 1980), (Härmä et al. 2000). Suitable IIR filter design techniques include linear prediction (LP) (Markel and Gray 1976) and Prony's method (Parks and Burrus 1987), which can also be frequency warped (Karjalainen et al. 2000). Frequency warping lets the frequency resolution of the model approximate that of the human auditory system and consequently allows the order of the body model to be reduced. Waveguide meshes have also been used for the high-frequency region (Huang, Serafin, and Smith 2000). Moreover, Rocchesso (1993) mentions the possibility of using reverberation algorithms alone to model the characteristics of an instrument body. A reverberator with short delay lines can be tuned to act as a sparsely resonating system. However, with currently available reverb algorithms it is impossible to tune a single resonance to a desired frequency, bandwidth, and magnitude without affecting the other resonances. It is therefore very difficult, if not impossible, to achieve perceptual equivalence (transparency) between a complex resonating system, such as a body model, and a reverb algorithm. In contrast, one of the methods introduced in this study offers a practical way to obtain a nearly perceptually transparent instrument body model by using a reverb algorithm as one part of the complete body model. The computational burden is often critical in real-time applications implemented, e.g., on a DSP.
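Frequency warping is commonly realized by replacing each unit delay with a first-order allpass section. As a small sketch, assuming the convention D(z) = (z^-1 - λ)/(1 - λ z^-1) (sign conventions differ between authors), the function below computes the resulting mapping from ordinary to warped frequency. With λ > 0 the low-frequency range is stretched, which is how warping concentrates modeling resolution at low frequencies. The value λ = 0.4 is the one used for the warped Kautz models in this paper; the 44.1 kHz sampling rate and the function name are assumptions for illustration.

```python
import numpy as np

def warped_frequency(omega, lam):
    """Map normalized angular frequency omega (rad/sample, 0..pi) to the warped
    frequency of the allpass D(z) = (z^-1 - lam) / (1 - lam z^-1), i.e. -arg D."""
    omega = np.asarray(omega, dtype=float)
    return omega + 2.0 * np.arctan2(lam * np.sin(omega), 1.0 - lam * np.cos(omega))

fs = 44100.0   # assumed sampling rate (not stated in the text)
lam = 0.4      # warping parameter used for the warped Kautz models

# Which physical frequencies land at 1/4, 1/2 and 3/4 of the warped axis?
f = np.linspace(0.0, fs / 2, 200001)
nu = warped_frequency(2 * np.pi * f / fs, lam)   # monotonically increasing
for frac in (0.25, 0.5, 0.75):
    idx = np.searchsorted(nu, frac * np.pi)
    print(f"{frac:.2f} of warped axis <- {f[idx]:.0f} Hz")
```

Near 0 Hz the mapping has slope (1 + λ)/(1 - λ), about 2.3 for λ = 0.4, so low-frequency detail occupies a correspondingly larger share of a warped model's resolution.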
Likewise, the number of coefficients needed to represent a model is crucial when the coefficients are transferred over the Internet, and especially in wireless applications. It is also very useful for the user of a body model to have control over parameters that have physical significance. This study introduces new methods to reduce both the computational load and the number of parameters used to describe instrument body models. Furthermore, the methods yield a deeper understanding of body behavior and its perceptual properties.

2 Modeling of Instrument Bodies with Kautz Filters

Kautz filters (Kautz 1954), (Paatero and Karjalainen 2001) are fixed-pole IIR filters organized structurally to produce orthonormal tap-output impulse responses. The linear-in-parameter model defined by the transversal Kautz filter can be seen as a generalization of the FIR and Laguerre filter structures, providing IIR-like spectral modeling capabilities together with the well-known favorable properties that result from orthonormality. Kautz filter design is a two-step procedure: selecting the poles (possibly with multiplicity) and assigning the tap-output weights. For the latter, we use the orthonormal (Fourier) expansion coefficients, which are easily obtained and are the solution of the corresponding least-squares (LS) problem. By choosing the poles we incorporate a priori information into the model, e.g., knowledge of system poles, resonance frequencies and the corresponding time constants, or a desired allocation of modeling resolution. In instrument body modeling, Kautz filters are used for two purposes: (a) to approximate the whole response and (b) to model the low-frequency range accurately. For both tasks, it is relatively easy to find good pole sets by directly selecting prominent resonances and tuning the pole radii appropriately. We have also adapted a method originally proposed for pure FIR-to-IIR filter conversion to the optimization of Kautz filter poles (Paatero and Karjalainen 2001). Furthermore, we take the frequency resolution of the human auditory system into account by frequency warping the body responses.

A good representation of the whole resonance structure is typically achieved with Kautz filter orders of 160–240, and detailed models of the low-frequency response are produced with orders of 60–150. The order depends on the mode density of the body: the higher the mode density, the higher the order of the model filter. In addition, a response with a relatively flat magnitude envelope at high frequencies requires a higher Kautz filter order than a response with a low-pass characteristic.

Fig. 1 shows the magnitude responses of an original measured acoustic guitar body model and the corresponding Kautz filter model of order 220. The Kautz model was created by selecting poles from two Kautz filters that model the whole response: a warped model (order 160) with warping parameter λ = 0.4, and an unwarped model (order 200). The warped model has better frequency resolution than the unwarped one at low frequencies, and vice versa at high frequencies, so combining the two gives a more accurate overall result. The combined filter has 52 poles from the warped model, for frequencies below 1 kHz, and 168 poles from the unwarped filter, for frequencies above 1 kHz.

[Figure 1 plot: magnitude versus frequency / Hz, logarithmic frequency axis from 10^2 to 10^4 Hz.]
Figure 1: Magnitude responses of body simulation filters: original model (bottom) and Kautz model of order 220 (top). The vertical lines (at the top of the figure) indicate the positions of the poles.

This Kautz filter produces a model of very good quality up to high frequencies. An attentive listener may detect a slight difference between the impulse responses, but practical applications usually involve playing, where the difference is practically inaudible. Per unit of filter order, a Kautz filter requires approximately 7/2 times as many arithmetic operations as a straightforward FIR filter. On the other hand, a much lower order suffices for good accuracy compared to an FIR implementation. Therefore, for instance, a Kautz filter of order 220 is over 5 times more efficient than a 4000-tap FIR filter (roughly 220 × 3.5 = 770 FIR-tap equivalents versus 4000). MATLAB scripts and demos related to Kautz filter design can be found at http://www.acoustics.hut.fi/software.

3 Two-part Body Modeling

In two-part body modeling the low- and high-frequency regions are modeled separately. In a reverberant body model, the exact positions and magnitudes of the resonances are not as important as their decay times and their spacing (Mathews and Kohut 1973). Therefore, the high-frequency range can be modeled in a more statistical fashion, while the low-frequency range is modeled more precisely. In two-part modeling the low- and high-frequency models can be summed after low- and high-pass filtering, respectively. In this work the low- and high-pass characteristics have been embedded in the respective models; they are discussed in more detail in Sections 3.2 and 3.3. The crossover frequency between the two parts is studied through listening tests.

3.1 Determining the crossover frequency through perceptual evaluation

The perceptual quality of an instrument body response was studied in two formal listening experiments (an AB test and a scaling test). The purpose was to determine a crossover frequency, fc, between the low- and high-frequency regions such that the original body response cannot be distinguished from a model.
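The noise-model stimuli used in these experiments, described in the next paragraph, can be sketched as follows. This is a minimal illustration and not the authors' test software: it assumes Butterworth 1/3-octave band filters, estimates each band's decay rate from a straight-line fit to the band's backward-integrated (Schroeder) energy curve between -5 and -25 dB, and keeps the part of the original response below fc by low-pass filtering. The function names, filter orders, and fitting range are hypothetical choices.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def band_decay_db_per_sample(x):
    """Estimate the decay rate (dB/sample, negative) of x from a line fitted
    to its Schroeder energy decay curve between -5 and -25 dB (assumed range)."""
    edc = np.cumsum(x[::-1] ** 2)[::-1]            # backward-integrated energy
    edc_db = 10 * np.log10(edc / edc[0] + 1e-300)
    idx = np.where((edc_db <= -5) & (edc_db >= -25))[0]
    slope, _ = np.polyfit(idx, edc_db[idx], 1)
    return slope

def noise_model(h, fs, fc, seed=0):
    """Hypothetical noise model: low-passed original below fc plus
    exponentially decaying 1/3-octave noise bands above fc."""
    rng = np.random.default_rng(seed)
    n = np.arange(len(h))
    nyq = fs / 2
    low = sosfilt(butter(8, fc, btype="lowpass", fs=fs, output="sos"), h)

    # 1/3-octave band edges from fc up to just below the Nyquist frequency
    edges = fc * 2.0 ** (np.arange(0, 60) / 3)
    edges = np.append(edges[edges < 0.95 * nyq], 0.95 * nyq)

    high = np.zeros(len(h))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, h)
        energy = np.sum(band ** 2)
        if energy == 0:
            continue
        rate = band_decay_db_per_sample(band)      # band-specific decay rate
        env = 10 ** (rate * n / 20)                # amplitude envelope
        noise = sosfilt(sos, rng.standard_normal(len(h))) * env
        # scale the synthetic band to the energy of the original band
        high += noise * np.sqrt(energy / np.sum(noise ** 2))
    return low + high
```

Because the energy decay curve of an exponentially decaying signal has the same dB slope as its power envelope, the fitted slope can be used directly as the amplitude envelope rate in dB.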
In the experiments a noise model was used in which the high-frequency region was represented by exponentially decaying noise bands whose band-specific decay rates were determined from the original body response. The low-frequency region was a low-pass filtered version of the original response. The standard tones, to which the noise models were compared, were produced using the measured body responses. Crossover frequencies of 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, and 5.0 kHz were tested for the noise models. Pure impulse responses of the guitar and the violin were tested, as well as short samples of playing filtered with the impulse responses. The bandwidth of the filter bank was fixed to 1/3 octave. Six subjects participated in experiment one and four in experiment two.

The objective of the first experiment was to find a crossover frequency such that the noise model response could not be distinguished from the original. The first experiment was a same-different task (AB test). Each stimulus trial was judged four times, along with as many corresponding fake trials (two standard tones). A threshold for the crossover frequency, fc, was determined as the value required for 75 % correct answers (Yost 1994). In the second experiment, the task was to grade the quality of each stimulus relative to the standard tone on a scale from 1.0 to 5.0. Each trial was repeated three times.

Fig. 2 shows the detection thresholds for each excitation signal. For the impulse responses of both the violin and the guitar, the thresholds are around fc = 3.5 kHz. For the playing conditions the thresholds are between 2 kHz and 2.5 kHz, indicating that the modeled response is harder to detect in a more practical context. The scaling experiment for the guitar shows the same trend (Fig. 2): the playing case was judged to be of better quality than the impulse response case over almost the entire studied range. An interesting detail is the degradation of the quality judgments at 5 kHz.
At 4 kHz the quality was judged equal to the reference, while at 5 kHz the judgments were lower, almost equal to those for 3 kHz. In addition, the crossover frequency for the violin body used was found to be slightly higher. This can be explained by the physical behavior and dimensions of the bodies: the guitar body used is larger in every dimension than the violin body, so the violin body reaches a given mode density at a higher frequency than the guitar.

[Figure 2 plot. Top pane: crossover frequency at threshold (0–6000 Hz) for the conditions Gui Imp, Vio Imp, Gui Play, and Vio Play. Bottom pane: quality score versus crossover frequency [Hz] (1000–5000 Hz, and REF).]
Figure 2: Listening test results for determining the crossover frequency. Top pane: AB-test results, where the y-axis represents the crossover frequency, fc, at threshold and the x-axis the excitation signal. The box plot shows the median and the upper and lower quartiles of the individual thresholds. Bottom pane: scaling test results for guitar impulse responses (x) and playing (o), where the y-axis shows the mean quality score and the x-axis the crossover frequency. The quality scores for the reference signals are depicted on the right-hand side.

The use of the noise model in the listening test is justified in two ways. (I) The outcome of the test gives a guideline for a frequency-dependent decay resolution beyond which the auditory system does not recognize single modes in a reverberant body model. (II) The crossover frequency obtained from the listening test works as a good guideline for the final implementation stage, where a reverb algorithm is used for the high-frequency modeling. Thus, the behavior of a reverb algorithm can be adjusted with the help of these results.

3.2 Low-frequency modeling

In the two-part model the low frequencies can be represented with any of the methods mentioned in the introduction.
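Because the low-frequency model in this study is a Kautz filter, a minimal sketch of that structure, following the description in Section 2, is given here. The code builds the orthonormal tap-output impulse responses for a given pole set using the complex-pole (Takenaka-Malmquist) form of the Kautz basis, computes the tap weights as inner products with the target response (the orthonormal expansion coefficients), and obtains a low-frequency model by discarding poles above a crossover frequency and refitting. Practical implementations typically use a real-valued form built from second-order sections, together with warped or optimized pole sets; the function names and the toy pole set below are hypothetical.

```python
import numpy as np
from scipy.signal import lfilter

def kautz_basis(poles, n):
    """Columns are the tap-output impulse responses of a Kautz filter:
    G_i(z) = sqrt(1-|p_i|^2)/(1 - p_i z^-1) * prod_{j<i} (z^-1 - conj(p_j))/(1 - p_j z^-1)."""
    x = np.zeros(n, dtype=complex)
    x[0] = 1.0                                       # unit impulse
    basis = np.empty((n, len(poles)), dtype=complex)
    for i, p in enumerate(poles):
        basis[:, i] = lfilter([np.sqrt(1 - abs(p) ** 2)], [1, -p], x)
        x = lfilter([-np.conj(p), 1], [1, -p], x)    # allpass section to the next tap
    return basis

def kautz_fit(h, poles):
    """Tap weights as orthonormal expansion coefficients <h, g_i>, plus model output."""
    g = kautz_basis(poles, len(h))
    w = g.conj().T @ h
    return w, (g @ w).real                           # real when poles come in conjugate pairs

def lowpass_poles(poles, fc, fs):
    """Keep only the poles whose angle lies below the crossover frequency fc."""
    return [p for p in poles if abs(np.angle(p)) <= 2 * np.pi * fc / fs]

fs = 44100.0
freqs = [200.0, 450.0, 900.0, 3000.0]                # toy resonance frequencies
radii = [0.998, 0.997, 0.996, 0.99]
poles = []
for f, r in zip(freqs, radii):
    p = r * np.exp(2j * np.pi * f / fs)
    poles += [p, np.conj(p)]

# Toy "body response": damped sinusoids with exactly these poles
n = np.arange(8192)
h = sum(r ** n * np.cos(2 * np.pi * f / fs * n + 0.3 * k)
        for k, (f, r) in enumerate(zip(freqs, radii)))

w, h_full = kautz_fit(h, poles)
w_low, h_low = kautz_fit(h, lowpass_poles(poles, 2000.0, fs))
print(len(w), len(w_low))                            # 8 6: the 3 kHz pair is dropped
print(np.max(np.abs(h - h_full)) / np.max(np.abs(h)))  # tiny: h lies in the model span
```

Because the basis is orthonormal, dropping poles and refitting only requires recomputing inner products, which keeps the pole-removal step of the low-frequency design cheap.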
In this study the low end is modeled with Kautz filters, since the filter design technique used allows easy and accurate modeling. The recursive structure is able to model the slowly decaying body modes at a small computational cost. To obtain the low-frequency part of the two-part model, a Kautz filter modeling the whole response was first designed. Then, to obtain the low-pass characteristic of the low-end model, a suitable number of poles was removed from the high frequencies; the number of removed poles is determined by the desired crossover frequency. Another method is to model a low-pass filtered response and remove high-frequency poles in the same manner. The order of a low-end Kautz filter ranges from 60 to 150 for crossover frequencies from 2 kHz to 4 kHz.

3.3 High-frequency modeling

At high frequencies the spectral and temporal behavior of a body model is more reverberant than in the low-frequency region. It is therefore well founded to apply a reverb algorithm (Gardner 1998) to model the high-frequency behavior of an instrument body. The reverb algorithm used in this study is a modification of the feedback delay network (FDN) (Jot and Chaigne 1991). The modified version (Väänänen et al. 1997) has an allpass section cascaded with the lowpass filters; therefore, the feedback matrix can be replaced with a single feedback loop. The modified FDN is used because of its computational efficiency, performance, and scalability.

The overall magnitude envelope of the high-frequency model is set with a cascaded low-order filter, which can easily be designed, e.g., with linear predictive (LP) techniques. A magnitude envelope estimate is obtained by first high-pass filtering the original impulse response and then creating an LP model of the filtered response. This envelope estimate embeds the desired high-pass characteristic and is used to filter the output of the reverberator.
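The envelope-filter design just described can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: it high-pass filters the target response at fc with a 4th-order Butterworth filter, fits an all-pole model of order 20 (the order reported in Section 3.3.2) with the autocorrelation method of linear prediction, and applies the resulting filter g/A(z) to a reverberator output. The gain normalization by the prediction-error energy, the toy signals, and the function names are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import butter, lfilter, sosfilt

def lp_envelope(x, order):
    """Autocorrelation-method LP: return A(z) coefficients (a[0] = 1) and a gain g
    so that g / A(z) approximates the magnitude envelope of x."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]  # lags 0..order
    coeffs = solve_toeplitz(r[:order], r[1:order + 1])              # normal equations
    a = np.concatenate(([1.0], -coeffs))
    g = np.sqrt(max(r[0] - coeffs @ r[1:order + 1], 0.0))           # residual energy
    return a, g

def highfreq_envelope_filter(h, fs, fc, order=20):
    """High-pass the original response at fc, then fit the LP envelope."""
    sos = butter(4, fc, btype="highpass", fs=fs, output="sos")
    return lp_envelope(sosfilt(sos, h), order)

fs, fc = 44100.0, 4000.0
rng = np.random.default_rng(0)
decay = np.exp(-np.arange(4096) / 600.0)
h = rng.standard_normal(4096) * decay            # toy target response
a, g = highfreq_envelope_filter(h, fs, fc)
reverb_out = rng.standard_normal(4096) * decay   # placeholder reverberator output
shaped = lfilter([g], a, reverb_out)             # envelope-shaped high-frequency part
```

The autocorrelation method yields a minimum-phase A(z), so the all-pole envelope filter is stable by construction, which is one reason it is a convenient choice here.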
In practical body response measurements a poor signal-to-noise ratio (SNR) can sometimes be a problem. The proposed two-part model has a better SNR than the measured body response, since it does not include the noise floor of the original response but models only the exponential decay.

3.3.1 Tuning of Body Reverberators

With the guidelines obtained from the listening tests, reverb algorithms can be tuned to match the required frequency-dependent decay and response level; good reverb algorithms allow both parameters to be controlled. The frequency-dependent decay is tuned via the ratio, a, of the decay times at 0 Hz and at the Nyquist frequency (Jot and Chaigne 1991), and is implemented with low-pass filters cascaded with the delay lines. The ratio a of the original impulse response cannot be used directly for the two-part model, because the lowest body modes decay much more slowly than those at, e.g., 2–3 kHz, which would lead to an incorrect ratio. Instead, the decay times at the crossover and Nyquist frequencies are first calculated, and the ratio between 0 Hz and the Nyquist frequency is then estimated from these values.

The delay line lengths are determined from the decay time at 0 Hz so that the sum of the delay line lengths is at least about a quarter of the decay time at 0 Hz. The number of delay lines and their lengths should also match the modal density of the modeled response. In addition, the shortest delay line is tuned so that the maximum peak of the two-part model output is aligned with that of the original response. Finally, to obtain an evenly decaying output, the delay line lengths are chosen to be prime numbers.
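The decay parameters above translate into per-delay-line settings in the usual FDN manner (Jot and Chaigne 1991): each delay line of m samples gets a loop gain that produces 60 dB of attenuation in T60 seconds, and a one-pole low-pass filter lowers the decay time to a·T60(0) at the Nyquist frequency. The sketch below assumes a = T60(Nyquist)/T60(0), which is consistent with the value a = 0.2 reported in Section 3.3.2, and a sampling rate of 44.1 kHz, which the text does not state. The one-pole coefficient is matched exactly at 0 Hz and at Nyquist; this is a simple design choice, not necessarily the filter used in the modified FDN of Väänänen et al. (1997), and the function names are hypothetical.

```python
import numpy as np

def is_prime(m):
    """Trial-division primality test for small delay-line lengths."""
    return m > 1 and all(m % d for d in range(2, int(m ** 0.5) + 1))

def loop_filter(m, fs, t60_dc, ratio):
    """One-pole loop filter H(z) = g (1 - b) / (1 - b z^-1) for a delay line of
    m samples: decay time t60_dc at 0 Hz and ratio * t60_dc at Nyquist."""
    g_dc = 10 ** (-3 * m / (fs * t60_dc))            # -60 dB after t60_dc seconds
    g_ny = 10 ** (-3 * m / (fs * ratio * t60_dc))    # -60 dB after ratio * t60_dc
    r = g_ny / g_dc                                  # required |H(pi)| / |H(0)|
    b = (1 - r) / (1 + r)                            # because |H(pi)|/|H(0)| = (1-b)/(1+b)
    return g_dc, b

fs = 44100.0                   # assumed sampling rate
delays = [53, 67, 79, 97]      # delay-line lengths from Section 3.3.2
t60_dc, ratio = 0.070, 0.2     # decay time at 0 Hz and ratio a from Section 3.3.2

assert all(is_prime(m) for m in delays)
for m in delays:
    g, b = loop_filter(m, fs, t60_dc, ratio)
    print(f"m = {m:3d}: g = {g:.4f}, b = {b:.4f}")
```

The sketch covers only the per-line decay filters; the allpass sections, the single feedback loop, and the output envelope filter of the modified FDN are not shown.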

[Figure 3 plot: panels (a) and (b), spectrograms with axes time (s) and frequency (kHz).]
Figure 3: Spectrograms of (a) an original body response and (b) a two-part body model with fc = 4 kHz.

3.3.2 Experiments

In this study two-part body models were created using Kautz filters for the low-frequency region and a modified FDN for the high-frequency region. Fig. 3 shows the spectrograms of (a) an original guitar body model and (b) a two-part model with a crossover frequency of 4 kHz. The order of the Kautz filter is 96 and the warping parameter is λ = 0.4. The reverberator is a modified FDN with four delay lines; the decay time at 0 Hz is 70 ms, a = 0.2, and the feedback gains of the allpass sections are 0.3. The delay line lengths are 53, 67, 79, and 97 samples, and the order of the IIR magnitude envelope model is 20. The two-part model shown in Fig. 3 is a useful and computationally efficient body model: compared to a 4000-tap FIR filter, it is over 7 times more efficient.

4 Conclusions

The proposed methods make it possible to produce accurate instrument body models in a computationally efficient way. The models reproduce the time-frequency behavior of resonant instrument bodies properly. A Kautz model composed of two Kautz filters with different amounts of warping models the whole frequency range effectively. In addition to instrument body simulation, the two-part model makes it possible to tune and change the two frequency regions separately. This permits the study of the psychoacoustic properties of a body model by changing them independently. The listening tests show that differences between an original body response and its model are more easily detected by listening to pure impulse responses than by listening to responses related to playing. Fortunately, this is not a problem, since practical applications are usually related to playing, which is what ultimately matters.
5 Acknowledgments

The financial support of the Pythagoras graduate school, Nokia Research Center, and the Academy of Finland (Sound Source Modeling project) is gratefully acknowledged.

References

Gardner, W. G. (1998). Reverberation algorithms. In M. Kahrs and K. Brandenburg (Eds.), Applications of Digital Signal Processing to Audio and Acoustics, Chapter 3, pp. 85-131. Kluwer.
Härmä, A., M. Karjalainen, L. Savioja, V. Välimäki, U. K. Laine, and J. Huopaniemi (2000, November). Frequency-warped signal processing for audio applications. J. Audio Eng. Soc. 48(11), 1011-1031.
Huang, P., S. Serafin, and J. Smith (2000, December). Modeling high-frequency modes of complex resonators using a waveguide mesh. In Proc. COST-G6 Conf. Digital Audio Effects (DAFx'00), Verona, Italy, pp. 269-272.
Jot, J. M. and A. Chaigne (1991). Digital delay networks for designing artificial reverberators. In 90th AES Convention, Paris, France.
Karjalainen, M. and J. O. Smith (1996). Body modeling techniques for string instrument synthesis. In Proc. Int. Computer Music Conf., Hong Kong, pp. 232-239.
Karjalainen, M., V. Välimäki, H. Penttinen, and H. Saastamoinen (2000, December). DSP equalization of electret film pickup for the acoustic guitar. J. Audio Eng. Soc. 48(12), 1183-1193.
Kautz, W. H. (1954). Transient synthesis in the time domain. IRE Trans. Circuit Theory CT-1, 29-39.
Markel, J. D. and J. A. H. Gray (1976). Linear Prediction of Speech. Berlin: Springer.
Mathews, M. and J. Kohut (1973). Electronic simulation of violin resonances. Journal of the Acoustical Society of America 53(6), 1620-1626.
Paatero, T. and M. Karjalainen (2001). Kautz filters and generalized frequency resolution: theory and audio applications. In 110th AES Convention, Amsterdam, The Netherlands.
Parks, T. W. and C. S. Burrus (1987). Digital Filter Design. New York: Wiley.
Rabiner, L. and C. M. Rader (Eds.) (1972). Digital Signal Processing. Selected Reprint Series. New York: IEEE Press.
Rocchesso, D. (1993). Multiple feedback delay networks for sound processing. In X Colloquio di Informatica Musicale, Milano, Italy, pp. 202-209.
Schüssler, W. and W. Winkelnkemper (1970). Variable digital filters. Arch. Elek. Übertragung 24, 524-525. Reprinted in (Rabiner and Rader 1972).
Smith, J. O. (1983). Techniques for Digital Filter Design and System Identification with Application to the Violin. Ph.D. thesis, Stanford University.
Smith, J. O. (1992). Physical modeling using digital waveguides. Computer Music Journal 16(4), 74-87.
Strube, H. W. (1980). Linear prediction on a warped frequency scale. Journal of the Acoustical Society of America 68(4), 1071-1076.
Väänänen, R., V. Välimäki, J. Huopaniemi, and M. Karjalainen (1997). Efficient and parametric reverberator for room acoustics modeling. In Proc. Int. Computer Music Conference (ICMC'97), Thessaloniki, Greece, pp. 200-203.
Yost, W. A. (1994). Fundamentals of Hearing - An Introduction (3rd ed.). New York: Academic Press.