INTEGRATION OF PHYSICAL MODELING FOR SYNTHESIS AND ANIMATION

Perry R. Cook
Stanford CCRMA
ICMC Proceedings 1995

ABSTRACT: In this project, physical models of a flute are used for both sound synthesis and animation. An exhaustive waveguide flute simulation is used for sound synthesis, including a full tone-hole lattice and filters, coupled vortex noise modeling in the jet simulation, and simple inertial and random models of the individual fingers of the flute player. By combining the waveguide flute synthesis model with a numerical specification of the flute dimensions, ray-tracing animation can be combined with waveguide synthesis to create the visual and sonic experience of "driving" around outside and inside the flute. The result of the project is a 90-second short video entitled "Drive By Fluting."

I. The Synthesis Model

In modeling the flute bore as a series of cylindrical waveguides (Smith87) with tonehole junctions, various methods have been proposed for modeling the fractional delay required to place the toneholes at arbitrary positions along the bore at finite sampling rates. Allpass delay interpolation (Jaffe et al. 83) provides flat frequency response at the cost of some undesired phase properties at high frequencies, and some increased complexity in modeling time-varying delay. Lagrange interpolation and deinterpolation (Valimaki et al. 93) provides guaranteed phase response at the cost of attenuation at high frequencies. The flute model in this project uses a total of 8 toneholes, 9 waveguide tubing sections requiring two delay lines each, and one additional delay to model jet propagation effects (Karjalainen91) (Cook92). This results in a total of 19 interpolated delay lines. Figure 1 shows the worst-case gain attenuation as a function of frequency at 1/2 sample of delay when using Lagrange interpolation, for sampling rates of 22 and 44 kHz.
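The half-sample worst case can be evaluated directly. The sketch below is illustrative Python, not the project's code; it assumes the simplest (first-order) forms of each interpolator, and the function names are mine. It computes the gain of two-point Lagrange (linear) interpolation and the tuning error induced by a first-order allpass, both tuned for a delay of 1/2 sample:

```python
import numpy as np

def linear_interp_gain_db(f, fs):
    """Gain of 2-point (first-order Lagrange) interpolation at a
    half-sample delay: H(w) = 0.5 + 0.5*e^{-jw}, so |H| = cos(w/2)."""
    w = 2 * np.pi * f / fs
    return 20 * np.log10(np.abs(np.cos(w / 2)))

def allpass_tuning_error_cents(f, fs, total_delay):
    """Tuning error (in cents) of a first-order allpass tuned for a
    half-sample delay, inside a loop of `total_delay` samples."""
    d = 0.5                        # desired fractional delay (worst case)
    a = (1 - d) / (1 + d)          # classic first-order allpass coefficient
    w = 2 * np.pi * f / fs
    z = np.exp(1j * w)
    H = (a + 1 / z) / (1 + a / z)  # H(z) = (a + z^-1) / (1 + a*z^-1)
    phase_delay = -np.angle(H) / w          # actual delay in samples at w
    loop = total_delay - d + phase_delay    # effective loop length
    return 1200 * np.log2(total_delay / loop)
```

For example, at a 22 kHz sampling rate the linear interpolator is already about 1.5 dB down by 4 kHz, while the allpass stays at unity gain but drifts flat in pitch as frequency rises, consistent with the tradeoff discussed above.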
Figure 2 shows the worst-case tuning error (assuming Total Delay = Sampling Rate / Fundamental Frequency) as a function of frequency at 1/2 sample of delay, which occurs when using allpass interpolation, again at the two sampling rates. The selection is a tradeoff between getting the model to oscillate (gain sensitivity) and play in tune (phase sensitivity). For the model of this study, running at a 22 kHz sampling rate and with primarily non-time-varying delays, allpass interpolation was selected. Lagrange interpolation could likely perform well if a high-order interpolator were used at 44.1 kHz or higher sampling rates.

Figure 1 Worst-case gain curves as a function of frequency for 19 delay lines at 22 kHz and 44 kHz sampling rates.
Figure 2 Worst-case tuning error as a function of frequency for 19 delay lines at 22 kHz and 44 kHz sampling rates.

Toneholes can be viewed as three-way waveguide scattering junctions, with appropriate filters to model the low-pass reflection and high-pass transmission properties of an open tonehole. Given some assumptions, such as a fixed observation (microphone, listener) point, that all significant radiation comes from the jet and the first open tonehole (Huopaniemi et al. 94), and that all toneholes are roughly the same size when open, reductions in complexity are possible. By further assuming that the reflection filter of the first open tonehole is substantially similar to the reflection filter of the open end of the flute, a single commuted filter can be placed at only one point in the model, near the excitation end. Given that the focus of this project is to be able to place a virtual microphone at an arbitrary point inside or outside the flute, no such reductions were exploited.

The jet model includes a time-domain simulation of coupled pulsating turbulence (Chafe95), to more accurately model the noise components of the flute. It is interesting to note that this coupled-noise model causes the instrument to 'speak' more quickly and reliably. Figure 3 shows the complete sound synthesis model in block diagram form.

One more new component in this model is the addition of low-order filters to model the inertial characteristics of the flute player's fingers (Cook95). In this project, these are modeled by one-pole filters whose pole positions are randomly selected, individually, to lie between 0.99 and 0.9995 each time the fingering is changed. This way each performance is unique, taking a slightly different time for each 'virtual finger' to rise and fall in opening and closing each tonehole. A more elaborate model of the player's fingers, or even the entire hand, could be used in a more exhaustive simulation. For ease of interface and for constructing performances, the flute instrument/player combination program responds to solfege ("do", "re", "mi", etc.) messages, automatically setting the correct targets for the fingers based on a simplified recorder fingering chart placed in a lookup table.

Figure 3 Block diagram of the flute model used for sound synthesis.

Coaxing such a complex instrument to play at all, let alone play in tune, is a very difficult task. Genetic algorithms have been investigated for music synthesis, applied to FM and wavetable selection (Horner et al. 93), and to a simplified flute model (Vuori et al. 93). These techniques were exploited in this project to explore the high-dimensional parameter space.
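The virtual-finger filters and the solfege lookup admit a very compact implementation. The following sketch is illustrative Python, not the project's code: the class and table names are mine, and the fingering patterns are placeholder values rather than the actual recorder chart. It draws a fresh pole in [0.99, 0.9995] at every fingering change, as described above:

```python
import random

class VirtualFinger:
    """One-pole smoothing of a tonehole's open/closed state,
    modeling the inertia of a single finger."""

    def __init__(self):
        self.state = 0.0    # 0 = hole closed, 1 = hole fully open
        self.target = 0.0
        self.pole = 0.99

    def set_target(self, open_amount):
        # A new pole is drawn each fingering change, so every performance
        # opens and closes each hole at a slightly different rate.
        self.pole = random.uniform(0.99, 0.9995)
        self.target = open_amount

    def tick(self):
        # y[n] = (1 - p) * target + p * y[n-1]
        self.state = (1.0 - self.pole) * self.target + self.pole * self.state
        return self.state

# Simplified solfege-to-fingering lookup for 8 toneholes
# (1 = open, 0 = closed; values here are illustrative only).
FINGERINGS = {
    "do": [0, 0, 0, 0, 0, 0, 0, 0],
    "re": [0, 0, 0, 0, 0, 0, 0, 1],
    "mi": [0, 0, 0, 0, 0, 0, 1, 1],
}

def set_fingering(fingers, syllable):
    for finger, target in zip(fingers, FINGERINGS[syllable]):
        finger.set_target(float(target))
```

With poles this close to 1, the time constant of each finger is on the order of 200 to 2000 samples, so hole transitions take a perceptible fraction of a second at a 22 kHz rate.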
The instrument in this study was initially adjusted from physical measurements and first principles. A genetic algorithm was then used to randomize parameters around the initial values, measure the success of each random adjustment (using simple criteria based on whether the instrument oscillated at significant power, and at roughly the correct pitch), select successful values, and iterate the process until suitable values were reached.
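The perturb-evaluate-select loop just described can be sketched as follows. This is illustrative Python, not the project's code: the function names and mutation scheme are mine, and the real fitness function would run the waveguide model and score oscillation power and pitch accuracy, which is only passed in as a callable here.

```python
import random

def evolve(initial_params, fitness, generations=50,
           population=20, spread=0.05, keep=5):
    """Evolutionary search around hand-tuned initial values.
    `fitness` maps a parameter dict to a score (higher is better)."""
    survivors = [dict(initial_params)]
    for _ in range(generations):
        pool = []
        for parent in survivors:
            pool.append(parent)  # elitism: keep the parent itself
            for _ in range(population // len(survivors)):
                # Randomize each parameter by a small relative amount.
                child = {k: v * (1.0 + random.gauss(0.0, spread))
                         for k, v in parent.items()}
                pool.append(child)
        # Select the most successful parameter sets and iterate.
        pool.sort(key=fitness, reverse=True)
        survivors = pool[:keep]
    return survivors[0]
```

Because the parents survive into each new pool, the best score never regresses, which suits a search that merely needs to nudge physically plausible starting values into an oscillating, in-tune regime.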

The model yields outputs from the jet location, all toneholes, and the end. These outputs are, in reality, directional, and the radiation patterns are frequency dependent. Modeling these complex behaviors was not undertaken in this project, and this area would be the next logical direction to take in moving toward a more exhaustive model. Each output is placed in a stereo field by simple panning combined with an appropriate time delay from each source to each virtual 'ear'. The perceptual results could be further improved by using a binaural model to 'place' the individual sources (Huopaniemi et al. 94).

II. The Animation Model and Specification

Measurements from a bamboo flute were used to construct a 3-dimensional specification for animation. Logical intersections of cylinders were used to construct the basic bore and tonehole chimneys. A hemispherical section was used for the end cap. Two other graphical objects were constructed, modeling a pair of lips and one finger. These objects are not modeled in the acoustical simulations, but are included to add variety and interest to the visual display. The resultant file is an industry-standard DXF file. Figure 4 shows one frame containing all of the objects. Although it was not done in this project, some animation software packages are capable of kinematic modeling, and this feature could be exploited to compute the inertial trajectories affecting the speed at which the fingers close the toneholes.
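The per-source stereo placement described above (simple panning combined with a source-to-ear time delay) can be sketched as below. This is illustrative Python rather than the project's code: the coordinate layout, the 1/r gain law, and the function names are assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # meters per second, at room temperature

def ear_delays_and_gains(src, left_ear, right_ear, fs):
    """For one model output at position `src` (x, y in meters), return
    [(delay_samples, gain), ...] for the left and right virtual ears."""
    placements = []
    for ear in (left_ear, right_ear):
        r = math.dist(src, ear)
        delay_samples = fs * r / SPEED_OF_SOUND  # propagation delay
        gain = 1.0 / max(r, 1e-3)                # simple 1/r attenuation
        placements.append((delay_samples, gain))
    return placements
```

Summing each output (jet, toneholes, end) into the two channels with its own delay and gain yields the kind of movable virtual-microphone image used in the video; a binaural model would replace the scalar gain with a direction-dependent filter per ear.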
Figure 4 Greyscale rendering of the flute animation model, including a pair of lips and one finger.

Key frames were specified corresponding to significant events from a storyboard, and 3D ray tracings and renderings were computed. The musical selection is a three-part canon, and three flutes are used in the final production. The musical canon is shown in Figure 5, and the basic storyboard is shown in Figure 6. The storyline begins with a distant shot of the three flutes; then we (the observer) zoom in to the second flute, then closer to the 5th tonehole. An attempt is made to enter the flute via the tonehole, but an interfering finger is encountered and we are pushed away by the collision. We pan around the flute end and zoom to the lips of the player, where we find ourselves sucked into the vocal tract of the flutist, but are promptly coughed out to a very distant observation point. We zoom in once again, realizing that we can enter the flute via the hole at the end. We drive along inside the flute to the position of the 2nd tonehole, then exit and zoom away to infinity as the canon ends.

Figure 5 Musical canon used for animation soundscore.

III. Future Directions

It is likely that today, on existing (albeit expensive) hardware, this entire simulation could run in real time under user control. It is a certainty that the model as implemented in this project would run in real time on future, less expensive, hardware/software systems. There is much that could be added to the model to make it more realistic, in both the visual and aural domains. In a real-time system, a 3D joystick or control

glove using standard Virtual Reality control gestures could be used to fly around the virtual flute world at will. Better animation techniques and software are obvious places for improvement. The next planned project involves a brass instrument based on the TBone model (Cook91).

Figure 6 Animation storyboard. Keyframe times are specified in measure numbers.

IV. References

(Chafe95) C. Chafe, "Adding Vortex Noise to Wind Instrument Physical Models," Proc. ICMC, Banff.
(Cook91) P. Cook, "TBone: An Interactive WaveGuide Brass Instrument Synthesis Workbench for the NeXT Machine," Proc. ICMC, Montreal.
(Cook92) P. Cook, "A Meta-Wind-Instrument Physical Model, and a Meta-Controller for Real Time Performance Control," Proc. ICMC, San Jose.
(Cook95) P. Cook, "A Hierarchical System for Controlling Synthesis by Physical Modeling," Proc. ICMC, Banff.
(Horner et al. 93) A. Horner, J. Beauchamp, and L. Haken, "Wavetable and FM Matching Synthesis of Musical Instrument Tones," Proc. ICMC, San Jose.
(Huopaniemi et al. 94) J. Huopaniemi, M. Karjalainen, V. Valimaki, and T. Huotilainen, "Virtual Instruments in Virtual Rooms - A Real-Time Binaural Room Simulation Environment for Physical Models of Musical Instruments," Proc. ICMC, Tokyo.
(Jaffe et al. 83) D. Jaffe and J. Smith, "Extensions of the Karplus-Strong Plucked String Algorithm," Computer Music Journal, 7(2), pp. 56-69.
(Karjalainen91) M. Karjalainen, "Transmission Line Modeling and Real-Time Synthesis of String and Woodwind Instruments," Proc. ICMC, Montreal.
(Smith87) J. Smith, "Musical Applications of Digital Waveguides," Stanford CCRMA, Report STAN-M-39.
(Smith92) J. Smith, "Physical Modeling Using Digital Waveguides," Computer Music Journal, 16(4), pp. 75-87.
(Valimaki et al. 93) V. Valimaki, M. Karjalainen, and T. Laakso, "Modeling of Woodwind Bores with Finger Holes," Proc. ICMC, Tokyo.
(Vuori et al. 93) J. Vuori and V. Valimaki, "Parameter Estimation of Non-Linear Physical Models by Simulated Evolution - Application to the Flute Model," Proc. ICMC, Tokyo.