Physically-based real-time synthesis of contact sounds

Federico Avanzini
Dipartimento di Elettronica ed Informatica, Universita degli Studi di Padova
Via Gradenigo 6/A, 35131 - Padova, Italy. avanzini@dei.unipd.it

Matthias Rath, Davide Rocchesso
Dipartimento di Informatica, Universita degli Studi di Verona
Strada Le Grazie, 37134 - Verona, Italy. {rath;rocchesso}@sci.univr.it

Abstract

This paper describes an algorithm for real-time synthesis of contact sounds for interactive simulations and animation. The algorithm is derived from a physically-based impact model, and the acoustic characteristics of colliding objects can be realistically simulated by properly adjusting the physical parameters of the model. A technique for describing the spatial dynamics of a resonating object is proposed, which allows simulation of position-dependent interaction. It is shown that the numerical implementation leads to an efficient sound synthesis module that runs in real time on low-cost platforms. The effectiveness of the model is demonstrated, and its applications are discussed.

1 Introduction

Sound has been recognized to be an effective channel of information within human-computer interfaces. It can be used to convey alarm and warning messages, status and monitoring indicators, and encoded messages (Buxton 1990). It has been shown that the usability of interfaces can be significantly improved by adding carefully designed sounds to graphical menus, buttons and icons (Brewster and Crease 1999). In animation and interaction, non-speech sound plays an important role in integrating visual information. It affects the way the user perceives events, and gives the user a sense of presence and immersion in a synthetic environment.

Research in audio for multimedia systems has traditionally focused on techniques related to auralization of environments. Properly designed reverberation and sound spatialization algorithms provide information on the size and shape of an environment, as well as on the location of auditory events (Begault 1994). Far less attention has been devoted to the audio sources, and to the interaction mechanisms that are involved in sound production. A great deal of interesting results about the perception of sound sources is provided by experimental psychologists. Research in ecological acoustics shows that listeners typically tend to describe sounds in terms of the sound-producing events rather than in terms of acoustic parameters. Gaver (Gaver 1993) refers to this attitude as "everyday listening", and discusses how to use everyday sounds to synthesize auditory icons. To this end, it is necessary to identify those acoustic cues which convey information about specific physical properties of objects (such as size, shape, material) and their interactions.

Recent research in sound modeling has exploited the above results. Klatzky et al. (Klatzky, Pai, and Krotov 2000) have shown that auditory information can be used in simulated contact with virtual objects to elicit the perception of material. Van den Doel et al. (van den Doel, Kry, and Pai 2001) describe models for audio rendering of collisions and continuous contact (friction and rolling). Convincing results are obtained; however, the contact models used in these works do not fully rely on a physical description, and as a consequence the attack transients and the overall realism are affected. Moreover, due to the lack of a physical description of contact forces, the control parameters of the sound models are not easily associated with physical dimensions.
A fully physical approach has been adopted by O'Brien et al. (O'Brien, Cook, and Essl 2001), who have simulated the behavior of three-dimensional objects using a finite element method. The computation is used for generating both visual and audio animation; hence a high degree of coherence and perceptual consistency can be achieved.

On the other hand, finite elements have high computational costs and are possibly "too" accurate, i.e. the models also account for sound features that are not perceivable or relevant.

An alternative approach to physical modeling amounts to first investigating which acoustic cues are significant for auditory perception, and then designing the sound synthesis algorithms on the basis of such an investigation. This way, only the important sound features are incorporated into the models, while less salient cues are discarded. This design process naturally leads to "cartoon" sound models, i.e. simplified models of real mechanisms and phenomena, in which some features are magnified. One advantage is that veridical audio output and physically-based control are achieved using simple and computationally efficient synthesis algorithms.

Using such a "cartoon" approach, we have developed a model in which a hammer and a resonator interact through a non-linear contact force. The model is controlled through physical parameters, such as impact velocity and position, and produces realistic impact sounds. Moreover, it allows control over salient acoustic cues: specifically, we have shown that the resonator model can be "tuned" to various materials (Avanzini and Rocchesso 2001a), while the parameters of the non-linear force can be adjusted to different levels of "hardness" (Avanzini and Rocchesso 2001b).

This paper addresses the issue of position-dependent interaction. Hitting an object at different positions results in different responses, since its resonances are excited to different extents. After reviewing in Sec. 2 the hammer-resonator system and its numerical implementation, in Sec. 3 we propose a strategy to integrate position-dependent interaction in the model with little computational overhead. Results are discussed in Sec. 4.

2 IMPACT MODEL

2.1 Hammer, resonator, interaction

The resonator is modeled using modal synthesis techniques (Adrien 1991), where a resonating structure is described in terms of its normal modes. The state of the system is here written as the vector of its (generally infinitely many) modal states, just as it would, in the spatial description, be seen as the vector of the states of its spatial points. The modal description is in principle equivalent to a description in spatial variables: modal and spatial parameters are related through a linear basis transformation. Modal parameters, though, relate differently to human perception, which is of great importance in terms of implementation and especially of simplification/abstraction (it might be said that the spatial description of an object refers rather to its visual appearance, whereas modal properties have a closer relationship to auditory perception).

Figure 1: Compression/force characteristics (horizontal axis: hammer compression [m]).

Each modal state $w_j = (x_j, \dot{x}_j)$ follows the differential equation

$$\ddot{x}_j(t) + r_j \dot{x}_j(t) + k_j x_j(t) = f_j(t), \qquad (1)$$

where $r_j > 0$ and $k_j > 0$ are the damping and elastic constants of the $j$th mode, respectively, while $f_j$ is the sum of the external forces acting on the mode. For sufficiently small damping ($r_j^2 < 4 k_j$), the impulse response $h_j(t)$ of system (1) is given by

$$x_j(t) = h_j(t) = e^{-t/t_j} \sin(\omega_j t). \qquad (2)$$

The resonance frequency $\omega_j$ and the decay time $t_j$ are related to the modal constants by

$$k_j = \omega_j^2 + 1/t_j^2, \qquad r_j = 2/t_j. \qquad (3)$$

Again, for sufficiently small damping the resonance frequency is approximated by $\omega_j \approx \sqrt{k_j}$. In practice the number of modes $n$ is always finite, since the bandwidth of our ears, as of any processing or reproduction system, is finite.
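The relation between the perceptual parameters (resonance frequency and decay time) and the modal constants of Eqs. (1)-(3) can be made concrete with a short sketch. The following Python/NumPy fragment is our illustration and not part of the original module; function names and numerical values are ours.

import numpy as np

def modal_constants(freq_hz, decay_s):
    # Eq. (3): elastic and damping constants from resonance frequency and decay time
    w = 2.0 * np.pi * freq_hz          # resonance frequency omega_j [rad/s]
    k = w ** 2 + 1.0 / decay_s ** 2
    r = 2.0 / decay_s
    return k, r

def modal_impulse_response(freq_hz, decay_s, fs=44100, dur=1.0):
    # Eq. (2): h_j(t) = exp(-t / t_j) * sin(omega_j * t), sampled at rate fs
    t = np.arange(int(dur * fs)) / fs
    return np.exp(-t / decay_s) * np.sin(2.0 * np.pi * freq_hz * t)

# Example: a mode at 440 Hz with a 0.3 s (1/e) decay time.
k_j, r_j = modal_constants(440.0, 0.3)
h_j = modal_impulse_response(440.0, 0.3)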
The transformation from the modal states to a spatial state variable at a point P is then $w_P = \sum_{j=1}^{n} a_{Pj} w_j$, or equivalently

$$x_P = \sum_{j=1}^{n} a_{Pj} x_j = a_P^T x \quad \text{and} \quad \dot{x}_P = a_P^T \dot{x}, \qquad (4)$$

where $x = (x_1, \ldots, x_n)$ and $a_P = (a_{P1}, \ldots, a_{Pn})$. In a similar way, a force $f$ applied at a spatial point Q is distributed to the separate modes according to

$$f_j = a_{Qj} f, \qquad j = 1, \ldots, n. \qquad (5)$$
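The position-dependent weighting of Eqs. (4)-(5) can be summarized in a small sketch of a weighted modal bank. This is again our illustration rather than the real-time module itself (which implements each mode as a second-order resonant filter); the class and parameter names, the semi-implicit Euler integration, and the example values are assumptions made for clarity.

import numpy as np

class ModalResonator:
    # Parallel bank of n modes (Eq. 1) with position-dependent weights (Eqs. 4-5).
    def __init__(self, freqs_hz, decays_s, fs=44100):
        w = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float)
        t = np.asarray(decays_s, dtype=float)
        self.k = w ** 2 + 1.0 / t ** 2   # Eq. (3)
        self.r = 2.0 / t
        self.dt = 1.0 / fs
        self.x = np.zeros_like(self.k)   # modal displacements x_j
        self.v = np.zeros_like(self.k)   # modal velocities

    def step(self, force, a_in, a_out):
        # One audio sample: the external force, applied at the point with weights
        # a_in, is distributed to the modes (Eq. 5); the displacement observed at
        # the point with weights a_out is the weighted sum of Eq. (4).
        # Semi-implicit Euler integration, for illustration only.
        f_j = np.asarray(a_in) * force
        acc = f_j - self.r * self.v - self.k * self.x
        self.v += self.dt * acc
        self.x += self.dt * self.v
        return float(np.dot(a_out, self.x))

# Striking point and listening point are characterized only by their weight vectors.
res = ModalResonator([440.0, 701.0], [0.3, 0.2])
a_strike = np.array([0.9, 0.4])
a_pickup = np.array([0.7, 0.6])
out = [res.step(1.0 if i == 0 else 0.0, a_strike, a_pickup) for i in range(44100)]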

The hammer is modeled as a lumped mass which moves freely except when it collides with the resonator. Hence its position $x_h$ is simply described by the equation

$$m_h \ddot{x}_h(t) = f(t), \qquad (6)$$

where $m_h$ is the hammer mass. What is left is an equation which describes the interaction between the two objects. Hunt and Crossley (Hunt and Crossley 1975) proposed a model for the contact force between two colliding objects, under the hypothesis that the contact surface is small:

$$f(x, \dot{x}) = \begin{cases} -k x^{\alpha} - \lambda x^{\alpha} \dot{x}, & x > 0, \\ 0, & x \le 0, \end{cases} \qquad (7)$$

where $k$ and $\lambda$ weight the elastic and dissipative terms, and the value of the exponent $\alpha$ depends only on the local geometry around the contact surface. The variable $x$ stands for the hammer compression, i.e. the difference between the resonator displacement and the hammer position. Therefore, when $x > 0$ the two objects are in contact. Marhefka and Orin (Marhefka and Orin 1999) have used this model for describing contact with the environment in dynamic simulations of robotic systems, and have shown that it provides realistic results. Figure 1 shows the compression/force characteristics for a hammer hitting a hard surface with various impact velocities. Note that hysteresis occurs, i.e. the paths during loading and unloading are different. This effect comes from the presence of the dissipative term in Eq. (7).

2.2 Properties

When the system is discretized, the modal resonator appears as a parallel filter bank of n second-order resonant lowpass filters, each accounting for one specific mode of the resonator. The filter parameters (center frequency and quality factor) are accessed through the physical quantities $r_j$, $k_j$ described above. Due to the non-linear nature of the contact force, computational problems occur in the numerical hammer-resonator system. These can be handled by computing the contact force iteratively at each time step (Avanzini and Rocchesso 2001a; Avanzini and Rocchesso 2001b).

The sound model has been tested in previous studies in order to assess its ability to convey perceptually relevant information to a listener. A study on materials (Avanzini and Rocchesso 2001a) has shown that the decay time is the most salient cue for material perception. This is very much in accordance with results by Klatzky et al. (Klatzky, Pai, and Krotov 2000); however, the physical model used here is advantageous over a signal-based sound model as in (Klatzky, Pai, and Krotov 2000), in that more realistic attack transients are obtained. A study on hammer hardness (Avanzini and Rocchesso 2001b) has shown that the contact time (i.e. the time after which the hammer separates from the resonator) can be controlled using the physical parameters. Specifically, the ratio $m_h/k$ is found to be the most relevant parameter in controlling the contact time.

Figure 2: Sound spectra obtained when hitting a resonator with a soft mallet (low $m_h/k$) and with a hard hammer (high $m_h/k$). Horizontal axis: frequency [Hz].

Figure 2 shows an example of soft and hard impacts, obtained by varying $m_h/k$. Due to the physical description of the contact force, realistic effects can be obtained from the model by properly adjusting the physical parameters. Figure 3(a) shows an example output from the model, in which the impact occurs when the resonator is already oscillating: the interaction, and consequently the contact force profile, differs from the case when the resonator is not in motion before the collision. This effect cannot be simulated using pre-stored contact force profiles as in (van den Doel, Kry, and Pai 2001). Figure 3(b) shows an example of a "hard collision", obtained by giving a very high value to $k$ in Eq. (7) while every other parameter of the model keeps the same value as in Fig. 3(a).
It can be noticed that several micro-collisions take place during a single impact. This is qualitatively in accordance with the remarks by van den Doel et al. about hard collisions (van den Doel, Kry, and Pai 2001).

Gesture-based control models can be designed for the sound synthesis algorithm, where the impact velocity of the hammer is used as the main control parameter. In a recent work, a virtual percussion instrument has been designed, where the sound model is based on the hammer-resonator system described above and the control model is implemented through a gestural interface (Marshall, Rath, and Moynihan 2002). The subtle sound nuances achieved by the model provide a rich timbral palette that can be very useful in sonification and auditory display. Its responsiveness to user gestures makes it especially suitable for feedback sounds, where the audio information should give confirmation about the extent and quality of the action performed.
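The behavior of the contact force of Eq. (7), and in particular the loading/unloading hysteresis of Fig. 1, can be reproduced with a minimal numerical sketch in which a lumped hammer is dropped onto a rigid, motionless surface. This is our illustration, not the paper's solver: the actual implementation couples the hammer to the modal resonator and computes the contact force iteratively at each sample, whereas here simple semi-implicit Euler stepping and illustrative parameter values are used, with the sign convention chosen so that the force is repulsive in this one-sided setup.

import numpy as np

def contact_force(x, xdot, k=1e8, lam=3e7, alpha=1.5):
    # Non-linear contact force of Eq. (7) for compression x > 0 (zero otherwise).
    # The dissipative term lam * x**alpha * xdot makes the loading and unloading
    # paths differ, producing the hysteresis of Fig. 1. Values of k, lam, alpha
    # are illustrative assumptions.
    if x <= 0.0:
        return 0.0
    return k * x ** alpha + lam * x ** alpha * xdot

def drop_hammer(m_h=0.01, v0=-1.0, fs=44100, n_steps=200):
    # A lumped hammer (Eq. 6) hits a rigid, motionless surface at height 0 with
    # downward velocity v0. Compression is x = -pos while the hammer penetrates.
    dt = 1.0 / fs
    pos, vel = 1e-4, v0
    history = []
    for _ in range(n_steps):
        x, xdot = -pos, -vel
        f = contact_force(x, xdot)
        vel += dt * f / m_h          # m_h * a = f, gravity neglected
        pos += dt * vel
        history.append((x, f))       # compression/force pairs, cf. Fig. 1
    return history

loading_unloading = drop_hammer()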

Figure 3: Numerical simulations; (a) impact on an oscillating resonator; (b) micro-impacts in a hard collision. Intersections between the solid and the dashed lines denote the start/release of contact. Horizontal axis: time [s].

Figure 4: A circular membrane displaced from its rest position along the axes of mode(1,1) (left) and mode(1,2) (right). The frequencies of vibration along these axes are 1.593 and 2.917 times that of mode(0,1) (the "fundamental").

3 POSITION-DEPENDENT INTERACTION

Figure 4 shows a membrane in states of displacement from its rest position along isolated modal axes. The distance of each point of the membrane from the "rest plane" is here proportional to the weighting factor of the mode at that position. Note that the section lines of the mode shape with the rest plane stay fixed through the whole movement along this modal axis, since the weighting factors at these positions are obviously 0. Correspondingly, an external force applied at these node lines does not excite the mode at all.

Determination of position-dependent weighting factors

There are several possible approaches to obtaining the position-dependent weights. In the case of a finite one-dimensional system of point masses with linear interaction forces, modal parameters are found exactly through standard matrix calculations. Most systems of interest, of course, do not fit these assumptions. In some cases the differential equations of distributed systems can be solved analytically, giving the modal parameters; this holds for several symmetrical problems such as circular or rectangular membranes. Alternatively, either accurate numerical simulations (e.g. waveguide mesh methods) or "real" physical measurements can be used. Impulse responses computed (or recorded) at different points then form a basis for the extraction of modal parameters. The acoustic "robustness" of the modal description allows convincing approximations on the basis of microphone-recorded signals of, e.g., an object struck at different points, despite all the involved inaccuracies: spatially distributed interaction, as well as wave propagation through air, provide signals that are quite far from impulse/frequency responses at single points.

The following considerations illustrate why such estimations are possible. Equations (1) and (2) correspond to the frequency response of a resonant lowpass filter. The transfer function connected to one or a pair of spatial points of the system is a weighted sum of these responses with position-dependent factors. Even in non-ideal recording conditions, the prominent modes can be identified from peaks in the response. The level of peak j reflects the position-dependent weight, while its width is related to the decay time $t_j$. Decay times can, though, be extracted more easily from STFT values at different temporal points. The clear perceptual character of these parameters finally allows "tuning by ear", which in many situations is the final judging instance.
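A rough sketch of such an estimation procedure is given next. It is our illustration of the ideas just described (peak picking on a magnitude spectrum for modal frequencies and relative weights, a line fit on STFT magnitudes for decay times), not the toolchain used for the model; it assumes NumPy and SciPy, and all names are illustrative.

import numpy as np
from scipy.signal import find_peaks, stft

def estimate_modes(response, fs, n_modes=8):
    # Pick the most prominent peaks of the magnitude spectrum of a recorded
    # impact response: peak frequencies approximate the modal frequencies,
    # peak levels reflect the position-dependent weights (up to a common gain).
    spectrum = np.abs(np.fft.rfft(response))
    freqs = np.fft.rfftfreq(len(response), 1.0 / fs)
    peaks, props = find_peaks(spectrum, height=0.0)
    top = peaks[np.argsort(props["peak_heights"])[-n_modes:]]
    return freqs[top], spectrum[top]

def estimate_decay_time(response, fs, freq_hz):
    # Estimate the 1/e decay time t_j of one mode from the slope (in dB/s) of the
    # log-magnitude of its STFT bin: the envelope behaves as exp(-t / t_j).
    f, t, Z = stft(response, fs=fs, nperseg=2048)
    bin_idx = int(np.argmin(np.abs(f - freq_hz)))
    env_db = 20.0 * np.log10(np.abs(Z[bin_idx]) + 1e-12)
    slope = np.polyfit(t, env_db, 1)[0]
    return -20.0 / (slope * np.log(10.0)) if slope < 0 else float("inf")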
Qualitative observations on modal shapes, exemplified in Figure 4, can be effectively used in a context of cartoonification: for modes of higher frequencies the number of nodes increases, and their spatial distance accordingly decreases.

1. One consequence is that for higher modes even small inaccuracies in the interaction or pickup position may result in strongly different weighting factors, so that an element of randomization can add "naturalness" here.

2. For interaction positions close to a boundary, which is a common node for all modes, the lowest modes gradually disappear and the higher modes (with smaller "regions of weight") gain in relative importance. This phenomenon can be well noticed on a drum: if the membrane is struck close to the rim, the excited sound gets "sharper", as the energy distribution in the frequency spectrum is shifted upwards ("rimshots").

For a clamped bar, higher partials are dominant near the fixed end, whereas lower frequencies are stronger for strokes close to the freely vibrating boundary (noticeable in the sound adjustments of electromechanical pianos). Similar considerations apply to points of symmetry: some resonant modes, namely those with modal shapes antisymmetric with respect to the central axes, are not present at the center of a round or square membrane. They consequently disappear "bottom-up" when approaching the center point.

4 IMPLEMENTATION AND USE

The model has been implemented as a module for the real-time sound processing software PD (http://www.pure-data.org/doc/). Controls are handled with the temporal precision of an audio buffer length; typical values for the PD buffer size are 32 or 64 samples, which provide temporal accuracies of about 0.75 ms and 1.5 ms, respectively. The iterative algorithm used for solving the non-linear interaction has been observed to exhibit a high speed of convergence: the number of iterations at each time step is never higher than four. As a consequence, the module runs in real time on low-cost platforms. The quality of the audio generated by the model has been assessed through both informal evaluations and formal listening tests (Avanzini and Rocchesso 2001a): in general, the impact sounds are perceived as realistic. The control on the impact location provides convincing results.

The impact model is being used as the kernel of a variety of sound models. It has been shown that certain sounds are very effective in conveying information about continuous processes. For instance, the sound of a vessel being filled is quite a precise display of the level of the liquid (Vicario 2001). Similarly, sliding, rolling, scraping, and crumpling are all phenomena that are easily recognized by ear, thus being perfect candidates for auditory display. If properly cartoonified, they can be used as dynamic auditory icons that give information about events and processes. Just by temporal organization and dynamic control of micro-impacts, we have been developing sound models for a variety of these phenomena. Real-time demonstrations will be given at the conference. Again, the veridical control that we can exert on the contact model, and the possibility of manipulating the control variables in real time, give the sound designer much more flexibility. In the end, the sound cartoons of complex processes such as scraping or rolling turn out to be more engaging and pleasant compared to the results of models based on samples (van den Doel, Kry, and Pai 2001).
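As a toy illustration of this idea of temporal organization (ours, and not one of the sound models mentioned above), the following sketch generates a schedule of micro-impact times and velocities that could drive the impact model of Sec. 2; the stochastic choices and parameter values are arbitrary assumptions.

import numpy as np

def micro_impact_schedule(duration_s=2.0, mean_rate_hz=40.0, seed=0):
    # A crude temporal cartoon: micro-impact times drawn from a Poisson-like
    # point process, with impact velocities shaped by a decaying envelope, as a
    # toy stand-in for a rolling or crumpling texture. Each (time, velocity)
    # pair would trigger one instance of the impact model of Sec. 2.
    rng = np.random.default_rng(seed)
    times, t = [], 0.0
    while t < duration_s:
        t += rng.exponential(1.0 / mean_rate_hz)
        if t < duration_s:
            times.append(t)
    times = np.array(times)
    velocities = np.exp(-times / duration_s) * rng.uniform(0.2, 1.0, size=times.shape)
    return list(zip(times, velocities))

events = micro_impact_schedule()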
5 Acknowledgments

This work has been supported by the European Commission under contract IST-2000-25287 (project "SOb - the Sounding Object": www.soundobject.org).

References

Adrien, J. M. (1991). The Missing Link: Modal Synthesis. In G. De Poli, A. Piccialli, and C. Roads (Eds.), Representations of Musical Signals, pp. 269-297. MIT Press.

Avanzini, F. and D. Rocchesso (2001a, Sept.). Controlling Material Properties in Physical Models of Sounding Objects. In Proc. Int. Computer Music Conf. (ICMC'01), La Habana. Available at http://www.soundobject.org.

Avanzini, F. and D. Rocchesso (2001b, Dec.). Modeling Collision Sounds: Non-linear Contact Force. In Proc. COST-G6 Conf. Digital Audio Effects (DAFX-01), Limerick, pp. 61-66. Available at http://www.soundobject.org.

Begault, D. R. (1994). 3-D Sound for Virtual Reality and Multimedia. Academic Press.

Brewster, S. A. and M. G. Crease (1999). Correcting Menu Usability Problems with Sound. Behaviour and Information Technology 18(3), 165-177.

Buxton, W. (1990). Using our Ears: an Introduction to the Use of Nonspeech Audio Cues. In E. J. Farrel (Ed.), Extracting Meaning from Complex Data: Processing, Display, Interaction, pp. 124-127. Proceedings of SPIE, Vol. 1259.

Gaver, W. W. (1993, Apr.). How Do We Hear in the World? Explorations in Ecological Acoustics. Ecological Psychology 5(4), 285-313.

Hunt, K. H. and F. R. E. Crossley (1975, June). Coefficient of Restitution Interpreted as Damping in Vibroimpact. ASME J. Applied Mech., 440-445.

Klatzky, R. L., D. K. Pai, and E. P. Krotov (2000, Aug.). Perception of Material from Contact Sounds. Presence 9(4), 399-410.

Marhefka, D. W. and D. E. Orin (1999, Nov.). A Compliant Contact Model with Nonlinear Damping for Simulation of Robotic Systems. IEEE Trans. Systems, Man and Cybernetics-Part A 29(6), 566-572.

Marshall, M., M. Rath, and B. Moynihan (2002, May). The Virtual Bodhran - The Vodhran. In Int. Workshop on New Interfaces for Musical Expression, Dublin.

O'Brien, J. F., P. R. Cook, and G. Essl (2001, Aug.). Synthesizing Sounds from Physically Based Motion. In Proc. ACM Siggraph 2001, Los Angeles.

van den Doel, K., P. G. Kry, and D. K. Pai (2001, Aug.). FoleyAutomatic: Physically-based Sound Effects for Interactive Simulation and Animation. In Proc. ACM Siggraph 2001, Los Angeles.

Vicario, G. B. (2001). Phenomenology of Sound Events. In D. Rocchesso (Ed.), Deliverable of the Sounding Object Project Consortium. Available at http://www.soundobject.org.