ICMC Proceedings 1999

An Approach to Sound Morphing based on Physical Modeling

Takafumi Hikichi, Naotoshi Osaka
Media Information Laboratory, NTT Communication Science Laboratories
3-1, Morinosato Wakamiya, Atsugi, Kanagawa 243-0198, Japan
{hikichi,osaka}@brl.ntt.co.jp
http://www.brl.ntt.co.jp/people/{hikichi,osaka}/

Abstract

Our goal is to develop sound synthesis technology that allows users to synthesize arbitrary sound timbre, including musical instrument sounds, natural sounds, and their interpolation/extrapolation on demand. For this purpose, we investigate sound interpolation based on physical modeling. We propose a morphing framework based on physical modeling that can produce intermediate timbre between different musical instrumental sounds. Piano and guitar sounds are simulated using a physical model, and smooth timbre change, i.e. interpolation, is investigated. The results suggest the possibility of developing a morphing system using a physical model.

1 Introduction

Physical modeling synthesis is becoming one of the most promising methods for simulating musical instrument sounds. Since the artificial instrument can have the same control parameters as the real one, users can control its timbre more intuitively than with other, more abstract methods. Many artificial instruments have recently been developed [1] [2] [3]. Research on flexible model structures and cost-effective algorithms for real-time implementation has been reported, and there are now various synthesis systems with expressive control.

Controlling the timbres of different musical instrument sounds, and the timbres between them, using physical models has, however, not been attempted so far. Such techniques, called timbre morphing, have been implemented by signal-based methods. Several researchers, including one of the authors, have discussed morphing or sound interpolation techniques [4] [5]. In the field of speech synthesis, speech morphing has been investigated [6].
Our goal is to develop sound synthesis technology that allows users to synthesize arbitrary sound timbre, including musical instrument sounds, natural sounds, and their interpolation/extrapolation on demand. We intend to utilize the controllability of physical models in order to apply a sound morphing system to various sounds. The morphing techniques include 1) extracting the model parameters from the original signals, 2) modifying the parameters, and 3) synthesizing the signals. Finding a proper physical model from acoustic signals is important and difficult. However, we do not treat this problem here. As a first step toward morphing, we are primarily concerned with smooth timbre control using a physical model.

2 Basic concept

We propose a morphing framework based on physical modeling that can produce intermediate timbre between different musical instrumental sounds. This approach has not only the merits of physical-model-based methods, but also generality in the sense that morphing between different instrumental sounds can be achieved.

Here, interpolation of sound sources is considered. For two sound sources having the same production mechanism, sound interpolation can be achieved by simply interpolating the physical parameters that differ. For two sources having different production mechanisms, an integrated model that includes the different models is considered. According to this approach, all intermediate sounds have a production mechanism, and as a result, sounds of natural and homogeneous quality are expected. Furthermore, the number of parameters needed for synthesis is generally smaller than that of signal-based methods. The major disadvantages are that a model or algorithm must be built first and that the model limits the timbre range that can be produced.

The purpose of this report is to introduce a synthesis algorithm that achieves smooth and gradual timbre conversion from one timbre to another, i.e. timbre interpolation.
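The parameter interpolation described above for two sources sharing one production mechanism can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the parameter names (`tension`, `b3`), their values, and the helper functions are hypothetical. The log-scale variant is included because damping coefficients are later interpolated linearly on a log scale.

```python
import math


def lerp(a, b, r):
    """Linear interpolation: r = 0 gives a, r = 1 gives b."""
    return (1.0 - r) * a + r * b


def log_lerp(a, b, r):
    """Interpolate linearly on a log scale (both values must be positive)."""
    return math.exp(lerp(math.log(a), math.log(b), r))


def interpolate_params(src, dst, r, log_keys=()):
    """Blend two parameter sets that share the same keys.

    Keys listed in log_keys are interpolated on a log scale,
    all other keys are interpolated linearly.
    """
    return {
        key: (log_lerp if key in log_keys else lerp)(src[key], dst[key], r)
        for key in src
    }


# Hypothetical parameter sets; the values are placeholders, not measured data.
tone_a = {"tension": 600.0, "b3": 1e-9}
tone_b = {"tension": 800.0, "b3": 1e-7}
halfway = interpolate_params(tone_a, tone_b, 0.5, log_keys=("b3",))
```

With an interpolation rate of 0.5, the linearly interpolated value lies at the arithmetic mean of the two inputs, while the log-interpolated value lies at their geometric mean.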
Interpolation has two aspects: one is "structural" and the other is "characteristic." Piano and guitar sounds are selected as targets to be represented by a unified physical model, and the interpolation between them is investigated. Both have a similar mechanism: the strings are excited by an object, and the vibration of the strings propagates to a resonator and radiates into the air. The key idea for interpolation is that, by properly adjusting the parameters, two different timbres can be synthesized by one model,
and that smooth transition from one timbre to the other may be possible.

3 Physical model

This section describes the physical model used in this experiment. For the purpose of synthesis, well-known cost-effective algorithms tuned for real-time processing are available. However, in order to clarify the relationship between physical parameters and synthesized tones, we use the classical method based on numerical solutions of differential equations, and assume a simple model composed of an exciter, a vibrator, and a resonator connected in series (Fig. 1).

Figure 1: A block diagram of the physical model (exciter, vibrator, and resonator connected in series).

3.1 A vibrator model

The present model describes the transverse motion of a one-dimensional vibrator with damping which is hit/plucked by a nonlinear hammer/plectrum. This model is based on that reported in [7] for the simulation of the vibration of a piano string struck by a hammer. The vibrations are governed by the following equation:

$$\frac{\partial^2 y}{\partial t^2} = \frac{T}{\rho}\frac{\partial^2 y}{\partial x^2} - \frac{\kappa^2 E S}{\rho}\frac{\partial^4 y}{\partial x^4} - 2 b_1 \frac{\partial y}{\partial t} + 2 b_3 \frac{\partial^3 y}{\partial t^3} + f(x, x_0, t), \qquad (1)$$

where $y$ is string displacement, $\rho$ is line density, $T$ is tension, $E$ is Young's modulus, $\kappa$ is the radius of gyration, $S$ is the cross-sectional area, $b_1$ and $b_3$ are damping coefficients, $f(x, x_0, t)$ is force density, and $x_0$ is the distance of the hammer from one end of the string. The two partial derivatives of odd order with respect to time simulate a frequency-dependent decay rate of the form $d(\omega) = b_1 + b_3 \omega^2$, where $\omega$ denotes angular frequency. The string is assumed to be hinged at one end, and to be connected to a resonator at the other end. The force density term $f(x, x_0, t)$ represents the excitation by a hammer, a plectrum, or fingers. This term is limited in time and distributed over a certain width.

3.2 Excitation model

The excitation model for striking and plucking is described here. According to Ref. [7], the motion of the piano hammer and the collision of the hammer with the string are described as

$$M_H \frac{d^2 \eta}{dt^2} = -F_H(t), \qquad (2)$$

$$F_H(t) = \begin{cases} K\,|\eta(t) - y(x_0, t)|^p, & \eta(t) \ge y(x_0, t), \\ 0, & \eta(t) < y(x_0, t), \end{cases} \qquad (3)$$

where $\eta$ is hammer displacement, $F_H(t)$ is hammer force, and $M_H$ is hammer mass. Coefficients $K$ and $p$ are determined experimentally. The relationship between $f(x, x_0, t)$ and $F_H(t)$ is

$$f(x, x_0, t) = \frac{F_H(t)\, g(x, x_0)}{\rho \int_{x_0 - \delta x}^{x_0 + \delta x} g(x, x_0)\, dx}, \qquad (4)$$

where $2\delta x$ is hammer width and $g(x, x_0)$ is force distribution along the string. This is the struck-string model used in Ref. [7].

On the other hand, the most primitive model for plucking is described by specifying the initial shape of a string. Recently, more elaborate physical models of the plucking process have been reported (e.g. [8]). They are based on a mass-spring representation. Since they were developed independently of the hammer-string interaction model, the struck-string model and the plucked-string model have little that can be shared. Here, we explain how to extend the struck-string model to the plucked-string model.

The most significant difference between striking and plucking appears at the end of the excitation force signal applied to a string. When a string is struck, the string is pushed downward by a hammer, as shown in Fig. 2(a). The compressive force given by Eq. (3) is exerted on both the string and the hammer, as depicted by "contact and compression." Then, the string pushes back the hammer, and when the distance between the string and the hammer becomes zero, the hammer releases from the string naturally, as shown by "natural release" in Fig. 2(a). When a string is plucked, on the other hand, the finger or plectrum is pulled off the string suddenly and the excitation force becomes zero at the end of the contact period, as depicted by "sudden release" in Fig. 2(b). Hence, a force duration time $t_f$ is introduced into the conventional model.
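The role of the force duration time can be illustrated with a short sketch. The function below evaluates the power-law compression force of Eq. (3) and, as an added condition, sets the force to zero once the time reaches a cutoff `t_f`. The function name, argument names, and numeric values are illustrative assumptions only, and the sketch covers the force law alone, not the full string simulation.

```python
def excitation_force(eta, y0, t, K, p, t_f=float("inf")):
    """Excitation force at time t.

    eta : exciter (hammer/plectrum) displacement
    y0  : string displacement at the contact point x0
    K, p: stiffness coefficient and exponent of the power law
    t_f : force duration time; the default (infinity) leaves only the
          natural release, i.e. the plain struck-string law of Eq. (3)
    """
    if t >= t_f:
        # Sudden release: contact is cut off regardless of compression.
        return 0.0
    compression = eta - y0
    if compression < 0.0:
        # Natural release: exciter and string are no longer in contact.
        return 0.0
    return K * compression ** p


# Placeholder values chosen for easy checking, not physical measurements.
struck = excitation_force(2.0, 1.0, t=0.5, K=3.0, p=2.0)            # no cutoff
plucked = excitation_force(2.0, 1.0, t=0.5, K=3.0, p=2.0, t_f=0.4)  # cut off at 0.4
```

Here `struck` keeps the compression force because no cutoff applies, whereas `plucked` is zero because the evaluation time lies beyond `t_f`. Varying `t_f` between these two regimes is what allows one force law to cover both excitation types.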
The proposed excitation model is expressed as

$$F_H(t) = \begin{cases} K\,|\eta(t) - y(x_0, t)|^p, & \eta(t) \ge y(x_0, t) \text{ and } t < t_f, \\ 0, & \text{otherwise.} \end{cases} \qquad (5)$$

When $t_f > t_{st}$, where $t_{st}$ denotes the force duration in the struck-string case, Eq. (5) reduces to the conventional struck-string model shown in Eq. (3). When $t_f$ is set so that $t_f < t_{st}$, the force signal drops to zero at $t = t_f$ like a step function. In the experiment reported later, the proposed model could
treat both cases and intermediate conditions continuously by controlling $t_f$ and other parameters. Frictional force, which may be exerted on the exciter and the string, is not considered in this model. An example of force signals for various interpolation rates $a$ is shown in Fig. 3; $a = 1.0$ represents striking. As $a$ is changed little by little, the force signal varies gradually.

Figure 2: Behavior of an exciter and a string in the (a) striking and (b) plucking motions. Panel labels: "contact and compression" and "natural release" in (a); "sudden release" in (b).

Figure 3: Force signals with various interpolation rates $a$ (horizontal axis: time [ms]).

3.3 A resonator model

The resonator model used here is a rectangular plate with supported boundaries. This is one of the simplest approximations of a piano soundboard. The soundboard is connected to one end of a string, or of several strings meeting at a point. The vibration of the plate connected with the string/strings is described as

$$\frac{\partial^2 z}{\partial t^2} = -\frac{\kappa^2 E}{\rho (1 - \nu^2)} \nabla^4 z - 2 b_1 \frac{\partial z}{\partial t} + 2 b_3 \frac{\partial^3 z}{\partial t^3} + F_s(t)\, \delta(x_1, y_1), \qquad (6)$$

where $z$ is plate displacement, $\rho$ is density, $\nu$ is Poisson's ratio, $E$ is Young's modulus, $\kappa$ is the radius of gyration, $h$ is thickness, $F_s(t)$ is the force exerted from the end of the string/strings on the plate, $\delta(x_1, y_1)$ is Kronecker's delta, and $(x_1, y_1)$ is the junction between the strings and the plate. The force $F_s(t)$ is derived by

$$F_s(t) = T \frac{\partial y}{\partial x}, \qquad (7)$$

evaluated at the end of the string that is connected to the plate.

3.4 Numerical solution

Equations (1), (2) and (4)-(7) can be discretized by using an explicit finite difference scheme, from which the recurrence equations are derived. The velocity signal at the junction between the string/strings and the plate, which corresponds to a bridge, is calculated and used as the synthesized sound.

4 Experiment

4.1 Parameter control

This section describes the strategies we used to implement timbre interpolation between simulated piano and guitar tones. Subjective similarity tests were conducted in order to evaluate the strategies.

After several trials, we selected parameters for simulating the piano and the guitar sounds. For the resonators, global characteristics of the frequency response were simulated. In order to implement smooth interpolation, the transition path was divided into three domains, and parameters were gradually changed in each domain.

1. From a piano to a struck-string tone
* The piano tone was simulated using three slightly detuned strings. The difference between strings in cents was gradually changed to 0. This decreases beats.
* Damping coefficient of the plate $b_1$ was gradually varied to a larger value. This avoids creating a sharp resonance between string partials and plate partials.

2. From a struck-string to a plucked-string tone
* The excitation condition was interpolated.
* The damping coefficients of the string and plate, $b_3$, were interpolated linearly on a log scale, while keeping damping coefficient of the plate $b_1$ constant.

3. From a plucked-string to a guitar tone
* The lowest mode frequency of the guitar body was simulated.

The numbers of samples selected from the three domains are as follows. Piano to struck-string used 3 tones. Plucked-string to guitar used only 1 tone, because the difference between a guitar tone and a plucked-string tone is very small. For intermediate
tones between struck and plucked, 6 sounds were synthesized at physically equal intervals, i.e. 0%, 20%, 40%, 60%, 80%, and 100%. From the above procedure, ten sounds were synthesized with their timbres gradually changing from piano to guitar.

4.2 Subjective evaluation

The subjective evaluation was performed using the synthesized tones. Ten subjects, aged 18 to 27, judged the timbral similarity of each pair of tones on a seven-point scale, where value 0 corresponds to "same" and 6 to "totally different." There were four trials for each pair, including both orders of the tones, and 180 trials in total. For each pair, a mean score of the judgement across subjects and repetitions was calculated. This is regarded as a subjective distance. A multidimensional scaling (MDS) technique was applied to the subjective distance data. A two-dimensional solution modeled the responses with a stress of 12.2%. Figure 4(a) displays the solution.

Figure 4: Perceptual timbre space of the ten synthesized tones. The two-dimensional space is generated by MDS. Indices 1 and 10 represent the piano and the guitar. (a) The spectral centroid is not considered (Kruskal's stress = 0.122). (b) The spectral centroid is considered (Kruskal's stress = 0.071).

The plot shows degeneration for indices 1-4 and 9-10. The path makes a curve, and index 6 is not located in between the two endpoints, although it has almost the same distances from them. This result does not satisfy intermediateness. This is because the subjective distances show saturation when physical distances are large.

4.3 Consideration of the spectral feature

Signal analyses of the synthesized tones suggest a relationship between the spectral centroid and one of the axes of the timbre space in Fig. 4(a). Therefore, string damping coefficient $b_3$ was adjusted so that the centroid of the tones was linearly interpolated. For these adjusted tones, similar subjective tests were performed, and the subjective distances were measured. The two-dimensional MDS solution calculated from the similarity data is shown in Fig. 4(b).

In Fig. 4(b), the plot shows degeneration for indices 1-3. Most of the stimuli are located near the straight line connecting the two endpoints, especially when stimulus 9 is excluded. Therefore, intermediateness is confirmed to a reasonable degree. When the centroid was taken into account, stimulus 9 moved too close to stimulus 10.

5 Conclusions

An approach to timbre interpolation using a physical model was described, and subjective tests were done using the synthesized tones. Parameters were adjusted using the spectral centroid as a reference, and as a result, smooth timbre change was achieved. Sound demos will be available in the presentation. Future work includes finding physical models from acoustic signals automatically and extending the timbre range. We will also apply these results to a morphing software system.

6 Acknowledgments

The authors would like to express their gratitude to Dr. Norihiro Hagita for his enthusiastic support.

References

[1] G. Borin, G. De Poli, and A. Sarti, "Algorithms and structures for synthesis using physical models," Computer Music J. 16, 30-42 (1992).
[2] J. O. Smith, "Physical modeling synthesis update," Computer Music J. 20, 44-56 (1996).
[3] O. Calvet, R. Laurens, and J. M. Adrien, "Modal synthesis: Compilation of mechanical substructures and acoustical sub-systems," Proc. Int. Computer Music Conference, 57-59 (1990).
[4] E. Tellman, L. Haken, and B. Holloway, "Timbre morphing of sounds with unequal numbers of features," J. Audio Eng. Soc. 43, 678-689 (1995).
[5] N. Osaka, "Timbre interpolation of sounds using a sinusoidal model," Proc. Int. Computer Music Conference, 408-411 (1995).
[6] M. Slaney, "Automatic audio morphing," Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 2, 1001-1004 (1996).
[7] A. Chaigne and A. Askenfelt, "Numerical simulations of piano strings. I. A physical model for a struck string using finite difference methods," J. Acoust. Soc. Am. 95, 1112-1118 (1994).
[8] G. Cuzzucoli and V. Lombardo, "Physical model of the plucking process in the classical guitar," Proc. Int. Computer Music Conference, 172-179 (1997).