Real Time Control of 3D Sound Space by Gesture

Tsutomu Harada, Akio Sato, Shuji Hashimoto, Sadamu Ohteru
Department of Applied Physics, Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo, 169 JAPAN

ABSTRACT

In most interactive performance systems, the tone and tempo of the music output are controlled in real time to allow for artistic improvisation. However, the supposed spatial layout of the instruments is fixed during the performance. In the proposed system, we tried to control the position and movement of the sound sources in 3D space interactively, by the performer's natural gesture. The system consists of a DataGlove, a gesture recognition system, a MIDI controller, and three speaker pairs placed oppositely on the X, Y, and Z axes. A performer can use the proposed system to control the 3D sound interactively and thus integrate the improvised sound motion into his musical performance.

1. Introduction

Localizations and movements of sounds are often experienced in our daily life. If we could control sound movement in a music performance, the limits of musical expression would be extended enormously [1]. However, at present only a few audio professionals can realize this, by their highly advanced sound-mixing skills in electrically amplified music performance. For ten years, our laboratory has been engaged in the development of man-machine interfaces for computer music [2][3]. Recently we have developed a 3D sound system which can control the musical sound space by gesticulation with the DataGlove in real time. The system makes it possible to enjoy musical sound localization and its movement without the complex techniques of the sound mixing engineer. In this paper, we introduce our new system with experimental results.

2. System Implementation

The system consists of a MIDI instrument (E-mu Proteus), a MIDI sequencer (Roland), a DataGlove system (VPL), a personal computer (NEC PC-9801), and six speakers with amplifiers (fig. 1). The MIDI instrument has three pairs of stereo outputs. The DataGlove system measures the bending of the performer's fingers and the position of the hand wearing the glove, and sends these data to the computer in real time. The personal computer recognizes the performer's gesticulations, calculates the balance and amplitude for the three pairs of speakers, and then produces a sound image at the expected position. These processes are all managed in real time. For this system we use the algorithm of the gesture recognition system which was developed for 'A Computer Music System that Follows a Human Conductor' [2] and the 'Virtual Musical Instrument' [3]. The system also recognizes gestures signifying commands to adjust the parameters for system control and to select the instrumental sounds.

[fig. 1: System implementation]

3. Sound Localization and Movement

3.1 Sound Localization

First we attempted to control sound movement along a line between a pair of speakers. The imaginary position of a sound localized by a speaker pair is affected by a variety of factors, such as the characteristics of the speakers and MIDI instruments, the acoustic condition of the room, and the human auditory system. For example, the MIDI velocity parameter is not linear in the power of the speaker output, and the human auditory sense is not linear in the power of the sound. To realize localization and movement of a sound image with high fidelity, our system compensates for these effects with a compensating function (eq. 1).

    V = T(P)   (eq. 1)

    V: MIDI parameter VELOCITY
    P: MIDI parameter PAN

We can easily determine this function experimentally using the glove (fig. 2). The performer makes the function by simply drawing a curve with a finger of the gloved hand, while listening to the sound and watching the drawn curve on the display. When he has finished making the function, he has only to give the 'end of edit' sign to the system (fig. 3).

[fig. 2: Making of the compensating function]
[fig. 3: Examples of control gestures]
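The paper stores T only as the curve the performer draws. As a minimal sketch of how such a curve could be kept and evaluated, the class below holds (pan, velocity) samples and interpolates linearly between them; the class name, the sample points, and the piecewise-linear representation are our assumptions, not details taken from the original system:

    import bisect

    class CompensatingFunction:
        """Piecewise-linear table for V = T(P) (eq. 1).

        P is the MIDI PAN value (0-127) and V the MIDI VELOCITY (0-127).
        The (pan, velocity) points stand in for samples of the curve the
        performer draws with the glove."""

        def __init__(self, points):
            self.points = sorted(points)
            self.pans = [p for p, _ in self.points]

        def __call__(self, pan):
            i = bisect.bisect_left(self.pans, pan)
            if i == 0:
                return self.points[0][1]       # clamp below the first sample
            if i == len(self.points):
                return self.points[-1][1]      # clamp above the last sample
            (p0, v0), (p1, v1) = self.points[i - 1], self.points[i]
            t = (pan - p0) / (p1 - p0)         # position between the samples
            return round(v0 + t * (v1 - v0))   # linear interpolation

    # Hypothetical curve: velocity is raised toward the middle of the pan
    # range to counter the perceived level dip between the two speakers.
    T = CompensatingFunction([(0, 100), (64, 127), (127, 100)])
    print(T(32))  # velocity to send for a pan position between two samples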

To realize sound image movement in 3-dimensional space, the system uses the compensating function and the following algorithm. The direction of the sound image, which is pointed out by the performer's first finger, is taken to be the direction of the center of gravity G of six point masses.

    G = (Σ_i m_i r_i) / (Σ_i m_i),  m_i = V_i   (eq. 2)

    r_i: position vector of the point mass m_i

The weight of each point mass m_i is taken to be the power V_i of the corresponding speaker's output (fig. 4). The distance between the performer and the sound image, whose position is indicated by the performer's finger, is also considered in order to realize 3-dimensional sound localization. The total power of the six speakers' output, represented by the total weight of the point masses, is calculated and compensated to adjust the distance using the function T (eq. 3).

    Σ_i m_i = Σ_i V_i = T(|G|)   (eq. 3)

The system inversely calculates the amplitude of each speaker from the position of the performer's finger in real time to localize and move the musical sound image. In our system, the spatial resolution of finger positioning in the performance space is 128 x 128 x 128, while the output resolution of sound localization is 32 x 32 x 32.

[fig. 4: The method using 'center of gravity']
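The paper states that eq. 2 and eq. 3 are inverted in real time but does not give the inversion scheme, which is underdetermined (six amplitudes but only four constraints). The sketch below shows one plausible inversion, assuming the six speakers sit at unit distance along the axes and sharing the power not needed to place the image evenly between the three speaker pairs; both choices, and the distance law in the example, are ours:

    def speaker_amplitudes(g, total_power):
        """One plausible inversion of eq. 2 and eq. 3.

        Finds six non-negative speaker powers whose center of gravity is
        the target image position g = (gx, gy, gz), with the speakers
        assumed at +x, -x, +y, -y, +z, -z."""
        gx, gy, gz = g
        # This placement only reaches targets inside the region
        # |gx| + |gy| + |gz| <= 1, so clamp the target to it.
        n1 = abs(gx) + abs(gy) + abs(gz)
        if n1 > 1.0:
            gx, gy, gz, n1 = gx / n1, gy / n1, gz / n1, 1.0
        spare = (1.0 - n1) / 3.0                   # power left after placing the image
        amps = []
        for c in (gx, gy, gz):
            pair = total_power * (abs(c) + spare)  # budget of this axis pair
            diff = total_power * c                 # V(+) - V(-) required by eq. 2
            amps.append((pair + diff) / 2.0)       # speaker on the positive side
            amps.append((pair - diff) / 2.0)       # speaker on the negative side
        return amps  # [x+, x-, y+, y-, z+, z-]; sums to total_power (eq. 3)

    # Example with a made-up distance law standing in for T(|G|):
    # nearer sound images receive more total power.
    g = (0.5, 0.25, 0.0)
    dist = (g[0]**2 + g[1]**2 + g[2]**2) ** 0.5
    amps = speaker_amplitudes(g, total_power=127.0 * (1.0 - 0.4 * dist))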

3.2 Selecting Instruments through Gestures

With this system, instruments can also be selected through an appropriate gesture. After learning the gestures for playing certain instruments through the glove, the system recognizes the player's gestures and selects each instrument, such as a piano, trumpet, saxophone, and so on, all of which are interchangeable in real time. The performance can be enjoyed according to one's musical preferences and can also be "experienced" through the feeling of playing these instruments.

3.3 Gesture Comprehension

At first, the user sets the limits of the left-hand position (up, down, right, left, front and back), velocity (vertical or horizontal), and finger flexing (grasping or stretching), so that the system suits the user's physique. Next, the system quantifies the movement data from the DataGlove relative to these limits to produce a movement-pattern vector S, which consists of the 14 components shown in Table 1. The discriminating functions RT_ij express the relationship between the movement-pattern vector S and the meaning vector C, which consists of the 10 components shown in Table 2. RT_ij transforms the more meaningful part of S_i into certain positive values, the less meaningful part into certain negative values, and the remaining part into 0 (fig. 5). The RT_ij for each S_i, which determine the meaning C_j, can be made up automatically from several samples of the same gesture. The meanings can be determined with the help of the discriminating functions RT_ij by the following formula.

    C_j = Σ_i RT_ij(S_i)   (eq. 4)

If the value of the meaning C_j is the maximum and is over a predetermined threshold value, the computer understands that the user is indicating the meaning C_j. If the meaning C_j expresses the value piano, for example, the system sends the message to generate a piano tone as a MIDI signal.

    Table 1. Movement-pattern vector S = (S0, S1, ..., S13)

    S0:  x position
    S1:  y position
    S2:  z position
    S3:  Vx velocity
    S4:  Vy velocity
    S5:  Vz velocity
    S6:  yaw
    S7:  pitch
    S8:  roll
    S9:  thumb flex
    S10: index flex
    S11: middle flex
    S12: ring flex
    S13: pinkie flex

    Table 2. Meaning vector C = (C0, C1, ..., C9)

    C0: piano
    C1: guitar
    C2: choir
    C3: violin
    C4: saxophone
    C5: trumpet
    C6: orchestra
    C7: the first part
    C8: the second part
    C9: the third part

[fig. 5: Examples of discriminating functions RT_00(S0) and RT_10(S1)]
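Eq. 4 and the threshold test translate directly into code. In the sketch below, each RT_ij is represented by a simple range function; the paper only says that RT_ij is built automatically from gesture samples and maps the meaningful, neutral, and less meaningful parts of a component's range to positive, zero, and negative values, so the band representation, the function names, and the unit scores are illustrative assumptions:

    def classify_gesture(s, rt, threshold):
        """Evaluate eq. 4 and pick the indicated meaning, if any.

        s  : movement-pattern vector (S0 .. S13) from the DataGlove
        rt : rt[j][i] is the discriminating function RT_ij for meaning
             C_j and component S_i
        Returns the winning meaning index j, or None when no score
        clears the threshold."""
        scores = [sum(rt_ij(si) for rt_ij, si in zip(row, s)) for row in rt]
        best = max(range(len(scores)), key=scores.__getitem__)
        return best if scores[best] > threshold else None

    def band(lo, hi, margin):
        """Stand-in for one learned RT_ij: positive inside the value range
        seen in the samples of a gesture, zero in a margin around it, and
        negative beyond, matching the sign behaviour shown in fig. 5."""
        def rt(x):
            if lo <= x <= hi:
                return 1.0
            if lo - margin <= x <= hi + margin:
                return 0.0
            return -1.0
        return rt

    # rt[j] would hold 14 such functions for meaning C_j, fitted
    # automatically from several samples of the same gesture.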

4. Performance

The performer can control the sound movement very easily by pointing in the direction in which he wants to locate the sound image. At any time in the performance he can select instruments by means of a playing gesture. The system also displays a simple animated picture of the selected instrument moving in the room. All these processes are managed within 100 ms. We enjoyed Bach's Invention no. 1 with this system, changing instruments and moving their positions improvisationally. If the performer wants to modify the compensating function, he has only to indicate so by gesture, and the system enters the editing mode to modify the table. The ease of controlling the system lets him perform freely and creatively.

5. Conclusion

With this system we can control a 3-dimensional virtual musical sound space by natural gesture: the localization and movement of the sound image and the selection of instruments, in real time. Furthermore, the system can control a video disc recorder to display the performance scene of each instrument to improve the virtual reality. Musical performers and composers have long wished to control their performance space, and now they can create music pieces which also consider movements of sound along with the progress of the music. This system can also be applied to staging plays with a strong sense of sound presence. For example, if the performers wear position detectors, the system produces movements of sound as they move. Real-time sound space control will play a more important part in musical performance in the near future, overcoming the present limitations of instrument positioning and the corresponding sound movement.

6. References

[1] Bosi, M., "An Interactive Real-Time System for the Control of Sound Localization," ICMC Glasgow 1990 Proceedings, pp. 112-114.
[2] Morita, H., Hashimoto, S., Ohteru, S., "A Computer Music System that Follows a Human Conductor," Computer, IEEE, 1991, pp. 44-53.
[3] Sato, A., Harada, T., Hashimoto, S., Ohteru, S., "Singing & Playing in Musical Virtual Space," ICMC 1991 Montreal Proceedings, pp. 289-292.