Gesture Analysis Using 3D Acceleration Sensor for Music Control

SAWADA, Hideyuki; OHKURA, Shin'ya; HASHIMOTO, Shuji
Department of Applied Physics, School of Science and Engineering, Waseda University

Abstract

A musical instrument is a device that translates body movements into sound. Moreover, gesticulation is often employed in musical performance to express the performer's emotion. Many approaches have been proposed to analyze gesticulation or body movements by sensing the positions or trajectories of feature points using position sensors or image processing techniques. The most important emotional information in human gestures, however, seems to lie in the forces applied to the body. This paper proposes a gesture analysis algorithm based on the force patterns detected by a 3D acceleration sensor, and its applications to real-time musical performance controlled directly by human gesture.

1. Introduction

Gesture is widely used in musical performance. A conductor directs a performance by the movement of his baton, and conducting is a common nonverbal language used for music throughout the world. Beyond this, a musician plays a musical instrument, which can be regarded as a kind of equipment that translates body movements into sound. In traditional musical instruments, the relationship between the action and the generated sound is determined by the physical structure of the instrument.

The authors have so far paid attention to the emotional feelings expressed in music, and have developed a computer music system that can follow a human conductor (Morita et al.) and a virtual instrument that generates sound according to a player's gesticulation by using a data glove and image processing techniques (Sato et al.). In most previous musical performance systems (Katayose et al.) (Mathews) (Rubine and McAvinney) (Keane and Gross), including ours (Morita et al.) (Sato et al.), body movements were measured by the positions or trajectories of feature points using position sensors or image processing techniques. The most important emotional information in human gestures, however, seems to appear in the forces applied to the body. A system that responds to applied forces is therefore required for a more impressive musical performance.

In this paper we introduce a three-dimensional acceleration sensor for a real-time algorithm that analyzes the emotional aspects of gesture, and we propose a new musical performance system controlled by hand movements. Kinetic parameters of the hand movements are extracted from an acceleration vector sequence to recognize human gestures. The recognition results can be used for both performance control and timbre control in real time. In performance control, the gesticulation is used as a sort of nonverbal command to direct the progress of the performance, while in timbre control the kinetic parameters are associated with sound parameters using a neural network, making a personal instrument whose behavior the player defines privately in a learning phase.

2. Acceleration and Kinetic Parameters

2-1. 3D Acceleration Sensor

[Figure 1. 3D Acceleration Sensor: inner view showing the piezo-electric devices, metal weight and circuits, with the output fed to an A/D converter.]

The acceleration sensor used in the proposed system is small enough to be attached to any point of the human body, as illustrated in Figure 1. It is 20 mm long, 15 mm wide and 12 mm high.
It senses three-dimensional acceleration independently along each axis by means of piezo-electric devices whose output voltage changes according to the acceleration applied to the sensor. The sensitivity is about 5 mV per G over the range from -25 G to +25 G. The voltage signal is amplified and fed to a computer through an A/D converter as 12-bit binary data.
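As an illustration of how such readings might be handled, the following minimal sketch converts raw 12-bit A/D counts into acceleration values in G. The paper does not specify the amplifier gain or how the converter's input range is mapped; the sketch simply assumes that the full 12-bit scale spans the sensor's plus/minus 25 G range, so the scaling constant is an assumption rather than a figure from the original system.

```python
import numpy as np

FULL_SCALE_G = 25.0            # sensor range is +/-25 G (from the paper)
ADC_BITS = 12                  # 12-bit A/D converter (from the paper)
ADC_MID = 2 ** (ADC_BITS - 1)  # mid-scale count, assumed to correspond to 0 G

def counts_to_g(raw_counts):
    """Convert raw 12-bit A/D counts (0..4095) for the x, y and z channels
    into acceleration in G, assuming the full ADC span maps onto +/-25 G."""
    raw = np.asarray(raw_counts, dtype=float)
    return (raw - ADC_MID) / ADC_MID * FULL_SCALE_G

# Example: one sample of the three channels read from the converter
ax, ay, az = counts_to_g([2048, 2211, 1900])
```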

2-2. Extraction of Kinetic Parameters

The three-dimensional acceleration data a_x(t), a_y(t) and a_z(t), corresponding to the accelerations in the x, y and z directions at time t, are obtained independently in real time. To extract kinetic features we use three projection vectors defined as

    $A_1(t) = (a_y(t), a_z(t)), \quad A_2(t) = (a_z(t), a_x(t)), \quad A_3(t) = (a_x(t), a_y(t)).$    (1)

As shown in Figure 2, A_1(t), A_2(t) and A_3(t) represent the acceleration vectors on the y-z, z-x and x-y planes, respectively. Eleven kinetic parameters, listed in Table 1, are extracted from one sequence of each projected acceleration vector. We set the sampling rate to 25 Hz and used 30 successive acceleration vectors for gesture recognition.

[Figure 2. Acceleration Vectors]  [Figure 3. Membership Function]

Table 1: Kinetic Parameters
  Pd        Time differences of vectors
  Pg        Vector products among successive vectors
  Pr        Aspect ratio of the circumscribed rectangle
  Pa0-Pa7   Directional characteristics of the vectors

Pd represents the amount of fluctuation of the acceleration. The rotating direction of the hand movement is expressed by Pg, while Pr gives the directional information of the maximum vector. For the calculation of the Pa's, the membership function shown in Figure 3 is applied; the eight parameters correspond to the eight main directions of the vectors, spaced every pi/4 radians. For example, for a vector whose angular direction is slightly larger than pi/4, the values f5 and f6 shown in Figure 3 are assigned to the parameters Pa5 and Pa6, respectively, while the other Pa's are zero. Thus the Pa's represent the tendency of the movement direction. All the kinetic parameters are calculated in real time from the sequential vectors A_1(t), A_2(t) and A_3(t) independently and are referred to for gesture recognition.

3. Performance Control

3-1. Gesture Analysis

The system diagram for musical performance control is shown in Figure 4. The frequency and amplitude of the acceleration caused by the performer's hand movement determine the tempo and velocity of the MIDI music, respectively. Furthermore, the performer can direct the expression of the performance by gesticulation used as nonverbal commands. Real-time gesture recognition is executed by comparing the obtained kinetic parameters with standard data acquired in a preceding learning phase.

In the learning phase, the performer inputs each gesture to be recognized five times. The average $E^g_a$ and standard deviation $\sigma^g_a$ of each kinetic parameter are then calculated for each gesture g from the parameter values $v_h$ of the $l = 5$ trials, and stored as the standard pattern data:

    $E^g_a = \frac{1}{l} \sum_{h=1}^{l} v_h, \qquad (\sigma^g_a)^2 = \frac{1}{l} \sum_{h=1}^{l} (v_h - E^g_a)^2, \qquad a = P_d, P_g, P_r, P_a\text{'s}.$    (2)

The performer also defines the meanings of these gestures as performance commands such as start, stop, tempo change, volume change, staccato, tenuto, selection of instruments, and so on. In the recognition phase, the kinetic parameters $v_a$ are extracted and an evaluation value $e^g$ is calculated for each standard pattern as

    $e^g = \sum_{a} \left( \frac{v_a - E^g_a}{\sigma^g_a} \right)^2.$

The minimum $e^g$ is selected as a candidate, and if it is smaller than a predetermined threshold value, the result of the gesture recognition is confirmed as the gesture g, which then controls the performance according to its assigned meaning.
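To make the extraction of Section 2-2 concrete, the sketch below computes one plausible realization of the eleven kinetic parameters from a window of 30 projected acceleration vectors. The paper does not give the exact formulas behind Pd, Pg, Pr and the Pa's, so the particular expressions used here (summed difference magnitudes, summed cross products, bounding-box aspect ratio, and triangular membership functions centred every pi/4) are assumptions consistent with the descriptions above, not the authors' original code.

```python
import numpy as np

def kinetic_parameters(A):
    """Kinetic parameters of one projected acceleration sequence.

    A: array of shape (T, 2), e.g. A1(t) = (a_y(t), a_z(t)) over a window of
    T = 30 successive samples taken at 25 Hz.  Returns the eleven values
    (Pd, Pg, Pr, Pa0..Pa7) described in Table 1.
    """
    A = np.asarray(A, dtype=float)

    # Pd: amount of fluctuation, here the summed magnitude of the
    # time differences A(t+1) - A(t)
    Pd = np.linalg.norm(np.diff(A, axis=0), axis=1).sum()

    # Pg: rotating direction, here the summed cross products of
    # successive vectors (positive = anti-clockwise on this plane)
    Pg = np.sum(A[:-1, 0] * A[1:, 1] - A[:-1, 1] * A[1:, 0])

    # Pr: aspect ratio of the rectangle circumscribing the trajectory
    width = A[:, 0].max() - A[:, 0].min()
    height = A[:, 1].max() - A[:, 1].min()
    Pr = width / height if height > 0 else 0.0

    # Pa0..Pa7: directional characteristics via triangular membership
    # functions centred on the eight main directions (every pi/4)
    centres = np.arange(8) * np.pi / 4
    Pa = np.zeros(8)
    for v in A:
        mag = np.linalg.norm(v)
        if mag == 0.0:
            continue
        ang = np.arctan2(v[1], v[0]) % (2 * np.pi)
        # angular distance to each centre, wrapped to [-pi, pi]
        d = np.abs((ang - centres + np.pi) % (2 * np.pi) - np.pi)
        Pa += mag * np.clip(1.0 - d / (np.pi / 4), 0.0, None)

    return np.concatenate(([Pd, Pg, Pr], Pa))
```

Applied independently to A_1, A_2 and A_3, this yields three eleven-element parameter vectors per window, matching the per-plane treatment described above.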
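The learning and recognition steps of Section 3-1 amount to template matching with per-parameter normalization. The following sketch, which assumes the kinetic-parameter vectors produced above, stores the mean and standard deviation of the five training repetitions per gesture (eq. 2) and classifies a new window by the minimum evaluation value under a threshold; the threshold value itself is not given in the paper and is left to the user.

```python
import numpy as np

def learn_gesture(examples):
    """examples: list of kinetic-parameter vectors from the five training
    repetitions of one gesture.  Returns the standard pattern (E, sigma)."""
    V = np.stack(examples)
    return V.mean(axis=0), V.std(axis=0)

def recognize(v, patterns, threshold):
    """patterns: dict mapping gesture name -> (E, sigma).
    Returns the gesture with the minimum evaluation value e_g if it is
    below the threshold, otherwise None (no command is issued)."""
    best_g, best_e = None, np.inf
    for g, (E, sigma) in patterns.items():
        safe = np.where(sigma > 0, sigma, 1.0)      # guard against zero deviations
        e = np.sum(((v - E) / safe) ** 2)           # evaluation value e_g
        if e < best_e:
            best_g, best_e = g, e
    return best_g if best_e < threshold else None
```

A recognized gesture would then be looked up in the performer-defined command table (start, stop, tempo change, and so on) to drive the MIDI performance.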

3-2. Performance Control

Ten gestures are currently used in performance control: shaking up and down, horizontal stroke, diagonal stroke, single shake, star-shape stroke, cross stroke, triangle-shape stroke, heart-shape stroke, clockwise rotation, anti-clockwise rotation and pause. Using these gestures the performer can direct the performance in real time. Figure 5 shows typical acceleration trajectories recognized by the system.

[Figure 4. Diagram of Performance Control: acceleration data, kinetic parameter extraction, gesture recognition and score information leading to the performance control signal sent to the MIDI instrument.]

[Figure 5. Examples of Acceleration Trajectories: single shake, clockwise rotation, triangle shape.]

[Figure 6. Diagram of Timbre Control: acceleration data, kinetic parameter extraction and neural network producing sound parameters (envelope parameters, post-filter characteristics) for the sound source and effector.]

[Figure 7. Associative Neural Network: input vector X_i, weight matrix, output vector Y_i with averaging.]

4. Timbre Control

4-1. Associative Neural Network

Musical timbre is related to various factors such as spectra, attack time, decay time, sustain level and envelope shape (Hartono et al.). We use white noise as a sound source and control the sound parameters, which include the envelope parameters and post-filter characteristics, to modify the generated sound. An associative neural network (A-N.N.) is employed to map the kinetic parameters of the hand movement to the sound parameters, as shown in Figure 6. The structure of the A-N.N. is shown in Figure 7. We assign the kinetic parameters of the i-th gesture to the elements of the m-dimensional input vector $x_i = (x^i_1, \dots, x^i_m)^T$, and the corresponding sound parameters to the elements of the n-dimensional output vector $y_i = (y^i_1, \dots, y^i_n)^T$. To perform the quasi-orthogonalization of the vectors, the input and output vectors are transformed as shown below,

    $X_i = (X^i_1, \dots, X^i_{km})^T = F(x_i)$  (km elements for each pattern i),
    $Y_i = (Y^i_1, \dots, Y^i_{kn})^T = F(y_i)$  (kn elements for each pattern i).    (3)

F expresses two operations. The first one increases the number of elements by a factor of k,

    $X^i_{kp-j} = x^i_p, \quad Y^i_{kq-j} = y^i_q, \qquad p = 1, 2, \dots, m, \quad q = 1, 2, \dots, n, \quad j = 0, 1, \dots, k-1,$    (4)

and the second one is a randomization by element permutations. The associative information of the N pattern pairs is stored in the weight matrix $W = (w_{rs})$ as

    $W = \frac{1}{N} \sum_{i=1}^{N} Y_i X_i^T, \qquad r = 1, 2, \dots, kn, \quad s = 1, 2, \dots, km.$    (5)

The pattern $Y_o$ which corresponds to an input vector $X_o$ is then recalled as

    $Y_o = W X_o,$    (6)

and the final output $y_o$ is recovered as

    $y_o = (y^o_1, \dots, y^o_n)^T = G(Y_o),$    (7)

where G includes the derandomization and averaging manipulations that correspond to the reverse operation of F, as shown in Figure 7.

4-2. Timbre Control

Musical timbre is related to various sound parameters. Our emotional impressions of musical sound, such as "vivid", "heavy", "active" and "silent", are deeply affected by these parameters; at the same time, we often express such feelings with our gestures. We therefore tried to link these emotional gestures to the music by controlling the sound parameters through the A-N.N. In this research the eleven kinetic parameters mentioned above are assigned to the input vector elements of the A-N.N., and six sound parameters (attack, decay, sustain level, release, high-pass filter gain and low-pass filter gain) are selected as the output vector elements. The A-N.N. links the input and output vectors so that the network can produce various kinds of sound from gestures. The advantage of using the A-N.N. is that the system can produce not only the patterns used in the training phase, but also, through the generalization ability of the neural network, patterns that fall within the range of the trained sets.
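As an illustration of equations (3) through (7), here is a minimal sketch of the associative network with the eleven kinetic parameters as input and the six sound parameters as output. The value of k, the random seed for the permutations, the incremental form of the weight update and the demonstration values are assumptions made for the sketch; the paper does not specify them.

```python
import numpy as np

class AssociativeNN:
    """Associative network sketch following eqs. (3)-(7) of Section 4-1."""

    def __init__(self, m, n, k=8, seed=0):
        rng = np.random.default_rng(seed)
        self.k, self.m, self.n = k, m, n
        self.perm_x = rng.permutation(k * m)   # fixed permutation used by F on inputs
        self.perm_y = rng.permutation(k * n)   # fixed permutation used by F on outputs
        self.W = np.zeros((k * n, k * m))      # weight matrix of eq. (5)
        self.N = 0                             # number of stored pattern pairs

    def _F(self, v, perm):
        # eq. (4): duplicate each element k times, then randomize by permutation
        return np.repeat(np.asarray(v, dtype=float), self.k)[perm]

    def _G(self, Y):
        # reverse of F (eq. 7): undo the permutation, then average the k copies
        w = Y[np.argsort(self.perm_y)]
        return w.reshape(self.n, self.k).mean(axis=1)

    def store(self, x, y):
        # eq. (5): W is the average of the outer products Y_i X_i^T
        X, Y = self._F(x, self.perm_x), self._F(y, self.perm_y)
        self.W = (self.N * self.W + np.outer(Y, X)) / (self.N + 1)
        self.N += 1

    def recall(self, x):
        # eqs. (6)-(7): recall the transformed output and recover y_o
        return self._G(self.W @ self._F(x, self.perm_x))

# Example: associate kinetic parameters (11 values) with sound parameters
# (attack, decay, sustain level, release, HPF gain, LPF gain = 6 values)
ann = AssociativeNN(m=11, n=6)
ann.store(np.random.rand(11), np.array([0.1, 0.3, 0.7, 0.5, 0.2, 0.9]))  # training pair
sound = ann.recall(np.random.rand(11))                                    # recalled timbre
```

Because recall is a single matrix product followed by averaging, an input between two trained gesture patterns yields sound parameters between the corresponding trained timbres, which is the generalization behaviour described above.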
5. Conclusions

In this paper we proposed a new kind of musical performance system, together with a gesture analysis algorithm, using a 3D acceleration sensor to measure the force of gesticulation. Unlike conventional musical instruments, performers can arbitrarily define the relations between gestures and the sound to be generated, and can adapt the performance in real time. By attaching additional acceleration sensors to other parts of the body, such as the head and arms, the gesture analysis can easily be extended to make the system more flexible and sophisticated. The goal of this study is not only to provide a new type of musical system driven by emotional gesticulation, requiring no special knowledge of musical instruments, but also to demonstrate the possibility of a new man-machine interface using gesture that gives a system emotion.

References

Morita, Ohteru and Hashimoto, "Computer Music System that Follows a Human Conductor", IEEE Computer, Vol. 24, No. 7, pp. 45-53, 1991.
Sato, Harada, Hashimoto and Ohteru, "Singing and Playing in Musical Virtual Space", Proc. of the International Computer Music Conference, pp. 289-292, 1991.
Katayose, Kanamori, Kamei, Nagashima, Sato, Inokuchi and Simura, "Virtual Performer", Proc. of ICMC, pp. 138-145, 1993.
Mathews, "The Conductor Program and Mechanical Baton", Proc. of the International Symposium on Music and Information Science, pp. 58-70, 1989.
Rubine and McAvinney, "The Videoharp", Proc. of ICMC, pp. 49-55, 1988.
Keane and Gross, "The MIDI Baton", Proc. of ICMC, pp. 151-154, 1989.
Hartono, Asano, Inoue and Hashimoto, "Adaptive Timbre Control Using Gesture", Proc. of ICMC, pp. 151-158, 1994.