Acceleration Sensor as an Input Device for Musical Environment

Hideyuki Sawada*, Naoyuki Onoe and Shuji Hashimoto
Department of Applied Physics, School of Science and Engineering, WASEDA University
* Research Fellow of the Japan Society for the Promotion of Science
sawa@shalab.phys.waseda.ac.jp  shuji@shalab.phys.waseda.ac.jp

ABSTRACT
This paper introduces two experimental musical performance systems, both of which use the acceleration sensors proposed at the last ICMC together with a Dataglove to understand human movements. The first is a score-based conducting system that can recognize a conductor's performance directions and emotional expressions in real time. The second is a gesture-to-sound conversion system which allows an individual performer or composer to define personal musical instruments and sound environments.

1. Introduction
Gesture plays an important role in our daily life as a nonverbal medium for emotional human communication. Music, likewise, is one of the most important nonverbal communication channels. It is therefore natural that gestures are often employed as an essential part of musical performance to express the performer's emotion. Conducting is a common nonverbal language used globally for music direction. If a computerized performance system can understand the conductor's gestures, such a system will be most desirable [Morita et al., 91; Sato et al., 91]. Moreover, musical instruments can be considered devices that translate body movements into sound. Although in traditional musical instruments the relationship between the body action and the generated sound is determined by the physical structure of the instrument, we can obtain a flexible "super" instrument by defining the relation arbitrarily.

We have so far paid attention to the emotional feelings presented in music, and have been trying to construct a computational model with emotion by extracting essential expressions from musical performance and gestures. Although most of the reported works introducing body movement into musical performance treat the shape or the position of the body, the most important emotional information in human gestures seems to appear in the forces applied to the body. Force can be measured directly as the acceleration of body movements, as we reported at the previous ICMC [Sawada et al., 95].

In this paper we introduce three-dimensional acceleration sensors as gesture input devices and propose a new musical environment driven by gestures. A Dataglove is also employed for the acquisition of hand shapes, contributing to the realtime extraction of geometric parameters of body movements. The resulting environment consists of two different musical performance systems. One is a score-based performance system controlled by a human conductor: a performer is able to direct a performance with a baton in his right hand and gesticulation of his left hand. The other is a gesture-to-sound direct conversion system for a musical environment controlled by gesture.

2. Gesture Acquisition and Kinetic Parameters

2-1. Acceleration Sensor
Although some devices to sense body acceleration have been reported [Kanamori et al., 95], the device used in the present study is small enough to be attached to any point of the human body [Sawada et al., 95]. It is 20 mm long, 15 mm wide and 12 mm high. It senses three-dimensional acceleration independently by means of piezo-electric devices whose output voltage changes according to the acceleration applied to the sensor.
The sensitivity is about 5 mV per 1 G in the range between -25 G and +25 G. Since the acceleration caused by human gestures easily exceeds 20 G in usual hand movements, this sensor is considered suitable for gesture recognition. After passing through a 3 Hz high-pass filter to prevent the voltage signal from drifting, the acceleration data are amplified and fed to the computer through an A/D converter as 12-bit binary data. The three-dimensional acceleration data ax(t), ay(t) and az(t), corresponding to the accelerations in the x, y and z directions at time t, are obtained independently in realtime. To extract kinetic features from the sequential acceleration data, three projection vectors are defined as

    A1(t) = ( ay(t), az(t) ),  A2(t) = ( az(t), ax(t) ),  A3(t) = ( ax(t), ay(t) )        (1)

where A1(t), A2(t) and A3(t) represent the acceleration vectors on the y-z plane, z-x plane and x-y plane, respectively. A succession of fifteen to thirty data sets, corresponding to the duration of one gesture, is used for gesture recognition.
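The following is a minimal sketch of how the raw sensor readings might be handled, assuming 12-bit A/D samples are mapped linearly onto the ±25 G range of the sensor; the constants, function names and dummy samples are illustrative, not taken from the paper.

```python
import numpy as np

ADC_MAX = 4095   # full scale of a 12-bit A/D converter
G_RANGE = 25.0   # sensor range of +/-25 G (per the text)

def adc_to_g(raw):
    """Map a raw 12-bit sample (0..4095) to acceleration in G (assumed linear)."""
    return (raw / ADC_MAX * 2.0 - 1.0) * G_RANGE

def projection_vectors(ax, ay, az):
    """Build the three projection vectors of Eq. (1) for one time step:
    A1 lies in the y-z plane, A2 in the z-x plane, A3 in the x-y plane."""
    A1 = np.array([ay, az])
    A2 = np.array([az, ax])
    A3 = np.array([ax, ay])
    return A1, A2, A3

# One gesture corresponds to a short window of consecutive samples
# (fifteen to thirty vectors); dummy raw samples are used here.
window = [projection_vectors(adc_to_g(x), adc_to_g(y), adc_to_g(z))
          for x, y, z in [(2048, 2300, 1900), (2100, 2500, 1800)]]
```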

Eleven kinetic parameters, shown in Table 1, are extracted in realtime from each sequence of projection vectors for gesture recognition. These kinetic parameters were selected not only to satisfy sufficient conditions for discriminating the gestures to be recognized, but also to allow realtime processing.

Table 1  Kinetic Parameters from Acceleration Data
  Pd          Change of force, given as time differences of vectors
  Pg          Rotating direction, given as vector products among vectors
  Pr          Directional characteristics, given as aspect ratio of the circumscribed rectangle
  Pa0 - Pa7   Characteristics of the directional distribution of vectors

All the kinetic parameters are calculated in realtime from the sequential vectors A1(t), A2(t) and A3(t) independently and are referred to for gesture recognition.

2-2. Dataglove
Although the acceleration sensors are useful for detecting dynamic movements, the static shape of the hand sometimes contributes to the meaning of a gesture. Therefore a Dataglove is employed in this system, allowing realtime analysis of the three-dimensional motion of the hand and of finger geometry. With five bend sensors distributed along the five fingers of the glove, the device outputs values roughly proportional to the finger joint angles. Furthermore, a magnetic position sensor fixed on the back of the wrist gives the three-dimensional position x, y, z and the orientation as azimuth, elevation and roll. We defined seventeen characteristic parameters R0 - R16 for the analysis of hand figures and movements; Table 2 shows their meanings.

Table 2  Characteristic Parameters from Dataglove
  R0 - R4     Joint angles of the five fingers
  R5 - R7     Three-dimensional position of the hand
  R8 - R10    Mean position changes over the latest five samples
  R11 - R13   Mean absolute position changes over the latest five samples
  R14 - R16   Three orientations

First, limiting values of the parameters are set so that the system suits the performer's physique. The performer is required to input the limiting positions (up, down, left, right, front and back), velocities (vertical and horizontal) and finger flexions (grasping and stretching). During performances, the movement data from the glove are normalized by these limiting values and transformed into the characteristic parameters.

3. Gesture Recognition
In an actual performance, the gesticulation given by the conductor's left hand is regarded not only as musical performance directions but also as commands for system control. The gesture recognition uses both the kinetic parameters P obtained from the acceleration vectors and the characteristic parameters R from the Dataglove. Realtime recognition is made by comparison with standard data acquired in a gesture-learning phase, which makes the system suitable for individual users.

In the learning phase, a performer inputs each gesture to be recognized M times. Then the average E_a^g and the standard deviation μ_a^g of each kinetic and characteristic parameter a are calculated for every gesture g as shown below, and stored as standard pattern data in a private gesture database:

    E_a^g = (1/M) Σ_{i=1..M} V_{a,i}                          (2)
    (μ_a^g)^2 = (1/M) Σ_{i=1..M} ( V_{a,i} - E_a^g )^2        (3)

where V denotes the parameter values and a ranges over the P's and R's.

In the recognition phase, the parameter values V'_a are extracted and the normalized distance e^g is calculated for each standard pattern as

    e^g = Σ_a ( V'_a - E_a^g )^2 / (μ_a^g)^2                  (4)

Then the gesture with the minimum e^g is selected as a candidate. The result of the recognition is confirmed only if this minimum distance is smaller than a predetermined threshold Th, given by the smallest distance between any two standard patterns:

    Th = min_{i,j} Σ_a ( E_a^{g_i} - E_a^{g_j} )^2 / (μ_a^{g_j})^2,   for all i ≠ j,  g_i, g_j: gestures   (5)
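As a concrete illustration of Eqs. (2)-(5), the sketch below implements the learning phase and the normalized-distance matching in a few lines; the small epsilon guarding against zero deviation and the dictionary-based gesture database are implementation conveniences, not part of the paper.

```python
import numpy as np

def learn_gesture(samples):
    """Learning phase (Eqs. 2-3): samples is an M x N array, one row of
    kinetic/characteristic parameters per repetition of the gesture."""
    samples = np.asarray(samples, dtype=float)
    E = samples.mean(axis=0)            # per-parameter average E_a^g
    mu = samples.std(axis=0) + 1e-9     # per-parameter deviation mu_a^g (epsilon avoids /0)
    return E, mu

def normalized_distance(v, E, mu):
    """Eq. (4): distance of an observed parameter vector v from one standard pattern."""
    return float(np.sum(((np.asarray(v, dtype=float) - E) / mu) ** 2))

def recognize(v, database, threshold):
    """Pick the closest standard pattern; confirm it only if the distance is
    below the predetermined threshold Th of Eq. (5)."""
    scores = {name: normalized_distance(v, E, mu) for name, (E, mu) in database.items()}
    best = min(scores, key=scores.get)
    return best if scores[best] < threshold else None

# Usage: the database maps gesture names to (E, mu) learned from M repetitions.
# db = {"crescendo": learn_gesture(cresc_samples), "vibrato": learn_gesture(vib_samples)}
# gesture = recognize(current_params, db, threshold)
```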
4. Musical Environment with Acceleration Sensors and Dataglove

4-1. Performance System Controlled by a Human Conductor
This system realizes a conventional musical performance directed by a human conductor. By sharing common musical knowledge with the conductor, the system reacts to the conductor's gesticulation in realtime. Figure 1 shows the system configuration together with a schematic diagram.

Figure 1  Performance System

With acceleration sensor 1 fixed on the right hand, a baton-movement-analysis unit calculates the tempo and volume indicated by the performer from the frequency and amplitude of the acceleration, respectively. A conductor gives performance directions while facing the performers, so tempo and volume information is extracted from the sequential acceleration vectors A1(t) in the y-z plane. Beat points are detected in realtime according to the magnitude and phase patterns given by

    |A1(t)| = √( ay(t)^2 + az(t)^2 ),   arg A1(t) = tan^{-1}( az(t) / ay(t) )        (6)

We compared the proposed acceleration method with measurement from baton trajectories using image processing, as employed in our former research [Morita et al., 91]. For position detection of the baton in the image frames, we attached an LED marker next to the acceleration sensor and captured an image every 1/30 second. Figure 2 shows the vertical position change of the LED marker in the images together with the acceleration data. The image-processing method introduces a delay of approximately 0.1 to 0.2 seconds, and it also requires the performer to keep the baton within the camera's field of view. Furthermore, it limits the time resolution to 30 Hz, whereas the acceleration sensor allows users to select the data acquisition interval simply by setting the A/D conversion frequency. Consequently, the acceleration sensor not only simplifies the system for detecting the conductor's hand movements, but also realizes more natural realtime tempo tracking.

Figure 2  Tempo Detection by the Two Methods

A gesture-comprehension unit, on the other hand, obtains musical expressions from acceleration sensor 2 and the Dataglove on the left hand. The performer needs to register in advance, as standard data, the gestures he may use in the performance, such as crescendo (moving a palm upward, for example), vibrato (vibrating a hand), pianissimo (placing a straight index finger on the mouth) and instrument selection (pointing in an instrument's direction). These gestures are associated with performance expression commands and stored in the system manager.

During the performance, while observing the tempo indicated by the human conductor, the system manager predicts the future tempo according to a tempo prediction model that considers the mutual interaction between the human and the machine. At the same time the system obtains the musical expressions of the left hand. The system manager integrates the outputs from these two units with the help of its musical knowledge, and creates performance expression commands for the MIDI instrument, which are sent to the performance-generation unit. This unit plans the temporal schedule of the performance in synchronization with the baton movement.
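A rough sketch of the beat detection of Eq. (6) is given below: beats are taken as local maxima of |A1(t)| above a magnitude threshold, tempo is derived from the inter-beat interval and volume from the peak amplitude. The thresholding and peak-picking strategy is an assumption for illustration, not the authors' actual pattern-matching rule.

```python
import numpy as np

def beat_track(ay, az, fs, mag_threshold=5.0):
    """Estimate beat times, tempo and a volume cue from acceleration in the
    conducting (y-z) plane. ay, az are samples in G; fs is the A/D rate in Hz."""
    mag = np.hypot(np.asarray(ay, dtype=float), np.asarray(az, dtype=float))  # |A1(t)|
    phase = np.arctan2(az, ay)                                                # arg A1(t)
    beats = []
    for i in range(1, len(mag) - 1):
        # local maximum of the magnitude above the threshold -> beat point
        if mag[i] > mag_threshold and mag[i] >= mag[i - 1] and mag[i] > mag[i + 1]:
            beats.append(i / fs)
    tempo = 60.0 / float(np.mean(np.diff(beats))) if len(beats) > 1 else None  # BPM
    volume = float(np.max(mag)) if len(mag) else 0.0                           # amplitude cue
    return beats, tempo, volume, phase
```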
4-2. Musical Environment Controlled by Gesture
Two levels of sound perception are considered in the construction of the musical environment system: a fundamental level and an expression level. The fundamental level deals with basic sound factors such as volume and localization. These factors have close relations with the sound perceptivity we acquire as we grow up; for example, we expect a big sound to be produced when we move with large acceleration, and we expect a sound to be generated at the position we point to with a finger. In the expression level, on the other hand, parameters of the performer's movements are associated with sound parameters such as timbre, pitch, duration, velocity, localization and reverb.
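The sketch below illustrates one way such an expression-level association could be realized: normalized movement parameters are turned into MIDI control changes. The controller numbers, the mapping table and the send_midi() stub are hypothetical placeholders; the paper does not specify the actual assignment or the MIDI driver used.

```python
def send_midi(status, data1, data2):
    """Placeholder for the system's MIDI output (whatever driver is in use)."""
    print(f"MIDI {status:02X} {data1:02X} {data2:02X}")

# Illustrative mapping: one movement parameter (kinetic or characteristic,
# normalized to 0..1) per MIDI controller.
EXPRESSION_MAP = {
    "hand_height":  7,    # -> channel volume
    "hand_pan":    10,    # -> pan / localization
    "finger_bend": 91,    # -> reverb send
}

def apply_expression(params, channel=0):
    """Convert normalized movement parameters (0..1) into control-change messages."""
    for name, controller in EXPRESSION_MAP.items():
        value = int(max(0.0, min(1.0, params.get(name, 0.0))) * 127)
        send_midi(0xB0 | channel, controller, value)

# apply_expression({"hand_height": 0.8, "hand_pan": 0.5, "finger_bend": 0.2})
```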

The system consists of two acceleration sensors, a Dataglove, a MIDI instrument and a pair of speakers, as shown in Figure 3. A performer wears the Dataglove with the position sensor on the left hand, and fixes the two acceleration sensors to any points on his body. During a performance the sensor-integration unit continuously receives sequential data from the sensors, which are transformed into kinetic and characteristic parameters of the performer's movements. According to request signals from the system manager, the unit sends the parameters to the kinetics-comprehension unit, which has two functions. One is to recognize gesticulation used as control commands, such as system start, stop, mode selection (tuning or performance) and command selection in the tuning mode; this realizes a new man-machine interface based on gesticulation. The other is to extract, in the performance phase, the parameters that are directly associated with sound parameters for use in the expression level.

Figure 3  Musical Environment

In the tuning mode, the performer defines the relations between gestures of the left hand and control commands, so that he can direct the performance with his own gestures. Furthermore, he assigns kinetic and characteristic information extracted from body movements to the sound parameters selected in the expression level. Sound factors assigned in the fundamental level can also be associated with kinetic information and controlled with gesticulation.

In the performance, the performer first selects, with control commands, the initial sounds and their locations in the imaginary space. He can then create his favorite sound environment using gestures and make an improvisational soundscape in realtime.

5. Conclusions
In this paper we proposed two musical environment systems, together with a gesture analysis algorithm, using two 3D acceleration sensors to measure the force of gesticulation and a Dataglove to sense the position and shape of the hand. The proposed systems were tested by both amateur and professional performers and obtained better results compared with our former systems. Strictly speaking, force measurement using the acceleration sensor is an indirect method and produces inevitable errors, because the actual acceleration is determined not only by the applied force but also by other physical factors such as the viscosity and mass of the body parts. BioMuse [Tanaka, 93], which measures EMG, may be a more direct force-sensing system, although it needs special apparatus. The use of a datasuit might make the system more powerful. We consider, however, that small and easily wearable devices are more promising for musical applications. We are now working to improve the wireless sensing systems so as to allow users to act freely for more flexible performances.

Acknowledgments
This work was supported in part by Grant-in-Aid for Scientific Research No. 07244221 from the Ministry of Education, Science and Culture, Japan.

References
[Harada et al., 93] T. Harada, A. Sato, S. Hashimoto and S. Ohteru, "Real Time Control of 3D Sound Space by Gesture", Proc. ICMC, pp. 85-88, 1993
[Kanamori et al., 95] T. Kanamori, H. Katayose, Y. Aono, S. Inokuchi and T. Sakaguchi, "Sensor Integration for Interactive Digital Art", Proc. ICMC, pp. 265-268, 1995
[Morita et al., 91] H. Morita, S. Hashimoto and S. Ohteru, "A Computer Music System that Follows a Human Conductor", IEEE Computer, Vol. 24, No. 7, pp. 45-53, 1991
[Sato et al., 91] A. Sato, T. Harada, S. Hashimoto and S. Ohteru, "Singing and Playing in Musical Virtual Space", Proc. ICMC, pp. 289-292, 1991
[Sawada et al., 95] H. Sawada, S. Ohkura and S. Hashimoto, "Gesture Analysis Using 3D Acceleration Sensor for Music Control", Proc. ICMC, pp. 257-260, 1995
[Tanaka, 93] A. Tanaka, "Musical Technical Issue in Using Interactive Instrument Technology with Application to the BioMuse", Proc. ICMC, pp. 124-126, 1993