Sounds in Hands - A Sound Modifier Using Datagloves and Twiddle Interface -

Hideyuki SAWADA*, Naoyuki ONOE and Shuji HASHIMOTO
Department of Applied Physics, School of Science and Engineering, WASEDA University
* Research Fellow of the Japan Society for the Promotion of Science
{ sawa, onoe, shuji}

Abstract

Although various ways of sound production have been actively studied and reported, the development of human-machine interfaces for translating emotional feelings into sounds remains insufficient. This paper presents two sound generation systems together with direct manipulation devices that measure twiddling gestures. One employs a pair of datagloves for gesture acquisition, and the other adopts a specially designed device that measures the grasping forces caused by the user's twiddling manipulations. The approach provides not only a new sound generation technique for the music scene but also a flexible man-machine interface.

1. Introduction

Not only in the music scene but also in human communication, sounds play an important role. Sounds as compositional pieces of music are media that directly affect our emotional feelings. On the other hand, we communicate with each other using vocal sounds produced by our vocal organs. At the primary level of human communication, vocal sounds seem to have been used as communication media together with body gesticulations in non-verbal ways. Such sounds evolved to a higher level as words and languages.

The principal characteristics of a sound are determined by its fundamental frequency and spectrum envelope. The former is a characteristic of the source sound generated by a vibrating object, and the latter is shaped by the work of a resonator. In human vocalization, for example, the vibration of the vocal cords generates a source sound, and the sound wave is then led into the vocal tract, which works as a filter to determine the spectrum envelope [1][2].
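The source-filter separation described above can be sketched in a few lines. The following Python fragment is our own illustration, not part of the paper's system; all function names and constants are assumptions. It generates an impulse-train source whose fundamental frequency is fixed, then shapes the spectrum envelope with a two-pole resonator, so that pitch and timbre are controlled independently.

```python
import math

def impulse_train(f0, sr, n):
    """Source: unit impulses every sr/f0 samples (a crude glottal-pulse stand-in)."""
    period = int(sr / f0)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(x, fc, bw, sr):
    """Filter: two-pole resonator imposing a spectral peak near fc Hz."""
    r = math.exp(-math.pi * bw / sr)                 # pole radius from bandwidth
    a1 = -2.0 * r * math.cos(2 * math.pi * fc / sr)  # feedback coefficients
    a2 = r * r
    y, y1, y2 = [], 0.0, 0.0
    for s in x:
        out = s - a1 * y1 - a2 * y2   # y[t] = x[t] + 2r cos(w0) y[t-1] - r^2 y[t-2]
        y.append(out)
        y1, y2 = out, y1
    return y

src = impulse_train(f0=110.0, sr=8000, n=800)        # pitch set by the source
out = resonator(src, fc=700.0, bw=100.0, sr=8000)    # timbre set by the filter
```

Changing `f0` alone shifts the perceived pitch; changing `fc` alone moves the spectral peak, mirroring the vocal-cord/vocal-tract division of labor described in the text.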
On the other hand, most conventional acoustic musical instruments have fixed resonator constructions, which characterize their own timbres. Various ways of sound production have been found and actively studied. Algorithmic syntheses have taken the place of analogue circuit syntheses and have become popular techniques [3][4]. Granular synthesis and physical-model-based synthesis are typical techniques, in addition to conventional FM synthesis and sampling sound sources, and are expected to provide sound producers and musicians with new kinds of sounds and sound materials [5]. Several problems, however, still remain, such as the small variety of produced sounds, difficulties in handling sound parameters, and calculation times that cause slow response. A flexible sound generation system that is free from such problems and complicated procedures is needed to generate rich sounds in realtime to express Kansei (emotional) feelings for musical performance and human communication [6].

We are studying sound generation techniques together with the development of man-machine interface devices. This paper introduces two experimental sound generation systems, in both of which sounds are generated in realtime by the direct manipulation of interface devices. One employs a pair of datagloves for gesture acquisition [7], where an algorithmic sound synthesis using an arbitrary source sound and a virtual resonator is implemented. The other adopts a specially designed device that measures the grasping forces caused by the user's twiddling manipulations. A MIDI interface is provided in this system to exploit the variety of commercially available sound sources.

2. Sounds, Music and Gestures

Since sounds and music are both logical and mathematically well-founded, they are suitable for the application of computational technologies. At the same time, they have the great aspect of being seen as "arts" evaluated by human emotion.
The audience evaluates musical sounds according to their emotions, and sound producers and performers produce sounds with their emotional senses, such as noncommittal expressions and gesticulations. Especially in musical performance, a conductor directs the performance by his baton movement, and a musician plays a musical instrument, which can be regarded as a kind of equipment for translating body movements into sound. An impromptu performance is also an effective way to express the performer's emotions and musical expressions directly. We have so far paid attention to the

relationship between musical sounds and emotional gesticulations [8][9][10], and have constructed sound generation systems using twiddling gestures.

3. Sound Modification by Datagloves

3-1. System Configuration

The manipulation of the fundamental frequency and envelope characteristics contributes to the impression of the synthesized sound. Taking these characteristics into account, we developed a new sound modification system manipulated by a pair of datagloves in realtime. The system accepts an arbitrary source sound to be modified in a virtual resonator implemented in a computer.

The system is equipped with a microphone, a pair of datagloves, a speaker, an effector and a computer to realize realtime sound synthesis and modification. The system configuration is shown in figure 1. A source sound input through the A/D port is led to the resonator, which works as a filter to determine the spectrum envelope. The modified sound from the resonator is sent out from the D/A port and taken into the effector. Effect functions such as reverb, phaser, compressor and filter gain are manipulated by MIDI signals.

Figure 1 System Configuration of Sound Generation Using Datagloves (Source Sound -> Virtual Resonator -> Effector -> Speaker, with Effect Parameters)

3-2. Manipulation Device of Virtual Resonator

A pair of datagloves is employed in this system, which allows the realtime analysis of finger geometry to define the form of the virtual resonator. By distributing five bend sensors along the five finger parts of each glove, the device outputs values almost proportional to the finger joint angles. The geometries formed by the ten finger angles correspond to the 10 cross-sectional areas of the virtual resonance tube, which are used for the calculation of the resonance filter.

3-3. Construction of Virtual Resonator

The resonance filter is realized by the 10th-order PARCOR synthesis algorithm. The PARCOR coefficients {kn} correspond to the 10 cross-sectional areas {Sn} (n = 1-10) of a resonance tube defined by the geometries of the datagloves as shown below:

    k_n = (S_{n+1} - S_n) / (S_{n+1} + S_n)

The PARCOR filter is realized by the recursive calculations shown below:

    A_m(D) = A_{m-1}(D) + k_m B_{m-1}(D)       (1)
    B_m(D) = D( B_{m-1}(D) + k_m A_{m-1}(D) )  (2)
    A_0(D) = 1,  B_0(D) = D                    (3)
        (m = 1, 2, ..., 10)

Figure 2 Schematic Diagram of Virtual Resonator ({Sn}: Cross-Sectional Areas of Virtual Resonator, {kn}: PARCOR Coefficients)

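The area-to-coefficient conversion and the lattice recursion of Eqs. (1)-(3) can be sketched in a few lines of Python. This is our own illustration, not the paper's implementation: the termination area `s_term` is an assumption (the paper does not say how the tube is terminated beyond the 10th section), and the class realizes the analysis (inverse-filter) form A_m(D) of the recursion; the virtual resonator itself would apply the all-pole inverse 1/A(D).

```python
def areas_to_parcor(areas, s_term=1.0):
    """PARCOR coefficients from adjacent cross-sectional areas S_n.
    s_term is an assumed termination area appended after the last section."""
    S = list(areas) + [s_term]
    return [(S[n + 1] - S[n]) / (S[n + 1] + S[n]) for n in range(len(areas))]

class ParcorLattice:
    """Sample-by-sample lattice realizing A_m(D), B_m(D) of Eqs. (1)-(3)."""
    def __init__(self, k):
        self.k = list(k)
        self.x1 = 0.0                   # x(t-1), since B_0(D) = D
        self.state = [0.0] * len(k)     # delayed bracket of Eq. (2), one per stage

    def step(self, x):
        f, b = x, self.x1               # f_0(t) = x(t), b_0(t) = x(t-1)
        self.x1 = x
        for m, km in enumerate(self.k):
            f_new = f + km * b          # Eq. (1): forward prediction
            b_new = self.state[m]       # D(...) : value stored last sample
            self.state[m] = b + km * f  # bracket of Eq. (2), delayed next step
            f, b = f_new, b_new
        return f                        # f_10(t): inverse-filter output
```

With all k_m = 0 the lattice is transparent, and a single stage with k_1 = k reproduces A_1(D) = 1 + kD, which is a quick way to check the recursion against the equations above.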
Here A_m(D) and B_m(D) are the forward and backward prediction operators, respectively, and D^i denotes the i-th delay operator, defined as

    D^i x_t = x_{t-i}.                         (4)

The schematic diagram of the virtual resonator is shown in figure 2. The system accepts various kinds of input sound, including instrumental sounds, natural sounds and the human voice, to be arranged in different flavors. A waveform example of an experimental input sound and the output sounds is shown in figure 3. The output sounds 1 and 2 in figure 3(b) are generated from the virtual resonators that have the PARCOR coefficients 1 and 2 in figure 3(a), respectively. The system thus provides a new type of sound synthesis and modification driven by emotional twiddling gestures.

Figure 3 Examples of (a) PARCOR Coefficients and (b) Output Sound Waves

4. Sound Generation by GraspMIDI

4-1. Configuration of P-Unit

MIDI plays one of the most important roles in controlling sound sources in the computer music world. Currently, the handling devices for MIDI signals are imitations of conventional musical instruments such as a piano-type keyboard, a guitar, a trumpet and so on. Although the human-machine interfaces of conventional instruments are constrained by the physical structures of their sound generation mechanisms, the interfaces of electronic sound generation systems can be designed much more freely [11]. Well-designed devices would be suitable for performers to play with their feelings, or KANSEI. We present a new MIDI instrument, called GraspMIDI, which can be directly controlled by the performer's grasping force.

The sensory part of GraspMIDI is shown in figure 4. The force-sensing device, the P-Unit, is equipped with a pressure sensor and a small balloon. The pressure sensor (FPM-50PG) has a size of 14 x 17.5 x 12 [mm] and a weight of 1.2 [g], measures pressures from -1 to 3.515 [kg/cm2] almost linearly, and outputs 60 to 140 [mV]. The diameter of the balloon is about 30 [mm], and its initial internal pressure is 1.05 [kg/cm2]. The P-Unit measures the internal pressure of the balloon by an electric bridge circuit consisting of four resistors, as shown in figure 5. The unit is small enough that several of them can be held together in one hand.

Figure 4 P-Unit: A Balloon Filled with Air Is Attached to the Cork of the Pressure Sensor
Figure 5 Bridge Circuit

4-2. Construction of GraspMIDI

As an instrument, we use four P-Units packed into a spherical silicon ball, as shown in figure 6. With this instrument, MIDI event messages are controlled by the performer's grasping forces in realtime. The sensor information of the four P-Units is taken into a computer through four A/D ports and is associated with MIDI events.

Figure 6 GraspMIDI: 4 P-Units Packed into a Spherical Silicon Ball

Two ways are prepared to associate the P-Unit information with the MIDI events, which are defined by the

performer in advance.

The first way is a direct mapping between each P-Unit's information and the MIDI events. For example, a stronger force applied to P-Unit A generates a higher tone, a stronger force applied to P-Unit B gives a larger pitch bend, and so on. This may not be an intuitive way for the performer to express his KANSEI, since each P-Unit has an individual meaning.

The second way realizes complex mappings between the four P-Units and the MIDI events, in which the P-Units are related to one another. For example, a stretching or squeezing action triggers pitch modulation, and a twisting action changes the pan position of the sound. A neural network can be employed to realize the mapping between grasping force patterns and MIDI signals, as shown in figure 7. The mapping system is a sort of force-to-sound translator that preserves the performer's KANSEI information. In the present experiments, we introduce a multi-layered neural network to associate the force vector patterns from the P-Units with the MIDI commands according to training sets given in the learning phase. With these mappings, the performer is able to express his KANSEI intuitively.

Figure 7 Force-to-Sound Translator (P-Units -> Neural Network -> MIDI Event Generation Units -> MIDI Outputs -> MIDI Instrument)

The GraspMIDI can be used as a musical instrument with MIDI sound sources, and also as an active sound filter with MIDI sound effectors. Furthermore, the GraspMIDI can be used not only for MIDI but also as a human-machine interface device for a computer.

5. Conclusions

The human-machine interface is very important in musical performance systems. It may often be a multimodal interface integrating a vision system, a sound system and a tactile system. In this paper we introduced two musical performance systems using a gesture acquisition technique and a new force measurement device. A performer is able to enjoy the produced sounds and music through his own gesticulation without any special knowledge of musical instruments. The proposed input devices, which acquire human grasping gestures, can be employed not only in sound manipulation and music production but also as a flexible human-machine interface driven by emotional gestures. We are planning to have the proposed systems tested by both amateur and professional performers. These systems can be combined to create a virtual musical space with high-level musical intelligence. In such an environment, a musician would be able to feel as if s/he were working with a variety of instruments and players that have excellent abilities and respond to the musician's emotional feelings. We are now working on wireless sensing systems that allow users to act freely for more flexible performance.

References
[1] H. Sawada and S. Hashimoto, "Adaptive Control of a Vocal Chord and Vocal Tract for Computerized Mechanical Singing Instruments", Proc. International Computer Music Conference, pp. 444-447, 1996
[2] J. L. Flanagan, "Speech Analysis Synthesis and Perception", Springer-Verlag, 1972
[3] X. Rodet and G. Bennett, "Synthesis of the Singing Voice", Current Directions in Computer Music Research, MIT Press, 1989
[4] Ph. Depalle, G. Garcia and X. Rodet, "A Virtual Castrato", Proc. ICMC, pp. 357-360, 1994
[5] J. O. Smith III, "Viewpoints on the History of Digital Synthesis", Proc. ICMC, pp. 1-10, 1991
[6] S. Ohteru and S. Hashimoto, "A New Approach to Music Through Vision", Understanding Music with AI, AAAI Press, pp. 402-410, 1992
[7] H. Sawada and S. Hashimoto, "A Study of Gesture Recognition as Human Interface", Proc. Interaction '97, IPSJ, pp. 25-32, 1997
[8] H. Morita, S. Hashimoto and S. Ohteru, "A Computer Music System that Follows a Human Conductor", IEEE Computer, Vol. 24, No. 7, pp. 45-53, 1991
[9] P. Hartono, K. Asano, W. Inoue and S. Hashimoto, "Adaptive Timbre Control Using Gesture", Proc. ICMC, pp. 151-158, 1994
[10] H. Sawada, N. Onoe and S. Hashimoto, "Acceleration Sensor as an Input Device for Musical Environment", Proc. ICMC, pp. 421-424, 1996
[11] H. Sawada and S. Hashimoto, "Gesture Recognition Using Acceleration Sensor and Its Application for Musical Performance Control", Trans. IEICE, Vol. J79-A, No. 2, 1996
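Returning to the GraspMIDI mapping of Section 4, the "first way" (direct mapping, one P-Unit per MIDI parameter) can be sketched concretely. This fragment is purely our own hypothetical illustration: the parameter assignments, value ranges and names are assumptions, not the paper's actual configuration.

```python
def clamp(v, lo, hi):
    """Limit a reading to the expected range."""
    return max(lo, min(hi, v))

def direct_mapping(pressures):
    """pressures: four readings normalized to 0..1 (P-Units A..D).
    Each reading independently drives one MIDI-style parameter."""
    a, b, c, d = (clamp(p, 0.0, 1.0) for p in pressures)
    return {
        "note":       36 + int(a * 48),                   # P-Unit A -> pitch (C2..C6)
        "pitch_bend": 8192 + int((b - 0.5) * 2 * 8191),   # P-Unit B -> bend (14-bit, centered)
        "velocity":   int(c * 127),                       # P-Unit C -> loudness
        "pan":        int(d * 127),                       # P-Unit D -> pan position
    }

msg = direct_mapping([0.5, 0.5, 1.0, 0.0])
```

The "second way" would replace this one-to-one table with a learned function of the whole four-element force vector, which is what the multi-layered neural network of figure 7 provides.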