Interactive Multimodal Mobile Robot for Musical Performance
Kenji Suzuki, Takeshi Ohashi, Shuji Hashimoto
Dept. of Applied Physics, Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
Tel. +81-3-5286-3233, Fax. +81-3-3202-7523
{kenji, take, shuji}

Abstract

This paper proposes an interactive multimodal artistic environment for communication between human and robot. The developed system is a kind of active aid for musical performance that gives users feedback for emotional activation in terms of sound, music, image and motion, using a mobile robot with audio-visual input and output. It is particularly effective for musical performances that involve motion, because the robot can move around with the performer while generating sound and music according to the performer's movement and the environmental sound and image. The system hardware includes microphones, speakers, a CCD camera, four strain-gauge sensors and a Macintosh G3 computer, all mounted on an omni-directional mobile platform. The software consists of sound/image synthesizers and analyzers and a behavior coordinator.

1. Introduction

The goal of our work is to explore paradigms of interaction between humans and robots in the framework of museum exhibitions, theatre and art installations. Many researchers have reported experimental results on interaction metaphors such as multimodal interaction systems. The authors have also reported a robotic agent with a computational model of artificial emotional states, capable of self-organization and adaptation [1]. Many studies on musical interaction between human and machine have likewise been proposed, for example [2]. However, although humans act with their bodies and gestures while making sounds, few works have addressed autonomous mobile robots for musical performance.
Based on the above considerations, we propose to equip a musical instrument with autonomous mobility for computer music performance in the real world. The wheeled mobile robot acts as an agent capable of communicating through several channels, including sound and music, visual expression and movement. The system software is based on an agent architecture for real-time multimodal interaction, and the user can easily connect the agents in the Opcode MAX GUI. The sound analysis module extracts the pitch and velocity of the input sound, including human voice and instrumental sound, every 100 ms. The image analysis module extracts the color composition and temporal structure of the input image, including the environmental scene and human gestures, at the same rate. Sound, image and robot motion are compiled from the composition patch by referring to the outputs of the behavior coordinator. The input data from the four strain-gauge sensors and from the sound and image analysis modules are mapped into stimuli that activate the internal process components of the agent. These components produce the robot behavior, including the displayed image and MIDI sounds, which can be influenced by the performer's movement and by environmental sounds and images. Communication data are exchanged through MIDI channels between the main controller and the other components. The system therefore works as a kind of reflector that creates an acoustic and visual space around the moving instrument. Moreover, the robot can display reactive motion according to the context of the performance, enabling collaborative human-robot performance on stage. The remainder of this paper first introduces the system architecture and hardware implementation, then describes the software configuration of the developed system, and finally presents discussion and conclusions.

2. System Architecture

The overall structure of the developed system is shown in Figure 1. The system consists of four components: the mobile robot, the motion interface, the main controller and the output devices. In this work, we use an omni-directional mobile platform as the robot [3]. In addition, a motion interface called the "plate" is mounted on it to receive external force information; the plate enables locomotion simply by weight shift and force application [4]. A CCD camera and two microphones are also mounted to capture environmental visual and auditory information. All of the above instruments, together with a Macintosh G3 computer and audio speakers, are carried on board to make the mobile robot autonomous. We have constructed useful modules for the motion devices on the Opcode MAX architecture. These modules

communicate with the robot and the motion interface through the MIDI controller, exchanging serial and MIDI data. We have therefore realized an effective musical platform in which users can easily associate components with each other, covering not only music generation but also the movement of the mobile robot.

Figure 1. Overall system architecture (CCD color camera, microphones, audio speakers, Macintosh G3, MIDI controller, MIDI synthesizer, motion interface, mobile platform).

3. Input Modules of the Developed System

To interact with humans, the system integrates three kinds of communication channels: acoustic (music and sound), movement (the behaviors of the robot) and visual. We first describe the input side, which consists of three modules.

3.1 Motion Interface Module

The first module is an action receiver that gathers data from the motion interface at the MIDI rate (31.25 kbps). The interface obtains the center of gravity of objects on the mobile robot, measured by four strain-gauge sensors bonded under the plate. When a user applies force to an object on the plate, the four sensors measure the resulting load distribution. Since a strain-gauge element changes its resistance according to the applied load, the load can be measured as a change in voltage, and the center of gravity is calculated from those voltage values by a single-chip computer. The module on MAX receives the center-of-gravity data list and calculates the desired direction.

3.2 Sound Input Module

The second module handles sound input. It obtains auditory information from two microphones mounted on both sides of the mobile platform, which allows users to interact with the robot by voice and handclaps. The following three submodules are implemented as MAX patches.

Volume Tracking: The volume of the sound from each microphone is measured with eq.
(1), where N and A denote the number of samples per frame and the maximum amplitude, respectively:

    V[dB] = 10 log10( (1/N) * sum_{n=0}^{N-1} ( x(t + n·Δt) / A )^2 )    (1)

In addition, simple sound localization has been realized. Using the difference in amplitude between the two microphones, the system can roughly estimate the location of a sound source. It is not easy to locate sound sources exactly with only two microphones; however, because the robot can turn toward the estimated target, this helps it capture sound sources more precisely.

Pitch Tracking: We adopt the cepstrum method to identify the fundamental frequency. The method exploits the harmonic structure obtained by Fourier transform, which appears in the upper quefrency range of the cepstrum. To reduce sampling errors, we take the fundamental frequency to be the frequency of the nth harmonic peak divided by n, as shown in eq. (2):

    f0 = f_n / n    (2)

Using this method, we realized an object that extracts pitch information from environmental sound, including the human voice. Besides handclaps, a singing voice can thus be given to the robot as auditory information. For example, when the system captures the pitch of a human voice, the backing scale is changed so that the user can control high/low chords with the voice.

Tempo Tracking: While the user claps, this submodule estimates the tempo by extracting peaks in the volume data, and the system synchronizes the generated music with the estimated tempo in real time. A human player's tempo inevitably fluctuates; the module updates the next tempo estimate taking this fluctuation into account, using an experimentally determined threshold, so that the music remains agreeable to the audience. Figure 2 shows an example of tempo tracking with two microphones. Since the sound is captured every 10 ms, no time difference between the microphones was observed in this experiment. In Figure 2, the left panel represents

the sound input from the microphone on the left side, while the right panel represents the sound input from the one on the right. The X-axis of each graph represents time t, and the Y-axis represents the volume of the input sound.

Figure 2. Tempo tracking object on MAX (stereo input L / stereo input R).

3.3 Camera-based Sensor System

The third module is a camera-based sensor system that obtains environmental information and human gestures. Moving image data from the CCD camera are processed every 100 ms to obtain color information such as RGB values, hue, saturation and lightness. From these, spatial and temporal features such as edge density, pattern data and blinking information are also extracted [5]. The system allows a human to communicate with the robot with the aid of a small symbolic source such as an LED light or colored flags. Figure 3 shows an example of pattern detection: the left window shows the original input image from the CCD camera, while the right window shows the detected pattern. As shown in both windows, the input image is divided into a 3x3 grid, and image features are calculated in each area to obtain local information. Users can therefore give a sign to the robot through the location at which the symbol is detected.

Figure 3. Image tracking object on MAX (video input / video tracker).

4. Output Modules of the Developed System

The output side consists of three parts: sound and music generation, robot movement control, and an image controller. Each output module operates under the influence of the input modules, computing its output parameters from the external environmental information through two kinds of process components: long-term and short-term reactive ones. The details of music creation are described below, and the data flow is shown in Figure 4.

4.1 Omni-Directional Mobile Robot Control

The first output module controls the robot. We prepared two types of control: active and passive reaction.
In the active mode, the robot behaves so as to chase a human: with the aid of symbolic flags, it can detect a person and simply follow the flag. This chasing reaction can also be triggered by the sound detection module; using the sound localization data, the robot can turn and change direction by itself. The passive mode, in contrast, is a tool with which a human controls the robot directly: when the user pushes the robot by hand, it moves in the direction of the applied force. In other words, this mode allows humans to convey their intentions through their actions. The communication data follow the MIDI format and are sent to an external MIDI controller (Motion MIDI), whose hardware translates the MIDI messages into control commands for the mobile robot.

4.2 Music and Sound Creation

The second output module generates music and sound. We prepared several basic modes of music generation; starting from a prototype of the generated music, the sound and musical features are modified according to the input from the sound, image and motion modules. Using key information such as scene changes and applied force, the system switches the current mode of music generation. A few of the modes are described below.

Rule-based Music: We are familiar with music created according to musical theory, and in computer music, chord progressions and melody harmonization derived from various musical theories have often been applied. In this mode, we simply adopt theoretical music generation to produce a simple chord progression. Using five typical patterns based on the C chord, a basic melody is also composed from the notes allowed for each chord.

Stochastic Music: We prepared another mode of music generation called stochastic music. Several studies on stochastic music generation have been reported, for example [6].
In this work, at the beginning of each phrase in this mode, the note set is decided by the input data from the sound/image analysis. The chord progression and melody are then created with random values drawn from within the note set. The four note sets we prepared are the major and minor pentatonic scales and the Japanese major and minor scales.

Pre-recorded Music: We prepared 72 drum patterns with six different tempi, which appear so that the rhythm corresponds to changes in the image and sound input. The backing chord progression is also generated at that tempo according to musical theory, and some other MIDI files of melodies are used as well.
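The stochastic-music mode described above can be sketched as follows. The exact pitch classes of the Japanese scales, the phrase length and the feature-to-scale mapping are assumptions for illustration; the paper only names the four note sets and says they are selected from the sound/image analysis.

```python
import random

# Assumed pitch-class sets; the paper names the four scales but does not
# list their pitch classes.
NOTE_SETS = {
    "major_pentatonic": [0, 2, 4, 7, 9],
    "minor_pentatonic": [0, 3, 5, 7, 10],
    "japanese_major":   [0, 2, 5, 7, 9],   # assumed
    "japanese_minor":   [0, 1, 5, 7, 8],   # assumed (hirajoshi-like)
}

def choose_note_set(brightness, loudness):
    """Pick a note set from hypothetical image/sound features in 0..1."""
    if brightness > 0.5:
        return "major_pentatonic" if loudness > 0.5 else "japanese_major"
    return "minor_pentatonic" if loudness > 0.5 else "japanese_minor"

def random_phrase(set_name, root=60, length=8, rng=None):
    """Draw MIDI note numbers uniformly from the chosen note set."""
    rng = rng or random.Random()
    notes = [root + degree for degree in NOTE_SETS[set_name]]
    return [rng.choice(notes) for _ in range(length)]
```

For example, `random_phrase(choose_note_set(0.8, 0.9))` would yield eight notes drawn from the C major pentatonic set.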

Page  00000410 ShorttemLProcesses Figure 4. Data flow of the developed system The musical features such as timber, pitch, volume, tempo and style of music generation are modified by users in real time. The mobile robot becomes active when it is put onto the environment where robot and humans perform. Then, all of output can be changed continuously, and also be modified according to both acoustic and visual features of the environment. Up to now, the experiments prove us that the system is an effective and interesting interaction between users and robot by using multimodal communication channels. 3.6 Image Output Module Thirdly, the system also allow users to see the data flow and the relationship between input data and generated music on the display of MAX architecture. At present, we have just a symbolic color expression that changes according to the "emotional" state of the robot influenced by environmental state around robot. Further consideration is arisen that using a kind of symbolic image will be useful to show the state of robot. 4. Conclusion and Future Work We have introduced an interactive multimodal environment for musical performance with an autonomous mobile robot. The developed system enables to reflect environmental visual and auditory information around robot to create sound, music, and image. Moreover, it should be noted that the motion interface embedded with the system can receive users' action toward robot. From this, since users can give the intention to robot with his action, a new style of possible music generation can be provided. On the other hand, the proposed musical environment is constructed under MAX Opcode interface. Therefore, users can easily associate the relationship between input and output modules. Then, since all equipment communicate with each other through MIDI data, it is easy to connect other possible mobile robot, or output devises with MIDI link. 
In this work, we have paid much attention to reactive responses in the communication between human and robot. We are now considering adding agents capable of reflecting users' preferences. This study is part of a project on entertainment robots; along this direction, and from an artistic point of view, a familiar design of the whole system is also important for users, performers and the audience. A further consideration is to add robot hands for communicating with humans, in order to increase the input capacity and make performances more flexible and expressive.

References

[1] Kenji Suzuki, Antonio Camurri, Pasqualino Ferrentino, and Shuji Hashimoto: "Intelligent Agent System for Human-Robot Interaction through Artificial Emotion", in Proc. of 1998 IEEE Intl. Conf. on Systems, Man and Cybernetics, USA, pp. 1055-1060 (1998)
[2] Naoyuki Onoe, Dingding Chang, and Shuji Hashimoto: "Background Music Generation Based on Scene Analysis", in Proc. of International Computer Music Conference '97, pp. 361-362 (1996)
[3] Shigeo Hirose and Shinichi Amano: "The VUTON: High Payload, High Efficiency Holonomic Omni-Directional Vehicle", in Proc. of Intl. Symposium on Robotics Research, pp. 253-260 (1993)
[4] Jun Yokono and Shuji Hashimoto: "Motion Interface for Omni-Directional Vehicle", in Proc. of 7th International Workshop on Robot and Human Communication, pp. 436-441 (1998)
[5] Shogo Takahashi, Kenji Suzuki, Hideyuki Sawada and Shuji Hashimoto: "Music Creation from Moving Image and Environmental Sound", in Proc. of International Computer Music Conference 1999 (1999)
[6] L. Hiller and L. Isaacson: "Experimental Music", McGraw-Hill Book Company, Inc. (1959)

ICMC Proceedings 1999