A real-time platform for interactive dance and music systems

Camurri Antonio, Coletta Paolo, Peri Massimiliano, Ricchetti Matteo, Ricci Andrea, Trocca Riccardo, Volpe Gualtiero
Laboratorio di Informatica Musicale, DIST - University of Genova, Viale Causa 13, I-16145 Genova

Abstract

We present recent developments on EyesWeb, a research project aiming at providing models and tools for integrating movement, music and visual languages in a multimodal perspective. The project includes a hardware and software platform to support the user (i) in developing and experimenting with computational models of expressive content communication and of gesture mapping strategies, and (ii) in fast development and experiment cycles for interactive performance setups. An intuitive visual programming language allows gestural components to be mapped, at different levels, onto integrated music, visual, and mobile scenery. Original features of the system, such as its real-time patch scheduler supporting active modules and user-defined datatypes, are briefly described in the paper. The EyesWeb platform consists of a series of integrated hardware and software modules which can be easily interconnected and extended by users.

1. Introduction

Several systems and applications are currently used for interactive performance, e.g., the many applications built with Max and SuperCollider, or a few special-purpose systems based on video cameras that map low-level image data to MIDI. Several basic conceptual issues remain open. Typical state-of-the-art applications rely on quite simple cause-effect mappings of (low-level) movement onto MIDI parameters, in many cases characterised by weak control strategies, no memory, poor interaction metaphors, and low-level interaction design. Solving the problem of gesture mapping, and identifying effective ways of integrating gesture in a multimodal perspective, are important goals of current research.
In (Camurri 1995; Camurri and Leman, 1997) we proposed new interaction metaphors as an attempt to go beyond the "musical instrument" metaphor: from orchestra conducting, to metaphors like artificial potential fields and "maps" (Camurri et al 1994), to dialogue based on agent models embedding models of communication of artificial emotions (Camurri and Coglio 1998; Camurri and Ferrentino 1999) and KANSEI (Camurri, Hashimoto et al. 1998, 2000). In this paper we propose a flexible and powerful platform to support research in these directions, from both an artistic and a scientific perspective. The motivation for designing an original platform derives from the requirements we collected from several artistic projects involving real-time interaction in which we have participated in recent years, and from previous research projects (Camurri et al 1986) including HARP (Camurri et al 1994, 1995). For example, in real-time movement analysis, a main focus is on extracting parameters related to the expressive content of the performance. Existing systems are rather limited from this point of view. Ideally, we want a system able to distinguish the different expressive content of two performances of the same dance fragment. The system is designed to support this and other research issues.

2. The EyesWeb software

The EyesWeb platform consists of a number of integrated hardware and software modules which can be easily interconnected and extended. The EyesWeb software consists of a development environment and a set of libraries of reusable software components, which the user can assemble in a visual language to build patches, as in common computer music languages inspired by analog synthesizers. A patch can be used as a module in a higher-level patch. The software runs on Win32 and is based on the Microsoft COM standard. An example of a simple EyesWeb patch is shown in fig. 1. The patch shows a typical application of movement analysis based on video cameras.
Let us briefly examine the patch modules of fig. 1 (from left to right). A frame grabber module sends its output to a splitter module, which takes the input signal and separates the odd and even lines of each frame (this is because the patch uses the EyesWeb Mpx hardware to feed two cameras into a single video input; see below). The output from Mpx goes to two identical subpatches, only one of which is shown in fig. 1. Then a simple mechanism based on background subtraction is used, followed by a "binarizer". The resulting two-level "silhouette" is sent to feature extraction modules that extract barycenters. The upper part of fig. 1 shows user interface widgets that are also available at run time. Fig. 2 shows the dialog window for the barycenters module. In fig. 3, the outputs from two test modules show the results of the real-time analysis performed by this patch. MIDI output modules end the chain of this patch. Besides the "silhouette" model, further graphical representations of movement parameters have been developed (3D models and, in general, visual metaphors).

EyesWeb is a multi-threaded application based on the Microsoft COM standard. At run time, an original real-time patch scheduler supports active modules and user-defined datatypes for the data streams that link modules. A patch is automatically split by the scheduler into several threads according to its topology and the presence of active modules. Active modules are software modules with a special behavior: they have internal dynamics, i.e., they receive inputs like any other kind of module, but their outputs are asynchronous with respect to their inputs. For example, an "emotional resonator" able to react to the perceived expressive content of a dance performance, embedding internal dynamics, may delay the activation of its outputs depending on its current internal state and its memory of past events. This is one of the memory mechanisms explicitly supported by the system to implement interaction metaphors beyond the "musical instrument".

New datatypes used to communicate data (e.g. on expressive content) among modules in a patch can be defined by the user. Data links derive from two different root types: signals and controls. Signal type taxonomies can be defined by the user. Multiple versions of modules are also supported by the system (versioning mechanism), e.g., allowing different versions of the same datatype or module to be used in patches. Compatibility with future versions of the system is also supported, in order to preserve existing work (i.e. patches).

An open set of basic module libraries includes input and movement capture, signal conditioning, filters, active modules, observers, and output modules. Movement capture and input modules have been developed for different sensor systems: both environmental sensors (e.g. video cameras) and the wireless on-body sensor technology we developed (see below in the paper). Low-level filters (e.g. preprocessing, signal conditioning, etc.) as well as medium-level filters (e.g. the module to extract the barycenter coordinates of a human figure; the module for evaluating equilibrium) are available.
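The silhouette chain used in the patch of fig. 1 (background subtraction, binarization, barycenter extraction) can be illustrated with a minimal sketch. This is not EyesWeb code: the function names, the plain-list image representation, and the threshold value are assumptions made only for illustration.

```python
from typing import List, Optional, Tuple

# Assumed representation: a grey-level image as rows of 0..255 integers.
Image = List[List[int]]


def subtract_background(frame: Image, background: Image) -> Image:
    """Absolute per-pixel difference between a frame and a reference background."""
    return [
        [abs(p - b) for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def binarize(image: Image, threshold: int = 30) -> Image:
    """Two-level silhouette: 1 where the value exceeds the (hypothetical) threshold."""
    return [[1 if p > threshold else 0 for p in row] for row in image]


def barycenter(silhouette: Image) -> Optional[Tuple[float, float]]:
    """Mean (x, y) position of the silhouette pixels, or None if there are none."""
    count = 0
    sum_x = 0
    sum_y = 0
    for y, row in enumerate(silhouette):
        for x, value in enumerate(row):
            if value:
                count += 1
                sum_x += x
                sum_y += y
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)


# Toy example: a uniform 4x4 background with a bright 2x2 "figure" in the middle.
background = [[10] * 4 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][1] = frame[1][2] = frame[2][1] = frame[2][2] = 200

silhouette = binarize(subtract_background(frame, background))
print(barycenter(silhouette))  # (1.5, 1.5)
```

In a real setup the threshold would have to be tuned to the lighting conditions; the value 30 used here is arbitrary.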
Observers can be high-level filters, active modules, or patches able to extract high-level information, typically concerning expressive content. Output modules include MIDI, TCP/IP, and DMX outputs, and support the communication of expressive content to external applications (e.g. virtual environments inhabited by avatars and clones/characters). A current research project concerns the development and testing of EyesWeb software modules for movement analysis inspired by a computational model of Laban's Theory of Effort (Camurri et al 1999; 2000). To this aim, we are developing analysis modules for medium-level features. For example, fig. 4 shows the output of a module for the analysis of the instability of a movement (in the figure, a walk): peaks in the graph correspond to steps (when a foot is raised from the floor). End users can directly assemble modules from the available libraries to build patches implementing interactive performance setups.

3.1 The EyesWeb Wizard

Modules are standard COM modules. Users can build new EyesWeb modules as standard COM modules and use them in patches. To hide the complexity of COM programming from the user, we developed the EyesWeb Wizard. Users can develop autonomously (i.e., possibly independently of EyesWeb) the algorithms and the basic software skeletons of their own modules. The Wizard then supports the user in transforming these algorithms into integrated EyesWeb modules.

4. EyesWeb Hardware Modules

Here we list a few of the main hardware systems we have developed in recent years and that we use in EyesWeb-based interactive performances and experiments.

Wireless Sensor-to-MIDI is a small, wireless, battery-equipped, easily wearable system designed to capture signals in real time from on-body sensors. The system sends the signals to a remote receiver over a wireless radio link. A proprietary redundant data transmission protocol makes the system's wireless communication robust and error-free.
The receiver converts data into MIDI signals. System latency is less than 5 ms. The system consists of two hardware unit boxes:
- a small data acquisition, conversion, and wireless transmission unit (to be worn on-body),
- a receiver and converter-to-MIDI unit.

The Video Multiplexer (Mpx), which connects and synchronizes two video cameras to the same frame grabber, is proprietary electronics we developed to capture the signal from two synchronized cameras. The board relies on the fact that two separate input video signals can be multiplexed into a single one by switching between them at the field rate (50 Hz). In this way we obtain a new interlaced signal in which the odd and even fields contain the two different signals. We can then acquire the signal using an ordinary full-frame, single-channel acquisition board. At this point, the frame memory buffer holds the two original signals, each with half the vertical resolution but the same temporal resolution.

Hardware for human-robot communication: since 1991 we have developed a number of setups for museums, music theatre, and art installations. We recently extended the Pioneer 1 and Pioneer 2 robots from SRI with audiovisual interfaces, sensors, and interactive audio I/O for real-time interaction with users (dancers, public, actors, music performers). In the multimedia event "L'Ala dei Sensi" (see our web site) we equipped a Pioneer 2 with a video camera, a video projector, and a microphone to interact with a dancer (Virgilio Sieni). The dancer wears on-body sensors (accelerometers and FSRs connected to the Wireless Sensor-to-MIDI box described above) to send "stimuli" to the robot.

A MIDI-controlled 8x8-channel audio matrix and a long-distance MIDI Tx/Rx (which sends MIDI signals over long distances using the standard audio/Cannon cables commonly available on stage for audio signals) are other examples of special hardware developed for interactive setups.
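The effect of the Mpx board on a captured frame can be sketched in software. Assuming a frame is stored as a sequence of rows, with the even rows carrying one camera's field and the odd rows the other's (which camera ends up on which field is an assumption here), the two half-resolution images are recovered by de-interleaving the rows. The function name is hypothetical and is not part of EyesWeb.

```python
from typing import List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def split_fields(frame: Sequence[Row]) -> Tuple[List[Row], List[Row]]:
    """De-interleave a full frame into its even-line and odd-line fields.

    Each returned image has half the vertical resolution of the input frame.
    """
    even_field = list(frame[0::2])  # rows 0, 2, 4, ... (assumed: first camera)
    odd_field = list(frame[1::2])   # rows 1, 3, 5, ... (assumed: second camera)
    return even_field, odd_field


# A 6-line frame in which camera A sits on the even lines and camera B on the odd lines.
frame = ["A0", "B0", "A1", "B1", "A2", "B2"]
camera_a, camera_b = split_fields(frame)
print(camera_a)  # ['A0', 'A1', 'A2']
print(camera_b)  # ['B0', 'B1', 'B2']
```

Each recovered image keeps only every other line of the frame, which corresponds to the halved vertical resolution noted above.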
Systems developed a few years ago for HARP (Camurri et al 1994) can also be used in EyesWeb. For example, DanceWeb is a low-cost sensor system based on ultrasound (US) technology. Up to 64 ultrasound sensors and up to 32 digital I/O lines can be connected and multiplexed in groups (e.g. to avoid interference between US sensors). The system provides fast serial and MIDI outputs. It consists of an external rack unit, programmable via MIDI, with a microcontroller and electronics for sensor data acquisition and signal conditioning.

5. Conclusion

The EyesWeb platform has been tested and used in music theatre projects, e.g., Luciano Berio's opera "Cronaca del Luogo", which opened the 1999 Salzburg Festival, as well as in other multimedia events and museum exhibits, and in interactive games and experiments at science centers. It is currently being tested in a high school for music education. Recent minor but useful improvements extend the system to support not only Matrox Meteor (I and II) frame grabbers but any Video for Windows compatible video input board. Further, the system no longer uses the Matrox MIL libraries, so no expensive software license is needed to use the software. These changes were made because we are planning to start distributing the system to a selected group of users.

Project Bibliography

Camurri, A., P. Morasso, V. Tagliasco, R. Zaccaria. 1986. "Dance and Movement Notation." In Morasso & Tagliasco (Eds.), Human Movement Understanding, pp. 85-124. Amsterdam: North Holland.
Camurri, A., M. Frixione, C. Innocenti. 1994. "A Cognitive Model and a Knowledge Representation Architecture for Music and Multimedia." Journal of New Music Research, 23(4):317-347. Lisse, The Netherlands: Swets & Zeitlinger.
Camurri, A. 1995. "Interactive Dance/Music Systems." In Proceedings of the 1995 International Computer Music Conference, pp. 245-252. San Francisco: International Computer Music Association.
Camurri, A. (Ed.) 1997. Proceedings of the International Workshop on KANSEI: The Technology of Emotion. Genova: AIMI (Italian Computer Music Association) and DIST-University of Genova, Italy.
Camurri, A., M. Leman. 1997. "Gestalt-Based Composition and Performance in Multimodal Environments." In Leman (Ed.), Music, Gestalt, and Computing, pp. 495-508. Springer.
Camurri, A., A. Coglio. 1998. "An Architecture for Emotional Agents." IEEE Multimedia, 5(4):24-33, Oct-Dec 1998. New York: IEEE Computer Society Press.
Camurri, A., P. Ferrentino. 1999. "Interactive Environments for Music and Multimedia." Multimedia Systems Journal, special issue on audio and multimedia, 7:32-47. ACM and Springer.
Camurri, A., S. Hashimoto, K. Suzuki, R. Trocca. 1999. "KANSEI Analysis of Dance Performance." In Proceedings of the 1999 IEEE Intl Conf on Systems Man and Cybernetics SMC'99, Tokyo. New York: IEEE Computer Society Press.
Camurri, A., S. Hashimoto, M. Ricchetti, K. Suzuki, R. Trocca, G. Volpe. 2000. "EyesWeb - Toward Gesture and Affect Recognition in Dance/Music Interactive Systems." Computer Music Journal, 24(1). MIT Press.

Figure 1: a patch implementing a simple movement analysis based on two video cameras.

Figure 2: the dialog window for setting parameters of the barycenters analysis module.

Figure 3

Figure 4