Synthesis of expressive movement
Antonio Camurri, Paolo Coletta, Matteo Ricchetti, Gualtiero Volpe
DIST, Laboratorio di Informatica Musicale - University of Genova (http://musart.dist.unige.it)
Viale Causa 13, 16145 Genova, Italy
email@example.com firstname.lastname@example.org email@example.com firstname.lastname@example.org

ABSTRACT

We present experiments concerning the use of mobile robots in music theatre artistic productions. The paper focuses on the development of techniques for expressive movement synthesis and on their hardware and software implementation in concrete human-robot and human-virtual character interaction. Different kinds of synthesis of expressive content in movement are considered: (i) expressive content conveyed through the movement of robots and through interaction with them (e.g. robots interacting with dancers, artists, and musicians on a stage), with a particular focus on expressiveness arising not only from the movement of a robot (i.e. its style of movement), but also from a global, multimodal interaction between a robot and an artist (e.g. a music performer or a dancer); the integration of music and movement is also a crucial issue presented in the paper. (ii) Expressive use of mobile scenery in music theatre applications. Research on the synthesis of expressive movement takes advantage of the results obtained in our labs on the analysis of expressiveness in human movement and dance. Experiments based on small mobile robots (the Pioneer 2 from Stanford Research Institute) are presented. The paper describes issues developed in a multimedia artistic production in which we participated with the Virgilio Sieni Dance Company (L'Ala dei Sensi, Ferrara, November 1999). In particular, the interactive setup, the techniques, and the lessons learned in experimenting with the developed system are presented.
1. INTRODUCTION

One of the main goals of our research is to explore paradigms of expressive interaction between humans and robots in the framework of multimodal environments in music, theatre, museum exhibitions, and art installations. In a previous work (Suzuki et al, 1998), we experimented with a small mobile robot on wheels (the Pioneer 1 from Stanford Research Institute) as a semi-autonomous agent capable of communicating with the visitors of a museum exhibit on contemporary art ("Arti Visive 2", Palazzo Ducale, Genova, October 1998) by means of several channels, including sound and music, visual media, and its style of movement (smooth/nervous, tail-wagging, fast/slow, etc.). In another experimental setup, we developed a "theatrical machine" for the performance of "Spiral" by K. Stockhausen, for trombone and radio. The radio, audio amplifier, and loudspeakers were installed on top of the robot navigating the stage, thus creating effects of physical spatialization of sound through the movement of the robot during the performance (trombone Michele Lo Muto, live electronics Giovanni Cospito, Civica Scuola di Musica, Sezione Musica Contemporanea, Milano, June 1996). The movements of the robot were influenced by sound parameters and by the gestures of the trombone performer: for example, high "energy" content in the trombonist's gestures and high sound spectral energy were stimuli for the robot to move away from the performer, while smooth and calm phrasing and movements were stimuli attracting the robot near and around the performer. Further, the robot's sound and music output can be part of the interaction process, i.e. influenced in real time by the movement and sound of the performer. In this paper we investigate how the physical interaction of performers, dancers, and robotic scenery can participate as a component of the music language.
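The attraction/repulsion behaviour described for the "Spiral" setup can be sketched as a simple signed mapping from input features to a desired performer-robot distance. This is a hypothetical illustration: the feature names, threshold, and gain are our assumptions, not the parameters of the actual system.

```python
# Hypothetical sketch of the stimulus mapping described above: high gesture
# energy and high spectral energy push the robot away from the performer,
# while smooth, calm input draws it closer. Thresholds and gains are
# illustrative assumptions.

def movement_stimulus(gesture_energy: float, spectral_energy: float) -> float:
    """Return a signed stimulus: positive = approach, negative = retreat.

    Both inputs are assumed normalized to [0, 1].
    """
    arousal = 0.5 * (gesture_energy + spectral_energy)
    threshold = 0.5  # assumed split between "calm" and "energetic" input
    return threshold - arousal  # in [-0.5, 0.5]

def target_distance(current: float, stimulus: float, gain: float = 0.4) -> float:
    """Move the desired performer-robot distance opposite to the stimulus."""
    return max(0.5, current - gain * stimulus)  # keep a minimum safe distance
```

Such a mapping would be evaluated once per analysis frame, with the robot's low-level navigation steering toward the updated target distance.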
Interactive visual media and robotic scenery are therefore part of the compositional project of music theatre works, art installations, and multimedia concerts. The paper explores these directions by presenting the performance environments developed for the multimedia performance "L'Ala dei Sensi" (literally, "The Wing of the Senses"), held in Ferrara (Italy) in November 1999.

Figure 1. The agent-robot in the art installation at the "Arti Visive 2" museum exhibit, Palazzo Ducale, Genova, Oct 1998.

2. THE MULTIMEDIA PERFORMANCE "L'ALA DEI SENSI"

We carried out experiments in the directions described above in the framework of the work "L'Ala dei Sensi" (Ferrara, Italy, November 1999). "L'Ala dei Sensi" is a multimedia performance consisting of a journey through the theme of human perception from different perspectives. In a few words, it aims at explaining scientific principles (on human perception mechanisms) by means of the language of art (dance, music, visual arts). Each episode focuses on a specific scientific issue which is explained and demonstrated in a performance. The episodes involving interactive dance/music performances also made use of a small mobile on-wheel robotic platform, the Pioneer 2 from Stanford Research Institute. Our EyesWeb system for real-time gesture analysis and for the design of gesture-music interaction was used (Camurri et al, 1999; 2000), together with our wireless technology for movement sensors, visualisation, and live electronics hardware (see our web site http://musart.dist.unige.it). These episodes consist of short dance performances in which the dancer is involved in a dialogue with the robot and with visual and musical clones on large videoprojection screens. In particular, three episodes are described and then discussed here.
2.1 Dancer-robot dialogue

This is the main episode, where we invested most of our effort in experimentation and development. The Pioneer 2 mobile robotic platform was equipped with sensors, an on-board video projector, and a videocamera. The robot was controlled by our proprietary supervision software, developed as an EyesWeb patch (Camurri et al, 1999; Camurri et al, 2000). The sensors allow the robot to avoid collisions with the scenery and the dancers. In the first part of this episode, the on-board video projector and the videocamera are directly controlled in real time by the director Ezio Cuoghi (off-stage). He used the images coming from the robot (the robot's point of view) in real time to assemble and mix overall images on a large screen managed by several computer-controlled video projectors. He also planned and controlled in real time the movements of the robot, with the goal of causing corresponding movements of the images projected by the robot's on-board video projector. The movement of the robot is therefore amplified by its images projected on the large screen: by moving, the robot caused fragments of the overall projected image to migrate, oscillate, etc. from one part of the large screen to another. In this first part of the episode, the robot is a sort of passive companion of the dancer: it follows the dancer, is controlled by the director, and captures real-time video images of the dancer from an on-stage perspective. These images are immediately available to the director for mixing and videoprojection.
In this first part of the episode, the movements of the robot are radio controlled by the director: the only degrees of freedom of the robot consisted of low-level reactive processes able to avoid collisions (in case of error by the human) and of "interpretation" parameters (i.e. knobs) controlled by the director to add "nuances" to its movement (smoothness, directness, etc.). In this first part, the robot had its electric power cable attached, so it was not completely wireless; only its sensor, audio, and video links were wireless. At a certain point, the dancer unplugs the robot's electric power cable, as a specially significant gesture. This automatically caused (i) the activation of an extra internal battery pack (to feed the on-board videoprojector), (ii) the linking of the robot's videocamera to its on-board video projector (that is, the robot's videocamera signal is no longer available to the director of the performance), (iii) the start of a semi-autonomous behaviour of the robot, and (iv) the activation of the wireless sensors on the body of the dancer. In this way, in the words of the director Ezio Cuoghi, "the robot comes to life". A deeper and more interesting dialogue then started between the dancer and the robot. The dancer was equipped with two sensors (on the palms of the hands, whose signals are transmitted over a wireless radio link to the supervision software). By acting on the first sensor he could push the robot toward one of two different styles of movement: a sort of "ordered" movement (aiming at a straight-line, constant-speed movement) and a "disordered" movement. Through the second sensor the movement could be stopped and restarted. The dancer was also observed by a videocamera. The movements and gestures of the dancer then became further stimuli to the robot, which was able to react by changing (morphing) its style of moving.
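The morphing between the "ordered" and "disordered" styles controlled by the palm sensors can be sketched as an interpolation between two steering behaviours. This is a minimal hypothetical sketch: the function names, speed values, and the morph range are illustrative assumptions, not the actual control software.

```python
import random

# Illustrative sketch of morphing between the two styles of movement
# described above: an "ordered" style (straight line, constant speed) and a
# "disordered" style (random heading perturbations). The morph parameter
# would be driven by the dancer's first palm sensor; the second sensor
# toggles stop/restart. All names and ranges are assumptions.

def velocity_command(morph: float, heading: float, running: bool,
                     speed: float = 0.3) -> tuple[float, float]:
    """Return (linear_speed, angular_speed) for the robot base.

    morph = 0.0 -> fully ordered, morph = 1.0 -> fully disordered.
    """
    if not running:            # second sensor: stop/restart
        return 0.0, 0.0
    ordered_turn = -0.5 * heading                 # steer back toward a straight line
    disordered_turn = random.uniform(-1.0, 1.0)   # jittery wandering
    turn = (1.0 - morph) * ordered_turn + morph * disordered_turn
    return speed, turn
```

Intermediate morph values blend the two steering terms, giving the gradual style change (rather than a hard switch) that the dancer's gestures induce.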
The language with which the dancer communicated with the robot was quite simple: intentional stimuli (start/stop the robot, positive/negative stimuli) and overall characteristics of the dancer's movement such as energy, equilibrium, and a number of Laban Effort-like time and space parameters (Camurri, in press). Furthermore, the robot projected onto the surrounding environment (walls) the images coming from its on-board camera. In this way, the dancer was able to interact not only with the robot itself, but also with the projected images. In this case too, even small movements of the robot were amplified by the changes in the visual feedback projected from the robot. The free space between the dancer and the robot is perceived as a sort of invisible "elastic": in other words, the robot and the dancer became a sort of a whole entity.

2.2 The real and virtual dancer

Another episode concerned the real-time interaction between the dancer and his virtual clone, visually perceivable on a large video screen. The dancer interacted with a computer-generated silhouette whose graphical appearance changed over time according to the style of movement of the dancer. With several dancers, a single overall clone corresponding to the whole group appeared.

2.3 The Virtual Mirror

This episode was based on our multiplexer hardware (see http://musart.dist.unige.it) developed for the synchronised real-time analysis of multiple videocamera signals. The effect was similar in some respects to the virtual-mirror performances of the video artists Vasulka. The dancer was doubled in real time by this special system based on synchronised multiple videocameras.

3. NOTES ON THE REAL-TIME IMPLEMENTATION

The dancer is equipped with our proprietary wireless sensors-to-MIDI system for the real-time analysis of the dancer's movement features, consisting of (i) a small wearable hardware box that collects on-body sensor data and (ii) a receiver box which also converts the data to MIDI.
Such MIDI data, processed by a software application running on the supervisor computer, are in our case used to influence the robot's style of movement. The supervision software was built upon our EyesWeb visual software. It consists of a visual language and an open set of libraries of reusable modules allowing the user to build patches as in common computer music languages inspired by analog synthesizers (Camurri et al, in press; Camurri, in press). EyesWeb also includes our experimental modules for the analysis of expressiveness in movement. A typical hardware configuration consists of two video cameras and special proprietary electronics for the real-time capture of both camera signals and their delivery to a Video for Windows-compatible frame-grabber board. Figure 3 shows the overall hardware architecture. We use the signals from two (or more) videocameras to obtain different views of the same scene: a typical use includes a frontal and a lateral view of the stage, to extract three-dimensional information. The system supports both colour and black-and-white (infrared) cameras. Our proprietary electronics board (Mpx, see figure 3) has been developed to capture the signals from two synchronized cameras concurrently. The board multiplexes the two separate video signals into one by switching between them at the field rate (50 Hz). In this way we obtain a new interlaced signal in which the odd and even fields contain the two different signals. We can then acquire the signal using an ordinary full-frame, single-channel acquisition board. At this point, the frame memory buffer holds the two original signals, at the cost of halving the vertical and temporal resolution.
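The field-rate multiplexing described above implies a simple de-multiplexing step after acquisition: slicing the captured interlaced frame by row parity recovers the two camera views at half vertical resolution. A sketch, assuming a NumPy image layout (not the actual board's software):

```python
import numpy as np

# Sketch of the de-multiplexing step implied by the Mpx board description:
# a captured interlaced frame holds camera A on one field (even rows) and
# camera B on the other (odd rows), so slicing by row parity recovers both
# views at half vertical (and temporal) resolution. The row-to-camera
# assignment is an assumption.

def demultiplex(frame: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split one interlaced frame into the two original camera fields."""
    camera_a = frame[0::2]   # even-numbered rows: first field
    camera_b = frame[1::2]   # odd-numbered rows: second field
    return camera_a, camera_b

# Example: a 576-line PAL frame yields two 288-line half-resolution views.
frame = np.zeros((576, 720), dtype=np.uint8)
a, b = demultiplex(frame)
assert a.shape == (288, 720) and b.shape == (288, 720)
```

The two recovered views can then feed the movement-analysis modules independently, e.g. as the frontal and lateral views used for three-dimensional information.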
Figure 2. System architecture for movement analysis and for the "virtual mirrors" episodes, including our proprietary Mpx and wireless sensors-to-MIDI systems (see http://musart.dist.unige.it): the supervision software for input gesture analysis and robot control (an EyesWeb patch) runs on a Pentium II/Win32 machine connected to the Mpx board, the frame grabber, and the wireless sensors-to-MIDI system.

4. DISCUSSION

The main issues emerging from our experience in "L'Ala dei Sensi" are summarized in the following points.

4.1 Expressive movement or expressive interaction?

That is, in an artistic performance involving a robot moving on stage, can expressiveness arise from the movement of the robot itself, or is the interaction of the robot with one or more human interpreters needed? In fact, from the performance in "L'Ala dei Sensi" it emerged that a set of quite simple movements, such as the "ordered" (near straight-line, constant-speed) or the "disordered" movements, were judged expressive and artistically interesting when performed by the robot interacting with the dancer, and not by the robot alone. We could say that the dancer's ability to create an expressive performance conveyed expressiveness to the robot's movements. Therefore, it seems that the focus of future research should be on interaction and the mechanisms that make interaction expressive, rather than on the search for expressive styles of movement. However, we have to remark again that the styles of movement used in these initial experiments were quite simple. A hypothesis is that further studies on expressiveness in human movement, taking into account the results obtained so far on the analysis side (Camurri, in press), together with the application of theories such as Laban's Theory of Effort to expressive movement synthesis, and with the use of more versatile robotic platforms, can lead to more effective and expressive styles of movement.
This might allow a robot to move on a stage, alone or with other agents (robots, humans, virtual clones), while still conveying in itself an expressive and artistically interesting content to the audience.

4.2 Synthesis of expressive movement in a multimodal perspective

The experiments carried out in "L'Ala dei Sensi" were mainly devoted to the synthesis of expressive movement and expressive interaction in a multimedia performance. The movement features analysed in real time from the dancer were intentionally only a subset of those available from our system. Likewise, the synthesized movements of the virtual characters were deliberately minimal. This was our choice, due to (i) the need for a measurable experimental setup and (ii) the resources available for the project. These aspects can be enhanced in future projects, with more complex interaction between real and virtual dancers. Further, the relationships between the gestural and musical channels still need research effort. Open problems concern, for example, extending the models for integrating expressive movement synthesis with the synthesis of expressive content in other modalities such as music and visual media. Furthermore, the strategies for mapping expressive multimodal input onto multimedia output, and in particular onto the synthesized movement, are another crucial research issue. A deeper discussion of these aspects can be found in (Camurri, in press).

4.3 Expressive autonomy

If a robotic agent is involved in an artistic performance (e.g. it takes part in a dance performance together with some human dancers), can it make autonomous decisions? That is, must the robot carefully follow the instructions given by the director, the choreographer, or the composer, or is it allowed some degree of freedom in its movement? This question can be extended to agents in general, including visual and music agents.
In the first section of the robot-dancer interaction of "L'Ala dei Sensi", the robot was an interpreter (that is, with some expressive content) of a predefined score of movements designed by the director. A number of rehearsals were devoted to obtaining what the director wanted. Further, in most music and theatre performances, the performer's expressiveness is directed at transmitting the expressive content that the director or the composer intended to communicate. In general, an agent involved in a performance can have different degrees of expressive autonomy. We define expressive autonomy as the amount of freedom that a director, a choreographer, a composer (or, in general, the author of an application including expressive content communication) leaves to the agents involved in the application to make decisions about the appropriate expressive content at a given moment and about the way to convey it. Expressive autonomy is therefore somewhat different from autonomy as intended in Artificial Intelligence and Robotics: expressive autonomy concerns neither the amount of built-in knowledge the agent contains nor its capability to make decisions on its own on the basis of the feedback coming from its physical sensors. For example, if the agent has to strictly observe the directives given by a director, then it is not autonomous from the point of view of expressiveness: it is not allowed to make an autonomous decision about the expressive content to convey to its audience, since that decision is made by the director. In the case of a robot, if during a performance the robotic agent is asked to perform, in an effective and expressive way, a sequence of movements (and possibly music and visual media) that the director has prepared and that has been repeatedly tuned and rehearsed with the system, then the robot is only minimally expressively autonomous, or not expressively autonomous at all.
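The notion of expressive autonomy defined above can be illustrated as a single parameter bounding an agent's departure from a rehearsed score. This is a hypothetical toy model: the style names and the probabilistic interpretation of autonomy are our illustrative assumptions, not the paper's implementation.

```python
import random

# Toy illustration of "expressive autonomy": the director fixes a score of
# expressive intents, and an autonomy value in [0, 1] controls how often the
# agent may depart from the rehearsed realisation of each intent. All names
# and the probabilistic reading of autonomy are illustrative assumptions.

STYLES = {"happy": ["bouncy", "fast"], "calm": ["smooth", "slow"]}

def choose_style(directed_intent: str, autonomy: float,
                 rng: random.Random) -> str:
    """Pick a movement style for the intent the director prescribed.

    autonomy = 0.0: always the first (rehearsed) style for that intent.
    autonomy = 1.0: free choice among all styles of all intents.
    """
    if rng.random() >= autonomy:
        return STYLES[directed_intent][0]          # follow the score exactly
    pool = [s for styles in STYLES.values() for s in styles]
    return rng.choice(pool)                        # autonomous expressive choice
```

At autonomy 0 the agent is a pure executor of the score; intermediate values correspond to the semi-autonomous interpreter discussed next.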
Of course, this is not always the case: if, as happens with human performers, some degrees of freedom remain and the agent is flexible, versatile, and rational enough as an interpreter, intervening when necessary to add nuances to its behaviour coherently with the performance, then we could say that the agent is expressively semi-autonomous. That is, it plays the role the author or the director assigned to it, but it can still make decisions, for example, about the way of conveying the expressive content. For example, in a part of a performance during which the director wants the robot to appear happy, an expressively semi-autonomous robotic agent could choose which behaviour, or in particular which style of movement, to show in order to appear happy. A high degree of expressive autonomy can be found, for example, in museum environments where a robot could be asked to play the role of a guide for visitors, or of a visitor itself (Camurri and Coglio, 1998). In such a case, although the robotic guide has to follow a narrative thread, it can still choose what expressive content to convey in order to increase the interest of its audience:
the author of the application builds a narrative structure and process, and the agent is assigned the task of instantiating/interpreting it in a way suitable to its current audience and context. The current degree of expressive autonomy, however, can depend on the structure and dynamics of the narration and can vary over time during the visit. Complete expressive autonomy implies that at a given moment the agent is completely free to choose the expressive content it wants to convey (i.e. the expressive content it judges most suitable given its current perceptions) as well as the way to convey it. The required degree of expressive autonomy is crucial when we deal with the implementation of the agent. In fact, a higher degree of expressive autonomy requires the agent to have more sophisticated capabilities in order to make its expressive choices. Thus, while the design and implementation of an agent with limited expressive autonomy can be quite simple, a highly expressively autonomous agent may need to be equipped with several kinds of components, such as components able to recognize the expressive content communicated by the people interacting with it, components embedding artificial emotions and personality models, and components providing rational capabilities able to make decisions on the basis of the agent's current goals. Therefore, from the point of view of a director or an author, it could be useful to have tools allowing them to easily build artistic performances involving expressive agents with different degrees of expressive autonomy or, in general, including various kinds of systems and devices, without having to worry about the technological issues and the underlying complexity.
For example, as concerns robot-human interaction and expressive movement synthesis, this means providing directors and authors with tools allowing them to easily specify, in a high-level language, the expressive actions and movements they want the robot to perform, as well as the degrees of freedom left to it when performing such actions.

4.4 Composition models: musical information organisms, agents, and physical interaction

The previous considerations can be applied to the expressiveness of music agents. In (Camurri et al, 1994) we defined a cognitive model and an environment for composition and performance (HARP) based on agents and metaphors of energy fields. The use of metaphors as a "glue" between modalities, i.e. as a means of unifying the different languages involved (music, gesture, visual), was also one of the main motivations in HARP and now in EyesWeb. Several composers are approaching composition from similar perspectives, as discussed in (Camurri, in press): for example, Marco Stroppa's Musical Information Organisms, which can be represented at the microstructure level as living beings and at the macrostructure level as sources of energy, and Gérard Grisey's être vivant musical objects. In our model of an interactive environment, such "organisms" (musical or, more generally, multimodal) should also be considered in an open-world perspective, i.e. as comprehending their real "sensors" and "actuators". That is, the definition of an organism should be situated in the real world (the real or virtual stage): an organism exists if it can act in the world it inhabits, and our proposed model of organisms includes real sensors and effectors. Organisms communicate either directly (e.g., through Marco Stroppa's energy fields) or through the shared real world, populated by performers, dancers, and robots. Robots, environmental sensors, and actuators become the sensors and effectors of the perceptual and motor systems of such Musical Information Organisms.
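The open-world organism idea above can be sketched as an agent whose perception and action pass through real sensors and effectors in a shared world. The class and method names below are illustrative assumptions, not an API of HARP or EyesWeb.

```python
from typing import Callable

# Minimal sketch of a Musical Information Organism situated in the real
# world: it perceives through real sensors (robot sonar, stage cameras,
# body sensors), maintains an internal "energy" (its macrostructure role
# as a source of energy), and acts through real effectors (robot wheels,
# sound output). All names are illustrative assumptions.

class Organism:
    def __init__(self, name: str,
                 sensors: list[Callable[[], float]],
                 effectors: list[Callable[[float], None]]):
        self.name = name
        self.sensors = sensors      # e.g. readings from robot or stage
        self.effectors = effectors  # e.g. robot motion, sound parameters
        self.energy = 0.0

    def step(self) -> None:
        """Perceive the shared world, update internal energy, act on it."""
        readings = [sense() for sense in self.sensors]
        self.energy = sum(readings) / max(1, len(readings))
        for act in self.effectors:
            act(self.energy)

# Usage: an organism sensing a dancer's energy and driving an output channel.
log: list[float] = []
o = Organism("stage-organism", sensors=[lambda: 0.8], effectors=[log.append])
o.step()
assert log == [0.8]
```

Two such organisms sharing the same effector-influenced world communicate indirectly through it, which is the "shared real world" channel described above.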
This last idea is one of the most intriguing issues on which our current research on interaction models is grounded.

ACKNOWLEDGEMENTS

We thank the director Ezio Cuoghi and the choreographer and dancer Virgilio Sieni for the interesting discussions during the preparation of "L'Ala dei Sensi", and Andrea Ricci for his support on the EyesWeb software.

REFERENCES

Camurri, A. 1995. "Interactive Dance/Music Systems." In Proceedings of the 1995 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 245-252.

Camurri, A., A. Coglio. 1998. "An Architecture for Emotional Agents." IEEE Multimedia, 5(4):24-33, Oct-Dec 1998, New York: IEEE Computer Society Press.

Camurri, A. In press. "Music Content Processing and Multimedia: Case Studies and Emerging Applications of Intelligent Interactive Systems." Journal of New Music Research, 28(4), Swets & Zeitlinger.

Camurri, A., P. Ferrentino. 1999. "Interactive Environments for Music and Multimedia." Multimedia Systems Journal, special issue on audio and multimedia, 7:32-47, Association for Computing Machinery (ACM) and Springer.

Camurri, A., S. Hashimoto, K. Suzuki, R. Trocca. 1999. "KANSEI Analysis of Dance Performance." In Proceedings of the 1999 IEEE Intl. Conf. on Systems, Man and Cybernetics SMC'99, Tokyo. New York: IEEE Computer Society Press.

Camurri, A., R. Trocca, G. Volpe. 1999. "Full-Body Movement and Music Signals: An Approach toward Analysis and Synthesis of Expressive Content." In Proc. Intl. Workshop on Physicality and Tangibility in Interaction: Towards New Paradigms for Interaction Beyond the Desktop, CEC-I3, Siena, October 1999.

Camurri, A., M. Frixione, C. Innocenti. 1994. "A Cognitive Model and a Knowledge Representation Architecture for Music and Multimedia." Interface - Journal of New Music Research, 23(4):317-347, Swets & Zeitlinger, Lisse, The Netherlands.

Camurri, A., P. Morasso, V. Tagliasco, R. Zaccaria. 1986. "Dance and Movement Notation."
In Morasso & Tagliasco (Eds.), Human Movement Understanding, pp. 85-124, Amsterdam: North Holland.

Camurri, A. (Ed.) 1997. Proceedings of the International Workshop on KANSEI: The Technology of Emotion. Genova: AIMI (Italian Computer Music Association) and DIST, University of Genova, Italy.

Camurri, A., M. Leman. 1997. "Gestalt-Based Composition and Performance in Multimodal Environments." In Leman (Ed.), Music, Gestalt, and Computing, pp. 495-508, Springer.

Camurri, A., S. Hashimoto, M. Ricchetti, K. Suzuki, R. Trocca, G. Volpe. 2000. "EyesWeb - Toward Gesture and Affect Recognition in Dance/Music Interactive Systems." Computer Music Journal, 24(1), MIT Press.

Friberg, A., J. Sundberg. 1999. "Does Music Performance Allude to Locomotion? A Model of Final Ritardandi Derived from Measurements of Stopping Runners." Journal of the Acoustical Society of America, 105(3), pp. 1469-1484.

Laban, R. 1963. Modern Educational Dance. 2nd edition, London: Macdonald & Evans Ltd.

Suzuki, K., A. Camurri, P. Ferrentino, S. Hashimoto. 1998. "Intelligent Agent System for Human-Robot Interaction through Artificial Emotion." In Proc. IEEE Intl. Conf. on Systems, Man and Cybernetics SMC'98, New York: IEEE CS Press.