Design and Analysis of Virtual Musical Environments (VME)

Jaein Hwang, Gerard Jounghyun Kim
Virtual Reality Laboratory
Department of Computer Science and Engineering
Pohang University of Science and Technology (POSTECH)
{jane, gkim}@postech.ac.kr

Abstract

This paper proposes and demonstrates the concept of a "Virtual Musical Environment (VME)", a generalization of a virtual music instrument in which the virtual world itself acts as a source of multimodal feedback and, at the same time, a place of interaction. Naturally, the two important issues in designing a VME are the display content and the control interface. These two issues are in fact inter-related, as music performance is seen as a closed-loop system composed of the user and the VME. The control interface, which changes the world, must be designed to be as natural and easy to use as possible for quick responses to the ongoing music. The display must also be "vivid" in the sense that it must leave the user with a strong musical impression, so that one remembers the "essence" of the musical content. We discuss our hypotheses on various elements important in designing an effective VME. Exploiting these elements necessitates a highly controllable user interface with a simplified performance task, to free the user from having to worry about playing skills. We present our design of a VME and report on the results of the usability experiments we carried out to test the controllability of the system.

1. Introduction

Enjoying music by listening is a passive form of music appreciation, whereas playing an instrument or conducting an orchestra might be classified as an active form, and such skills often require years of difficult training. Difficult as it is, this active form of music appreciation induces an even deeper sense of enjoyment in performers because, in addition to the sense of direct participation, they are given total, or at least partial, control in expressing their artistic taste. Recent progress in digital music has enabled some forms of active music appreciation. In particular, with the advent of affordable VR technologies, many researchers are experimenting with three-dimensional and multimodal interfaces to control music and to replace the current keyboard/mouse-based (piano or computer) interface for computer/electronic music. Against this background, this paper proposes and demonstrates the concept of the "Virtual Musical Environment (VME)", a generalization of a virtual music instrument in which the virtual world itself acts as a source of multimodal (e.g. visual, audio, haptic) feedback and, at the same time, a place of interaction. Naturally, the two important issues in designing a VME are the display content and the control interface. These two issues are in fact inter-related, as music performance is seen as a closed-loop system composed of the user and the VME. The control interface, which changes the world, must be designed to be as natural and easy to use as possible for quick responses to the ongoing music, and the world around the user must contain the right information and convey it in an intuitive manner for acceptable controllability. The display must also be "vivid" in the sense that it must leave the user with a strong musical impression, so that one remembers the "essence" of the musical content. The main theme of this paper is the derivation of display and interface requirements for VMEs.
We argue that presence (one of the defining qualities of virtual reality) and information correspondence are important elements of a VME, and discuss how they relate to the display and interface requirements. Exploiting these elements necessitates a highly controllable user interface with a simplified performance task, to free the user from having to worry about playing skills. We present our design of a VME and report on the results of the usability experiments we carried out to test the controllability of the system.

2. VME (Virtual Musical Environment)

2.1 Concept and Architecture

Pressing [Pressing 94] has suggested a closed-loop system model of general music performance with the human performer in the loop. The performer maps his or her musical intent through the nervous and motor systems, activating the control surface of the music instrument, which generates auditory, haptic and visual feedback. The feedback is perceived through the visual, kinesthetic and auditory senses and affects the performer, who continually revises the musical intent. An ordinary instrument offers limited haptic and visual feedback, in the sense that it is only the motion and appearance of the instrument itself. However, one can imagine a computer-based or virtual instrument that provides richer or more interesting content. We extend and generalize this model by replacing the usual real or virtual instrument with the entire virtual environment, i.e. a world instead of an object (see Figure 1). We feel that this distinction between a world and an object is important from the system design point of view, especially when we consider the effects of extending the control/display range, providing "compelling" multimodal feedback, and the feeling of presence.
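To make the loop concrete, the sketch below simulates one reading of this closed-loop model. It is only an illustration of the feedback structure, not the authors' system: the environment function, gains and state variables are placeholder assumptions of our own.

```python
# Hypothetical sketch of the closed-loop performance model:
# intent -> motor action on the control surface -> multimodal feedback
# -> revised intent. All functions and constants are illustrative.

def environment(gesture):
    """Stand-in for the VME: maps a control gesture to fused
    visual/auditory/haptic feedback (here just a scalar)."""
    return 0.95 * gesture  # e.g. a slightly damped response

def perform(target, steps=20):
    intent = 0.0
    for _ in range(steps):
        gesture = intent                     # motor system enacts the intent
        feedback = environment(gesture)      # world responds multimodally
        intent += 0.3 * (target - feedback)  # performer revises the intent
    return intent

if __name__ == "__main__":
    print(perform(1.0))  # the intent settles near the musical target
```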

We hope such elements will bring about new avenues for appreciating, learning and performing music.

Figure 1: Architecture of VME (adapted from [Pressing 94]): the performer (intent, central nervous system, motor system, proprioception, vision, auditory sense) in a closed loop with the VME (VR devices, stereo 3D computer graphics, computer-driven sound generation).

2.2 Design Requirements Hypotheses

Although music appreciation and performance are usually regarded as separate tasks, we feel that music performance will, at least partially, become important for "actively" appreciating music and will increase the effect and level of music appreciation. Here, music appreciation refers to the ability to recognize and learn the content of the music (e.g. flow, rhythm, beat, expression, tempo, tension) without mastering the movement skills needed to play or conduct particular instruments. Much of the literature points to the importance of presence for task performance and learning in VR settings [Durlach 98][Slater 96][Steuer 92][Sheridan 92]. We are interested in whether this also holds for music applications. In the context of interactive music environments, we define "presence" as the sense of visitation/immersion in the "musical space" and the sense of first-hand control (of the music, not necessarily the instrument) and participation (i.e. "Am I really part of this harmony?"). According to this definition, we identify the following elements as important in providing presence: (1) sensory immersion (e.g. visual field of view, control range, first-person viewpoint, multimodality); (2) spatial and logical immediacy between the user and the virtual objects (this requires the virtual objects to represent various parts of the given music rather than the instruments); and (3) control immediacy, which refers to the ease of recognizing interaction between the musical objects and the synchronization between different modalities. This definition and its causal elements are adapted from existing research on presence [Durlach 98][Slater 96][Steuer 92][Sheridan 92]. The level of learning performance in music may be defined by how well, in a fixed number of practice sessions, users can recognize a given musical piece, associate portions of the music with the appropriate tension and emotion, remember the appropriate tempo/expression/notes at various parts of the music, and so on. We believe that our main hypothesis will be valid only when users are free from having to worry about the skill side of the performance (e.g. striking the right note). For general users, this means that a convenient and "highly controllable" user interface, aided by computer music technology, will be needed. Here, controllability is defined as the ability to strike the necessary notes at the right time to progress the music appropriately, while retaining enough cognitive and kinesthetic room to control the other factors of the music: expression, tempo, emphasis and so on. We argue that two of the presence elements, spatial and control immediacy, are also important for high system controllability. We think that expert players use mental visualization to enhance their performance, and further argue that, whatever form it takes, it must be based on the musical content. We can thus hypothesize that novice to mid-level players need to "see" the musical information
(e.g. notes), and that visualization based on musical content might help them play better and gain more insight into the music as well. Here is a summary of our hypotheses for the design of a VME:

1) Performance and active participation increase the level and effect of music appreciation and learning.
2) To see the effect of performance in music education, we must overcome the "skill problem" by simplifying the performance task and designing a highly controllable user interface.
3) Presence is increased by sensory immersion, spatial/logical immediacy and control immediacy, and presence increases the level and effect of music appreciation and learning.
4) Spatial immediacy and control immediacy are achieved by modeling the VME with musical objects rather than real-world objects such as instruments.
5) Spatial immediacy and control immediacy also increase controllability.
6) The musical information manipulated and controlled by the user must be present in the feedback (visual, aural or haptic).

3. Performance and Interaction Model in VME

As indicated above, a highly controllable user interface with a simplified performance model is a prerequisite for an effective VME. For such a performance model, we first define the minimal parameters the user must control to perform a musical piece and still feel in control, and use computer music technology to handle the remaining musical parameters automatically. The two main acoustic elements that an artist manipulates in shaping a performance are intensity and duration. Music played with tones of equal intensity and exact (notated) duration sounds lifeless and mechanical; it is variations in duration and intensity that create the living shapes in music [Repp 93]. We feel that, for direct interaction with the music and thus increased presence, one should be able to control the overall tempo and expression (as a conductor does) and also the duration and intensity at the individual note level (as a concert pianist does). We propose a mixed conductor-instrument interface that allows the user to globally control the tempo and expressiveness of the music flow while controlling a particular instrument at the note level. This is analogous to a conductor who also plays the solo instrument in a concerto; a sketch of this division of control follows.
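The sketch below illustrates, under our own assumptions (the paper gives no implementation, and the function name and scaling constants are placeholders), how the two control streams might combine into the parameters of a single note: per-note tap strength sets the intensity, while a global conductor-style tempo factor stretches the notated duration.

```python
def note_event(tap_velocity, nominal_beats, tempo_bpm, tempo_factor):
    """Combine note-level (instrument-like) and global (conductor-like)
    control into the parameters of one played note. Hypothetical mapping;
    the paper does not publish one.

    tap_velocity  : 0..1 strength of the finger tap -> note intensity
    nominal_beats : notated duration of the note, in beats
    tempo_bpm     : base tempo of the score
    tempo_factor  : conductor-hand scaling (>1 faster, <1 slower)
    """
    midi_velocity = max(1, min(127, round(tap_velocity * 127)))
    seconds_per_beat = 60.0 / (tempo_bpm * tempo_factor)
    return midi_velocity, nominal_beats * seconds_per_beat

# A firm tap on a quarter note at 120 BPM, conductor hand at half speed:
print(note_event(0.8, 1.0, 120, 0.5))  # -> (102, 1.0)
```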

Such an interface combines the merits of the conductor-based and instrument-based interfaces, allowing both global and minute note-level control. Playing even just one track can be an error-prone task, and thus we suggest the concept of air-playing. For example, the instrument-based part of the interface is basically a virtual piano keyboard: the user simply taps continuously (once for each note of the melody) on the interface. We feel that a tap is the simplest and most natural interface for this particular task. A special sensor that detects the user's tap and its intensity has been designed for this purpose. Because the sensor works by detecting vibration, a tap can be sensed within a wide range around the sensor's location, which enlarges the control range significantly. We assume that in a multi-track musical score there is one track, which we call the "Driving" track, that represents the main melody (the other tracks are called the "Supporting" tracks), and this is the track the user controls (unless the user wants to control some other track). As the notes in the Driving track are played upon finger taps (sent as MIDI notes to the sound-processing hardware in real time), all notes from the Supporting tracks that fall before the next melody note are played automatically at the current tempo (a code sketch of this rule is given at the end of Section 4.1). For better controllability, the information display must be directly driven by the elements controlled through the proposed interface, and it must display both global properties (e.g. tempo/expression) and individual note-level information. We call this the "correspondence" requirement. Even with this requirement, there may be thousands of ways of visualizing and displaying music. We propose to use three-dimensional motion as the main theme of the display. This is based on literature reporting the existence of an internal human representation of music in motion form, called the "inner motion", used for both music interpretation and performance [Repp 93][Epstein 95]. Therefore, the music is visualized graphically, juxtaposed with the necessary musical information and metaphorical motion models, in hopes of striking a chord with the human's internal motion-based representation of music.

4. The Experiment

4.1 Testbed Design

Our final design of the VME employs a two-handed interface, in which the dominant hand controls the music progression and individual note volume through finger tapping (air piano playing) of the major melodic track, while the other hand controls the global tempo through a simple squeeze/release gesture. We believe that the air-piano-playing interface is simple and easy to use, and that it offers a sense of direct control and presence through the passive haptic sensation. The squeezing metaphor draws on the human sense of proprioception (i.e. the finger joints) and imposes minimal cognitive load while the dominant hand simultaneously controls music progression and expression. The visual display uses a roller-coaster metaphor that "moves" according to the notes and expression of the music as it is progressed and varied by the user. The visualization shows note intensities, note frequencies, and the overall volume and tempo. In Figure 2, the squares represent the intensity and pitch of each melody note; the bars at the right and left ends show the current volume and tempo.

Figure 2: Visualization used in the VME testbed.
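The Driving-track rule described in Section 3 can be summarized in code. The following is only our reconstruction of the described behavior, with placeholder names (`on_tap`, `emit`) standing in for the testbed's real-time MIDI output, which the paper does not detail:

```python
def on_tap(tap_velocity, state, driving, supporting, tempo_bpm, emit):
    """One finger tap: play the next Driving-track note, then schedule
    every Supporting-track note that falls before the following melody
    note, at the current tempo. (Illustrative reconstruction only.)

    driving    : [(beat, pitch)] main melody, sorted by beat
    supporting : [(beat, pitch, velocity)] other tracks, sorted by beat
    state      : {'next_drive': int, 'next_support': int}
    emit       : callback(pitch, velocity, delay_seconds)
    """
    i = state['next_drive']
    if i >= len(driving):
        return  # the piece is finished
    beat, pitch = driving[i]
    emit(pitch, int(tap_velocity * 127), 0.0)  # melody note, played now

    spb = 60.0 / tempo_bpm  # seconds per beat at the current tempo
    horizon = driving[i + 1][0] if i + 1 < len(driving) else float('inf')
    j = state['next_support']
    while j < len(supporting) and supporting[j][0] < horizon:
        s_beat, s_pitch, s_vel = supporting[j]
        emit(s_pitch, s_vel, (s_beat - beat) * spb)  # accompaniment
        j += 1
    state['next_drive'] = i + 1
    state['next_support'] = j

# Example: tap through a two-note melody with one accompaniment note.
state = {'next_drive': 0, 'next_support': 0}
log = lambda p, v, d: print(f"note {p:3d} vel {v:3d} in {d:.2f}s")
on_tap(0.7, state, [(0, 60), (1, 62)], [(0.5, 48, 80)], 120, log)
```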
4.2 The Task

The experiment compares the controllability of the proposed mixed interface to purely instrument-based and conductor-based approaches. Whether to provide visualization (with information correspondence) was another variable in measuring the controllability of the system. The proposed mixed interface requires automatic accompaniment by design, and for a fair comparison this resulted in a total of eight system configurations to be tested, as shown in Table 1. A total of 21 subjects (18 men, 3 women) participated in the study, and every participant went through the eight experiments in random order.

  Interface                             Hands  Visualization  Experiments
  Instrument-based                      One    No             Exp 1, Exp 2
  Conductor-based                       One    No             Exp 3, Exp 4
  Conductor-based                       One    Yes            Exp 5, Exp 6
  Instrument + Conductor-based (mixed)  Two    Yes            Exp 7, Exp 8

Table 1: The experiment design.

The task given to the subjects was to perform a very simple tune ("Twinkle, Twinkle, Little Star") in a normal mode and in six varied styles: at a tempo two times slower, at a tempo two times faster, with half the intensity, with twice the intensity, with an alternating tempo (slow and fast), and with an alternating intensity (strong and weak). The task order was randomized for each participant. The tune was chosen because virtually everyone is familiar with it. Quantitative performance data (e.g. note timing, tempo, intensity) were collected to indirectly evaluate how well the subjects followed the playing instructions. In addition, participants were given a questionnaire about their interest and the perceived task difficulty.
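The collected note timings allow a simple tempo-following measure. The sketch below is a hypothetical analysis of our own devising (the paper does not state its evaluation procedure): it estimates the achieved tempo ratio from mean inter-onset intervals, the kind of measure that could produce the tempo findings reported below.

```python
def achieved_tempo_ratio(base_onsets, test_onsets):
    """Estimate how much faster (>1) or slower (<1) the test performance
    was than the base performance, from mean inter-onset intervals.
    A hypothetical measure, not the authors' published procedure."""
    def mean_ioi(onsets):
        iois = [b - a for a, b in zip(onsets, onsets[1:])]
        return sum(iois) / len(iois)
    # Shorter intervals mean a faster tempo, hence the inverted ratio.
    return mean_ioi(base_onsets) / mean_ioi(test_onsets)

# Instructed to double the tempo (target ratio 2.0) but achieving ~1.5,
# as the subjects in Section 4.3 consistently did:
base = [0.0, 0.5, 1.0, 1.5]     # onsets (s) of the normal-mode performance
fast = [0.0, 0.33, 0.66, 0.99]  # onsets of the "twice as fast" attempt
print(achieved_tempo_ratio(base, fast))  # ~1.52
```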

4.3 Results

As we expected, the instrumental gesture was adequate for fine (note-level) control of both tempo and volume. On the other hand, the conductor-based interface allowed a smoother transition between different control phases. The mixed approach was shown to combine the merits of both approaches without the significant performance degradation that might be expected from the increased cognitive load of using both hands (see Figure 3).

Figure 3: The target performance (above) and the performance of the mixed interface (below).

When using the conductor-based interface, subjects were affected by the presence of visualization more strongly than subjects who used the instrument-based interface. It was also found that the conductor-based interface generated more interest from the users, who also felt it to be relatively easier to use (see Figure 4). Thus, we conjecture that users of the conductor-based interface have more cognitive room to concentrate on the visualization and are therefore more affected by it. However, for tempo control, users of all interfaces seemed to ignore the tempo information displayed through the visualization and used their own scale; for instance, users consistently increased the tempo by only 50% when instructed to double it. Interestingly, this was not the case for volume control.

5. Summary and Future Work

In this paper, we have presented a model of the VME and proposed some of its design requirements. Since one prerequisite of the proposed design requirements is a highly controllable, easy-to-use interface with a simplified task model, we proposed a two-handed mixed interface that employs both conductor-based and instrument-based interaction. The music performance task is simplified to simply having to "tap" on the right note for the main melody only (the "Driving" track). Our experimental results confirm that the interface produced a fairly controllable system in both its resolution and its number of control parameters. Other interesting results also surfaced from the study. We plan to continue testing the effect of presence and the other hypothesized design requirements using our basic VME design.

Figure 4: Difficulty and user interest for the interfaces.

Acknowledgement

This project has been supported in part by the Ministry of Information and Communication and the Electronics and Telecommunications Research Institute (ETRI) of Korea.

References

[Epstein 95] Epstein, D. Shaping Time. Schirmer Books, 1995.
[Repp 93] Repp, B. Music as Motion: A Synopsis of Alexander Truslit's (1938) Gestaltung und Bewegung in der Musik. Psychology of Music, 1993.
[Pressing 94] Pressing, J. Cybernetic Issues in Interactive Performance Systems. Computers & Graphics, 18(5), 1994.
[Gesture 99] Discussion Group on Gesture Research in Music. Documentation available at http://www.ircam.fr/equipes/analysesynthese/wanderle/Gestes/Externe/index.html, 1999.
[Kim 99] Kim, G. and Hwang, J. Musical Motion: A Medium for Uniting Visualization and Control of Music in the Virtual Environment. VSMM 99, 1999.
[Durlach 98] Durlach, N. and Slater, M. Presence in Shared Virtual Environments and Virtual Togetherness. BT Workshop on Presence in Shared Virtual Environments, June 1998.
[Slater 96] Slater, M., Linakis, V., et al. Immersion, Presence, and Performance in Virtual Environments: An Experiment with Tri-Dimensional Chess. ACM Virtual Reality Software and Technology (VRST), 1996.
[Steuer 92] Steuer, J. Defining Virtual Reality: Dimensions Determining Telepresence. Journal of Communication, 42(4):73-93, 1992.
[Sheridan 92] Sheridan, T. B. Musings on Telepresence and Virtual Presence. Presence: Teleoperators and Virtual Environments, 1:120-125, 1992.