Coney Island: Combining jMax, Spat and VSS for Acoustic Integration of Spatial and Temporal Models in a Virtual Reality Installation

Robin Bargar* (rbargar@ncsa.uiuc.edu), François Déchelle~ (Dechelle@ircam.fr), Insook Choi*, Alex Betts*, Camille Goudeseune*, Norbert Schnell~, and Olivier Warusfel~

*University of Illinois at Urbana-Champaign, 405 N Mathews, Urbana, IL 61801
~IRCAM, 1 Place Igor Stravinsky, 75004 Paris, France

Abstract

We present a case study of sound production and performance that optimize the interactivity of model-based VR systems. We analyze problems for audio presentation in VR architectures and demonstrate solutions obtained with a model-based, data-driven component architecture that supports interactive scheduling. Criteria and a protocol for coupling jMax and VSS software are described. We conclude with recommendations for diagnostic tools, sound authoring middleware, and further research on sound feedback methods to support a topology of interacting observers.

1. The VR Audio Problem Space

We configure Virtual Reality to provide real-time interaction with geometric and temporal models. The dominant medium for displaying immersive VR is animated computer graphic images. While VR is frequently referred to as a "space" based upon 3D models, in most VR systems the primary physical space is constrained to a flat surface onto which an image is projected. A virtual camera view is generally a pyramid with its apex at the viewing position, defining a viewing volume that expands symmetrically into a geometric 3-space. Stereo image computation generates depth cues by offsetting 2D images. For the most part, visual "3D" immersion results from 2D frontal image projection enhanced by interactive camera mobility. Introducing sound into a flat, frontal visual field often induces cinematic solutions: imitating space rather than simulating it, that is, fixed resonance characteristics and fixed timing of wave propagation rather than spatio-temporal simulation.

3D audio has been touted as an important attribute of VR. However, most VR systems presented at academic conferences and trade shows provide only rudimentary sound-file playback, often coupled to MIDI-enabled devices. The significant limitation of this approach is the absence of information concerning an interactive simulation. Pre-recorded sounds and MIDI sequences can provide at best a rough approximation of the behavior of a simulated system; they are mere imitations that cannot provide accurate insight into the states of a real-time simulation. While they may confirm simple interaction, pre-determined sounds minimize the acoustic relevance of the degrees of freedom in a simulated system.

If auditory feedback can improve the user's spatial orientation and sense of real-time interaction, why is VR audio typically limited to triggering pre-recorded sound-file or MIDI-file playback? A short answer is "there are no standard alternatives." Unlike the comprehensive hardware solutions provided by proprietary graphics subsystems, there have been few vendor-supported efforts toward sound synthesis subsystems, and historically it has proven difficult to argue for the commercial profitability of sound synthesis on general computing platforms. Several problem areas must be addressed to establish and maintain a flexible audio development subsystem in VR. These include real-time scheduling and synchronization of sounds, graphics, and control signals from interactive sensors.
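The following minimal sketch, written for this discussion and not taken from the Coney Island sources, illustrates the scheduling problem just named: a simulation loop running at graphics frame rate produces time-stamped control updates, and an audio-rate consumer applies each update only when its timestamp comes due. The type names, the 60 Hz and 1 kHz rates, and the single "amplitude" parameter are all hypothetical; in a real system the two loops run concurrently in separate threads or processes.

```cpp
#include <cstdio>
#include <queue>

struct ControlUpdate {
    double time;      // simulation time the update refers to
    float  amplitude; // e.g. derived from collision energy in the simulation
};

int main() {
    std::queue<ControlUpdate> toAudio;  // graphics-to-audio message queue

    // Simulation/graphics side: a 60 Hz frame loop producing updates
    // (standing in for interpreted VR data such as collisions or motion).
    const double frameDt = 1.0 / 60.0;
    for (int frame = 0; frame < 6; ++frame) {
        toAudio.push({frame * frameDt, 0.1f * frame});
    }

    // Audio side: a faster control clock (here 1 kHz) that applies each
    // update only once its timestamp has been reached, so that sound
    // parameters stay synchronized with the simulation clock.
    const double audioDt = 1.0 / 1000.0;
    float amplitude = 0.0f;
    for (int block = 0; block < 100; ++block) {
        double now = block * audioDt;
        while (!toAudio.empty() && toAudio.front().time <= now) {
            amplitude = toAudio.front().amplitude;
            toAudio.pop();
            std::printf("t = %.3f s: amplitude -> %.2f\n", now, amplitude);
        }
        // ... synthesize one block of samples at 'amplitude' here ...
    }
    return 0;
}
```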
At a higher level of abstraction, grammars are needed for describing sound synthesis in relation to VR models and events. Traditional audio paradigms such as multi-track recording and sample-based or orchestra-score descriptions of sound production were developed in an era when virtual experiences could only be imitated. In our experience these paradigms are not compatible with the coherent spatial and temporal models central to VR.

In this paper we discuss sound production and performance that optimize the interactivity of model-based virtual systems. We present a case study of a VR installation called Coney Island. The functional software roles of the Coney Island architecture include (1) a VR authoring and rendering system, (2) sound synthesis engines, (3) spatial audio DSP, (4) sound authoring, and (5) scheduling and synchronization of sounds with graphics and simulations. Software from several research centers was combined to fill these roles:

(1) A VR authoring environment called ScoreGraph (Choi 1998) was used to create the Coney Island simulations, graphics, message passing and interactive scheduling. ScoreGraph provides the context that determines the requirements for interoperability of sound production modules.
(2) Sound synthesis was provided by VSS (Bargar 1994), including synthesis engines from the STK toolkit (Cook 1995), and by jMax.
(3) Distance and directional cues for sound sources were generated using the Spatialisateur (Spat) (Jot 1995) running in jMax; the listener-relative quantities involved are sketched in the first example below.
(4) VSS performs VR data interpretation to generate synthesis parameter control messages.
(5) These messages are exchanged and scheduled in real time by ScoreGraph and VSS; the second example below sketches one possible form of such an exchange.

Our discussion proceeds from the VR environment to the VR software architecture, and then to sound production.
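Referring to item (3) above, this first sketch shows the kind of listener-relative quantities a spatializer such as Spat consumes per source: azimuth, elevation, distance, and here a simple 1/distance gain. The geometry is standard; the struct, function names, and coordinate convention are invented for this example and do not reflect Spat's actual control interface.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

struct Vec3 { double x, y, z; };   // y is the vertical (up) axis

struct SourceCues {
    double azimuthDeg;   // horizontal angle, 0 = straight ahead of the listener
    double elevationDeg; // vertical angle above the horizontal plane
    double distance;     // meters from listener to source
    double gain;         // simple 1/d attenuation, clamped near the head
};

// Listener faces the -z direction when yawDeg == 0 (a common camera convention).
SourceCues computeCues(const Vec3& listener, double listenerYawDeg,
                       const Vec3& source) {
    const double dx = source.x - listener.x;
    const double dy = source.y - listener.y;
    const double dz = source.z - listener.z;
    const double horizontal = std::sqrt(dx * dx + dz * dz);
    const double radToDeg = 180.0 / 3.14159265358979323846;

    SourceCues c;
    c.distance     = std::sqrt(dx * dx + dy * dy + dz * dz);
    c.azimuthDeg   = std::atan2(dx, -dz) * radToDeg - listenerYawDeg;
    c.elevationDeg = std::atan2(dy, horizontal) * radToDeg;
    c.gain         = 1.0 / std::max(c.distance, 1.0);
    return c;
}

int main() {
    // A source two meters to the right of and two meters in front of the listener.
    SourceCues c = computeCues({0.0, 1.7, 0.0}, 0.0, {2.0, 1.7, -2.0});
    std::printf("azimuth %.1f deg, elevation %.1f deg, distance %.2f m, gain %.2f\n",
                c.azimuthDeg, c.elevationDeg, c.distance, c.gain);
    return 0;
}
```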
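For items (4) and (5), this second sketch shows one possible form of the exchange: the VR application sends small text control messages over UDP to a separately running sound server. The message format, port number, and function names are invented for illustration; they are not the actual VSS or jMax protocols.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Send one text control message, e.g. "source 3 position 1.5 0.2 -4.0",
// to a sound server assumed to be listening on a UDP port (7000 is arbitrary).
bool sendControlMessage(const std::string& message,
                        const char* host = "127.0.0.1", int port = 7000) {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) return false;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(port);
    inet_pton(AF_INET, host, &addr.sin_addr);

    ssize_t sent = sendto(sock, message.c_str(), message.size(), 0,
                          reinterpret_cast<const sockaddr*>(&addr), sizeof(addr));
    close(sock);
    return sent == static_cast<ssize_t>(message.size());
}

int main() {
    // Per graphics frame, report an object's interpreted state so the sound
    // server can update the corresponding synthesis and spatialization source.
    sendControlMessage("source 3 position 1.5 0.2 -4.0");
    sendControlMessage("source 3 energy 0.42");
    return 0;
}
```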