An Implementation of a 3D Binaural Audio System within an Integrated Virtual Reality Environment

Rick Bidlack, Banff Centre for the Arts, Banff, Alberta, rbidlack@acs.ucalgary.ca
Dorota Blaszczak, Banff Centre for the Arts, Banff, Alberta, blaszcza@acs.ucalgary.ca
Gary Kendall, Northwestern University School of Music, Evanston, Illinois, gary@music.nwu.edu

Virtual Reality at the Banff Centre

The Banff Centre for the Arts has been engaged in research and development of interactive virtual environments since May 1992. This research is driven by the specific needs of approximately a dozen artists sponsored by the Banff Centre, who are exploring virtual reality (VR) as an aesthetic medium, many for the first time. Several of the works being produced use a VR helmet as the primary "venue." The helmet, or head-mounted display (HMD), fits over the top of the wearer's head and contains a pair of visual monitors mounted in front of the eyes, as well as an integrated pair of headphones for sound. This paper is concerned primarily with the aural aspects of this environment, and with the implementation of a simulated three-dimensional audio field presented binaurally over headphones.

Sound Design for VR Environments

Whether the audio playback occurs over loudspeakers or headphones, most VR environments make use of highly and elaborately spatialized sonic landscapes. In addition, the temporal relationship between events and sections, that is, the flow of component materials in the sonic environment, is ultimately specified only at "run time," through the VR participant's interaction with the virtual space. The sound designer of a VR installation must therefore work on two broad levels: the first is the basic construction and behavior of the sonic environment in time, as it responds to the participant's actions in virtual space; the second is the three-dimensional spatialization of sounds, and the choice of which materials should be spatialized and which should be treated in an ambient stereo field. Clearly, designing a sound environment for a VR installation is a great deal more involved than simply spatializing statically located sound sources.
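As a rough illustration of these two levels, the following Python sketch models a sound source whose temporal behavior is resolved only by run-time interaction and which is flagged either for 3D localization or for ambient stereo treatment. It is an illustration written for this summary, not part of the original system (which was built in MAX); all names are hypothetical.

    from dataclasses import dataclass

    @dataclass
    class SoundSource:
        # Level two: is this material spatialized, or left in an ambient stereo field?
        name: str
        localized: bool = True
        position: tuple = (0.0, 0.0, 0.0)   # world coordinates, used only if localized
        playing: bool = False

        # Level one: temporal behavior, specified only by the participant's actions.
        def on_event(self, event: str) -> None:
            if event == "enter_region":
                self.playing = True
            elif event == "leave_region":
                self.playing = False

    # A participant action arriving from the VR system triggers or stops the sound.
    wind = SoundSource("wind_loop", localized=False)                   # ambient bed
    bell = SoundSource("bell", localized=True, position=(2.0, 0.0, 1.5))
    bell.on_event("enter_region")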

Sound sources may arrive at the processing equipment by any number of means: as soundfiles, as samples in memory, as output from a CD player, as synthesized sound, or from a live microphone. Actions by the participant in the virtual environment cause sounds to be triggered or stopped, to be transformed in various ways, and parameters in the algorithms governing the realization of the sonic landscape to be changed. The MAX language [Puckette, 1991] is well suited to highly variable and interactive environments such as this.

Within the virtual environment, a particular sound may be attached to an object in space which moves relative to the participant's point of view. This relative motion may be due to the object itself (for example, it might be spiralling around the participant's head), or it may result from the head motions of the participant as she navigates through the space (that is, a stationary sound source moves relative to the participant as she moves her head from side to side or up and down). In either case, a localization algorithm is used to create the illusion of the sound emanating from any spherical direction. The algorithm is also capable of moving a sound smoothly from one point to any other. Many sound sources are presented to the participant as single point sources, but certain effects depend on the ability to present the sound source as a wide field. This requires the use of several localizers running in tandem.

Head-Related Transfer Functions

The physical properties of the torso, head and pinna (outer ear) determine how acoustical information is presented to the ears, in terms of phase shifts and attenuation changes in various frequency bands. These changes in the original signal are directionally dependent, and thus allow directional judgements to be made. Even though the head-related transfer functions (HRTFs) which describe these changes are very rich in acoustic detail, perceptual research indicates that the auditory system is selective in the acoustic information that it utilizes in making judgements of sound direction. Certain ambiguities in localization persist despite the information gained from HRTFs. For example, while front/back discrimination is possible on the basis of information contained in measured HRTFs, it is also clear that head movement plays a dominant role in resolving front/back differences [Wallach, 1940]. In addition, almost every researcher has noted that pinna transfer functions vary tremendously from one individual ear to the next. Numerous researchers [Butler & Belendiuk, 1977; Morimoto & Ando, 1983; Wightman & Kistler, 1989] have demonstrated that it is quite possible for one person to utilize the spatial hearing cues recorded with another person's ears. It appears that some individuals' HRTFs can improve other individuals' localization accuracy, but that large differences between individuals' HRTFs can undermine localization.
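The localization algorithm takes a direction relative to the listener's head. As a minimal sketch of how that direction might be obtained from the tracker's report of head position and orientation together with an object's position (Python/NumPy, written for this summary; the coordinate conventions and names are assumptions, not the Banff implementation):

    import numpy as np

    def head_relative_direction(source_pos, head_pos, head_rotation):
        """Return (azimuth_deg, elevation_deg, distance) of a source relative to the head.

        source_pos, head_pos : 3-vectors in world coordinates.
        head_rotation        : 3x3 rotation matrix mapping head-frame vectors into the
                               world frame, as a head tracker might report.  Assumed
                               head frame: x = right, y = up, z = forward.
        """
        v_world = np.asarray(source_pos, float) - np.asarray(head_pos, float)
        v_head = head_rotation.T @ v_world            # inverse rotation = transpose
        distance = np.linalg.norm(v_head)
        if distance == 0.0:
            return 0.0, 0.0, 0.0
        x, y, z = v_head
        azimuth = np.degrees(np.arctan2(x, z))        # 0 = straight ahead, positive to the right
        elevation = np.degrees(np.arcsin(np.clip(y / distance, -1.0, 1.0)))
        return azimuth, elevation, distance

    # A source two meters ahead of and slightly above a level, forward-facing head.
    az, el, dist = head_relative_direction([0.0, 0.4, 2.0], [0.0, 0.0, 0.0], np.eye(3))

Whether the object moves or the head turns, only the inputs to such a computation change from frame to frame; the resulting azimuth and elevation are what index the directional filters described next.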

The HRTFs used in the Banff Centre's binaural localization algorithm were derived from acoustic measurements of the KEMAR mannequin made at Northwestern University [Kendall et al., 1988]. The electronic measurement system was a Crown TEF-10 analyzer, which implemented time-delay spectrometry. The analyzer produced a sine-wave sweep which was transduced by an MDM TA-2 loudspeaker and then recorded with a Knowles microphone capsule in the ear canal of the mannequin. Environmental sound was eliminated from the recordings by making all measurements within an anechoic chamber.

The KEMAR mannequin was mounted laterally on a rotating mechanical arm. Full rotation provided a 360-degree range of azimuthal directions for a fixed elevation relative to the mannequin's head and the measurement loudspeaker. The angle of elevation was set by changing the angle between the mounting arm and the imaginary line from the center of KEMAR's head to the loudspeaker. The measurements of KEMAR were collected at 10-degree resolution in both azimuth and elevation. Tests for repeatability showed that measurements made for the same angle at different times varied by no more than 2 dB. Reference measurements were also taken with the microphone held in free space at the same position as the center of KEMAR's head.

All the measurements were transferred from the Crown TEF analyzer to a network of general-purpose UNIX computers (Pyramid 90X and Sun workstations). In order to obtain free-field HRTFs, the reference measurements (which captured the combined microphone-loudspeaker response) were divided out of each KEMAR measurement. Impulse responses were obtained through application of the inverse discrete Fourier transform.

Current Implementation

The current state of the art requires that a fairly daunting collection of technologies be brought together, and that a cohesive communication protocol be established between the individual devices. A minimal VR environment at the Banff Centre typically consists of the following: an HMD is worn by the VR participant, whose head motions are tracked by a magnetic-field tracker suspended from the ceiling. Data from the tracker passes through an interface via a serial line to either a pair of SGI VGX computers or an SGI Onyx rack. A three-dimensional model of a previously constructed virtual space runs on the SGI. Based on the data received from the tracking device, the SGI synthesizes a view of the virtual space that corresponds to the orientation of the participant's head in real space. To localize sound, data on head orientation as well as the positions of sound-containing objects in the virtual space are passed on to a NeXT computer via a private TCP/IP channel. The NeXT contains one or more Ariel/IRCAM ISPW DSP boards running MAX. A custom "external" MAX object, called hrtfconv, is passed data for the azimuth, elevation and distance of the sound to be localized. The complete set of HRTFs from Northwestern University is represented as an indexed array of pre-computed FFTs, which are convolved with the source signal to effect the directional transformation of the signal. In addition, non-localized sounds are processed on the ISPW boards to perform the type of on-the-fly sonic construction discussed above.
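The two processing stages just described, deriving free-field impulse responses from the measurements and then convolving source signals against an indexed array of pre-computed FFTs, can be sketched as follows. This is an illustrative Python/NumPy pipeline written for this summary, not the ISPW code: the variable names, block sizes, stand-in data and the small regularization term are all assumptions, and only one ear is shown.

    import numpy as np

    def free_field_hrir(ear_spectrum, reference_spectrum, eps=1e-8):
        # Divide the reference measurement (the combined loudspeaker/microphone
        # response) out of the ear-canal measurement, then apply an inverse
        # discrete Fourier transform to obtain the impulse response.  The eps
        # term guards against near-zero reference bins (a practical safeguard,
        # not part of the published procedure).
        hrtf = ear_spectrum / (reference_spectrum + eps)
        return np.real(np.fft.ifft(hrtf))

    def localize_block(block, hrir_fft, fft_size, tail):
        # Filter one block of source signal with a pre-computed HRTF spectrum
        # by overlap-add; 'tail' carries the convolution overhang into the
        # next block.  Requires len(block) + filter_taps - 1 <= fft_size.
        B = len(block)
        y = np.fft.irfft(np.fft.rfft(block, fft_size) * hrir_fft, fft_size)
        y[:fft_size - B] += tail
        return y[:B], y[B:]

    # Stand-in measurement spectra for one direction and one ear.
    n = 512
    ear = np.fft.fft(np.random.randn(n))
    ref = np.fft.fft(np.random.randn(n))
    hrir = free_field_hrir(ear, ref)[:128]        # keep a 128-tap impulse response

    fft_size, B = 512, 256
    hrir_fft = np.fft.rfft(hrir, fft_size)        # one entry of the indexed FFT array
    tail = np.zeros(fft_size - B)
    source = np.random.randn(4 * B)
    output = []
    for i in range(0, len(source), B):
        block_out, tail = localize_block(source[i:i + B], hrir_fft, fft_size, tail)
        output.append(block_out)
    output = np.concatenate(output)               # directionally filtered signal, one ear

Pre-computing the FFTs of the filters for every measured direction, as the system does, means that only the source blocks need to be transformed at run time.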

Two modes of localization are currently available, which determine the number of sources that can be separately localized at the same time. Running as a free-field spatialization algorithm, each ISPW board can localize four sources (two per processor). If room simulation (reverberation and discrete echoes) is desired, then only two sources can be localized per board (one per processor).

References

R. A. Butler and K. Belendiuk, "Spectral cues utilized in the localization of sound in the median sagittal plane," Journal of the Acoustical Society of America, 61, pp. 1264-1269, 1977.

G. S. Kendall, W. L. Martens and M. D. Wilde, "A spatial sound processor for loudspeaker and headphone reproduction," The Sound of Audio: Proceedings of the AES 8th International Conference, 1988.

Morimoto and Ando, "On the simulation of sound localization," Journal of the Acoustical Society of Japan, 74, pp. 873-887, 1983.

M. Puckette, "Combining event and signal processing in the MAX graphical programming environment," Computer Music Journal, 15:3, pp. 68-77, 1991.

H. Wallach, "The role of head movements and vestibular and visual cues in sound localization," Journal of Experimental Psychology, 27:4, pp. 339-368, 1940.

F. L. Wightman and D. J. Kistler, "Headphone simulation of free-field listening. II: Psychophysical validation," Journal of the Acoustical Society of America, 85, pp. 858-867, 1989.