sndSpace: A Graphical 3-D Stereo Software Tool

Gavin R. Starks, Department of Music, 14 University Gardens, University of Glasgow, Scotland, G12 8QH
Ken Linton, SORT, Banque Paribas, 33 Wigmore Street, London, W1H 0BN

Abstract

Demands for spatial processing are increasing not only in music production, but in the rapidly growing field of virtual reality. A graphical interface has been constructed which allows users to manipulate a large number of sources easily. This approach aims to meet the need for high-precision, multi-instrument spatialization without compromising simplicity of operation.

Introduction

Physical, mathematical and psychoacoustic theories allowing the accurate simulation of acoustic spaces are well established [Kleiner et al, 1993]. Research in the field of three-dimensional audio has focused mainly on commercial music processing, data visualisation and virtual reality (VR) systems. 3-D stereo creates convincing ambience and directional cues through headphones. Directional cues are obtained using digital filters which simulate the filtering performed by the shape of a listener's head and ears, referred to as the head-related transfer function (HRTF) [Blauert, 1983]. Ambience cues are achieved through the use of artificial reverberation [Sakamoto et al, 1975].

Processing implementations do not yet readily involve the user within the context that he or she may desire. Many commercial systems allow only a small number of inputs and are usually aimed at manipulating sound in a post-processing manner [Electronic Musician, 1992]. The development of flexible graphical interfaces, similar to those used in the design of graphical VR environments, is required to facilitate 3-D audio processing.

1 Basic Psychoacoustic Theory

Clearly the environment in which a sound emanates is a crucial factor in determining the resultant sound field heard by the listener. Physically accurate modelling of acoustic spaces yields computationally heavy algorithms. It is therefore desirable to obtain approximations which give satisfying distance and ambience cues and are more easily computable [Nielsen, 1993].

1.1 Acoustic Modelling

It is common practice to separate acoustic modelling into two areas: the 'early reflections' and the 'late reverberation'. Early echoes are quite distinct from the more isotropic form of the reverberant tail, and both areas can be modelled accordingly. Early reflections can be calculated by treating them as sources in 'virtual rooms' surrounding the listening room. Evaluating the distance, direction and attenuation of these virtual sources from successive shells results in an accurate simulation of the early signal.

1.2 Reverberation

Because of the computational load required to accurately model the dense pattern of echoes which forms the reverberation process, algorithms based on recursive digital delay networks are used which attempt to simulate the natural environment. The main problem in this form of simulation is that, even if a large number of time-varying delay lengths are used, it is difficult to eliminate undesirable resonance [Moorer, 1979], [Jot, 1991].

1.3 Head-Related Transfer Function

In binaural simulation, the two most obvious parameters to consider are the inter-aural time delay (ITD) and the inter-aural intensity difference (IID). Consideration of these cues alone, however, results in the localisation of sounds in only one dimension: along the axis between the ears.
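To make the ITD cue concrete, a minimal sketch follows (added here for illustration; it is not part of the sndSpace implementation). The classical Woodworth approximation treats the head as a rigid sphere of radius r and gives the extra path delay to the far ear as (r/c)(theta + sin theta) for a source at azimuth theta:

    #include <math.h>
    #include <stdio.h>

    /* Woodworth's spherical-head approximation to the inter-aural time
     * delay. theta: source azimuth in radians (0 = straight ahead,
     * pi/2 = directly to one side); r: head radius in metres.
     * Returns the far-ear delay in seconds. Illustrative only. */
    double itd_woodworth(double theta, double r)
    {
        const double c = 343.0;                 /* speed of sound, m/s */
        return (r / c) * (theta + sin(theta));
    }

    int main(void)
    {
        /* A typical head radius of 8.75 cm and a source at 90 degrees
         * give an ITD of roughly 0.65 ms -- the familiar maximum
         * inter-aural delay. */
        double az = 90.0 * 3.14159265358979 / 180.0;
        printf("ITD at 90 degrees: %.3f ms\n",
               1000.0 * itd_woodworth(az, 0.0875));
        return 0;
    }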
The human auditory system derives directional information by analysing the filtering performed by the torso, head, pinna and ear canal. Placing minute probe microphones in a listener's ear canals allows their individual frequency characteristics to be measured. Measurements are made at a large number of angles around the head, and the data from the resultant HRTF pairs are applied directly as FIR (finite impulse response) digital filters [Wightman and Kistler, 1989a,b]. This form of binaural sound processing is clearly subjective: HRTFs vary from person to person, so the effect of a single subject's measurements is reduced on other listeners. Front-back ambiguity and in-head localization are major problems, but they can be reduced by tracking the position of the listener's head [Begault, 1991, 1992].
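As an illustration of the FIR stage (a hypothetical sketch, not the authors' code), the measured impulse-response pair for one direction can be applied to a mono signal by direct convolution:

    /* Minimal sketch: apply a measured HRTF pair as direct-form FIR
     * filters to a mono input, producing binaural stereo output for
     * one fixed source direction. */
    void hrtf_fir(const float *in, long n,          /* mono input      */
                  const float *hl, const float *hr, /* L/R HRTF taps   */
                  long taps,
                  float *out_l, float *out_r)       /* binaural output */
    {
        for (long i = 0; i < n; i++) {
            double accl = 0.0, accr = 0.0;
            /* convolution: y[i] = sum over k of h[k] * x[i-k] */
            for (long k = 0; k < taps && k <= i; k++) {
                accl += hl[k] * in[i - k];
                accr += hr[k] * in[i - k];
            }
            out_l[i] = (float)accl;
            out_r[i] = (float)accr;
        }
    }

For the filter lengths involved, the implementation described in Section 2 performs this convolution in the frequency domain instead, which is substantially cheaper.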

2 Implementation

The application of artificial approximations of the environment in conjunction with HRTFs should result in a convincing 'virtual' sound space. In this implementation, filters are created which simulate the effects of air and wall absorption. These are convolved (in the frequency domain) with the appropriate HRTF for each direct/reflected signal. Global reverberation is then added using a combination of comb and all-pass filters.

2.1 Csound

Csound is an industry-standard, C-based computer music language which allows users to manipulate signals with an extremely high degree of precision. Processing centres around two files: an 'orchestra' file and a 'score' file. The orchestra file contains definitions of how sounds ('instruments') are to be created or manipulated, and its associated score file contains the specific data pertaining to each event. In this application, routines have been incorporated into the structure of Csound which enable the user to accurately position an arbitrary number of sound sources in a defined space by the addition of a simple function call in their Csound programs. The orchestra function call is of the form

    asig1, asig2 sndspace asound, "pathfile", xr, yr, zr, xu, yu, zu, xh, yh, zh, xs, ys, zs, w1, w2, w3, w4, w5, w6

where
    asig1, asig2 are the (stereo) output signals from the calculations,
    sndspace is the name of the function,
    asound is the input sound,
    "pathfile" is an (optional) file containing all positional data (generated by the graphical interface),
    xr, yr, zr are the dimensions of the room (in metres),
    xu, yu, zu are the user's coordinates,
    xh, yh, zh give the direction in which the listener's head is pointing,
    xs, ys, zs are the source coordinates,
    w1..w6 are the absorption coefficients of the walls.

Since the spatial processing routines incorporated into Csound are written entirely in C, they are easily portable across a number of platforms. Clearly, entering this data manually is tedious, especially if parameters are to be varied (e.g. if the sound moves across the sound stage). To solve this problem, a graphical interface has been implemented to allow the user to enter all relevant data in a more intuitive manner.

2.2 Graphical Interface

The main problem in the implementation of the interface was maintaining the flexibility of control offered by Csound without compromising simplicity of operation. NeXT workstations allow easy manipulation of sample data, and NeXTSTEP provides excellent interface development tools. Figure 1 shows the two main control windows with which the user manipulates positional data. The Control Panel allows the user to initiate playback of any sampled sound, select the viewpoint for drawing and select the object to move (i.e. the Listener or the Source). Since the user is working with a 2-D display, standard elevation, end elevation and plan views are used for plotting. As the sound is playing, the positional data entered using the mouse is recorded. Subsidiary control windows allow individual points to be joined linearly over a specified time period. An 'Edit Room' window allows the (rectangular) room dimensions and the absorption coefficients of its walls to be varied with respect to time. Example values are also given which can easily be used as template data. Once all data has been input, the user simply has to click on a single button to execute the processing: the application creates the relevant Csound orchestra and score files and executes them.
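The global-reverberation stage described at the start of Section 2 combines comb and all-pass filters. The following is a minimal Schroeder-style sketch in C (illustrative only: the delay lengths, gains and network topology of the actual system are not given in the paper):

    #include <stdlib.h>

    /* Shared delay-line state for both filter types. */
    typedef struct { float *buf; int len, pos; float g; } Section;

    /* Allocate a silence-filled delay line of d samples, gain g. */
    static void section_init(Section *s, int d, float g)
    {
        s->buf = calloc((size_t)d, sizeof(float));
        s->len = d; s->pos = 0; s->g = g;
    }

    /* Recursive comb: y[n] = x[n] + g * y[n-d]. */
    static float comb_tick(Section *c, float x)
    {
        float y = x + c->g * c->buf[c->pos];
        c->buf[c->pos] = y;              /* store output for feedback */
        c->pos = (c->pos + 1) % c->len;
        return y;
    }

    /* Schroeder all-pass: v[n] = x[n] + g*v[n-d]; y[n] = v[n-d] - g*v[n]. */
    static float allpass_tick(Section *a, float x)
    {
        float vd = a->buf[a->pos];       /* v[n-d] */
        float v  = x + a->g * vd;
        a->buf[a->pos] = v;
        a->pos = (a->pos + 1) % a->len;
        return vd - a->g * v;
    }

    /* One output sample: four parallel combs summed, then two series
     * all-passes. */
    float reverb_tick(Section comb[4], Section ap[2], float x)
    {
        float s = 0.0f;
        for (int i = 0; i < 4; i++)
            s += comb_tick(&comb[i], x);
        s = allpass_tick(&ap[0], 0.25f * s);
        return allpass_tick(&ap[1], s);
    }

A caller would initialise, for example, comb delays of 1117, 1277, 1423 and 1559 samples with gains near 0.8, and all-pass sections of 225 and 556 samples with gain 0.7 (textbook figures for 44.1 kHz, assumed here rather than taken from the paper). A fixed network like this exhibits exactly the resonances noted in Section 1.2; mutually prime and time-varying delay lengths reduce, but do not eliminate, the problem [Moorer, 1979].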
The resultant sound can then be monitored, and the environment data edited if the result is not satisfactory. Since Csound allows any number of instruments in an orchestra, the user is not limited to the manipulation of a small number of sources. The object-oriented nature of NeXTSTEP facilitates the use of a large number of manipulation windows, allowing the user to enter positional data for many sources. Although it is desirable to create an 'environment template' common to all sources, the user is not constrained to doing this. Positional templates can also be created and normalised to fit within a particular time scale. Basic sound editing is also supported, allowing simple cut/copy/paste operations within or between sound files, sample rate conversion between 8013, 22050 and 44100 Hz, and format conversion between linear, mu-law and compressed encodings, in mono or stereo.
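As an aside on the format conversion mentioned above: the mu-law encoding used by 8013 Hz NeXT/Sun sound files is the standard G.711 companding curve. A minimal sketch of the 16-bit linear to 8-bit mu-law direction (illustrative; the application would use the NeXT sound library's own conversion routines):

    #include <stdint.h>

    /* Encode a 16-bit linear sample to 8-bit mu-law (G.711-style
     * companding). Illustrative sketch only. */
    uint8_t linear_to_mulaw(int16_t sample)
    {
        const int BIAS = 0x84, CLIP = 32635;
        int x = sample, sign = 0;

        if (x < 0) { x = -x; sign = 0x80; } /* keep sign, use magnitude */
        if (x > CLIP) x = CLIP;             /* avoid overflow after bias */
        x += BIAS;

        /* Exponent = position of the highest set bit in bits 14..7. */
        int exponent = 7;
        for (int mask = 0x4000; (x & mask) == 0 && exponent > 0; mask >>= 1)
            exponent--;

        int mantissa = (x >> (exponent + 3)) & 0x0F;
        return (uint8_t)~(sign | (exponent << 4) | mantissa); /* inverted */
    }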

Figure 1. Main Control Windows

3 Applications

Applications of 3-D stereo are abundant [Kleiner et al, 1993], [Starks, 1993], [Cohen, 1992]. Music production has benefited from systems such as Archer Communications' 'QSound' and Roland's 'RSS' processor to enhance the spatial image; these, however, aim to meet the requirements of that specific market. In its current state, computer-generated music is extremely lacking in spatial information. Although many synthesizers currently support 'onboard effects', these usually consist of chorus and reverberation: the directional properties of the instruments are ignored. The inclusion of more specific directional properties may result in more pleasing effects.

Immersive VR systems are usually based around a helmet-mounted display (HMD) and 'data' glove. 3-D visual cues are conveyed using a binocular stereoscopic viewing system, and audio cues are given over headphones. The position of the head and glove can be monitored using magnetic sensors above the operator, or micro-switches inside the apparatus. This allows the state of the visual and audio images to be updated as the operator moves, giving the impression of a 3-D environment. Desktop VR systems also allow the creation and visualisation of 3-D environments, but on a 2-D computer screen. In most cases, the operator moves around using a 'Spaceball' (allowing six degrees of freedom). As with the immersive system, a high degree of interaction can be achieved. Recreation of three-dimensional sound fields in VR is essential if the creation of any virtual 'world' is to be realistic.

The benefits of VR graphics in areas such as data visualisation are obvious and are being developed at many sites. The additional benefits of 3-D sound are also beginning to be appreciated in a number of fields [Sherman, 1993], [Scaletti and Craig, 1991], [Kendall and Martens, 1988]. The development of effective methods with which to impart spatial information is therefore extremely important, not only in music, but in areas such as the design of auditory icons. The auditory system gives additional bandwidth with which to communicate information, particularly when there is a large amount of visual information. The use of sound in, for example, aircraft cockpits and extra-vehicular activity (EVA) has proved extremely successful [Wenzel et al, 1990]. More diverse applications, such as visually and aurally mapping blood velocity to supplement diagnosis, are beginning to appear. Research has also highlighted the importance of creating 'appropriate' audio icons for the sonification of data [Scaletti and Craig, 1991].

4 Conclusion

The development of graphical desktop VR systems appears to be sufficient to implement realistic 3-D graphics representations. It follows that acoustic manipulation can be approached in a similar manner, although ultimately an immersive application may be required to realise its full potential. This application has not implemented sophisticated 3-D graphical manipulation, but it is easy to envisage how external hardware (e.g. a VR glove or 'Spaceball') could be incorporated into the design of the interface. It is hoped that, once complete, this application will benefit 'acoustic designers' and composers alike.

Future developments include modification of the software to allow real-time manipulation. While this will require extensive software development and more powerful hardware (processing currently runs at approximately 8:1, i.e. roughly eight times slower than real time), parallel DSP configurations are commercially available (e.g. Ariel's Quint Processor and IRCAM's ISPW). Current audio results are effective: although many of the problems detailed earlier have been encountered, their solution is in hand.

References

[Begault, 1991] D. R. Begault, Challenges to the Successful Implementation of 3-D Sound, J. Audio Eng. Soc., 39, No. 11, 1991
[Begault, 1992] D. R. Begault, Perceptual Effects of Synthetic Reverberation on Three-Dimensional Audio Systems, J. Audio Eng. Soc., 40, No. 11, 1992
[Blauert, 1983] J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization, MIT Press, Cambridge, MA, 1983
[Cohen, 1992] M. Cohen, N. Koizumi and S. Aoki, Design and Control of Shared Conferencing Environments for Audio Telecommunication, Proceedings of the 2nd ISMCR Conference, 1992
[Electronic Musician, 1992] Electronic Musician, 8, No. 10, October 1992, p. 38
[Jot, 1991] J.-M. Jot, Digital Delay Networks for Designing Artificial Reverberators, Audio Eng. Soc. Preprint No. 3030 (E-2), 1991
[Kendall and Martens, 1988] G. S. Kendall, W. L. Martens and M. D. Wilde, A Spatial Sound Processor for Loudspeaker and Headphone Reproduction, Proceedings of the AES 8th International Conference, 1988
[Kleiner et al, 1993] M. Kleiner, B.-I. Dalenbäck and P. Svensson, Auralization - An Overview, J. Audio Eng. Soc., 41, No. 11, 1993
[Moorer, 1979] J. A. Moorer, About This Reverberation Business, Computer Music Journal, 3, No. 2, pp. 13-28, 1979
[Nielsen, 1993] S. H. Nielsen, Auditory Distance Perception in Different Rooms, J. Audio Eng. Soc., 41, No. 10, pp. 755-769, 1993
[Sakamoto et al, 1975] N. Sakamoto, T. Gotoh and Y. Kimura, On 'Out-of-Head Localization' in Headphone Listening, Proceedings of the AES 52nd Convention, 1975
[Scaletti and Craig, 1991] C. Scaletti and A. B. Craig, Using Sound to Extract Meaning from Complex Data, Extracting Meaning from Complex Data: Processing, Display, Interaction II, SPIE, 1459, 1991
[Sherman, 1993] W. R. Sherman, Integrating Virtual Environments into the Dataflow Paradigm, Fourth Eurographics Workshop on Visualization in Scientific Computing, 1993
[Starks, 1993] G. R. Starks, Virtual Reality Audio Applications, M.Mus. Dissertation, University of Glasgow, 1993
[Wenzel et al, 1990] E. M. Wenzel, P. K. Stone, S. S. Fisher and S. H. Foster, A System for Three-Dimensional Acoustic 'Visualisation' in a Virtual Environment Workstation, Visualization '90, IEEE Computer Society Press, pp. 329-337, 1990
[Wightman and Kistler, 1989a] F. L. Wightman and D. J. Kistler, Headphone Simulation of Free-Field Listening I: Stimulus Synthesis, J. Acoust. Soc. Am., 85, pp. 858-867, 1989
[Wightman and Kistler, 1989b] F. L. Wightman and D. J. Kistler, Headphone Simulation of Free-Field Listening II: Psychophysical Validation, J. Acoust. Soc. Am., 85, pp. 868-878, 1989