Orchestra Spatialization Using The AUDIENCE Engine

Leandro F. Thomaz, Regis Rossi A. Faria, Marcelo K. Zuffo and João Antônio Zuffo
LSI - Laboratory of Integrable Systems, Polytechnic School, University of São Paulo
{lfthomaz, regis, mkzuffo, jazuffo}

Abstract

Orchestra spatialization is much in demand in music composition today, but the results of spatialization experiments are hard to evaluate without a real orchestra available to the composer or conductor. This paper describes an application developed on the AUDIENCE engine to support such experiments. The application aims to extend the possibilities of orchestration by exploring relevant spatial aspects and by supporting both usual and unusual setups. We conceived a virtual quartet ensemble performing in a scenario that can be edited in real time. We present the system built and the experimental results.

1 Introduction

Audio technology has been steadily increasing the sound rendering capabilities of consumer devices, and this is dramatically changing the way music is perceived and enjoyed. The motion, distribution and positioning of sound sources within the auditory space is now an important feature of modern sound design, making it possible to realize unconventional ideas of composers and producers as well as innovative auditory configurations.

The ability to locate sound sources in the space around the listener is much in demand for the realistic presentation of musical pieces and soundtracks, and for interactive electronic games. It is important both for preserving the original ideas of the composer, conductor, producer or arranger, and for calibrating an optimal configuration of sound sources for the final presentation, considering the acoustics of the venue. However, it is not always feasible to carry out complex orchestra spatialization experiments, either in rehearsal or in real performances.

2 Orchestra spatial composition

Composers and conductors have been systematically exploring the spatial configuration of the orchestra for half a century. Musical pieces using spatialization were conceived by composers such as I. Xenakis (Terretektorh, 1965-66), for 88 instruments spread through the auditorium; R. Murray Schafer (Apocalypsis, 1976-77), for 12 choirs arranged in a circle; and K. Stockhausen (Gruppen, 1955-57, and Spiral, 1970), for three orchestras surrounding the listeners and for loudspeakers arranged in a sphere around the audience. More recently, Boulez (1988) conducted spatialization experiments at IRCAM, e.g., in his piece Répons (1985), where the attention of the audience is driven from the orchestra, at the center, to the sounds of soloists that keep moving around them.

In Brazil, experiments with orchestra spatialization were made by Flo Menezes (Parcours de l'Entité, 1994, and Harmony of the Spheres, 2000). In the first piece, a flutist walks around the stage throughout the performance, exploring an unusual interaction with the audience through his continuous motion.

With spatialization the composer has many possibilities for making a composition more interesting, but it is often very difficult to anticipate the results of a performance. Ideally, the composer would have an orchestra available for spatialization experiments. This is almost never possible, however, because of the high cost of mobilizing an orchestra, which leaves the composer with only a mental image of the spatial composition.

Figure 1. A scene of a musical ensemble spatialization experiment, with the listener surrounded by a violin, a cello, a trumpet and a flute.

An interactive computational tool is of great value in helping the composer and producer in the spatial composition process. The system described in this paper has been used for building and experiencing orchestral setups of particular interest for solving instrument spatialization challenges in music composition.

The musical problem chosen for this experiment is the spatialization of a small orchestra: a quartet consisting of a flute, a trumpet, a violin and a cello playing a quartet piece, as seen in figure 1. The positions of the listener and the instruments can be freely changed, allowing immediate appreciation and assessment of the sonic impact of the chosen arrangement.

3 The AUDIENCE auralization engine

A flexible and scalable auralization engine is under development in the AUDIENCE Project (Audio Immersion Experience by Computer Emulation) at the CAVERNA Digital of the University of São Paulo, a CAVE environment for fully immersive virtual reality. The main objective is to investigate and provide solutions for multichannel sound immersion, whether or not integrated with a virtual reality system (Faria et al. 2005).

The architecture for spatial sound generation in the AUDIENCE engine follows a modular approach consisting of four functional layers: acoustic scene description, acoustic simulation, spatial sound coding, and spatial sound decoding and reproduction, as illustrated in figure 2 (Faria 2005). This scheme allows different techniques and tools to be used to implement the functions of each layer, provided that a convenient set of interface signals is agreed on beforehand.

Figure 2. A subset of the AUDIENCE 4-layer architecture used in the current implementation.

The acoustic scene composition layer acts as the interface to the user (e.g. the composer, conductor or listener), who defines the configuration of the room, the positions of the instruments and his or her virtual location in the concert room. This layer can pass the scene information using a direct parameter set (such as the one used in this experiment) or some standardized parameter set, as in MPEG-4 Audio BIFS or the X3D data structure.

Next, the acoustic simulation layer calculates the acoustic propagation from the sound sources to the listener, rendering the correct ambience for the (virtual) room.

In the spatial sound coding layer, the localization cues and the environmental acoustic response (the spatial information) are convolved with the anechoic sound source, producing spatially coded audio signals through the chosen soundfield coding technique.

The final layer is responsible for mixing all the sources, decoding the spatially coded audio signals, and reproducing the sound field in one of several possible output formats, either binaural or through multichannel speaker rigs.

4 Technologies and Infrastructure

In this experiment, the soundfield of the virtual sound scene is produced by eight LANDO speakers arranged in an octagonal array surrounding the listener, as shown in figure 3. The speakers were tilted slightly downwards to compensate for their mounting height, which was originally designed to project the soundfield onto the elevated CAVE platform.

Figure 3. Octagonal 2D speaker rig used in the experiment.

The speakers are fed by two 4-channel SANKYA power amplifiers, with the audio signals coming from an M-Audio Delta 1010 multichannel audio interface. The software runs on the Linux or Windows operating systems.
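
As a rough illustration of the regular octagonal layout described above, the following Python sketch computes speaker azimuths and horizontal positions for a regular rig. The function names, the 2.0 m radius and the 22.5 degree offset are assumptions made for the example; the paper does not state the actual rig dimensions or orientation.

```python
import math

def regular_rig_azimuths(num_speakers=8, offset_deg=0.0):
    """Azimuths in degrees (counterclockwise from the front) of a regular horizontal rig."""
    step = 360.0 / num_speakers
    return [(offset_deg + i * step) % 360.0 for i in range(num_speakers)]

def rig_positions(radius_m, azimuths_deg):
    """Cartesian (x, y) speaker positions, with x pointing front and y pointing left."""
    return [
        (radius_m * math.cos(math.radians(a)), radius_m * math.sin(math.radians(a)))
        for a in azimuths_deg
    ]

# Hypothetical values: the radius and the offset are not given in the paper.
azimuths = regular_rig_azimuths(8, offset_deg=22.5)
positions = rig_positions(2.0, azimuths)
```

For eight speakers the angular step is 45 degrees, so the listener at the origin is equidistant from every speaker, which is the regularity that the decoding stage relies on.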

The spatialization technique used in this experiment is Ambisonics, proposed by Gerzon (1973), which allows the recording, manipulation and reproduction of 2D and 3D sound fields, whether captured from real sources or artificially generated. It is a two-part technology, because coding and reproduction are independent: there is no need to consider the reproduction system while recording or synthesizing the soundfield. The transmission format is known as B-Format and consists of a multichannel stream of at least four channels (W, X, Y, Z for a 1st order Ambisonics system).

Psychoacoustic parameters can be manipulated in the decoding process, enhancing spatial sound cues to help our hearing system localize sound sources. A shelf filter can be used to treat audio signals below and above 700 Hz separately, since our hearing system localizes low-frequency sounds mostly by phase difference and high-frequency sounds mostly by amplitude difference (Gerzon 1974).

According to Gerzon (1980), the higher the system order, the better the soundfield reproduction and the sweet spot stability. The system order also determines the number of channels to be used and the minimum number of loudspeakers required. The technique is scalable: higher orders are obtained by adding channels to the existing ones. The limits are the computational load and the transmission bandwidth or storage size needed to convey these channels.

One of the great advantages of Ambisonics is that it uses a fixed number of audio channels (depending on the system order) regardless of the number of speakers used for reproduction. It is possible, e.g., to drive a cubic (eight-speaker) reproduction rig with just four channels. The best results are obtained with a larger number of speakers arranged in a regular array surrounding the listener (Gerzon 1980). In this work we used a 1st order system, with four channels and a regular rig of eight speakers.

We use PureData (Pd) as the platform to build and patch the audio processing functional blocks of the system. Pd, developed by Puckette (2006), is an open, free and flexible graphical programming environment for music and audio applications. It was chosen mainly for its scalability and low latency.

5 Implementation

Next, we present the blocks that implement the functions of the main layers, written in C as Pd externals. The blocks in the Pd patch are shown in figure 4.

The main function of the acousticsim module (block 4 in figure 4) is to run the acoustic simulation of the room. For this experiment, a rectangular room with no obstructions was chosen.
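
For a rectangular room, the first-order reflections of the image method (Allen 1979) come from mirror images of the source across each of the six walls. The Python sketch below illustrates only that geometric idea and is not the acousticsim code: it returns mono delay and amplitude pairs rather than a B-Format IR set, and the room size, the positions, the single reflection gain of 0.8 and the 1/r distance attenuation are all assumptions chosen for the example.

```python
import math

def first_order_images(source, room):
    """Mirror the source across the six walls of a box room [0,Lx] x [0,Ly] x [0,Lz]."""
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]  # reflection across the wall plane
            images.append(tuple(image))
    return images

def arrivals(source, listener, room, reflection_gain=0.8, c=343.0):
    """Return (delay_s, amplitude) pairs for the direct path and six first reflections."""
    paths = [(source, 1.0)]
    paths += [(image, reflection_gain) for image in first_order_images(source, room)]
    taps = []
    for position, gain in paths:
        distance = math.dist(position, listener)
        taps.append((distance / c, gain / max(distance, 1e-6)))  # 1/r attenuation
    return sorted(taps)  # earliest arrival first

# Hypothetical scene: dimensions and positions in meters.
room = (6.0, 4.0, 3.0)
source = (1.0, 1.0, 1.0)
listener = (3.0, 2.0, 1.5)
taps = arrivals(source, listener, room)
```

With these values every arrival falls well inside a 200 ms frame; higher-order reflections would be obtained by mirroring the images again, which this sketch does not do.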

We used a ray-based acoustic simulator, an adaptation of the image-source method described by Allen (1979). In this technique, a reflection follows the laws of geometrical optics and comes from a properly attenuated virtual image source located behind the wall. The direct sound and all reflections of a sound source, along with their acoustic paths to the listener, are calculated and represented as impulse responses (IR). The number of rays and reflections is limited only by the length of the simulation time frame. In this experiment we used frames of 200 ms, which retain all first reflections and the initial reverberation, though other values can also be set. The acousticsim block outputs a B-Format compliant IR set.

The spatialcoder (block 5 in figure 4) encodes the soundfield by convolving the anechoic source audio with the B-Format IR set calculated in the acousticsim block, using the Ambisonics equations (Gerzon 1980 and Malham 1995). For given sound source and listener coordinates, it positions the source in 3D space, producing the audio channels encoded in B-Format.

The spatialdecoder (block 6 in figure 4) is essentially an Ambisonics decoder of first and second order. This block receives the audio signal in B-Format and decodes it for a given array of speakers, reproducing the soundfield encoded by the spatialcoder block. All sound source signals are mixed into one B-Format vector before decoding. The amplitude gain matrices for several speaker rigs were obtained from Furse (2006). These gains are applied to each B-Format input channel, with a unique set of weights for each speaker, and the results are summed. In this version we used unity gains for the psychoacoustic shelf filters described by Gerzon (1980) and no equalization for de-reverberation of the local acoustics.

6 Experiment

The experiment patch (figure 4) uses the modules described above, as well as some internal Pd components.
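
The encode, mix and decode chain of the spatialcoder and spatialdecoder blocks can be sketched in Python for a single audio sample. This is a simplified illustration under stated assumptions, not the Pd externals: it uses the standard first-order B-Format encoding equations, skips the convolution with the room IR, and replaces the Furse (2006) gain matrix with a basic cardioid-like projection decoder for a regular horizontal rig. The instrument azimuths are hypothetical.

```python
import math

def encode_bformat(sample, azimuth_deg, elevation_deg=0.0):
    """First-order B-Format (W, X, Y, Z) encoding of one mono sample."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (
        sample / math.sqrt(2.0),               # W: omnidirectional component
        sample * math.cos(az) * math.cos(el),  # X: front-back
        sample * math.sin(az) * math.cos(el),  # Y: left-right
        sample * math.sin(el),                 # Z: up-down
    )

def mix_bformat(encoded_sources):
    """Sum several B-Format signals channel by channel into one B-Format vector."""
    return tuple(sum(channel) for channel in zip(*encoded_sources))

def decode_horizontal(bformat, speaker_azimuths_deg):
    """Assumed basic decoder: project W, X, Y onto each speaker direction (Z is ignored)."""
    w, x, y, _z = bformat
    n = len(speaker_azimuths_deg)
    feeds = []
    for azimuth in speaker_azimuths_deg:
        t = math.radians(azimuth)
        feeds.append((math.sqrt(2.0) * w + x * math.cos(t) + y * math.sin(t)) / n)
    return feeds

speakers = [i * 45.0 for i in range(8)]  # regular octagon
quartet = {"flute": 45.0, "trumpet": 135.0, "violin": 225.0, "cello": 315.0}  # hypothetical

single = decode_horizontal(encode_bformat(1.0, quartet["flute"]), speakers)
mixed = mix_bformat([encode_bformat(1.0, az) for az in quartet.values()])
feeds = decode_horizontal(mixed, speakers)
```

With this decoder a single source produces the largest feed at the speaker nearest its direction and a zero feed at the opposite speaker. Because the mix is done in B-Format, the number of channels sent to the decoder stays at four no matter how many instruments are added.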

Four acousticsim-spatialcoder pairs are needed to generate the spatially encoded audio for the whole quartet scene, one for each instrument (sound source). The room parameters, the anechoic soundtracks and the positions of the instruments and the listener are the inputs to the system.

The room is configured by interactively editing its parameter fields (see block 1 in figure 4). The positions of the listener and the sound sources are controlled by xyz fields (see block 2). These two components perform the layer-1 scene description. An internal sub-patch (block 3) loads the wave files. Layers 2, 3 and 4 of the AUDIENCE architecture are implemented in blocks 4, 5 and 6, respectively (figure 4).

The instrument sounds in this experiment were synthesized from a polyphonic score. We tried a wide variety of distribution scenarios for the four instruments (flute, trumpet, violin and cello), which can be repositioned in the virtual auditory space through the graphical interface.

The experiment was auralized by a first-order Ambisonics reproduction system with a two-dimensional speaker array (no height sensation), reproduced through the eight-speaker rig shown in figure 3. Extensive experiments were made with several positions to assess both the musical impact of the changes and the potential of the system in this application field.

Figure 4. Experiment patch with the four instruments (patch title: AUDIENCE - ICMC-2006 - 4 sources).

7 Results

The acoustic simulator gave a good reproduction of the ambience, allowing the depth of the environment to be perceived, though sound source localization was uncertain in some cases. To improve it, the source direction was reinforced with an additional Ambisonics spatial coder, embedded in Pd blocks.

A critical issue was the small sweet spot obtained inside the speaker rig. To obtain a stable perception, the listener had to limit his or her movements and stay still at the center position. With the additional directional encoding we achieved much better localization, and the system was responsive enough to allow interactive control without delays.

8 Conclusion and Future Work

From the results obtained, we conclude that the system is very promising for musical applications, and that it can be a valuable low-cost tool for composers and conductors to experiment with orchestra spatialization.

The system is at an initial stage and many improvements are still to be incorporated. The ability to add more sound sources will make the system more useful for experimenting with space and sonic imaging for mid-size or larger orchestras. Improvements to the user interface (e.g., adding a joystick) may also make it easier for the composer or conductor to drive the spatialization and control positions.

Positional calibration of the speaker rigs and higher order Ambisonics (HOA) are expected to enlarge the sweet spot and improve the quality and stability of the soundfield, freeing the user from a fixed position at the center of the rig. An increase in the system order will require more loudspeakers, since their number must be equal to or greater than the number of channels (Daniel 2003).

A 3D configuration adding the notion of height is being implemented. This change only needs a new speaker rig (e.g., cubic), since the software already supports it.

Finally, musicians, composers and conductors will be asked to try the system and give feedback for further fine tuning and future improvements.

References

Allen, J. B., Berkley, D. A. Image method for efficiently simulating small-room acoustics. Journal of the Acoustical Society of America, v. 65, n. 4, p. 943-950, April 1979.
Boulez, P., Gerzso, A. Computers in Music. Scientific American, v. 258, n. 4, April 1988.
Daniel, J. Further Investigations of High Order Ambisonics and Wavefield Synthesis for Holophonic Sound Imaging. 114th Audio Engineering Society Convention, Amsterdam, 2003.
Faria, R. R. A. Auralization in immersive audiovisual environments. PhD Thesis in Electronic Engineering. Polytechnic School of the University of São Paulo, 2005.
Faria, R. R. A. et al. AUDIENCE - Audio Immersion Experiences in the CAVERNA Digital. Anais do 10º Simpósio Brasileiro de Computação Musical, p. 106-117, October 2005.
Faria, R. R. A. AUDIENCE - Audio Immersion Experience by Computer Emulation. /nem/audience/. Last access: 11 Mar 2006.
Furse, R. First and Second Order Ambisonic Decoding Equations. Last access: 11 Mar 2006.
Gerzon, M. Periphony: With-Height Sound Reproduction. J. Audio Eng. Soc., v. 21, n. 1, p. 2-10, January/February 1973.
Gerzon, M. Surround-sound psychoacoustics. Wireless World, p. 483-485, December 1974.
Gerzon, M. Practical Periphony: The Reproduction of Full-Sphere Sound. Preprint, 65th Audio Engineering Society Convention, London, 1980.
Malham, D., Myatt, A. 3-D Sound Spatialization using Ambisonic Techniques. Computer Music Journal, v. 19, n. 4, p. 58-70, Winter 1995.
Puckette, M. Pd Documentation. Pd documentation/. Last access: 11 Mar 2006.