SPECTRAL SPATIALIZATION - AN OVERVIEW

David Kim-Boyle
Department of Music, University of Maryland, Baltimore County

ABSTRACT

Several frequency-domain spatialization techniques are described, ranging from spectral delays to the use of emergent systems for mapping more sophisticated spatial movements. A recently developed stochastic spatialization technique is also presented. Interface design considerations are described, and perceptual issues of spectral spatialization are discussed throughout. Musical applications of the techniques by the author and by other composers are also presented.

1. INTRODUCTION

While frequency-domain processing techniques have been used in many ways to transform the timbral qualities of a sound, their application to spectral spatialization has been comparatively limited [1,8,9]. This paper describes several techniques for spectral spatialization. They range from spectral delays, which depend on the precedence effect for their spatial effects, to emergent systems, which can simulate the spatial trajectories of more complex natural movements.

Spatializing a sound's spectral content enables the musical exploration of spatial percepts to a degree unobtainable with traditional spatialization techniques. While sound spatialization has always been a fundamental part of the language of electroacoustic music, spectral spatialization enables new timbral identities and morphologies to be explored. A number of musical applications of the techniques in the author's own creative work will be described. All of the techniques have been implemented in the MaxMSP/Jitter programming environment for live performance with a multichannel playback system.

2. SPATIALIZATION TECHNIQUES

2.1 Spectral Delays

Delaying the resynthesis of individual FFT bins of a short-time Fourier transform can create musical effects not obtainable with traditional types of delays.
When those delays are applied to sounds reproduced through the individual channels of a multichannel playback system, unique spatialization effects across spectral bands can be realized. For example, if the delay on the first twenty FFT bins of the right channel of a stereo signal is increased over time, frequencies below around 860Hz will gradually pan to the left. This figure assumes a 1024-point FFT at 44100Hz, where each bin spans 44100/1024, or about 43Hz, so twenty bins cover roughly 0-860Hz. Frequencies above 860Hz will remain spatially stable. The technique can be implemented in MaxMSP by resynthesizing FFT frames from delayed FFT bins. The delay times are measured in integer multiples of the FFT length and are determined by indexing a user-defined buffer. Amplitude scaling functions, read from another user-defined buffer, can also provide control over the frequency response of each channel. Figure 1 outlines the basic FFT patch used to resynthesize an input signal.

Figure 1. Spectral Delay Patch.

The delay values are defined by the FFT length and the size of the window overlap. With an unwindowed 2048-point FFT at a sampling rate of 44100Hz, the minimum delay time is 46.44ms (2048/44100 s). For an unwindowed 1024-point FFT, the minimum delay time is 23.22ms. Windowed, overlapping FFTs allow these delay times to be reduced toward the interaural time differences between channels, which can be on the order of only a few milliseconds. Even with relatively large delay times, the precedence effect allows distinct spatial images to be realized. As noted by Wallach, Newman and Rosenzweig in their seminal study of the precedence effect [16], the ability to localize sound through the precedence effect is largely affected by the nature of the sound itself. The author has found this to be borne out in practice: the spatialization of sharp, transient sounds cannot be perceived as clearly as the spatialization of sounds of a more continuous, complex nature.

By performing an additional FFT analysis of a control signal, it is possible to establish correlations between the strength of various harmonic components and the corresponding delay times for the FFT bins. For example, strong harmonic components may produce long delay times for the corresponding bins, while weaker harmonic components may produce shorter delay times. Other interesting results can be obtained by gradually crossfading from one set of delay values to another, for example from random, noise-like values to user-defined values. This application was used in a recent work by the author for piano and computer, and its implementation is presented in Figure 2.

Figure 2. Signal control of delay times.

2.2 Particle Systems

While particle systems find their natural realization in granular synthesis, they can also enable spatialization techniques that simulate more complex types of natural movement, such as the movement of a cloud of smoke or the foam of a wave. In the Jitter programming environment, three objects are used to model particle systems: jit.p.shiva, jit.p.vishnu and jit.p.bounds. The jit.p.shiva object generates Jitter matrices in which particles have unique ids, life expectancies, x, y and z coordinates and velocities. The number of particles emitted per frame and their life expectancies can be modified. The jit.p.vishnu object applies global forces to the matrix generated by the jit.p.shiva object, modifying the x, y and z coordinates and velocities of the particles. The angle of particle emission relative to the horizontal plane and the speed of particle emission are among the variables that can be modified. The jit.p.bounds object confines the particles in a matrix to specific x, y and z limits. It also determines the behavior of the particles when they reach the matrix boundaries: particles may bounce, wrap around, remain at the boundary or die. The jit.p.bounds object also allows the elasticity of the system boundaries to be modified. Complex patterns can be created when the particle system parameters are dynamically updated. Snapshots of the particle systems enabled by these three objects are shown in Figure 3.

Figure 3. A particle system with a narrowly bounded particle stream (left) and a particle system with a narrowly bounded particle stream that skews to the left (right).

The x, y and z coordinates of each particle in the generated matrix are indexed with the jit.peek~ object and written to a buffer, as illustrated in Figure 4. A particle's coordinates are stored in the first row of an n-column matrix, where n is the number of particles, hence the use of a constant zero index. The second row of the matrix contains the values from the previous frame.

Figure 4. Storing matrix values in signal buffers.
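The emitter, global-force and boundary stages described above can be illustrated outside Jitter with a small Python particle system. It does not reproduce jit.p.shiva, jit.p.vishnu or jit.p.bounds. The class and function names, the cone-shaped emission, the unit time step and the elasticity model (a reflected velocity scaled by a coefficient) are all assumptions made for this sketch.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Particle:
    pid: int    # unique id
    life: int   # frames remaining before the particle dies
    pos: list   # [x, y, z]
    vel: list   # [vx, vy, vz]


def emit(next_id, count, life, speed, elevation, rng):
    """Emit `count` particles from the origin (hypothetical emitter).

    `elevation` is the emission angle above the horizontal plane in radians;
    the azimuth is randomized so the particles fan out in a cone."""
    out = []
    for i in range(count):
        az = rng.uniform(0.0, 2.0 * math.pi)
        vel = [speed * math.cos(elevation) * math.cos(az),
               speed * math.cos(elevation) * math.sin(az),
               speed * math.sin(elevation)]
        out.append(Particle(next_id + i, life, [0.0, 0.0, 0.0], vel))
    return out


def apply_force(particles, force, dt=1.0):
    """Apply the same global force to every particle, then move it."""
    for p in particles:
        for k in range(3):
            p.vel[k] += force[k] * dt
            p.pos[k] += p.vel[k] * dt


def apply_bounds(particles, lo, hi, mode="bounce", elasticity=1.0):
    """Confine particles to [lo, hi] on every axis; return the survivors.

    'bounce' reflects the particle and scales its velocity by `elasticity`,
    'wrap' re-enters it from the opposite side, 'stop' holds it at the
    boundary and 'die' removes it."""
    alive = []
    for p in particles:
        dead = False
        for k in range(3):
            if lo <= p.pos[k] <= hi:
                continue
            edge = lo if p.pos[k] < lo else hi
            if mode == "bounce":
                p.pos[k] = min(max(2.0 * edge - p.pos[k], lo), hi)
                p.vel[k] = -p.vel[k] * elasticity
            elif mode == "wrap":
                p.pos[k] = lo + (p.pos[k] - lo) % (hi - lo)
            elif mode == "stop":
                p.pos[k] = edge
                p.vel[k] = 0.0
            elif mode == "die":
                dead = True
        if not dead:
            alive.append(p)
    return alive


def step(particles, force, lo, hi, mode="bounce", elasticity=1.0):
    """One frame: global force, ageing, then boundary handling."""
    apply_force(particles, force)
    for p in particles:
        p.life -= 1
    survivors = [p for p in particles if p.life > 0]
    return apply_bounds(survivors, lo, hi, mode, elasticity)
```

For example, you can emit eight particles with `emit(0, 8, life=50, speed=0.1, elevation=math.pi / 4, rng=random.Random(1))` and then call `step` once per frame with a small downward force. The surviving positions in each frame then play the role of the matrix values written to buffers.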

Before the coordinates of the particles are written to buffers, they can be further modified with basic mathematical functions. This can help realize spatial trajectories that are difficult to achieve with the jit.p.vishnu object. For example, by applying simple sine and cosine functions to the x and y coordinates, the particles can be made to rotate around the x and y axes (see Figure 5).

Figure 5. Rotating particle coordinates around the x and y axes.

After the final x and y positions have been determined and stored in buffers, those values can be addressed from an FFT and used to map the positions of the spectral components of a sound. The x coordinates provide left/right position and the y coordinates provide front/rear position, so it is a simple matter to map each FFT bin to a discrete location in a quadraphonic space. With the number of particles equal to half the FFT length, the magnitude of each bin is simply multiplied by four coefficients, one per loudspeaker. These coefficients are obtained by indexing sine functions with the x and y values of each particle. The implementation is similar to that outlined in Torchia and Lippe's work on frequency-domain spatial distribution [15]. The author has also experimented with Ville Pulkki's vbap object because of its flexibility with regard to loudspeaker configurations [11]. The very simple, sine-based panning patch is illustrated in Figure 6.

Figure 6. Mapping particle coordinates to a quadraphonic spatial location.

While the z coordinate has few ready applications, it can be used to define movement in the median plane, allowing trajectories to be defined above the listener. Most performance environments do not have loudspeakers positioned above an audience, however, so this has little practical utility.
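The rotation and four-coefficient panning described above can be sketched in Python. This is a hedged illustration rather than the author's patch. It assumes positions normalized to [-1, 1] (x = -1 hard left, y = 1 front), treats the rotation as a rotation about the origin within the x/y plane, and assumes the "sine functions" form an equal-power sine/cosine pan law. The function names are hypothetical.

```python
import math


def rotate_xy(x, y, theta):
    """Rotate a point about the origin in the x/y plane by `theta` radians."""
    c, s = math.cos(theta), math.sin(theta)
    return x * c - y * s, x * s + y * c


def quad_gains(x, y):
    """Equal-power gains (FL, FR, RL, RR) for a position in [-1, 1] x [-1, 1].

    x = -1 is hard left and x = 1 hard right; y = 1 is front, y = -1 rear."""
    x = min(max(x, -1.0), 1.0)
    y = min(max(y, -1.0), 1.0)
    px = (x + 1.0) * math.pi / 4.0      # 0 .. pi/2 from left to right
    py = (y + 1.0) * math.pi / 4.0      # 0 .. pi/2 from rear to front
    left, right = math.cos(px), math.sin(px)
    rear, front = math.cos(py), math.sin(py)
    return (left * front, right * front, left * rear, right * rear)


def spatialize_frame(magnitudes, positions):
    """Scale each bin magnitude by the four gains of its particle.

    `positions[i]` is the (x, y) of the particle assigned to bin i, so the
    number of particles equals the number of bins (half the FFT length)."""
    outs = [[0.0] * len(magnitudes) for _ in range(4)]
    for i, (mag, (x, y)) in enumerate(zip(magnitudes, positions)):
        for ch, g in enumerate(quad_gains(x, y)):
            outs[ch][i] = mag * g
    return outs  # magnitude spectra for [FL, FR, RL, RR]
```

The squared gains of each bin sum to one, so moving a bin around the square changes its location without changing its overall power.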
Consideration must also be given to the types of sounds projected along the z axis, as localization of sound in the median plane is also greatly affected by the sound quality [3]. A more useful application of the z coordinate can be obtained by mapping the z position of each particle to an overall inverse-square gain control on that particle's amplitude. This can give a crude impression of depth.

One of the attractive qualities of particle systems is their ability to model or visually mimic natural phenomena. To better model such systems with the standard Jitter particle system objects, however, additional logic is needed so that particles interact with one another. With the jit.p.vishnu object, all of the particles experience uniform forces. More interesting systems can be modelled with a more complex particle interaction, in which particles might collide, have different masses and momenta, and experience forces from other individual particles. Such a model can be implemented with boids.

2.3 Boids

Using Craig Reynolds's well-known boids algorithm [12], as implemented in a Max object designed by Eric Singer [14], the harmonic components of a sound can be spatialized in complex flocking patterns. While Reynolds's work has been used in granular synthesis applications [2], to the best of the author's knowledge it has been used on only one other occasion for sound spatialization [5]. Reynolds's study of the movement of flocks of birds is well documented and has been implemented in a number of
algorithms [10,6]. Reynolds identified three primary factors that determine the steering behavior of the flock: separation, alignment and cohesion. Separation is the preferred distance one bird maintains from another. Alignment is the tendency of a bird to fly towards the average heading of its local neighbors. Cohesion is the tendency of a bird to steer towards the average position of its local neighbors [12]. Using these parameters in computer animation applications, Reynolds was able to realistically simulate natural flocking movements.

Based on Reynolds's original analysis, Eric Singer's boids object is a Max object that simulates flocking movements in two- and three-dimensional space. It was originally developed for use in Robert Rowe and Doris Vilas's A Flock of Words (1995) [13]. Singer's object provides the x, y and z location of each boid as its flight is affected by various flock control parameters. Some of these parameters correspond to Reynolds's original analysis, for example the centering instinct of the boids, the tendency of a boid to avoid its neighbors, or the attraction of a boid to a particular point. Other parameters are unique: the inertia of a boid, or its willingness to change speed and direction, and the number of neighbors a boid consults when flocking. The window displaying the boids' movements is shown in Figure 7.

Figure 7. Singer's Boids window showing the spatial location of each boid.

In Singer's object, the trajectory of the flock is determined by a control device such as a mouse or trackpad. Changing the attraction, inertia and speed parameters can radically affect how the flock responds to the point location of the controller. Defining trajectories with the mouse is useful at times. The author has also created simple circular trajectories, defined trajectories with various alternate controllers, and modified trajectories with performance data.
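Reynolds's three steering rules can be illustrated with a compact two-dimensional update in Python. This is not Singer's boids object. The weights, neighbor count, separation distance, attraction point, inertia blend and speed limit below are hypothetical stand-ins for the kinds of controls described above, and each rule is applied in its simplest form.

```python
import math
import random


def make_flock(n, rng):
    """n boids at random positions in [-1, 1] x [-1, 1], initially at rest."""
    return [{"pos": [rng.uniform(-1, 1), rng.uniform(-1, 1)], "vel": [0.0, 0.0]}
            for _ in range(n)]


def nearest(boids, i, k):
    """Indices of the k boids closest to boid i (excluding i itself)."""
    xi, yi = boids[i]["pos"]
    others = [j for j in range(len(boids)) if j != i]
    others.sort(key=lambda j: math.hypot(boids[j]["pos"][0] - xi,
                                         boids[j]["pos"][1] - yi))
    return others[:k]


def step_flock(boids, attractor, neighbors=4, sep_dist=0.1, w_sep=1.5,
               w_align=1.0, w_coh=1.0, w_attr=0.5, inertia=0.8, max_speed=0.05):
    """Advance a 2-D flock by one frame (all parameters hypothetical)."""
    new_vels = []
    for i, b in enumerate(boids):
        (x, y), (vx, vy) = b["pos"], b["vel"]
        nb = nearest(boids, i, neighbors)
        cx = cy = ax = ay = sx = sy = 0.0
        if nb:
            n = len(nb)
            # Cohesion: steer towards the neighbors' average position.
            cx = sum(boids[j]["pos"][0] for j in nb) / n - x
            cy = sum(boids[j]["pos"][1] for j in nb) / n - y
            # Alignment: steer towards the neighbors' average velocity.
            ax = sum(boids[j]["vel"][0] for j in nb) / n - vx
            ay = sum(boids[j]["vel"][1] for j in nb) / n - vy
        for j in nb:
            # Separation: push away from neighbors closer than sep_dist.
            dx, dy = x - boids[j]["pos"][0], y - boids[j]["pos"][1]
            d = math.hypot(dx, dy)
            if 0.0 < d < sep_dist:
                sx += dx / d * (sep_dist - d)
                sy += dy / d * (sep_dist - d)
        # Attraction towards a control point such as a mouse position.
        tx, ty = attractor[0] - x, attractor[1] - y
        steer_x = w_sep * sx + w_align * ax + w_coh * cx + w_attr * tx
        steer_y = w_sep * sy + w_align * ay + w_coh * cy + w_attr * ty
        # Inertia: keep part of the old velocity and blend in the steering.
        nvx = inertia * vx + (1.0 - inertia) * steer_x
        nvy = inertia * vy + (1.0 - inertia) * steer_y
        speed = math.hypot(nvx, nvy)
        if speed > max_speed:
            nvx, nvy = nvx / speed * max_speed, nvy / speed * max_speed
        new_vels.append((nvx, nvy))
    # Update all boids together so every rule sees the same previous frame.
    for b, (vx, vy) in zip(boids, new_vels):
        b["vel"] = [vx, vy]
        b["pos"] = [b["pos"][0] + vx, b["pos"][1] + vy]
```

Calling `step_flock` once per frame, with the attractor set to a mouse position or to a point moving on a circle, produces trajectories of the kind described above. Lowering `w_coh` and `w_attr` over time weakens the pull between boids and towards the control point.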
As in the particle system implementation, the coordinates of each boid can be modified before being written to Jitter matrices. This can transform the boids' movements, and the resulting spectral spatialization, in ways not possible with the boids object alone. The author has implemented a number of useful transforms, as illustrated in Figure 8. These transforms can add an offset to a boid's position, squish the boids into a particular region in space, apply a separate rotational force to the boids, and map regions in space into which the boids will not fly.

Figure 8. Transforming x/y positions. Top row, from left: a) not modified (rendered as lines); b) with an x offset (rendered as lines). Bottom row, from left: c) with squished y coordinates (rendered as lines); d) avoiding the upper left quadrant (rendered as points).

The position of each boid is graphically rendered through OpenGL for immediate visual feedback. Individual boids are also mapped to different colors. Lower-numbered boids are mapped to red, higher-numbered boids to violet, and those in between to the orange through indigo portion of the spectrum. This is useful for identifying the spatial location of particular spectral bands. While it can be difficult to track the individual movements of boids with this display, it does provide useful and immediate information on the flock's general shape, dispersion and density.

As outlined in Section 2.2, the final spatial location of each boid is eventually used to modify the magnitude of discrete FFT bins across the various outputs. The general patch for this is illustrated in Figure 9. In this patch, vectral~ objects interpolate positions between FFT frames, and the z position is once again used as an overall inverse-square gain control.

Figure 9. FFT Spatialization patch.
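The position transforms, color mapping and depth gain described above are simple per-boid functions, sketched here in Python. The sketch makes several assumptions. Positions are normalized to [-1, 1]. A squish amount of 0 leaves a position unchanged and 1 collapses it onto a center point. Region avoidance pushes a boid out of an excluded rectangle to its nearest edge. The inverse-square gain is modeled as 1/(1 + z)^2, so that z = 0 gives unity gain. These choices and all function names are hypothetical rather than taken from the author's patch.

```python
import colorsys


def offset(x, y, dx=0.0, dy=0.0):
    """Shift a boid's position by a fixed offset."""
    return x + dx, y + dy


def squish(x, y, sx=0.0, sy=0.0, cx=0.0, cy=0.0):
    """Pull a position towards (cx, cy); a squish of 1 collapses it there."""
    return cx + (x - cx) * (1.0 - sx), cy + (y - cy) * (1.0 - sy)


def avoid_rect(x, y, x0, x1, y0, y1):
    """Move a point lying inside the excluded rectangle to its nearest edge."""
    if not (x0 < x < x1 and y0 < y < y1):
        return x, y
    # (distance to edge, point on that edge) for each of the four edges.
    exits = [(x - x0, (x0, y)), (x1 - x, (x1, y)),
             (y - y0, (x, y0)), (y1 - y, (x, y1))]
    return min(exits, key=lambda e: e[0])[1]


def boid_color(index, count):
    """RGB color: boid 0 is red, boid count-1 is violet (hue 0 to 0.75)."""
    hue = 0.75 * index / max(count - 1, 1)
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)


def depth_gain(z):
    """Inverse-square gain from a depth z >= 0 (unity gain at z = 0)."""
    return 1.0 / (1.0 + max(z, 0.0)) ** 2
```

Avoiding the upper left quadrant, as in Figure 8d, corresponds to `avoid_rect(x, y, -1.0, 0.0, 0.0, 1.0)` under these conventions.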

An especially interesting spectral transformation involves squishing the spatial distribution of the flock. With a high squish factor, all the bins of the FFT can be spatially distributed to one point source; in other words, the sound will simply appear to come from one location. As the squish factor is decreased, the spectrum is progressively dislocated in the x/y plane. A rendering of this is illustrated in Figure 10.

Figure 10. Dislocating a spectrum through the squish factor.

The ability to perceive the various spatial movements of a sound's spectral bands depends on many factors. These include the size of the FFT, the speed of the boids' movement, the physical size of the playback environment and the types of sounds spatialized. As noted by Kendall [7], careful selection of sound sources can make spatial percepts a compositional area to explore. With natural sound sources, whose timbres depend on features such as attack transients, the spatial distribution of the sound's spectral content can be an especially interesting musical effect. For most practical purposes, the author has found little need to use FFT sizes much greater than 256 bins. At larger sizes the individual trajectories cannot be discretely perceived and instead tend to coalesce into complex spatial gestalts. This is supported by Bregman's work on the auditory schemata governing timbral segregation [4]. Reduced FFT sizes also have the benefit of simplifying the visual information presented.

The use of envelopes to control boid parameters has proven to be musically useful. Rather than remaining fixed, the various parameters can change as defined by a simple envelope generator. The author has also experimented with manipulating parameters with performance data. For example, it is fairly simple to change the speed at which the boids fly using data obtained from a real-time timbral analysis.
Sounds with a high noise content might be mapped to low speeds, and sounds with a more oscillatory content might cause the boids to move more quickly.

In the author's own creative work, the implementations discussed have been applied in whisps (2006) for bass clarinet and computer. In whisps, bass clarinet multiphonics are analyzed, and the strength of their harmonic components is used to modify the spatial trajectories of the computer-processed sounds. The attraction of one boid to another is also decreased as the piece develops, which results in a disintegration of various timbral structures. Two separate flocks are employed throughout the piece. One determines the spatial location of the bass clarinet's harmonic components, and the other determines the location of the overtly computer-generated sounds. These flocks sometimes mimic each other's flight patterns, while at other times they are symmetrically opposed.

2.4 Stochastic Spatialization and Interface Concerns

A more recent stochastic spatialization technique has several advantages over the techniques already described. In this method, the spatial location of each FFT bin is randomly assigned using a noise control signal. The bins are contained within boundaries defined by a global X/Y parameter. The overall location of the bins within two-dimensional space can be controlled with a mouse or joystick, or assigned to circular trajectories. A spin control is also used to introduce a bias in the x location of the bins. The interface for these controls is presented in Figure 11.

Figure 11. Stochastic Spatialization Interface.

A dynamic threshold control provides an additional control parameter. With this parameter, the spatial mapping of a bin is only applied if the bin's magnitude exceeds a defined threshold. If the magnitude is less than the threshold, the bin is mapped evenly to all outputs.
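The stochastic mapping described above can be sketched per FFT frame in Python. The sketch makes several assumptions. Uniform pseudo-random numbers stand in for the noise control signal, and positions are normalized to [-1, 1]. The spin control is modeled as a constant bias added to each bin's x position. An equal-power quadraphonic pan law serves as a plausible panner. "Mapped evenly to all outputs" is taken to mean a gain of 0.5 on each of the four channels. Names and parameter ranges are hypothetical.

```python
import math
import random


def quad_gains(x, y):
    """Equal-power (FL, FR, RL, RR) gains for x, y in [-1, 1]."""
    px = (min(max(x, -1.0), 1.0) + 1.0) * math.pi / 4.0
    py = (min(max(y, -1.0), 1.0) + 1.0) * math.pi / 4.0
    left, right = math.cos(px), math.sin(px)
    rear, front = math.cos(py), math.sin(py)
    return (left * front, right * front, left * rear, right * rear)


def circular_center(t, radius=0.5, rate=0.25):
    """Center point moving on a circle at `rate` revolutions per second."""
    a = 2.0 * math.pi * rate * t
    return radius * math.cos(a), radius * math.sin(a)


def stochastic_frame(mags, rng, width=1.0, height=1.0,
                     center=(0.0, 0.0), spin=0.0, threshold=0.0):
    """Spread one frame of bin magnitudes over four outputs.

    Each bin above `threshold` gets a random position inside a width x height
    box around `center`, with `spin` biasing its x position. Bins at or below
    the threshold are sent equally to all outputs instead of being localized."""
    even = (0.5, 0.5, 0.5, 0.5)   # four gains of 0.5 have squared sum 1
    outs = [[0.0] * len(mags) for _ in range(4)]
    for i, m in enumerate(mags):
        if m > threshold:
            x = center[0] + spin + rng.uniform(-width, width) / 2.0
            y = center[1] + rng.uniform(-height, height) / 2.0
            gains = quad_gains(x, y)
        else:
            gains = even
        for ch in range(4):
            outs[ch][i] = m * gains[ch]
    return outs  # magnitude spectra for [FL, FR, RL, RR]
```

In this sketch, shrinking `width` and `height` toward zero collapses the spectrum onto `center`. Raising `threshold` leaves only the strongest bins spatially distinct.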
While the spatial controls of this interface are minimal, they address one of the problems of FFT spatialization. That problem is how to develop an interface with the power to control and transform coordinates for hundreds of particles without overwhelming the user with massive banks of control data. In the author's own creative work, the simple controls presented in Figure 11 have proven most useful. The ability to control the flocking movements of boids with vast banks of data has its interest, but a greatly simplified bank of controls has proven valuable. Aside from interface concerns, such an approach is also informed by perceptual considerations. As implied in Section 2.3, in typical FFT processing it is not possible to aurally perceive the fine degrees of control over bin distribution that a boids interface usually makes possible. Listening tests by the author have suggested that, with finer and finer control, the tendency of the ear to coalesce spectral components into spatial images, even when those components are widely dispersed, is greatly increased [4]. Perceptually, the use of a noise control source is virtually indistinguishable from the use of boids. In addition, the effect of each of the simple controls used in the stochastic implementation can be readily perceived, unlike the fine degree of control used in the previous boids implementation. Removing these extra levels of control has many interface benefits with little qualitative loss.

Building on this reduced but perceptually informed approach to interface control, a visually intuitive method of amplitude scaling has also been implemented in the above technique. Through the use of gray-scale Jitter windows, controls typically used for transforming graphics have been applied to spectral processing. For example, brightness controls correspond to amplitude, zooming functions to band passes, and simple offsets to center frequency locations. These are all illustrated in Figure 12.

Figure 12. a) Top left: a Jitter window with whiter shades representing louder amplitudes; b) top right: with brightness reduced; c) bottom left: with a zoom applied, creating a bandpass filter; d) bottom right: with an offset and zoom applied, creating a bandpass filter shifted to higher frequencies.

3. SUMMARY

The use of frequency-domain processing for sound spatialization has great potential for musical application, and the author has employed it through various techniques in a variety of recent compositions. While all of these applications have been developed for real-time performance, the techniques would also seem to have considerable application in acousmatic composition.

4. ACKNOWLEDGEMENTS

The author would like to thank Prof. Cort Lippe of the State University of New York at Buffalo, Dr. Ludger Brümmer and the staff at ZKM, and Dr. Michael Alcorn and the staff at SARC for providing facilities where much of the work on these techniques was developed.

5. REFERENCES

[1] Barreiro, D. Open Field. 2006.
[2] Blackwell, T. and M. Young. "Swarm Generator", Applications of Evolutionary Computing: EvoWorkshops 2004: EvoBIO, EvoCOMNET, EvoHOT, EvoASP, EvoMUSART, and EvoSTOC, Coimbra, Portugal, April 5-7, 2004, G. Raidl et al. (Eds.), Springer, Berlin, 2004, pp. 399-408.
[3] Blauert, J. Spatial Hearing - The Psychophysics of Human Sound Localization. MIT Press, Cambridge, 1996.
[4] Bregman, A. S. Auditory Scene Analysis - The Perceptual Organization of Sound. MIT Press, Cambridge, 1990.
[5] Davis, T. and P. Rebelo. "Hearing Emergence: Towards Sound Based Self Organization", Proceedings of the 2005 International Computer Music Conference, International Computer Music Association, Barcelona, 2005, pp. 463-466.
[6] Gombert, J. "Real-Time Simulation of Herds Moving Over Terrain", Available at < f>, 2005.
[7] Kendall, G. "The Artistic Play of Spectral Organization: Spatial Attributes, Scene Analysis and Auditory Spatial Schemata", Proceedings of the 2007 International Computer Music Conference, International Computer Music Association, Copenhagen, 2007, pp. 63-68.
[8] Lippe, C. termites. 2006.
[9] Lippe, C. Music for Snare Drum and Computer. 2007.
[10] Parker, C. < ml>. 2002.
[11] Pulkki, V. "Virtual Sound Source Positioning Using Vector Base Amplitude Panning", Journal of the Audio Engineering Society, 45, 1997, pp. 456-466.
[12] Reynolds, C. "Flocks, Herds and Schools: A Distributed Behavioral Model", Computer Graphics 21(4), 1987, pp. 23-34. Also, Reynolds, C. "Boids: Background and Update", <>, 2001.
[13] Rowe, R. Machine Musicianship. MIT Press, Cambridge, 2001.
[14] Singer, E. <>. 2007.
[15] Torchia, R.H. and C. Lippe. "Techniques for Multi-Channel Real-Time Spatial Distribution Using Frequency-Domain Processing", Proceedings of the 2003 International Computer Music Conference, International Computer Music Association, Singapore, 2003, pp. 41-44.
[16] Wallach, H., E. Newman and M. Rosenzweig. "The Precedence Effect in Sound Localization", The American Journal of Psychology, Vol. 62, No. 3, 1949, pp. 315-336.