INTERACTIVE SOUNDSCAPE DESIGN WITH EVOLUTIONARY SOUND PROCESSING

Jose Fornari, Music Cognition Group, Department of Music, University of Jyväskylä, FINLAND, fornari@nics.
Adolfo Maia Jr., Institute of Mathematics, State University of Campinas, CP 6166, 13091-970, BRAZIL, adolfo@nics.
Jonatas Manzolli, Department of Music/NICS, State University of Campinas, CP 6166, 13091-970, BRAZIL, jonatas@nics.

ABSTRACT

Here we present an approach for interactive soundscape design based on Interaural Time Difference (ITD) cues and Evolutionary Computation (EC). We define a Sonic Localization Field (SLF) where the pair of parameters sound intensity and ITD azimuth localization angle form the Spatial Sound Genotypes (SSG) that control the adaptive sonic evolution. A Pure Data (Pd) software application working with a parametric score is proposed as a way of dynamically guiding the automatic generation of soundscapes.

1. INTRODUCTION

The design of soundscapes based on the control of sonic cues, such as sound localization, has already been studied [1-3]. Recently, Interactive Sound Spatialization (ISS) methods have been applied in hypermedia environments [4]. ISS has also been used in the context of helping people with special needs [5]. Psychoacoustic factors may also be seen as sonic cues that allow any listener to perceive and recognize a particular soundscape [6]. In experiments involving sound perception, it is common to take into consideration only the classical psychoacoustic factors, such as loudness (perception of sound intensity), pitch (perception of sound frequency) and spectrum (perception of the partials that compose the sound in the frequency domain), while the importance of Sound Spatial Positioning Perception (SSPP) is often relegated to a secondary role. However, SSPP turns out to be very meaningful when we are in a space with several sound sources located in different places (e.g. in a concert hall, listening to a symphonic orchestra).
SSPP can deliver important information, which is not only aesthetically meaningful, but may also concern our safety (e.g. when driving a car or crossing a road). SSPP is mainly given by three types of cues: Interaural Time Differences (ITDs) [6], Interaural Level Differences (ILDs) [7] and Head-Related Transfer Functions (HRTFs) [8]. ITDs refer to the difference in time for the sound from the same sound-source to reach both ears. Similarly, ILDs describe the difference of perceived intensity between the ears for the sound from the same sound-source. HRTFs are the more complex, individual-related collection of spatial cues that take into account the listener's head dimensions and outer-ear and torso shapes in his/her sound localization perception. Here we propose an implementation using ITD as the localization cue for an interactive soundscape generation system that approaches ISS from the perspective of adaptive evolution [15]. In the following sections we present the theoretical model based on the integration between interactive soundscape design and adaptive evolution. We describe the system features (section 2), the mathematical model (section 3), the system implementation (section 4) and experimental results (section 5).

2. SPATIALIZATION AND SOUNDSCAPE DESIGN WITH EVOLUTIONARY COMPUTATION

Since 2001, we have studied interactive and genetically generated music techniques suited for composing highly textured musical soundscapes [10]. We developed an Evolutionary Sound Synthesis methodology [11, 12] and, recently, we incorporated spatial information in its sonic genotype [13]. Our study has been based on the application of concepts from the theory of Complex Adaptive Systems (CAS) [14, 15] to sound design [18].

2.1. Adaptive Sonic Evolution

Sound spatialization systems, as mentioned above, can be considered as a CAS.
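The ITD cue can be made concrete with a standard numerical approximation. The sketch below is not taken from this paper: it uses Woodworth's classic spherical-head formula, with illustrative textbook values for the head radius and the speed of sound.

```python
import math

def itd_woodworth(azimuth_deg, head_radius=0.0875, speed_of_sound=343.0):
    """Approximate interaural time difference (in seconds) for a source
    at the given azimuth, measured from straight ahead.

    Woodworth's spherical-head model: tau = (a / c) * (sin(theta) + theta).
    The 8.75 cm head radius and 343 m/s speed of sound are common
    textbook values, not parameters from the paper.
    """
    theta = math.radians(azimuth_deg)
    return (head_radius / speed_of_sound) * (math.sin(theta) + theta)

# A source directly ahead gives no ITD; one at 90 degrees gives the
# maximum delay, roughly 0.66 ms for these values.
print(itd_woodworth(0.0), itd_woodworth(90.0))
```

The sub-millisecond delays this model predicts are largest for lateral sources, which is what makes ITD an effective azimuth cue in the frontal horizontal plane.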
As described in [15], CASs involve a large number of interconnected parameters that, altogether, exhibit a coherent emergent pattern in time. Spatialization can involve a large number of loudspeakers, waveforms, acoustic cues, damping and reflections, among others. It is known that CASs generate emergent and macroscopic properties [16], which arise from competition and cooperation. The large-scale system behavior is the result of a large number of interactions between many individuals (using the EC jargon). This self-organized process, in which a complex system is driven through different organizational states [17], can be applied to soundscape design [17], as we discussed in [13]. We pinpoint these concepts in the following assumptions: a) gesture input controls are used to generate Target Sets which drive an EC process based on ITD cues, b) spatialization is represented by a Sonic Localization Field, c) spatial similarities are measured by a Fitness procedure within an Iterative Evolutionary Cycle (introduced further on) and d) the adaptation between gesture inputs (i.e. Target Sets) and the EC population is built as an evolutionary process controlled by genetic operators.
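Assumptions a) to d) amount to a generic evolutionary loop. The sketch below, in Python rather than Pd, only illustrates that loop: all names are hypothetical, and the fitness, crossover and mutate callbacks stand in for the operators defined later in Section 3.

```python
import random

def iterative_evolutionary_cycle(population, target, fitness, crossover,
                                 mutate, generations):
    """Drive a population toward a Target Set, yielding the best
    individual (the sound output) of each generation."""
    for _ in range(generations):
        # Fitness: distance of each individual to the target set.
        best = min(population, key=lambda g: fitness(g, target))
        # Renew the population: crossover toward the best, then mutation.
        population = [mutate(crossover(g, best)) for g in population]
        yield best

# Toy run with (intensity, localization) genotypes and placeholder operators.
target = [(0.8, 0.5)]
fitness = lambda g, T: min((g[0] - t[0]) ** 2 + (g[1] - t[1]) ** 2 for t in T)
crossover = lambda g, b: (0.5 * (g[0] + b[0]), 0.5 * (g[1] + b[1]))
mutate = lambda g: g  # mutation disabled to keep the toy run deterministic
population = [(random.random(), random.uniform(-1.0, 1.0)) for _ in range(8)]
bests = list(iterative_evolutionary_cycle(population, target, fitness,
                                          crossover, mutate, 10))
```

Because the previous best individual survives crossover with itself, the best distance to the target set never increases from one generation to the next.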

2.2. Sound-Marks Cues

In [17], Schafer defined the basic elements that compose a soundscape: key-sounds, key-signals and sound-marks. In this study we focused on the development of a system that mainly creates sound-marks using two immersive concepts: Sonic Localization Field and Spatial Similarity. Our approach uses ITD cues to interactively generate trajectories of evolutionary sound-marks. This evolution is controlled by Spatial Sound Genotypes (SSG) related to two parameters: sound intensity and ITD azimuth angle. The ITD function is an emulation of the strategy, described in [1], used by the human brain to sense the position of a sound-source in the frontal horizontal plane. Although valid for the full range of frequencies, ITDs are particularly useful to determine the location of sounds with middle to low frequencies.

3. INTERACTIVE SPATIALIZATION

3.1. Sound-Marks in the Sonic Localization Field

Our model consists of a space of triplets G = {(W, I, L)}, named the Genotype Space, where W is the waveform, 0 < I ≤ 1 is the waveform intensity factor and −1 ≤ L ≤ 1 is the waveform ITD localization factor, given by the azimuth angle θ as L = (90° − θ)/90°, with 0° ≤ θ ≤ 180°. For more details, see [13]. The set of all possible values of the pair (I, L) is named the Sonic Localization Field (SLF) (see Figure 1). We define a population as any finite subset of elements of G. In our model we start from an initial population P(0) and a target population T. Then we iteratively generate a queue of R populations G(1), G(2), ..., G(R), where the k-th population is a subset of G with N individuals (elements): G(k) = {G1(k), G2(k), ..., GN(k)}. We also define a target population T = {t1, t2, ..., tM} with M individuals. Spatial dispersion in the SLF is characterized by the distribution of the pairs Si = (Ii, Li), as depicted in Figure 1. These pairs, to which the Genetic Operators are applied, are named Spatial Sound Genotypes (SSG).
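The azimuth-to-localization mapping above is simple enough to state in code. A minimal sketch, assuming only the definitions L = (90° − θ)/90° and 0 < I ≤ 1 given in the text (the helper names are hypothetical):

```python
def azimuth_to_L(theta_deg):
    """Map the ITD azimuth angle (0..180 degrees) to the localization
    factor L = (90 - theta) / 90, which spans [-1, 1]."""
    if not 0.0 <= theta_deg <= 180.0:
        raise ValueError("azimuth must lie in [0, 180] degrees")
    return (90.0 - theta_deg) / 90.0

def make_genotype(waveform, intensity, theta_deg):
    """Build one triplet (W, I, L) of the Genotype Space G."""
    if not 0.0 < intensity <= 1.0:
        raise ValueError("intensity must lie in (0, 1]")
    return (waveform, intensity, azimuth_to_L(theta_deg))

# The two extremes and the center of the frontal plane:
print(azimuth_to_L(0.0), azimuth_to_L(90.0), azimuth_to_L(180.0))
```

A centered source (θ = 90°) maps to L = 0, and the two lateral extremes map to L = ±1, so the pair (I, L) places every genotype inside the SLF rectangle of Figure 1.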
Since G(k) and T are subsets of G, we define the distance between these two sets as follows. The distance between an individual Gi(k) and a target tj is

    d_ij(G(k), T) = (Ii(k) − Ij)² / (2A) + (Li(k) − Lj)² / (4B)    (1)

where the constants A and B are taken as the maximum of the intensity and localization factors, respectively, and the distance is normalized to the interval [0, 1]. The distance dk between G(k) and T is then defined by

    dk = d(G(k), T) = min_{i,j} d_ij(G(k), T)    (2)

for i = 1, ..., N and j = 1, 2, ..., M. The best individual in the k-th population G(k) is Gi*(k) = (Wi*(k), Ii*(k), Li*(k)), the one with minimum distance to the target set: dk = d(G(k), T). This best individual is then used by the Iterative Evolutionary Cycle to build the next generation of G, as presented in Section 4.

3.2. Genetic Operators for Sound Spatialization

In order to control the evolutionary spatialization process, we defined the following two basic genetic operations.

Figure 1. Sonic Localization Field (SLF).

3.2.1. Crossover

Given the best individual of the k-th generation Gi*(k) = (Wi*(k), Ii*(k), Li*(k)) and the crossover rate α, with 0 ≤ α ≤ 1, individuals in the population are renewed by changing the values of their parameters as follows:

    Ii(k+1) = α · Ii*(k) + (1 − α) · Ii(k)  and  Li(k+1) = α · Li*(k) + (1 − α) · Li(k)    (3)

for 1 ≤ i ≤ N and k = 0, 1, ..., R, where R is the number of iterations.

3.2.2. Mutation

Similarly, given the best individual of the k-th generation Gi*(k) = (Wi*(k), Ii*(k), Li*(k)) and the mutation rate β, with 0 ≤ β ≤ 1, the mutation operation is defined as:

    Ii(k+1) = β · rand + (1 − β) · Ii(k)  and  Li(k+1) = β · rand + (1 − β) · Li(k)    (4)

for 1 ≤ i ≤ N and k = 0, 1, ..., R, where "rand" is a random parameter in the interval [0, 1] and β controls the degree of randomness of the mutation operation.

4. IMPLEMENTATION

To simulate this system we have been working on the implementation of an Iterative Evolutionary Cycle (IEC) using the Pure Data (Pd) software.
The IEC consists of two main processes: a) an evolutionary sound synthesis module, which applies genetic operators that modify the waveform shape (see details in [11]), and b) an evolutionary spatial engine module, which applies crossover and mutation to the population, as described in section 3.2.
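The fitness distance and the two genetic operations that these modules rely on (Eqs. 1-4 in Section 3) can be sketched compactly. Note that Eq. 1 had to be reconstructed from a damaged source, so the exact normalization constants below are an assumption; the crossover and mutation forms follow Eqs. 3 and 4 directly.

```python
import random

def d_pair(g, t, A=1.0, B=1.0):
    """Eq. 1: distance between an individual g = (I, L) and a target
    t = (I, L); A and B are the maxima of the intensity and localization
    factors (the normalization constants are a reconstruction)."""
    return (g[0] - t[0]) ** 2 / (2 * A) + (g[1] - t[1]) ** 2 / (4 * B)

def d_set(population, targets, A=1.0, B=1.0):
    """Eq. 2: the distance between a population and the target set is
    the minimum pairwise distance d_ij."""
    return min(d_pair(g, t, A, B) for g in population for t in targets)

def crossover(g, best, alpha):
    """Eq. 3: pull each parameter toward the best individual's value."""
    return tuple(alpha * b + (1 - alpha) * x for x, b in zip(g, best))

def mutate(g, beta, rng=random.random):
    """Eq. 4: blend each parameter with a random value in [0, 1]."""
    return tuple(beta * rng() + (1 - beta) * x for x in g)
```

With alpha = 1 crossover copies the best individual outright, and with beta = 0 mutation leaves the genotype untouched, which matches the "null mutation rates" phase reported in the results.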

4.1. Creating Populations

We generated a population of waveforms in which pre-recorded samples were segmented using a fixed slice of time. That was done using two procedures: a) automatic segmentation of a single large audio file, and b) generation of a large population of sine waves with randomly generated frequency and amplitude. For this experiment, the time slices varied from 50 milliseconds (low-level description) to 2 seconds (mid-level description), thus covering the low to mid-range timeframe of audio descriptors, as described in [19].

4.2. Evolutionary Engines

The Evolutionary Sound Synthesis (ESSynth) engine, as presented in previous works such as [10, 11], was used here as the evolutionary engine that controls the adaptive sound trajectory weaving the soundscape. Here, ESSynth was modified to use Eq. 2 as its fitness function. The Evolutionary Spatial Engine (ESE), described in [13, 14], was also used as an engine; it was modified to use Eq. 3 and Eq. 4 for the genetic operations.

4.3. User-Interface & Parametric Score

This system aims to allow the user to design soundscapes in two ways: a) real-time interaction using an input device that influences the fitness evaluation, and b) off-line interaction using a parametric score. The parametric score is an ASCII text file with a given sequence of lines, as described in Table 1. The idea here is to let the user design a general evolutionary behavior and use the input control in real time to produce novel soundscapes. Since the population of individuals can be very large, we implemented a parameter to control the selection of sub-sets in the population. We use two integer numbers n1 and n2 to define the Population Segmentation Window (PSW) as the subinterval [n1, n2], where 0 ≤ n1 < n2 ≤ N. Only the individuals belonging to the chosen PSW will be used in the IEC. Table 1.
Description of the Parametric Score

parameter           description                    application
0 ≤ α ≤ 1           crossover rate                 increases correlation
0 ≤ β ≤ 1           mutation rate                  increases randomness
0 ≤ n1 < n2 ≤ N     population segmentation        defines a sub-set in the
                    window                         population
Tb                  upgrade time rate for each     defines the duration of each
                    generation (in millisecs)      iterative evolutionary cycle
Tm                  delay before new waveforms     defines the rhythm of the
                    are sent to a circular         sound transformation
                    buffer (in millisecs)
Flags = 0, 1, 2, 3  status selector                indicates population
                                                   segmentation (0), synthesis
                                                   (1), spatial engine (2) or
                                                   end (3)

4.4. Iterative Evolutionary Cycle

Finally, it is possible to describe the whole iterative process in which we implemented an adaptive evolution applied to spatialization and sound synthesis for soundscape design. Notice that there are two main circuits here: a) off-line, controlled by the Parametric Score, and b) on-line, interactively controlled by the user through a gesture controller interface that produces a Target Set. Both circuits are applied to the evolutionary engines and the sound output is given by the best individual of each generation.

5. RESULTS

We present here the simulation results of this implementation. The tested parameters and the parametric scores are presented in Table 2. The first parameter on the score is always the time in milliseconds. Basically, we evaluated how the ITD cues work as part of the sound genotype and how the evolutionary synthesis method modifies the generated sound. Table 2.
Parametric Score of the Sound Example

parameter line           comment
0, 0, 25, 30             Time, Population Segments
0, 1, .5, .0, 950, 100   Synth: alpha = 0.5 and beta = 0
0, 2, .2, .3, 50, 100    Spatial: alpha = 0.2 and beta = 0
50, 2, .2, .3, 50, 100   Spatial: alpha = 0.2 and beta = 0.1
100, 2, .2, .3, 50, 100  Spatial: alpha = 0.2 and beta = 0.2
110, 2, .0, .0, 50, 100  Spatial: alpha = 0.2 and beta = 0.3
130, 3, .0, .0, 50, 100  End of the process

The example presented here is the result of a population generated using a sample of a speech voice. In Figure 2 we present the sound example generated with a population with time slices of 0.2 seconds. It is possible to verify the dynamics of the genetic operators in Table 2. When the process starts, ESSynth and ESE are applied with null mutation rates. The SLF presented in Figure 2 (top) allocates all the spatialization information near its maximum values. Parameters Tb and Tm, described in Table 1, were not tested yet, as we are still working on the Pd (Pure Data) implementation of this system.

Figure 2. Sonic Localization Field used to generate the sound example (top); resultant waveform (bottom).
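The score lines of Table 2 follow the parameter layout of Table 1 and are easy to parse. The sketch below is a hypothetical reader, not the authors' Pd patch; the field order (time, flag, then either the segmentation window or alpha, beta, Tb, Tm) is inferred from the example score above.

```python
def parse_score_line(line):
    """Parse one comma-separated line of the parametric score."""
    fields = [float(x) for x in line.split(",")]
    time_ms, flag = fields[0], int(fields[1])
    if flag == 0:
        # Population segmentation: the PSW bounds n1 < n2.
        return {"time": time_ms, "flag": flag,
                "psw": (int(fields[2]), int(fields[3]))}
    if flag == 3:
        # End of the process.
        return {"time": time_ms, "flag": flag}
    # Synthesis (1) or spatial engine (2) event.
    alpha, beta, tb, tm = fields[2:6]
    return {"time": time_ms, "flag": flag,
            "alpha": alpha, "beta": beta, "Tb": tb, "Tm": tm}

print(parse_score_line("0, 1, .5, .0, 950, 100"))
```

Sorting parsed events by their time field would recover the off-line schedule that drives the IEC.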

6. CONCLUSION

We presented a system for interactive soundscape design based on sound localization control and evolutionary sound synthesis. Our system is related to recent studies in which we incorporated spatial information as a sonic genotype descriptor. Here we linked these new achievements with the idea of using an input controller to define the Target Set. Using the Pd implementation we plan to test the control parameters through parametric scores and real-time control of the sonic localization field. Further studies will include: a) testing the usage of several sensor devices to capture the positions of performers' motions, producing geometric trajectories that build curves in the Target Set; b) implementing reverberation cues as a new psychoacoustic descriptor; c) incorporating high-level (context-based) musical audio descriptors in the score, such as tonality, rhythm pattern and chord complexity.

7. ACKNOWLEDGMENTS

We want to thank the agencies that financially supported this research. Manzolli is supported by the Brazilian agency CNPq and Fornari was supported by FAPESP. We specially thank the Music Cognition Group at the University of Jyväskylä, Finland, represented by Prof. Petri Toiviainen.

8. REFERENCES

[1] Blauert, J. Spatial Hearing: The Psychophysics of Human Sound Localization. Cambridge: MIT Press, 1997.

[2] Pulkki, V. "Virtual sound source positioning using vector base amplitude panning". Journal of the Audio Engineering Society, vol. 45, pp. 456-466, 1997.

[3] Chowning, J. M. "The simulation of moving sound sources". Presented at the Audio Engineering Society 39th Convention, New York, NY, USA, 1970.

[4] Cohen, M., Herder, J. and Martens, W. L. "Cyberspatial Audio Technology". Journal of the Acoustical Society of Japan, vol. 20, no. 6, pp. 389-395, Nov. 1999.

[5] Barbieri, T., Bianchi, A. and Sbattella, L. "MinusTwo: Multimedia, Sound Spatialization and 3D Representation for Cognitively Impaired Children".
In Computers Helping People with Special Needs, Lecture Notes in Computer Science, Springer Berlin/Heidelberg, 2004.

[6] Kelly, J. B. and Phillips, D. P. "Coding of interaural time differences of transients in auditory cortex of Rattus norvegicus: Implications for the evolution of mammalian sound localization". Hearing Research, vol. 55, no. 1, pp. 39-44, 1991.

[7] Birchfield, S. T. and Gangishetty, R. "Acoustic Localization by Interaural Level Difference". IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Philadelphia, Pennsylvania, March 2005.

[8] Brungart, D. S. and Rabinowitz, W. M. "Auditory localization of nearby sources. Head-related transfer functions". The Journal of the Acoustical Society of America, vol. 106, no. 3, pp. 1465-1479, September 1999.

[9] Fels, S. S. and Manzolli, J. "Interactive, Evolutionary Textured Sound Composition". 6th Eurographics Workshop on Multimedia, pp. 153-164, Sept. 2001.

[10] Manzolli, J., Fornari, J., Maia Jr., A. and Damiani, F. "The Evolutionary Sound Synthesis Method". Short paper. ACM Multimedia, ISBN 1-58113-394-4, USA, 2001.

[11] Fornari, J., Manzolli, J., Maia Jr., A. and Damiani, F. "The Evolutionary Sound Synthesis Method". SCI Conference, Orlando, USA, 2001.

[12] Fornari, J., Maia Jr., A. and Manzolli, J. "A Síntese Evolutiva Guiada pela Espacialização Sonora" ("Evolutionary Synthesis Guided by Sound Spatialization"). XVI Congresso da Associação Nacional de Pesquisa e Pós-graduação em Música (ANPPOM), Brasília, 2006.

[13] Fornari, J., Maia Jr., A. and Manzolli, J. "Creating Soundscapes Using Evolutionary Spatial Control". In Proceedings of EvoMUSART, Springer-Verlag, Valencia, 2007.

[14] Holland, J. H. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. MIT Press, Cambridge, MA, 1992.

[15] Holland, J. H. Hidden Order: How Adaptation Builds Complexity. Addison-Wesley, 1996.

[16] Caetano, M., Manzolli, J. and Von Zuben, F. J.
"Self-Organizing Bio-Inspired Sound Transformation". In Proceedings of EvoMUSART, Springer-Verlag, Valencia, 2007.

[17] Schafer, R. M. The Soundscape. ISBN 0-89281-455-1, 1977.

[18] Truax, B. Handbook for Acoustic Ecology. ISBN 0-88985-011-9, 1978.

[19] Leman, M., Vermeulen, V., De Voogdt, L., Moelants, D. and Lesaffre, M. "Correlation of Gestural Musical Audio Cues". In Gesture-Based Communication in Human-Computer Interaction: 5th International Gesture Workshop, GW 2003, pp. 40-54, 2004.