REAL-TIME GRANULAR MORPHING AND SPATIALISATION OF SOUNDS WITH GESTURAL CONTROL WITHIN MAX/FTS

Todor Todoroff
Faculté Polytechnique de Mons & Conservatoire Royal de Mons
Research funded in Belgium by the Région Wallonne
todor@musique.fpms.ac.be

ABSTRACT: We will first discuss the objectives we find necessary for building tools aimed at fulfilling the needs of composers of concrete electroacoustic music. Special attention should be given (1) to the user interface, (2) to achieving real-time processing and (3) to giving the user gestural access to the values of the parameters. We think that the right combination of those three key factors may help to establish a creative instrumental relation between the composer/performer and his computer tools. It may also contribute to a higher internal coherence when generating sound morphologies. We will then show how we applied those principles to granular morphing and to the spatialisation of sounds.

1. Introduction

We feel that, at a time when many signal processing techniques have reached maturity, the emphasis should be placed on the conception of more effective and more natural ways of controlling them. When it comes to building tools specifically aimed at composers of concrete electroacoustic music, careful attention should be given to the three following factors:

(1) The choice of a graphical user interface for musical software goes beyond mere user-friendliness. Experience shows that the look and feel of programs has a profound impact on the kind of music they help to produce. For instance, excessive use of nested windows and menus is very disturbing because it forces the composer to concentrate on the interface rather than on the music. On the other hand, multi-dimensional control objects, which allow several parameters to be modified in a single gesture, offer an efficient and meaningful global control over the on-going process. There is no doubt that the MAX graphical facilities presently available on the NeXT need to be augmented by functionalities similar to those formerly available in Animal [Lindemann, E. & de Cecco, M.].

(2) Achieving real-time processing is another important goal: when a composer has to wait too long between the moment he gives values to the parameters and the moment he can hear the musical result, it often breaks his creative élan and limits his search for the optimal musical result. When real-time operation is achieved, it becomes far easier to scan the effects of possible parameter values and to define the domain in which one wants to experiment, thereby saving a lot of otherwise wasted time and energy that can be spent on more creative parts of the compositional work.

(3) The next important step is to offer a natural way of controlling the on-going musical process, one that goes beyond the nowadays "classical" way of moving virtual faders with a mouse, one at a time. Composers of concrete music have indeed developed over the years a way of composing based on direct access to most sound processing and recording parameters. Their needs therefore lie not only in getting the most powerful signal processing workhorse, but also in being given a natural way of controlling the on-going musical process.
This is very important, because the impossibility of modifying several parameters simultaneously has far-reaching consequences: it leads to composing music made of successions of stable sound events rather than music based on dynamically varying sound structures, an important aspect of concrete music writing techniques [Teruggi, D.]. Parallel access to the parameters, through the use of control systems like MIDI faders or a data glove (we use the Mattel PowerGlove interfaced with the STEIM SensorLab), restores a gestural way of controlling sound and allows precise control over dynamically varying sound morphologies. It opens new worlds for composers by offering them the power and precision made possible by today's computer technology, combined with the ease of use of older analogue equipment. A renewed instrumental relation can then be established between the composer/performer and his virtual computer tools, allowing him to enhance his expressive power.

2. Achieving a higher internal coherence when generating sound morphologies

Experience in composing with both analogue and digital equipment has strengthened our belief that musically interesting sound morphologies are generally obtained when there is a correlation between several sound parameters. This applies to both sound synthesis and sound processing. It proves to be more effective because the listener tends to recognise an internal coherence in the sound produced. This seems to be due to the way the perceptive structure of our brain is formed through training over the years. Indeed, most natural sound events show a very high correlation between pitch, amplitude, timbre, sound localisation and directivity. There is an audible internal logic, as those sound parameters are all intimately related to the evolution of the energy source that produces the sound.

We think that a two-step approach to the design of instruments could be a way to generate audible internal coherence without limiting the generality of a sound processing technique. On a lower level, one should try to develop algorithms providing highly independent control over the various sound parameters and extending the working domain as far as possible. On a higher level, one should design several related instruments, using the same lower-level algorithms, but imposing various types of correlations between the parameters' values as well as lower and upper boundaries. Those would be like different windows through which one looks, each from another perspective, at the world of all the sounds made possible at the lower level. Each of those instruments allows the user to explore a specific sound domain ruled by specific laws. Each action controls more than one parameter and the listener gets to hear and to recognise the existence of those laws. This gives a higher level of "credibility" to the resulting sounds, as the auditory system perceives some kind of causality: the listener is induced to believe that the sound he hears could actually be produced by something that really exists. This approach also helps to limit the number of parameters one has to control, which facilitates the use of these instruments by composers unaware of the complexity of the signal processing algorithms used, composers who would be confused if they had too many parameters to handle at a time. When possible, the process of building and modifying this kind of instrument from lower-level algorithms should be made easy enough for composers without extensive computer knowledge. A purely illustrative sketch of this two-level design is given below.
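The two-level design described above was realised as MAX patches; the Python fragment below is only one possible reading of it, not part of the original implementation. It shows a lower level exposing independent grain parameters, and a higher-level "instrument" that derives several correlated, bounded parameter values from a single gesture value. All names and the particular correlation law are assumptions chosen for the example.

```python
# Purely illustrative sketch of the two-level design (not the original MAX patches).
# Lower level: a grain voice whose parameters are all independent.
# Higher level: an "instrument" that imposes a correlation law and boundaries,
# driven by a single gesture value in [0, 1]. Names and the correlation law
# are assumptions chosen for the example.

def low_level_voice_params(amplitude, grain_dur_ms, transposition, pan):
    """Lower level: every parameter can be set independently."""
    return {"amplitude": amplitude, "grain_dur_ms": grain_dur_ms,
            "transposition": transposition, "pan": pan}

def bright_texture_instrument(gesture):
    """Higher level: one gesture controls several correlated parameters,
    each kept between instrument-specific lower and upper boundaries."""
    g = min(max(gesture, 0.0), 1.0)
    amplitude = 0.2 + 0.8 * g           # louder ...
    grain_dur_ms = 100.0 - 90.0 * g     # ... goes with shorter grains ...
    transposition = 1.0 + g             # ... and a higher transposition
    return low_level_voice_params(amplitude, grain_dur_ms, transposition, pan=0.5)

if __name__ == "__main__":
    for gesture in (0.0, 0.5, 1.0):
        print(gesture, bright_texture_instrument(gesture))
```

Each such "window" onto the lower level would impose a different correlation law, which the listener can hear and recognise.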
3. Granular morphing

We have developed MAX algorithms realising what is sometimes called granulation [Jones, D. and Parks, T.], referring to the use of sampled sounds to construct the grains. The originality of our approach comes from the way we control the synthesis, by means of a gesture, as opposed to the preliminary writing of a score describing the evolution of each parameter. The user has direct and total access to all parameters in real time. He just has to move MIDI faders, hit MIDI buttons, or move his hand and fingers using a data glove to subtly or drastically modify the resulting sound output. There is no main scheduling function. Each of the 32 available synthesis voices (allowing densities of up to approximately 2500 grains per second on one ISPW-16 card) is totally independent.

When a voice is activated, it looks for the parameters' values in a central memory location that immediately reflects any change induced by mouse action, by any other MAX patch, or by incoming MIDI messages received from any external device (gestural interface, sequencer, ...). We used non-linear mappings to convert MIDI data into the actual parameters' values. This takes auditory perception into account: changing the grain duration from 5 to 6 ms could have a tremendous effect on the sound quality, but no one would notice a change from 999 to 1000 ms (see the sketch below). Each voice can be activated or deactivated on the fly at the push of a button, but one may also define complex synchronisation patterns, ranging from perfect coincidence to the precise definition of delays between voices. The spatial (stereo, quadraphonic, ...) distribution of voices, as well as the wide range of grain durations, allows the generation of rich and diversified sound textures able to evolve continuously from fluid granulation to fragmentation and complex rhythmic and spatial structures.
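The exact mapping curves are not given in the paper; the following sketch assumes an exponential law as one plausible choice of perceptually motivated non-linear MIDI-to-parameter conversion. Equal controller steps then produce equal ratios of grain duration, so a step near 5 ms is clearly audible while a step near 1000 ms is not.

```python
import math

# Sketch of a non-linear MIDI-to-parameter mapping (an exponential law is assumed;
# the actual curves used in the MAX patches are not described in the paper).
# Equal MIDI steps give equal *ratios* of grain duration, so a step near
# 5 ms is clearly audible while a step near 1000 ms is not.

def midi_to_grain_duration_ms(midi_value, dur_min=5.0, dur_max=1000.0):
    """Map a 0-127 MIDI controller value to a grain duration in milliseconds."""
    t = midi_value / 127.0
    return dur_min * math.exp(t * math.log(dur_max / dur_min))

if __name__ == "__main__":
    for v in (0, 1, 63, 64, 126, 127):
        print(v, round(midi_to_grain_duration_ms(v), 1), "ms")
```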

Besides the usual control parameters (amplitude, attack, sustain and release time of an individual grain, delay between successive grains, transposition factor), it is possible to switch between two modes:

Fig. 1A: Normal mode. Fig. 1B: Fade in/fade out mode.

The fade in/fade out mode uses two perfectly synchronised synthesis voices to ensure that the release phase of each grain in one voice coincides exactly with the attack phase of a new grain in the other. It allows the creation of a sound continuum and proved to be very effective for performing time-shifting and frequency transposition that vary independently of each other. Figure 2 shows how this is done. One may define a start point and an end point to select a working zone within the recorded sound buffer (for instance, to isolate a word or a sentence). It is then possible to loop within that zone with variable speed and direction, defining the first sample of sound that will be used to construct a grain. This is how one controls time-shifting and time-reversal. Then, depending on the grain duration and on the transposition factor (transposing one octave higher requires taking twice the duration from the sound buffer), the program defines the portion of the sound buffer (the hatched portion in Fig. 2) that will be multiplied by the envelope to construct one grain. The speed and direction used to read that portion of the recorded sound buffer control the transposition and time-reversal of each individual grain; they are totally independent from the time-shifting speed and direction and may differ from one grain to another if some randomness is used (a numerical sketch of this mechanism is given below).

Fig. 2: Transposition and time-shifting from the sound buffer. The figure shows the recorded sound buffer (up to 10 s), the start and end points of the working zone, the direction and speed of the first sample of each grain (time-shifting and time-reversal), and the direction and speed of sample reading within the portion used for one individual grain (frequency transposition).

The tonal quality of the resulting sound may be further modified by changing the envelope parameters, without affecting the time-scale and transposition factors. The user may also define cumulative shifts of the portion of sound used for each synthesis voice. When the voices are synchronised, this creates a comb filtering effect, increasing the resonance as more voices are activated. One may also generate chorus-like effects, using random variations of the start point.
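The mechanism of Fig. 2 can be summarised numerically. The sketch below is illustrative only: the sample rate, the variable names and the way the time-shifting pointer advances per grain are assumptions, not a description of the original patch. It computes the portion of the recorded buffer used for one grain: its length follows from the grain duration and the transposition factor, its reading speed and direction set the transposition and grain reversal, while a separate pointer looping through the working zone at its own speed and direction sets the time-shifting.

```python
# Numerical sketch of the grain construction of Fig. 2 (illustrative only;
# sample rate, names and the per-grain advance of the time-shifting pointer
# are assumptions, not a description of the original patch).

SR = 44100  # samples per second (assumption)

def next_grain(pointer, time_shift_speed, grain_dur_ms, transposition,
               zone_start, zone_end):
    """Return (first_sample, last_sample, read_speed) for one grain and the
    updated time-shifting pointer, kept inside the working zone by looping."""
    grain_len_out = int(grain_dur_ms * SR / 1000.0)          # grain length at the output
    grain_len_buf = int(grain_len_out * abs(transposition))  # portion taken from the buffer
    first_sample = int(pointer)
    # The portion is read at |transposition| times the normal speed; a negative
    # transposition reverses each individual grain.
    portion = (first_sample, first_sample + grain_len_buf, transposition)
    # Time-shifting: the pointer advances (or recedes) independently of the
    # per-grain reading speed; here it is advanced once per grain.
    pointer += time_shift_speed * grain_len_out
    pointer = zone_start + (pointer - zone_start) % (zone_end - zone_start)
    return portion, pointer

if __name__ == "__main__":
    # Quarter-speed time-shifting combined with a one-octave-up transposition.
    ptr = 0.0
    for _ in range(3):
        grain, ptr = next_grain(ptr, time_shift_speed=0.25, grain_dur_ms=40,
                                transposition=2.0, zone_start=0, zone_end=4 * SR)
        print(grain, round(ptr))
```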

Control over the amount of random variation of most of the parameters gives a way to define continuously varying tendency masks. It is also possible to dynamically interpolate between several sets of parameters localised on a plane or in a 3-D space, using either a mouse or a data glove. This is a way to correlate several parameters to achieve more coherent sound morphologies, as discussed before. The same method is used to morph between several sound buffers on a probabilistic basis, using the user-defined probabilities to decide from which buffer the sound will be taken to build each grain. In the future, we expect to provide better control (filtering, refined spatial distribution and automated rate control [Truax, B.]), as well as improved performance (higher density of grains, phase locking), by replacing some of the MAX patches with external objects written in C.

4. Spatialisation tools

We developed spatialisation tools controlled through a data glove and/or MIDI faders. They allow both direct control of the sound localisation (directly related to the hand position) and indirect control through the use of spatialisation primitives driven by the hand position and the bending of the fingers. The latter allows the generation of otherwise impossible movements: it overcomes the speed limitations due to the hand's inertia and permits simultaneous control over multiple sound sources. Depending on the primitive chosen (circle, ellipse, Lissajous shapes, spatial fragmentation, scintillation, acceleration, deceleration, ...), the user has a varying number of parameters at his disposal. He is free to map them to the available controllers as he likes best. If we take the example of the elliptic movement primitive, the parameters are: the co-ordinates of the centre, the lengths of the major and minor axes of the ellipse, the angle of the major axis, the direction and speed (or angular speed) of the sound movement, and control over the intensity of the reverberation and of the Doppler effect. One could for instance choose to assign the (X, Y) position of the hand to the centre of the ellipse and the Z co-ordinate of the hand to the rotation speed and direction. The bending of the fingers and the rotation of the hand could control several other parameters, like the lengths of the ellipse's axes, the intensity of the Doppler effect, etc. Space allows neither a detailed description of all the available parameters nor a discussion of which mapping methods were found most effective.

Regarding the signal processing part, we first tried out Chowning's method [Chowning, J.]. It is effective when moving sounds far from the centre, but it fails to simulate sounds crossing the room or moving close to the centre, as only two loudspeakers are active at a time, determined solely on an angular basis, disregarding radial distance. To avoid the jumps of sound caused by changes from one active pair to another, we developed another method, using tables to independently simulate left-right and front-back movements in a quadraphonic space. When the sound image is located near the centre, all four loudspeakers are active (a minimal sketch of this table-based panning is given at the end of this section). Those tables may be redrawn in the concert hall to fit the loudspeakers' placement and the local acoustics. In the future we will program graphical FTS clients to provide a better way to visualise and to edit movements. We will also implement SMPTE and MTC synchronisation to allow the recording and playback of movements along with the music.
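The content of the tables is meant to be redrawn for each hall and is not specified in the paper; the following sketch therefore uses equal-power cosine curves as assumed placeholders for the left-right and front-back tables, and combines them into four loudspeaker gains so that a source near the centre feeds all four loudspeakers.

```python
import math

# Illustrative sketch of the table-based quadraphonic panning described above.
# The actual tables are meant to be redrawn in the concert hall; equal-power
# cosine curves are assumed here as placeholders. x and y are in [-1, 1]
# (x: left to right, y: back to front). Loudspeaker order:
# front-left, front-right, back-left, back-right.

def pan_table(position):
    """One 'table': equal-power crossfade between the two sides of one axis."""
    t = (position + 1.0) / 2.0                                    # [-1, 1] -> [0, 1]
    return math.cos(t * math.pi / 2.0), math.sin(t * math.pi / 2.0)

def quad_gains(x, y):
    """Combine the independent left-right and front-back tables into 4 gains."""
    left, right = pan_table(x)
    back, front = pan_table(y)
    return (front * left, front * right, back * left, back * right)

if __name__ == "__main__":
    print([round(g, 3) for g in quad_gains(0.0, 0.0)])   # centre: all four active
    print([round(g, 3) for g in quad_gains(-1.0, 1.0)])  # hard front-left
```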
We are currently experimenting with ways to control both movements and sound transformations with a single gesture. Such a combination might generate spatial and spectral morphologies that are highly coherent from a perceptual point of view.

References

Chowning, J., "The Simulation of Moving Sound Sources", J. AES, Vol. 19 (1), January 1971, pp. 2-6.
Jones, D. and Parks, T., "Generation and Combination of Grains for Music Synthesis", CMJ, Vol. 12 (2), Summer 1988, pp. 27-34.
Lindemann, E. and de Cecco, M., "Animal: Graphical Data Definition and Manipulation in Real Time", CMJ, Vol. 15 (3), Fall 1991, pp. 78-100.
Roads, C., "Granular Synthesis of Sound", in C. Roads and J. Strawn (eds.), Foundations of Computer Music, Cambridge, Massachusetts: MIT Press, 1985, pp. 145-159.
Teruggi, D., "The Morpho Concepts: Trends in Software for Acousmatic Music Composition", Proc. ICMC 1994, Aarhus, pp. 213-215.
Todoroff, T., "Outils de synthèse granulaire et de spatialisation du son à commande gestuelle dans l'environnement MAX-FTS", JIM '95, Paris, Institut Blaise Pascal, LAFORIA 95/13, pp. 101-108.
Truax, B., "Time-Shifting of Sampled Sound with a Real-Time Granulation Technique", Proc. ICMC 1993, Tokyo, pp. 82-85.