A PARAMETRIC MODEL OF EXPRESSIVENESS IN MUSICAL PERFORMANCE BASED ON PERCEPTUAL AND ACOUSTIC ANALYSES

Sergio Canazza, Antonio Rodà, Nicola Orio
University of Padova, Center of Computational Sonology (CSC), Dept. of Electronics and Informatics, Via Gradenigo 6/a, 35100 Padova, Italy
canazza@dei.unipd.it ar@cscl.unipd.it orio@dei.unipd.it

Abstract

Musical performance introduces deviations from the nominal values specified in the score; music reproduced without such variations is usually perceived as mechanical. Most investigations explore how the musical structure influences the performance, while only a few studies address how the musician's expressive intentions are reflected in it. The purpose of this work is to develop a model for the real-time modification of the expressiveness of a musical performance. Perceptual analyses were conducted on performances played with different intentions (correlated with a set of sensorial adjectives). From these analyses, two distinct expressive directions were observed: the first correlated with the "energy" and the second with the "kinetics" of the pieces. The resulting two-dimensional space (Perceptual Parametric Space, PPS) represents how the subjects arranged the pieces in their own minds. Acoustic analysis allowed us to correlate the expressive directions of the PPS with the main acoustic parameters, so that each point of the PPS is associated with a set of acoustic parameters. The analysis-by-synthesis method was used to validate the model. In order to carry out computer-generated performances, we developed real-time software.

1 Introduction

In the Western tradition, music is usually conveyed by means of a symbolic description, namely a score. The conventional score, however, is quite inadequate to describe the complexity of a musical performance in a way that would allow a computer to play it. The performer, in fact, introduces micro-deviations in the timing, dynamics, and timbre of the performance, following a procedure correlated with his own experience and with common instrumental practice. Furthermore, the performer operates on the microstructure of the musical piece not only to convey the structure of the text written by the composer, but also to communicate his own feeling or expressive intention. It is well known that spoken language can be enriched with different meanings depending on the variations introduced by the speaker; similarly, the musician adds and varies expressiveness in the musical message by changing the timing, dynamics, and timbre of the musical events with respect to the written score during the performance. The aim of this work is to give a parametric representation of the performance, which can be used to modify the expressiveness of the musical message according to the user's intentions (i.e. expressive intentions). The model can be applied in the multimedia field in order to generate performances with continuously changeable expressive characteristics. The relevance of music in multimedia systems is, in fact, steadily growing: music is a useful medium for expression and communication that goes beyond the merely aesthetic point of view. In multimedia systems the musical medium is essentially based on pre-recorded audio, which is simply played back without modification. Alternatively, the musical message may be coded as a music score (e.g. using the MIDI protocol); in this latter case the sound is generated by a synthesizer or an audio card.
The MIDI file can be a simple translation of the written score (in which case the performance will be perceived as mechanical) or a MIDI recording of a human performance. In either case it is impossible to change the expressiveness of the musical message or to adapt it to the user's intentions. In this sense, the model presented here allows us to modify in real time the expressive characteristics of a musical performance. The model was obtained starting from perceptual analyses carried out on several recorded performances; it describes the expressive intentions by means of judgment categories based on the listeners' experience. Using the model, we developed real-time software able to carry out computer-generated performances depending on the user's expressive intentions (see fig. 1). Perceptual tests on these performances allowed us to improve and validate the model, following the analysis-by-synthesis method.
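To make the data flow of fig. 1 concrete, the skeleton below outlines the three stages of the system. It is only a sketch, not the authors' software: the class and function names are hypothetical, and the model stage is left empty here because its equations are presented in section 4.

```python
# Hypothetical skeleton of the architecture in fig. 1 (not the authors' code).
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int        # pitch from the score (e.g. MIDI note number)
    ioi: float        # inter-onset interval measured in the neutral performance (s)
    duration: float   # measured duration of the note (s)
    intensity: float  # measured intensity (e.g. MIDI velocity)

def expressiveness_model(neutral: list[Note],
                         intention: tuple[float, float]) -> list[Note]:
    """Compute per-note deviations from the neutral performance for the
    expressive intention chosen by the user as a point (x, y) of the PPS.
    In the full system the nominal score is also an input; the parametric
    equations used at this stage are given in section 4."""
    ...

def post_processing(expressive: list[Note]) -> None:
    """Rendering step: apply the computed deviations to the audio or MIDI
    stream in real time (post-processing, see section 5)."""
    ...
```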

Fig. 1: Architecture of the model.

2 Models of expressiveness

Expressive deviations are related both to the musical structure and to the performer's expressive intentions. Friberg (1991) developed a rule system for music performance, and Bresin (1998) used neural networks to obtain automatic performances of musical scores. Quite a few studies have also investigated how much of the performer's intention is perceived by the listener, that is, how far performer and listener share a common code; Gabrielsson & Juslin (1996), in particular, studied the importance of emotions in the musical message. In order to model the expressive characteristics of different performances, we analysed how different categories of listeners arrange the expressive intentions in their own minds. For this purpose, a set of sensorial adjectives was chosen, each of which should inspire a certain expressive idea in a musician. We observed that a musician, inspired by appropriate adjectives, produces different performances of the same piece. Perceptual analyses (Canazza et al., 1997a, 1997c) proved that the audience can indeed perceive the kind of intention the performer wanted to convey, and acoustic analyses (Canazza et al., 1997b, 1997c; De Poli et al., 1998) confirmed that there are corresponding micro-deviations in the note parameters. Starting from the results of the acoustic and perceptual analyses, Canazza et al. (1998) developed models able to add expressiveness to automatic musical performances. These models render an expressive intention by transforming a neutral performance (i.e. a literal human performance of the score, without any expressive intention or stylistic choice), with reference both to the score and to the acoustic signal itself. It must be underlined that this approach adopts, in the musical language, hierarchical structures similar to those of spoken language (words, phrases). Once these structures are recognized, it is possible to modify a parameter of a group of notes (for example metronome or intensity) following a certain curve; such a curve describes the characteristics of the musical gesture on the group of notes. In this paper we present a development of this model, provided with real-time capabilities.

3 Perceptual Parametric Space

We selected a set of scores from Western classical and Afro-American music. For each score, different performances (correlated with different expressive intentions) were played by professional musicians. We carried out experiments to determine the judgment categories used by subjects asked to listen to the various interpretations of the same musical excerpt. Two different factor analyses were performed. The factor analysis on adjectives (see fig. 2) allowed us to determine a semantic space defined by the adjectives proposed to the listeners: two significant components accounted for 87.2% of the variance, and varimax rotation was used in order to simplify the interpretation of the factors. By means of the factor scores, it was possible to place the performances in this space. The comparison between the performance positions and the evaluation adjectives demonstrated a good recognition of the performers' intentions by the subjects.

Fig. 2: Factor analysis on adjectives. The first factor explains 60% of the total variance, the second 27.2%.
Evaluation adjectives: black, oppressive, serious, dismal, massive, rigid, mellow, tender, sweet, limpid, airy, gentle, effervescent, vaporous, fresh, abrupt, sharp. Performances: Neutral, Light, Bright, Hard, Dark, Heavy, Soft.

The second factor analysis used the performances as variables (see fig. 3). It showed that the subjects had placed the performances along only two axes: the first two factors, in fact, explained 75.2% of the total variance. The two-dimensional space so obtained (Perceptual Parametric Space, PPS) represents a model of expressiveness, by which the subjects arranged the pieces in their own minds.
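As an aside, the fragment below is a minimal sketch of this kind of two-factor solution with varimax rotation. It is not the original analysis: the rating matrix is filled with random placeholder data, and whether adjectives or performances act as variables must be chosen according to which of the two analyses one wants to replicate.

```python
# Minimal sketch of a two-factor solution with varimax rotation (dummy data).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_listeners, n_performances, n_adjectives = 30, 7, 17

# Placeholder ratings: every listener rates every performance on every adjective.
ratings = rng.random((n_listeners * n_performances, n_adjectives))

fa = FactorAnalysis(n_components=2, rotation="varimax")
scores = fa.fit_transform(ratings)   # factor scores for each rating vector
loadings = fa.components_.T          # loadings of the adjectives on the two factors

# Averaging the factor scores of each performance places it in the 2-D space.
perf_positions = scores.reshape(n_listeners, n_performances, 2).mean(axis=0)
```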

The first factor (expressive intention bright vs. expressive intention dark) seems to be closely correlated with the acoustic parameters that concern the kinetics of the music (for instance tempo). The second factor (expressive intention soft vs. expressive intention hard) is connected with the parameters that concern the energy of the sound (intensity, attack time). Other acoustic parameters (e.g. legato, brightness) are also related to the PPS axes, as will be shown in the next section. These correlations were deduced from acoustic analyses (Canazza et al., 1997b, 1997c, 1998; De Poli et al., 1998).

Fig. 3: Factor analysis using the performances as variables. The first factor (75.2%) is correlated with the kinetics of the music; the second factor is correlated with the energy of the sound.

4 The model

The input of the expressiveness model is composed of a description of a neutral musical performance (played without any expressive intention), the nominal score of the performance, and a control for the user's expressive intention. The expressiveness model acts at the symbolic level, computing the deviations of all the musical parameters involved in the transformation. For each expressive intention, the deviations of the acoustic parameters are calculated using the following equation:

P_out(n) = k_P · P̄_in + m_P · [ P_in(n) − P̄_in ]     (1)

where:
* n is the index of the n-th note of the score.
* P stands for any of the parameters modified by the model. Up to now, the model can process the inter-onset interval between the current note and the next (IOI), the duration of the current note (DR), the duration of the attack (DRA), the mean envelope energy (I), and the time location of the energy envelope centre of mass (EC).
* The subscript "in" indicates that the value of the parameter P is calculated from the inputs; in fact, some parameters (e.g. IOI and L) are calculated starting from both the score and the neutral performance.
* The subscript "out" indicates the value in the expressive performance, that is, the output of the system.
* P̄ stands for the mean of the values of the parameter P measured in the input performance.
* k_P and m_P are coefficients that carry out two different transformations of the parameter P: the first performs a translation and the second a stretching of the values (see fig. 4).

For each parameter P, starting from the PPS (see the previous section), the k and m coefficients are calculated by means of the equations

k_P(x, y) = a_0 + a_1·x + a_2·y
m_P(x, y) = b_0 + b_1·x + b_2·y     (2)

where x and y are the coordinates of the PPS, and a_i and b_i are coefficients that represent the relation between the transformations of the acoustic parameters and the axes of the PPS. These coefficients are estimated by carrying out a multiple regression on the recorded expressive performances.

The PPS was obtained using a set of sensorial adjectives to sample the semantic space. What do the intermediate points of the space mean? We hypothesize that they can be used as an interpolation of the original samples, i.e. the points between the heavy and light versions would have intermediate expressive characteristics. The analysis-by-synthesis method was applied choosing intermediate points of the space: the computer-generated performances have intermediate characteristics and show that all the points of the PPS have an expressive meaning. These results imply that the PPS can be used to render a kind of morphing between expressive characteristics. Hence, once a generic point (the user's expressive intention) is selected in the PPS, equation (2) gives the k and m coefficients for each acoustic parameter; basically, these coefficients are equivalent to instructions like "play louder", "play faster", "amplify rubato", "play a tempo", and so on. Equation (1) then computes the expressive deviation of the parameter P, which is the input of the rendering step of the system. In general, during the same performance a trajectory can be drawn that moves from one region of the PPS to another; in that case the k and m coefficients are functions of time (k(t), m(t)) and the performance will be characterized by changeable expressive features.
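The two equations can be turned into a few lines of code. The fragment below is a minimal sketch under the reconstruction of equations (1) and (2) given above, not the authors' implementation; the regression coefficients and the neutral IOI profile are hypothetical placeholders chosen only to make the example run.

```python
# Minimal sketch of the parametric transformation (hypothetical coefficients).
import numpy as np

def km_from_pps(x, y, a, b):
    """Eq. (2): map a point (x, y) of the PPS to the k and m coefficients of
    one acoustic parameter through the two regression planes."""
    k = a[0] + a[1] * x + a[2] * y
    m = b[0] + b[1] * x + b[2] * y
    return k, m

def transform_profile(p_in, k, m):
    """Eq. (1): translate the mean of the neutral profile through k and
    stretch its note-by-note deviations around the mean through m."""
    p_mean = p_in.mean()
    return k * p_mean + m * (p_in - p_mean)

# Hypothetical regression coefficients for one parameter (e.g. IOI), as they
# would be estimated by multiple regression on the expressive performances.
a_ioi = np.array([1.0, -0.25, 0.05])   # translation plane
b_ioi = np.array([1.0, -0.10, 0.30])   # stretching plane

# Neutral-performance IOI profile in seconds (placeholder values).
ioi_neutral = np.array([0.50, 0.48, 0.52, 0.55, 0.47])

# The user selects a point of the PPS, e.g. towards "bright" and "soft".
k, m = km_from_pps(x=0.8, y=-0.4, a=a_ioi, b=b_ioi)
ioi_expressive = transform_profile(ioi_neutral, k, m)

# A trajectory through the PPS makes k and m functions of time, so the same
# profiles can be re-transformed with time-varying coefficients during playback.
```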

Fig. 4: Example of transformations by means of the k and m coefficients (parameter profiles plotted against note number).

5 Validation

In order to validate the model, we developed software that allows us to generate expressive performances in real time. The model computes the expressive deviations needed by the rendering step to synthesize an expressive performance starting from a neutral one; the rendering step, based on post-processing techniques, is described in detail in Canazza et al. (1999). The software was used to generate performances from different musical repertories. Although the model was developed mainly for Western classical music, its architecture showed a general validity, even if some tuning of the parameters is needed. Expressive syntheses of pieces belonging to different musical genres (European classical, European ethnic, Afro-American) verified the generalization of the rules used in the model.

6 Conclusions

The performance of a piece cannot be reduced to a simple translation of the conventional score symbols into sounds, just following the metrical durations and the note pitches. The musician brings his own sensitivity and experience to communicate emotions and feelings to the listener, exploiting the expressive resources of his instrument. Studies on musical expressiveness have clarified which choices are made during a performance in order to convey a certain expressive intention. Starting from perceptual and acoustic analyses, we developed a model of musical expressiveness. The model is provided with a series of controls working on the single note; characteristics such as intensity or note attack are described intuitively thanks to the PPS. The model showed a general validity in its architecture, independent of musical genre.

This work has been supported by Telecom Italia under the research contract Cantieri Multimediali.

References

Bresin, R. (1998). Artificial neural networks based models for automatic performance of musical scores. Journal of New Music Research, 27(3), pp. 239-270.

Canazza, S., De Poli, G., Rinaldin, S., & Vidolin, A. (1997a). Sonological analysis of clarinet expressivity. In M. Leman (ed.), Music, gestalt, and computing. Studies in cognitive and systematic musicology, pp. 431-440. Berlin, Heidelberg: Springer-Verlag.

Canazza, S., De Poli, G., & Vidolin, A. (1997b). Perceptual analysis of the musical expressive intention in a clarinet performance. In M. Leman (ed.), Music, gestalt, and computing. Studies in cognitive and systematic musicology, pp. 441-450. Berlin, Heidelberg: Springer-Verlag.

Canazza, S., De Poli, G., Rodà, A., & Vidolin, A. (1997c). Analysis and synthesis of expressive intentions in musical performance. In Proceedings of the International Computer Music Conference 1997, Thessaloniki, pp. 113-120.

Canazza, S., De Poli, G., Di Sanzo, G., & Vidolin, A. (1998). A model to add expressiveness to automatic musical performance. In Proceedings of the International Computer Music Conference 1998, Ann Arbor, pp. 163-169.

Canazza, S., De Poli, G., Di Federico, R., Drioli, C., & Rodà, A. (1999). Expressive processing of audio and MIDI performances in real time. In Proceedings of the International Computer Music Conference 1999.

De Poli, G., Rodà, A., & Vidolin, A. (1998). Note-by-note analysis of the influence of expressive intentions and musical structure in violin performance. Journal of New Music Research, 27(3), pp. 293-321.

Friberg, A. (1991).
Generative rules for music performance: a formal description of a rule system. Computer Music Journal, 15(2), pp. 56-71.

Gabrielsson, A., & Juslin, P. N. (1996). Emotional expression in music performance: between the performer's intention and the listener's experience. Psychology of Music, 24, pp. 68-91.