Rule-Based Emotional Coloring of Music Performance

Roberto Bresin and Anders Friberg
Department of Speech, Music and Hearing (TMH) - Royal Institute of Technology (KTH)
Drottning Kristinas väg 31, 10044 Stockholm, Sweden
{roberto, andersf}@

ABSTRACT

The KTH rule system offers previously unexplored possibilities for obtaining musically and emotionally different performances of the same score by hyper-articulating the performance of the score itself, i.e. by marking the structure of the composition. This new application of the KTH performance rule system is presented here. It is shown how particular combinations of performance rules, called rule palettes, can produce music performances with specific emotional coloring. In particular, the principles and choices behind the design of a rule palette for angry-colored performances are illustrated.

INTRODUCTION

This paper presents part of a new computational model of expression in music performance, the GERM model (Juslin et al., 1999). Drawing on previous research on performance, the GERM model is based on the hypothesis that expression in performance derives from four primary sources of variability, namely (1) Generative rules that function to convey the musical structure in an appropriate manner (e.g., Clarke, 1988; Friberg, 1991; Palmer, 1989); (2) Emotional expression governed by the performer's expressive intention (e.g., Juslin, 1997a); (3) Random fluctuations reflecting internal timekeeper variance and motor variance (e.g., Shaffer, 1982; Wing & Kristofferson, 1973); and (4) Movement principles dictating that particular aspects of the performance should be shaped in accordance with our perception of physical motion (Bresin & Battel, forthcoming; Friberg et al., 2000; Friberg & Sundberg, 1999; Shove & Repp, 1995).
In the following we focus on the Emotional expression source of variability in music performance and illustrate how it can be simulated by means of special combinations of generative rules.

RULE-BASED EMOTIONAL EXPRESSION

An important contribution in music performance is given by the emotional component, often marked in the score by the composer with Italian words such as Con fuoco, Tenero, Vivace, Brillante. Juslin and Gabrielsson (1996) asked musicians (playing piano, violin, flute, and guitar) to communicate different emotions in performances of the same piece. They found that each emotion was coupled to a particular set of acoustic cues. In subsequent listening tests, listeners, both musically trained and untrained, used the same set of cues for decoding these emotions. The importance of different performance cues in the identification of emotional qualities was tested in synthesis experiments (Juslin, 1997b). Each expressive cue was set by hand to different values in synthetic performances on a commercial sequencer. Listeners could recognize and identify the intended emotions also for these stimuli. Continuing in the same direction, we used Director Musices (DM), a software tool for automatic performance of music developed in our group (Bresin and Friberg, 2000), to automatically control each cue in emotionally expressive performances. The main idea was that particular combinations of the DM performance rules, henceforth rule palettes, could give rise to performances that differ with respect to emotional expression. Six different rule palettes, each corresponding to an emotion (fear, anger, happiness, sadness, tenderness, and solemnity), were designed using the qualitative cue descriptions by Gabrielsson and Juslin. These rule palettes produce variations of the performance variables IOI (Inter-Onset Interval), OOI (Offset-Onset Interval) and L (Sound Level) for each note in the score.
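The mechanics of a rule palette can be sketched in code. The note representation, the function names, and the additive combination of deviations below are our hypothetical simplification for illustration only; the actual DM implementation (Lisp-based) differs.

```python
from dataclasses import dataclass

@dataclass
class Note:
    ioi_ms: float    # IOI: Inter-Onset Interval in milliseconds
    ooi_ms: float    # OOI: Offset-Onset Interval (gap to next note; 0 = legato)
    level_db: float  # L: Sound Level, relative to the nominal score level

def apply_palette(notes, palette):
    """A rule palette is an ordered list of (rule, quantity k) pairs.
    Each rule returns per-note deviations of the three performance
    variables, which are accumulated onto the score's nominal values."""
    for rule, k in palette:
        for note in notes:
            d_ioi, d_ooi, d_level = rule(note, k)
            note.ioi_ms += d_ioi
            note.ooi_ms += d_ooi
            note.level_db += d_level
    return notes
```

A palette for a given emotion would then just be a different list of (rule, k) pairs fed to the same machinery.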
In a listening test with 16 listeners, two pieces were performed without rules and with the six rule palettes, for a total of 14 performances. The emotions associated with the DM rule palettes were correctly classified in most cases; a more detailed description can be found in Bresin and Friberg (forthcoming). In this paper we want to illustrate the principles and choices that guided us in simulating a specific emotion, anger.

SIMULATING ANGER

It has been observed that two of the most important expressive cues for communicating emotion in music performance are mean tempo and mean sound pressure level (SPL). These cues exhibit similar qualitative deviations in performances of instrumental music as in speech and singing (see Figure 1). For these reasons two new rules were introduced in the DM system: the Tone duration rule, for shortening or lengthening all notes in the score by a constant percentage, and the Sound level rule, for decreasing or increasing the SPL of all notes in the score by a constant value in dB. Gabrielsson and Juslin observed that players trying to communicate anger use very rapid tempo, loud SPL, non-legato articulation, moderate time deviations, structural reorganization, and

[Figure 1. Sound pressure level (SPL) deviations in speech (House), in singing (Kotlyar and Morozov; Langeheinecke et al.), and in performance of instrumental music.]

increased contrast between long and short notes (Table 1). This qualitative description was translated to the generative rules using an analysis-by-synthesis method. The result is in the form of a rule palette in DM for each emotion, defining the rule selection and its parameters. For achieving very rapid tempo and loud SPL, the Tone duration and Sound level rules presented above were used. Non-legato articulation was obtained with the Duration contrast articulation rule set to a detached articulation. Articulation seems to convey a high degree of information in expressive playing (Bresin & Battel, forthcoming). Results from some informal listening tests conducted by the authors show that deadpan performances with staccato articulation are classified as happy and brilliant, while deadpan performances with legato articulation are classified as sad and dark. Moderate time deviations were obtained by using moderate values of the rule quantity parameters. The structural reorganization described by Gabrielsson and Juslin was realized with an unconventional use of the Phrase arch rule. In a natural performance, phrasing is characterized by an accelerando and crescendo followed by a rallentando and decrescendo. Here, the rule is applied in the opposite way, yielding a rallentando-decrescendo at the beginning of a phrase and an accelerando-crescendo at the end of the phrase. The Duration contrast rule, applied with a positive rule quantity, was used for increasing the contrast between long and short notes.
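The inverted phrase arch described above can be illustrated with a simple parabolic envelope over one phrase. The parabolic shape and the scaling constants here are our own assumptions for the sake of a runnable sketch; DM's actual Phrase arch rule and its parameter semantics are more elaborate.

```python
def inverted_phrase_arch(n_notes, k=-0.7, amp=4.0):
    """Per-note (duration_factor, level_offset_db) across one phrase.
    arch(x) peaks mid-phrase; with a NEGATIVE k the usual
    accelerando-crescendo / rallentando-decrescendo shape is flipped,
    so the phrase starts with a rallentando-decrescendo and closes
    with an accelerando-crescendo, as in the Anger palette."""
    envelope = []
    for i in range(n_notes):
        x = i / (n_notes - 1)           # position in the phrase, 0..1
        arch = 4.0 * x * (1.0 - x)      # 0 at phrase edges, 1 at midpoint
        duration_factor = 1.0 - 0.1 * k * arch  # k < 0: notes lengthen toward mid-phrase
        level_offset_db = amp * k * arch        # k < 0: level dips toward mid-phrase
        envelope.append((duration_factor, level_offset_db))
    return envelope
```

With k = -0.7 the midpoint of the phrase is the slowest and softest point, which is exactly the reversal of a natural phrase arch.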
It has been shown that the Duration contrast rule plays an important role in communicating different emotions in music performance, ranging from very high duration contrast for Afraid performances to no contrast at all for Sad and Tender performances. The First note in measure rule stresses the first note in each measure by playing it louder. Metrical patterns alter mainly the motional quality (Friberg & Battel, forthcoming) and in this case create a dominantly downward motion similar to a forceful type of walking (cf. Friberg et al., 2000). The rule palette was applied to different scores. As an example, time and SPL deviations for each note in a Swedish nursery tune (Ekorrn satt i granen, henceforth Ekorrn, "The squirrel sat on the fir-tree", composed by Alice Tegner) are displayed in Figure 2. The rule palette (without the First note in measure rule) was validated in a listening test: in total, 91% of the 16 subjects classified the performances as communicating anger (Bresin & Friberg, forthcoming).

APPLICATIONS

One immediate application is the introduction of new features in commercial music sequencers and editors by programming new plug-ins. In this way users will be able to obtain emotionally colored performances of any MIDI file while continuing to use their favorite software tool. A second application is in performance analysis. By applying the performance rules in reverse mode it is possible to extract some information about the emotional content of a performance. This opens new perspectives in the field of music performance didactics, where the KTH rule system has already been successfully applied (Friberg and Battel, forthcoming). A third, appealing but maybe underestimated, field of application is that of consumer electronics involving automatic performance of synthetic sounds, as in mobile phones. Rule-based techniques can be used to improve the quality of ringing tone performances in mobile phones.
Expressive cue   | Gabrielsson and Juslin          | Director Musices
-----------------|---------------------------------|--------------------------------------------------
Tempo            | Very rapid                      | Tone duration is shortened by 15%
SPL              | Loud                            | Sound level is increased by 8 dB
Articulation     | Mostly non-legato               | Duration contrast articulation rule (k = 1)
Time deviations  | Moderate; structural            | Punctuation rule (k = 2); Phrase rule (k = -0.7,
                 | reorganizations                 | turn position = 0.5, next = 1.3, amp = 4);
                 |                                 | Sub-phrase rule (k = -0.7, turn position = 0.3,
                 |                                 | amp = 4, last = 1)
SPL deviations   | Increased contrast between      | Duration contrast rule (k = 2, amp = 0);
                 | long and short notes            | First note in measure rule (amp = 1)

Table 1. Playing a music score with Anger. Each performance rule involved has a particular setting in order to comply with the observations made by Gabrielsson and Juslin on each expressive cue.
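The global part of the Anger palette in Table 1 can be sketched as a single pass over the notes. The note-dict layout, the `beat_pos` field, and the reading of "amp = 1" as roughly a 1 dB stress are all our assumptions; the phrase, sub-phrase, punctuation, and duration-contrast rules need structural analysis of the score and are deliberately omitted here.

```python
def apply_anger_palette(notes):
    """Apply the global Table 1 settings to note dicts with keys
    'ioi_ms', 'level_db', and 'beat_pos' (0.0 = first beat of a measure).
    Only the Tone duration rule, the Sound level rule, and the
    First note in measure stress are sketched."""
    for n in notes:
        n['ioi_ms'] *= 0.85       # Tone duration: shorten all notes by 15%
        n['level_db'] += 8.0      # Sound level: raise all notes by 8 dB
        if n['beat_pos'] == 0.0:
            n['level_db'] += 1.0  # First note in measure (amp = 1): extra stress
    return notes
```

Running this over a deadpan MIDI-like note list already yields the fast, loud, metrically accented character the palette aims for, before the structural rules add phrasing and contrast.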

[Figure 2. Time and SPL deviations for each note in the Swedish nursery tune "Ekorrn satt i granen", produced by the rule palette for Anger.]

The mobile phone is probably the most popular music synthesizer, or at least the one with the highest diffusion, and we listen to its mechanical and dull performances every day. In a current project we are using our technique to design an emotional mobile phone. The caller will be able to send emoticons to the receiver in order to express different emotional situations, influencing the performance of ringing tones on the receiver's phone. In this way it will be possible to have a musical interaction between the caller and the receiver. Furthermore, some mobile phone producers have included the possibility of using the MIDI standard as a format for ringing tone files. A further application could be in the new MPEG-7 standard. MPEG-7 includes smart Karaoke applications, where the user will sing the melody of a song to retrieve it from a database (Ghias et al., 1995). An emotional toolbox, capable both of recognizing the emotion of the singer and of translating it back into the Karaoke performance, would probably improve the human-computer interaction.

CONCLUSIONS

The main result of this investigation is that emotional coloring of performances can be obtained by using rules for the communication of the musical structure. This was achieved by defining rule palettes in DM. It also opens the door to the design of new rule palettes for other expressive situations, usually expressed in Italian terms in classical music performance. The same rule palette can be applied to different compositions, since the rules were designed to handle different musical scores.
This was also validated in the listening test for different emotions. On the other hand, the rules themselves are triggered by the musical structure. This suggests that the structure of the composition plays a primary role in expressive automatic playing. As a final consideration, we think that a performance system, based on rules or other techniques, should be included in the next MPEG-7 standard in order to enhance the performance possibilities in interactive systems involving music.

LINKS

KTH performance rules description: http://www.speech.kth.se/music/performance/
Sound and MIDI examples of emotional performances: http://www.speech.kth.se/~roberto/emotion/

REFERENCES

Bresin, R., and G.U. Battel. Forthcoming. "Articulation strategies in expressive piano performance." Journal of New Music Research.
Bresin, R., and A. Friberg. Forthcoming. "Emotional Expression in Music Performance: Synthesis and Decoding." Computer Music Journal.
Bresin, R., and A. Friberg. 2000. "Software Tools for Musical Expression." Proceedings of the ICMC 2000, Berlin, in this publication.
Clarke, E.F. 1988. "Generative principles in music performance." In J. Sloboda (Ed.) Generative Processes in Music. Oxford: Clarendon Press, 1-26.
Friberg, A. 1991. "Generative Rules for Music Performance: A Formal Description of a Rule System." Computer Music Journal 15(2): 56-71.
Friberg, A., and G.U. Battel. Forthcoming. "Structural Communication: Timing and Dynamics." In R. Parncutt and G. McPherson (Eds.) Science and Psychology of Music Performance.
Friberg, A., V. Colombo, L. Fryden, and J. Sundberg. Forthcoming. "Generating Musical Performances with Director Musices." Computer Music Journal.
Friberg, A., and J. Sundberg. 1999. "Does music performance allude to locomotion? A model of final ritardandi derived from measurements of stopping runners." Journal of the Acoustical Society of America 105(3): 1469-1484.
Friberg, A., J. Sundberg, and L. Fryden. 2000.
"Motion in music: Sound level envelopes of tones expressing human locomotion." TMH-QPSR, Speech Music and Hearing Quarterly Progress and Status Report, Stockholm, 1/2000: 73-82.
Gabrielsson, A., and P. Juslin. 1996. "Emotional expression in music performance: between the performer's intention and the listener's experience." Psychology of Music 24: 68-91.
Ghias, A., J. Logan, D. Chamberlain, and B.C. Smith. 1995. "Query By Humming - Musical Information Retrieval in an Audio Database." ACM Multimedia '95, San Francisco, USA.
House, D. 1990. "On the Perception of Mood in Speech: Implications for the Hearing Impaired." Lund University, Dept. of Linguistics, Working Papers 36: 99-108.
Juslin, P.N. 1997a. "Emotional communication in music performance: a functionalist perspective and some data." Music Perception 14(4): 383-418.
Juslin, P.N. 1997b. "Perceived emotional expression in synthesized performances of a short melody: capturing the listener's judgment policy." Musicae Scientiae 1(2): 225-256.
Juslin, P.N., A. Friberg, and R. Bresin. 1999. "Towards a Computational Model of Performance Expression: The GERM Model." Paper presented at the Meeting of the Society for Music Perception and Cognition (SMPC'99), Evanston, USA.
Kotlyar, G.M., and V.P. Morozov. 1976. "Acoustical correlates of the emotional content of vocalized speech." Sov. Phys. Acoust. 22(3): 208-211.
Langeheinecke, E.J., H.U. Schnitzler, M. Hirsher-Buhrmester, and K.E. Behne. 1999. "Emotions in the singing voice: Acoustic cues for joy, fear, anger, and sadness." Journal of the Acoustical Society of America 105(2), Pt. 2, p. 1331.
Palmer, C. 1989. "Mapping musical thought to musical performance." Journal of Experimental Psychology: Human Perception and Performance 15: 331-346.
Shaffer, L.H. 1982. "Rhythm and timing in skill." Psychological Review 89: 109-122.
Shove, P., and B. Repp. 1995. "Musical motion and performance: theoretical and empirical perspectives." In J. Rink (Ed.) The Practice of Performance. Cambridge, U.K.: Cambridge University Press, 55-83.
Wing, A.M., and A.B. Kristofferson. 1973.
"The timing of interresponse intervals." Perception & Psychophysics 13: 455-460.