Towards a Closed System Automated Composition Engine: Linking 'Kansei' and Musical Language Recombinicity

Ian Whalley (Copyright 2002)
Music Department, The University of Waikato
Email:

Abstract

This paper focuses on software-based automated composition systems operated by a single user to create music for closed system dramatic narratives, where the dramatic parameters are known but the dramatic shape and outcomes are not predetermined. The concern is with a system that addresses Kansei (emotion based) (Hashimoto 1998) approaches to narrative structure, musical generation, and performance. The proposed model allows music to be created by controlling a 'flight simulator' interface that represents emotional states, rather than by dealing directly with the composition process, so that non-composers can recompose or explore a work in different ways. The system could be incorporated into non-linear interactive digital media, allowing different musical paths to be taken through the structure.

1 Introduction

Automated composition systems fall into distinct areas: algorithmic composition from various non-human sources, such as fractals (Miranda 2000, 2001); categorizing and reconstituting existing musical material (Cope 2000); interactive music systems (Rowe 1993); systems for generating and specifying sound material (Wanderley et al. 1998); and autonomous music systems that include Kansei material (Camurri et al. 1998, Riecken 1997), usually with some sort of robotic input (Camurri et al. 2000). The idea of a Kansei (emotion based) (Hashimoto 1998) system centred on one computer driven by one user has begun to be explored theoretically as a means of automated film music scoring (Cooper et al. 2000), but that theoretical model is limited in that it addresses musical language independently of the emotional structure of the dramatic narrative. Further, the system proposed there is designed to automate scoring for linear films.
Recent work in automated film/multimedia scoring suggests a way forward: begin from existing moving images, translate these into emotional dynamic outcomes, and assign these outcomes to automated musical equivalents. Non-composers can use these systems. The limitations of the first incarnation (Nakamura et al. 1994) are that it cannot deal with complex dynamic moods and that it is intended primarily for linear systems. More recent work (Hasegawa, Kitahara 2000) extends this approach by allowing a score to be generated automatically from images without the film first having to be broken manually into segments and moods assigned manually, making it more suited to non-linear approaches. The advantage of both systems is the automation of Kansei based music composition with a musical/thematic sense, which aligns them more closely with traditional methods of music composition for general communication. The drawbacks are that neither begins from an overall sense of the dramatic structure of the film or media it is to be used for, or of musical thematic development. Further, although the notational outcomes are automated, the performance outcomes neglect Kansei approaches to automated performance (Widmer 2000). These are also largely CRISP systems that lack flexibility of response to similar situations. Recent work on mapping emotional narrative structures (Whalley 2000, 2001) gives established composers the tools to generate Kansei based works by simulating the emotional flux and tension between parts of a system, but not the means to generate music automatically. By adding Kansei based music generators to Kansei structural models (Whalley 2001), a composition could be altered by dealing directly with an emotion scheme, addressing the limitations of current closed systems.
The modelling process allows music to be created by controlling a 'flight simulator' interface that represents emotional states, rather than by dealing directly with the composition process, so that non-composers can recompose or explore a work in different ways. The Kansei link can be reinforced by adding Kansei generators to the performance outcomes.

2 Modelling narrative structure

A Kansei based music generation system first needs to address dynamic narrative as structural expression in combination with a musical grammar approach to generation (Milicevic 1998). The limitations of a purely grammar-based approach to recombinicity in this context are discussed in the next section. Research into applying system dynamics (SD) modelling (Anderson et al. 1983) to map musical dramatic narratives (Whalley 2000) allows experimental composers to address the structural and semiotic way in which many non-musicians 'read' music, and presents a contrary view to the idea that musical language/grammatical structure and its semiotic meaning are interchangeable. The approach allows closed system narratives to be modelled and simulated. As such, the method provides tools to help come to terms with the dynamic structural experience of emotional narratives, and these can be used to generate musical works (Whalley 2001).

The advantage of this approach as the basis for music generation is twofold. First, composers can use Kansei information as the basis for structural dynamic expression, and very complex emotional narratives can be modelled. Second, by using commercially available software (Stella, for example), simple flight-simulator interfaces can be constructed to control a few or many parts of a very complex SD model, allowing others to test the different strategies and dramatic outcomes possible in the model without having to understand how to construct the underlying model's structure.

3 From simulation to music

How does a composer/performer use the simulations to compose music? Writing music to reflect changes of emotion in a dramatic situation has been a longstanding tradition in narrative film music over the last century, with its roots in the operatic tradition (Gorbman 1974). The approach is incorporated in recent automated film music systems (Nakamura et al. 1994).
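To illustrate the step from simulation to music, the Python sketch below Euler-integrates a toy two-stock SD model of the kind a tool such as Stella builds graphically. The stocks (`tension`, `hope`), their flow equations, and the `Controls` fields standing in for 'flight simulator' sliders are hypothetical assumptions made for illustration, not the model used in this work. The point is only that an operator changes slider values, not the model structure, and receives a new emotional trajectory that later modules could translate into music.

```python
from dataclasses import dataclass


@dataclass
class Controls:
    """Hypothetical 'flight simulator' sliders exposed to a non-composer."""
    conflict: float = 1.0    # constant inflow of tension per second
    resolution: float = 0.2  # fraction of current tension released per second
    hope_gain: float = 0.5   # how strongly low tension feeds the 'hope' stock


def simulate(controls: Controls, duration: float = 60.0, dt: float = 0.5):
    """Euler-integrate a toy two-stock SD model.

    Returns a list of (time, tension, hope) samples, one per time step.
    """
    tension, hope, t = 0.0, 1.0, 0.0
    history = [(t, tension, hope)]
    for _ in range(int(round(duration / dt))):
        # Both flows are computed from the current stock values before updating.
        tension_flow = controls.conflict - controls.resolution * tension
        hope_flow = controls.hope_gain * (1.0 - tension / 10.0) - 0.1 * hope
        # Stocks are clamped at zero so emotional levels never go negative.
        tension = max(0.0, tension + tension_flow * dt)
        hope = max(0.0, hope + hope_flow * dt)
        t += dt
        history.append((t, tension, hope))
    return history


if __name__ == "__main__":
    # Two 'performances' of the same model with different slider settings.
    for sliders in (Controls(), Controls(conflict=2.0, resolution=0.1)):
        final_t, final_tension, final_hope = simulate(sliders)[-1]
        print(sliders, f"t={final_t:.1f}s tension={final_tension:.2f} hope={final_hope:.2f}")
```

With the default sliders, tension settles near conflict / resolution = 5; doubling the conflict slider drives it higher without any change to the model itself, which is the kind of exploration the flight-simulator interface is meant to offer.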
The historical techniques of film music composition allow creativity within a broadly agreed framework of semiotic meaning (Schubert 1999), for an intended audience at a specific time and in a particular context. A further advantage of this approach as the basis for an automated composition system, in contrast to others (Cope 2000), is that it does not attempt to express meaning grounded in a shared semiotic scheme by taking snippets of previous musical language and reassembling them in a manner that is haphazard with respect to emotion and semiotics. Instead, it begins from a coherent semiotic narrative: there is something to say in the first instance, or a story to relate with structural/thematic coherence. By analogy, rather than attempting to write a book by learning grammar and collecting words from other books, one starts from themes and plot, and then realises the story in the way a writer might control syntax and grammar. The recreated words (performance) must, however, also reflect the semiotic scheme, an issue dealt with later. Given this difference in approach, two questions remain: how to automate the human agency (composer) or 'expert system' in an emotion/semiotic grounded system, and how to keep the system flexible enough that it does not have to be rebuilt for small alterations an operator may want to make to a flight-simulator interface affecting the underlying system dynamics model that drives the composition.

4 Rigid systems

Many automated music composition systems like M Music give non-professionals access to music composition tools without requiring high levels of musical expertise. Yet the musical results are usually poor because of a lack of methodology to control dramatic musical structure and emotional semiotics. Professional automated composition systems such as MAX allow extensive music generation, but these systems are also not primarily grounded in emotional/dramatic semiotics.
An intuitive way to approach the automation problem from a compositional perspective, and the way many conventionally approach it in current automated systems (Miranda 2000), is to construct a number of music rules that feed existing composition packages such as MAX, and to add Kansei translations to the rule base (Camurri et al. 1998): CRISP systems built for specific instances. There are, however, major theoretical and practical limitations to this approach, as there are with many current algorithmic composition systems that are too prescriptive. For example, even with more random functions added and a greater choice of material, the output converges on single solutions, lacking variety and the flexibility to be generalised. This general approach is extended and implemented in current automated film/Kansei music approaches (Hasegawa, Kitahara 2000, Nakamura et al. 1994), which are influenced by automatic music generation based on music template information (Aoki 1998). A limitation is that these systems derive their flexibility of musical response not from a specific emotion/structural mapping (which may sometimes reinforce, but sometimes require contrast with, the visual information), but from interpreting visual information input through scenic feature extraction; that is, the primary technique becomes 'mickey-mousing' at a micro level. A further limitation of these automated systems (Hasegawa, Kitahara 2000, Nakamura et al. 1994) is that the rule base cannot be changed quickly, and the style of one composer (expert) dominates the musical outcomes. That is, they again lack generality. In addition, these systems dislocate performance value from the MIDI triggering of the generated score. Performance value as a means of communication is not integrated with the emotional/semiotic message conveyed by the Kansei structure of the narrative.

To be affective in a holistic sense of semiotic coherence, a Kansei based automated composition system needs to integrate structural narrative expression, note generation tied to semiotic structure, and expressive output related to the semiotic message. At the same time, the system needs to be robust enough to deal with many different problems, and flexible enough to generate a number of outcomes. These are the issues that an integrated Kansei system, based on an SD model of the dramatic structure of a work, seeks to address. Using an SD approach to generate the basis of the score (Whalley 2001) has several advantages: the emotional content can be mapped very precisely and with a sense of the overall dynamic; one is not tied to visual information as an input; the system is flexible enough to be used as a stand-alone device or in the context of many media, with or without significant visual input; and semiotic structure itself becomes a significant means of Kansei expression. The disadvantage is that the parameters and structure of the closed narrative have to be mapped before the work can be recomposed. This becomes a primary creative/compositional act for the narrative artist.

5 A structure/language/output Kansei automated composition system

Figure 1. Integrated Kansei system

Figure 1 provides the conceptual model of the integrated Kansei system. Similar approaches to some aspects of the model are currently being developed in the robotics/Kansei field with musical outcomes (Wassermann et al. 2000), but these have yet to be fully explored with a Kansei base or in integrated stand-alone systems. The decision maker module (adaptive agent) 'trades off' the rule bases of a number of influencing modules.
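One way to picture the decision maker's 'trade-off' is as arbitration between competing proposals. The Python sketch below assumes, hypothetically, that each influencing module proposes a value for a single musical parameter together with a confidence weight, and that the agent combines them by weighted mean. The module names and the `reweight` rule, an illustrative stand-in for the 'adaptive' part of the agent, are assumptions and not a method specified in this paper.

```python
from typing import Dict, Tuple

# (proposed value, confidence weight >= 0) from one influencing module.
Proposal = Tuple[float, float]


def trade_off(proposals: Dict[str, Proposal]) -> float:
    """Combine competing module proposals for one musical parameter by weighted mean."""
    total_weight = sum(weight for _, weight in proposals.values())
    if total_weight <= 0:
        raise ValueError("at least one proposal needs a positive weight")
    return sum(value * weight for value, weight in proposals.values()) / total_weight


def reweight(proposals: Dict[str, Proposal], feedback: float, rate: float = 0.5) -> Dict[str, Proposal]:
    """Hypothetical adaptation step: boost the weight of modules whose proposals were near the feedback value."""
    updated = {}
    for name, (value, weight) in proposals.items():
        error = abs(value - feedback)
        # A module that matched the feedback exactly gains (1 + rate) times its weight.
        updated[name] = (value, weight * (1.0 + rate / (1.0 + error)))
    return updated


if __name__ == "__main__":
    # Hypothetical tempo (bpm) proposals from three modules of Figure 1.
    tempo = {
        "timing_episode": (96.0, 2.0),
        "theme_motive": (120.0, 1.0),
        "style_selector": (108.0, 1.0),
    }
    print(trade_off(tempo))                           # weighted compromise
    print(trade_off(reweight(tempo, feedback=96.0)))  # drifts towards the favoured module
```

A weighted mean keeps every module's influence visible and lets no single rule base dominate outright, which speaks to the criticism of one-composer systems made above; a real adaptive agent might instead learn the weights from the SD model or from listener feedback.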
The Timing and Episode module alters tempo and converts the length of time available into bars, a standard part of most film music scoring programs such as Cue. A phrase generator is included. Further, the module decides whether a theme should be present or absent in the music depending on its prominence in the underlying SD model. Not all themes are present at all times; they may appear, reappear, or alter, for example with recurring episodes. The theme-to-motive rule base translates emotional themes into musical motivic material. In the Western music tradition, there is a long history of this translation process in operatic composition (Wagner) and romantic symphonic work (Berlioz, Grieg). The dynamic manipulation of emotions based on musical themes is part of the standard technique of film music composition (Korngold, Steiner), as is the manipulation of musical material to provide variation and interest in standard compositional practice. The rule base for this module, being based on these principles, need not be extensive (Cope 2000). It is grounded in the way basic elements of music (pitch, rhythm, tempo, mode, timbre, pan, reverb, etc.) behave at the poles of emotional extremes (Schubert 1999), avoiding the need for an extensive expert system. The timbre selection module, which chooses timbre according to mood, is similarly part of the standard technique of film music composition. For example, a tuba rarely represents love unless one wants to be comical. Existing empirical and applied research in the literature influences other aspects of the system. Music composition rule-base modules built on tonal (or atonal [Cope 2000]) musical grammar and syntax are part of many currently available algorithmic software packages (Miranda 2001), although few of these are driven by semiotic concerns. A rule base that matches semiotic input to musical output in generic terms, based on conventional practice, has been implemented in other systems (Nakamura et al. 1994).
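The Timing and Episode module's conversion of available time into bars is simple arithmetic: a cue of s seconds at b beats per minute with m beats per bar holds s × b / (60 × m) bars. The Python sketch below rounds that figure to whole bars and then adjusts the tempo so the bars fill the cue exactly, and adds a hypothetical threshold rule for deciding which themes are present. The scalar 'prominence' per theme and the 0.3 threshold are assumptions for illustration; this is not a description of how Cue or any other scoring program works.

```python
def seconds_to_bars(seconds: float, bpm: float, beats_per_bar: int = 4) -> float:
    """How many bars of the given metre fit a cue of `seconds` at `bpm`."""
    return seconds * bpm / 60.0 / beats_per_bar


def fit_cue(seconds: float, bpm: float, beats_per_bar: int = 4):
    """Round to a whole number of bars, then adjust the tempo so those bars fill the cue exactly."""
    bars = max(1, round(seconds_to_bars(seconds, bpm, beats_per_bar)))
    adjusted_bpm = bars * beats_per_bar * 60.0 / seconds
    return bars, adjusted_bpm


def active_themes(prominence: dict, threshold: float = 0.3) -> list:
    """Hypothetical rule: a theme sounds only while its SD-model prominence exceeds the threshold."""
    return sorted(name for name, level in prominence.items() if level > threshold)


if __name__ == "__main__":
    print(seconds_to_bars(30, 120))  # bars in a 30-second cue at 120 bpm in 4/4
    print(fit_cue(32, 100))          # whole bars plus the tempo that makes them fit
    print(active_themes({"love": 0.8, "threat": 0.1, "loss": 0.45}))
```

For a 32-second cue at a nominal 100 bpm in 4/4, about 13.3 bars fit; rounding to 13 bars and slowing slightly to 97.5 bpm lets the phrase end exactly with the episode.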
Music style selectors are a regular feature of many commercially driven sequencers. The module includes composer stylistic signatures, a notion central to Cope's (2000) work on aping other composers and extending one's own compositional work.
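The rule base grounded in how musical elements behave at the poles of emotional extremes, together with mood-based timbre selection, can be sketched as interpolation in a two-dimensional valence/arousal space of the kind Schubert (1999) measures. Every pole value and the quadrant-to-timbre table in the Python sketch below are hypothetical placeholders, not values taken from Schubert (1999) or from this system. The point is that a handful of settings fixed at the extremes, rather than an extensive expert system, can yield parameters for any point in between.

```python
from dataclasses import dataclass


def lerp(low: float, high: float, x: float) -> float:
    """Map x in [-1, 1] linearly onto [low, high], clamping x to that range."""
    x = max(-1.0, min(1.0, x))
    return low + (high - low) * (x + 1.0) / 2.0


@dataclass
class MusicalSettings:
    tempo_bpm: float
    mode: str
    pitch_centre: int  # MIDI note number
    reverb: float      # wet mix, 0..1
    timbre: str


# Hypothetical timbre table keyed by quadrant: (valence >= 0, arousal >= 0).
TIMBRES = {
    (True, True): "brass",           # positive, excited
    (True, False): "strings",        # positive, calm, e.g. tender/love (a tuba would read as comic)
    (False, True): "percussion",     # negative, agitated
    (False, False): "low woodwind",  # negative, subdued
}


def settings_for(valence: float, arousal: float) -> MusicalSettings:
    """Interpolate musical parameters between hypothetical values fixed at the emotional poles."""
    return MusicalSettings(
        tempo_bpm=lerp(60.0, 140.0, arousal),      # calm pole slow, excited pole fast
        mode="major" if valence >= 0 else "minor",  # hypothetical: mode flips at neutral valence
        pitch_centre=round(lerp(48, 72, arousal)),  # register rises with arousal
        reverb=lerp(0.6, 0.2, arousal),             # calmer states get a wetter mix
        timbre=TIMBRES[(valence >= 0, arousal >= 0)],
    )
```

As a usage example, `settings_for(0.8, -0.6)` (pleasant and calm) yields a major-mode setting at about 76 bpm with strings, while `settings_for(-1.0, 1.0)` sits at the fast, agitated pole.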

The output of the central decision-making module triggers a MIDI data generator, influenced by the performance expression module, which applies, for example, machine learning methods to understand the relationship between emotional response and the manipulation of real-time instrumental performance elements (Widmer 2000), amongst other methods (Bresin, Friberg 2000). The control module links aspects of the narrative dynamic model with performance expression to ensure that the performance reflects the emotional/semiotic basis of the music. Physical modelling synthesis is intended as the main driver for the lead expressive parts, as it makes real-time timbral control an aspect of expressive performance, in the way most acoustic instrumentalists manipulate this parameter as a means of expression. Background elements need not use the same synthesis methods.

6 Conclusion

The approach/model presented attempts to integrate Kansei information at the structural, generation, and performance levels, and in doing so addresses criticisms of other automated composition systems and even of some aspects of current contemporary music composition (Milicevic 1998). By letting non-composers generate material from emotion GUIs, it allows them to interact with sonic symbols, structural listening, and dynamic message in the semiotic language on which the reception experience is based. The same composition may then have many outcomes, obtained by altering the assumptions that generate the underlying SD model/emotional dynamic, allowing the evolution of non-linear narrative film music techniques in the digital realm with or without visual information. A criticism likely to be levelled is the one applied to many algorithmic composition systems: GIGO (garbage in, garbage out), or a lack of originality. In answer, the ability of composers to reinvent within known archetypes reflects a focus on creativity in contrast to invention.
Also, the proposed system allows 'experimental' styles to be continually added, modules to be continuously updated and extended, and random functions to be manipulated to create unexpected outcomes. Cope's (2000) notion of 'recombining' as a legitimate form of creativity thus remains central to the model, yet invention can be incorporated as original extensions stemming from a coherent semiotic structural dynamic.

References

Anderson, D., Deal, R., Garet, M., Roberts, N., Schaffer, W. 1983. Introduction to Computer Simulation: A System Dynamics Modelling Approach. Productivity Press.
Aoki, E., Sugiura, E. 1998. "Method and Device for Automatic Music Composition Employing Music Template Information." US Patent 5,736,663.
Bresin, R., Friberg, A. 2000. "Software Tools for Musical Expression." Proceedings of International Computer Music Association Conference, Berlin: 499-502.
Camurri, A., Ferrentino, P., Dapelo, R. 1998. "A Computational Model of Artificial Emotions." Kansei - The Technology of Emotion, AIMI International Workshop Proceedings, Genoa: 16-23.
Camurri, A., Coetta, P., Massimiliano, P., Ricchetti, M., Ricci, A., Trocca, R., Volpe, G. 2000. "A Real-time Platform for Interactive Dance and Music Systems." Proceedings of International Computer Music Association Conference, Berlin: 262-265.
Cooper, D., Ng, K., Pierce, S. 2000. "Scorebot: Towards an Automatic System for Film Music Composition." Proceedings of 13th Colloquium on Musical Informatics, L'Aquila: 167-170.
Cope, D. 2000. The Algorithmic Composer. A-R Editions.
Gorbman, C. 1974. Unheard Melodies: Narrative Film Music. New York: Garden City.
Hasegawa, T., Kitahara, Y. 2000. "Automatically Composing Background Music for an Image by Extracting a Feature Set Thereof." US Patent 6,084,169.
Hashimoto, S. 1998. "KANSEI as the Third Target of Information Processing and Related Topics in Japan." Proceedings of International Workshop on Kansei - Technology of Emotion, Tokyo, Japan: 101-104.
Milicevic, M. 1998. "Deconstructing Musical Structure." Organised Sound 3(1): 27-34.
Miranda, E.R. (ed.) 2000. Readings in Music and Artificial Intelligence. Harwood Academic Publishers.
Miranda, E.R. 2001. Composing Music With Computers. Oxford: Focal Press.
Nakamura, J., Hyun, K., Kaku, J., Noma, J., Yoshida, S. 1994. "Automatic Background Music Generation Based on Actors' Mood and Motions." The Journal of Visualization and Computer Animation 5: 247-264.
Riecken, D. 1997. "Wolfgang: Emotion and Architecture which Bias Musical Design." Kansei - The Technology of Emotion, AIMI International Workshop Proceedings, Genoa: 9-15.
Rowe, R. 1993. Interactive Music Systems: Machine Listening and Composing. Cambridge, MA: MIT Press.
Schubert, E. 1999. "Measuring Emotion Continuously: Validity and Reliability of the Two-dimensional Emotion Space." Australian Journal of Psychology 51(3): 154-165.
Wanderley, M., Schnell, N., Rovan, J. 1998. "ESCHER - Modelling and Performing Composed Instruments in Realtime." Proceedings of IEEE International Conference on Systems, Man and Cybernetics, San Diego, CA, USA, October 1998.
Wassermann, K., Blanchard, M., Bernardet, U., Manzolli, J., Verschure, P. 2000. "Roboser - An Autonomous Interactive Musical Composition System." Proceedings of International Computer Music Association Conference, Berlin: 531-534.
Whalley, I. 2000. "Emotion, Theme and Structure: Enhancing Computer Music Through System Dynamics Modelling." Proceedings of International Computer Music Association Conference, Berlin: 213-216.
Whalley, I. 2001. "Applications of System Dynamics Modelling to Computer Music." Organised Sound 5(3): 149-157.
Widmer, G. 2000. "Large-Scale Induction of Expressive Performance Rules: First Quantitative Results." Proceedings of International Computer Music Association Conference, Berlin: 344-347.