REINFORCEMENT LEARNING FOR LIVE MUSICAL AGENTS

Nick Collins
University of Sussex
N.Collins@sussex.ac.uk

ABSTRACT

Current research programmes in computer music may draw from developments in agent technology; music may provide an excellent test case for agent research. This paper describes the challenge of building agents for concert performance which allow close and rewarding interaction with human musicians. This is easier said than done; the fantastic abilities of human musicians in fluidity of action and cultural reference make for a difficult mandate. The problem can be cast as that of building an autonomous agent for the (unforgiving) realtime musical environment. Live music is a challenging domain to model, with high-dimensional descriptions and a need for fast learning, fast response and effective anticipation. A novel symbolic interactive music system called Improvagent is presented as a framework for testing reinforcement learning over dynamic state-action case libraries, in a context of MIDI piano improvisation. Reinforcement signals are investigated based on the quality of musical prediction, and on the degree of influence in interaction. The former is found to be less effective than baseline methods of assumed stationarity and of simple nearest-neighbour case selection. The latter holds more promise; an agent may be able to assess the value of an action in response to an observed state with respect to the potential for stability, or the promotion of change in future states, enabling controlled musical interaction.

1. INTRODUCTION

Interactive music systems [25] are software and hardware systems founded on AI techniques which are designed for music-making, most typically in live concert performance combining machine and human musicians. Contemporary work in this field includes investigations into both machine listening (realtime audio analysis) and robotics; an inspiring project in this regard is Ajay Kapur's MahaDeviBot, a thirteen-armed Indian percussionist which can synchronise to sensor input from a human sitarist [19]. Recent years have also seen a number of such projects intersecting with the agent community, from Belinda Thom's Band-out-of-the-Box [30] and Wulfhorst and colleagues' Virtual Musical MultiAgent System [32], to the Musical Acts - Musical Agents architecture [23], OMax [1] and Arne Eigenfeldt's Drum Circle [10]. Indeed, agent technology has the potential to influence the general field of computer music, as discussed by Dahlstedt and McBurney [6], a composer and an agent researcher who have collaborated on generative music software. Perhaps the earliest explicit live musical agent work is that of Peter Beyls, whose 1988 description of Oscar (Oscillator Artist) [3] characterised the system as an autonomous agent.¹

A goal of this research is the realisation of autonomous agents for interactive music, which can at a minimum operate independently of composer intervention during performance, though they may not be so independent of the composer's programming. Behavioural autonomy in a concert and rehearsal situation (self-sufficiency of action) is sought whenever the agent is switched on, but constitutive autonomy (continuity of existence at the scale of everyday life) is not expected [12]. To align quickly with musical demands, techniques for fast adaptation to musical situations must be explored.

This paper will proceed by more closely examining work to date on musical agents.
Because machine learning is identified as a shortcoming of much existing work, we will investigate the combination of music and reinforcement learning techniques adopted by the agent community. A new MIDI-based system will be described, intended as a testbed for experiments in online adaptation (a toy sketch of such a learning loop appears below).

2. AGENT ARCHITECTURES FOR MUSIC

A natural viewpoint in applying agent metaphors to music is to place the agents at the level of individual musicians, such that each concert participant is a single autonomous agent. This is the primary level at which agency is discussed in this paper, but individual artificial musicians have also been considered as multiagent systems in their own right. Minsky's society of mind metaphor has been applied by Robert Rowe, particularly in his Cypher project; the Meta-Cypher includes multiple listener and player agents as well as a Meta-Listener [26, pp. 310-15]. In the setting of artificial life, Jonathan Impett has demonstrated the complex emergent musical interaction possible with swarm intelligence [18]. But multiagent architectures within individual artificial musicians have usually been notional, manifesting, for instance, as simple active hypotheses in computational beat tracking [13]. Where systems have sought to explore flexible interaction with human musicians, style-specific cultural conventions and innate human musical behaviours (such as synchronisation abilities) have provided severe challenges for such systems.

¹ Though it fails to qualify under more restrictive definitions [4].
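To make the approach summarised in the abstract more concrete, the following Python sketch shows one plausible shape for reinforcement learning over a dynamic state-action case library: nearest-neighbour retrieval selects among stored cases, and a one-step value update reinforces whichever case was acted upon. The action set, feature encoding, parameter values and placeholder reward are illustrative assumptions for exposition only, not the Improvagent implementation.

    # Toy sketch: reinforcement learning over a growing state-action case library.
    # All names and values below are illustrative assumptions.
    import math
    import random

    ALPHA = 0.1       # learning rate (assumed value)
    EPSILON = 0.1     # exploration probability (assumed value)
    K = 5             # neighbourhood size for case retrieval (assumed value)
    ACTIONS = ["repeat", "transpose", "vary_rhythm", "rest"]  # toy action set

    class CaseLibrary:
        """Grows a library of (state, action, value) cases during performance."""

        def __init__(self):
            self.cases = []  # each case is a mutable [state, action, value] triple

        def select_action(self, state):
            """Epsilon-greedy: usually exploit the best-valued case among the
            K nearest stored states, occasionally explore at random."""
            if not self.cases or random.random() < EPSILON:
                return random.choice(ACTIONS)
            nearest = sorted(self.cases, key=lambda c: math.dist(c[0], state))[:K]
            return max(nearest, key=lambda c: c[2])[1]

        def reinforce(self, state, action, reward):
            """Nudge the value of a matching case towards the observed reward,
            adding a new case when the state-action pair is effectively unseen."""
            for case in self.cases:
                if case[1] == action and math.dist(case[0], state) < 0.25:
                    case[2] += ALPHA * (reward - case[2])
                    return
            self.cases.append([state, action, reward])

    # Toy online loop: random 4-dimensional vectors stand in for features
    # extracted from incoming MIDI events.
    library = CaseLibrary()
    state = [random.random() for _ in range(4)]
    for _ in range(200):
        action = library.select_action(state)
        next_state = [random.random() for _ in range(4)]
        reward = random.uniform(-1.0, 0.0)  # placeholder reward signal
        library.reinforce(state, action, reward)
        state = next_state

    print(f"learned {len(library.cases)} cases")

In a real system, the placeholder reward would instead derive from the quality of musical prediction (how well the selected case anticipated the next observed state) or from the degree of influence on the human player's subsequent material, the two signals investigated in this paper.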