REINFORCEMENT LEARNING FOR LIVE MUSICAL AGENTS
Nick Collins
University of Sussex
N.Collins@sussex.ac.uk
ABSTRACT
Current research programmes in computer music may draw from developments in agent technology; music may in turn provide an excellent test case for agent research. This paper describes the challenge of building agents for concert performance which allow close and rewarding interaction with human musicians. This is easier said than done; the fantastic abilities of human musicians in fluidity of action and cultural reference make for a difficult mandate. The problem can be cast as that of building an autonomous agent for the (unforgiving) realtime musical environment. Live music is a challenging domain to model, with high-dimensional descriptions, and with fast learning, response and effective anticipation all required.
A novel symbolic interactive music system called Improvagent is presented as a framework for testing reinforcement learning over dynamic state-action case libraries, in the context of MIDI piano improvisation. Reinforcement signals are investigated based on the quality of musical prediction, and on the degree of influence in interaction. The former is found to be less effective than baseline methods of assumed stationarity and of simple nearest-neighbour case selection. The latter holds more promise: an agent may be able to assess the value of an action in response to an observed state with respect to its potential to promote stability, or change, in future states, enabling controlled musical interaction.
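As a rough sketch of these two reward formulations (illustrative only; the function names and the vector feature representation below are assumptions, not Improvagent's actual implementation), prediction-based reward can be cast as negative prediction error, while influence-based reward measures how far the observed musical state moves following an agent action:

    import numpy as np

    def prediction_reward(predicted_state, observed_state):
        # Reward for musical prediction quality: larger (less negative)
        # when the predicted next state (a feature vector summarising,
        # e.g., pitch and onset statistics) matches what was then played.
        diff = np.asarray(predicted_state) - np.asarray(observed_state)
        return -float(np.linalg.norm(diff))

    def influence_reward(state_before, state_after):
        # Reward for degree of influence: how far the observed state
        # moved following the agent's action. Negating this term would
        # instead reward stability over the promotion of change.
        diff = np.asarray(state_after) - np.asarray(state_before)
        return float(np.linalg.norm(diff))

An agent maximising the first signal values accurate anticipation; one weighting the second can trade off stability against change in its interaction with the human player.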
1. INTRODUCTION
Interactive music systems [25] are software and hardware systems founded on AI techniques and designed for music-making, most typically in live concert performance combining machine and human musicians. Contemporary work in this field includes investigations into both machine listening (realtime audio analysis) and robotics; an inspiring project in this regard is Ajay Kapur's MahaDeviBot, a thirteen-armed Indian percussionist which can synchronise to sensor input from a human sitarist [19].
Recent years have also seen a number of such projects intersecting with the agent community, from Belinda Thom's Band-out-of-the-Box [30] and Wulfhorst and colleagues' Virtual Musical MultiAgent System [32], to the Musical Acts - Musical Agents architecture [23], OMax [1] and Arne Eigenfeldt's Drum Circle [10]. Indeed, agent technology has the potential to influence the general field of computer music, as discussed by Dahlstedt and McBurney [6], a composer and an agent researcher who have collaborated on generative music software. Perhaps the earliest explicit live musical agent work is that of Peter Beyls, whose 1988 description of Oscar (Oscillator Artist) [3] characterised the system as an autonomous agent.¹
A goal of this research is the realisation of autonomous agents for interactive music, which can at a minimum operate independently of composer intervention during performance, though they may not be so independent of the composer's programming. Behavioural autonomy in a concert and rehearsal situation (self-sufficiency of action) is sought whenever the agent is switched on, but constitutive autonomy (continuity of existence at the scale of everyday life) is not expected [12]. To align quickly with musical demands, techniques for fast adaptation to musical situations must be explored.
This paper will proceed by examining more closely the work to date on musical agents. Because a lack of machine learning is identified as a shortcoming of much existing work, we will investigate the combination of music and the reinforcement learning techniques adopted by the agent community. A new MIDI-based system will be described, intended as a testbed for experiments in online adaptation.
2. AGENT ARCHITECTURES FOR MUSIC
A natural viewpoint in applying agent metaphors to music is to place the agents at the level of individual musicians, such that each concert participant is a single autonomous agent. This will be the primary level at which agency is discussed in this paper, but individual artificial musicians as multiagent systems have also been considered. Minsky's society of mind metaphor has been applied by Robert Rowe, particularly in his Cypher project; the Meta-Cypher includes multiple listener and player agents as well as a Meta-Listener [26, pp. 310-15]. In the setting of artificial life, Jonathan Impett has demonstrated the complex emergent musical interaction possible with swarm intelligence [18]. But multiagent architectures within individual artificial musicians have usually been notional, for instance as simple active hypotheses in computational beat tracking [13].
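To illustrate that notional reading (a minimal sketch under assumed names, not the architecture of [13]), each active hypothesis can be treated as a lightweight agent holding a candidate beat period and phase, accumulating evidence against incoming onset times:

    from dataclasses import dataclass

    @dataclass
    class BeatHypothesis:
        period: float       # candidate inter-beat interval in seconds
        phase: float        # time of one predicted beat in seconds
        score: float = 0.0

        def evaluate(self, onsets, tolerance=0.05):
            # Accumulate evidence: count onsets landing within
            # `tolerance` seconds of a predicted beat position.
            for t in onsets:
                offset = (t - self.phase) % self.period
                if min(offset, self.period - offset) < tolerance:
                    self.score += 1.0

    def best_hypothesis(hypotheses, onsets):
        # The 'multiagent' tracker simply backs the best-scoring agent.
        for h in hypotheses:
            h.evaluate(onsets)
        return max(hypotheses, key=lambda h: h.score)

The hypotheses compete but do not communicate or act, which is why such architectures are better described as notional multiagent systems.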
Where systems have sought to explore flexible interaction with human musicians, style-specific cultural conventions and innate human musical behaviours (such as synchronisation abilities) have provided severe challenges for
¹ Though it fails to qualify under more restrictive definitions [4].