A Machine Learning Approach to Musical Style Recognition
Roger B. Dannenberg, Belinda Thom, and David Watson
School of Computer Science, Carnegie Mellon University
{rbd,bthom,dwatson}@cs.cmu.edu
Abstract
Much of the work on perception and understanding of music by computers has focused on low-level
perceptual features such as pitch and tempo. Our work demonstrates that machine learning can be
used to build effective style classifiers for interactive performance systems. We also present an analysis explaining why these techniques work so well when hand-coded approaches have consistently failed, and we describe a reliable real-time performance style classifier.
1 Introduction
The perception and understanding of music by computers offers a challenging set of problems. Much
of the work to date has focused on low-level perceptual features such as pitch and tempo, yet many computer music applications would benefit from higher-level understanding. For example, interactive performance systems [3] are sometimes designed to react
to higher-level intentions of the performer. Unfortunately, there is often a discrepancy between the
ideal realization of an interactive system, in which
the musician and machine carry on a high-level musical discourse, and the actual realization, in which the musician does little more than trigger stored sound events.
This discrepancy is caused in part by the difficulty of
recognizing high-level characteristics or style of a performance with any reliability.
Our experience has suggested that even relatively
simple stylistic features, such as "playing energetically," "playing lyrically," or "playing with syncopation," are difficult to detect reliably. Although it may
appear obvious how one might detect these styles,
good musical performance is always filled with contrast. For example, energetic performances contain
silence, slow lyrical passages may have rapid runs
of grace notes, and syncopated passages may have
a variety of confusing patterns. In general, higher-level musical intent appears chaotic and unstructured
when presented in low-level terms such as MIDI performance data.
Avoiding higher-level inference is common in
other composition and research efforts. The research
literature is filled with articles on pitch detection,
score following, and event processing. There are also
interactive systems that respond to simple features
such as duration, pitch, density, and intervals, but
there is relatively little discussion of higher-level music processing.
In short, there are many reasons to believe that
style recognition is difficult. Machine learning has
been shown to improve the performance of many perception and classification systems (including speech
recognizers and vision systems). We have studied the
feasibility of applying machine learning techniques to
build musical style classifiers.
The result of this research is the primary focus
of this paper. Our initial problem was to classify an
improvisation as one of four styles: "lyrical," "frantic," "syncopated," or "pointillistic" (the latter consisting of short, well-separated sound events). We
later added four additional styles: "blues," "quote"
(play a familiar tune), "high," and "low." The exact
meaning of these terms is not important. What really
matters is the ability of the performer to consistently
produce intentional and different styles of playing at
will.
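To make the task concrete, the following sketch (in Python, with purely hypothetical feature thresholds) lays out the eight style tokens and a naive hand-coded rule built only from simple features of the sort mentioned above, such as note density and pitch. It illustrates the shape of the problem, not the classifier developed in this paper; indeed, rules this simple are exactly the kind of hand-coded approach that proves unreliable.

# Hypothetical sketch of the style-classification task; not this paper's classifier.
STYLES = ["lyrical", "frantic", "syncopated", "pointillistic",
          "blues", "quote", "high", "low"]

def naive_classify(notes, window_secs=5.0):
    """Guess a style token from one window of MIDI performance data.

    `notes` is a list of (onset_time, midi_pitch) pairs falling within the
    window; the thresholds below are invented for illustration only.
    """
    density = len(notes) / window_secs                    # notes per second
    mean_pitch = sum(p for _, p in notes) / max(len(notes), 1)
    if density > 8.0:                                     # many fast notes
        return "frantic"
    if mean_pitch > 72:                                   # MIDI 72 = C5
        return "high"
    if 0 < mean_pitch < 55:                               # low register (below G3)
        return "low"
    return "lyrical"                                      # default guess

As noted above, such rules break down quickly: an energetic passage that contains a moment of silence, for example, falls below any fixed density threshold.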
The ultimate test is the following: Suppose, as an
improviser, you want to communicate with a machine
through improvisation. You can communicate four
different tokens of information: "lyrical," "frantic,"
"syncopated," and "pointillistic." The question is, if
you play a style that you identify as "frantic," what
is the probability that the machine will perceive the
same token? It is crucial that this classification be
responsive in real time. We arbitrarily constrained
the classifier to operate within five seconds.
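To illustrate this criterion with made-up numbers (not measured results), the per-style probability in question is simply the fraction of windows performed in a given style that the machine labels with the same token. The counts below are hypothetical:

# Hypothetical confusion counts: rows are the style the performer intended,
# columns are the style the classifier reported.  Invented for illustration.
confusion = {
    "lyrical":       {"lyrical": 46, "frantic": 1,  "syncopated": 2,  "pointillistic": 1},
    "frantic":       {"lyrical": 2,  "frantic": 45, "syncopated": 3,  "pointillistic": 0},
    "syncopated":    {"lyrical": 3,  "frantic": 2,  "syncopated": 44, "pointillistic": 1},
    "pointillistic": {"lyrical": 1,  "frantic": 0,  "syncopated": 2,  "pointillistic": 47},
}

for intended, reported in confusion.items():
    # Probability that the machine perceives the token the performer intended.
    p_same = reported[intended] / sum(reported.values())
    print(f"P(machine reports {intended!r} | performer plays {intended!r}) = {p_same:.2f}")

With these invented counts, a "frantic" performance would be recognized as "frantic" 90% of the time, which is the kind of per-token reliability at issue here.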