A Machine Learning Approach to Musical Style Recognition
Roger B. Dannenberg, Belinda Thom, and David Watson
School of Computer Science, Carnegie Mellon University
{rbd,bthom,dwatson}@cs.cmu.edu
Abstract
Much of the work on perception and understanding of music by computers has focused on low-level
perceptual features such as pitch and tempo. Our work demonstrates that machine learning can be
used to build effective style classifiers for interactive performance systems. We also present an analysis explaining why these techniques work so well when hand-coded approaches have consistently failed, and we describe a reliable real-time performance style classifier.
1 Introduction
The perception and understanding of music by computers offers a challenging set of problems. Much
of the work to date has focused on low-level perceptual features such as pitch and tempo, yet many computer music applications would benefit from higher-level understanding. For example, interactive performance systems [3] are sometimes designed to react
to higher-level intentions of the performer. Unfortunately, there is often a discrepancy between the
ideal realization of an interactive system, in which
the musician and machine carry on a high-level musical discourse, and the actual realization, in which the musician does little more than trigger stored sound events.
This discrepancy is caused in part by the difficulty of
recognizing high-level characteristics or style of a performance with any reliability.
Our experience has suggested that even relatively
simple stylistic features, such as "playing energetically," "playing lyrically," or "playing with syncopation," are difficult to detect reliably. Although it may
appear obvious how one might detect these styles,
good musical performance is always filled with contrast. For example, energetic performances contain
silence, slow lyrical passages may have rapid runs
of grace notes, and syncopated passages may have
a variety of confusing patterns. In general, higher-level musical intent appears chaotic and unstructured
when presented in low-level terms such as MIDI performance data.
Avoiding higher-level inference is common in
other composition and research efforts. The research
literature is filled with articles on pitch detection,
score following, and event processing. There are also
interactive systems that respond to simple features
such as duration, pitch, density, and intervals, but
there is relatively little discussion of higher-level music processing.
In short, there are many reasons to believe that
style recognition is difficult. Machine learning has
been shown to improve the performance of many perception and classification systems (including speech
recognizers and vision systems). We have studied the
feasibility of applying machine learning techniques to
build musical style classifiers.
The result of this research is the primary focus
of this paper. Our initial problem was to classify an
improvisation as one of four styles: "lyrical," "frantic," "syncopated," or "pointillistic" (the latter consisting of short, well-separated sound events). We
later added four additional styles: "blues," "quote"
(play a familiar tune), "high," and "low." The exact
meaning of these terms is not important. What really
matters is the ability of the performer to consistently
produce intentional and different styles of playing at
will.
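To make the task concrete, the following sketch (in Python, with purely hypothetical feature thresholds) lays out the eight style tokens and a naive hand-coded rule built only from simple features of the sort mentioned above, such as note density and pitch. It illustrates the shape of the problem, not the classifier developed in this paper; indeed, rules this simple are exactly the kind of hand-coded approach that proves unreliable.

# Hypothetical sketch of the style-classification task; not this paper's classifier.
STYLES = ["lyrical", "frantic", "syncopated", "pointillistic",
          "blues", "quote", "high", "low"]

def naive_classify(notes, window_secs=5.0):
    """Guess a style token from one window of MIDI performance data.

    `notes` is a list of (onset_time, midi_pitch) pairs falling within the
    window; the thresholds below are invented for illustration only.
    """
    density = len(notes) / window_secs                    # notes per second
    mean_pitch = sum(p for _, p in notes) / max(len(notes), 1)
    if density > 8.0:                                     # many fast notes
        return "frantic"
    if mean_pitch > 72:                                   # MIDI 72 = C5
        return "high"
    if 0 < mean_pitch < 55:                               # low register (below G3)
        return "low"
    return "lyrical"                                      # default guess

As noted above, such rules break down quickly: an energetic passage that contains a moment of silence, for example, falls below any fixed density threshold.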
The ultimate test is the following: Suppose, as an
improviser, you want to communicate with a machine
through improvisation. You can communicate four
different tokens of information: "lyrical," "frantic,"
"syncopated," and "pointillistic." The question is, if
you play a style that you identify as "frantic," what
is the probability that the machine will perceive the
same token? It is crucial that this classification be
responsive in real time. We arbitrarily constrained
the classifier to operate within five seconds.
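To illustrate this criterion with made-up numbers (not measured results), the per-style probability in question is simply the fraction of windows performed in a given style that the machine labels with the same token. The counts below are hypothetical:

# Hypothetical confusion counts: rows are the style the performer intended,
# columns are the style the classifier reported.  Invented for illustration.
confusion = {
    "lyrical":       {"lyrical": 46, "frantic": 1,  "syncopated": 2,  "pointillistic": 1},
    "frantic":       {"lyrical": 2,  "frantic": 45, "syncopated": 3,  "pointillistic": 0},
    "syncopated":    {"lyrical": 3,  "frantic": 2,  "syncopated": 44, "pointillistic": 1},
    "pointillistic": {"lyrical": 1,  "frantic": 0,  "syncopated": 2,  "pointillistic": 47},
}

for intended, reported in confusion.items():
    # Probability that the machine perceives the token the performer intended.
    p_same = reported[intended] / sum(reported.values())
    print(f"P(machine reports {intended!r} | performer plays {intended!r}) = {p_same:.2f}")

With these invented counts, a "frantic" performance would be recognized as "frantic" 90% of the time, which is the kind of per-token reliability at issue here.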