TRACKING MONOPHONIC MUSIC FOR MODELLING MELODIC SEGMENTATION
PROCESSES
Maja Serman, Dept. of CSIS, University of Limerick, Ireland.
Niall J.L. Griffith, Dept. of CSIS, University of Limerick, Ireland.
Nikola Serman, Dept. of Power Engineering, FMENA, University of Zagreb, Croatia.
INTRODUCTION
This paper describes research towards the computational
investigation and modelling of musical processes across
cultures. It focuses on work towards a model of melodic
segmentation. Research in music
perception points to a key role for Gestalt principles in the
basic grouping mechanisms for melodic segmentation
(Lerdahl & Jackendoff, 1983; Narmour, 1989). The
perception of some difference between regions (Deliege,
1987) is arguably the underlying principle of all grouping
mechanisms. But what constitutes a region, and what guides
our perception when we differentiate regions?
Usually, this question is simplified by the definition of
melodic descriptors and grouping mechanisms in terms of
Western Tonal Music (WTM). For instance, the melodic
descriptor of pitch has 12 semitone values per octave, and
it is within this space that low or high frequencies, or
small or big intervals are defined. The Grouping
Preference Rules (GPRs) proposed by Lerdahl & Jackendoff
(1983) are an example of this approach. GPRs
operate on a sequence of notes, and anchor the
development of differentiation in any melodic descriptor
to the percept of pitch change.
While there is little doubt of the importance of
pitch change in segmentation processes, in non-western
music other melodic descriptors can also carry
perceptually significant changes while pitch remains
constant (Titon, 1992). It is arguable that by using melodic
descriptors defined in terms of WTM we impose
perceptual categories before even starting to assess their
universality and the grouping mechanisms involved.
WTM is a product of complex perceptual and cognitive
processes, strongly influenced by development within
western culture.
Using WTM notation for transcribing non-western music is a recognised problem in
ethnomusicology (Sachs, 1962). Serman et al. (2000)
discuss the influence of the transcription process on
segmentation. Several types of analogue device, known
as the Melograph (Crossley-Holland, 1974), were built in
the 1950s to address this issue. However, the aim of the
device was not music perception research. While the
importance of the idea for research into universal aspects
of music perception was recognised (Crossley-Holland,
1974), the Melograph's contribution to comparative
musicology was ahead of its time and interest in it passed
as ethnomusicology developed in other directions. Most
subsequent computational models of melodic
segmentation rules have used notated melodies (or a MIDI
encoding) as their input. The use of WTM notation calls
cross-cultural investigation of melodic segmentation
processes into question, and confines research to
investigating grouping mechanisms that operate solely on
changes that can be extracted from notation, i.e. pitch and
duration.
SEGMENTATION FROM PERFORMANCE -
MUSICTRACKER
On the assumption that the information relevant
for the computational investigation and modelling of
melodic segmentation processes across cultures should be
extracted directly from the performed music, the
MusicTracker project was initiated. This involves a
rather different set of problems from those presented by
notation. If we use sound as the input (as received by the
auditory system), we are faced with the need to identify
the perceptual structures that are built up from the sound.
However, our perception of the descriptors of pitch,
loudness and timbre is not derived from a single
parameter obtainable by computational signal analysis
(Handel, 1995). Furthermore, in perceiving these
descriptors a listener responds not only to the sequence,
but is also influenced by other aspects of the performance, e.g.
the overall loudness of the sound, the acoustic properties of
the performance space, etc. Therefore, instead of attempting to quantify
the melodic descriptors themselves by computational
signal analysis and then mapping them into (possibly ill-defined) WTM categories, the emphasis in this project has
been on estimating the change that occurs within a
melodic descriptor, which can then be used as input to a
model of grouping processes. This seems justified (at least
initially) by what is known of our nervous and cognitive
system's sensitivity to change. The MusicTracker takes a
digitally recorded music signal (.wav) as its input and
calculates the values of "indicators": pitch, perceptive
dynamics and timbre. The term "indicator" denotes that
the value reflects the change of the corresponding melodic
descriptor, rather than quantifying it directly.
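The indicator formulas themselves are not given here; as a minimal sketch of the underlying idea, the following Python fragment (the function name and the normalisation are our own illustrative assumptions, not the MusicTracker's actual definitions) turns any sequence of raw per-frame descriptor estimates into a change-based indicator.

    import numpy as np

    def change_indicator(estimates):
        # `estimates` holds one raw descriptor value per frame, e.g. a
        # fundamental-frequency estimate. The indicator is the normalised
        # frame-to-frame change, so it reflects variation in the
        # descriptor rather than its absolute value.
        v = np.asarray(estimates, dtype=float)
        eps = np.finfo(float).eps  # guard against division by zero on silent frames
        return np.abs(np.diff(v)) / (np.abs(v[:-1]) + eps)

Because only relative change is reported, such an indicator avoids committing the analysis to any culture-specific category space.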
An outline of the MusicTracker's basic functions:
1. The signal is divided into equal chunks of 20 ms
duration, called "frames", and for each frame the
values of the "indicators" are calculated using a
spectral analysis procedure.
2. The sequences of indicator values form discrete
functions of time which can be presented graphically
and/or stored in a file for further manipulation, as
sketched below.
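Since the MusicTracker's implementation is not reproduced here, the sketch below illustrates steps 1 and 2 under stated assumptions: a mono .wav input, and simple spectral stand-ins for the three indicators (spectral peak frequency for pitch, RMS level for perceptive dynamics, spectral centroid for timbre). These stand-ins are illustrative assumptions, not the actual indicator definitions.

    import numpy as np
    from scipy.io import wavfile

    FRAME_MS = 20  # frame duration given in step 1

    def track(path):
        # Step 1: divide the signal into equal 20 ms frames and compute
        # per-frame values by spectral analysis.
        rate, signal = wavfile.read(path)  # assumes a mono recording
        signal = signal.astype(float)
        frame_len = int(rate * FRAME_MS / 1000)
        n_frames = len(signal) // frame_len
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / rate)
        window = np.hanning(frame_len)
        pitch, dynamics, timbre = [], [], []
        for i in range(n_frames):
            frame = signal[i * frame_len:(i + 1) * frame_len]
            spectrum = np.abs(np.fft.rfft(frame * window))
            pitch.append(freqs[1 + np.argmax(spectrum[1:])])  # strongest partial, skipping DC
            dynamics.append(np.sqrt(np.mean(frame ** 2)))     # RMS level
            timbre.append(freqs @ spectrum / (spectrum.sum() + np.finfo(float).eps))  # spectral centroid
        # Step 2: each sequence is a discrete function of time sampled
        # every 20 ms; it can be plotted or stored for later use, e.g.
        # np.savetxt("indicators.txt", np.column_stack([pitch, dynamics, timbre]))
        return np.array(pitch), np.array(dynamics), np.array(timbre)

At a 20 ms frame length each indicator sequence carries 50 values per second, which is the time resolution available to any subsequent model of grouping processes.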