TRACKING MONOPHONIC MUSIC FOR MODELLING MELODIC SEGMENTATION
PROCESSES
Maja Serman, Dept. of CSIS, University of Limerick, Ireland.
Niall J.L. Griffith, Dept. of CSIS, University of Limerick, Ireland.
Nikola Serman, Dept. of Power Engineering, FMENA, University of Zagreb, Croatia.
INTRODUCTION
This paper describes research towards the computational
investigation and modelling of musical processes across
cultures. It focuses on work towards a model of melodic
segmentation. Research in music
perception points to a key role for Gestalt principles in the
basic grouping mechanisms for melodic segmentation
(Lerdahl & Jackendoff, 1983; Narmour, 1989). The
perception of some difference between regions (Deliege,
1987) is arguably the underlying principle of all grouping
mechanisms. But what constitutes a region, and what guides
our perception when we differentiate regions?
Usually, this question is simplified by the definition of
melodic descriptors and grouping mechanisms in terms of
Western Tonal Music (WTM). For instance, the melodic
descriptor of pitch has 12 semitone values per octave, and
it is within this space that low or high frequencies, or
small or big intervals are defined. The Grouping
Preference Rules (GPRs) proposed by Lerdahl & Jackendoff
(1983) are an example of this approach. GPRs
operate on a sequence of notes, and anchor the
development of differentiation in any melodic descriptor
to the percept of pitch change.
While there is little doubt of the importance of
pitch change in segmentation processes, in non-western
music other melodic descriptors can also carry
perceptually significant changes while pitch remains
constant (Titon, 1992). It is arguable that by using melodic
descriptors defined in terms of WTM we impose
perceptual categories before even starting to assess their
universality and the grouping mechanisms involved.
WTM is a product of complex perceptual and cognitive
processes, strongly influenced by development within
western culture.
Using WTM notation for transcribing non-western music is a recognised problem in
ethnomusicology (Sachs, 1962). Serman et al. (2000)
discuss the influence of the transcription process on
segmentation. Several types of analogue device, known
as the Melograph (Crossley-Holland, 1974), were built in
the 1950s to address this issue. However, the aim of the
device was not music perception research. While the
importance of the idea for research into universal aspects
of music perception was recognised (Crossley-Holland,
1974), the Melograph's contribution to comparative
musicology was ahead of its time and interest in it passed
as ethnomusicology developed in other directions. Most
subsequent computational models of melodic
segmentation rules have used notated melodies (or a MIDI
encoding) as their input. The use of WTM notation calls
cross-cultural investigation of melodic segmentation
processes into question, and confines research to
investigating grouping mechanisms that operate solely on
changes that can be extracted from notation, i.e. pitch and
duration.
SEGMENTATION FROM PERFORMANCE -
MUSICTRACKER
On the assumption that the information relevant
for the computational investigation and modelling of
melodic segmentation processes across cultures should be
extracted directly from the performed music, the
MusicTracker project was initiated. This involves a
rather different set of problems from those presented by
notation. If we use sound as the input (as received by the
auditory system), we are faced with the need to identify
the perceptual structures that are built up from the sound.
However, our perception of the descriptors of pitch,
loudness and timbre is not derived from a single
parameter obtainable by computational signal analysis
(Handel, 1995). Furthermore, in perceiving these
descriptors a listener responds not only to the sequence,
but is also influenced by other aspects of the performance, e.g.
the overall loudness of the sound, the acoustic properties of
the performance space, etc. Therefore, instead of attempting to quantify
the melodic descriptors themselves by computational
signal analysis and then mapping them into (possibly ill-defined) WTM categories, the emphasis in this project has
been on estimating the change that occurs within a
melodic descriptor, which can then be used as input to a
model of grouping processes. This seems justified (at least
initially) by what is known of our nervous and cognitive
system's sensitivity to change. The MusicTracker takes a
digitally recorded music signal (.wav) as its input and
calculates the values of "indicators": pitch, perceptive
dynamics and timbre. The term "indicator" denotes that
the value reflects the change of the corresponding melodic
descriptor, rather than quantifying it directly.
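The indicator formulas themselves are not given here; as a minimal sketch of the underlying idea, the following Python fragment (the function name and the normalisation are our own illustrative assumptions, not the MusicTracker's actual definitions) turns any sequence of raw per-frame descriptor estimates into a change-based indicator.

    import numpy as np

    def change_indicator(estimates):
        # `estimates` holds one raw descriptor value per frame, e.g. a
        # fundamental-frequency estimate. The indicator is the normalised
        # frame-to-frame change, so it reflects variation in the
        # descriptor rather than its absolute value.
        v = np.asarray(estimates, dtype=float)
        eps = np.finfo(float).eps  # guard against division by zero on silent frames
        return np.abs(np.diff(v)) / (np.abs(v[:-1]) + eps)

Because only relative change is reported, such an indicator avoids committing the analysis to any culture-specific category space.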
An outline of the MusicTracker's basic functions:
1. The signal is divided into equal chunks of 20 ms
duration, called "frames", and for each frame the
values of the "indicators" are calculated using a
spectral analysis procedure.
2. The sequences of indicator values form discrete
functions of time which can be presented graphically
and/or stored in a file for further manipulation, as
sketched below.
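Since the MusicTracker's implementation is not reproduced here, the sketch below illustrates steps 1 and 2 under stated assumptions: a mono .wav input, and simple spectral stand-ins for the three indicators (spectral peak frequency for pitch, RMS level for perceptive dynamics, spectral centroid for timbre). These stand-ins are illustrative assumptions, not the actual indicator definitions.

    import numpy as np
    from scipy.io import wavfile

    FRAME_MS = 20  # frame duration given in step 1

    def track(path):
        # Step 1: divide the signal into equal 20 ms frames and compute
        # per-frame values by spectral analysis.
        rate, signal = wavfile.read(path)  # assumes a mono recording
        signal = signal.astype(float)
        frame_len = int(rate * FRAME_MS / 1000)
        n_frames = len(signal) // frame_len
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / rate)
        window = np.hanning(frame_len)
        pitch, dynamics, timbre = [], [], []
        for i in range(n_frames):
            frame = signal[i * frame_len:(i + 1) * frame_len]
            spectrum = np.abs(np.fft.rfft(frame * window))
            pitch.append(freqs[1 + np.argmax(spectrum[1:])])  # strongest partial, skipping DC
            dynamics.append(np.sqrt(np.mean(frame ** 2)))     # RMS level
            timbre.append(freqs @ spectrum / (spectrum.sum() + np.finfo(float).eps))  # spectral centroid
        # Step 2: each sequence is a discrete function of time sampled
        # every 20 ms; it can be plotted or stored for later use, e.g.
        # np.savetxt("indicators.txt", np.column_stack([pitch, dynamics, timbre]))
        return np.array(pitch), np.array(dynamics), np.array(timbre)

At a 20 ms frame length each indicator sequence carries 50 values per second, which is the time resolution available to any subsequent model of grouping processes.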