TIMING IS TEMPO-SPECIFIC
Henkjan Honing
Music Cognition Group
ILLC / University of Amsterdam
www.hum.uva.nl/mmm
ABSTRACT
This study is concerned with the question of whether there
is perceptual invariance of expressive timing under
tempo-transformation in audio recordings. This was investigated by asking listeners to distinguish between an
original recording and a time-stretched (i.e. tempo-transformed) version. The original recordings were identified by a significant proportion of the participants. The
results suggest that expressive timing can function as a
clue in identifying a real performance. This is taken as
evidence for the tempo-specific timing hypothesis, and
counter-evidence against the relational invariance hypothesis, which predicts that proportionally scaled expressive timing
will be perceived as natural as well. The results are discussed
in the context of whether there is perceptual invariance
of expressive timing under tempo transformation and
possible improvements to state-of-the-art time-stretching
algorithms.
1. INTRODUCTION
Perceptual invariance is an important theoretical issue in
cognitive science. It concerns the study of whether, and
if so, how certain event properties remain perceptually
invariant under transformation (e.g., [20]). The topic is also
relevant for computer music software, since it
influences the ease with which perceptually convincing transformations of musical data can be supported.
A well-known and uncontroversial example of perceptual invariance in music is melody. When a melody
is transposed to a different register, it not only maintains its frequency ratios in performance, it is also perceived as the same melody. As such, melody remains
perceptually invariant under transposition. Sequencer
and notation software take advantage of this characteristic, and hence the transposition transformation is easily
supported.
With respect to other aspects of music, such as
rhythm, supporting transformations is less trivial.
While one might expect rhythm to scale proportionally
with tempo (i.e. being perceptually invariant under
tempo transformation), several empirical studies have
shown that this is not always the case (e.g., [8]).
Rhythms are timed differently at different tempi ([17]),
and listeners do not generally recognize proportionally
scaled rhythms as being identical when scaled to another
tempo ([3], [9]).
However, the relation between timing and tempo has
long been assumed to be perceptually invariant, both in computer
music and music cognition research. Most sequencers
have a tempo controller, suggesting that timing is scalable with tempo. This is a result of representing timing
and tempo-change in computer music systems as a continuous function of score position (a so-called tempo
curve; [4], [11]). While such a representation captures
the tempo deviations as measured in a recording, it in
fact also suggests that the shape of a tempo curve is independent of the number of events (or note density), the
rhythmic structure (i.e. differentiated durations), and the
overall tempo of the performance ([16]). However, a
simple test, like changing the tempo of a recorded track
of a drummer playing a certain groove, will reveal to a
listener that timing cannot be simply represented like
that: the result will sound awkward ([11]).
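The proportional scaling implied by a sequencer's tempo controller can be made concrete with a minimal sketch. It is an illustration only, not the procedure used in this study: the function names and the onset times are hypothetical, and the tempo factor of 1.25 is an arbitrary choice. The sketch shows that dividing all time offsets by a constant changes the overall duration while leaving every inter-onset interval ratio untouched, which is exactly what the relational invariance hypothesis assumes listeners will accept as natural.

```python
from typing import List


def scale_onsets(onsets: List[float], tempo_factor: float) -> List[float]:
    """Proportionally scale onset times (in seconds) to a new tempo.

    tempo_factor > 1 means faster, < 1 means slower. The first onset
    is kept in place and every later offset is divided by the factor.
    """
    if tempo_factor <= 0:
        raise ValueError("tempo_factor must be positive")
    if not onsets:
        return []
    start = onsets[0]
    return [start + (t - start) / tempo_factor for t in onsets]


def interval_ratios(onsets: List[float]) -> List[float]:
    """Return each inter-onset interval as a fraction of the total span."""
    iois = [b - a for a, b in zip(onsets, onsets[1:])]
    total = sum(iois)
    return [ioi / total for ioi in iois]


# Hypothetical onset times of a short, expressively timed phrase.
performed = [0.00, 0.52, 0.98, 1.55, 2.00]

# Assumed tempo factor, chosen only for illustration.
faster = scale_onsets(performed, 1.25)

# The interval ratios are identical before and after scaling.
same_ratios = all(
    abs(a - b) < 1e-9
    for a, b in zip(interval_ratios(performed), interval_ratios(faster))
)
```

Under the tempo-specific timing hypothesis, a performer playing at the faster tempo would be expected to produce different interval ratios, so a scaled version of this kind need not sound like a real performance.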
2. THIS STUDY
The present study investigates whether expressive timing is perceptually invariant under tempo transformation
in a variety of musical repertoires, aiming to resolve
this rather undecided issue in music perception.* A relatively large-scale experiment (N = 307) was conducted
using fragments from commercially available audio recordings from a variety of musical repertoires. The experiment included original and tempo-transformed versions of these audio recordings and tested whether listeners were able to identify the original recording by
focusing on the use of expressive timing in those performances.
3. EXPERIMENT
3.1. Aim
The aim of the experiment was to systematically study
the effect of tempo on the identification of an original
recording in two musical genres: "Jazz" and "Classical".
The participants were asked to listen to a number of
sound examples and to indicate for each whether it was an original recording or a time-stretched version (i.e. a slowed-down or sped-up version of the original). This is referred to
as the identification task. All tempo-transformed sound excerpts were time-stretched by the same amount (either
20% faster or slower). Ten sound examples were used for
* This is research in progress (May 2005). Related and more elaborate studies are available as [13] and Honing (in preparation); see
www.hum.uva.nl/mmm under 'Publications'.