TIMING IS TEMPO-SPECIFIC

Henkjan Honing
Music Cognition Group
ILLC / University of Amsterdam
www.hum.uva.nl/mmm

ABSTRACT

This study is concerned with the question of whether there is perceptual invariance of expressive timing under tempo transformation in audio recordings. This was investigated by asking listeners to distinguish between an original recording and a time-stretched (i.e. tempo-transformed) version. The original recordings were identified by a significant proportion of the participants. The results suggest that expressive timing can function as a clue in identifying a real performance. This is taken as evidence for the tempo-specific timing hypothesis, and as counter-evidence for the relational invariance hypothesis, which predicts that proportionally scaled expressive timing will be perceived as natural as well. The results are discussed in the context of whether there is perceptual invariance of expressive timing under tempo transformation and possible improvements to state-of-the-art time-stretching algorithms.

1. INTRODUCTION

Perceptual invariance is an important theoretical issue in cognitive science. It concerns the study of whether, and if so how, certain event properties remain perceptually invariant under transformation (e.g., [20]). It is also a relevant topic for computer music software, since it influences the ease with which perceptually convincing transformations of musical data can be supported.

A well-known and uncontroversial example of perceptual invariance in music is melody. When a melody is transposed to a different register, it not only maintains its frequency ratios in performance, it is also perceived as the same melody. As such, melody remains perceptually invariant under transposition. Sequencer and notation software take advantage of this characteristic, and hence the transposition transformation is easily supported. With respect to other aspects of music, such as rhythm, supporting transformations is less trivial.
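The two transformations contrasted here can be written as the same numerical operation: multiplying every value by a constant factor, which leaves all ratios intact. The following is a minimal sketch for illustration only; the function names, the example melody, and the inter-onset intervals are assumptions and are not taken from the study. It shows only that proportional scaling preserves ratios. Whether the scaled result is also perceived as invariant is the question at issue.

```python
# Illustrative sketch; all names and numbers are hypothetical, not from the study.

def transpose(freqs_hz, semitones):
    """Shift every frequency by the same number of equal-tempered semitones."""
    factor = 2.0 ** (semitones / 12.0)
    return [f * factor for f in freqs_hz]


def scale_intervals(iois_sec, factor):
    """Proportionally scale inter-onset intervals (a naive tempo change)."""
    return [ioi * factor for ioi in iois_sec]


def ratios(values):
    """Ratios between successive values."""
    return [b / a for a, b in zip(values, values[1:])]


# A short melody (approximate frequencies of C4, D4, E4, C4 in Hz).
melody = [261.63, 293.66, 329.63, 261.63]
transposed = transpose(melody, 7)  # up a perfect fifth

# A short rhythm as inter-onset intervals in seconds (assumed values).
rhythm = [0.50, 0.25, 0.25, 1.00]
# Assumption: "20% faster" is read here as tempo * 1.2, i.e. durations / 1.2.
faster = scale_intervals(rhythm, 1.0 / 1.2)

# In both cases the successive ratios are unchanged, up to rounding error.
for before, after in zip(ratios(melody), ratios(transposed)):
    assert abs(before - after) < 1e-9
for before, after in zip(ratios(rhythm), ratios(faster)):
    assert abs(before - after) < 1e-9
```

For melody, this ratio-preserving operation matches what listeners perceive as "the same"; for timing, the relational invariance hypothesis assumes that the same holds.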
While one might expect rhythm to scale proportionally with tempo (i.e. to be perceptually invariant under tempo transformation), several empirical studies have shown that this is not always the case (e.g., [8]). Rhythms are timed differently at different tempi ([17]), and listeners do not generally recognize proportionally scaled rhythms as being identical when scaled to another tempo ([3], [9]).

However, the relation between timing and tempo has long been assumed to be perceptually invariant, both in computer music and in music cognition research. Most sequencers have a tempo controller, suggesting that timing is scalable with tempo. This is a result of representing timing and tempo change in computer music systems as a continuous function of score position (a so-called tempo curve; [4], [11f]). While such a representation captures the tempo deviations as measured in a recording, it also suggests that the shape of a tempo curve is independent of the number of events (or note density), the rhythmic structure (i.e. differentiated durations), and the overall tempo of the performance ([16]). However, a simple test, such as changing the tempo of a recorded track of a drummer playing a certain groove, will reveal to a listener that timing cannot simply be represented in that way: the result will sound awkward ([11]).

2. THIS STUDY

The present study investigates whether expressive timing is perceptually invariant under tempo transformation in a variety of musical repertoires, aiming to resolve this rather undecided issue in music perception.* A relatively large-scale experiment (N = 307) was conducted using fragments from commercially available audio recordings from a variety of musical repertoires. The experiment included original and tempo-transformed versions of these audio recordings and tested whether listeners were able to identify the original recording by focusing on the use of expressive timing in those performances.

* This is research in progress (May 2005). Related and more elaborate studies are available as [13] and Honing (in preparation); see www.hum.uva.nl/mmm under 'Publications'.

3. EXPERIMENT

3.1. Aim

The aim of the experiment was to systematically study the effect of tempo on the identification of an original recording in two musical genres: "Jazz" and "Classical". The participants were asked to listen to a number of sound examples and to indicate whether each was an original recording or a time-stretched version (i.e. a slowed-down or speeded-up version of the original), referred to as the identification task. All tempo-transformed sound excerpts were time-stretched by the same amount (either 20% faster or slower); ten sound examples were used for