Proceedings ICMC|SMC|2014, 14-20 September 2014, Athens, Greece
Musical Timbre and Emotion: The Identification of Salient Timbral Features in
Sustained Musical Instrument Tones Equalized in Attack Time and Spectral
Centroid
Bin Wu1, Andrew Horner1, Chung Lee2
1Department of Computer Science and Engineering,
Hong Kong University of Science and Technology
2The Information Systems Technology and Design Pillar,
Singapore University of Technology and Design
{bwuaa, horner}@cse.ust.hk
ABSTRACT
Timbre and emotion are two of the most important aspects of musical sounds. Both are complex and multidimensional, and strongly interrelated. Previous research
has identified many different timbral attributes, and shown
that spectral centroid and attack time are the two most important dimensions of timbre. However, a consensus has
not emerged about other dimensions. This study will attempt to identify the most perceptually relevant timbral
attributes after spectral centroid and attack time. To do
this, we will consider various sustained musical instrument
tones where spectral centroid and attack time have been
equalized. While most previous timbre studies have used
discrimination and dissimilarity tests to understand timbre,
researchers have begun using emotion tests recently. Previous studies have shown that attack and spectral centroid
play an essential role in emotion perception, and they can
be so strong that listeners do not notice other spectral features very much. Therefore, in this paper, to isolate the
third most important timbre feature, we designed a subjective listening test using emotion responses for tones equalized in attack, decay, and spectral centroid. The results
showed that the even/odd harmonic ratio is the most salient
timbral feature after attack time and spectral centroid.
1. INTRODUCTION
Timbre is one of the most important aspects of musical
sounds, yet it is also the least understood. It is often simply defined by what it is not: not pitch, not loudness, and
not duration. For example, if a trumpet and a clarinet both
played an A at 440 Hz for 1 s at the same loudness level,
timbre is what would distinguish the two sounds. Timbre
is known to be multidimensional, with attributes such as
attack time, decay time, spectral centroid (i.e., brightness),
and spectral irregularity to name a few.
Copyright: 2014 Bin Wu, Andrew Horner, Chung Lee et al.
This is an open-access article distributed under the terms of the
Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original
author and source are credited.
Several previous timbre perception studies have shown
spectral centroid and attack time to be highly correlated
with the two principal perceptual dimensions of timbre.
Spectral centroid has been shown to be strongly correlated
with one of the most prominent dimensions of timbre as
derived by multidimensional scaling (MDS) experiments
[1, 2, 3, 4, 5, 6, 7, 8].
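To make the notion of spectral centroid concrete, the following Python sketch computes one common formulation: the amplitude-weighted mean harmonic number of a tone's spectrum. It is an illustration under stated assumptions and not necessarily the exact measure used in the studies cited above; the function name `spectral_centroid` and the example amplitude lists are hypothetical.

```python
import numpy as np

def spectral_centroid(amplitudes, f0=None):
    """Amplitude-weighted mean harmonic number of a harmonic spectrum.

    amplitudes[k-1] is the (time-averaged) amplitude of harmonic k.
    If a fundamental frequency f0 (Hz) is given, the centroid is
    returned in Hz instead of in harmonic-number units.
    """
    a = np.asarray(amplitudes, dtype=float)
    k = np.arange(1, a.size + 1)          # harmonic numbers 1..K
    total = a.sum()
    if total == 0:
        raise ValueError("all harmonic amplitudes are zero")
    centroid = (k * a).sum() / total
    return centroid * f0 if f0 is not None else centroid

# Hypothetical spectra: equal-strength harmonics vs. a steep roll-off.
bright = [1.0, 1.0, 1.0, 1.0]
dark = [1.0, 0.5, 0.25, 0.125]

print(spectral_centroid(bright))            # centroid in harmonic numbers
print(spectral_centroid(dark))              # lower value: "darker" tone
print(spectral_centroid(bright, f0=440.0))  # centroid in Hz
```

A spectrum with more energy in the upper harmonics yields a higher centroid, which corresponds to the perceptual impression of brightness.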
Grey and Gordon [1, 9] derived three dimensions corresponding to spectral energy distribution, temporal synchronicity in the rise and decay of upper harmonics, and
spectral fluctuation in the signal envelope. Iverson and
Krumhansl [4] found spectral centroid and critical dynamic
cues throughout the sound duration to be the salient dimensions. Krimphoff [10] found three dimensional correlates:
(1) spectral centroid, (2) rise time, and (3) spectral flux corresponding to the standard deviation of the time-averaged
spectral envelopes. More recently, Caclin et al. [8] found
attack time, spectral centroid, and spectrum fine structure
to be the major determinants of timbre through dissimilarity rating experiments. Spectral flux was found to be a less
salient timbral attribute in this case.
While most researchers agree spectral centroid and attack
time are the two most important timbral dimensions, no
consensus has emerged about the best physical correlate
for a third dimension of timbre. Lakatos and Beauchamp
[7, 11, 12] suggested that if additional timbre dimensions
exist, one strategy would be to first create stimuli with
identical pitch, loudness, duration, spectral centroid, and
rise time, but which are otherwise perceptually dissimilar.
Then, multidimensional scaling of listener dissimilarity data could potentially reveal additional perceptual dimensions
with strong correlations to particular physical measures.
Following up on this suggestion is the main focus of this paper.
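The scaling step in this strategy can be sketched in Python as follows. The sketch uses classical (Torgerson) MDS, chosen here only because it is compact; the perceptual studies cited above may have used other MDS variants. The function name `classical_mds` and the example points are hypothetical, and real input would be a matrix of averaged listener dissimilarity ratings.

```python
import numpy as np

def classical_mds(dissimilarity, n_dims=2):
    """Embed n items in n_dims dimensions from an n x n dissimilarity matrix."""
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    # Double-center the squared dissimilarities.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # b is symmetric, so eigh applies; keep the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_dims]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)

# Hypothetical example: four "tones" whose dissimilarities are the
# Euclidean distances between known 2-D positions.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = classical_mds(dist, n_dims=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(np.allclose(dist, recovered))
```

Each column of the returned coordinates is a candidate perceptual dimension, which can then be correlated with physical measures of the stimuli.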
While most previous timbre studies have used discrimination and dissimilarity tests to understand timbre, researchers
have recently begun using emotion tests. Some previous studies have shown that emotion is closely related to timbre.
Scherer and Oshinsky found that timbre is a salient factor
in the rating of synthetic tones [13]. Peretz et al. showed
that timbre speeds up discrimination of emotion categories
[14]. Bigand et al. reported similar results in their study of
emotion similarities between one-second musical excerpts
[15]. It was also found that timbre is essential to musical
genre recognition and discrimination [16, 17, 18]. Eerola