Proceedings ICMC|SMC|2014, 14-20 September 2014, Athens, Greece
Animating Timbre - A User Study
Sean Soraghan
ROLI
Centre for Digital Entertainment
sean@roli.com
ABSTRACT
The visualisation of musical timbre requires an effective
mapping strategy. Auditory-visual perceptual correlates
can be exploited to design appropriate mapping strategies.
Various acoustic descriptors and verbal descriptors of timbre have been identified in the psychoacoustic literature.
These studies suggest that verbal descriptors of timbre
usually refer to the material properties of physical objects. Thus,
a study was conducted to investigate the visualisation of
acoustic timbre features using various visual features of
a 3D rendered object. Participants were presented with paired
auditory-visual stimuli and asked to indicate their preferences. The first experiment involved participants rating
audio-visual mappings in isolation. The second experiment involved participants observing multiple parameters
at once and choosing an 'optimal' mapping strategy. The
results of the first experiment suggest agreement on preferred mappings in the isolated case. The results of the
second experiment suggest both that individual preferences
change when multiple parameters are varied, and that there
is no general consensus on preferred mappings in the multivariate case.
1. INTRODUCTION
Timbre is a complex and multi-dimensional attribute of audio. It has been defined as the perceptual attribute of audio
by which two sounds with identical pitch, loudness and
duration can be discriminated [1]. Before the introduction and popularisation of the computer, the easiest way to
produce differences in timbre was through varying instrumentation or articulation. Musical scores therefore elicit
changes in timbre by using various articulation indicators
(e.g. legato). Computers have introduced the possibility to
produce widely varying timbres, in real-time, through the
exploration of complex parameter spaces. These parameter spaces have been referred to as 'timbre spaces' [2, 3].
On a traditional musical instrument, timbre manipulation is
directly related to articulation. With timbre spaces, however, any form of control interface can be designed since
the sound is produced digitally [4].
Copyright: © 2014 Sean Soraghan et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

In modern audio production software environments and
graphical user interfaces (GUIs), control interfaces for the
exploration of timbre spaces invariably take the form of arrays of sliders and rotary knobs. This form of interaction
is sub-optimal and comes from a tendency towards skeuomorphism in interface design. 3D software environments
offer the opportunity to present timbre as a complex 3D
object, each of its visual features (e.g. brightness, texture)
representing a particular parameter of the timbre [5]. This
would facilitate intuitive exploration of the timbre space,
as the overall timbre would be represented visually as one
global object. Such 3D control environments would require the design of a mapping strategy such that timbre features are effectively and intuitively visualised to the user.
The aim of this study has therefore been to explore user
preferences for timbre-feature to visual-feature mappings.
Existing research into both acoustic descriptors and verbal
descriptors of timbre has been drawn upon in order to identify timbre-feature and visual-feature groups and explore
user preferences for mappings between the two. As will be
explored in the next section, existing research into audiovisual mappings has mainly focussed on static, 2D visual
stimuli and rarely concentrates on timbre. This study explores mappings in 3D visual space and is focussed on visual representations of timbre features.
2. RELATED WORK
Most of the previous research into audio-visual mappings
has found that users tend to pair colour and position with
pitch and volume, and pair timbre features with features of
texture/shape [6, 7, 8, 9].
Lipscomb and Kim conducted a user study investigating the relationship between auditory and visual features of randomised audio-visual stimuli. The audio features used were pitch, loudness, timbre and duration; the
visual features were colour, vertical location,
size and shape [9].
Giannakis and Smith have carried out a number of studies
looking at auditory-visual perceptual correlates [10, 7, 11].
Most related to this study is their investigation into sound
synthesis based on auditory-visual associations [11]. In
that particular study they present a number of corresponding perceptual dimensions of musical timbre and visual
texture. Their study focusses on texture alone; however,
it has been suggested that visual texture qualities are only
one type of semantic descriptor used to identify timbre
[12]. The present study therefore explores entire 3D structures and includes material properties such as reflectance
and transparency. These properties have been chosen in accordance with salient semantic timbre descriptors that have