Proceedings ICMC|SMC|2014, 14-20 September 2014, Athens, Greece
Animating Timbre - A User Study
Sean Soraghan
ROLI
Centre for Digital Entertainment
sean@roli.com
ABSTRACT
The visualisation of musical timbre requires an effective
mapping strategy. Auditory-visual perceptual correlates
can be exploited to design appropriate mapping strategies.
Various acoustic descriptors and verbal descriptors of timbre have been identified in the psychoacoustic literature.
These studies suggest that verbal descriptors of timbre
usually refer to the material properties of physical objects. Thus,
a study was conducted to investigate the visualisation of
acoustic timbre features using various visual features of
a 3D rendered object. Participants were presented with paired
auditory-visual stimuli and asked to indicate their preferences. The first experiment involved participants rating
audio-visual mappings in isolation. The second experiment involved participants observing multiple parameters
at once and choosing an 'optimal' mapping strategy. The
results of the first experiment suggest agreement on preferred mappings in the isolated case. The results of the
second experiment suggest both that individual preferences
change when multiple parameters are varied, and that there
is no general consensus on preferred mappings in the multivariate case.
1. INTRODUCTION
Timbre is a complex and multi-dimensional attribute of audio. It has been defined as the perceptual attribute of audio
by which two sounds with identical pitch, loudness and
duration can be discriminated [1]. Before the introduction and popularisation of the computer, the easiest way to
produce differences in timbre was through varying instrumentation or articulation. Musical scores therefore elicit
changes in timbre by using various articulation indicators
(e.g. legato). Computers have introduced the possibility to
produce widely varying timbres, in real-time, through the
exploration of complex parameter spaces. These parameter spaces have been referred to as 'timbre spaces' [2, 3].
On a traditional musical instrument, timbre manipulation is
directly related to articulation. With timbre spaces, however, any form of control interface can be designed since
the sound is produced digitally [4].
Copyright: © 2014 Sean Soraghan et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

In modern audio production software environments and
graphical user interfaces (GUIs), control interfaces for the
exploration of timbre spaces invariably take the form of arrays of sliders and rotary knobs. This form of interaction
is sub-optimal and comes from a tendency towards skeuomorphism in interface design. 3D software environments
offer the opportunity to present timbre as a complex 3D
object, each of its visual features (e.g. brightness, texture)
representing a particular parameter of the timbre [5]. This
would facilitate intuitive exploration of the timbre space,
as the overall timbre would be represented visually as one
global object. Such 3D control environments would require the design of a mapping strategy such that timbre features are effectively and intuitively visualised to the user.
The aim of this study has therefore been to explore user
preferences for timbre-feature to visual-feature mappings.
Existing research into both acoustic descriptors and verbal
descriptors of timbre has been drawn upon in order to identify timbre-feature and visual-feature groups and explore
user preferences for mappings between the two. As will be
explored in the next section, existing research into audiovisual mappings has mainly focussed on static, 2D visual
stimuli and rarely concentrates on timbre. This study explores mappings in 3D visual space and is focussed on visual representations of timbre features.
2. RELATED WORK
Most of the previous research into audio-visual mappings
has found that users tend to pair colour and position with
pitch and volume, and pair timbre features with features of
texture/shape [6, 7, 8, 9].
Lipscomb and Kim conducted a user study investigating the relationship between auditory and visual features of randomised audio-visual stimuli. The audio features used were pitch, loudness, timbre and duration; the
visual features were colour, vertical location,
size and shape [9].
Giannakis and Smith have carried out a number of studies
looking at auditory-visual perceptual correlates [10, 7, 11].
Most related to this study is their investigation into sound
synthesis based on auditory-visual associations [11]. In
that particular study they present a number of corresponding perceptual dimensions of musical timbre and visual
texture. Their study focusses on texture alone; however,
it has been suggested that visual texture qualities are only
one type of semantic descriptor used to identify timbre
[12]. The present study therefore explores entire 3D structures and includes material properties such as reflectance
and transparency. These properties have been chosen in accordance with salient semantic timbre descriptors that have