ICMC 2012 NON-COCHLEAR SOUND
LJUBLJANA, 9.-14. SEPTEMBER
VISUALIZATION OF PERCEPTUAL QUALITIES IN TEXTURAL
SOUNDS
Thomas Grill, Arthur Flexer
Austrian Research Institute for Artificial Intelligence (OFAI)
Vienna, Austria
{thomas.grill, arthur.flexer}@ofai.at
ABSTRACT
We describe a visualization strategy that efficiently represents relevant perceptual qualities of textural sounds. The general aim is to develop intuitive screen-based interfaces representing large collections of sounds,
in which sound retrieval is facilitated by exploiting cross-modal mechanisms of human perception. We propose the use of metaphoric sensory properties
that are shared between sounds and graphics, constructing
a meaningful mapping of auditory to visual dimensions.
For this purpose, we have implemented a visualization
using tiled maps, essentially combining low-dimensional
projection and iconic representation. To demonstrate the suitability of this approach, we present detailed results of experiments
conducted in the form of an online survey. Potential
future use in music creation is illustrated by a prototype
sound browser application.
1. INTRODUCTION
For music-making in the digital age, techniques for efficient navigation in the vast universe of digitally stored
sounds have become indispensable. These imply appropriate characterization, organization and visual representation of entire sound libraries and their individual elements. Widely used strategies of sound library organization include semantic tagging or various techniques from
the field of Music Information Retrieval (MIR) to automatically classify and cluster sounds according to certain
audio descriptors which characterize the signal content.
Moreover, there is a need for appropriate user interfaces
in order to browse through such libraries. Common interest lies especially in graphical, screen-based interfaces
that are capable of representing both the attributes of individual sounds and the structure of an entire collection of sounds. Such interfaces should allow users
to efficiently pinpoint a sound given some specifications
and also to learn about the context of a sound element,
e.g. which other sounds of the collection exhibit related
properties (see e.g. [20]). Sensory (as opposed to arbitrary, cf. [25]) properties that are aligned with human perception are most expedient, since they enable access without the necessity of learning.
In this paper, we outline and evaluate an implementation of a screen-based interface capable of representing
major perceptual qualities of sounds. We restrict our focus
to textural sounds, that is, sounds that appear as stationary
(in a statistical sense), as opposed to evolving over time.
This broad class of sounds of diverse natural or technical origin (cf. [23]) is interesting for electroacoustic composers, sound designers or electronic music performers
because of its neutrality and malleability, functioning as
source material for many forms of structural processing.
The structure of the paper is as follows: In Section 2
we contextualize our research endeavor and describe our
approach and the experimental setup. This is followed
by a detailed evaluation of our experimental results and a
prototype application example in Section 3. Finally, Section 4 concludes with a summary of the findings and possible implications for the future.
2. METHOD AND CONTEXT
2.1. Perceptual qualities of textural sounds
In the following, we draw on recent research by our
group [12]. We have elicited a number of so-called
personal constructs that are relevant to human listeners for the description and distinction of textural sounds.
More precisely, those constructs are group norms that are
shared by the persons (all trained listeners in
that case) who participated in the experiments. The
most significant constructs are listed in Table 1, sorted
by the degree of agreement among listeners. As can be
seen, each of the constructs is bipolar, spanning a continuous range from one extreme to the other. The construct natural-artificial is somewhat special as it does
not refer to an objective, potentially measurable quality of sound, but rather to the source of its production.
Since our parallel research is mainly concerned with automatically computable quantities, we have not considered this construct in the present paper. Furthermore,
the obvious quality of loudness has been explicitly excluded, since its perception depends much more on
sound reproduction than on any inherent property of the sound.
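As an illustration only, the bipolar constructs retained for this paper could be modeled as a small data structure. The following is a minimal sketch, not the implementation used in our experiments: the class name `BipolarConstruct`, the method `lean`, and the normalization of ratings to the interval [0, 1] are assumptions made for this example, and the ordering of the entries does not reproduce the agreement ranking of Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BipolarConstruct:
    """A perceptual quality spanning a continuous range between two poles."""
    low_pole: str   # pole assumed to sit at rating 0.0
    high_pole: str  # pole assumed to sit at rating 1.0
    aspect: str     # "spectral/timbral" or "structural/temporal"

    def lean(self, rating: float) -> str:
        """Return the pole that a rating in [0, 1] leans toward."""
        if not 0.0 <= rating <= 1.0:
            raise ValueError("rating must lie in [0, 1]")
        if rating == 0.5:
            return "neutral"  # assumption: exact midpoint favors neither pole
        return self.low_pole if rating < 0.5 else self.high_pole


# The five constructs considered here; natural-artificial and loudness
# are excluded, as stated in the text.
CONSTRUCTS = (
    BipolarConstruct("high", "low", "spectral/timbral"),
    BipolarConstruct("tonal", "noisy", "spectral/timbral"),
    BipolarConstruct("ordered", "chaotic", "structural/temporal"),
    BipolarConstruct("smooth", "coarse", "structural/temporal"),
    BipolarConstruct("homogeneous", "heterogeneous", "structural/temporal"),
)
```

In this sketch, a sound would be described by one rating per construct, e.g. `CONSTRUCTS[1].lean(0.9)` names the pole that a rating of 0.9 on the tonal-noisy construct leans toward.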
The listed qualities describe spectral/timbral (high-low,
tonal¹-noisy) and structural/temporal (ordered-chaotic,
smooth-coarse, homogeneous-heterogeneous) aspects of
sound. Apart from the perceptual qualities proper, we can
¹ Tonal, as in tonal language, is synonymous with pitched.