ICMC 2012 NON-COCHLEAR SOUND, LJUBLJANA, 9.-14. SEPTEMBER

VISUALIZATION OF PERCEPTUAL QUALITIES IN TEXTURAL SOUNDS

Thomas Grill, Arthur Flexer
Austrian Research Institute for Artificial Intelligence (OFAI)
Vienna, Austria
{thomas.grill, arthur.flexer}@ofai.at

ABSTRACT

We describe a visualization strategy that efficiently represents relevant perceptual qualities of textural sounds. The general aim is to develop intuitive screen-based interfaces representing large collections of sounds, in which sound retrieval is facilitated by exploiting cross-modal mechanisms of human perception. We propose the use of metaphoric sensory properties that are shared between sounds and graphics, constructing a meaningful mapping of auditory to visual dimensions. For this purpose, we have implemented a visualization using tiled maps, essentially combining low-dimensional projection and iconic representation. To demonstrate its suitability, we present detailed results of experiments conducted in the form of an online survey. Potential future use in music creation is illustrated by a prototype sound browser application.

1. INTRODUCTION

For music-making in the digital age, techniques for efficient navigation in the vast universe of digitally stored sounds have become indispensable. These require appropriate characterization, organization and visual representation of entire sound libraries and their individual elements. Widely used strategies of sound library organization include semantic tagging and various techniques from the field of Music Information Retrieval (MIR) that automatically classify and cluster sounds according to audio descriptors characterizing the signal content. Moreover, there is a need for appropriate user interfaces for browsing such libraries.
Common interest lies especially in graphical, screen-based interfaces that are capable of representing both the attributes of individual sounds and the structure of an entire collection of sounds. Such interfaces should allow users to efficiently pinpoint a sound given some specifications and also to learn about the context of a sound element, e.g. which other sounds of the collection exhibit related properties (see e.g. [20]). Sensory (as opposed to arbitrary, cf. [25]) properties that are aligned with human perception are most expedient, since they enable access without the necessity of learning.

In this paper, we outline and evaluate an implementation of a screen-based interface capable of representing major perceptual qualities of sounds. We restrict our focus to textural sounds, that is, sounds that appear stationary (in a statistical sense), as opposed to evolving over time. This broad class of sounds of diverse natural or technical origin (cf. [23]) is interesting for electroacoustic composers, sound designers and electronic music performers because of its neutrality and malleability, functioning as source material for many forms of structural processing.

The structure of the paper is as follows: In Section 2 we contextualize our research endeavor and describe our approach and the experimental setup. This is followed by a detailed evaluation of our experimental results and a prototype application example in Section 3. Finally, Section 4 concludes with a summary of the findings and possible implications for the future.

2. METHOD AND CONTEXT

2.1. Perceptual qualities of textural sounds

For the following, we refer to recent research of our group in [12]. We have elicited a number of so-called personal constructs that are relevant to human listeners for the description and distinction of textural sounds.
More precisely, those constructs are group norms that are shared by the group of persons (all trained listeners in that case) who participated in the experiments. The most significant constructs are listed in Table 1, sorted by the degree of agreement among listeners. As can be seen, each of the constructs is bipolar, spanning a continuous range from one extreme to the other. The construct natural-artificial is somewhat special, as it does not refer to an objective, potentially measurable quality of sound, but rather to the source of its production. Since in parallel research we are mainly interested in automatically computable quantities, we have not considered this construct for the present paper. Furthermore, the obvious quality of loudness has been explicitly excluded, since its perception depends much more on the sound reproduction than on an inherent property of the sound. The listed qualities describe spectral/timbral (high-low, tonal¹-noisy) and structural/temporal (ordered-chaotic, smooth-coarse, homogeneous-heterogeneous) aspects of sound. Apart from the perceptual qualities proper, we can

¹ Tonal, as in tonal language, is synonymous with pitched.