Influence of Sensory Interactions between Vision and Audition on the Perceptual Characterisation of Room Acoustics

Chrysanthy Nathanail (1), Catherine Lavandier (1), Jean-Dominique Polack (2), Olivier Warusfel (3)

(1) IUT Dép. Génie Civil, Université Cergy-Pontoise, Nathanail-Masson@msn.com & lavandie@u-Cergy.fr
(2) Laboratoire d'Acoustique Musicale, Université Paris VI - CNRS UMR 9945 - Ministère de la Culture et de la Francophonie, polack@ccr.jussieu.fr
(3) Ircam, warusfel@ircam.fr

Abstract

The present work aims at an auditory-visual subjective assessment of room acoustics quality. The influence of vision on auditory apparent distance was studied in a semantic absolute paradigm involving a simulation procedure with two independently controlled artificial environments, one for sound and one for vision. Conventional 3-D slide projection was used for vision, while the sound stimuli were generated by a virtual room acoustics processor and reproduced in binaural format. The results show that auditory distance judgements increase with the depth of the visual stimulus. This trend seems to strengthen as the test proceeds, giving rise to the hypothesis of a training effect. In order to validate these results, two more experiments are currently being designed.

1 Introduction

Various studies in subjective room acoustics indicate that listeners judge the acoustic quality of auditoria on a limited number of perceptual factors. These factors may be related to a number of objective indices measured on the room response [1-2]. A body of evidence suggests that the perceptual modalities of vision and audition are not two independent processes functioning in isolation, but that they co-operate and influence one another in a complex relationship. The strength, or even the occurrence, of interactions depends on the physical and cognitive compatibility between auditory and visual stimuli [3-6].
Visual information available to listener-spectators in concert halls is therefore likely to interfere with the evaluation of acoustical quality. This seems particularly true for spatial subjective attributes of sound, such as apparent auditory distance. The work aims to study sensory interactions between auditory and visual perception, first to evaluate their impact on the subjective characterisation of room acoustics, and second to refine the correlations between objective and perceptual parameters by taking visual information into account. A parallel aim is to discuss a methodology appropriate for testing audio-visual interactions in real or artificial environments. The paper describes three experiments designed to examine how the visually perceived distance to a sound source in a concert hall influences the apparent auditory distance. Results of the first experiment are presented here.

2 Experiment 1

2.1 Method

In order to simulate the sound and visual conditions of a concert hall, two independently controlled artificial environments were created, one for sound and one for vision. They provide varying impressions of auditory and visual distance respectively. The visual environment is created by 3-D pictures

obtained by the light polarisation method. They are large pictures (175 x 115 cm), rear-projected on a screen in dark conditions. They were taken in a real concert hall at seating positions increasingly distant from the stage, and they show a loudspeaker positioned at the centre of the stage. The sound field is created by the "Spatialisateur", a virtual room acoustics processor developed by Ircam & Espaces Nouveaux (France) [7-8]. The "Spatialisateur" can simulate the localisation of sound sources together with the room effect for arbitrary source and receiver positions. Although the signals are usually processed in real time, in the present experiment they were processed in advance and recorded on a DAT tape. When the "Spatialisateur" is set up for headphone listening, it uses the binaural technique to render 3-D sound effects. The six auditory stimuli used here were obtained by processing an anechoic signal (a song by S. Vega) with six different "settings" of the Spatialisateur. These settings are derived either from an analysis/synthesis of the actual acoustical quality of the room shown on the screen, or from arbitrary configurations chosen to drive the auditory apparent distance on the basis of psychoacoustic results.

2.2 Procedure

The subjects are seated in real concert hall seats at a distance of 1.6 m from the projection screen, within the test chamber. They are asked to judge the apparent auditory distance of the stimuli presented to them and to answer orally using a numeric scale from 0 to 9 (0 denotes the shortest auditory distance impression and 9 the most distant). The two extreme stimuli A1 and A6 are presented at the beginning of each test session as a reference for the range of variation. There is a 10 s delay between the offset of one stimulus and the onset of the next.
Twelve adults each participated in three tests:
* One auditory unimodal control test, where no visual stimulus accompanies the presentation of the auditory sequence. The test is conducted in semi-dark conditions with the projection screen slightly illuminated from the back (visual condition 1).
* Two auditory bimodal tests, where the image V1 (a close view of the concert hall stage) or the image V2 (a distant view of the concert hall stage) is projected on the screen simultaneously with the presentation of the auditory sequence (visual conditions 2 and 3 respectively).

In visual conditions 2 and 3 subjects are asked to try to imagine themselves in the hall shown in the pictures, but still to judge the apparent auditory distance. Each of the three tests consists of three iterations of the six stimuli. The order of presentation of the stimuli was varied within each iteration and within the global test. All subjects take the auditory unimodal test first, both to make clear to them that the task is auditory and to serve as a training session for the other conditions.

2.3 Results/Discussion

Mean values and standard deviations of the answers obtained for the three visual conditions are presented in Table 1. Figure 1 shows the graphical representation, while the differences between the visual conditions are presented in Table 2. The stimuli are ordered according to the apparent auditory distance obtained in the unimodal test. Each value is the mean of 36 distance judgements (3 stimulus iterations x 12 subjects).
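The design described above (12 subjects, the unimodal test first, then the two bimodal tests, each test made of three iterations of the six stimuli) can be sketched as a trial schedule. This is only an illustration: the paper does not say how the presentation orders were varied, so the random shuffling of stimuli within each iteration and of the order of the two bimodal tests is an assumption, as are the names `build_session` and `cell_counts`. The sketch confirms that each stimulus/visual-condition cell collects 36 judgements.

```python
import random
from collections import Counter

STIMULI = ["A1", "A2", "A3", "A4", "A5", "A6"]
N_SUBJECTS = 12
N_ITERATIONS = 3


def build_session(rng):
    """Return the (condition, iteration, stimulus) trials for one subject."""
    # Visual condition 1 (unimodal) always comes first, as in the paper.
    # Assumption: the order of the two bimodal tests is shuffled per subject.
    bimodal = [2, 3]
    rng.shuffle(bimodal)
    trials = []
    for condition in [1] + bimodal:
        for iteration in range(1, N_ITERATIONS + 1):
            # Assumption: stimulus order is reshuffled in every iteration.
            order = STIMULI[:]
            rng.shuffle(order)
            trials.extend((condition, iteration, s) for s in order)
    return trials


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sessions = [build_session(rng) for _ in range(N_SUBJECTS)]

# Number of judgements collected per (visual condition, stimulus) cell.
cell_counts = Counter((c, s) for session in sessions for (c, _, s) in session)
print(len(sessions[0]), "trials per subject")
print(set(cell_counts.values()), "judgements per cell")
```

Each subject gives 3 x 3 x 6 = 54 judgements, and each of the 18 cells receives 3 x 12 = 36 of them, which is the sample size behind every mean in Table 1.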
            V. Condition 1    V. Condition 2    V. Condition 3
Stimulus    Mean    SD        Mean    SD        Mean    SD
A1          0,56    0,97      0,50    0,77      0,92    1,18
A2          2,00    1,69      2,36    1,48      2,92    1,63
A3          4,17    2,02      4,72    1,58      5,36    1,81
A4          6,11    1,83      6,22    1,57      6,50    1,36
A5          5,42    1,93      5,97    1,99      5,83    1,98
A6          8,33    0,76      8,53    0,74      8,58    0,94

Table 1: Mean values and standard deviations of perceived auditory distance

[Figure 1 plot. Vertical axis: Apparent Auditory Distance. Horizontal axis: Auditory Stimuli, in the order A1, A2, A3, A5, A4, A6. One curve each for Visual Conditions 1, 2 and 3.]

Figure 1: Auditory apparent distance for each auditory stimulus A1-A6 (mean values are obtained over stimuli iterations and subjects).
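The differences between visual conditions can be recomputed from the means of Table 1, as in the minimal sketch below. The decimal commas of the table are written as decimal points, and the names `means`, `pairs` and `differences` are illustrative. Because the inputs are means already rounded to two decimals, a recomputed difference may deviate by 0,01 from a value derived from unrounded means (for example A3, condition 2 minus condition 1, gives 0,55 here).

```python
# Mean apparent auditory distance per stimulus, copied from Table 1,
# in the order (visual condition 1, visual condition 2, visual condition 3).
means = {
    "A1": (0.56, 0.50, 0.92),
    "A2": (2.00, 2.36, 2.92),
    "A3": (4.17, 4.72, 5.36),
    "A4": (6.11, 6.22, 6.50),
    "A5": (5.42, 5.97, 5.83),
    "A6": (8.33, 8.53, 8.58),
}

# Each label maps to (index of the minuend, index of the subtrahend).
pairs = {"C2-C1": (1, 0), "C3-C1": (2, 0), "C3-C2": (2, 1)}

differences = {
    label: {stim: round(m[hi] - m[lo], 2) for stim, m in means.items()}
    for label, (hi, lo) in pairs.items()
}

# Count how many of the 18 differences are positive.
n_positive = sum(d > 0 for row in differences.values() for d in row.values())

for label, row in differences.items():
    print(label, row)
print(n_positive, "of 18 differences are positive")
```

The only negative differences are A1 (condition 2 minus condition 1) and A5 (condition 3 minus condition 2), so 16 of the 18 values are positive.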

         A1      A2      A3      A4      A5      A6
C2-C1   -0,06    0,36    0,56    0,11    0,56    0,19
C3-C1    0,36    0,92    1,19    0,39    0,42    0,25
C3-C2    0,42    0,56    0,64    0,28   -0,14    0,06

Table 2: Auditory distance differences between Visual Conditions 1, 2 and 3.

The results seem to reveal a small but fairly general increase in the auditory apparent distance from visual condition 1 to visual condition 3, with visual condition 1 giving the lowest auditory distance impression and visual condition 3 the highest (16 out of the 18 difference values in Table 2 are positive).

A three-way repeated measures analysis of variance (ANOVA) was performed on the answers of the subjects. ANOVA decomposes the total variance of a dependent variable into partial contributions, each due to a predetermined experimental factor or to a statistical interaction among factors. Apparent auditory distance was the dependent variable; Sound Stimuli, Visual Conditions and Iteration were the three factors. All significant effects revealed by the analysis are established at the 95% confidence level (p<0.05).

Sound Stimuli had a significant main effect [F(5,55) = 209], which was expected since the stimuli were constructed to give different impressions of auditory distance. Visual Conditions also had a significant main effect [F(2,22) = 4], indicating that visual perception influences the auditory responses. Tukey HSD comparisons between the three visual conditions revealed a significant difference between visual conditions 1 and 3 (p<0.05), but failed to show such differences between visual conditions 1 and 2 or between visual conditions 2 and 3. Iteration did not have a significant main effect but showed an interaction with Sound Stimuli [F(10,110) = 2.35], indicating that the auditory stimuli were judged differently in the three iterations. This interaction, and the fact that the mean differences between the three visual conditions were higher for the third stimulus iteration, led to an analysis involving only the answers of that iteration.
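The variance decomposition behind a repeated measures ANOVA can be illustrated with a short sketch. It is deliberately simpler than the three-way analysis used in the paper: it handles a single within-subject factor, and the `scores` array is synthetic data invented for the example, not data from the experiment. The helper `within_df` (a hypothetical name) reproduces the degrees of freedom quoted in the text for 12 subjects.

```python
def within_df(levels, n_subjects):
    """Degrees of freedom (effect, error) of a within-subject effect.

    levels lists the number of levels of every factor in the effect,
    e.g. [6] for Sound Stimuli or [3, 6] for Iteration x Sound Stimuli.
    """
    effect = 1
    for k in levels:
        effect *= k - 1
    return effect, effect * (n_subjects - 1)


def rm_anova_oneway(scores):
    """One-way repeated measures ANOVA; scores[s][c] = subject s, condition c."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    subj = [sum(row) / k for row in scores]
    cond = [sum(row[c] for row in scores) / n for c in range(k)]
    # Partition of the total sum of squares into three contributions.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in subj)
    ss_conditions = n * sum((m - grand) ** 2 for m in cond)
    ss_error = sum(
        (scores[s][c] - subj[s] - cond[c] + grand) ** 2
        for s in range(n)
        for c in range(k)
    )
    df_effect, df_error = within_df([k], n)
    f_ratio = (ss_conditions / df_effect) / (ss_error / df_error)
    return {
        "ss_total": ss_total,
        "ss_subjects": ss_subjects,
        "ss_conditions": ss_conditions,
        "ss_error": ss_error,
        "df": (df_effect, df_error),
        "F": f_ratio,
    }


# Synthetic scores (4 subjects x 3 conditions), for illustration only.
scores = [[4, 5, 6], [3, 3, 5], [5, 6, 6], [2, 4, 5]]
result = rm_anova_oneway(scores)
print(result)

# Degrees of freedom for the design of the paper (12 subjects).
print(within_df([6], 12))     # Sound Stimuli
print(within_df([3], 12))     # Visual Conditions
print(within_df([3, 6], 12))  # Iteration x Sound Stimuli
```

The three sums of squares add up to the total, which is the decomposition the text refers to. With 12 subjects the helper gives (5, 55), (2, 22) and (10, 110), matching the degrees of freedom reported with the F values.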
The significant effects (p<0.05) were again Sound Stimuli [F(5,55) = 122] and Visual Conditions [F(2,22) = 6]. Tukey HSD comparisons for the three visual conditions revealed a significant difference between visual conditions 1 and 3 (p<0.05), while planned comparisons revealed a significant difference between visual conditions 2 and 3 [F(1,11) = 6.6]. Figure 2 shows this effect: the difference between visual conditions 2 and 3 increases at the third iteration of the stimuli (I3).

[Figure 2 plot. Vertical axis: auditory apparent distance. Horizontal axis: iterations I1, I2, I3. One curve each for Visual Conditions 1, 2 and 3.]

Figure 2: Auditory apparent distance given for each of the three iterations I1, I2, I3 of the stimuli and for the three Visual Conditions (mean values are obtained over auditory stimuli and subjects).

As the tests proceed, subjects seem to be more influenced by the visual distance of the loudspeaker and/or the depth of the visual image. Auditory distance judgements tend to be lower when the loudspeaker is seen close (visual condition 2) and higher when it is seen distant (visual condition 3). At the same time subjects become more consistent in their answers: the variability between subjects drops in the third iteration. This gives rise to a "training" hypothesis: during the experiments, which last 30 to 45 min, the subjects become familiar with the test conditions, possibly associate sound and image more easily, and tend to be more influenced by the visual stimulus. Indeed, two of them reported spontaneously that towards the end of the experiments they could imagine themselves more easily in the room shown in the pictures.

The differences between visual conditions 1 and 2, and between visual conditions 1 and 3, suggest that the presence of a 3-D picture enhances the perception of the apparent auditory distance. The visual stimulation seems to provide a spatial frame of reference for the auditory space.
The results seem consistent with a number of findings from experimental psychology [9-10] concerning an improvement of sound localisation performance in the presence of vision. These studies conclude that vision serves to organise auditory space and/or helps to retain spatial auditory memory. It is possible that the visual image helps to "exteriorise" the auditory image, which is traditionally considered to be confined to and localised in the head area when headphones are used. However, the use of the binaural technique in the present experiment should have limited this effect. Furthermore, as Visual Condition 1 was always presented at the beginning of the tests, it is possible that

subjects answered in a somewhat 'conservative' manner, underestimating the auditory distance. These conclusions therefore need careful checking in a new series of tests.

3 Experiments 2 and 3

Experiment 2 is designed to improve the quality of the sound simulation. While experiment 1 uses the binaural sound reproduction technique over headphones, experiment 2 uses the transaural technique over loudspeakers. Transaural listening has the advantage of minimising front/back confusions (known to appear with headphone listening, especially when individual HRTFs and head-tracking are not available) while preserving the perceptual characteristics of a diffuse sound field, which is important for conveying the envelopment impression of the room effect. Also, the fact that subjects are not wearing headphones improves the realism of the simulation. New auditory stimuli were created by the "Spatialisateur"; they are based on measured impulse responses of the "Théâtre des Champs-Élysées" in Paris, which is the hall shown in the pictures. A series of preliminary, purely auditory tests helped to choose the anechoic signal used in the main tests. The "magnitude estimation method" is used for the answers of the subjects, who are free to use a scale of their choice. This should allow a better assessment of the perceived auditory distance, especially for the extreme auditory stimuli.

Experiment 3 investigates the hypothesis of the "training" process. A large number of iterations of the stimuli is presented to a group of subjects, and the results should shed light on such interaction mechanisms. The experiment also uses the transaural sound reproduction technique and the "magnitude estimation method" for the answers.

4 Conclusions

A highly controlled experimental procedure was designed to test audio-visual interactions in complex environments. It reveals a small, offset-type influence of the visually perceived distance on the apparent auditory source distance.
This effect is significant only for the third and last iteration of the sound stimuli, indicating that, as they become familiar with the test conditions, subjects associate sound and image more easily. Furthermore, the significant auditory distance differences between the unimodal test and the bimodal ones seem to support findings from experimental psychology concerning the facilitation of auditory localisation tasks in the presence of vision. To check these conclusions, new experiments with more closely controlled sound stimuli and transaural sound reproduction are being designed and should be able to answer the questions arising from the present results.

References

[1] Kahle, E. 1995. "Validation d'un modèle objectif de la perception de la qualité acoustique dans un ensemble de salles de concerts et d'opéras". Thèse de Doctorat, Université du Maine.
[2] Lavandier, C. 1989. "Validation perceptive d'un modèle objectif de la characterisation de la qualité des salles". Thèse de Doctorat, Université du Maine.
[3] Woszczyk, W., Bech, S. and Hansen, V. 1995. "Interactions between audio-visual factors in a home theater system: Definition of subjective attributes". 99th AES Convention, New York, October 1995.
[4] Warren, D.H. 1979. "Spatial localisation under conflict situations: Is there a single explanation?". Perception, 8, pp. 323-337.
[5] Warren, D.H., Welch, R.B. and McCarthy, T.J. 1981. "The role of visual-auditory "compellingness" in the ventriloquism effect: Implications for transitivity among the spatial senses". Perception and Psychophysics, 30(6), pp. 557-564.
[6] Ragot, R., Cave, C. and Fano, M. 1988. "Reciprocal effects of visual and auditory stimuli in a spatial compatibility situation". Bulletin of the Psychonomic Society, 26(4), pp. 350-352.
[7] Jot, J.M., Larcher, V. and Warusfel, O. 1995. "Digital signal processing issues in the context of binaural and transaural stereophony". Proc. 98th Conv. AES, preprint 3980.
[8] Jot, J.M. 1996. "Synthesizing three dimensional sound scenes in audio or multimedia production and interactive human computer interfaces". 5th International Conference "Interface to real & virtual worlds", Montpellier, 21-24 May 1996.
[9] Shelton, B.R. and Searle, C.L. 1980. "The influence of vision on the absolute identification of sound source position". Perception & Psychophysics, 28(6), pp. 589-596.
[10] Warren, D.H. 1970. "Intermodal interactions in spatial localization". Cognitive Psychology, 1, pp. 114-133.