Loudness as a cue in distance perception

Jan Chomyszyn
CCRMA, Dept. of Music, Stanford University, Stanford, CA 94305, USA
email: jan @

Abstract: The common experience of many computer music composers and sound engineers suggests that the acoustic context of a room influences the ability to estimate distance and loudness. This paper investigates the mutual relationships between the perception of distance and the loudness of musical sounds with different direct-to-reverberant ratios, which result from presenting the sounds from different distances in a room.

The creation of a proper auditory perspective remains an important factor in composing computer music. Composers often try to create an impression of a sound space for the composition, to differentiate sound sources as to their direction and distance. In many cases, however, there is no straightforward correspondence between the physical parameters of sound that can be controlled and the aural effect achieved, because our perception involves many physical dimensions rather than a single one. The distance of a sound's origin is a typical example of such a complex phenomenon.

This research addresses the interrelationships between the apparent distance of a sound and its loudness as they are affected by the acoustics of the room. The idea expressed in (Chowning 1990) can be restated as follows: people tend to use all the cues created by an auditory perspective, not just the intensity of the sound alone, to estimate its loudness. In deciding on loudness, people draw on their everyday experience of room acoustics. Therefore, in reverberant conditions a listener may overestimate a distant singer's loudness and underestimate that of a close one, even though the intensity of both sounds at the listener's position is equal. Based on the judged effort with which the sound was produced and on the changes introduced by the room, the listener estimates how loud the sound would be at the source rather than at his/her position.
In our research, stress was placed on the inverse question: is loudness a necessary cue for estimating the apparent distance of a sound? More precisely, we wanted to check whether distance judgements of sounds in a reverberant (room) environment would still be precise after their loudness had been matched and therefore eliminated as a cue. We also took the opportunity to check how well listeners can match the loudness of sounds presented at different distances in the room, and how the changes introduced by reverberation, which increases with distance, affect such matching.

Several acoustic variables of a sound are altered when the physical distance between the sound source and the listener changes. Among the most important factors for the relative estimation of distance are the direct-to-reflected sound energy ratio, first reflection time, sound intensity, binaural differences, and spectral changes, especially the "roll-off" of high frequencies. All of these variables, including intensity, are influenced by the reverberation of the room, and some of them seem to be correlated. A change in the intensity of sound is often considered an obvious cue for auditory depth. Some works, including (Sheeline 1983), (Dodge 1985), and (Mershon and King 1975), imply the importance of the direct-to-reverberant sound ratio as the primary cue for distance judgement. In a real reverberant environment, however, all the cues including loudness coexist, and a situation in which one of the factors is removed is perceived as artificial. Only occasional research has been done to determine the importance of individual cues for distance judgement, and our knowledge of this phenomenon is still rather deficient.

Familiarity of the sound is regarded as an important factor in distance perception. As we believe that the judgement of distance is learned from our everyday experience and is based on our knowledge of the response of a room to the sound produced, we decided to run the experiment in "natural" conditions rather than use artificial reverberation and synthesized sounds or noise pulses. A violin sound (A440, with a medium vibrato rate and a duration of about 1.8 sec) was recorded in an anechoic chamber. This sound was then reproduced in a common rehearsal hall (the Braun rehearsal hall at Stanford) at distances of 1, 2, 3, ..., 8 meters with a constant playback level, and recorded with an artificial head at a constant recording level. During the recording the speaker was moved along the longer axis of the room, whereas the head remained in the same position. The eight sounds were then edited (cut) after about 2.7 sec, i.e. after the point at which the sound fell below the background noise level.

The background noise was then filtered by applying the following procedure. The noise floor was first estimated, i.e. given the noise recorded alone, its power spectrum was calculated. Next, the spectra of the recorded stimuli were compared to the noise spectrum. For each FFT bin, if the FFT amplitude fell below the noise threshold, it was attenuated by a given (constant) ratio. This procedure was then repeated for successive time frames, and an envelope was applied to smooth the result. After the filtering, since the timbral artifacts introduced were small in comparison to the great improvement in the quality of the sound, we decided that the sounds were representative samples and reflected what takes place acoustically in a "real" room environment.

The sounds were next presented in a formal listening test. In the test, the sounds were reproduced in pairs in randomized order through headphones. Each pair was presented once, but the whole test was taken twice (with a different order of pairs) to check the reliability of the subjects. The order of elements in each pair was also random.
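The noise-filtering procedure described earlier in this section (estimate a noise floor, then attenuate every FFT bin that falls below it) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the processing actually used for the stimuli: the function names, the threshold factor of 2.0, and the attenuation ratio of 0.1 are hypothetical, magnitudes are compared instead of power values, a naive DFT stands in for a real FFT, and the frame overlap and smoothing envelope mentioned in the text are left out.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform (O(n^2)); adequate for a short demo frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(bins):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(bins)
    return [
        (sum(bins[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def estimate_noise_floor(noise_frames):
    """Average magnitude per bin over frames containing background noise only."""
    spectra = [dft(frame) for frame in noise_frames]
    n_bins = len(spectra[0])
    return [sum(abs(s[k]) for s in spectra) / len(spectra) for k in range(n_bins)]


def gate_frame(frame, noise_floor, threshold=2.0, attenuation=0.1):
    """Scale down every bin whose magnitude is below threshold * noise floor.

    threshold and attenuation are assumed values, not taken from the paper.
    """
    gated = []
    for value, noise in zip(dft(frame), noise_floor):
        if abs(value) < threshold * noise:
            value *= attenuation  # constant attenuation ratio for "noisy" bins
        gated.append(value)
    return idft(gated)
```

In practice each stimulus would be split into successive (usually overlapping and windowed) frames, `gate_frame` would be applied to each of them, and an FFT routine such as `numpy.fft.rfft` would replace the naive transform.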
Only pairs of sounds recorded at different distances were included. In the first part of the experiment, subjects had to decide, for each pair, which sound of the pair was closer to them. The answer was scored as one if it corresponded with the physical distances of the sounds, or as zero if it did not. Afterwards, the subjects were asked to match the loudness of the sound they had judged to be closer to the loudness of the other (more distant) sound, which remained constant. They could adjust the intensity of the sound by using two buttons ("softer" and "louder") with a resolution of 0.5 dB. The amount by which the subject attenuated the stronger sound to achieve a satisfactory loudness match, expressed in dB, was stored in a disk file. Subjects had to answer the questions in alternation, i.e. for each pair the loudness match followed the distance question.

The attenuation factors were next used during the second stage of the experiment to reproduce the equalized sounds the subjects had produced. Given the equally loud sounds, the subjects were asked to answer the distance question again, that is, to estimate which sound in each pair was closer this time. The pairs were randomized and appeared in a different order than during the first stage. Performance at this stage should reflect the result of attempting to eliminate the loudness cue.

Matching the loudness of complex sounds is generally a difficult task, especially if the sounds differ considerably in other parameters such as timbre. Moreover, a person's ability to match the loudness of sounds can only be estimated as a relative measure, because there is no "objective" criterion to fulfill. For this reason we required each subject to take the test twice. Reliability can then be based on the consistency between the two measurements each subject provided, i.e. on the ability of each person to closely repeat the measurement.
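As an illustration of how such test-retest consistency might be quantified, the sketch below compares two runs of loudness matches by one subject. It assumes a Pearson correlation coefficient and a mean absolute difference as the measures; the function names and the attenuation values are hypothetical and are not data from the experiment.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_abs_difference(xs, ys):
    """Average absolute difference between paired measurements (same unit as input)."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical attenuations (dB) set by one subject for the same pairs in two runs.
first_run = [2.0, 4.5, 6.0, 8.5, 10.0]
second_run = [2.5, 4.0, 6.5, 8.0, 10.5]

print(pearson(first_run, second_run))
print(mean_abs_difference(first_run, second_run))
```

A correlation close to 1 together with a mean difference near the size of the adjustment step would indicate that the subject repeated the matches closely.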
The consistency can be formally expressed in terms of the correlation coefficient between these measurements. We have taken 0.85 as the threshold for reliability. The results show that 28 of 29 subjects repeated the loudness match with a very high correlation above this threshold, usually equal to or above 0.92. For these 28 results the average mean adjustment difference between the measurements was reasonably small: 1.065 dB; the average absolute difference from the mean and the average standard deviation were very small: 0.7 dB and 0.9 dB respectively. Compared with the 0.5 dB adjustment step, the results show that the subjects were able to repeat their matches with a very high degree of accuracy; we can therefore conclude that they had a very well established notion of loudness and that their attempts at matching loudness were successful.

In working out the results for distance we were concerned with the difference in the percentage of correct answers between the two measurements in each phase of the experiment. The results show a very high percentage of correct answers, above 85%, during the first phase on recorded sounds (with the loudness cue present). The distance of the recorded sounds was perceived clearly by the subjects. During the second phase of the experiment the performance decreased. All the results but two in each measurement on equalized sounds lie above chance (at a significance level of 5%). Though the performance decreased during the second phase, it was usually still high, over 80%. We also measured the percentage of answers that were correct in both measurements, that is, an answer for a particular pair of sounds was scored as correct only if it was correct in both measurements. With this criterion 21 of 29 subjects, i.e. 72.4%, were still above the chance rate.

Intensity ratios were calculated for all the stimuli along the distance. As depicted in the scatter diagram in Fig. 1, the adjustments closely follow the intensity changes along the distance. The correlation coefficient of 0.973 calculated between the intensities and gain adjustments for each pair reveals a very close correspondence. This result seems to confirm that, on the whole, intensity is the main factor in the loudness judgement and that reverberation cues do not affect it strongly, if at all.
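The "above chance" criterion applied to the distance scores can be made concrete with a one-sided binomial test. The sketch below is an assumption about how such a criterion could be computed, not a description of the analysis actually performed: it supposes that each two-alternative answer is correct with probability 0.5 under pure guessing, and the figure of 28 trials is inferred from the 8 recording distances (28 pairs of different distances) rather than stated in the text. The function names are hypothetical.

```python
import math


def upper_tail_probability(n_trials, n_correct, p_chance=0.5):
    """P(X >= n_correct) for X ~ Binomial(n_trials, p_chance)."""
    return sum(
        math.comb(n_trials, k) * p_chance**k * (1 - p_chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


def min_correct_above_chance(n_trials, alpha=0.05, p_chance=0.5):
    """Smallest number of correct answers whose upper-tail probability
    under pure guessing does not exceed alpha."""
    for n_correct in range(n_trials + 1):
        if upper_tail_probability(n_trials, n_correct, p_chance) <= alpha:
            return n_correct
    return None  # no score reaches significance (happens only for very small n_trials)


# Assumed test length: 28 pairs, one answer per pair.
print(min_correct_above_chance(28))
```

Under these assumptions a subject would need at least 19 correct answers out of 28 (about 68%) to be counted as above chance at the 5% level.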
We have to limit the scope of this statement to a medium-sized rehearsal room. On the other hand, the standard deviations from the mean adjustments increase for the most distant sounds (see Fig. 2). Sounds 6 or 7 meters apart differ more than the closer ones, and thus may simply be more difficult to match. Nevertheless, these differences are the effect of the reverberant action of the room. Therefore one cannot exclude the possibility that the increasing reverberation cues for the most distant sounds misled the subjects, who estimated the loudness at the source rather than at their own position. This issue, however, needs closer examination and an explanation in another experiment.

Fig. 1. Intensity ratios along distance versus gain adjustment (in dB) provided by subjects.
Fig. 2. Mean gain adjustments and standard deviations for distances.

During the first phase, on non-equalized sounds, the performance of distance judgement was extremely good, which is not a surprise. In this part we just wanted to check whether the distance of our recorded stimuli was perceived well; in other words, we wanted to know whether the stimuli were appropriate for the test.

During the second phase, on equalized sounds, the performance decreased; however, it was usually still far above the chance rate. The comparison of the performance in the first and second phases as a function of distance is depicted in Fig. 3. For sounds close to each other the differences in reverberation cues were not strong enough to provide the relevant information and thus to allow the recognition of distance when the intensity cue was missing. Except for sounds which differed by 1 m and 2 m, the performance was usually very good. The performances in the two phases are fairly well correlated, with a correlation coefficient of 0.85. The conclusion can be drawn that, except for very close sounds, an appropriate distance judgement can still be made even when the intensity cue is missing.

Finally, Fig. 4 shows the scatter diagram of all the scores on the recorded sounds versus the scores on the equalized sounds. For each pair of sounds, the mean scores of the distance judgement over 28 subjects were matched and plotted, with the scores of recorded sounds (phase 1) on the x-axis and the scores of equalized sounds on the y-axis. The points are distributed randomly, with a rather small correlation. Clearly, the distance judgement scores of equalized sounds are independent of the distance scores of recorded sounds. Therefore we can conclude that loudness is not a necessary cue in estimating distance, except for close sounds.

Fig. 3. Average scores at different distances on recorded sounds (black dots) and equalized sounds (gray).
Fig. 4. Scores of recorded sounds versus scores of equalized sounds.

One implication of these conclusions is that computer music composers need not be overly concerned with keeping the loudness of the sounds they want to place far away in proportion to the loudness of sounds that are intended to be close.
The distance should be perceived well as long as the parameters specific to the reverberant environment, i.e. the direct-to-reverberant sound ratio, the quicker decay of high frequencies, and the first reflection time, are controlled. Similarly, a moderate amount of reverberation, if modelled with the intention of imitating some room impression, should not influence the loudness of the produced sounds.

References:

J. Blauert, "Spatial Hearing", The MIT Press, Cambridge, Massachusetts, 1983.
J. Chowning, "Music from Machines: Perceptual Fusion & Auditory Perspective - for Ligeti", Center for Computer Research in Music and Acoustics, Report No. STAN-M-64, 1990.
C. Dodge and T. Jerse, "Computer Music", Schirmer Books, New York, 1985.
D.H. Mershon and J.N. Bowers, "Absolute and Relative Cues for the Auditory Perception of Egocentric Distance", Perception, v. 8, pp. 311-322, 1979.
D.H. Mershon and L.E. King, "Intensity and Reverberation as Factors in the Auditory Perception of Egocentric Distance", Perception and Psychophysics, v. 18, pp. 409-415, 1975.
C.W. Sheeline, "An Investigation of the Effects of Direct and Reverberant Signal Interactions on Auditory Distance Perception", CCRMA, Department of Music, Report No. STAN-M-13, 1983.