The Effects of Multi-Channel Signal Decorrelation in Audio Reproduction

Kendall, Gary

Volume 1994, 1994

Permalink: http://hdl.handle.net/2027/spo.bbp2372.1994.081

Permissions: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License. Please contact [email protected] to use this work in a way not covered by the license.

Gary S. Kendall
Center for Music Technology, School of Music, Northwestern University
[email protected]
ICMC Proceedings 1994, Audio Signal Processing, pp. 319-326

Keywords: Audio Signal Processing Techniques, Psychoacoustics, Decorrelation, Image Shift, Precedence Effect

I. Introduction

There are numerous ways in which naturally occurring acoustic signals or electronically processed audio signals become decorrelated. Usually this decorrelation is a by-product of other acoustic or electronic processes such as chorusing or reverberation. Why then focus on decorrelation as a viewpoint on these kinds of effects? In the field of spatial hearing, signal decorrelation has a dramatic impact on the perception of sound imagery, and the correlation measure has proven to be a significant predictor of perceptual effects both in natural environments and in audio reproduction.

In the discussion that follows, a particular signal-processing technique is described that enables the user to design multiple FIR (finite impulse response) filters which produce replicas of a source signal that are timbrally invariant with the source signal and which are nonetheless decorrelated. The choice of filter coefficients enables the user to set the level of correlation between any two audio channels in a continuous range from one through zero to minus one. This technique is also useful in producing an unlimited number of signals with near-zero correlation. While the decorrelation technique usually has little impact on timbre, it does have a significant impact on spatial perception and has been found to be especially useful in audio reproduction. There are at least five effects of decorrelation in audio reproduction, which can be itemized as follows:

1. the timbral coloration and combing associated with constructive and destructive interference of multiple delayed signals is perceptually eliminated,
2. decorrelated channels of sound produce diffuse sound fields akin to the late field of reverberant concert halls without actually adding reverberation to the original signal,
3. decorrelated channels produce externalization in headphone reproduction,
4. the position of the sound field does not undergo image shift with changes in the position of the listener relative to stereo loudspeakers, and
5. the precedence effect, which inhibits the perception of delayed sound, is defeated, enabling one to present the same sound signal from multiple loudspeakers without the collapse of the sound image into the nearest loudspeaker.

II. Special Acknowledgments

Much of the key work described here was performed by Marty Wilde, William Martens, and myself in close collaboration both at Northwestern University and at Auris Corporation. The purpose of this paper is to survey and summarize work that has not been reported in print so that others may use these techniques and make further contributions to our community's knowledge. (Commercial application of this work is covered by U.S. patent #5,235,646 assigned to Northwestern University and U.S. patent #5,121,433 assigned to Auris Corporation.)

III. Technique for Creating Decorrelated Signals

Definition of correlation measure. The correlation measure of two signals, y1(t) and y2(t), can be determined by computing the cross-correlation function, Ω(Δt):

$$\Omega(\Delta t) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{+T} y_1(t)\, y_2(t + \Delta t)\, dt$$

For the purposes of most discussions, the correlation measure (also called the cross-correlation coefficient) is expressed as a single number and is taken to be the value of the peak in the cross-correlation function with the greatest absolute value. If y1(t) and y2(t) are identical, they will have the highest possible positive correlation measure, +1.0. If y1(t) and y2(t) are identical except for being 180 degrees out of phase, they will have the highest possible negative correlation measure, -1.0. If y1(t) and y2(t) are very dissimilar, they are said to be "uncorrelated" and their correlation measure will approach zero.

Converting input to decorrelated output. In the most basic audio applications of decorrelation, the input will be a monophonic signal (or a multi-channel signal summed to form a monophonic input signal). The user will specify the correlation measure for each pair of output signals in a range from +1 through -1. For many applications, the optimal correlation measure is zero, and for some of these applications there will be multiple decorrelated output channels. In order to produce a pair of output signals with a specified correlation measure, the input can be convolved with each of two exemplar signals that themselves are correlated at the specified level. This convolution operation itself can be re-envisioned as an FIR filter and the exemplar signals as the filter's coefficients. An illustration of this technique for use with a monophonic input and stereo output is shown in Figure 1. The digital input signal, x(nT), is applied to a pair of FIR filters with coefficient sequences h1(nT) and h2(nT).
The output of the FIR filters, y1(nT) and y2(nT), is the convolution (denoted by "*") of the input signal with each coefficient sequence:

$$y_1(nT) = x(nT) * h_1(nT) \quad \text{and} \quad y_2(nT) = x(nT) * h_2(nT)$$

Figure 1. Decorrelation with a monophonic input and stereo output. The monophonic input signal x(nT) is convolved with h1(nT) and with h2(nT) to produce the stereophonic output signals y1(nT) and y2(nT).

Figure 2. Decorrelation with monophonic input and multiple output channels. The monophonic input signal x(nT) is convolved with each of h1(nT) through hN(nT) to produce N channels of output signals, y1(nT) through yN(nT).

Building a library of paired FIR filter coefficients. The correlation measure of the two output signals is determined by the pair of FIR filter coefficients. In order to provide a complete range of correlation measures, a library of coefficients for the paired filters must be created. The filter coefficients are computed from a frequency domain specification of both magnitude and phase via the inverse Fourier transform. The magnitude part of the specification is set to unity across all frequencies; that is, all of the FIR filters are all-pass. The phase part of the specification is constructed from combinations of random number sequences. The resulting correlation measures are determined only by the phase information and, in fact, the entire range of possible correlation measures is attained merely through inter-channel phase manipulations! Output signals will be timbrally invariant with the original input signals because the filters are all-pass and because single-channel phase changes do not affect the perception of timbre (with the exception of a few special circumstances). Inter-channel phase relationships, though, are very important in spatial hearing and audio reproduction.
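As a minimal sketch of the design just described, the Python fragment below builds all-pass FIR coefficients from a random phase specification, filters a signal with them, and measures the correlation between two sequences. The function names, the 31-point phase specification, the zero phase at DC and Nyquist, the naive inverse DFT, and the normalization of the correlation measure by signal energy are all assumptions made for illustration; none of them is taken from the original implementation.

```python
import cmath
import math
import random


def allpass_fir(phases):
    """Real FIR coefficients with unit magnitude at every specified bin.

    `phases` gives the phase (radians) of positive-frequency bins
    1..len(phases). DC and Nyquist get zero phase (an assumption) and the
    negative frequencies mirror the positive ones so the result is real.
    """
    n = 2 * (len(phases) + 1)              # number of coefficients
    spectrum = [complex(1.0, 0.0)] * n
    for k, phi in enumerate(phases, start=1):
        spectrum[k] = cmath.exp(1j * phi)       # unit magnitude, given phase
        spectrum[n - k] = cmath.exp(-1j * phi)  # conjugate mirror
    # Naive O(n^2) inverse DFT, adequate for a short illustrative filter.
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)).real / n
        for t in range(n)
    ]


def convolve(x, h):
    """Direct-form FIR filtering: y(n) = sum over m of h(m) x(n - m)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def correlation_measure(y1, y2):
    """Cross-correlation peak of greatest absolute value, energy-normalized."""
    norm = math.sqrt(sum(v * v for v in y1) * sum(v * v for v in y2))
    best = 0.0
    for lag in range(-(len(y1) - 1), len(y2)):
        acc = 0.0
        for t, v in enumerate(y1):
            u = t + lag
            if 0 <= u < len(y2):
                acc += v * y2[u]
        if abs(acc) > abs(best):
            best = acc
    return best / norm


rng = random.Random(1994)  # arbitrary seed, for repeatability only
spec_a = [rng.uniform(-math.pi, math.pi) for _ in range(31)]
spec_b = [rng.uniform(-math.pi, math.pi) for _ in range(31)]
h1 = allpass_fir(spec_a)   # 64 coefficients each
h2 = allpass_fir(spec_b)

x = [1.0] + [0.0] * 15     # toy monophonic input (a unit impulse)
y1 = convolve(x, h1)
y2 = convolve(x, h2)
print(round(correlation_measure(y1, y1), 3))  # identical signals
print(round(correlation_measure(y1, y2), 3))  # independent phase specs
```

With independent phase specifications the second value is small in magnitude but not exactly zero, which is consistent with the finite-length limitation discussed later in the paper.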

In order to construct a complete library of paired filter coefficients, we start with two independent random number sequences (A and B) whose amplitude values are scaled to the range of +π to -π. The process of creating the library varies with the specified correlation measure, Ω. The library can be constructed in the following steps (as paraphrased from [Wilde, et al. 1989]):

1) Ω = +1. Only one of the scaled A or B sequences is used as the phase specification for both the left and right channel. Coupled with each of these identical left and right channel phase specifications is a unity magnitude specification. Each of these two pairs of magnitude and phase specifications in the frequency domain is then inverse Fourier transformed, generating the two sets of FIR coefficients, h1(nT) and h2(nT). The correlation measure of h1(nT) and h2(nT) is +1.

2) Ω = -1. One of either the A or B sequences is again used as the phase specification for the left channel, while π is added to that same sequence and the result is used as the right channel phase specification. These two phase specifications are each coupled with a unity magnitude, and each of these two pairs of magnitude and phase specifications is inverse Fourier transformed to create the two sets of FIR coefficients, h1(nT) and h2(nT). The correlation measure of h1(nT) and h2(nT) is -1.

3) Ω = 0. The scaled A sequence is the left channel phase specification and the scaled B sequence is the right channel specification. Coupling each of those sequences with a unity magnitude spectrum and inverse Fourier transforming both pairs of frequency domain specifications yields the two sets of coefficients. The correlation measure of h1(nT) and h2(nT) is very near to 0.

For all remaining correlation levels, the two scaled sequences A and B are first summed to form the left channel phase specification. (Note that if a phase value should exceed ±π, it is "wrapped" back around into the range ±π.)

4) 0 < Ω < 1. Sequence A is weighted by a scaling coefficient, k, before being summed with the full-scale B sequence to become the right phase specification. The value of k is limited to the range 0 < k < 1, and is dependent upon the desired output correlation level. Coupling these two phase specifications with unity magnitude spectra and inverse Fourier transforming each of the pairs of frequency domain sequences yields two sets of coefficients. The correlation measure of h1(nT) and h2(nT) is very near to +k.

5) 0 > Ω > -1. Sequence A is again weighted by some scaling coefficient, k. But unlike the procedure for positive correlation levels, π is added to the A sequence before being summed with the B sequence to become the right phase specification. Coupling these two phase specifications with unity magnitude spectra and inverse Fourier transforming both pairs of the frequency domain specifications yields two sets of coefficients. The correlation measure of h1(nT) and h2(nT) is very near to -k.

Building a library of multi-channel FIR filter coefficients with correlation measures near zero. It is easy to see from the above description that three or more channels with correlation measures near zero are easily constructed as in step 3 by the use of three or more independent random number sequences. A library that can support N channels of output will contain coefficients for N filters. The library is created by starting with N independent random number sequences (A1, A2, ..., AN) whose amplitude values are scaled to the range of +π to -π. The scaled A1 sequence is the phase specification for the first filter, the scaled A2 sequence is the phase specification for the second filter, and so on. Coupling each of those sequences with a unity magnitude spectrum and inverse Fourier transforming each pair of magnitude and phase specifications yields N sets of coefficients, h1(nT) through hN(nT).
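The five steps above can be sketched as a single function that returns the left and right phase specifications for a requested correlation measure. This is only one possible reading of the procedure: the names `wrap`, `phase_pair` and `zero_lag_correlation` are hypothetical, setting k equal to the magnitude of the requested correlation is an assumption (the text says only that k depends on the desired level), π is assumed to be added after A is weighted, and the check uses the zero-lag value of the cross-correlation rather than its peak. The check relies on the fact that, for two unit-magnitude spectra, the zero-lag correlation of the resulting filters equals the mean cosine of the inter-channel phase difference across bins.

```python
import math
import random


def wrap(phi):
    """Wrap a phase value back into the range -pi .. +pi."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi


def phase_pair(a, b, omega):
    """Left/right phase specifications for a requested correlation omega."""
    if omega == 1.0:                       # step 1: same sequence in both
        return list(a), list(a)
    if omega == -1.0:                      # step 2: right = left + pi
        return list(a), [wrap(x + math.pi) for x in a]
    if omega == 0.0:                       # step 3: independent sequences
        return list(a), list(b)
    k = abs(omega)                         # assumption: k taken as |omega|
    offset = 0.0 if omega > 0 else math.pi  # step 5 adds pi (assumed after weighting)
    left = [wrap(x + y) for x, y in zip(a, b)]
    right = [wrap(k * x + offset + y) for x, y in zip(a, b)]
    return left, right


def zero_lag_correlation(left, right):
    """Zero-lag correlation of the two all-pass filters, from phases alone.

    Assumes DC and Nyquist bins have zero phase in both channels, so each
    contributes cos(0) = 1; every other bin appears twice (mirror image).
    """
    n = 2 * (len(left) + 1)
    total = 2.0 + 2.0 * sum(math.cos(l - r) for l, r in zip(left, right))
    return total / n


rng = random.Random(0)  # arbitrary seed, for repeatability only
a = [rng.uniform(-math.pi, math.pi) for _ in range(255)]
b = [rng.uniform(-math.pi, math.pi) for _ in range(255)]
for omega in (1.0, 0.5, 0.0, -0.5, -1.0):
    left, right = phase_pair(a, b, omega)
    print(omega, round(zero_lag_correlation(left, right), 3))
```

In this sketch the intermediate settings do not land exactly on k: for phases uniformly distributed over ±π the zero-lag value tends toward sin((1-k)π)/((1-k)π), so k is probably best treated as a parameter to be tuned against the measured correlation. The value for Ω = -1 also falls slightly short of -1 here because the DC and Nyquist bins are left at zero phase in both channels.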
An illustration of this technique for use with a monophonic input and multiple output channels is shown in Figure 2. The output of the FIR filters, y1(nT) through yN(nT), is the convolution of the input signal with each coefficient sequence: y1(nT) = x(nT) * h1(nT) through yN(nT) = x(nT) * hN(nT). The correlation measure of the output of any pair of filters will be very near to zero.

Limitations and perceptual concerns. This particular method for constructing the libraries of coefficients attempts to avoid any alterations in the timbre of the input sound by maintaining constant magnitude across frequency. This is not as easy as it first appears. The points specified in the frequency domain for magnitude and phase are
linearly spaced in frequency, and the magnitude spectrum that results from using the inverse Fast Fourier Transform to produce the FIR coefficients will not be constant in between the specified frequency points. Therefore, one expects that timbral neutrality is improved by specifying a higher number of points and producing a higher number of coefficients. However, the number of coefficients is approximately twice the number of points specified in the frequency domain, and the temporal duration of the filter's impulse response must be shorter than around 20 msec. in order to avoid diffusion in the time domain, which smears the transient properties of the input signal. Timbre obviously depends on the temporal aspects of the source signal as well as the spectral. Consider too that the magnitude of the potential phase shift on low-frequency components of the input signal is diminished by decreasing the number of coefficients. Consequently, for any given sampling rate, there is a tradeoff between timbral neutrality and the saliency of the effect at low frequencies. Practical experience has shown that sound sources that contain significant transient information (like speech) must be processed with fewer coefficients. Experience has also shown that timbral coloration is less noticeable when the processing is applied to individual tracks rather than to an entire mix. (An alternative method for constructing the frequency domain specification in terms of auditory critical bands was found to improve timbral neutrality, but a complete description is beyond the scope of this paper.)

Another limitation on the filter design is that the finite length of the random number sequences makes the match between the prescribed correlation measure and the one measured on the output signals imprecise.
A practical solution was found by generating several candidate filter pairs with different root random number sequences and selecting the pairs that produced the best match to the prescribed correlation measures. Then too, when the input is processed so as to create a correlation measure near zero, or within the range between +0.4 and -0.4, the actual cross-correlation function may exhibit positive and negative peaks with similar absolute magnitude. The auditory system does not discriminate very well among correlation measures near zero, and so the variance between prescribed and measured correlation is of little consequence. The auditory system easily discriminates among correlation measures near positive and negative one, and here the match between prescribed and measured correlation is quite good.

IV. The Effects of Multi-Channel Signal Decorrelation in Audio Reproduction

A. Effect #1: Elimination of the perception of constructive and destructive interference

Constructive and destructive interference may affect listening in a variety of audio circumstances. In room acoustics, strong reflection paths often lead to interference patterns that are perceived as part of the acoustic character of a room. In sound reinforcement, multiple loudspeakers and loudspeaker stacks create interference patterns that can be heard especially clearly when the listener is moving in relation to the loudspeakers. In both these cases, acoustic waves of a single sound source (whether acoustic or recorded) arrive at different times and with different intensities. The composite magnitude spectrum will exhibit spectral peaks and notches that result from the constructive and destructive interference of the acoustic waves. The frequency of these peaks and notches is dependent on the difference in arrival times of the acoustic signals at the measurement position.
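The dependence of the peak and notch frequencies on the arrival-time difference can be illustrated with the simplest case, a signal summed with a single delayed copy of itself. The sketch below is illustrative only: the function names and the 1 msec. example delay are assumptions, and a real room or loudspeaker array involves many arrivals at different levels. For one replica delayed by τ, peaks fall at multiples of 1/τ and notches halfway between them.

```python
import math


def comb_gain(freq_hz, delay_s, relative_level=1.0):
    """Magnitude of a signal summed with one delayed copy of itself.

    Assumes a single delayed replica with linear gain `relative_level`.
    """
    phase = 2.0 * math.pi * freq_hz * delay_s
    delayed = relative_level * complex(math.cos(phase), -math.sin(phase))
    return abs(1.0 + delayed)


def comb_frequencies(delay_s, f_max_hz):
    """Peak and notch frequencies (Hz) up to f_max_hz for a given delay."""
    peaks, notches = [], []
    m = 0
    while m / delay_s <= f_max_hz:
        peaks.append(m / delay_s)          # in phase: f = m / delay
        notch = (m + 0.5) / delay_s        # antiphase: f = (m + 1/2) / delay
        if notch <= f_max_hz:
            notches.append(notch)
        m += 1
    return peaks, notches


# Example: a 1 msec. arrival-time difference, examined up to 2.9 kHz.
peaks, notches = comb_frequencies(delay_s=0.001, f_max_hz=2900.0)
print([round(p) for p in peaks])
print([round(n) for n in notches])
print(round(comb_gain(1000.0, 0.001), 3), round(comb_gain(500.0, 0.001), 3))
```

Halving the delay doubles the spacing of the peaks and notches, which is why the pattern changes audibly as a listener moves relative to the loudspeakers.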
When the arrival of these multiple acoustic waves is integrated into a single perceptual event by the listener, the acoustic constructive and destructive interference gives rise to two interrelated perceptual qualities, "coloration" and "combing". Although these terms are often used by professionals with a variety of meanings, "coloration" will be used here to mean changes in the overall perceived spectral shape or equalization of a sound, and "combing" will be used to mean the induction of a pitch percept by the delay of a replicant signal. An important constraint on the detection of changes in coloration is that spectral alterations will only be detected when the average across critical bands exceeds a differential threshold. That is to say, there have to be significant changes in the smoothed spectral envelope. On the other hand, the perception of combing seems to result from the auditory system's particular proficiency at picking up the temporal periodicity between the original and delayed signals, from which it creates a pitch percept.

Both coloration and combing can be eliminated when the delayed signals are decorrelated from the leading signal. When the decorrelated signal has random phase changes spaced more closely than critical bands, the resulting composite magnitude spectrum will exhibit spectral peaks and notches which are narrower than a critical band, and the smoothed spectral envelope is much more likely to retain its original shape. Combing itself is impossible with decorrelated signals because the decorrelated signal is smeared in time and the temporal periodicity between the original and delayed signal varies with frequency. It is interesting to note that acoustic constructive and destructive interference is still present, but the perceptual effects are eliminated. Implementation of the decorrelation can follow Figure 1 for stereo loudspeaker reproduction and Figure 2 for multiple loudspeaker reproduction.

B. Effect #2: Creation of diffuse sound fields without reverberation

Diffuse reverberant sound fields are one of the most important features of concert hall acoustics. The perceived quality of spatial diffuseness is strongly correlated to the "interaural cross correlation" (or IACC), a statistical measure of the similarity of the acoustic signals arriving at the left and right ears of a listener in the concert hall. A low IACC is strongly correlated to the desired sound quality of "spaciousness" ([Schroeder et al., 1974] and [Ando, 1977]). For the sound reaching the listener directly from the stage, the IACC will be close to +1, meaning that the signals are highly similar (though not identical due to the asymmetry of head acoustics). For the sound reaching the concert hall listener during reverberation, the IACC will approach zero, meaning that the sound reaching the left and right ears with a separation of just nine inches is uncorrelated! In fact, almost any point-to-point measurement inside the hall would yield similar results. And although the reverberant sound is uncorrelated, it is still clearly from the same source! The impact of the decorrelation is that the sound image does not appear to emanate from any one direction.

The most commonly used signal-processing technique for creating a diffuse sound field is multi-channel reverberation, which mimics the acoustics of a concert hall. Although a spatially diffuse soundfield occurs naturally only in the context of reverberation, decorrelation makes it possible through electronic means to create a spatially diffuse soundfield without reverberation. For reproduction over loudspeakers, the diffuse soundfield is perceived as emanating broadly from around the listener. (A complete surrounding image including the rear will only occur when the listener is very close to the loudspeakers.)
An important insight into the impact of IACC on stereo loudspeaker reproduction was supplied by Kurozumi and Ohgushi [1983], who demonstrated that the cross-correlation coefficient of two noise signals presented to listeners over stereo loudspeakers was strongly correlated with two perceptual dimensions: image width and image distance. Image distance is correlated to the value of the cross-correlation coefficient; image width is inversely correlated to the absolute value of the cross-correlation coefficient. For example, the widest image occurs when the cross-correlation coefficient is close to zero; this image is also at a medium distance. The closest sound image occurs when the cross-correlation coefficient is -1, but this also creates a narrow image. In addition, Kurozumi and Ohgushi found that the absolute effect of the cross-correlation coefficient is greater for low frequencies (below 1 kHz) than for high frequencies (above 3 kHz).

This was the starting point for a study by Wilde [1989] in which he demonstrated that the image width and image distance of decorrelated sound sources were essentially the same as Kurozumi and Ohgushi had found for noise sources. Fig. 3 represents a multi-dimensional scaling solution from pooled subject data for a string quartet chord with correlation levels ranging from +1 to -1 in steps of 0.2. Dimension 1 captures image distance and dimension 2 captures image width. Thus, following the implementation of Figure 1 and selecting the correlation measure associated with each pair of filter coefficients in the library will determine the width and distance of a diffuse sound field.

Figure 3. Two-dimensional MDS solution of all pooled subject data for a string quartet chord at correlation levels from +1 to -1 in 0.2 increments. Dimension 1 captures image distance and dimension 2 captures image width [Wilde, 1989].

C. Effect #3: Externalization of sound in headphone reproduction

In typical headphone reproduction, sound sources are perceived as originating inside the listener's head. Even very low levels of decorrelation cause these sound images to become externalized over headphones and to have a much more natural image quality. Thus, following the implementation of Figure 1 and selecting a correlation measure in the range between +1 and -1 will cause the headphone images to become externalized.

D. Effect #4: Reduction of image shift of diffuse sound fields

The effect of the surrounding diffuse soundfield is not limited to listening positions on the center line equidistant from the pair of loudspeakers. The effect is salient for listening positions at the extremes of the loudspeaker coverage. Kendall and Wilde [1989] report an experiment that used a combination of time delays and level differences typical of stereo reproduction in a small room to capture the threshold for the collapse of the sound image into one loudspeaker for both correlated and decorrelated sound sources. The threshold is a function of both time delay and level difference. In order to relate these along a single continuum, they chose the sequence of simulated listening locations in a small room illustrated in Figure 4. The resulting time delay and level difference pairings are a subset of all the pairings in the room and a subset of those in any larger room.

Figure 4. Simulated listener locations with associated time delays and intensity differences [Kendall and Wilde, 1989].

Figure 5. Thresholds for collapse of sound into one loudspeaker. Squares: correlated; circles: uncorrelated [Kendall and Wilde, 1989].

The stimuli were constructed from four monophonic sound sources which varied considerably in their transient and sustained qualities: snare (single snare drum hit without reverberation), piano (single chord in the middle register), speech (recording of the sentence "I'm Batman"), and quartet (single sustained chord extracted from a CD recording of a Beethoven String Quartet).
The stimuli were the original monophonic and decorrelated stereo versions of these sources to which a time delay and level difference had been added to one channel. Stimuli were equalized beforehand for overall level differences. Subjects were seated in a small listening environment which had sound absorption that removed early reflections from the walls near the loudspeakers [Kendall et al., 1990]. Subjects were asked whether the sound image was primarily located in one loudspeaker or not. The experiment was run as an adaptive two-alternative forced choice method. Two randomly interleaved staircases tracked independent estimates of the point at which the subject gave a 50% "one" response. The results are shown in Fig. 5. Subjects obviously used different criteria to judge this threshold. Regardless of the criterion used, subjects judged the decorrelated stimuli as collapsing into a single loudspeaker at positions much farther off-center than the original stimuli.

E. Effect #5: Elimination of the precedence effect

Also called "the law of the first wavefront" and the "Haas effect," the "precedence effect" is the phenomenon in which a sound source in a natural environment is localized at the original source location while its reflected sound is ignored. The effect is particularly relevant for transient sounds. The precedence effect has typically been studied by delaying one sound source relative to another when reproduced with two loudspeakers. The effect became most familiar through the papers of Haas [1951] and Wallach, Newman and Rosenzweig [1949]. A description of recent models of the precedence effect given by Rakerd and Hartmann [1985] states that it is a result of "A neural inhibition process which prevents the processing of binaural difference following an onset. There are indications that this inhibition is quite general,... there is some release from this binaural inhibition after approximately 10 ms and almost complete release within 50 ms [Zurek, 1980]."
While most discussions of the precedence effect relate it to the perception of reflected sound in natural environments, it is also a key factor in the perception of sound imagery over loudspeakers. In fact, the auditory system "interprets" loudspeaker reproduction in exactly the same way that it does environmental sound. This is most clearly illustrated
when the auditory system inhibits the perception of sound arriving from a second, more distant loudspeaker. Our understanding of the conditions under which precedence operates is complicated by two factors. The first is that the precedence effect is more pronounced for transient sound sources such as struck or plucked musical instruments than it is for continuous sound sources such as blown or bowed musical instruments. The second complicating factor is that differences in arrival time are accompanied by differences in intensity, and that in any reproduction environment the ratio between time delay and intensity difference varies tremendously across the range of potential listening positions.

Kendall and Wilde [1989] report a study that compares listeners' judgments of correlated and decorrelated sound sources under conditions that would generally invoke the precedence effect. Subjects were asked to rate on a 10-point scale the degree to which sound images were collapsed into one loudspeaker. The stimuli were based on the same four monophonic sound sources used in the experiment reported above under "Effect #4". In this experiment, the simulated listener locations were distributed widely across a 50 x 100 foot area as shown in Figure 6. Each location represents a unique combination of delay time and level difference that could be anticipated to occur in a practical reproduction setting. Time delays range from 2 to 23 msec. Seating locations are identified with letters moving alphabetically from the center of the room toward the outside wall. Figures 7a and 7b show averaged ratings from all subjects for the piano and the quartet stimuli respectively. The lower-case and upper-case letters are associated with each seating location shown in Figure 6 and represent responses for correlated and uncorrelated stimuli respectively. These responses are somewhat scattered due to the variations in intensity difference with each seating location.
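To see why each seat corresponds to its own pairing of time delay and level difference, the following sketch computes both quantities from seat and loudspeaker positions. It is not the procedure used to generate the experimental stimuli: the loudspeaker coordinates, the speed of sound, and the free-field inverse-distance law (which ignores loudspeaker directivity and room reflections) are all assumptions chosen for illustration.

```python
import math

# Assumed value: roughly 343 m/s expressed in feet per second.
SPEED_OF_SOUND_FT_PER_S = 1125.0


def delay_and_level(listener, near_speaker, far_speaker):
    """Arrival-time difference (msec.) and level difference (dB) at a seat.

    Positions are (x, y) pairs in feet. The level difference follows the
    free-field inverse-distance law; directivity and reflections are ignored.
    """
    d_near = math.dist(listener, near_speaker)
    d_far = math.dist(listener, far_speaker)
    delay_ms = 1000.0 * (d_far - d_near) / SPEED_OF_SOUND_FT_PER_S
    level_db = 20.0 * math.log10(d_far / d_near)
    return delay_ms, level_db


# Hypothetical layout: loudspeakers 20 feet apart, seats 20 feet back.
left_speaker = (-10.0, 0.0)
right_speaker = (10.0, 0.0)

center = delay_and_level((0.0, 20.0), left_speaker, right_speaker)
off_center = delay_and_level((-10.0, 20.0), left_speaker, right_speaker)
print(center)
print(tuple(round(v, 2) for v in off_center))
```

On the center line both values are zero; moving the seat sideways increases the delay and the level difference together, but not in a fixed ratio, which is the second complicating factor noted above.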
Trends clearly emerge in the averaged rating for each delay. The broken and solid lines represent averaged ratings for correlated and decorrelated stimuli respectively. In the range of short delays (from 2 up to 6 or 7 msec.), precedence clearly dominates the correlated, but not the decorrelated, stimuli. Correlated stimuli are heard mostly in one loudspeaker, while the decorrelated stimuli are heard in two. Ratings vary with each sound source. Precedence affects the piano stimuli somewhat more than the quartet, most likely because the piano is more transient. Above 10 msec., precedence is released and both the correlated and decorrelated stimuli begin to collapse toward one loudspeaker as intensity differences approaching 15 dB begin to dominate. In large-space reproduction, the use of decorrelation implemented following Figure 1 or Figure 2 will aid in defeating precedence and enabling listeners to perceive sound from all loudspeakers.

Figure 6. Simulated seating positions [Kendall and Wilde, 1989].

Figure 7. Averaged ratings from all listeners for (a) piano and (b) quartet [Kendall and Wilde, 1989].

V. Conclusion

Figure 8 provides a summary of listeners' subjective impressions of both correlated and decorrelated sound sources. The vertical axis represents the listener's subjective judgment of whether the sound image is located primarily in one loudspeaker or not. The horizontal axis represents the difference in arrival time between the nearer and farther loudspeaker. The spatial imagery of correlated sound sources varies tremendously with time delay. When the time delay is less than approximately 1.0 msec., listeners describe hearing a single sound image that is located between the loudspeakers. This represents the region of "image shift" [Barron, 1971].
When the time delay is greater than approximately 1.0 msec., listeners describe hearing a single sound image that is located at the closer loudspeaker. At
some higher time delay, the precedence effect is released and the sound will be heard in both loudspeakers. (The exact delay at which the precedence effect is released depends upon the transient qualities of the sound source.) When the loudspeakers are separated by a sufficiently great distance, listeners report that the delayed sound is like an echo. As the time delay further increases, the intensity difference increases until, at approximately 15 dB, the more distant loudspeaker becomes difficult to hear and listeners report that the sound image is located in one loudspeaker.

These sorts of radical changes in sound imagery show up quite vividly when audio material is moved from one reproduction setting to another, for example, from the studio to the concert hall. As shown in Figure 8, decorrelation minimizes these radical changes and promotes spatial imagery that will remain invariant in divergent reproduction settings. It even provides for externalization in headphone reproduction. While decorrelation may not be compatible with each artist's aesthetic goals for sound imagery, it does improve the consistency of sound imagery in the wide variety of reproduction settings encountered every day, so that the artist's intentions are more likely to be communicated to the audience.

Figure 8. A summary of listeners' subjective impressions of both correlated and decorrelated sound sources [Kendall and Wilde, 1989]. The vertical axis (listener report) runs from "one" to "not one"; the time-delay axis is marked with the regions of image shift (below 1 msec.), precedence effect, precedence release, possibly echoic, and 15 dB difference; separate curves are shown for correlated and decorrelated sources.

VI. References

[Ando, 1977] Y. Ando. Subjective preference in relation to objective parameters of music sound fields with a single echo. Journal of the Acoustical Society of America, 62: pp. 1436-1441, 1977.
[Barron, 1971] M. Barron. The Subjective Effects of First Reflections in Concert Halls: The Need for Lateral Reflections. Journal of Sound and Vibration, 15: pp. 475-494, 1971.
[Haas, 1951] H. Haas. Über den Einfluss eines Einfachechos auf die Hörsamkeit von Sprache. Acustica, 1: pp. 49-58, 1951.
[Kendall et al., 1990] Gary S. Kendall, Martin D. Wilde, and William L. Martens. A Spatial Sound Processor for Loudspeaker and Headphone Reproduction. Presented to the Audio Engineering Society Eighth International Conference, Washington, D.C. Also appearing in The Proceedings of the AES 8th International Conference: The Sound of Audio, Audio Engineering Society, New York, 1990.
[Kendall and Wilde, 1989] Gary S. Kendall and Martin D. Wilde. Production and Reproduction of Three-Dimensional Sound. Presented to the Audio Engineering Society 87th Convention, New York, 1989.
[Kurozumi and Ohgushi, 1983] K. Kurozumi and K. Ohgushi. The relationship between the cross-correlation coefficient of two-channel acoustic signals and sound image quality. Journal of the Acoustical Society of America, 74: pp. 1728-1733, 1983.
[Rakerd and Hartmann, 1985] Brad Rakerd and W. M. Hartmann. Localization of sound in rooms, II: The effects of a single reflecting surface. Journal of the Acoustical Society of America, 78 (2): pp. 524-533, 1985.
[Schroeder et al., 1974] M. R. Schroeder, D. Gottlob, and K. F. Siebrasse. Comparative study of European concert halls: correlation of subjective preference with geometric and acoustic parameters. Journal of the Acoustical Society of America, 56: pp. 1195-1201, 1974.
[Wallach et al., 1949] H. Wallach, E. B. Newman, and M. R. Rosenzweig. The Precedence Effect in Sound Localization. The American Journal of Psychology, 62: pp. 315-336, 1949.
[Wilde, 1989] Martin D. Wilde. The Psychoacoustical Effects of Interaural Crosscorrelation. Unpublished Masters thesis, Northwestern University, 1989.
[Wilde, et al. 1989] Martin Wilde, William Martens and Gary Kendall. Apparatus for Creating Decorrelated Audio Output Signals With a Specified Cross Correlation Coefficient. Unpublished patent disclosure to Northwestern University, 1989.
[Zurek, 1980] P. M. Zurek. The precedence effect and its possible role in the avoidance of interaural ambiguities. Journal of the Acoustical Society of America, 67 (3): pp. 952-964, 1980.
