DSP Approach to Multichannel Audio Mixing

Ville Pulkki¹, Jyri Huopaniemi¹, Tommi Huotilainen¹˒², Matti Karjalainen¹
Matti.Karjalainen@hut.fi

¹ Helsinki Univ. of Technology, Acoustics Laboratory, FIN-02150 Espoo, Finland
² ABB Industry, P.O. Box 94, FIN-00381 Helsinki, Finland

Abstract

A new method for two- and three-channel amplitude mixing is presented. The method is equivalent to the tangent law in the two-channel case. In the three-channel case the three loudspeakers are treated as a triangle, and the phantom source can be placed inside the triangle using the method presented in this paper. A DSP tool capable of mixing eight input channels to eight output channels simultaneously is presented. The tool is easily configured for different loudspeaker placements and for different numbers of loudspeakers.

1 Introduction

The problem of multichannel audio mixing arises when the number of output channels exceeds two. Conventional panpots used in mixing consoles are traditionally designed for stereophonic audio processing, and mixing to multiple output channels with them is very time-consuming and quite difficult. This paper presents an approach that addresses computer music composers' lack of a tool for mixing multiple input channels to a sound field formed by multiple loudspeakers placed around and above the listener. It should be possible to localize the produced phantom sound sources at any place between the loudspeakers, and to update the positions of the phantom sources independently.

A new method for two-channel amplitude panning is presented in section 2. The method is generalized to the case of three loudspeakers in section 3, which makes it possible to create three-dimensional sound fields using amplitude panning. The hardware used in this project is presented in section 4, and the constructed tool is described in section 5.
2 Two-dimensional Mixing

There are several methods for mixing a monophonic signal to two loudspeakers in order to produce a single auditory event in the desired direction in the virtual acoustic space. Sound source localization in two-speaker reproduction depends on two major cues, the interaural time difference (ITD) and the interaural amplitude difference (IAD) [Blauert, 1983]. Summing localization methods can be divided into categories that use the ITD, the IAD, or a combination of both cues [Blauert, 1983]. The most frequently used method is amplitude panning (also: intensity panning), in which two loudspeakers are driven with the same signal with different gain factors [Blauert, 1983]. It is, however, widely understood that amplitude stereo is only a crude approximation to the actual source localization problem [Cooper, 1987].

Figure 1: A typical stereophonic configuration.

A typical stereophonic loudspeaker configuration is illustrated in Fig. 1. Typically the angle between the vectors $\mathbf{b}_1$ and $\mathbf{b}_2$, which point to the loudspeakers, is about 60°. The monophonic signal can be placed in directions between these two loudspeakers with amplitude panning [Blauert, 1983], [Theile et al., 1977].

ICMC Proceedings 1996 93 Pulkki et al.

The direction from which the sound is heard follows the sine law originally proposed by Blumlein [1931] and reformulated in phasor form by Bauer [1961]:

$$\frac{\sin\varphi}{\sin\varphi_0} = \frac{x_1 - x_2}{x_1 + x_2}, \tag{1}$$

where $\varphi$ represents the angle between the x-axis and the direction of the phantom source, $\pm\varphi_0$ is the angle between the x-axis and the loudspeakers, and $x_1$ and $x_2$ are the gain factors of channels 1 and 2, respectively. This equation is valid for lower frequencies if the listener's head is pointing directly forward. If the listener turns his head to follow the phantom source, a tangent law may be more correct:

$$\frac{\tan\varphi}{\tan\varphi_0} = \frac{x_1 - x_2}{x_1 + x_2}. \tag{2}$$

In practice the two equations differ very little from each other.

When the phantom source moves between the loudspeakers, the sound level must be kept constant. In a typical semi-reverberant listening situation, a diffuse-field approximation is sufficient. Thus we keep the sound power constant,

$$x_1^2 + x_2^2 = \text{constant}. \tag{3}$$

In the mixing method proposed in this section, we consider the stereophonic loudspeaker configuration as a two-dimensional vector base defined by the unit-length vectors $\mathbf{b}_1 = [b_{11}, b_{12}]^T$ and $\mathbf{b}_2 = [b_{21}, b_{22}]^T$. The vector $\mathbf{a} = [a_1, a_2]^T$, which points towards the phantom source, can be treated as a weighted linear sum of the loudspeaker vectors

$$\mathbf{a} = x_1 \mathbf{b}_1 + x_2 \mathbf{b}_2, \tag{4}$$

in which $x_1$ and $x_2$ are gain factors that can be treated as non-negative scalar variables. We may write the equation in matrix form

$$\mathbf{a}^T = \mathbf{x}\mathbf{B}, \tag{5}$$

where $\mathbf{x} = [x_1, x_2]$ and $\mathbf{B} = [\mathbf{b}_1, \mathbf{b}_2]^T$. This equation can be solved

$$\mathbf{x} = \mathbf{a}^T \mathbf{B}^{-1}, \tag{6}$$

if $\mathbf{B}^{-1}$ exists. In this case $\mathbf{B}^{-1}$ exists when $\varphi_0 \neq 0°$ and $\varphi_0 \neq 90°$, both of which correspond to quite uninteresting stereophonic loudspeaker placements. The equation gives us a computationally efficient way to calculate the gain factors when the vector $\mathbf{a}$ and the matrix $\mathbf{B}^{-1}$ are known. The method projects the vector $\mathbf{a}$ onto the vector base $\mathbf{B}$. The calculated gain factors $x_{1,2}$ satisfy the tangent law (Eq. 2), which is proved in [Pulkki et al., 1996]. The gain factors must be scaled after calculation using e.g. Eq. 3.
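As an illustration only, the two-dimensional gain calculation of Eqs. 3 to 6 can be sketched in Python. The function name `pan_2d`, the default aperture of ±30°, and the choice of 1 for the constant in Eq. 3 are assumptions made for this sketch; they do not come from the DSP implementation described in this paper.

```python
import math

def pan_2d(phi_deg, phi0_deg=30.0):
    """Gain factors (x1, x2) for a phantom source at angle phi_deg.

    Assumption: loudspeakers 1 and 2 sit at +phi0_deg and -phi0_deg
    from the x-axis, which points straight ahead of the listener.
    """
    phi = math.radians(phi_deg)
    phi0 = math.radians(phi0_deg)
    # Rows of B are the unit vectors pointing to the loudspeakers.
    b1 = (math.cos(phi0), math.sin(phi0))
    b2 = (math.cos(phi0), -math.sin(phi0))
    a = (math.cos(phi), math.sin(phi))
    det = b1[0] * b2[1] - b1[1] * b2[0]
    if abs(det) < 1e-12:
        raise ValueError("B is singular (phi0 is 0 or 90 degrees)")
    # x = a^T B^-1 for the 2x2 case (Eq. 6), written out explicitly.
    x1 = (a[0] * b2[1] - a[1] * b2[0]) / det
    x2 = (a[1] * b1[0] - a[0] * b1[1]) / det
    # Scale so that x1^2 + x2^2 = 1 (Eq. 3 with the constant set to 1).
    norm = math.hypot(x1, x2)
    return x1 / norm, x2 / norm
```

With these assumptions, `pan_2d(0.0)` gives equal gains of about 0.707 for both channels, and `pan_2d(30.0)` drives channel 1 alone. The ratio (x1 - x2)/(x1 + x2) of the returned gains matches tan φ / tan φ0, as Eq. 2 requires, because the common scale factor cancels.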
Figure 2: An example of loudspeaker placement around the listener. Each triangle on the surface of the unit sphere defines a loudspeaker triplet which can be used in mixing. The number of loudspeakers is not limited by the method.

3 Three-dimensional Mixing

There are several known approaches to multichannel mixing. In most approaches the loudspeakers lie in a horizontal plane, but there are some three-dimensional exceptions such as the periphonic Ambisonics system (for a good review of different techniques, see, e.g., [Malham et al., 1995]). In certain situations, such as Dolby Surround large-screen film or home theater applications, the speakers are placed in a two-dimensional plane, but not evenly distributed around the listener [Gerzon, 1992]. By taking into account frequency-dependent IAD and ITD cues (also referred to as the head-related transfer function, HRTF) and proper crosstalk cancellation, it would also be possible to place sounds outside the plane of the two loudspeakers [Cooper et al., 1989]. These methods are, however, readily applicable only to two-speaker configurations, and their validity for sound reproduction with a three-dimensional multi-speaker layout has not been thoroughly verified.

The two-dimensional panning law presented in the previous section can be generalized to a three-dimensional amplitude mixing law. The two-dimensional law was able to place the phantom source on a line between the loudspeakers; with the method described here the phantom source can be placed inside a triangle defined by three loudspeakers (later: loudspeaker triangle).

Let the loudspeakers be situated on the surface of a three-dimensional unit sphere, equidistant from the listener, who sits in the center of the sphere. The three-dimensional unit vector $\mathbf{b}_1 = [b_{11}, b_{12}, b_{13}]^T$, the starting point of which is the center of the sphere, defines the direction of loudspeaker 1. The unit vectors $\mathbf{b}_{1,\ldots,n}$ then define the directions of loudspeakers $1, \ldots, n$, respectively. The direction of the phantom sound source is defined as a three-dimensional unit vector $\mathbf{a}$. A simple configuration is presented in Fig. 2.

We express the phantom source vector $\mathbf{a}$ as a linear sum of three chosen loudspeaker vectors $\mathbf{b}_{k,l,m}$, analogously to the two-dimensional case, and express it in matrix form:

$$\mathbf{a} = x_k \mathbf{b}_k + x_l \mathbf{b}_l + x_m \mathbf{b}_m, \tag{7}$$

$$\mathbf{a}^T = \mathbf{x}\mathbf{B}_{klm}. \tag{8}$$

In these equations $x_k$, $x_l$ and $x_m$ are gain factors, $\mathbf{x} = [x_k, x_l, x_m]$ and $\mathbf{B}_{klm} = [\mathbf{b}_k, \mathbf{b}_l, \mathbf{b}_m]^T$. The vector $\mathbf{x}$ can be solved

$$\mathbf{x} = \mathbf{a}^T \mathbf{B}_{klm}^{-1}, \tag{9}$$

if $\mathbf{B}_{klm}^{-1}$ exists ($\det \mathbf{B}_{klm} \neq 0$). The equation projects the vector $\mathbf{a}$ onto the vector base defined by $\mathbf{B}_{klm}$ in a similar way as in the two-dimensional case. The components of the vector $\mathbf{x}$ can be used as gain factors after scaling with the desired law. Thus Eq. 9 gives us an efficient DSP tool for multichannel amplitude panning. The gain factors can be calculated in 10-20 instruction cycles using a single signal processor when $\mathbf{a}$ and $\mathbf{B}_{klm}^{-1}$ are known. The calculated gain factors must be scaled to satisfy the equation

$$x_1^2 + x_2^2 + x_3^2 = \text{constant}, \tag{10}$$

as in the two-dimensional case. Here $x_1$, $x_2$ and $x_3$ denote the three gain factors $x_k$, $x_l$ and $x_m$ of the active triplet.

When the number of loudspeakers is greater than the number of dimensions, the active loudspeakers have to be selected after the desired phantom source direction vector $\mathbf{a}$ has been defined. Before run time, all loudspeaker triplets that are to be used in the mixing are selected. A part of the neighborhood of each loudspeaker is covered with non-intersecting loudspeaker triangles. An example selection for loudspeaker 5 is illustrated in Fig. 2. The bases that correspond to the selected triangles are thus $\mathbf{B}_{514}$, $\mathbf{B}_{543}$, and $\mathbf{B}_{532}$. A matrix $\mathbf{B}_{klm}$ is calculated for each selected base and stored in the signal processor's memory before running the system.

During run time the loudspeaker triplet selection starts by finding the loudspeaker which is nearest to the phantom source in the sound field.
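The projection of Eq. 9 and the scaling of Eq. 10 can be sketched as follows. This is a minimal Python illustration, not the signal processor code: the function names `invert_3x3` and `gains_3d` are hypothetical, the inverse is assumed to be computed once before run time, and the constant in Eq. 10 is assumed to be 1.

```python
import math

def invert_3x3(m):
    """Inverse of a 3x3 matrix given as three rows (adjugate method).

    For panning, the rows are the unit vectors b_k, b_l, b_m.
    """
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker vectors are linearly dependent")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

def gains_3d(a, inv_b):
    """Gain factors x = a^T B^-1 (Eq. 9), scaled to unit power (Eq. 10)."""
    # Row vector times matrix: x_j = sum_i a_i * inv_b[i][j].
    x = [sum(a[i] * inv_b[i][j] for i in range(3)) for j in range(3)]
    norm = math.sqrt(sum(g * g for g in x))
    return [g / norm for g in x]
```

Once the inverse is stored, the run-time work in `gains_3d` is nine multiply-accumulate operations plus the scaling.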
Dot products are calculated between the phantom source vector $\mathbf{a}$ and the loudspeaker direction vectors $\mathbf{b}_{1,\ldots,n}$. The maximum is selected as the best match, and the corresponding loudspeaker as the closest one. This is equivalent to finding the smallest angle between the phantom source vector and the loudspeaker vectors. The phantom source vector $\mathbf{a}$ is projected onto all preselected vector bases around the best-matching loudspeaker using Eq. 9. The base which produces no negative gain factors ($x_{1,2,3} \geq 0$) is chosen as the base into which the incoming sound is mixed. The method thus selects the base which corresponds to the triangle on which the phantom source lies.

4 Hardware

The system hardware is based on Loughborough Sound Images MDC40S modules, which adhere to the Texas Instruments TIM-40 specification [Weir, 1992]. The host computer (an Apple Macintosh with a NuBus interface board) communicates with the first TMS320C40 processor in the DSP processor array. All DSP processors use their communication ports to communicate with their neighbors. Two hardware channels for messaging are provided: 1) a channel for fast messages and 2) a channel for arbitrary-size block transfers. Some of the processors may support special tasks, such as the A/D and D/A converters at the end of the processor array. In the present system we have 10-channel high-quality stereo A/D and D/A converters available to support multichannel audio and measurement experiments. The multi-DSP operating system kernel includes support for message passing between processors and processes. In the current configuration we are using two TMS320C40 signal processors running at 40 MHz.

5 DSP Tool for Eight-channel Audio Mixing

The sample rate of the DSP tool is currently 32 kHz, but higher rates (44.1 kHz and 48 kHz) are also supported. A single signal processor was used for data processing; the second processor in our system is reserved for future extensions such as a reverberator.
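The run-time triplet selection of section 3, which the tool performs after each direction update, can be sketched in Python as below. All names (`precompute`, `select_triplet`, the `table` dictionary) are hypothetical. Two choices are assumptions of this sketch rather than details of the implementation: every triangle is stored under each of its three loudspeakers, and the inverse matrix is kept as three column vectors built from cross products. The returned gains are unscaled, so scaling according to Eq. 10 would follow as a separate step.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(p * q for p, q in zip(u, v))

def inverse_columns(bk, bl, bm):
    """Columns of B^-1, where the rows of B are bk, bl, bm."""
    det = dot(bk, cross(bl, bm))
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker vectors are linearly dependent")
    return [tuple(c / det for c in cross(bl, bm)),
            tuple(c / det for c in cross(bm, bk)),
            tuple(c / det for c in cross(bk, bl))]

def precompute(speakers, triangles):
    """Before run time: store each triangle's inverse under its speakers."""
    table = {}
    for tri in triangles:
        inv = inverse_columns(*(speakers[i] for i in tri))
        for i in tri:
            table.setdefault(i, []).append((tri, inv))
    return table

def select_triplet(a, speakers, table, eps=1e-9):
    """Run time: pick the base around the nearest speaker with no negative gain."""
    # The largest dot product corresponds to the smallest angle to a.
    nearest = max(range(len(speakers)), key=lambda i: dot(a, speakers[i]))
    for tri, inv in table.get(nearest, []):
        x = [dot(a, col) for col in inv]  # x = a^T B^-1 (Eq. 9)
        if all(g >= -eps for g in x):
            return tri, x
    return None  # a lies outside every triangle stored for this speaker
```

The small tolerance `eps` is there so that a source lying exactly on a triangle edge is not rejected because of rounding.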
The software implementation was carried out in an object-oriented manner using the QuickSig and QuickC30 DSP programming environments [Karjalainen, 1992]. QuickSig and QuickC30 are based on Common Lisp and its object-oriented extension CLOS, and they combine low-level TMS320C3x assembly language with high-level object-oriented programming.

Each input channel is mixed to three output channels. Mixing is performed additively: when multiple input channels share the same output channels, the signal values are added together.

During run time the phantom source direction vectors $\mathbf{a}_{1,\ldots,8}$ of input channels $1, \ldots, 8$, respectively, are updated at a rate of approximately 50 Hz. After the update, loudspeaker triplets are selected, and new gain factors are calculated and normalized using the method described in section 3. The gain factor calculation is carried out for all eight input channels within approximately 40 sample intervals. The previous gain factors are cross-faded linearly to the newly calculated factors. The interpolation is completed in equal steps during 100-200 sample intervals. All eight gain factor triplets are faded simultaneously. The gain factors do not exactly satisfy Eq. 10 during fading, but when the angle between the start-point and end-point direction vectors is small (< 10°), no disturbing effects can be heard. The phantom sound source can move as fast as 360°/s without disturbing effects.

The movements of virtual sound sources can be designed beforehand. The phantom source direction vectors $\mathbf{a}_{1,\ldots,8}$ are then calculated for each desired arrangement of phantom source directions and stored in the signal processor's memory. Movements can also be controlled in real time from the host computer: the desired direction vectors $\mathbf{a}_{1,\ldots,8}$ are written to the signal processor's memory during run time. This enables three-dimensional multichannel live mixing.

6 Conclusion

We have developed a real-time digital multichannel audio system implementation which is capable of producing sound fields with multiple virtual moving sound sources. As a result of this project, multichannel digital mixing of computer music has been performed in an existing concert hall, the Chamber Music Hall at the Sibelius Academy in Helsinki, Finland. This software and hardware tool has been designed mainly to overcome the problems in multichannel mixing of audio signals. It allows the user to place sounds in a defined three-dimensional acoustic space for multiple-loudspeaker output. The proposed method can easily be implemented in existing computer music software packages such as CSound.
A similar tool for two-dimensional mixing is under construction.

Using the method described in this paper, the number and the placement of loudspeakers can be different in mixing and listening situations. This opens a wide range of future applications. Using a multichannel sound and data storage system, three-dimensional sound fields that are independent of loudspeaker placement can be produced for domestic and larger-scale applications.

References

[Bauer, 1961] Benjamin B. Bauer. Phasor analysis of some stereophonic phenomena. Journal of the Acoustical Society of America, 33(11):1536-1539, November 1961.

[Blauert, 1983] Jens Blauert. Spatial Hearing. The MIT Press, Cambridge, Massachusetts, 1983.

[Blumlein, 1931] A. D. Blumlein. U.K. Patent 394,325, 1931. Reprinted in Stereophonic Techniques, Audio Eng. Soc., NY, 1986.

[Borenius, 1977] Juhani Borenius. Moving sound image in the theaters. Journal of the Audio Engineering Society, 25(4):200-203, 1977.

[Cooper, 1987] Duane H. Cooper. Problems with shadowless stereo theory: Asymptotic spectral status. Journal of the Audio Engineering Society, 35(9):629-642, September 1987.

[Cooper et al., 1989] Duane H. Cooper and Jerald L. Bauck. Prospects for transaural recording. Journal of the Audio Engineering Society, 37(1/2):3-39, January/February 1989.

[Gerzon, 1992] Michael A. Gerzon. Panpot laws for multispeaker stereo. In The 92nd Convention, March 24-27, 1992, Vienna. Audio Engineering Society, Preprint No. 3309, 1992.

[Karjalainen, 1992] Matti Karjalainen. Object-oriented programming of DSP processors: A case study of QuickC30. In International Conference on Acoustics, Speech, and Signal Processing, pages 601-604. IEEE Signal Processing Society, 1992.

[Malham et al., 1995] David G. Malham and Anthony Myatt. 3-D sound spatialization using ambisonic techniques. Computer Music Journal, 19(4):58-70, 1995.

[Pulkki et al., 1996] Ville Pulkki, Jyri Huopaniemi, and Tommi Huotilainen. DSP tool for 8-channel audio mixing. In Nordic Acoustical Meeting 96. The Acoustical Society of Finland, 1996.

[Theile et al., 1977] Gunther Theile and Georg Plenge. Localization of lateral phantom sources. Journal of the Audio Engineering Society, 25(4):196-200, 1977.

[Weir, 1992] Ralph Weir. TIM-40-TMS320C4X module specification, 1992. Texas Instruments Inc.