AMBISONICS EQUIVALENT PANNING

Martin Neukom, Jan C. Schacher
Zurich University of the Arts
Institute for Computer Music and Sound Technology
Baslerstrasse 30, 8048 Zürich, Switzerland
{martin.neukom, jan.schacher}

ABSTRACT

In Ambisonics, sound is encoded and stored in multichannel sound files and is decoded for playback. The en- and decoding process uses complicated functions. This paper presents panning functions that are equivalent to the result of en- and decoding and that can be used for real-time panning in an arbitrarily high ambisonic order. In the function equivalent to so-called in-phase decoding, the order of ambisonic resolution is just a variable that can be any positive number, not restricted to integers, and that can be changed during playback. The limitations and advantages of the technique are discussed and real-time applications are described.

1. INTRODUCTION

For a project where hundreds of virtual moving sound sources have to be rendered in real time, we have been looking for a simplification of the complicated and computationally costly calculations of Ambisonics. The theoretical results were published in an AES paper in October 2007 [1]. The main results of that paper are recapitulated in a less technical style in section 4. Sections 2 and 3 give a short introduction to the concepts of panning and Ambisonics. Section 5 addresses more practical issues such as implementations, limitations and advantages of the technique.

2. PANNING

Panning is the technique of positioning a single (monophonic) source within a stereophonic image. Vector Base Amplitude Panning (VBAP) was introduced by Ville Pulkki [2] for two dimensions. In VBAP, loudspeaker arrays are treated as arrangements of subsequent stereo pairs or, when extended to three dimensions, as triples of loudspeakers. Panning normally uses only level differences and feeds only the loudspeakers nearest to the virtual sound source.
In contrast to other panning techniques, ambisonic panning functions normally produce signals for all speakers at the same time. The functions are defined on the whole horizontal circle or the whole sphere. The sum of all speaker gains equals 1.

3. AMBISONICS

Ambisonics is a surround system for encoding and rendering a 3D sound field. In Ambisonics the room information of the recorded or synthesized sound is encoded together with the sound itself in a specific number of channels, independent of the final speaker setup. The encoding can be carried out to an arbitrary degree of accuracy. The accuracy is given by the so-called order of Ambisonics.

3.1. Encoding

The formulas for ambisonic encoding are derived from the solution of the three-dimensional wave equation in the spherical coordinate system, where a point P is described by radius $r$, azimuth $\theta$ and elevation $\delta$. Assuming that the sound waves are plane and that the listener is located at the origin of the coordinate system, the formulas can be simplified dramatically. In practice the resulting infinite series is truncated and only a finite number of components are calculated and saved in the so-called ambisonic B-format. After all these simplifications a signal S is encoded by multiplying the signal with the first spherical harmonics in 3D and with the first harmonics in 2D [3][4]. The order of resolution $m$ defines the accuracy of the encoding and the number of channels in the B-format, namely $2m+1$ in 2D and $(m+1)^2$ in 3D.

3.2. Decoding

From a B-format file with n channels and a given setup of at least n speakers, the signals for the speakers can be calculated. They are a weighted sum of the B-format channels.
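The channel counts and the "weighted sum" view of decoding can be illustrated with a minimal 2D sketch. The channel ordering and the factor sqrt(2) on the higher harmonics are assumptions made for this example (the text does not fix a normalization), and the function names `channel_count`, `encode_2d` and `decode_2d` are hypothetical.

```python
import math

def channel_count(m: int, dims: int) -> int:
    """Number of B-format channels for order m: 2m+1 in 2D, (m+1)^2 in 3D."""
    if dims == 2:
        return 2 * m + 1
    if dims == 3:
        return (m + 1) ** 2
    raise ValueError("dims must be 2 or 3")

def encode_2d(theta: float, m: int) -> list:
    """Encoding gains for a plane wave from azimuth theta (radians), order m.

    Assumption: channels ordered [1, cos, sin, cos 2, sin 2, ...] with a
    sqrt(2) weight on every harmonic k >= 1.
    """
    gains = [1.0]
    for k in range(1, m + 1):
        gains.append(math.sqrt(2.0) * math.cos(k * theta))
        gains.append(math.sqrt(2.0) * math.sin(k * theta))
    return gains

def decode_2d(b: list, speaker_angles: list) -> list:
    """Speaker gains as weighted sums of the B-format channels.

    Assumption: equally spaced speakers, weights taken from the encoding
    of each speaker direction and divided by the number of speakers.
    """
    m = (len(b) - 1) // 2
    n = len(speaker_angles)
    return [sum(c * x for c, x in zip(encode_2d(a, m), b)) / n
            for a in speaker_angles]

# Usage: order 3, eight equally spaced speakers, source at 0.3 rad.
speakers = [2.0 * math.pi * i / 8 for i in range(8)]
gains = decode_2d(encode_2d(0.3, 3), speakers)
```

Under these assumptions the eight gains add up to 1 for any source angle, in line with the statement that the sum of all speaker gains equals 1.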
The vector $S$ of the speaker signals for symmetrical setups of n speakers can be calculated from the matrix $B$ of the B-format and the matrix $C$ of the B-format of the speaker signals as [3]

$$S = C^{-1}B = \frac{1}{n}\,C^{T}B \tag{1}$$

Since solutions for asymmetrical setups are often unusable (for 5.1 surround see [5]), this symmetric solution is normally used even if the speaker setup is not exactly symmetric.

3.3. Corrections

The truncation of the infinite series causes side effects such as signals on speakers far away from the original sound position and inverted phases (see Figure 1). By weighting the ambisonic channels according to their order these side effects can be reduced at the cost of the precision of the directivity. Figure 1 shows two level functions for a speaker at position $\theta$ (sound at $\theta = 0$, order $m = 3$): the first without correction, $f_{bas}(\theta)$, the second for in-phase decoding, $f_{inph}(\theta)$. The bars indicate the levels of 13 symmetrically positioned speakers.

Figure 1. Level functions for basic and in-phase decoding.

Putting the correcting gains $g_k$ into equation (1) yields

$$S = \frac{1}{n}\,C^{T}\,\mathrm{Diag}[g_k]\,B \tag{2}$$

For symmetric setups this equation can be simplified and we get the ambisonic panning functions [6]

$$f(\theta,m) = \frac{1}{n}\Big(g_0 + 2\sum_{k=1}^{m} g_k \cos k\theta\Big) \quad \text{in 2D} \tag{3}$$

$$f(\theta,m) = \frac{1}{n}\sum_{k=0}^{m} (2k+1)\,g_k\,P_k(\cos\theta) \quad \text{in 3D} \tag{4}$$

where $P_k$ denotes the Legendre polynomial of degree $k$. The correcting gains for in-phase decoding are

$$g_k = g_0\,\frac{(m!)^2}{(m+k)!\,(m-k)!} \quad \text{in 2D} \tag{5}$$

$$g_k = g_0\,\frac{m!\,(m+1)!}{(m+k+1)!\,(m-k)!} \quad \text{in 3D} \tag{6}$$

with normalizing factors $g_0$ [3].

4. AMBISONICS EQUIVALENT PANNING (AEP)

For basic and in-phase decoding the formulas (3) and (4) can be simplified.

4.1. Basic Decoding

For basic decoding (i.e. without correcting gains) the panning function (3) can be written in the simplified form

$$f(\theta,m) = \frac{\sin\left(\frac{2m+1}{2}\,\theta\right)}{n\,\sin\left(\frac{1}{2}\,\theta\right)} \tag{7}$$

Thus for a sound source at position $\theta_s$ we get the gain for the speaker $i$ at position $\theta_i$ as $f(\theta_s-\theta_i)$. This function is exactly equivalent to ambisonic en- and decoding in 2D. For higher orders this technique dramatically reduces the number of calculations. The function depends only on the angle between speaker and sound source, so it can also be used in 3D as an approximation of the ambisonic panning function [1].

4.2. In-phase Decoding

In [1] we showed that the panning functions (3) and (4) with the gains (5) and (6) are equivalent to the simple function

$$f_{inph}(\theta,p) = \left(\frac{1}{2} + \frac{1}{2}\cos\theta\right)^{p} = \left(\cos\frac{\theta}{2}\right)^{2p} \tag{8}$$

where $\theta$ is the angle between the speaker and the position of the sound source and $p$ corresponds to the ambisonic order. While ambisonic encoding is only possible with integer orders, the exponent in the panning function $f_{inph}(\theta,p)$ can be an arbitrary positive number. Figure 2 shows the function for orders 1, 3 and 6.7.

Figure 2. AEP function for orders 1, 3 and 6.7.

For fractional orders the sum of the speaker signals does not exactly equal one, but the deviation is very small, so that it is possible to change the exponent continuously without perceivable inaccuracies. Figure 3 shows the sum A of the speaker signals for 8 speakers as a function of the order $p$ and the angle $\theta$ of the sound source. It is nearly constant between $p = 2$ and $p = n-1$.

Figure 3. Sum A of the signals of 8 speakers as a function of the order p and the angle $\theta$ of the sound source (the plotted values lie between 0.9999 and 1.0001).
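The closed form (7) and the behaviour of the in-phase function (8) for fractional orders can be checked numerically. The following is a minimal sketch, not the authors' implementation. It assumes equally spaced speakers on a circle and all gains $g_k = 1$ for basic decoding, and the function names are hypothetical.

```python
import math

def basic_sum(theta, m, n):
    """Panning function (3) with all gains g_k = 1 (basic decoding)."""
    return (1.0 + 2.0 * sum(math.cos(k * theta) for k in range(1, m + 1))) / n

def basic_closed(theta, m, n):
    """Closed form (7); where sin(theta/2) vanishes the limit (2m+1)/n is used."""
    s = math.sin(0.5 * theta)
    if abs(s) < 1e-12:
        return (2 * m + 1) / n
    return math.sin(0.5 * (2 * m + 1) * theta) / (n * s)

def aep(theta, p):
    """In-phase panning function (8); p may be any positive real number."""
    return (0.5 + 0.5 * math.cos(theta)) ** p

def sum_spread(p, n, steps=360):
    """Relative spread (max - min) / mean of the summed gains of n equally
    spaced speakers while the source moves once around the circle."""
    sums = []
    for j in range(steps):
        src = 2.0 * math.pi * j / steps
        sums.append(sum(aep(src - 2.0 * math.pi * i / n, p) for i in range(n)))
    mean = sum(sums) / len(sums)
    return (max(sums) - min(sums)) / mean

# Usage: the series (3) and the closed form (7) agree, and the summed
# in-phase gains of 8 speakers vary very little for the fractional order 6.7.
difference = abs(basic_sum(0.7, 3, 7) - basic_closed(0.7, 3, 7))
spread = sum_spread(6.7, 8)
```

The raw sum of (8) over the speakers is not 1 without a normalizing factor. The sketch therefore reports only how much the sum varies with the source position, which is the quantity that matters when the order is changed continuously.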

Since with increasing order $p$ the function $f_{inph}(\theta,p)$ narrows more and more slowly, fewer speakers per order are necessary. With as few as 20 speakers it is possible to use orders up to approx. $p = 60$.

The same panning function can be used in 3D. The only five symmetrical speaker setups correspond to the five Platonic solids. For these setups it can be shown that the sum of the speaker signals is independent of the position of the sound source for small integer orders [1]. The sum can be normalized by the factor $(p+1)/n$:

$$\frac{p+1}{n}\sum_{i=1}^{n} f_{inph}(\theta_i,p) = 1 \tag{9}$$

where $\theta_i$ is the angle between the sound source and the speaker $i$, $p$ the order and $n$ the number of speakers.

5. IMPLEMENTATION AND APPLICATIONS

5.1. Implementation

The implementation of the panning function $f_{inph}$ (8) is straightforward. In order to produce the signal for a certain speaker at position $P_s = (x_s, y_s, z_s)$, a sound at position $P = (x, y, z)$ is multiplied by $f(\theta,p)$, where $\theta$ denotes the angle between the sound source and the speaker. The cosine of this angle is calculated from the scalar product $(x, y, z)\cdot(x_s, y_s, z_s)$. For a speaker setup on a sphere or a circle with radius 1 and a sound source at distance $r$ we get in Cartesian coordinates

$$f_{inph}(P,P_s,p) = \left(\frac{x x_s + y y_s + r}{2r}\right)^{p} \quad \text{in 2D}$$
$$f_{inph}(P,P_s,p) = \left(\frac{x x_s + y y_s + z z_s + r}{2r}\right)^{p} \quad \text{in 3D} \tag{10}$$

and in spherical coordinates

$$f_{inph}(P,P_s,p) = \left(\frac{1 + \cos(\theta-\theta_s)}{2}\right)^{p} \quad \text{in 2D}$$
$$f_{inph}(P,P_s,p) = \left(\frac{1 + \cos(\theta-\theta_s)\cos(\delta)\cos(\delta_s) + \sin(\delta)\sin(\delta_s)}{2}\right)^{p} \quad \text{in 3D} \tag{11}$$

5.2. Computational Costs without Look-up Tables

In [1] the complexity of ambisonic en- and decoding and of AEP was estimated and compared for implementations that do not use look-up tables. The encoding of each signal takes about 0, 3, 16, 45, 96, 177, 300, ... (approx. $(m+0.5)^3$) multiplications for the orders 0, 1, 2, ... In the decoding process the matrix $C^T$ is multiplied with the B-format. This takes an additional $n(m+1)^2$ multiplications. The panning function (10) takes only 3 or 4 multiplications and 1 function call for every sound and speaker.

5.3. Look-up tables

For applications in computer music, where a great number of independent sound sources are often treated and a great number of speakers are used, it is reasonable to use look-up tables instead of repeated calculations. In order to estimate the complexity of the implementation and the computational costs we have to take into account the dimensionality of the tables, the number of elements in the tables and the type of interpolation.

5.3.1. Two-dimensional tables

Using tables with more than one dimension poses some problems. 1) The most common sound synthesis languages such as Csound and Max (with the exception of extensions such as Jitter, FTM or language bindings like Java, Python etc.) do not support them. 2) Tables with length $n$ per input variable need $n^d$ entries for dimensionality $d$. For a resolution of, for example, 1024 points for two variables, $1024^2$ entries, i.e. on the order of 1 MB of RAM, are needed. Thus RAM limits the table size and using interpolation becomes imperative.

5.3.2. Ambisonic en- and decoding

In 2D Ambisonics the harmonics are just sine and cosine functions. They can be used as table look-up functions in the same way as in a standard oscillator. In 3D ambisonic en- and decoding spherical harmonics are used. Since they are functions of the two variables $\theta$ and $\delta$ we need two-dimensional arrays. Corresponding to the number of B-format channels, $(m+1)^2$ tables are used. Since the higher-order spherical harmonics are complicated, a good resolution or high-order interpolation is necessary.

5.3.3. AEP-function of difference of angles

The input variable of the panning function (8) is the angle $\theta_s-\theta_i$ between sound source and speaker $i$. If this angle is calculated and the order is constant, a one-dimensional table is used. A two-dimensional table is needed if the order is variable. Since the order normally changes slowly, and the changes of the function are accordingly small, the resolution of this parameter does not need to be very good and no interpolation along it is needed.
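The direct evaluation of (10) and a one-dimensional look-up table for a constant order can be sketched as follows. This is an illustration under stated assumptions, not the code of the authors' external: the table size of 1024, the linear interpolation and all function names are choices made for this example.

```python
import math

def aep_cartesian(src, spk, p):
    """Equation (10) in 3D: src = (x, y, z) at distance r > 0,
    spk = (xs, ys, zs) on the unit sphere, p = order."""
    r = math.sqrt(sum(c * c for c in src))
    dot = sum(a * b for a, b in zip(src, spk))
    # Clamp at zero so rounding errors cannot produce a negative base.
    return max(0.0, (dot + r) / (2.0 * r)) ** p

def build_table(p, size=1024):
    """Samples of (8) for angle differences in [0, pi] at constant order p."""
    return [(0.5 + 0.5 * math.cos(math.pi * i / (size - 1))) ** p
            for i in range(size)]

def lookup(table, angle_diff):
    """Linear interpolation in the table. Because (8) is even and
    2*pi-periodic, the input is first folded into [0, pi]."""
    a = abs(math.remainder(angle_diff, 2.0 * math.pi))
    pos = a / math.pi * (len(table) - 1)
    i = min(int(pos), len(table) - 2)
    frac = pos - i
    return table[i] + frac * (table[i + 1] - table[i])

# Usage: a source at distance 2 and azimuth 0.4 rad, a speaker at azimuth 0.
table = build_table(3.0)
source = (2.0 * math.cos(0.4), 2.0 * math.sin(0.4), 0.0)
speaker = (1.0, 0.0, 0.0)
direct = aep_cartesian(source, speaker, 3.0)
from_table = lookup(table, 0.4)
```

Folding the input with the absolute value is what allows the table to cover only half of the circle. One table serves every speaker because only the angle difference enters.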
Since the function is symmetric, the table size can be halved if we use $|\theta_s-\theta_i|$ as input. Because the difference of the angles between sound source and speaker is used, only one table is needed and it is independent of the speaker setup.

5.3.4. AEP-function of angles of the sound source

In order to avoid the calculation of the angle between sound source and speaker we can produce a table for the amplitude of every speaker as a function of azimuth $\theta$ and elevation $\delta$ of a sound source. Whilst the angle between a sound source and a speaker cannot be calculated from the angle to another speaker, the functions of the two variables $\theta$ and $\delta$ are similar but shifted in the direction of the axes according to the angles of the speakers. Thus, instead of calculating and storing as many functions as there are speakers, only the function for one speaker needs to be stored, and the values for the other speakers can be read out by adding the difference of the angles between the speakers to the input values of the function.

5.4. Applications

We implemented and tested AEP in different environments. A very simple implementation and some examples in Csound are available from [7]. In the context of the I-S-O project (Interactive Swarm Orchestra [9]), we implemented AEP and tested it in 3D, i.e. in a setup of 20 speakers arranged in a dodecahedron. The implementation of AEP as a patcher in MaxMSP is fairly straightforward. The expression for the calculation of the signal amplitude for one speaker takes a source position, a speaker position and the order. For a large number of sources a more optimized form of the process is necessary; this is made available as a MaxMSP external programmed in C [7]. The external is based on interpolated two-dimensional look-up tables and is optimized for static speakers and a multitude of sources. It implements an n by m signal matrix and includes position input in either Cartesian or polar coordinates, distance correction on both the source and speaker amplitudes, and delay-time correction for the speaker feeds. The control syntax is the same as for the other ICST ambisonics tools in order to facilitate interchanging the processes [8].

5.5. Explaining Ambisonics

The mathematics used in ambisonic theory is beyond the skills of non-scientists or non-engineers. Since panning functions are familiar and easy to visualize, they provide a good didactic means for explaining Ambisonics to laymen and for deriving encoding formulas and gains for in-phase decoding.
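The link between the panning function (8) and the in-phase gains (5) can also be checked numerically before working through the algebra. The sketch below projects (8) for an integer order onto the cosines of multiples of the angle and compares the resulting ratios with (5). The uniform sampling of the circle, the choice $g_0 = 1$ and the function names are assumptions of this example.

```python
import math

def aep(theta, p):
    """In-phase panning function (8)."""
    return (0.5 + 0.5 * math.cos(theta)) ** p

def cosine_coefficients(m, samples=64):
    """Project (8) for integer order m onto cos(k*theta), k = 0..m, using a
    uniform sampling of the circle (exact up to rounding for samples > 2*m)."""
    coeffs = []
    for k in range(m + 1):
        acc = sum(aep(2.0 * math.pi * j / samples, m)
                  * math.cos(2.0 * math.pi * j * k / samples)
                  for j in range(samples))
        coeffs.append(acc / samples)
    return coeffs

def inphase_gain_2d(m, k):
    """Correcting gain (5) with the normalizing factor g_0 set to 1."""
    return math.factorial(m) ** 2 / (math.factorial(m + k) * math.factorial(m - k))

# Usage: for order 3 the ratios of the cosine coefficients reproduce (5).
coeffs = cosine_coefficients(3)
ratios = [c / coeffs[0] for c in coeffs]
gains = [inphase_gain_2d(3, k) for k in range(4)]
```

For order 3 both lists come out as 1, 3/4, 3/10 and 1/20.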
For example, for order $m = 3$ we expand the powers of the panning function (8)

$$f_{inph}(\theta,3) = \left(\frac{1}{2} + \frac{1}{2}\cos\theta\right)^{3} = \frac{1}{8}\left(1 + 3\cos\theta + 3\cos^{2}\theta + \cos^{3}\theta\right) \tag{12}$$

and replace the powers of the cosine function by cosines of multiples of the angle to get the cosine part of the B-format together with the in-phase coefficients (5):

$$\frac{1}{32}\left(10 + 15\cos\theta + 6\cos 2\theta + \cos 3\theta\right) \tag{13}$$

6. CONCLUSIONS AND FURTHER INVESTIGATIONS

In order to store 3D sound independently of the speaker setup, either sound and position of the sound source can be stored separately or they can be stored together in the ambisonic B-format. If the sound need not be stored, for example in testing environments, real-time performances or playback of multi-channel sequencer sessions, AEP is easier to implement and takes less computing power than ambisonic en- and decoding. Only in applications where a great number of independent sounds occur, each with its own position or movement, and where the spatial resolution need not be very high, does ambisonic en- and decoding outperform AEP. In most other cases AEP performs better.

An open theoretical question is whether there are simple formulas for ambisonic panning with other correcting gains of ambisonic decoding, for example basic decoding in 3D and so-called max rE decoding [3]. A lot of work still has to be done to implement AEP, and especially 2D table look-up, for different sound synthesis languages and to create plug-ins for commercial software.

7. REFERENCES

[1] Neukom, M. "Ambisonic Panning", AES 123rd Convention, New York, USA, 2007.
[2] Pulkki, V. "Virtual sound source positioning using Vector Base Amplitude Panning", J. Audio Eng. Soc., 45, June 1997.
[3] Daniel, J. Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia. Ph.D. Thesis, University of Paris VI, France, 2000.
[4] Sontacchi, A., Höldrich, R. "Konzepte zur Schallfeldsynthese und Schallfeldreproduktion", Jahrestagung der ÖPG FA-Akustik, 2000.
[5] Neukom, M. "Decoding Second Order Ambisonics to 5.1 Surround Systems", AES 121st Convention, San Francisco, CA, USA, 2006.
[6] Daniel, J., Nicol, R., Moreau, S. "Further Investigations of Higher Order Ambisonics and Wavefield Synthesis for Holophonic Sound Imaging", AES 114th Convention, Amsterdam, The Netherlands, 2003.
[7] http//
[8] Schacher, J.C., Kocher, P. "Ambisonics Spatialization Tools for Max/MSP", Proceedings of the International Conference on Computer Music 2006 (ICMC'06), New Orleans, November 6-11, 2006.
[9]