Page  00000074 Second-order Recursive Oscillators for Musical Additive Synthesis Applications on SIMD and VLIW Processors Todd Hodes Computer Science Division & CNMAT University of California, Berkeley hodes@cs.berkeley.edu ABSTRACT This paper summarizes our work adapting a recursive digital resonator for use on sixteen-bit fixed-point hardware. The modified oscillator is a two-pole filter with error properties expressly matched to use in the range of frequencies relevant to additive synthesis of digital audio. We present the algorithm, error analysis, performance analysis, and measurements of an implementation on a fixed-point vector microprocessor system. 1. INTRODUCTION The most widely used digital sinusoidal oscillator structure employs a first-order Fecursion to accumulate and wrap a phasor, followed by a sinusoidal functional evaluation - usually by indexing precomputed tables. Second-order recursions are an attractive alternative because the sinusoid is computed within the structure, avoiding the need for a large table or specialized hardware function evaluator. Unfortunately, numerous theoretical and practical challenges have to be overcome to use second-order structures. We are motivated to overcome these challenges because upcoming generations of microprocessors are hostile to efficient implementations of first-order recursive oscillators. Even when primary cache memory is large enough to hold a sine table, no locality of reference is observed for arbitrary frequencies. Getting good performance from vector microprocessors such as the Analog Devices TigerSharc and the Motorola Altivec and from VLIW (Very Long Instruction Word) machines such as the Intel IA-64 requires playing to the strengths of these SIMD (Single Instruction, Multiple Data) and VLIW machines: large multiply/add rate for on-chip data. The techniques described in this paper [1] were originally developed for the TO fixedpoint vector microprocessor [2]. Although these techniques were developed because TO has no floatingpoint unit, the highest multiply/add rate for many of the aforementioned machines is achieved when they operate on vectors of short fixed-point operands rather than vectors of floating-point numbers; thus, our approach is directly applicable. Adrian Freed Center for New Music and Audio Technologies (CNMAT) University of California, Berkeley adrian @cnmat.berkeley.edu The primary challenge with digital recursive structures is managing long-term stability, since rounding and truncation errors accumulate. In most applications, frequency precision is more important than phase stability, and therefore the slow, long-term phase drift of first-order oscillators is usually ignored. Error accumulation has more serious consequences in second order structures and, unfortunately, the most computationally efficient second order structure, the direct form I, is the most susceptible to debilitating instability at musically useful frequencies. An additional challenge of moderate-precision fixed-point arithmetic machines is providing sufficient frequency coefficient resolution. 2. METHOD DESCRIPTION Our novel approach to these challenges is two-fold: we keep individual oscillators short-lived, periodically recomputing state variables for each; and we modify the actual recursion calculation to increase frequency accuracy. Our algorithm shares some features of transformdomain additive synthesizers [3]. The first similarity is that the output is computed by overlapping short vectors of signals. Each vector contains the sum of many fixed frequency, fixed amplitude sinewaves. Keeping these vectors short and choosing an appropriate window guarantees a seamless interpolation from frame to frame. The second similarity is that the frequency and amplitude of each sinewave are transformed into a new domain which offers efficiency advantages for the processor at hand. In the case of traditional transform domain additive synthesizers, a spectrum is built which can be transformed efficiently into a signal block using an inverse orthogonal transform, e.g., FFT. We choose to transform into the domain of second order recursive oscillator coefficients. Advantages over FFT methods include fewer constraints over the frame size (i.e., lower latency), no cross modulation distortion components, lower memory requirements and lower algorithmic complexity. The major disadvantage is a higher cost per partial. The general form of the digital resonator, with no - 74 - ICMC Proceedings 1999

Page  00000075 damping or initialization impulse function, is:,= 2confs(L) n..-X2 with f8 as the sampling frequency, f E (0, f3/2) as the desired (constant) frequency of oscillation. To implement this equation using only sixteen-bit fixed-point multiplies, we needed to manage the fixedpoint units with enough precision to maintain accuracy across the entire audible frequency range, while taking special care to provide sufficient frequency coefficient resolution to account for human ability to distingouish subtle differences in low frequencies. Accuracy musrt be maintained across a broader range and with more precision for low-frequency partials than a simple sixteen-bit fixed-point representation supplies. Additionally, because the frequency coefficient multiplication is in the critical path, we want to minimize the computational overhead of the changes.?To quantify the issue, we specify the minimum perceptibile musical interval and then calculate the resolution necessary to maintain relative frequency accuracy. Doi~ng so indicates that the low-frequency components require more precision than higher ones - which is intuitive, since we are calculating relative accuracy. Thus, to minimize perceived error, we remap the freqlue~ncy coefficient representations in two ways: we employ an exponent internally fo emulate floatingpoint range extension, and invert the bit representation to bias accuracy toward low frequencies rather than high frequencies. These changes require two new operations per filter per sample: an add with constant shift and a1 variable shift. To understand the modifications to the filter, recall the or~iginal recurrence relation for our sine wave generator (with w = 21/I. At low frequency, the coefficient 2 cos(w) is very close to 2, and so in a floating-o-nint format, lower frequnc~~nies synthesizedf usingo the formula will have less accuracy than higoherfrequencie~s due to the need to explicitly represent the leading ones in the mantissa. Numbers closer to zero be~nefit frolm the implicit ncod~ing of leadingo zeros via a smaller exponent. In other words, larger values require bits with larger "significance"' (absolute value),~ forcing the~ leastC significant bits in the_ sa7me word to also have higher significance, thus forcing higher worst-case quantization error. W~e can more effectively use the bits of the mantissa by reversing this relationship, recasting the. equation as: 5,= 2 cos(w)x,..1 -X...2: = 2(1 - e/2)xni1 - Xn.2 (I): = 2Xn~ 1EXCn tX -21, i.e., where cos(w) = (1 - /) To represent e, an unsigned sixteen-bit mantissa. m is combined with an unsigned exponent e, biased so that the actual represented value is c - a22em. Thus. the exponent is also the right shif't amount necessary to correct a l6bx 16b -e 32b multiply with e as an operand. The 2 in the exponent allows e to range from o to 4 w~h~T m is interpreted as a fractional amount and f ranges between zero and the Nyquist frequency. What is achieved with this remapping of number representation is the ability to represent lower frequencies with more significant bits by mapping higher frequencies to representations with less significant digits. In particular, as 2 cos(2irf /h) varies from -2 to 2, we define r to vary from 4 to 0. Smaller frequency vallues produce smaller values of e, helping to satisfy our asymmetric accuracy requirements. 3. ERROR ANALYSI[S 3.1. Relative Error Relative frequency discrimination is based on the ratio of adjacent frequencies. To determine worst-case relative error, we wish to determine the maximum ratio between two adjacent e values. Call these frequency coefficients e1 and C2, and their corresponding frequencies fi and f2. From the definition of w and ~the equation cos(w) = (1 - /)!f cos~'(1 - e1/2) /2 cos-'(1 - E2/2) (2) Taking any two adjacent numbers in the rangee, we can compute fl/f2 with Eq. 2. Evaluating this ratio for all possible adjacent pairs of es values allows us to determine that it is maximiz:ed for e~ = 4 214 and C2 = 4 - 2-3 where f1/f2 ~z 1.0010337. Tihis ratio is lower than the minimum frequency ratio humans are able to differentiate, a pitch difference of approximately four to five cents (about 1/25-1/20 of a semitone) (4]. The maximum error of our algorithm is actually less than two cents: d02 1.001156. This calcutlation illulstratesr that by desi~gning higher e values to coincide with higher represented frequencies, we achieve a good match of the numerical representation to the asymmetric accuracy requirements of human logarithmic pitch~ perception. 3.2. Absolute error Two tones that are meant to have an exact ratio in their frequencies may instead gene~rate beatr frequnciesP due to frequency quantization. This effect, caused by absolute error, should be minimized. Worst-case absolute error due to epsilon quantization is shown in Figure 1, which contains both a side-by-side comparison below 2000 Hz and a detail for the modified form. As expected, the recast filter maintains more precise aIbsolute frequency than the original form. Fundamentally, more than 16 bits of t'ractional coefficient are necessary to obtain 1 Hz absolute precision across the audible spectrum (51. Our method ICMC Proceedings 1999-75 - 75 -

Page  00000076 I o,,,.n** - 00 0 too to10 o. o 0.01 0.001 0.0001 0 Ooo 1000 10000 fcrufv ly (Nl Figure 1: Worst-case absolute error due to frequency coefficient quantization. Top: Comparison of original and modified. Bottom: Modified form detail. maintains reasonable error bounds in sixteen bits of mantissa by scaling these bits with the exponent. 3.3. Rounding error At each iteration of the recursive form, a small error is introduced due to rounding of the multiply result. Due to the recursion, this error isn't corrected until reinitialization of the state variables. Possible effects of this include degradation of the signal-to-noise ratio, degradation of the long-term phase accuracy, and a lack of amplitude stability - all of which can cause audible artifacts. As has been illustrated elsewhere [6], this can be corrected via additional computation. To avoid this, we exploit our ability to reinitialize the computation at overlap-add frame boundaries, thereby allowing use of a non-self-correcting (but higher-performance) oscillator form. 4. FAST INITIALIZATION The resonator can be initialized to a desired frequency and phase at sample Xz by properly choosing the two state variables X-2 and x-1 using function evaluations in place of an initialization forcing function. The lookup values for a sinusoid with phase p and frequency f are: ( r 7rf x-_ = sin p - 2; x-2 = sin p - 4f \ Js / \ fs / These initializations must be accurate down to the loworder bits in a 32-bit fixed point representation, with the binary point set between the third and fourth bit positions in order to support a phase in the range (0. 2-r. Additionally, we need to compute the frequency coefficient 2.- 2 cos(w) to 32-bit accuracy. ttheta (32 bit fixed point) alpha beta alpha cos(thea) Figure 2: Calculating both sin(9) and cos(9) with a hybrid technique: table-lookup for a and Taylor expansion for P combine to give an accurate 32-bit result efficiently. These initial evaluations can be computed more quickly by rewriting the equations for xi and zx2 in a form that requires only the computation of sin(p), cos(p), sin(w), and cos(w): z-i = sin(p-w) = sin(p) cos(w) - cos(p) sin(w) z.2 = sin(p - 2w) = sin(p) cos(2w) - cos(p) sin(2w) = 2 cos(w) sin(p - w) - sin(p) = 2 cos(w)zx1 - sin(p) It may seem that this has actually increased the amount of work we need to perform because there are now four trigonometric evaluations rather than three (two initialization sines plus the cosine in the recursive form). However, this approach turns out to be more efficient by allowing for the judicious sharing of intermediate values in a tandem sine and cosine generation procedure. The tandem subroutine returns both sin(9) and cos(9) for 09 [0, 2xr] to full 32-bit fixedpoint precision using a hybrid technique combining table-lookup and Taylor expansion. This keeps both the table size manageable (2048 entries of 32 bits) and the number of terms in the Taylor expansions small (two). It is implemented by separating 9 into a and 3 as shown in Figure 2; a is the high-order 11 bits of 0, and / the remaining low-order bits. a is used in an exact (to one LSB) 11-bit - 32-bit table-lookups, while (guaranteed small) 0 is used in Taylor expansions: cos(/) 1 - and sin(/3): 3 1 - ) 6 ) The accuracy of expanding each to only two terms is guaranteed by limiting the size of 3 to only the loworder 21 bits of 8: the sum of the remaining terms in each expansion sequence, for all.3, is less than the LSB. Finally, a and 3 are combined using the relationships: sin(a +,3) = sin(a) cos(j3) + cos(a) sin(,3) cos(a +,3) = cos(a) cos(3) - sin(a) sin(3) - 76 - ICMC Proceedings 1999

Page  00000077 5. IMPLEMENTATION AND PERFORMANCE We implemented our approach on the SPERT neural network and signal processing accelerator board (7]. The SPERT houses a TO chip [2], which tightly couples a general-purpose scalar MIPS core to a highperformance vector coprocessor. TO is representative of digital signal processing architectures in its use of fixed-point arithmetic. Computing summations of oscillators for additive synthesis with an overlap-add approach using Eq. 1 and our pseudo-floating-point format, we need four multiplies, two variable shifts, two fused (constant) shifts and adds, and two regular adds: Xn - 2an- - en-1 + Xn-2 A, = An-1 + AA out, = out, + An x Xn On TO, this requires a total of 9 + - vector arithmetic operations per sinusoid when unrolled n times. Unrolling four times due to trade-offs in register file pressure on TO, we achieve best-case performance of about 1.15 cycles/partial: two fixed-fiequency sinusoids are required per variable-frequency partial because of overlap-add, 91 operations are required per sine, two cycles are required per vector operation on TO, and the vector length is 32 elements. Thus, for our prototype SPERT board performing 8 operations per cycle (peak) with a 40 MHz clock, rate and at a 44.1 kHz sampling rate, a theoretical maximum of 768 partials can be achieved in real time excluding all overhead. The current implementation supports up to 608 simultaneous real-time partials with frame lengths of 5.8 ms or greater, or about 1.5 cycles per partial per sample. 6. RELATED WORK Work on variable representations for direct-form oscillators and their associated error properties has been reported in the literature (8, 9], but none specifically matched for use with audio synthesis or SIMD/VLIW architectures. Low-frequency recursive oscillation was implemented on the TI TMS32010 DSP (10] using three 16-bit multiply stages and self-modifying code to hard-code shift values. As we have shown, the additional multiplies are unnecessary for audio applications. 7. CONCLUSIONS We have summarized our work adapting a recursive digital resonator for use on sixteen bit fixed-point hardware, where the error properties are expressly matched for use in the range of frequencies relevant to additive synthesis of digital audio. The new technique allows for reliable computation of frequencies impossible to handle with the direct form, while keeping additional frequency precision costs down to only two additional operations per filter sample. We presented the algorithm, an error analysis, a performance analysis, and measurements of an implementation on a fixed-point vector microprocessor system. More detailed information is available elsewhere [1]. 8. ACKNOWLEDGEMENTS The authors would like to thank John Wawrzynek, John Hauser, David Wessel, Matt Wright, and Tristan Jehan for their assistance with this work and/or comments on this paper. 9. REFERENCES [1] T. Hodes, "Recursive Oscillators on a Fixed-Point Vector Microprocessor for High Performance PhaseAccurate Real-Time Additive Synthesis," Master's thesis, University of California, Berkeley, 1997. (2] K. Asanovic, J. Beck, B. Irissou, B. Kingsbury, N. Morgan, and J. Wawrzynek, 'The TO Vector Microprocessor," Proceedings HOT Chips VII, August 1995. (3] A. Freed, "Real-time Inverse Transform Additive Synthesis for Additive and Pitch Synchronous Noise and Sound Spatialization," Proceedings of the Audio Engi. neering Society 104th Convention, 1998. [4] J. 0. Pickles, An Introduction to the Physiology of Hearing. Academic Press: Orlando, Florida, 1982. (5] J. Wawrzynek, VLSI Concurrent Computation for Music Synthesis. PhD thesis, California Institute of Technology, 1987. (6] J. Gorden and J. Smith, "A Sine Generation Algorithm for VLSI Applications," Proc. Int. Conf. Computer Music, 1985. (7] J. Wawrzynek et al., "SPERT-II: A Vector Microprocessor System," IEEE Computer, vol. 29, pp. 79-86, March 1996. [8] B. Gold and C. M. Rader, Digital Processing of Signals. New York: McGraw-Hill, 1969. [9] J. I. Acha, J. Payan-Somet, and A. Civi, "Design of Very-low-frequency Digital Oscillators," IEEE Proceedings G (Electronic Circuits and Systems), vol. 131, pp. 93-102, 1984. [10] A. I. Abu-El-Haija, M. M. Al-lbrahim, and M. A. AlJarrah, "Recursive Digital Sine Wave Oscillators Using the TMS32010 DSP," IMTC WEPM 7-6, 1994. ICMC Proceedings 1999 -77 -