Multirate Additive Synthesis

Desmond PHILLIPS* d.k.phillips@lboro.ac.uk
Alan PURVIS alan.purvis@dur.ac.uk
Simon JOHNSON simon.johnson@dur.ac.uk

Durham Music Technology*, School of Engineering, University of Durham, Science Laboratories, South Road, Durham City, County Durham, DH1 3LE, UK.

Abstract

The advantages of musical signal modelling in the frequency domain via sinusoidal additive synthesis are well known, as is the prime disadvantage of high computational overhead. This paper proposes a reduction in overhead by the application of standard multirate DSP techniques to optimise the sample rate of oscillators, with a view to VLSI implementation. The optimisation works by adapting computation to oscillator frequency ranges, which are often known a priori for partials in note-based music, and places few restrictions on functionality. The cost of signal interpolation up to an industry-standard sample rate is minimised by grouping oscillators, spatially related in the final sound image, to common synthesis QMF filterbanks which have a hierarchical sub-band decomposition that balances interpolation cost with goodness-of-fit to expected oscillator frequency ranges. Finally, an overview is provided of the mapping to a hypothetical "single chip" coprocessor.

1. Introduction

Additive Synthesis (AS) is a low-level transformation that underpins all spectral modelling paradigms [Serra & Smith, 1990]. The potentially wide application and simplicity of AS make VLSI an attractive solution. Intriguingly, a problem is its over-generality in the context of musical signals, which are consequently computed with substantial redundancy. However, optimising AS to assumed signal properties (e.g. harmonicity) restricts the application of AS to that particular signal class, compromising generality and making VLSI less attractive. To this end, a fresh solution is proposed - Multirate Additive Synthesis (MAS) - that has a graduated trade-off between generality and cost.

2. Background

AS computes the Inverse Fourier Transform (IFT) of spectra comprising discrete lines, which is a property of tonal sounds such as musical timbres. In eqn. (1), y[n] is the sum of sinusoids 1 ≤ i ≤ N, each modulated by individual amplitude and frequency envelopes A_i[n] and F_i[n]; n is a time index at a sample rate of f_s > 40 kHz. By this means, the spectral evolution of any tonal sound can be described over time.

$$ y[n] = \sum_{i=1}^{N} A_i[n]\,\sin(\Phi_i[n]), \qquad \Phi_i[n] = \Phi_i[n-1] + \frac{2\pi F_i[n]}{f_s} \tag{1} $$

The prime disadvantage is that for N oscillators, 2N envelope data streams are required at f_s. For example, a 100-voice ensemble with 40 sinusoids per voice (to ensure synthesis quality) at f_s = 44.1 kHz requires a net oscillator update rate of 176.4 MHz and an envelope data bandwidth of 352.8 MHz. Fortunately, envelope data may be short-time averaged with little perceptible loss of quality using a piecewise linear approximation described by a compact breakpoint set. This technique is reported as achieving data compression ratios of 100:1 and is central to AS and many spectral modelling paradigms [Serra & Smith, 1990]. However, for playback, decompression is required in real-time for AS via an oscillator bank. Control data bandwidths of the magnitude discussed previously recur and swamp the throughput of CPUs optimised for the execution of high complexity instruction streams.

AS is a fine-grain data-parallel algorithm and a traditional "form follows function" solution is to map eqn. (1) into a direct-form dedicated coprocessor exploiting deep pipelining, thus freeing CPU throughput for high-level tasks more suited to its design. A recently proposed alternative is to simulate an oscillator bank in software via the Inverse Fast Fourier Transform (IFFT) in an overlap-add scheme [Rodet & Depalle, 1992]. The algorithm is characterised by the data conditioning required to generate smooth, linear inter-frame changes in A_i[n], F_i[n] [Goodwin & Kogon, 1995].
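As a minimal sketch of the direct-form oscillator bank of eqn. (1), the Python below expands piecewise linear breakpoint envelopes and sums N sinusoids at the full rate f_s. The function names, the (sample index, value) breakpoint layout and the zero initial phase are assumptions made for illustration; they are not taken from the paper.

```python
import math


def linear_envelope(breakpoints, n):
    """Piecewise-linear envelope value at sample n.

    breakpoints: list of (sample_index, value) pairs with strictly
    increasing sample_index (a hypothetical layout for this sketch).
    """
    if n <= breakpoints[0][0]:
        return breakpoints[0][1]
    for (n0, v0), (n1, v1) in zip(breakpoints, breakpoints[1:]):
        if n <= n1:
            return v0 + (v1 - v0) * (n - n0) / (n1 - n0)
    return breakpoints[-1][1]  # hold the final value after the last breakpoint


def oscillator_bank(amp_bps, freq_bps, num_samples, fs=44100.0):
    """Direct-form evaluation of eqn. (1) for N = len(amp_bps) oscillators."""
    phases = [0.0] * len(amp_bps)  # Phi_i[n-1], assumed zero at the start
    out = []
    for n in range(num_samples):
        acc = 0.0
        for i, (a_bp, f_bp) in enumerate(zip(amp_bps, freq_bps)):
            a = linear_envelope(a_bp, n)  # A_i[n]
            f = linear_envelope(f_bp, n)  # F_i[n] in Hz
            # Phi_i[n] = Phi_i[n-1] + 2*pi*F_i[n]/f_s, wrapped to [0, 2*pi)
            phases[i] = (phases[i] + 2.0 * math.pi * f / fs) % (2.0 * math.pi)
            acc += a * math.sin(phases[i])
        out.append(acc)
    return out


def net_update_rate(voices, partials_per_voice, fs):
    """Oscillator updates per second for a bank running entirely at fs."""
    return voices * partials_per_voice * fs
```

With these definitions, `net_update_rate(100, 40, 44100)` reproduces the 176.4 MHz figure quoted above, and doubling it for the two envelope streams per oscillator gives the 352.8 MHz envelope bandwidth.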
Durham Music Technology is a collaboration between the Schools of Engineering and Music. Desmond Phillips is now at the Dept. of Electronic Engineering, Loughborough University, Loughborough, Leicestershire LE11 3TU, UK. Phillips et al. 496 ICMC Proceedings 1996

3. Multirate Additive Synthesis

Direct-form oscillator banks have lost research momentum because they inherit the over-generality of AS, but a consideration of multirate DSP techniques permits their revision. In music synthesis, a chief requirement is the synthesis of notes which have properties of pitch, finite lifespan and timbres definable as an AS partial series. Assuming, for simplicity, that partial frequency ratios are pitch-invariant, then the upper and lower frequency bounds {f_min(x), f_max(x)} of each partial x are related by the constant α = f_max(x)/f_min(x), which represents the frequency modulation range required by the note over its lifecycle, e.g. for vibrato. If the note is subject to real-time modulation (e.g. pitch bend), α is chosen to provide sufficient headroom.

MAS exploits a reinterpretation of Nyquist's sampling theorem in that the critical time-invariant sample rate for alias-free synthesis of x is f_opt = 2(f_max(x) - f_min(x)) rather than the uniform rate of f_s in eqn. (1). However, x as synthesised by a digital sine oscillator executing at f_opt is a baseband signal that spans from DC to (f_max(x) - f_min(x)). Sidebands appear around harmonics of f_opt within the audio spectrum, representing audible quantisation noise which must be eliminated by interpolation up to f_s. x is then heterodyned by f_min(x) to its desired spectral location, bounded by {f_min(x), f_max(x)}. For a single oscillator, this procedure is extravagant, but since AS sums N sinusoids into a single stream, a single multirate synthesis filterbank can be applied to a sum of oscillators with identical {f_min(x), f_max(x)}.

A problem for filterbank design is that {f_min(x), f_max(x)} is unique to each partial x and therefore N incommensurate sample rates are generated, each requiring separate interpolation and creating a scheduling nightmare. At the other extreme, a uniform f_s in eqn. (1) is shown to have computational redundancy.
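The critical-rate argument can be put into numbers with a short sketch. The function names are hypothetical, and the calculation deliberately ignores both the practical need to round rates to those offered by a filterbank and any lower bound imposed by envelope control bandwidth.

```python
def critical_rate(f_min, f_max):
    """Critical sample rate f_opt = 2 (f_max - f_min) for one partial, in Hz."""
    if f_max < f_min:
        raise ValueError("f_max must not be below f_min")
    return 2.0 * (f_max - f_min)


def critical_rate_from_alpha(f_min, alpha):
    """The same rate written with alpha = f_max / f_min."""
    return 2.0 * f_min * (alpha - 1.0)


def rate_saving(f_min, f_max, fs=44100.0):
    """Ratio f_s / f_opt: how far the oscillator update rate could drop in principle."""
    f_opt = critical_rate(f_min, f_max)
    if f_opt == 0.0:
        return float("inf")  # a strictly fixed frequency needs no bandwidth at all
    return fs / f_opt
```

For instance, a partial confined to 430..450 Hz has f_opt = 40 Hz, so `rate_saving(430.0, 450.0)` is about 1100 at f_s = 44.1 kHz; widening the range to an octave (α = 2) raises f_opt to twice f_min and the saving shrinks accordingly.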
A compromise is to employ a filterbank such as the QMF that has a finite sub-band set with a structure that matches expected values of {f_min(x), f_max(x)}.

MAS reduces the update rate of an oscillator and also the control bandwidth of piecewise linear envelope decompression. Potential savings are indicated by the ratio f_s/f_opt. If a note is fixed pitch (e.g. from a keyboard instrument) then α is small and large savings are made. However, savings diminish as α increases, passing, for example, from flute vibrato to trombone glissando. A factor that places a lower bound on f_opt is Heisenberg's time / frequency inequality, which, in MAS terms, states that oscillator control bandwidth is related to operating frequency. High frequency attack transients require a greater peak control rate than lower frequency partials (AS via the IFFT imposes a frequency-invariant rate).

4. QMF Filterbanks

[Figure 1. General Model of a QMF Stage. Labels: inputs x0[n] and x1[n], each upsampled by 2 into H0(z) and H1(z); H1(z) = H0(π - z); response sketch from z=0 to z=π marking Δf, ω_p and ω_s.]

Synthesis Quadrature Mirror Filterbanks (QMFs) are constructed from a prototype stage illustrated in Figure 1. Output y[m] comprises the sum of input signals x0[n], x1[n] applied, respectively, to half-band interpolation filters H0(z) (low-pass) and H1(z) (high-pass) which have mutual symmetry about z=π/2. This property permits efficient implementation via FIR, or all-pass IIR, polyphase structures. Non-ideal filters have passband and stopband ripple (δ_p, δ_s) and, most significantly for MAS, a non-infinitesimal transition region between passband and stopband, Δf = (ω_s - ω_p)/2π. δ_s reduces quantisation noise in the audio band for high-fidelity synthesis, e.g. δ_s = -80 dB.

[Figure 2. Sub-band Hierarchy for K=3. Labels: sub-bands s_{0,1}, s_{1,1..2}, s_{2,1..4} and s_{3,1..8} along a frequency axis starting at DC, with primed labels (s') for frequency-inverted sub-bands and the dead-bands marked.]

A QMF stage is a two-input node which divides the output band into two sub-bands.
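A single synthesis stage of this kind can be sketched in a few lines. The code below is a toy model under stated assumptions: it uses a direct (non-polyphase) FIR, derives the high-pass filter from the low-pass prototype as h1[n] = (-1)^n h0[n] (the time-domain form of the mirror symmetry), and takes a 3-tap linear interpolator as H0 purely for readability. A practical design would use a much longer half-band prototype to meet the ripple and transition-width targets discussed above.

```python
def upsample_by_2(x):
    """Insert a zero after every sample (rate doubling before filtering)."""
    out = []
    for v in x:
        out.extend((v, 0.0))
    return out


def fir(h, x):
    """Causal FIR filter; the output has the same length as x."""
    return [
        sum(h[j] * x[n - j] for j in range(len(h)) if n - j >= 0)
        for n in range(len(x))
    ]


def mirror(h0):
    """High-pass mirror of h0: h1[n] = (-1)^n h0[n], i.e. H1(z) = H0(-z)."""
    return [c if n % 2 == 0 else -c for n, c in enumerate(h0)]


def qmf_synthesis_stage(x0, x1, h0):
    """Two sub-band inputs at rate f/2 in, one summed output at rate f out."""
    if len(x0) != len(x1):
        raise ValueError("both sub-band inputs must have the same length")
    low = fir(h0, upsample_by_2(x0))           # x0 -> lower half of the band
    high = fir(mirror(h0), upsample_by_2(x1))  # x1 -> upper half of the band
    return [a + b for a, b in zip(low, high)]


# Toy prototype (an assumption of this sketch): linear interpolation.
H0_TOY = [0.5, 1.0, 0.5]
```

Feeding a constant into x0 yields a constant output after the first sample, whereas the same constant fed into x1 emerges alternating in sign, i.e. at the very top of the output band. This illustrates how the high-pass branch places the low end of its input at the high end of the output.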
A binary tree filterbank of arbitrary completeness may be constructed from QMF stages to provide a recursive sub-band decomposition. For MAS, consider a complete tree of depth K, with levels k=0..K, and at each level k, an integer series of sub-bands denoted by s_{k,l} where l=1..2^k. The result is a sub-band hierarchy, illustrated in Figure 2 for the case of K=3. H1 causes a frequency inversion in each x1 sub-band such that the mapping of filterbank to hierarchy is indirect: s'_{k,l} indicates that the sub-band requires pre-inversion during normalisation of F_x[n] to the associated sub-band such that x appears in y[m] correctly modulated by F_x[n].

The hierarchy has many desirable properties. Signals comprising summed oscillator sets allocated to each sub-band s_{k,l} are introduced mid-filterbank as well as at the terminal sub-band series at level K (used for conventional analysis / synthesis applications) and thus intermediate sub-bands are interpolated 'for free'. For oscillator allocation, if x is not accommodated by level K - because sub-bands are too narrow or the range of x spans two adjacent sub-bands - then each level K-1..1 is tested in turn. Failing that, any {f_min(x), f_max(x)} may be accommodated at k=0, which represents classical AS. The integer-spaced series s_{K,1..2^K} facilitates efficient computation of fixed pitch notes with low α, whereas increasing α causes a partial series to be promoted ultimately into the octave-spaced series s_{0..K,1}.

A constraint is that x may not be allocated into a transition region because it is attenuated there and amplitude becomes, undesirably, a function of frequency. This results in 'dead-bands' in the hierarchy, which are inherited by sibling QMF stages because of cascaded filtration. Dead-bands are 'logically excluded' by a simple revision of the bounds of each s_{k,l}. If Δf << 2^-K, then dead-band influence on the hierarchy is slight.

MAS is efficient when net savings in oscillator computation exceed filterbank costs. The ratio in computation r between AS and MAS - interpretable as 'speedup' - is summarised in eqn. (2), where n_fb is the number of filterbanks, c_qmf is the cost of a single filterbank (due to multirate operation, c_qmf ∝ K for a complete QMF tree), c_mas and c_as are the costs of, respectively, MAS and AS oscillator updates and n_{k,l} is the total number of oscillators allocated across sub-band s_{k,l} of all n_fb filterbanks.
The 2^-k term represents the sample rate reduction of allocation to level k in MAS relative to AS and also facilitates c_mas ≈ c_as. n_fb filterbanks are necessary to provide sufficient streams for building up the subsequent sound-image via a DSP post-processor: for instance in locating orchestral sections within an acoustic volume.

$$ r = \frac{c_{as}\sum_{k=0}^{K}\sum_{l=1}^{2^k} n_{k,l}}{n_{fb}\,c_{qmf} + c_{mas}\sum_{k=0}^{K}\sum_{l=1}^{2^k} 2^{-k}\,n_{k,l}} \tag{2} $$

Clearly, r is dependent upon the statistics of n_{k,l}. The largest proportion of energy in an audio signal is concentrated below about 5 kHz, and that above is perceived as treble 'detail'. This implies that most oscillators in MAS will be allocated below this threshold. Low-Q sub-bands above (e.g. s_{3,5..8}) are likely to contain few oscillators and be uneconomic to compute, particularly those badly affected by dead-bands. A solution is to prune filterbanks, in the light of n_{k,l}, so that the greatest concentration of sub-bands is in the lower audio frequencies where economisation potential is greatest.

5. Filterbank Design Issues

If FIR QMFs are used, then the latency of each stage is a constant M samples, where M ∝ 1/Δf. Latency from level k is M(2^k - 1)/f_s and, to ensure that envelope features between partials allocated in different levels are time-aligned, delay-lines of length (2^(K-k) - 1)M are introduced to each s_{k,l} oscillator set to normalise latency to the maximum from level K. The relationship c_qmf ∝ 1/Δf implies that reducing dead-bands (permitting greater optimality in oscillator allocation) increases filterbank costs. For a given n_{k,l}, r has a global maximum against Δf which determines the optimum Δf, though the requirement for imperceptible real-time latency of <10 ms (dominated by coprocessor delays) puts an upper bound on K and favours high Δf. In respect of the proposed coprocessor, K=3, 0.05<Δf<0.1 seem optimal.
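As a rough illustration of the allocation rule of Section 4, the speedup of eqn. (2) and the latency-normalising delay-lines described above, the following sketch may help. Several simplifications are assumptions of this sketch rather than details from the paper: sub-bands are indexed in plain frequency order (the filterbank's own ordering differs because of frequency inversion), dead-bands are modelled as a fixed guard width `guard_hz` at interior band edges only, and all costs are in arbitrary units per output sample.

```python
def allocate(f_min, f_max, fs=44100.0, K=3, guard_hz=0.0):
    """Return (k, l): the deepest level k and frequency-ordered sub-band
    l = 1..2**k that contains [f_min, f_max]; (0, 1) is classical AS."""
    for k in range(K, 0, -1):
        width = (fs / 2.0) / (2 ** k)   # sub-band width at level k
        l = int(f_min // width) + 1     # candidate sub-band for f_min
        if l > 2 ** k:
            continue                    # f_min lies above the audio band
        # Guard regions (hypothetical dead-band model) at interior edges only.
        lower = (l - 1) * width + (guard_hz if l > 1 else 0.0)
        upper = l * width - (guard_hz if l < 2 ** k else 0.0)
        if lower <= f_min and f_max <= upper:
            return k, l
    return 0, 1


def speedup(counts, n_fb, c_qmf, c_as=1.0, c_mas=1.0):
    """Eqn. (2); counts maps (k, l) to the oscillator count n_{k,l}."""
    as_cost = c_as * sum(counts.values())
    mas_cost = n_fb * c_qmf + c_mas * sum(
        n * 2.0 ** -k for (k, _l), n in counts.items()
    )
    return as_cost / mas_cost


def alignment_delay(k, K, M):
    """Delay-line length (2**(K-k) - 1) * M, in samples at the level-k rate."""
    return (2 ** (K - k) - 1) * M
```

With f_s = 44.1 kHz and K=3, a partial spanning 440..450 Hz lands in the lowest level-3 sub-band, whereas one spanning 2700..2800 Hz straddles a level-3 boundary and is promoted to level 2. For 80 oscillators at level 3 and 20 at level 2 sharing one filterbank of cost 10, `speedup` returns 4.0; the filterbank term dominates as oscillator counts fall, which is the motivation for pruning sparsely populated sub-bands.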
Polyphase all-pass IIR QMFs are characterised by narrow Δf in comparison to FIRs of the same complexity, and are the best solution for minimising dead-bands [Hart et al, 1993]. A disadvantage is phase non-linearity, which precludes latency normalisation with delay-lines. Indeed, a wider problem with the QMF structure is that H1 filters pass only sidebands of x, requiring non-trivial control of baseband phase to give x the desired phase at y[m]. However, sub-bands s_{0..K,1} comprise a pure cascade of H0 filters, which preserve baseband phase in an FIR implementation, enabling the maintenance of partial phase relationships in the lower audio frequencies.

The QMF MAS strategy relies upon logical exclusion of Δf. Another interesting solution is 'physical exclusion' with an oversampled complex-signalled filterbank. The first MAS proposal employed the technique, without hierarchy [Phillips et al, 1994], but the principle is easily extended to a perfect dead-band free hierarchy. Due to oversampling, and the overheads of complex signal processing, it is more expensive than the QMF approach, but as only H0-style filters are used, complete phase transparency from all sub-bands is facilitated, if at all required.

6. A MAS Coprocessor

A VLSI MAS coprocessor therefore comprises two processing systems: (i) a multirate oscillator bank (MOB) and (ii) a filterbank processor (FBP). Figure 3 illustrates the architecture under consideration, which uses a memory hierarchy external to the host to contain AS control data proliferation. It is a multiprocessor system using a shared memory (SM) model of interprocessor communication; the SM is single-port, so the MOB and FBP share a multiplexed bus. Though the architecture is applicable to AS, MAS minimises the control bandwidth taken by the critical MOB-SM bottleneck. With eqn. (2) expressed in terms of SM bandwidth, tentative simulations of orchestral synthesis with real music, arrangements and timbres [Sandell, 1994] determine a baseline speedup of about r ≈ 3. The maximum number of oscillators envisaged is N = f_mob/f_s, where f_mob is the MOB clock rate. For example, with f_mob = 50 MHz and f_s = 44.1 kHz, N = 3400; f_mob/f_s alone is about 1134 full-rate updates per sample period, so this figure is consistent with the baseline speedup r ≈ 3 being included.

[Figure 3. MAS Coprocessor. Block diagram labels: host CPU; MAS coprocessor containing the MOB and the FBP with its output; shared memory holding oscillator descriptors, breakpoint data, summed oscillator sets and filterbank state.]

The SM is memory mapped into the host address space so that the host may write piecewise linear breakpoint data directly into oscillator descriptors (SM size is 2^18..2^20 words for sufficient descriptor size and number). Each descriptor comprises oscillator state variables and breakpoint caches for A_i[n] and F_i[n]. The MOB sequentially fetches, updates and writes descriptors. The SM bandwidth required is very high, but by quantising breakpoints to a multiple of oscillator sample rate (cf. IFFT AS synthesis), oscillators are updated in bursts (determined as b=12 cycles for a 32-bit SM) and interleaved with SM fetch / write-back such that the MOB operates at maximum throughput. sin(Φ_i[n]) is computed by linear interpolation: a pipelinable algorithm requiring a small ROM and multiplier. Descriptors are stored in a linked list to facilitate resource allocation, and to enable descriptors in a set allocated to a particular sub-band to be contiguous. This permits a concurrent burst accumulation of the set, which is buffered on completion via SM to the FBP.
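The linearly interpolated sine lookup mentioned above can be sketched as follows. This is a floating-point model under assumed parameters: the 256-entry table, the guard entry and the phase expressed in turns are choices of this sketch, whereas a hardware design would more plausibly use fixed-point arithmetic and could exploit quarter-wave symmetry to shrink the ROM.

```python
import math

TABLE_BITS = 8                 # assumed ROM address width for this sketch
TABLE_SIZE = 1 << TABLE_BITS   # 256 entries per cycle
# One extra guard entry so that index + 1 never runs off the end of the table.
SINE_ROM = [math.sin(2.0 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE + 1)]


def sine_lookup(phase_turns):
    """Approximate sin(2*pi*phase_turns) by linear interpolation in SINE_ROM.

    phase_turns is the oscillator phase in cycles; only its fractional
    part matters, mirroring a wrapping phase accumulator.
    """
    pos = (phase_turns % 1.0) * TABLE_SIZE
    index = int(pos)           # ROM address (upper phase bits)
    frac = pos - index         # interpolation weight (lower phase bits)
    lo = SINE_ROM[index]
    hi = SINE_ROM[index + 1]
    return lo + frac * (hi - lo)   # one multiply per output sample
```

With 256 entries the worst-case interpolation error is a little under 1e-4 of full scale, roughly 80 dB below the peak; each extra address bit reduces the error by about a factor of four, which is the trade-off between ROM size and oscillator purity.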
Sub-band sets associated with a particular filterbank are also required to be contiguous to facilitate scheduling, which is driven by the FBP executing filterbanks 'round-robin' at a fixed frame rate with a period of b/(f_s 2^-K). QMF topology is host-configurable and is executed depth-first by the FBP, with caching of intermediate results before generation of the final output to a DSP post-processor. QMF state is saved via SM to minimise on-chip RAM. The MOB is loosely coupled to the FBP, but is synchronised at frame level using a 'busy-waiting' algorithm to ensure mutually exclusive access to the SM buffers. Multirate operation is implemented by extending MOB bursts to b·2^(K-k) cycles to generate the higher sample rates for levels k=K-1..0.

The MOB has lowest priority access to the SM as it may operate with a degree of asynchronicity to the FBP. It can be blocked by the host and FBP, which require a much lower SM bandwidth. Predicted MOB throughput is not seriously degraded as a result. The FBP has a hard real-time schedule and therefore has priority over the host, as bus conflicts are infrequent. Mutually exclusive host / MOB descriptor access is ensured via an MOB-driven interrupt to the host, making worst-case system latency from e.g. host MIDI event to FBP output 3 frames + filterbank latency.

7. Summary

The proposed MAS algorithm, using a QMF sub-band hierarchy allied to an optimised coprocessor, offers a substantial performance improvement over a traditional oscillator bank without compromising the fundamental generality of AS. However, further research is required in two areas: (i) a methodical investigation of the statistics of eqn. (2) to refine aspects of the MAS algorithm, and (ii) a coprocessor simulation to verify its design and identify areas for further optimisation. It is hoped that MAS is a contribution of interest to the ongoing debate about AS and spectral modelling.

References

[Goodwin & Kogon, 1995] M. Goodwin & A. Kogon, "Overlap-Add Synthesis of Nonstationary Sinusoids", Proc. International Comp. Music Conf., pp. 355-356, 1995.
[Hart et al, 1993] J. Hart, P. Naylor & O. Tanrikulu, "Polyphase Allpass Structures for Sub-band Acoustic Echo Cancellation", EUROSPEECH'93, Berlin, Germany, Vol. 3, pp. 1813-1816.
[Phillips et al, 1994] D. Phillips, A. Purvis & S. Johnson, "A Multirate Optimisation for Real-Time Additive Synthesis", Proc. International Comp. Music Conf., pp. 364-367, 1994.
[Rodet & Depalle, 1992] X. Rodet & P. Depalle, "A new additive synthesis method using inverse Fourier transform and spectral envelopes", Proc. International Comp. Music Conf., pp. 410-411, 1992.
[Sandell, 1994] G. J. Sandell, "SHARC Timbre Database", release 0.90, Sussex University, UK, November 1994.
[Serra & Smith, 1990] X. Serra & J. Smith, "Spectral Modelling Synthesis: A Sound Analysis System Based on a Deterministic plus Stochastic Decomposition", Comp. Music Jnl., vol. 14, no. 4, pp. 12-24, 1990.

The principal author would like to thank his co-authors and EPSRC for their support during his Ph.D. study (1992-95) at the School of Engineering, University of Durham, UK.