Critically Sampled Third Octave Filter Banks

Scott N. Levine
Stanford University
Center for Computer Research in Music and Acoustics (CCRMA)
scottl@ccrma.stanford.edu
http://www-ccrma.stanford.edu

Abstract

This paper introduces the design of a critically sampled, third octave filter bank. Filter design methods are shown for the octave band filter bank and the third octave sections. In addition, the design trade-offs among frequency selectivity, regularity, complexity, and memory are explained.

1 Introduction

The front end of most current audio data compression algorithms uses some sort of time-frequency representation to transform the time domain signal into a domain more closely matched to the human ear. One accepted model, often called the critical band model, shows how signals psychoacoustically mask one another as long as they are within a critical band [Zwicker, 1990]. A close approximation to the critical bands of hearing is a third octave filter bank, which is designed in this paper. Thus, if a signal were bandlimited within a critical band and then quantized, the resulting quantization noise would remain within the band and would be perceptually masked by the original signal. This is why tight frequency selectivity becomes an important parameter to optimize later in the paper.

To implement this third octave filter bank, the system first decomposes the input signal into octaves. Then, the octave band signals are further split into third octave sections. Along the way, we discuss various filter design methods, along with the resulting trade-offs.

2 Octave Band Filter Banks

The first step in designing a critically sampled third octave filter bank is to split the input signal into critically sampled octaves. A filter bank is critically sampled if the data rate of the original input signal is equal to the sum of the data rates of the transform subbands.
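As a rough illustration of this definition, the following Python sketch splits a signal into octave bands by repeatedly filtering and halving the lowpass branch, then checks that the subband sample counts add up to the input length. The two-tap Haar filter pair is an assumption chosen only for brevity; it is not one of the filters designed in this paper, and the names `haar_split`, `octave_analysis`, and `subband_rates` are illustrative.

```python
from fractions import Fraction
import math


def haar_split(x):
    """One two-channel stage: lowpass/highpass filtering, then downsampling by two.

    Uses the orthonormal two-tap Haar pair (an illustrative assumption).
    """
    s = 1 / math.sqrt(2)
    low = [s * (x[2 * i] + x[2 * i + 1]) for i in range(len(x) // 2)]
    high = [s * (x[2 * i] - x[2 * i + 1]) for i in range(len(x) // 2)]
    return low, high


def octave_analysis(x, num_octaves):
    """Iterate the split on the lowpass branch.

    Returns [high_1, ..., high_K, low_K] for a K octave decomposition.
    """
    subbands = []
    low = list(x)
    for _ in range(num_octaves):
        low, high = haar_split(low)
        subbands.append(high)
    subbands.append(low)
    return subbands


def subband_rates(num_octaves):
    """Data rate of each subband relative to the input rate."""
    rates = [Fraction(1, 2 ** k) for k in range(1, num_octaves + 1)]
    rates.append(Fraction(1, 2 ** num_octaves))  # final lowpass band
    return rates


x = [float(n % 7) for n in range(64)]
bands = octave_analysis(x, 5)
print([len(b) for b in bands])  # [32, 16, 8, 4, 2, 2]
print(sum(subband_rates(5)))    # 1
```

The subband lengths sum to the 64 input samples, and the relative rates sum to exactly one, which is the critical sampling condition stated above.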
Octave band filter banks are also known in the literature as tree-structured filter banks or discrete-time wavelet transforms (DTWT). In an octave band decomposition, the input signal is first split into a lowpass and a highpass signal, and both filtered signals are downsampled by two. The lowpassed, downsampled output is then iterated through the same filtering process. For a K octave filter bank, this two channel filtering process is repeated K times. See figure 1 for an example of a two octave structure.

Figure 1: A two octave analysis filter bank

The optimal $H_0(z)$ would be a brick wall lowpass filter with a cutoff frequency of $\pi/2$. Since such a filter is unrealizable, we must make several trade-offs among frequency selectivity, regularity, and filter length.

2.1 Frequency Selectivity

For the sharpest transition between passband and stopband regions (the tightest frequency selectivity), we first try designing a lowpass filter using the Remez exchange algorithm. However, for an FIR filter of a given length, the smaller the transition region (the tighter the frequency selectivity), the higher the equiripple sidelobe level rises. As can be seen in the filter's magnitude responses in figure 2, the aliasing noise floor is at -70 dB and the transition width is approximately $0.07\pi$, for $L = 64$. We attempt to reduce the transition width while satisfying the QMF flatness condition, $|H_0(e^{j\omega})|^2 + |H_0(e^{j(\omega+\pi)})|^2 = 1$, within some error tolerance. In this design, the error ripple was bounded by ±0.05 dB.

ICMC Proceedings 1996 301 Levine
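The QMF flatness condition of section 2.1 can be checked numerically for any candidate lowpass filter. The sketch below is an illustration, not the design procedure used in this paper: it evaluates $|H_0(e^{j\omega})|^2 + |H_0(e^{j(\omega+\pi)})|^2$ on a frequency grid and reports the worst deviation from unity in dB. The two test filters (a four-tap Daubechies filter scaled to unit DC gain, and a three-tap binomial filter) and the function names are assumptions made for the example.

```python
import cmath
import math


def freq_response(h, w):
    """Evaluate H(e^{jw}) = sum_n h[n] e^{-jwn} for an FIR filter h."""
    return sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h))


def flatness_error_db(h, num_points=256):
    """Largest deviation, in dB, of |H0(w)|^2 + |H0(w + pi)|^2 from 1."""
    worst = 0.0
    for k in range(num_points):
        w = math.pi * k / num_points
        total = (abs(freq_response(h, w)) ** 2
                 + abs(freq_response(h, w + math.pi)) ** 2)
        worst = max(worst, abs(10 * math.log10(total)))
    return worst


r3 = math.sqrt(3)
# Four-tap Daubechies lowpass filter, scaled so that the taps sum to one.
daub4 = [(1 + r3) / 8, (3 + r3) / 8, (3 - r3) / 8, (1 - r3) / 8]
# Three-tap binomial lowpass filter, which is not power complementary.
binomial3 = [0.25, 0.5, 0.25]

print(flatness_error_db(daub4) < 1e-9)        # True
print(round(flatness_error_db(binomial3), 2))  # 3.01
```

A design such as the one described above would be accepted if `flatness_error_db` returned at most 0.05 for its lowpass filter; the binomial filter misses that tolerance by about 3 dB at $\omega = \pi/2$.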

Figure 2: Magnitude responses of $H_0(z)$ and $H_1(z)$

2.2 Regularity

Loosely, a filter bank with regular filters will have a smooth impulse response for the lowest frequency subbands. That is, most of the lowest subbands' energy will be confined to the low frequencies. Daubechies has shown that a sufficient condition for a filter to be regular is having some number of zeros at $\pi$ [Daubechies, 1988]. If a filter is of length 2N and there are N zeros at $\pi$, then this becomes a maximally flat, regular, Daubechies filter.

The effects of regularity in the frequency domain can be seen in figure 3. The lower plot corresponds to the magnitude response of the lowest subband of a DTWT using the Daubechies maxflat filter, which corresponds to its scaling function. Notice how the energy for this subband is bounded by $0.1\pi$. For the upper plot, which corresponds to the Remez exchange designed lowpass filter (which has no regularity), the scaling function has higher frequency sidelobes of equal height. The longer the Remez exchange filter, the lower the sidelobes will be. Both plots were calculated by iterating length 32 filters.

Figure 3: The magnitude responses of Remez exchange (top) and Daubechies maxflat (bottom) iterated filters. Top panel: scaling function of a Remez exchange filter. Bottom panel: scaling function of a Daubechies maxflat filter. Horizontal axis: frequency [$\pi$ radians].

Unfortunately, this high degree of regularity of the maxflat Daubechies filter reduces its frequency selectivity, which only has a transition bandwidth on the order of [...]. This filter can be contrasted with the Remez exchange filter designed in the previous section, which has no zeros at $\pi$ but has good frequency selectivity and a transition bandwidth on the order of [...]. In conclusion, the trade-off in this DTWT filter design is between high frequency selectivity and high regularity. A filter designed with the Remez exchange algorithm will have high frequency selectivity, but will also have sidelobes in the subbands' iterated filters. The filter with high regularity has poor frequency selectivity, but has very small sidelobes in the iterated filters.

The design chosen in this system was to generate long Remez exchange filters (64 taps), such that the iterated filters' sidelobes were pushed down 70 dB. Thus, we maintain high frequency selectivity, while the sidelobes are low enough to be considered the noise floor for the application. It will be seen in section 3.2 that the noise floor for the third octave sections will also be placed 70 dB down, thus matching the characteristics of the DTWT filters.

3 Third Octave Filter Banks

Once the signal has been split into critically sampled octaves, each octave is split into third octave sections. These third octave sections, which can be thought of as uniform bandpass filters on a logarithmic frequency axis, unfortunately have irrational bandwidths. For example, the lowest third octave section (in normalized frequency) has width $r_0 = 2^{1/3} - 1$. The middle section has width $r_1 = 2^{2/3} - 2^{1/3}$, and the highest section has width $r_2 = 2 - 2^{2/3}$.

The topic of constructing perfect reconstruction filter banks with more general rational sampling factors was introduced more recently [Kovačević and Vetterli, 1993]. In the same paper, the authors mention that third octave sections could be rationally approximated by the factors [1/4, 1/3, 5/12]. Notice that these factors add to unity, which guarantees critical sampling. The paper states two different methods of realizing filter banks with arbitrary rational sampling rates: direct and indirect. The direct method is simpler in complexity and design, but only works for certain rational rates. Unfortunately, the previous third octave fractional sampling rates make this method impossible due to aliasing problems. The indirect method works for any rational rates, but produces frequency shuffling in certain cases. Shuffling denotes the process in which a part of the signal's spectrum is translated to another part of the spectrum. In order to avoid spectral shuffling, the rates [6/24, 8/24, 10/24] will now be used.

3.1 Indirect Method

The indirect design, as stated in [Kovačević and Vetterli, 1993], first splits a signal into a number of subbands equal to the lowest common denominator of the three third octave sections. In this case, there will be 24 subbands. Then, three groups of subbands will

be recombined to form the three third octave sections. In this design, groups of 6, 8, and 10 subbands will be recombined to form the three output signals. This scheme is shown graphically in figure 4.

Figure 4: The indirect method of generating third octave sections

Each of the shaded boxes in figure 4 is a uniform filter bank. The tall box on the left is a 24 channel uniform analysis filter bank, while the three shorter boxes on the right are uniform synthesis filter banks of 6, 8, and 10 channels, respectively. Therefore, non-uniform filter banks can be designed from a cascade of uniform banks while maintaining critical sampling.

3.2 Filter Design

The next design choice is how to design the uniform filter banks shown in figure 4. Because of their low complexity and simple design, we chose to implement pseudo-QMF cosine modulated filter banks for all the uniform filter banks in this structure. For these filter banks, one needs to design only a single lowpass prototype filter, and all other filters are cosine modulations of this prototype. The prototype lowpass filter for the 8 channel uniform cosine modulated filter bank was designed using the method of [Nguyen, 1994]. The prototype filters for the 6, 10, and 24 channel filter banks were all sample rate converted from this single 8 channel lowpass prototype [Cox, 1986]. The ratio of the prototype filter length to the number of filter bank channels (also known as the overlap factor) is kept constant at 8 in this design. The magnitude response of this third octave section can be seen in figure 5. Notice that at the third octave filter bank boundaries, there is a small amount of ripple, on the order of 1 dB. In most applications, this should not be a problem.

Figure 5: Third octave section magnitude response, for a single octave

With all the pieces now in place, they can be combined to form a critically sampled, third octave filter bank. Initially, we begin with a five octave DTWT structure similar to the one pictured in figure 1. On the output of each of the rightmost downsamplers, an individual third octave section, as in figure 4, is placed. An example of the system design of a three channel, two octave case can be seen in figure 6. The magnitude response of the complete five octave filter bank can be seen in figure 7.

Figure 6: Three channel, third octave, critically sampled analysis filter bank

Figure 7: Third octave filter bank magnitude response

4 Complexity

While this third octave structure may seem cumbersome, all of the building blocks can be implemented around FFTs and DCTs, which reduces the complexity. First, the complexity of the third octave sections will be discussed, then that of the DTWT.

4.1 Third Octave Complexity

The complexity of splitting many critically sampled octave band signals into third octave signals is independent of the number of octaves present. In fact, it is

equivalent to the complexity of splitting a single full-rate signal into third octave sections. The complexity of a single PQMF cosine modulated filter bank (analysis or synthesis), $C_{QMF}(M, N)$, is on the order of

$$C_{QMF}(M, N) = \frac{N + M \log M}{M},$$

where N is the length of the lowpass prototype filter and M is the number of channels in the filter bank. The M in the denominator is due to the decimation by M, N is the cost of the FIR convolution, and $M \log M$ is the cost of the cosine modulation using fast DCTs.

In the forward third octave splitter, as pictured in figure 4, there is not just one QMF bank, but four. The leftmost bank splits the input into 24 signals, each downsampled by 24. To compensate for this different sampling rate, the complexities of the three synthesis filter banks must be scaled by a factor of $M_i/24$, $i \in \{low, mid, hi\}$, where $M_i$ is the number of channels in synthesis bank $i$. For example, $C_{QMF,low} = \frac{6}{24} C_{QMF}(6, 6l)$, where $l$ is the overlap factor defined in section 3.2 ($l = 8$ in this design). Therefore, the total third octave complexity for all octaves is:

$$C_{third} = C_{QMF} + C_{QMF,low} + C_{QMF,mid} + C_{QMF,hi} = C_{QMF}(24, 24l) + \frac{6}{24} C_{QMF}(6, 6l) + \frac{8}{24} C_{QMF}(8, 8l) + \frac{10}{24} C_{QMF}(10, 10l)$$

In order to synthesize the third octave signals back into the original, the same complexity is required. In this system, the total complexity for analysis and synthesis is 47 operations per input sample.

4.2 DTWT Complexity

A convenient fact about the complexity of an octave band filter bank such as the DTWT is that it is bounded by $2C_{DTWT}$, where $C_{DTWT}$ is the complexity involved in one stage of highpass filtering, lowpass filtering, and decimation by two. The bound follows because each successive stage runs at half the rate of the previous one. Using tricks such as polyphase filtering and FFT overlap-add convolutions, the complexity for a K octave forward and reverse DTWT is on the order of $16 \log L \, (1 - 2^{-K})$, using length L FIR lowpass and highpass filters. In this design, the DTWT complexity amounts to 95 operations per input sample.
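The third octave complexity figure of section 4.1 can be reproduced with a few lines of arithmetic. The sketch below assumes the logarithm in $C_{QMF}$ is base 2 (the text only writes log M) and uses the channel counts 24, 6, 8, and 10 with overlap factor $l = 8$ from this design; the function names are illustrative.

```python
import math


def c_qmf(num_channels, prototype_length):
    """Per-sample cost of one pseudo-QMF bank: (N + M log2 M) / M.

    The base-2 logarithm is an assumption; the text only writes log M.
    """
    m, n = num_channels, prototype_length
    return (n + m * math.log2(m)) / m


def third_octave_complexity(overlap=8, total_channels=24, groups=(6, 8, 10)):
    """One-direction cost per input sample of the indirect third octave splitter."""
    # 24 channel analysis bank running at the full input rate.
    cost = c_qmf(total_channels, total_channels * overlap)
    # Each synthesis bank is scaled by M_i / 24 to account for its lower rate.
    for m in groups:
        cost += (m / total_channels) * c_qmf(m, m * overlap)
    return cost


one_way = third_octave_complexity()
print(round(one_way, 1))    # 23.6
print(round(2 * one_way))   # 47
```

Under these assumptions, doubling the one-direction cost to cover both analysis and synthesis gives roughly 47 operations per input sample, in agreement with the figure quoted in section 4.1.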
5 Memory Requirements

In this system, memory is required in two places: the delay lines inserted into the DTWT to ensure subband time synchronization, and the FIR filters inside both filter banks. For the DTWT, the memory needed for the inserted delay lines is $(2^K - (K + 1))[L + D_{third}]$. Notice that it increases exponentially with the number of octaves K, and is also linearly dependent on $D_{third}$, the third octave latency. In this design, $D_{third} = 336$ samples, and the delay line total is 10,374 samples. The memory required by the lowpass and highpass filters for the DTWT is $4L(K + 1)$, which is 1,260 samples for this system.

For the third octave sections, memory is needed for the polyphase filtering within the QMF banks. A cosine modulated QMF analysis bank with a length N prototype lowpass filter requires N elements of delay, while a synthesis bank requires twice as much memory, 2N samples. For K octaves, it can be shown that this memory tallies to $6 \cdot 24l(K + 1)$, which amounts to 6,912 samples. Thus, summing all of the previous sources of memory, the entire system requires 18,546 samples of memory.

6 Conclusion

This paper introduced the critically sampled, third octave filter bank. This structure closely models the critical bands of human hearing, at the expense of higher complexity and memory requirements. Filter design methods were shown for both the octave and third octave filters, relating regularity to frequency selectivity.

Acknowledgments

The author would like to thank Dana Massie and the Joint E-mu/Creative Technology Center for the original research project idea, along with their support.

References

[Zwicker, 1990] E. Zwicker and H. Fastl. Psychoacoustics, Berlin: Springer-Verlag, 1990.
[Daubechies, 1988] I. Daubechies. Orthonormal Bases of Compactly Supported Wavelets, Commun. on Pure and Applied Math., 41:909-996, Nov. 1988.
[Kovačević and Vetterli, 1993] J. Kovačević and M. Vetterli.
Perfect Reconstruction Filter Banks with Rational Sampling Factors, IEEE Transactions on Signal Processing, Vol. 41, No. 6, June 1993.
[Nguyen, 1994] T.Q. Nguyen. Near-Perfect-Reconstruction Pseudo-QMF Banks, IEEE Transactions on Signal Processing, Jan. 1994, pp. 65-76.
[Cox, 1986] R. Cox. The Design of Uniformly and Nonuniformly Spaced Pseudoquadrature Mirror Filters, IEEE ASSP, Vol. 34, No. 5, October 1986.