Page  26 ï~~Implementation of an analysis/synthesis system on a DSP56001 for general purpose sound processing Rupert C. Nieberle and Michael Warstat Skalitzer Str. 62, D-1000 Berlin 36 Technical University of Berlin, EE Department, D-1000 Berlin 10 nieberle opal.cs. t u-berlin.de Abstract The following paper discusses the implementation of the digital phase vocoder on a DSP56001 system. The digital phase vocoder represents an analysis-synthesis system providing separation from frequency and time domain information using short-time Fourier transformation (STFT) and a special conversion technique. Due to this technique the system offers several possibilities of parameter modification of the input signal such as pitch transpose and time scaling. If no parameter modification is performed the system acts as an identity system. I Theory 1.1 Analysis The analysis part of the system can he regarded as an implementation of a digital bandpass filterbank using STFT to represent the input signal. Analysis is performed using a weighted overlap-add structure and the time aliasing technique. The input signal is first weighted by an appropriate window function by convolution. The window function is designed by multiplying a Kaiser window' wKAIS(n) with a si-function h(n). The resulting analysis window has the form: hANA(n) = h(n). WKAIS(n). (1) The si-function is the time domain representation of a lowpass filter in the frequency domain. The lowpass filter is used to separate a single spectral band of the bandpass filterbank. The analysis technique is to modulate each band of the filterbank back into the baseband thus being able to decimate the sample rate. Sample rate decimation is performed by skipping M samples between successive analysis blocks. For ideal filters the maximum value of M can be set to K. For the present case of non-ideal filters M should be set to a somewhat lower value. We found out that M = K/2 is a suitable maximum value for M. The analysis window is divided into I groups of K Samples where K is the number of channels. If ngroups denotes the number of blocks of K samples in one half of the symmetric window then the window length N for an odd number of samples is: N = 2.ngroups. K + 1. The weighted input sequence for a block of N samples with block index m is of the form: (N-1)/2 ym(f) Z hANA(-n)"x(n+ mM). (2) Time aliasing the weighted input sequence means to add all corresponding samples of each group thus creating an appropriate input sequence with length K for the STFT. Therefore the time aliased sequence has the form: ngroups- 1 K- I YrE) i Z mfl.(3) This transformation is performed using a FFT algorithm. The resulting short-time spectrum is: K-1I' Xk(m)- = ym~(r)" WKkr with k-=O,l1,..., K-i1 and WK - ej2/K. (4) r01 The modulation of the higher frequency bands back into the baseband is automatically realized by multiplying the time aliased input sequence ym (r) with the sine and cosine coefficients of the FFT algorithm represented by Wffkr.After calculating the first block (m - 0) the analysis window is shifted forward M samples to calculate the next block (m + 1). 1For our purposes the Kaiser window has prooved to be best, but for other applications any other window function like Hanming, Hanning, etc. can be chosen. 26

Page  27 ï~~I THEORY 1.2 Frequency computation The instantaneous frequencies for each channel are computed by converting the real and imaginary outputs of the FFT into polar coordinates. This is done using the well known formulas: A -)= R()+ 2n)(5) for calculating the magnitude A(n) and -(n) arctan [n)](6) for calculating the phase o(n). Instantaneous frequency can be calculated using the first order approximation of the first derivative of the phase: do(n) AO(n) =O (n) - 0(n - 1). Therefore the instantaneous frequency components are: f,.(n) = dt i A(n)- arctan (n)J -arctan [(n-1) J (7) Eq. 7 means to substract the phase values of successive blocks in the corresponding channels. Before calculating the above parameters, one additional step has to be considered. Due to the analysis process M samples have been skipped between successive blocks. Regarding the calculated phase values as the differences in phase between successive samples of the input sequence a large error might occur. The calculated differences do not relate to successive input samples but to samples M spaces apart. For the magnitude values this is usually not very important because they normally change slowly. Phase values might change very rapidly in contrast. So if the calculated phase differences are regarded as "true" values, significant phase changes might be lost, leading to erroneous instantaneous frequencies. Prior to converting to polar coordinates an interpolation to a somewhat higher sample rate has to be performed. If the original sample rate is denoted srate the interpolation leads to a higher rate srate. Q/M. Sirnilarily the number of skipped samples M can be lowered alternatively. The phase vocoder provides information about the difference between the center frequency of a given channel and the frequency of the component found in that channel. Therefore only the amount of change in the difference in phase between successive samples in that channel is considered. Frequency values are determinated adding the corresponding center frequency 2irk/K of channel k to the phase differences in that channel. This means that to determine the frequency at a given point in time you have to sum all the preceding phase differences and add it to the center frequency of that channel. In analogy to an oscillator, the amount of change in phase difference equals the change of increment in a wavetable-lookup oscillator. So if the change of phase difference for a given channel is zero for all samples, the frequency in that particular channel is exactly equivalent to the corresponding center frequency. 1.2.1 Refinements to the conversion process Before calculating the final phase differences some special cases have to be considered. If the magnitude of a component in a given channel equals zero, there is no sense in calculating the frequency at all. Therefore the frequency value is set to zero in that case. Because of the structure of the analysis process, all phase values are bandlimited to -fmaz/K 0(7n) _ fmar/K. if the amplitude envelope of the channel signal changes sign, large phase jumps may occur. These jumps are so called glitches. If a glitch occurs the phase difference between adjacent samples will probably have a higher value as under normal circumstances. In fact an additional phase jump of =Â~=r will take place. To compensate for this change the calculated phase value for the component is limited to the allowed bounds. This process is called phase unwrapping. Whenever the absolute phase difference is greater than <r2,:!=r is added to the calculated phase value until it lies in the allowed bounds. To adjust the envelope the sign of the magnitude of the component is toggled each time r is added or substracted[3]. Finally the calculated phase differences have to be divided by M/Q because all values were determined for the decimated sample rate state. Q/M. 27

Page  28 ï~~2 IMPLEMENTATION 1.3 Synthesis Prior to resynthesizing, the interpolation of the calculated values to the original sample rate has to be performed Furthermore the center frequencies Wk have to be calculated: 27rk Wk- -. (8) According to the preceding section a phase value for each sample in each channel has to be determined: 4k(n)=4k(n - 1) Â~+Ptr [8k(n) +;'k1.(9) The phase value is recursively calculated by summing up all preceding phase values and adding the center frequency of the channel. To start the calculation, the values for magnitude and phase have to be initialized. Each channel is supposed to start at its center frequency with a magnitude of 1.0. Therefore the initialization is: A(-1) = 1.0 and Ck(-1)=-wk" Pr. Ck(-1) has to be initialized with negative sign. The center frequency of each channel is added each time eq. 9 is calculated. According to the oscillator analogy, to start at the center frequency means to output the first sample in the wavetable. If 4k(-1) were initialized with positive sign the center frequency would be added twice thus starting the output with the second sample. In analogy to the oscillator Ok(n) + Wk corresponds to the increment in the wavetable lookup. The factor Ptr is the pitch transpose factor and is described in the next section. The resynthesized samples are then given by: K/2 i(n) = Z_.[s(k) " Ak (n)COs Fk (n)]. (10) k=O The factor s(k) is used for scaling purposes. Due to the fact of processing only real input signals and the resulting symmetries, half of the FFT output can be skipped. The information in channel 1 is identical to that in channel (K/2) + I and so on. Only channel 0 and K/2 contain exclusive information. Therefore the information in channel I up to (K/2) - 1 counts twice. So s(k) is set accordingly: 01 for k-,K/2 s)- 2 else 1.4 Parameter modification Modification of the time varying parameters can easily be done. Alteration of the pitch of the input signal is performed by changing pt,. to an appropiate value. For ps,. = 1.0 no modification of pitch is taking place. Setting pt, = 2.0 transposes the input signal up by one octave. As can be seen in the preceding section, ptr affects the center frequencies and the calculated instantaneous phase values. For pt, = 2.0 all values are doubled thus transposing the signal one octave up. Appropiate values for pt, will transpose the signal by the desired amount. Time scaling is performed by interpolating less or more samples prior to the synthesis algorithm thus changing the number of samples to be resynthesized and therefore the length of the output signal respectively. 2 Implementation For computational efficiency a so-called Real FFT algorithm was used thus calculating two input blocks with one FFT algorithm instead of filling the imaginary inputs with zeros for each block. So the analysis procedure computes two input blocks for the FFT algorithm. After finishing the analysis, interpolation to an intermediate sample rate is performed. The next step is to calculate the magnitudes and phase differences due to the tricky conversion process. Prior to synthesis, interpolation to the original sample rate is performed. Finally synthesis is processed by summing up all the channel information for each output sample. After resynthesizing two blocks the program can loop back to the analysis process to compute the next two blocks. 28

Page  29 ï~~REF ERENCES 2.1 Performance Fig. 1 shows an example of the programm's performance. For simulation an AM signal was generated as shown in Fig. 1(a). The generated signal consists of 384 samples. The parameters of the programm were chosen as follows: ngroups = 3, K = 32 M = K/8. Because of the relatively low value of M no interpolation after analysing the signal needed to be performed. Fig. 1 (b) shows the resynthesized output signal. The input signal was time scaled to 75% and pitch transposed one octave up simultaneously. As can be seen the programm performs quite satisfactorily. I- (a). (b) Figure 1: Generated input signal (a) and performed output signal (b) References [1] Michael - R. Portnoff: "Implementation of the digital phase vocoder using the FFT"; IEEE Trans. ASSP, Vol. ASSP-24, No. 3, June 1976 [2] Ronald E. Crochiere & Lawrence R. Rabiner: "Multirate digital signal processing"; Prentice-I-hall, Inc., Englewood Cliffs, New Jersey 07632, 1983 [3] John Gordon & John Strawn: "An introduction to the phase vocoder"; CCRMA, Department of Music, Stanford University, February 1987 29