REAL-TIME BROADBAND NOISE REDUCTION

Robert Hoeldrich and Markus Lorber
Institute of Electronic Music Graz
Jakoministrasse 3-5, A-8010 Graz, Austria
email: robert.hoeldrich@mhsg.ac.at

Abstract

A real-time implementation of a broadband noise reduction method based on non-linear spectral subtraction is proposed. To prevent processing distortions in the form of musical noise, over-subtraction is applied to the degraded signal spectrum. Furthermore, time-averaging is used to reduce the variance of the estimated signal-to-noise ratio (SNR). A masking threshold obtained by spectral smoothing leads to a further reduction of audible processing distortions.

1 Introduction

Old audio recordings (e.g. on tape, record or wax cylinder) are corrupted by different signal degradations. Digital signal processing (DSP) is used for restoration, where each kind of degradation is processed independently. In this paper the emphasis is on broadband noise reduction.

Broadband noise is common to all forms of recording. Usually the noisy signals are processed in the frequency domain using short-time spectral attenuation techniques, where components are attenuated according to their signal-to-noise ratio (SNR). These techniques were first applied to speech signals within the area of speech enhancement [Lim,83], before they were adopted for noise reduction in musical recordings [Vaseghi,92]. Although the restoration of speech signals is closely related to the restoration of musical signals, the emphasis is on different criteria. While intelligibility is the main criterion in speech enhancement, sound quality is the dominant criterion in the restoration of musical recordings [Cappe,95].

One well-known method for broadband noise reduction is based on spectral subtraction ([Boll,79],[Vaseghi,96]). The degraded signal $x(n)$ is modeled as a pure audio signal $s(n)$ with superimposed broadband noise $d(n)$, i.e. $x(n) = s(n) + d(n)$.
The signal degradation is processed in the frequency domain using the short-time Fourier transform. For each frequency bin $k$ and each frame $m$, a specific amount of the noise spectrum $|D(k)|^b$ is subtracted from the short-time spectrum $|X(m,k)|^b$ ($b=1$ for magnitude subtraction and $b=2$ for power subtraction). The noise spectrum $|D(k)|$ has to be estimated from a noise-only signal segment. For re-synthesis the denoised signal spectrum $|S(m,k)|$ is combined with the original phase spectrum $\arg[X(m,k)]$.

The spectral subtraction method can be modeled as a time-variant non-linear filter. Its transfer function depends on the signal-to-noise ratio $\mathrm{SNR}(m,k)$, which is estimated from the short-time spectrum $|X(m,k)|$ and the noise spectrum $|D(k)|$. The noise variance within a single frame causes the SNR to be overestimated for some frequency bins. This results in residual noise consisting of short sinusoidal impulses whose frequencies vary from frame to frame. This phenomenon is known as musical noise.

The noise variance can be reduced by using a time-averaged signal spectrum instead of $|X(m,k)|$ ([Boll,79],[Vaseghi,96]). This reduces the musical noise without completely eliminating it. Vaseghi et al. [Vaseghi,92] exploit the behavior of the musical noise for its detection and partial elimination.
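The basic subtraction step described above can be sketched for a single frame as follows. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, the STFT bins are assumed to be given as a list of complex numbers, and clipping negative differences to zero is an assumption that the text does not state.

```python
import cmath


def spectral_subtract_frame(X, D_mag, b=2):
    """Basic spectral subtraction for one short-time frame (illustrative sketch).

    X     : list of complex STFT bins X(m, k) of the degraded frame
    D_mag : list of noise magnitudes |D(k)|, estimated from a noise-only segment
    b     : 1 for magnitude subtraction, 2 for power subtraction
    """
    S = []
    for x, d in zip(X, D_mag):
        diff = abs(x) ** b - d ** b
        # Assumption: negative differences are clipped to zero.
        s_mag = max(diff, 0.0) ** (1.0 / b)
        # Re-synthesis keeps the phase of the degraded signal, arg[X(m, k)].
        S.append(cmath.rect(s_mag, cmath.phase(x)))
    return S


# Usage: three bins with a flat noise estimate of magnitude 1.
frame = [3 + 4j, 1j, 0.5 + 0j]
denoised = spectral_subtract_frame(frame, [1.0, 1.0, 1.0], b=2)
```

Bins whose level is at or below the noise estimate are set to zero here. With real noise, the bins that randomly exceed the estimate survive, and these isolated survivors are the source of musical noise.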

An alternative method to prevent this processing distortion in spectral subtraction is to apply over-subtraction [Berouti,83] to the degraded signal spectrum. According to the estimated SNR, more than the average noise spectrum has to be subtracted. This leads to a strong attenuation of small signal components. If the factor of over-subtraction is high enough, the musical noise will be completely eliminated, but audible distortions in the audio signal can be generated.

The noise suppression rule proposed by Ephraim and Malah ([Ephraim,84],[Ephraim,85]) allows significant noise reduction without causing musical noise. This is mainly due to the use of a non-linear time-averaged SNR [Cappe,94], which exhibits a lower variance than an SNR estimated without averaging.

Furthermore, the masking properties of the human auditory system can be used to reduce processing distortions (e.g. [Tsoukalas,93]).

The proposed method combines spectral over-subtraction with several smoothing strategies in both the time and the frequency domain to reduce the SNR variance. This leads to few audible processing distortions. Section 2 discusses the methods used within the denoising scheme. In section 3, the new noise reduction filter is described.

2 Review of Denoising Filter Methods

2.1 Spectral Subtraction

Berouti et al. [Berouti,83] present a variant of the spectral subtraction method for speech enhancement. An overestimate of the noise magnitude or power spectrum is subtracted from the degraded signal spectrum. In each short-time frame $m$, the amount of over-subtraction depends on the SNR of the incoming degraded signal. A subtraction factor $\alpha(m) > 1$ is determined in order to achieve good noise reduction with few processing distortions. If the resulting denoised signal spectrum $|S(m,k)|$ falls below a minimum level, it is replaced by a scaled version of the input spectrum, $\beta\,|X(m,k)|$. The scaling parameter $\beta$ determines the residual noise floor after re-synthesis.
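The rule described in this section (over-subtraction followed by a spectral floor) can be sketched per frame as shown below. The function and argument names are illustrative assumptions. The subtraction factor is passed in as a plain number, because the text only says that it depends on the frame SNR.

```python
def oversubtract_with_floor(X_mag, D_mag, alpha, beta, b=2):
    """Over-subtraction with a spectral floor for one frame (illustrative sketch).

    X_mag : list of degraded magnitudes |X(m, k)|
    D_mag : list of noise magnitudes |D(k)|
    alpha : subtraction factor for this frame (greater than 1 for over-subtraction)
    beta  : noise floor parameter, 0 < beta << 1
    b     : 1 for magnitude subtraction, 2 for power subtraction
    """
    S_mag = []
    for x, d in zip(X_mag, D_mag):
        diff = x ** b - alpha * d ** b
        # A non-positive difference is treated as zero, so the floor takes over.
        s = diff ** (1.0 / b) if diff > 0.0 else 0.0
        floor = beta * x
        S_mag.append(s if s > floor else floor)
    return S_mag


# Usage: a strong bin, a bin at the over-subtraction limit, and a weak bin.
result = oversubtract_with_floor([10.0, 2.0, 1.0], [1.0, 1.0, 1.0],
                                 alpha=4.0, beta=0.05, b=2)
```

The strong bin is only slightly attenuated, while the two weaker bins are replaced by the scaled input spectrum, which forms the residual noise floor.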
The denoising algorithm is expressed through the following relationships:

$$|\tilde{S}(m,k)| = \left( |X(m,k)|^b - \alpha(m)\,|D(k)|^b \right)^{1/b} \qquad (1)$$

and

$$|S(m,k)| = \begin{cases} |\tilde{S}(m,k)|, & \text{for } |\tilde{S}(m,k)| > \beta\,|X(m,k)|, \\ \beta\,|X(m,k)|, & \text{else.} \end{cases} \qquad (2)$$

$\beta$ is set to $0 < \beta \ll 1$. The remaining noise floor can be used to mask the generated musical noise. Thus the setting of the two parameters $\alpha$ and $\beta$ determines the tradeoff between the amount of residual broadband noise and the level of perceived musical noise. For a fixed value of $\beta$, increasing the value of $\alpha$ reduces both the broadband noise and the musical noise. Increasing $\alpha$ above a certain limit leads to audible distortions because of the strong attenuation of small signal components.

We adopt this denoising algorithm but use a subtraction factor $\alpha(m,k)$, which is calculated for each frequency bin $k$ in each short-time frame $m$. The spectral subtraction equation 1 changes to

$$|\tilde{S}(m,k)| = \left( |X(m,k)|^b - \alpha(m,k)\,|D(k)|^b \right)^{1/b} \qquad (3)$$

The subtraction factor $\alpha(m,k)$ is a function of an estimated signal-to-noise ratio $\mathrm{SNR}_{\mathrm{prio}}(m,k)$ (see figure 1). This function is controlled by the parameter $\alpha_0$, the over-subtraction factor for $\mathrm{SNR}_{\mathrm{prio}}(m,k) = 0$ dB. For magnitude subtraction, $\alpha_0$ should be within the range of 1.8 to 2.5.

[Figure 1: Subtraction factor $\alpha(m,k)$ as a function of $\mathrm{SNR}_{\mathrm{prio}}(m,k)$ in dB; the horizontal axis runs from -5 to 25 dB.]

2.2 Reducing the Noise Variance

Due to the noise variance, the power of the noise components within a single frame can deviate from the estimated mean value. This leads to the musical noise phenomenon in spectral subtraction. A number of methods to reduce the noise variance are suggested in the literature.

Boll [Boll,79] uses a time-averaged signal spectrum

$$\overline{|X(m,k)|} = \frac{1}{M} \sum_{i=0}^{M-1} |X(m-i,k)|$$

instead of the current signal frame $|X(m,k)|$, from which the average noise spectrum is subtracted. This averaging procedure not only reduces the variance of the superimposed noise spectrum $|D(m,k)|$, but also introduces temporal smearing of short transients in the signal spectrum $|S(m,k)|$. Therefore, the averaging is limited to a small number of adjacent frames.

Ephraim and Malah ([Ephraim,84],[Ephraim,85]) use a time-averaged SNR to reduce the influence of the noise variance. Because of a non-linear averaging procedure, this method performs better than the averaging method described previously. To reduce the variance of the SNR, Ephraim and Malah introduce a recursive evaluation scheme for the SNR in the current frame which takes into account information from previous frames:

$$\mathrm{SNR}_{\mathrm{prio}}(m,k) = (1-\eta)\, P[\mathrm{SNR}_{\mathrm{local}}(m,k)] + \eta\, \frac{|S(m-1,k)|^2}{|D(k)|^2} \qquad (4)$$

with $P[x]=x$ for $x>0$ and $P[x]=0$ else. $\mathrm{SNR}_{\mathrm{local}}(m,k) = |X(m,k)|^2 / |D(k)|^2 - 1$ is a signal-to-noise ratio estimated from the data in the current frame $m$.
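The recursion of equation 4 can be sketched for a single frequency bin as follows. The function names and the default value of the averaging parameter are illustrative assumptions. The previous-frame term is taken to be the power of the denoised output of frame m-1, and the noise power is assumed to be strictly positive.

```python
def half_wave(x):
    """P[x] from equation 4: x for x > 0, else 0."""
    return x if x > 0.0 else 0.0


def snr_local(x_pow, d_pow):
    """Local SNR of the current frame: |X(m,k)|^2 / |D(k)|^2 - 1."""
    return x_pow / d_pow - 1.0


def snr_prio(x_pow, s_prev_pow, d_pow, eta=0.95):
    """Time-averaged SNR of equation 4 for one bin.

    x_pow      : |X(m, k)|^2, power of the degraded signal in the current frame
    s_prev_pow : |S(m-1, k)|^2, power of the denoised result of the previous frame
    d_pow      : |D(k)|^2, estimated noise power (assumed > 0)
    eta        : averaging parameter (0 means no averaging)
    """
    return (1.0 - eta) * half_wave(snr_local(x_pow, d_pow)) + eta * s_prev_pow / d_pow


# Usage: a noise-only bin whose power fluctuates around the noise estimate,
# with the previous output assumed to be fully suppressed (s_prev_pow = 0).
noisy_powers = [1.2, 3.5, 0.4, 2.8, 0.9]
local = [half_wave(snr_local(p, 1.0)) for p in noisy_powers]
prio = [snr_prio(p, 0.0, 1.0, eta=0.95) for p in noisy_powers]
```

In this usage example every value in `prio` is the corresponding value in `local` scaled by the weighting factor (1 - eta), which illustrates how random overestimates of the local SNR are damped.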
$|S(m-1,k)|$ is the denoising result of the previous frame. Therefore, the term $|S(m-1,k)|^2 / |D(k)|^2$ is an estimate of the SNR in frame $m-1$. Due to the noise variance, $\mathrm{SNR}_{\mathrm{local}}(m,k)$ is likely to be overestimated for some bins $k$, but its influence on $\mathrm{SNR}_{\mathrm{prio}}(m,k)$ is reduced by the weighting factor $(1-\eta)$.

For small signal components, the variance of $\mathrm{SNR}_{\mathrm{prio}}(m,k)$ is much smaller than the variance of $\mathrm{SNR}_{\mathrm{local}}(m,k)$, as long as the parameter $\eta$ is close enough to 1. For signal component levels well above the noise level, $\mathrm{SNR}_{\mathrm{prio}}(m,k)$ is no longer a smoothed estimate of the local SNR. It just follows $\mathrm{SNR}_{\mathrm{local}}(m,k)$ with a delay of one frame. Using such a time-averaged SNR helps to reduce the musical noise phenomenon even for recordings with non-stationary background noise. For details see [Cappe,94].

The behavior of the smoothed SNR is similar if the Ephraim-Malah suppression rule is replaced by the Wiener suppression rule [Cappe,94]. If the power subtraction rule is used, the attenuation for SNR values around 0 dB is too small. Therefore, $\mathrm{SNR}_{\mathrm{prio}}$ undergoes less smoothing. In our implementation we use over-subtraction. Thus, the attenuation around SNR = 0 dB is high enough and leads to a temporally smoothed signal-to-noise ratio $\mathrm{SNR}_{\mathrm{prio}}(m,k)$. As shown in figure 1, $\mathrm{SNR}_{\mathrm{prio}}(m,k)$ is used to determine the subtraction factor $\alpha(m,k)$.

2.3 Reduction of Processing Distortions using Psychoacoustic Criteria

Tsoukalas et al. [Tsoukalas,93] propose a denoising scheme that takes into account psychoacoustic criteria [Zwicker,90]. Only noise components above an estimated masking threshold are removed from the noisy signal spectrum. Thus the audio signal is barely affected by the denoising process and distortions are reduced.

The simultaneous masking property of the human auditory system can be modeled as a spectral smoothing procedure along a non-linear frequency axis. The masking bandwidth increases with increasing center frequency and depends on the absolute sound pressure level. The shape of the smoothing filter is asymmetric, with a steeper descent towards lower than towards higher frequencies. To determine the masking threshold, the signals are usually transformed from the linear Hz scale to the critical-band-rate scale [Zwicker,90] or to the equivalent rectangular band rate scale [Moore,83].

The present authors use a filter $H_{\mathrm{mask}}$ with a "frequency impulse response" that takes into account some of the auditory masking properties.
The degraded signal spectrum $|X(m,k)|^2$ is filtered in the linear frequency domain (along index $k$). The filter output is

$$|X_{\mathrm{mask}}(m,k)|^2 = H_{\mathrm{mask}}(k) * |X(m,k)|^2 \qquad (5)$$

where $*$ stands for convolution along $k$. The length of the masking filter's "frequency impulse response" increases towards high frequencies, which simulates the increasing masking bandwidth. The non-linear level dependence and the absolute threshold contour are not taken into account.

The smoothed spectrum $|X_{\mathrm{mask}}(m,k)|^2$ is used instead of $|X(m,k)|^2$ to calculate the local SNR. This results in a reduced variance of $\mathrm{SNR}_{\mathrm{local}}(m,k)$ and subsequently of $\mathrm{SNR}_{\mathrm{prio}}(m,k)$ (see equation 4).

3 Proposed Noise Reduction Method

3.1 Denoising Filter

The proposed denoising scheme uses the methods discussed in section 2. Non-linear spectral subtraction is combined with non-linear smoothing strategies in both the time and the frequency domain. The smoothing operations yield SNR estimates with a small variance, which are used to determine the transfer function $H_{\mathrm{Fil}}(m,k)$ of the spectral subtraction filter. The resulting noise reduction method generates only small audible processing distortions.

The spectral subtraction equation 3 can be expressed as a non-linear time-variant filter with a zero-phase frequency response $H_{\mathrm{Fil}}(m,k)$. The denoised signal frame is obtained with

$$S(m,k) = X(m,k) \cdot H_{\mathrm{Fil}}(m,k) \qquad (6)$$

and

$$H_{\mathrm{Fil}}(m,k) = \left( 1 - \alpha(m,k) \left( \frac{1}{\mathrm{SNR}_{\mathrm{Fil}}(m,k) + 1} \right)^{b/2} \right)^{1/b} \qquad (7)$$

The exponent $b$ equals 1 for magnitude subtraction and 2 for power subtraction. The transfer function $H_{\mathrm{Fil}}(m,k)$ depends on the subtraction factor $\alpha(m,k)$ and the signal-to-noise ratio $\mathrm{SNR}_{\mathrm{Fil}}(m,k)$.

The local SNR is calculated from the data in the current frame $m$:

$$\mathrm{SNR}_{\mathrm{local}}(m,k) = \frac{|X_{\mathrm{mask}}(m,k)|^2}{|D(k)|^2} - 1 \qquad (8)$$

$|X_{\mathrm{mask}}(m,k)|^2$ is a smoothed version of the degraded signal spectrum $|X(m,k)|^2$, which is passed through the masking filter described in section 2.3.

Referring to equation 4, the local SNR is used to calculate the two time-averaged signal-to-noise ratios $\mathrm{SNR}_{\mathrm{prio}}(m,k)$ (eq. 4) and $\mathrm{SNR}_{\mathrm{Fil}}(m,k)$:

$$\mathrm{SNR}_{\mathrm{Fil}}(m,k) = (1-\gamma)\, P[\mathrm{SNR}_{\mathrm{local}}(m,k)] + \gamma\, \frac{|S(m-1,k)|^2}{|D(k)|^2} \qquad (9)$$

The amount of time-averaging for each of the two SNRs can be controlled independently by the parameters $\gamma$ and $\eta$ ($\gamma, \eta = 0$ means no averaging).

The subtraction factor $\alpha(m,k)$ in equation 7 is a function of $\mathrm{SNR}_{\mathrm{prio}}(m,k)$. It is obtained via the non-linear function in figure 1.

The denoising filter of equations 6 and 7 is controlled by four parameters:

* The noise floor parameter $\beta$, which determines the remaining broadband noise floor.
* The over-subtraction factor $\alpha_0$ for $\mathrm{SNR}_{\mathrm{prio}}(m,k) = 0$ dB.
* The averaging parameters $\gamma$ and $\eta$.

The block diagram of the proposed noise reduction scheme is shown in figure 2.
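The per-bin processing of this section can be sketched as shown below. Several parts are assumptions rather than the authors' implementation: the curve of figure 1 is not given numerically, so `alpha_from_snr` is a hypothetical piecewise-linear stand-in (the over-subtraction factor at or below 0 dB, falling to 1 at 20 dB); the noise floor is applied as a lower bound on the gain, by analogy with equation 2; and all function names and default parameter values are illustrative.

```python
import math


def half_wave(x):
    """P[x]: x for x > 0, else 0."""
    return x if x > 0.0 else 0.0


def alpha_from_snr(snr_prio, alpha0=2.0, snr_hi_db=20.0):
    """ASSUMPTION: hypothetical stand-in for the curve of figure 1."""
    if snr_prio <= 1.0:  # at or below 0 dB (SNR is a power ratio)
        return alpha0
    snr_db = 10.0 * math.log10(snr_prio)
    if snr_db >= snr_hi_db:
        return 1.0
    return alpha0 + (1.0 - alpha0) * snr_db / snr_hi_db


def filter_gain(alpha, snr_fil, b=1):
    """Gain of the spectral subtraction filter, following equation 7."""
    inner = 1.0 - alpha * (1.0 / (snr_fil + 1.0)) ** (b / 2.0)
    return inner ** (1.0 / b) if inner > 0.0 else 0.0


def denoise_bin(X, x_mask_pow, s_prev_pow, d_pow,
                alpha0=2.0, beta=0.05, eta=0.95, gamma=0.95, b=1):
    """Denoise one complex bin X(m, k); returns S(m, k).

    x_mask_pow : |X_mask(m, k)|^2, spectrally smoothed power (section 2.3)
    s_prev_pow : |S(m-1, k)|^2, power of the previous denoised frame
    d_pow      : |D(k)|^2, estimated noise power (assumed > 0)
    """
    local = half_wave(x_mask_pow / d_pow - 1.0)         # equation 8, rectified
    prev = s_prev_pow / d_pow
    snr_prio = (1.0 - eta) * local + eta * prev         # equation 4
    snr_fil = (1.0 - gamma) * local + gamma * prev      # equation 9
    gain = filter_gain(alpha_from_snr(snr_prio, alpha0), snr_fil, b)
    gain = max(gain, beta)  # ASSUMPTION: floor applied as in equation 2
    return X * gain                                     # equation 6
```

A caller would store the squared magnitude of each returned bin and pass it as `s_prev_pow` for the next frame. Because the gain is real and non-negative, the phase of the degraded signal is preserved.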

[Figure 2: Block diagram of the proposed spectral subtraction filter. Input $x(n)$; one block is labeled "Calculation of the subtraction factor".]

3.2 Implementation

The algorithm was tested with a real-time implementation in MAX¹ on SGI R5000 and NeXT-ISPW. In each implementation, the following parameters were fixed: the window length, the window type (Hanning or Tukey), the FFT length, and the window overlap. The sampling frequency was set to 16 kHz, 22.05 kHz, 32 kHz or 44.1 kHz.

¹ Signal processing software developed by Miller Puckette at IRCAM, Paris.
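The fixed short-time transform parameters can be related to each other with a small helper, sketched here using the values recommended in section 3.3 (zero-padding factor 2, overlap of four Hanning windows). The function names are hypothetical. The exact shape of the Tukey fades (a raised cosine sampled at half-sample offsets) is an assumption, chosen so that two overlapped windows sum to one.

```python
import math


def stft_layout(sample_rate_hz, window_len, overlap_windows=4, zero_pad_factor=2):
    """Derive frame duration, hop size and FFT length from the fixed parameters."""
    return {
        "window_ms": 1000.0 * window_len / sample_rate_hz,
        "hop": window_len // overlap_windows,
        "fft_len": zero_pad_factor * window_len,
    }


def tukey_window(n, fade_len):
    """Rectangular window with cosine fade in and fade out.

    Requires n >= 2 * fade_len. ASSUMPTION: the fade is sampled at half-sample
    offsets so that windows overlapped by fade_len samples sum exactly to one.
    """
    w = [1.0] * n
    for i in range(fade_len):
        ramp = 0.5 * (1.0 - math.cos(math.pi * (i + 0.5) / fade_len))
        w[i] = ramp          # fade in
        w[n - 1 - i] = ramp  # fade out (mirror image)
    return w


# Usage: 1024-point window at 32 kHz sampling frequency.
layout = stft_layout(32000, 1024)

# Two Tukey windows overlapped by the fade length (hop = n - fade_len).
w = tukey_window(16, 4)
overlap_sum = [w[12 + j] + w[j] for j in range(4)]
```

For the Tukey case the hop size is the window length minus the fade length, which is why the amount of overlap depends on the length of the fading portions.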

Each implementation works either with magnitude spectrum subtraction ($b=1$ in equation 7) or with power spectrum subtraction ($b=2$).

3.3 Influence of Parameter Settings

Besides the main filter parameters $\beta$, $\alpha_0$, $\gamma$ and $\eta$, the parameters of the short-time transform (window length, window type, FFT length, and overlap) affect the noise reduction results. The influence of each parameter is briefly discussed in the following.

Window Length: The window length must be chosen to ensure a good frequency resolution and to prevent smearing of signal transients. Increasing the window length decreases processing distortions but also introduces smearing. A length of 30 to 40 ms (e.g. 1024 points at 32 kHz sampling frequency) was found to be optimal for many signals.

FFT Length: The proposed filtering operation in the frequency domain can cause long impulse responses. Therefore, the length of the FFT should be longer than the analysis window. If the zero-padding is too short, time domain aliasing may occur. The zero-padding factor should be set to 2.

Window Overlap: In implementations with a Hanning window an overlap of four windows (75% overlap) is used. This helps to reduce the spreading of signal transients due to the time-averaged SNR (reported in [Cappe,94]). In implementations with a Tukey window (rectangular window with cosine fade in and fade out) two windows are overlapped. The amount of overlap depends on the length of the fading portions.

Noise Floor Parameter $\beta$: The parameter $\beta$ determines the remaining noise floor. Its value depends on the noise level of the input signal. For high noise levels, $\beta$ should be in the range of 0.1 to 0.01. Especially for old recordings, it was found that the remaining noise floor must not be too small for a natural sound quality.

Over-Subtraction Factor $\alpha_0$: A higher amount of over-subtraction leads to a stronger attenuation of components with a low SNR. This prevents musical noise, but suppresses too many small signal components. A range between 1.8 and 2.3 was found suitable for magnitude subtraction. For power subtraction, the optimal range is between 3 and 6 (see also [Berouti,83]).

Averaging Parameters $\gamma$ and $\eta$: These parameters control the amount of time averaging for the SNR estimation. The averaging reduces the SNR variance, but also introduces some smearing of signal transients (see [Cappe,94]). Both parameters should be within the range of 0.9 to 0.98.

4 Conclusion

A broadband noise reduction method based on non-linear spectral subtraction was presented. Non-linear smoothing operations in both the time and the frequency domain are used to determine low-variance estimates of the SNR. From the estimated SNR, the transfer function of the spectral subtraction filter is calculated. Over-subtraction and the averaging procedures prevent musical noise. Listening tests show that proper adjustment of the control parameters leads to excellent restoration results.

Using a short-time transform with non-uniform frequency resolution, and taking into account information from the short-time phase spectrum, might lead to further improvement of the proposed method.

References

[Berouti,83] M. Berouti, R. Schwartz, and J. Makhoul: Enhancement of speech corrupted by acoustic noise. In Speech Enhancement, edited by J.S. Lim, Prentice-Hall, Inc., Englewood Cliffs, New Jersey 07632, 1983, pp. 69-73.

[Boll,79] S.F. Boll: Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. ASSP-27, no. 2, pp. 113-120, April 1979.

[Cappe,94] O. Cappé: Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor. IEEE Trans. on Speech and Audio Processing, vol. 2, no. 2, pp. 345-349, April 1994.

[Cappe,95] O. Cappé, J. Laroche: Evaluation of short-time spectral attenuation techniques for the restoration of musical recordings. IEEE Trans. on Speech and Audio Processing, vol. 3, no. 1, pp. 84-93, Jan. 1995.

[Ephraim,84] Y. Ephraim and D. Malah: Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 32, no. 6, pp. 1109-1121, December 1984.

[Ephraim,85] Y. Ephraim and D. Malah: Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 443-445, April 1985.

[Lim,83] J.S. Lim (editor): Speech Enhancement. Prentice-Hall, Inc., Englewood Cliffs, New Jersey 07632, 1983.

[Moore,83] B.C.J. Moore and B.R. Glasberg: Suggested formulae for calculating auditory-filter bandwidths and excitation patterns. J. Acoust. Soc. Am., vol. 74, no. 3, pp. 750-753, September 1983.

[Tsoukalas,93] D. Tsoukalas, M. Paraskevas, J. Mourjopoulos: Speech enhancement using psychoacoustic criteria. Proc. IEEE ICASSP-1993, vol. II, pp. II-359-II-362.

[Vaseghi,92] S.V. Vaseghi and R. Frayling-Cork: Restoration of old gramophone recordings. Journal AES, vol. 40, no. 10, pp. 791-801, October 1992.

[Vaseghi,96] S.V. Vaseghi: Advanced Signal Processing and Digital Noise Reduction. Wiley & Sons Ltd. and B.G. Teubner, 1996.

[Zwicker,90] E. Zwicker, H. Fastl: Psychoacoustics: Facts and Models. Springer-Verlag, 1990.