An alternative method to prevent this processing distortion in spectral subtraction is to apply over-subtraction
[Berouti,83] to the degraded signal spectrum: depending on the estimated SNR, more than the average noise spectrum
is subtracted. This strongly attenuates small signal components. If the over-subtraction factor is
high enough, the musical noise is completely eliminated, but audible distortions of the audio signal can be
introduced.
The noise suppression rule proposed by Ephraim and Malah ([Ephraim,84],[Ephraim,85]) allows significant noise
reduction without causing musical noise. This is mainly due to the use of a non-linear time-averaged SNR [Cappe,94],
which exhibits a lower variance than an SNR estimated without averaging. Furthermore, the masking properties of the
human auditory system can be used to reduce processing distortions (e.g. [Tsoukalas,93]).
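
The variance argument can be illustrated with a small numerical sketch. It does not implement the non-linear estimator of Ephraim and Malah; it swaps in a plain first-order recursive average, and the smoothing coefficient `lam`, the constant 5 dB SNR, and the 3 dB estimation error are all hypothetical values chosen for illustration only.

```python
import random
import statistics


def smooth_recursive(values, lam=0.9):
    """First-order recursive average: y[m] = lam * y[m-1] + (1 - lam) * x[m]."""
    smoothed = []
    y = values[0]  # start the recursion at the first observation
    for x in values:
        y = lam * y + (1.0 - lam) * x
        smoothed.append(y)
    return smoothed


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical per-frame SNR estimates in dB: a constant 5 dB plus estimation error.
raw_snr_db = [5.0 + rng.gauss(0.0, 3.0) for _ in range(2000)]
smoothed_snr_db = smooth_recursive(raw_snr_db)

var_raw = statistics.variance(raw_snr_db)
var_smoothed = statistics.variance(smoothed_snr_db)
print("variance without averaging:", var_raw)
print("variance with averaging:   ", var_smoothed)
```

The averaged sequence fluctuates far less from frame to frame than the raw one, which is the property that keeps the gain of a suppression rule from jumping between frames.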
The proposed method combines spectral over-subtraction with several smoothing strategies in both the time and the
frequency domain to reduce the SNR variance. This leads to few audible processing distortions. Section 2 reviews
the methods used within the denoising scheme. Section 3 describes the new noise reduction filter.
2 Review of Denoising Filter Methods
2.1 Spectral Subtraction
Berouti et al. [Berouti,83] present a variant of the spectral subtraction method for speech enhancement. An
overestimate of the noise magnitude or power spectrum is subtracted from the degraded signal spectrum. In each
short-time frame $m$, the amount of over-subtraction depends on the SNR of the incoming degraded signal. A
subtraction factor $\alpha(m) > 1$ is chosen to achieve good noise reduction with little processing distortion. If
the resulting denoised signal spectrum $|S(m,k)|$ falls below a minimum level, it is replaced by a scaled version of the
input spectrum, $\beta\,|X(m,k)|$. The scaling parameter $\beta$ determines the residual noise floor after re-synthesis. The
denoising algorithm is expressed through the following relationships, where $|D(k)|$ is the average noise spectrum and
the exponent $b$ selects magnitude ($b = 1$) or power ($b = 2$) subtraction:

$$|S(m,k)| = \left( |X(m,k)|^{b} - \alpha(m)\,|D(k)|^{b} \right)^{1/b} \tag{1}$$

and

$$|S(m,k)| = \begin{cases} |S(m,k)|, & \text{for } |S(m,k)| > \beta\,|X(m,k)|, \\ \beta\,|X(m,k)|, & \text{else.} \end{cases} \tag{2}$$

The parameter $\beta$ is set to $0 < \beta \ll 1$. The remaining noise floor can be used to mask the generated musical noise. Thus the
setting of the two parameters $\alpha$ and $\beta$ determines the tradeoff between the amount of residual broadband noise
and the level of perceived musical noise. For a fixed value of $\beta$, increasing the value of $\alpha$ reduces both the
broadband noise and the musical noise. Increasing $\alpha$ above a certain limit leads to audible distortions because of the
strong attenuation of small signal components.
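
As a minimal sketch, the two relationships above (1 and 2) can be applied to a single frame as follows. The function name `oversubtract_frame` and the parameter names `alpha` (subtraction factor) and `beta` (floor scaling parameter) are chosen here for illustration. The subtraction factor is passed in directly because its dependence on the SNR is not specified in this section. Treating a negative difference as zero before taking the root is also an assumption, made so that the spectral floor takes over in those bins.

```python
def oversubtract_frame(x_mag, d_mag, alpha, beta, b=2.0):
    """Over-subtraction with a spectral floor for one short-time frame.

    x_mag : magnitudes |X(m,k)| of the degraded signal, one value per bin k
    d_mag : magnitudes |D(k)| of the noise estimate, one value per bin k
    alpha : subtraction factor for this frame (assumed given)
    beta  : floor scaling parameter, 0 < beta << 1
    b     : exponent, 1.0 for magnitude and 2.0 for power subtraction
    """
    if len(x_mag) != len(d_mag):
        raise ValueError("x_mag and d_mag must have the same number of bins")

    s_mag = []
    for x, d in zip(x_mag, d_mag):
        # Relationship (1): subtract the scaled noise estimate.
        diff = x ** b - alpha * d ** b
        # Assumption: a negative difference has no real root, so use zero.
        s = diff ** (1.0 / b) if diff > 0.0 else 0.0
        # Relationship (2): never fall below the scaled input spectrum.
        floor = beta * x
        s_mag.append(s if s > floor else floor)
    return s_mag


# Illustrative values only: three bins with decreasing input level.
frame = oversubtract_frame([10.0, 2.0, 1.0], [1.0, 1.0, 1.0], alpha=3.0, beta=0.05)
print(frame)
```

In this example the strong first bin is almost unchanged, the second bin is attenuated, and the third bin, where the scaled noise estimate exceeds the input, is set to the floor. Raising `alpha` pushes more bins onto the floor, which is the strong attenuation of small signal components described above.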
We adopt this denoising algorithm but use a subtraction factor C(m,k), which is calculated for each frequency bin