Page  00000001 IMPROVING THE EXTENDED KALMAN FILTER METHOD FOR THE RESTORATION OF ELECTRO-ACOUSTIC MUSIC1 Andrea Bari, Sergio Canazza, Giovanni De Poli, Gian Antonio Mian DEI - University of Padova - Via Gradenigo 6/a - 35100 Pd - Italy, {canazza, depoli, mian } Abstract In this paper we present some results on audio restoration obtained with an algorithm that, in principle, solves the problems of broadband noise filtering, signal parameters tracking and impulsive noise removal by using the Extended Kalman Filter (EKF) theory. We show that, to achieve maximum performance, it is essential to optimize the EKF implementation. For this purpose, to cope with the non stationarity of the audio signal, we used two properly combined EKF filters (forward and backward), and introduced a bootstrapping procedure for model tracking. The careful combination of the proposed techniques and an accurate choice of some critical parameters, allows one to improve the performance of the EKF algorithm. Different audio examples validate the effectiveness of the presented procedure. 1. INTRODUCTION The introduction of high quality digital media, combined with an increasing awareness of the historical importance of "audio heritage", has led to a growing requirement for the preservation and restoration of old recordings. Little attention is currently paid to "modern" electro-acoustic music restoration. On the other hand, since the tapes are exposed to deterioration (often the performances are recorded on semi-professional media and preserved in a precarious state) the preservation/restoration problem is significant. The main issue is to rely on good signal models to perform a high quality separation of signal from noise (to be attenuated). Do to the fact that existent devices and developed methodologies are oriented to classical and popular music restoration, few results are available in the field of electro-acoustic music. The types of degradation common in electro-acoustic tapes can be broadly classified into localized and global degradations [D. Schueller. (1991)]. The former are finite duration defects which occur at random in the waveform and include clicks, scratches, clipping, etc. The latter affect all the audio recording and include background noise (perceived as "hiss"), wow, flutter and some types of linear and nonlinear distortion. In this context, we consider the problem of the reduction of impulsive and background noise from audio signals. This task is usually carried out using different methods for detection/restoration of impulsive noise and for broadband noise reduction. In this work we employ an algorithm whose objective is, in principle, to simultaneously solve the problems of filtering/parameter tracking/elimination of the outliers ("clicks") by using the Extended Kalman Filter theory (EKF), as proposed by [M. Niedzwiecki (1989), (1997), and M. Niedzwiecki and K. Cisowski (1996)]. In particular the algorithm in [M. Niedzwiecki. (1997)] can be interpreted as the nonlinear combination of two Kalman filters: the first is used to follow the slow variations of the signal time-varying AR model parameters, while the second takes part in the reduction of background and impulsive noise. At medium/high signal-to-noise (SNR) ratios, the performance of such filter is superior to that of other standard methods like Spectral Attenuation, nevertheless its plain use does not guarantee the best results. One reason for this is that the nonstationarity of audio signals leads to errors in parameter and tracking noise filtering, especially during fast transients. In order to achieve maximum performance from the EKF, it is essential to optimize its implementation. For this purpose, to cope with the non stationary nature of the audio signal, we used two properly combined EKF filters (forward and backward), and introduced a bootstrapping procedure for model tracking. The careful combination of the proposed techniques and an accurate choice of some critical parameters, allows one to improve the performance of the EKF algorithm. 2. PROBLEM STATEMENT Let the audio signal s(t), t = 1, 2,..., be modelled by a p order time varying autoregressive (AR) model s(t + 1) = a (t)s(t-i+ 1)+e(t) (1) driven by the gaussian zero-mean white noise sequence e(t) with variance a. The time evolution of the time varying coefficients ai(t) is modelled by the random walk model ai(t+1)=ai(t) + wi(t) (i = 1,.., p) (2) with wi(t) zero mean gaussian white processes of variance omutually uncorrelated, i.e., E[wi(t) wj(t)]=0 for isj, and independent of e(t). Moreover, let us assume that the original signal s(t) is corrupted by a mixture of a broadband noise z(t) and impulsive noise v(t) (independent of e(t) and wi(t)), so that the available signal y(t) can be written as y(t)=s(t)+z(t)+v(t) (3) The noise z(t) is assumed gaussian zero-mean white noise (see later for relaxing this hypothesis) of variance cr, while v(t) is assumed gaussian zero-mean noise with ar = oo if a click is present, or &r = 0, otherwise. As a consequence, if a click is revealed at time t, the corresponding sample y(t) must be discarded since it not bears information on s(t) and s(t) must be recovered from {..., y(t-1), y(t+),...}. In [M. Niedzwiecki and K. Cisowski. (1996)] it is shown that under the hypothesis made, the problem of recovering the signal s(t) based on the noisy measurements Y(t)= {y(t), y(t-1)..., 1Work supported by CNR - Progetto Finalizzato Beni Culturali, under contract CT9700629PF36

Page  00000002 y(1) } can be optimally handled by the Extended Kalman Filter (EKF). To this purpose it is convenient to represent signal s(t) in (1) in the non-minimal state-space form cov[u(t)] b bb 0 eoL I, (7) sq(t+l)=Aq[ap(t)]Sq(t)+bqe(t) (4) 0 -with E = -. The EKF equations become for the prediction 20 where Sq(t)=[s(t),..., s(t-p),..., s(t-q+l)]T, q 2 p, is the signal vector, a' (t)=[ai(t),..., ap(t)]T is the vector of the AR model coefficients, b (t)=[1, 0,..., 0], and Aq(t) is the companion matrix associated with the extended parameter vector a (t) = [a (t), OT_ ]. The provision of a non-minimal statespace description: q > p will allow one for two-sided reconstruction of up to q-p samples corrupted by impulsive noise. Notice that to remove noise an accurate signal model is needed and to obtain a reliable signal model the signal should be noiseless. The problems of filtering and parameter tracking are strictly tied and are to be jointly solved. The solution to their combined treatment is obtained by combining the unknown AR model coefficients and the signal vector in a p + q "state vector" x (t) = [ST (t),a (t)] and by rewriting (1-3) as step: x (tlt-1) = f [x(t -lt -1) Y(t t -1) = F(t -1)l(t -11t -I)FF' ( t -1)+ Q and for the update step: tit )= -tt - 1)+ L(r() ltlt)= (I -4X--L(tl) tt (8) (9) where Z(tIt) = E[(x(t) - (tlt))(x(t) - (t It)) ] is the state estimation error covariance and ~(tt-l)=E[(x(t)-x(tt-1))(x(t)-x(tt-1))] is the state prediction error covariance. Moreover in (9): s(t) = y(t)-c_;(t It _1) = y(t)-s(t It-1) Sx(t+)= f [x(t)] +u(t) y (t) = c'x(t)+ (t) is the (5) prediction error (Kalman filter innovation) and L(t) is the Kalman gain, whose value depends from the click indicator function d (t) where A b) b e (t) [x(t)1 0 I- x(t), u(t) = w(t) with w (t)=[w(t),...,w (t)] and (t)= z(t)+v(t), cC =[bT,0T]=[1,0,...,0]. The problem of estimating the model parameters ap(t) and the noisefree signal s(t) is reduced to a nonlinear filtering problem in the state space. A (suboptimal) solution to the problem can be based on the theory of extended Kalman filter (EKF) [M. Niedzwiecki and K. Cisowski. (1996), M. Niedzwiecki. (1997)] and is obtained linearizing (5). Let us denote with x(t t) the estimate of the state at time t from the measurements y(r): r<t and with x (tIt -1) the state prediction at time t from the measurements y(r): r<t-1. Let F(t) denote the state transition matrix of the linearized system 1 (tIt -1)c L(t) = c'(tIt -I)c +k(t) o0 if d(t)=0 if d(t)~0 and k(t)= - The EKF can be started with the values.x(00) = 0, Z(00)= 0 0 01P (10) F f [ x] F(t) - x= (l|), A ( L 0 t t) s (t It) 0 q-lxp Ip (6) with 6 a large positive constant (-100) to account that nothing is known in advance about a(0). The corresponding algorithm has a complexity O((p+q)2). In [M. Niedzwiecki. (1997)] a clever reduced complexity split EKF algorithm is presented that was used in the actual experiments referred to in the following. It can be noticed that it is not difficult to drop the hypothesis of a white noise z(t). In case of colored noise z(t) it suffices to model it as an AR process and to increase the state dimension accordingly [B.D.O. Anderson, J.B. Moore, (1979)]. Such a provision was found quite effective for the noise reduction of some old vinyl and tape records. 2.1 Click detection The detection of clicks is based on the value assumed at each t by the prediction error 0 if Ie(t)> i, (t) d (t) = (11) where x (tlt) = [ (tlt), a (tit)] is the filtered state trajectory given by the EKF algorithm, Aq (t t)= A [a (tIt)] and s (t t) is the vector made up with the first p components of s (t t). Moreover, let:

Page  00000003 In (11) a (t)= r 0(t)0- (t -1) with r (t) = CT (tIt - 1)c + k (t), is the estimated innovation variance, |t is the parameter determining the threshold for impulsive noise detection (in practice p = 4-6) and a2 (t) is the local estimate of the model input noise e(t) variance a (t-1l)+(1-_) W if d(t)=0 e(t= (t) (12) 2 (t-1) if d (t)0 In (12) 0<,A < 1 determines the adaptation speed. In actual experimentation we used 2 = 0.97. 2.2 Smoothing and reconstruction It can be noticed that s(tIt) = [(tlt),...,s (t- q +l1t)], represents the optimal (mean square) smoothed estimate of s(t),..., s(t-q+l) given Y(t), i.e., all the measurements available up to time t. To make full use of the available information, it is convenient to use, at time t, s^(t - q + lt) as an estimate of s(t-q+l), i.e., it is convenient to introduce a delay of q samples. As a result, for signal smoothing it is enough to use q = p. In presence of clicks it can be shown that, for a p order AR process, a block consisting of at least p "good" future successive samples is needed for good reconstruction [6]: for a group of n successive samples corrupted by a click, a value q 2 p + n is required. This consideration can be exploited to derive a variable state space algorithm [M. Niedzwiecki. (1997)], which usually uses q = p and, in presence of clicks, increases q until the filter innovation corresponding to the "corrected" signal becomes "white" noise or q does not reach a predetermined threshold. Thus the length of the replaced signal is incremented until this condition is true. During the interpolation step the dimension of the state space is temporarily increased, in order to allow for better estimation, and both past and "future" measurements are employed, so as to carry out a local "forward-backward" interpolation. Such a provision offers a significant computational reduction over the use of a fixed large q value without performance decrease. 3. IMPLEMENTATION TUNING To achieve maximum performance from the EKF, it is essential to optimize its implementation. To this purpose, let us first notice that starting the algorithm from scratch, i.e., with the initial conditions given by (10), implies an initial transient of the parameter tracker during which the EKF noise reduction capabilities are greatly reduced. To solve the problem we found it useful to introduce a bootstrap procedure: the first, say, 100 ms of the signal are time-reversed and fed on the filter. This way, parameters for a proper initialization of the model are estimated and restoration of the "true" signal will use such values as the initial conditions. The estimated AR-model is not guaranteed to be stable at all times but, in practice, instability is a quite rare event. Since, however, it may severely interfere with proper click detection, we preferred to test stability at each t via the Schur algorithm (the computational overload is negligible). In the case of an unstable AR-model the parameter update is skipped. The nonstationary nature of the audio signals poses a challenging problem to EKF: it has both to filter the noisy components and to continuously track the signal model parameters. As a consequence of signal nonstationarity, errors in parameter tracking are small in high SNR regions but can severely affect signal filtering during fast transients and in regions with a low SNR. A provision that improves the algorithm performance, is given by the use of two properly combined EKF's operating forward and backward on the signal. The comparison of the prediction error variances (-+ (t) and 2- (t)), shows that the two variances are not equal. Moreover, there is a trend in their behavior: on the average the forward variance & (t) is higher than the backward one a2 (t) at the beginning and lower at the end, i.e., tracking improves as time goes on at least on a small time scale (few seconds). Moreover a finer scrutiny shows that, quite often, during attacks the backward filter has a lower innovation variance. This fact can be justified noticing that such filter traverses such transitions going from a high SNR region, where parameter tracking and noise filtering are less critical, to a low SNR region, where the contrary happens. A similar behavior is exhibited by the residual errors s (t) - ^ (t) and s (t) - S (t). Since the two filters give different signal estimates, ^ (t) and S (t), on the lack of a better procedure we found it effective to combine them according to s (t) +-2 (t) (,) +a- (t)(13) S (t) + / 2 (t) Notice that the performance of the procedure improves, if the procedure is started from an high SNR region instead than from the beginning and the end of the chosen excerpt: details are given in [A. Bari, (1999)]. With such a provision it is possible, in the authors experience, to effectively remove broadband noise for (average) SNR 2 30 dB. In addition, since we have two different click detectors, the effectiveness in removing impulsive disturbances is also improved. 4. EXPERIMENTAL RESULTS To evaluate the performance of the EKF algorithm it is necessary to identify the noise on the input recording. As a preliminary step, different noise free recordings were intentionally corrupted with different scratches and broadband noises. The noise free recordings were selected such as to cover a wide range of musical genre (Electronic, Jazz, Western Classical, Pop); the stimuli were played with different instruments (piano solo, orchestra, voice, and so forth) in order to analyze the effectiveness of the EKF related to the different physical characteristics of audio signal (bandwidth, transients, timbre). This helped us to gain some insight into the method and confidence on the choice of parameters <, p, p, and cry, which determine the ultimate performance of the algorithm. In addition, the starting value of o2 has to be estimated. We found it useful to estimate 2 (0) via Youle-Walker equations (assuming a

Page  00000004 local time-invariant AR model and solving for it) and then update it according to eqn. (12). The parameter = I/ a should be chosen in accordance with the degree of nonstationarity of the signal at hand. We found a constant value E = 10, as in [M. Niedzwiecki and K. Cisowski. (1996)], adequate in most examples. As for parameter u, a small u value (say 4) allows one to detect small clicks but introduces many false alarms. This gives rise to the substitution of many samples that would be better dealt with by the EKF smoother. As a rule of thumb, we found it preferable to use a high ýi value (i.e., u = 4-6) and to iterate the de-clicking process decreasing the values of u, after careful listening to the filtered signal s (t) and to the difference signal y (t) - s(t). To evaluate the white noise reduction performance, the amount of "white" noise added to the "clean" recordings was varied. The SNRo of the output signal produced by the smoothing algorithm was measured and related to the SNRi input signal. It was found that equation SNRo m SNRi + q (dB) well represents the measured values for 20 ~ SNRi < 40 dB. The m and q values depend on the particular piece: representative values for the typical value p = 12 are m / 0.8 and q 12 that provide an average SNR gain of about 6 dB for SNRi 30 dB, the minimum average SNRi value at which we found the algorithm to give fine results. Another index used in order to evaluate the performance of the algorithm is the Spectral Distance (SpD) measure, related to the harmonic components of the signals: the white noise power density equals that of the clean recording. In addition, differently from what would be obtained by a simple low-pass filter or by spectral subtraction, the restored version "follows" the original spectrum even for higher frequencies. This property results to be perceptively important and appreciated by experienced listeners. Notturno: original -30 -40 so ag -50 S-60 2 -70 so " -80 S-90 -100 -- - - - - - - - - - - - - - - - - - - - - - - - I - - I - -. -1101 1 1 1 1 1 1 1 1 1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Frequency 0.8 0.9 SpD= - 2 log S(o) 2 log 10 S (0)) 2z (dB); where Si(O) and S,(0) are the (Welch) periodograms of input and output signals respectively. The experience gained led us to the following protocol. The recording is segmented into frames up to, say, 10 s long and each frame is restored according to a three step procedure. At each step the algorithm is tuned to obtained a particular restoration effect. To this purpose, it is worth noticing a nice property of the EKF: it can be forced to perform only click detection (setting o2 = 0) or only noise filtering (setting a very high p value). In addition, in noise filtering it is possible to modulate the "true" c2 value in order to make the filter more effective against noise in medium/low SNR regions. The three step procedure used was: 1. declicking, to remove the most significant clicks: s(t) = ( (t)(+ (t))/2; 2. noise only removal: (t) is obtained according to (13); 3. declicking, to remove the residual small clicks: s(t)= ( (t)+o (t))/2; where step 3) can be skipped. The protocol was exercised on electronic-music recordings supplied by the RAI. As an example, let us consider the restoration of an excerpt taken from B. Maderna, Notturno. The piece was restored according to the parameters describe above. Fig. 1 shows the comparison among the (Welch) periodograms of the original (noisy) and restored recordings (limited to 0 - 1 = Fs/2 = 22 kHz). From the figure it can be appreciated that the restored version spectrum follows that of the original up to frequencies at which Notturno: restored by EKF 2 -70 -80 60 - - - - - - - - - - - - - - - - - - - - - - - 0 -0 -0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Frequency Fig. 1: Comparison between the (Welch) periodograms of the original and restored recordings (limited to 0 - 1 = Fs/2 = 22 kHz). 5. REFERENCES B.D.O. Anderson, J.B. Moore. (1979). Optimal filtering, Prentice-Hall. A. Bari. (1999). "Performance analysis of the Extended Kalman Filter in audio restoration field". Tesi di Laurea, Dept. of Electronics and Informatics, University of Padova (in Italian). S. Godsill, P. Rayner, 0. Cappe. (1998). "Digital audio restoration". M. Kahrs and K. Brandenburg (Eds), Applications of digital signal processing to audio and acoustics, Kluwer. M. Niedzwiecki. (1989). "Steady-state and parameter tracking proprieties of self-tuning minimun variance regulators", Automatica, pp. 597-602. M. Niedzwiecki and K. Cisowski. (1996). "Adaptive scheme for elimination of broadband noise and impulsive disturbances from AR and ARMA signals", IEEE Trans. Signal Processing, vol. 44, pp. 967-982. M. Niedzwiecki. (1997) "Identification of time-varying processes in the presence of measurement noise and outliers". Proc. 11th IFAC Symp. System Identification, pp. 1765-1768, Tokyo. D. Schueller. (1991). "The ethics of preservation, restoration and re-issues of historical sound", J. Audio Eng. Soc., vol. 39, pp. 1014-1016.