Page  00000453 FO ESTIMATION OF INHARMONIC PIANO TONES USING PARTIAL FREQUENCIES DEVIATION METHOD Jukka Rauhala and Vesa Vdlimliki Laboratory of Acoustics and audio signal processing Helsinki University of Technology jukka.rauhala@acoustics.hut.fi, vesa.valimaki@tkk.fi ABSTRACT This paper proposes a new method for F0 estimation of inharmonic piano tones. Piano tones are significantly inharmonic and, hence, F0 estimation using conventional methods may lead to biased results. In the proposed method, which is based on the partial frequencies deviation method, the estimation is done in three stages. The first stage prevents octave jumps and the second stage gives a rough F0 estimate. In the last stage, the partial frequencies deviation method is used for determining an accurate F0 estimation. The results show that the method is efficient and produces accurate results. 1. INTRODUCTION Piano tones are highly inharmonic due to dispersion phenomenon [1]. This leads to problems in F0 estimation. For instance, autocorrelation based methods tend to produce F0 estimations that are higher than the accurate values due to inharmonicity. In addition, the piano has a very rich spectrum with lots of spectral components, which make the F0 estimation task challenging. Also, F0 might not be the spectral peak with the highest magnitude. A lot of work has been done in the area of pitch detection. Reviews have been done by, for example, de Cheveign6 and Kawahara [2] and G6mez et al. [3]. Most of the algorithms have been dealing with human voice, such as YIN [4], which can be considered as one of the best algorithms for pitch estimation using harmonic signals [2]. In order to improve the F0 estimation of inharmonic tones, it is important to estimate the inharmonicity coefficient B as well. The relation between B and partial frequencies is defined as [1] fk=fok 1~Bk2(1) where k is partial index and fo is a nominal fundamental frequency of an ideal, non-dispersive string. Galembo and Askenfelt [5] have combined the F0 estimation and B estimation in application to analysis of piano tones with good results. In this work, a new method based on the recently introduced partial frequencies deviation (PFD) technique [6], which was originally designed for B estimation, is proposed for F0 estimation. In PFD, a deviation curve is evaluated indicating possible biases in B and Fo estimates. By analyzing the trend of Input signall Octave estimation Rough FO estimation Preliminary FO estimation I FO estimation PFD Final FO estimation Figure 1. Block diagram of the proposed method. the curve, B and F0 estimates can be modified in an iterative fashion. The proposed method consists of three stages. The first stage deals with the octave detection, while the second stage determines a rough estimate of F0. In the third stage, PFD is used in an iteration loop to improve the F0 estimation, while estimating B value. 2. PROPOSED METHOD In this section, the three stages of the proposed method are introduced. A general level block diagram of the method is shown in Figure 1. 2.1. Octave estimation The main purpose for the first stage is to prevent octave jumps, which is a common problem with F0 estimation algorithms. If there is an octave jump in the estimation, the resulting estimate will be significantly biased. Hence, it is important to take it into account and to try to prevent it. The first stage is separated into four blocks, as seen in Figure 2. First, a 2'1 point FFT is determined from the source signal (1 second of the signal is used in the estimation process). In the second phase, all significant spectral peaks in the signal are determined recursively. The recursive function looks for the highest amplitude within the range given as an argument. Then, it will divide the range into two ranges separated at the frequency of the highest amplitude. This is continued until the range width is less than 25 Hz. In the end, the recursive function returns all significant spectral peaks. 453

Page  00000454 InutFind 20 C Ipt FFT significant -+ Diff peak 0 Histogram -oRough EQ o 60 -signalpek indices estimation a) peaks-0 40 -Figure 2. Block diagram of the octave estimation 2z stage. 0 200 400 600 800 1000 1200 1400 Frequency (Hz) Out of these peaks, 20 peaks with highest magnitudes are selected. Next, the differences of consecutive peak indices are computed using the selected 20 peaks. Then, a histogram is calculated from the differences with Matlab' s hist function [7]. The differences are separated into 20 equally-spaced bins. The bin with largest value is selected out of bins with center point between 20 Hz and 200 Hz (the target range for this method is 20-200 Hz, because the reference method [5] is targeted to this range as well). Finally, the F0 estimation of the first stage is defined to be the center frequency of this bin. The main parts of the octave estimation stage are illustrated in Figure 3. 2.2. Preliminary F0 estimation In the next stage, a preliminary F0 estimation is determined. The purpose for this stage is to produce an estimate for the PFD method, as it needs to know, where the correct F0 lies approximately. A block diagram of this stage is shown in Figure 4. First, a 220 point FFT is calculated using zero-padding technique and Blackman window from the same source signal as in the first stage. Then, the peak with the highest amplitude within a specific range is determined. The range is specified as ~ major third from the rough F0 estimation obtained from the first stage. Finally, a three-point interpolation is used for refining the estimate. 2.3. F0 estimation refinement with PFD The last stage produces the final F0 estimation using the PFD algorithm [6]. The PFD curve is calculated as a deviation between the estimated partial frequencies computed with Eq. (1) using the F0 and B estimates, and frequencies of highest spectral peaks within narrow windows centered at estimated partial frequencies. It is a powerful tool for evaluating F0 and B estimates. This is illustrated in Figure 5, where PFD curves are shown in eight cases with correct and biased F0 and B estimates. Basically, positively biased B and F0 estimates lead into decreasing PFD curves, while negatively biased estimates produce increasing curves. When refining B estimate, PFD tries to minimize the sum of derivative signs computed from the curve [6]. If the Fo estimate is biased, then the final PFD curve after the B refinement will resemble either Figure 5(c) or Figure 5(g), depending on the sign of Fo estimate bias. Hence, convexity in the curve indicates a negative bias in the Fo estimation and concavity indicates a positive bias in the Fo estimation. Thus, the form of the PFD curve can be used for refining the Fo estimate. 1 00 -0 I I??I, TTTTT T f I, Ia, 200 400 600 800 1000 Frequency (Hz) 10 10 1200 1400 a) z 20 E cz 10 -0 2 I 0 20 40 I I I I I I 160 180 60 80 100 120 140 Frequency (Hz) Figure 3. An example of the octave estimation process. In the top figure, the spectrum of the signal (Fo=28.9 Hz) is shown with the peaks selected using the recursive function (thick line). The selected 20 peaks are marked with circles. The middle figure shows the calculated differences for each partial, and the bottom figure shows the final histogram computed using the differences. The highest peak in the histogram indicates the selected value (33.5 Hz). FO estimate from the first stage Input Find highest Inter- Preliminary signal FFT amplitude polation esiFO estimation Figure 4. Block diagram of the Preliminary Fo estimation stage. The PFD method consists of four blocks as seen in Figure 6. First, a 216-point FFT is computed using zeropadding technique and Blackman window. Then, the signal spectrum is divided into subbands and 10 peaks with highest amplitudes are selected from each subband reducing the number of spectral peaks. The width of subbands is defined to be 5fp, where fp is the preliminary Fo estimation value. Hence, approximately two peaks per partial are selected across the meaningful bandwidth. After that, the core PFD algorithm shown in Figure 6 is performed to refine first the B estimate and then the Fo estimate. By repeating B estimation and Fo estimation, the Fo estimate is improved. In this work, both B estimation and Fo estimation are done twice. In the PFD Fo estimation process, convexity or concavity is detected from the PFD curve by calculating the average of first half of deviation values. Then, positive average is determined as convexity and vice versa. At each iteration loop, the Fo estimate fo is modified by fo = fo (1~p), where p is an adaptive step size. The sign of p is positive if the curve is detected to be convex and negative if it is detected to be concave. The initial value of p is set to 0.005. At each point, where the sign of p changes, p will be updated by 454

Page  00000455 I I C ca LI C ca I Q 20 0 -- -20 -400 0 40 20 0 - -20 -400 -0 40 (g 20 -20 40 - BFO- 0- (e)B FO N" 0 20 3 S-20 ------------ _ l -------- I a Partial index Partial index 40 ) B+FO- 0 (h) B+FO N S-20 10 20 30 0 10 20 30 Partial index Partial index 40 (f)B FO+ 20 0 -...... -20 -40 0 10 20 30 Partial index 40 (i) B+ FO+ 20 -40 0 10 20 30 Partial index Figure 5. Examples on how the PFD curve behaves in different situations. In these examples, B estimate gets values 0.0001 (accurate, denoted as B), 0.00005 (denoted as B-), and 0.00015 (denoted as B+). Similarly, F0 estimate gets values 100 Hz (accurate, denoted as Fo), 99 Hz (denoted as F0-), and 101 Hz (denoted as F0+). dividing it by 2. Finally, the iteration process is stopped, when it has been run over 100 times, or Ijul < 10-5. As the PFD method is independent of the underlying Fo estimation method, it can be used with any Fo estimation algorithm to refine the Fo estimate. Moreover, the PFD method produces a B estimate as a side product of the Fo estimation [6]. Additionally, the final PFD curve offers a good error indicator, as deviation close to zero indicates successful estimation. On the other hand, if the deviation curve is dispersed, it suggests that the results are not reliable. 3. RESULTS In order to evaluate the proposed method, it was tested against two reference methods: YIN [4] and inharmonic comb filter (ICF) method [5]. YIN is a method that combines autocorrelation and average magnitude difference function techniques. The main idea is to normalize the autocorrelation function, which reduces the number of free parameters. In these test cases, a window size of 186 ms was used and the threshold parameter was set to 0.1. Moreover, hop size between consecutive windows was 16 samples. Since the YIN does not specify how the Fo is detected from a long signal, such as a piano tone, the estimation was done for a piece of the signal with length of 1 second. Then, the YIN was performed at each window, and a histogram with 20 bins was computed from the values. The bin with the maximum value was searched for, and the final Fo estimate was determined to be the average of Fo estimates within the range from center frequency of previous bin to center frequency of next bin. Galembo and Askenfelt proposed the ICF method for B and Fo estimation [5]. It was designed specifically for piano tones. In the ICF, the estimation is done in three stages by using a multiband filter with bands centered at FO estimate from the second st-age Iteration-- Select Input FFT prominent PFD B PFD FO Final FO signal spectral estimation estimation estimation peaks Iteration Selctaed Calculate te to the Modify Check stop Improved spectral PFD curve trendofthe estimation conditions estimation peaks curve Figure 6. Block diagrams of (top) the PFD stage and (bottom) the core PFD estimation process. the estimated partial frequencies. First, a preliminary Fo estimate is obtained by using a predefined B value. Then in the second stage, Fo and B are varied within a specific range and power spectrum using the multiband filter is determined for each case. The more precise B and Fo estimates are determined based on the power spectrum maximum value. In the third stage, the B and Fo ranges are narrowed down and the final estimates are determined. The three methods were tested in two test cases. The first test case used processed recorded tones, where the determined partials had been filtered out with narrow bandstop filters. Then, the partials were reproduced with additive synthesis and the produced signal was mixed with the filtered residual signal. Hence, the exact partial frequencies in the test tones were known and the tones were very similar to the real tones. In the second test case, recorded piano tones were used (Steinway grand piano samples from University of Iowa, http://theremin.music.uiowa.edu). The correct Fo values were estimated manually by varying Fo and B and searching for values that would have partial frequencies close to the peaks present in the spectrum of the tone. Since ICF is designed for Fo values ranging from 20 Hz to 200 Hz, the tests were limited to keys 1 -35, and the methods assumed that the correct Fo was located within the aforementioned range. The results from the first test case with synthetic tones are shown in Figure 7 and the calculated error values and running times for each method are presented in Table 1. The results show that the proposed method produced best results. Moreover, this method runs significantly faster, as shown by the running times presented in Table 1. However, it has to be taken into account in comparing the running times that the YIN estimates the Fo in multiple short windows for each tone whereas the other two methods use a single window in the estimation. 455

Page  00000456 0 5 10 15 20 25 30 35 S0 5 10 15 20 25 30 35 Key index 2 00-0 W100 o 0 00 00 0 0 5 10 15 20 25 30 35 Key index I200 -I 100 -U- 0 1~ 0 5 10 15 20 25 30 35 Key index S200 -(3 100- 000 0 5 10 15 20 25 30 35 Key index 200 -- 1 0 0 '-- 0 5 10 15 20 25 30 35 Key index Figure 8. Results from the second test case using recorded test tones with (top) YIN, (middle) with ICF, and (bottom) with the proposed method using PFD. The solid lines indicate the manually estimated values. 0 5 10 15 20 25 30 35 Key index Figure 7. Results from the first F0 estimation test case using synthetic test tones with (top) YIN, (middle) with ICF, and (bottom) with the proposed method using PFD. The solid lines indicate the correct values. Method Average error Average Running (RMS) error (%) time (s) YIN 7.21 3.68 972 ICF 56.72 73.48 1029 PFD 0.14 0.13 68 Table 1. Average error values in RMS and percentage scales and total running times for the first test case using synthetic test tones. Figure 8 shows the results from the second test case using recorded tones. Similarly, Table 2 shows the calculated error values and running times for each method. Again, the results suggest that the proposed method is the most efficient and most accurate out of the three methods. 4. CONCLUSIONS In this paper, a new method for Fo estimation of inharmonic piano tones is proposed. The method uses the PFD technique to refine the final F0 estimate. The results show that the proposed method produces accurate results in an efficient way. Moreover, the method can estimate the inharmonicity of the tone simultaneously. Matlab code of the method is available at http://www.acoustics.hut.fi/go/icmc07-pfd/. 5. ACKNOWLEDGMENTS The author JR was supported by the Nokia Foundation. Moreover, the authors would like to thank Mr. Hannu Pulakka for the YIN implementation. 6. REFERENCES [1] Fletcher, H., E. D. Blackham, and R. Stratton, "Quality of piano tones," J. Acoust. Soc. Am., vol. 34, no. 6, pp. 749-761, 1962. Method YIN ICF PFD Average (RMS) 6.32 26.47 0.23 error Average error (%) 3.45 11.65 0.30 E Running time (s) 1105 1372 70 Table 2. Average error values in RMS and percentage scales and total running times for the second test case using recorded test tones. [2] de Cheveigne, A. and H. Kawahara, "Comparative evaluation of FO estimation algorithms," in Proc. Eurospeech, Copenhagen, Denmark, 2001. [3] G6mez, E., A. Klapuri, and B. Meudic, "Melody Description and Extraction in the Context of Music Content Processing," Journal of New Music Research, vol. 32, no. 1, 2003. [4] de Cheveigne, A. and H. Kawahara, "YIN, a fundamental frequency estimator for speech and music," J. Acoust. Soc. Am., vol. 111, no. 4, 2002, pp. 1917-1930. [5] Galembo, A. S. and A. Askenfelt, "Signal representation and estimation of spectral parameters by inharmonic comb filters with application to the piano," IEEE Trans. Speech and Audio Processing, vol. 7, no. 2, pp. 197 -203, 1999. [6] Rauhala, J., H.-M. Lehtonen, and V. Valimiki, "Fast automatic inharmonicity estimation algorithm," J. Acoust. Soc. Am., vol. 121, no. 5, 2007, pp. EL184-EL189. [7] Matlab version 7.1, Mathworks Inc. 1984 -2005. 456