Page  284 ï~~A Least-Square Algorithm for Fundamental Frequency Estimation Andrew Choi Department of Computer Science University of Hong Kong Pokfulam Road, Hong Kong E-mail: ABSTRACT: A new fundamental frequency estimation algorithm which combines the reliability of frequency-domain algorithms and the high resolution of time-domain algorithms is introduced. Unlike existing frequency-domain algorithms, this algorithm does not require the application of a window function to the signal during its spectral analysis step, and therefore has the additional advantage of being able to correctly analyze relatively shorter signal segments. This property is important in the application of the algorithm to pitch-to-MIDI converters, for which response time is the principal performance measure. This paper describes this new fundamental frequency estimation algorithm and experimental results that demonstrate its effectiveness. 1 Introduction A pseudo-periodic sound produced by a musical instrument is composed chiefly of harmonic components whose frequencies are integer multiples of a fundamental frequency. The problem of fundamental frequency estimation (FFE) is central to the automatic analysis of musical signals. Pitchto-MIDI converters are devices through which conventional musical instruments can be attached to digital synthesizers as controllers. In the application of FFE algorithms to pitch-to-MIDI converters, the signal must be analyzed and its fundamental frequency must be estimated in real time. This paper focuses on FFE for monophonic signals, which are made naturally by monophonic musical instruments, or when separate transducers are used to collect sounds from the independent sound-producing elements of an instrument. The response time of an FFE algorithm is the delay between the instant a note is played on the instrument and the instant the synthesizer begins to generate the corresponding sound. Assuming that the delay caused by the synthesizer is negligible, the response time is equal to the sum of the length of the initial segment of the signal analyzed and the computation time of the FFE algorithm. These two components must be made as short as possible, so that the player of the instrument does not perceive the delay. Other important performance measures of an FFE algorithm include the resolution at which it can distinguish neighboring frequencies and its accuracy. Most existing frequency-domain FFE algorithms (Amuedo, J.; Doval, B. and Rodet, X. 1991a, 1991b; Brown, J.) generate estimates reliably when the signal segment analyzed is sufficiently long. These algorithms operate by performing spectral analysis on the signal by segment and applying a pattern matching technique to the spectrum to determine each segment's fundamental frequency. Typically, a constant- Q transform or a Fourier transform is used for the former step. Since such a transform actually computes the spectrum of the periodic extension of a signal segment, a window function must first be applied to eliminate spectral leakage due to discontinuity at the ends of the observation interval (Harris, F.J.). Applying a window function to a signal segment that already contains very few samples removes much information that is useful in determining its spectrum, especially at low frequencies. Parametric spectral analysis techniques (Marple, S.L.) do not require the application of a window function and are capable of generating spectrums accurately from short signal segments. Their used in real-time FFE is considered elsewhere (Choi, A. 1995a). The FFE algorithm introduced in this paper is similar to these methods in the sense that it does not require the application of any window function as a preprocessing step. The effect of the lengths of signal segments on the accuracy of FFE algorithms is studied in (Choi, A. 1994). A dynamic programming algorithm is introduced for matching harmonics to peaks in the constant- Q transform of the signal. The result is an FFE algorithm that correctly handles 284 4IC MC PROCE E D I N G S 1995

Page  285 ï~~shorter signal segments than existing algorithms. However, since it also uses the constant- Q transform, it must apply a window function to the signal segments and does not exploit all the information contained in them. A new spectral analysis algorithm based on least-square fitting, which is central to the FFE algorithm of this paper, will be described in the next section. The FFE algorithm will then be described. Experimental results of its application will be presented in section 3. 2 Least-Square Spectral Analysis and FFE Algorithm Let the discrete signal segment to be analyzed be denoted by Wk, k = 1,..., N. Suppose a sinusoid of the form w^ k = a sin(fk) + b cos(fk) is to be matched to it. The frequency of the sinusoid is f, given in a unit such that f = 27rf'/s, where f' and s are the frequency of the sinusoid and the sampling frequency of wk in hertz, respectively. The values of a and b determine the amplitude and phase of wk. The square error of the match, a function of f, a, and b, is given by e = EN__(W k - Wk)2. For a given value of f, a pair of values can be chosen for a and b to minimize the square error of the match by setting 8e/a= a ENg sin(fk) sin(fk) + b Z:'1 cos(fk)sin(fk) - EN wk sin(fk) = 0; Oe/Ob = a J:= 1 sin(fk) cos(fk) + b >3Â~1 cos(fk) cos(fk) - EN Wk cos(fk) = 0. The solution to this pair of equations is given by a* = (QX - RW)/(PR - Q2) and b* = NN (QW - PX)/(PR - Q2), where P = Z a sin(fk) sin(fk), Q = i~- cos(fk) sin(fk), R = k- cos(fk)cos(fk), W = - g wk sin(fk), and X = - wk cos(fk). Let e*(f) be the value of e when a and b are set to a* and b*, respectively, computed from that given value of f. In other words, e* (f) is the minimum square error of matching a sinusoid of frequency f to the signal segment Wk over all possible amplitudes and phases for such sinusoids. Figure 1 shows a plot of the function e* (f) for a typical 300-sample signal segment of a C4 note sampled from an electric guitar. This figure illustrates two properties of this function: one that allows the spectrum of the signal segment wk to be deduced from the values of the function and one that enables the spectrum to be computed efficiently. These properties will be stated informally and without proof, because they are needed to describe the spectral analysis algorithm. Their proofs are given in (Choi 1995b). Property 1 Each "deep" trough in the function e* (f) corresponds to a harmonic component of the signal segment Wk. The value off of the minimum point at the bottom of a tough is approximately equal to the frequency of the corresponding harmonic component. In figure 1, the three deepest troughs reach their minimum values at f = 0.074, f = 0.147, and f = 0.224, respectively. Since the sampling frequency is 22255 Hz, these correspond to harmonic components with frequencies 262.10 Hz, 520.67 Hz, and 793.41 Hz, respectively. The frequency of C4 is 261.63 Hz and these are therefore approximately the fundamental, second harmonic, and third harmonic of the C4 note, respectively. Property 2 The "width" of each trough in the function e* (f) will be at least 27r/N, and this width is independent of the frequencies of the harmonic components of the given signal segment, if these are located sufficiently far apart from each other. In the example, since the value of N is 300, the widest part of each trough in the function will be wider than 0.0209. Since musical signals contain components whose frequencies are integer multiples of the fundamental frequency, the troughs corresponding to them are regularly spaced in f. Troughs will be separated from each other if the fundamental frequency is somewhat larger than 2ir/N, which is equivalent to 74.03 Hz for N = 300. Since the fundamental frequency can occur at any value of f, it is impractical to evaluate the function e* (f) at a large number of values of f in order to identify the troughs. However, property 2 ICMC PROCEEDINGS 1995 285

Page  286 ï~~makes it necessary to compute e* (f) only at values of f that are evenly spaced at a distance of 27r/(3N) apart. Doing so guarantees that at least three consecutive values will "fall into" each trough, where the middle value will be smaller than its two neighbors. An additional test can be used to eliminate troughs that are too shallow. In the example, since N = 300, e* (f) must be evaluated at 0.5/(27r/(3N)) 72 points. Since the value of e* (f) within each trough is unimodal, the golden ratio search or successive parabolic interpolation (Forsythe, G.E. et al.) can be used to determine the value of f at which the minimum occurs. Either technique converges on the minimum value by successively reducing the size of the interval in which it is known to lie. After the frequencies of the minimum points at the bottom of the troughs have been obtained, the fundamental frequency is estimated as follows. Let fl, f2,...,.fk be these frequencies, where f' < f2 <... < fk. Troughs in e*(f) may be distorted at low frequencies (if they are too close to f = 0) and may not be detected by the spectral analysis algorithm. Assuming that the harmonic component corresponding to the fundamental may or may not be detected, two cases must be distinguished: that fl, f2,..., fk correspond to (i) the fundamental, second, third,... and k-th harmonics of the signal, respectively, or (ii) the second, third,..., k-th, and (k + 1)-th harmonics of the signal, respectively. In the first case, the estimate of the fundamental frequency is the average of fi/i over i = 1, 2,...,k and in the second case, it is the average of f /(i + 1) over i = 1,2,..., k. The estimate corresponding to the set of values with a smaller standard deviation is reported by the FFE algorithm, since the standard deviation measures the consistency among the estimates of the fundamental frequency generated by the different harmonic components. The number of instruction cycles required by the FFE algorithm for each signal segment is analyzed in (Choi 1995b) and is found to be M(8N + 2d +9) + L(2Nt +1 IN + 2d + 9), where t and d are the numbers of instruction cycles to compute a sine/cosine function and a division, respectively, and M and L are the numbers of times e*(f) must be evaluated to locate the troughs and find their minimums, respectively. On an Analog Devices ADSP-2100 DSP processor, a typical set of values for these parameters are N = 300, d = 33, t = 25, M = 72, and L = 12, and a total of 398,700 instruction cycles are needed. A 30-MIPS version of this processor will require 13.29 ms to process such a signal segment. If N is reduced to 200, M becomes 48 and the same processor will require 7.59 ms to process each signal segment. 3 Experimental Results A set of experiments were conducted to study the real-time performance of the new FFE algorithm. The test inputs were sampled from an electric guitar at a sampling rate of 22255 Hz, and the notes G2 (98.1 Hz), G3 (196.0 Hz), and G4 (392.0 Hz) were used. These notes have lower fundamental frequencies than the test inputs used by previous papers on FFE. The results shown are representative of the results of repeated runs of the experiments and for notes with different fundamental frequencies. Three FFE algorithms were tested: algorithm CQCC, which is based on the constant-Q transform and cross-correlation (Brown, J.), algorithm CQDP, which is based on the constant-Q transform and dynamic programming (Choi, A. 1994), and algorithm LS, the new least-square algorithm. The three FFE algorithms were applied to the initial segments of the sampled notes. Figures 2, 3, and 4 plot the fundamental frequency estimates reported by these algorithms as functions of the lengths of the initial segments analyzed, for the notes G2, G3, and G4, respectively. Generally, algorithm LS performs better than algorithm CQDP by requiring slightly shorter initial segments to achieve accurate estimates. An exception occurs for the 6.25-ms initial segment of G3, for which algorithm CQDP produces a smaller absolute error. For initial segments that are shorter than the lengths required for accurate estimates, algorithm LS is more stable than algorithm CQDP (i.e., its absolute errors are much smaller). Both algorithms LS and CQDP require shorter initial segments (by 12.5, 7.5, and 2.5 ms for 02, 03, and 04, respectively) than algorithm CQCC to 286 6ICMC PROCEEDINGS 1995

Page  287 ï~~l|e+06, 600 CQCC9e+05Q CQDP... 500.LS....8e+05 400 7e+05 300 6e.+05200 - 5e+05 100 v- -- _ '.... 4e+05 i0 0 0.1 0.2 0.3 0.4 0.5 5 10 15 20 25 30 f segment length (ms) Figure 1. Plot of e*(f) for a C4 note Figure 2. Fund. freq. estimates for a G2 note 450., 450 '.0CQDP ------400. - 350 LS...350 CQCC - 300 CQDP-.--- 300 LS: 250 _ 250 200...-....".........150 200 100 150 50 100 I i 5 10 15 20 25 30 5 10 15 20 25 30 segment length (ms) segment length (ms) Figure 3. Fund. freq. estimates for a G3 note Figure 4. Fund. freq. estimates for a G4 note obtain accurate estimates. All three FFE algorithms perform well when the initial signal segments are longer than 25 ms. SUMMARY: A new FFE algorithm based on least-square fitting was described. It correctly analyzes shorter signal segments than previous FFE algorithms, and is thus more suitable for use in real time. Experimental results that demonstrate this property of the new algorithm were described. References Amuedo, J. 1985. Periodicity estimation by hypothesis-directed search. In Proc. of ICASSP '85, pages 395-398, Tampa, Florida, May 1985. Brown, J.C. 1992. Musical fundamental frequency tracking using a pattern recognition method. J. Acoust. Soc. Am., 92(3):1394-1402, September 1992. Choi, A. 1994. On the improvement of the real-time performance of two fundamental frequency recognition algorithms. In Proc. of First Brasilian Symp. on Computer Music, pages 27-32, Caxambu, MG, Brasil, August 1-5 1994. Choi, A. 1995a. Real-time fundamental frequency estimation by autoregressive spectral analysis. In Preparation. Choi, A. 1995b. Real-time fundamental frequency estimation by least-square fitting. Technical Report TR-95-03, Department of Computer Science, University of Hong Kong. Doval, B and Rodet, X. 1991a. Estimation of fundamental frequency of musical sound signals. In Proc. of ICASSP 1991, pages 3657-3660, Toronto, Canada, May 1991. Doval, B and Rodet, X. 1991b. Fuindamental frequency estimation using a new harmonic matching method. In Proc. of ICMC 1991, pages 555-558, Montreal, Cananda, 1991. Forsythe, G.E., Malcolm, M.A., and Moler, C.B. 1977. Computer Methods for Mathematical Computations. Prentice-Hall, 1977. Harris, F.J. 1978. On the use of windows for harmonic analysis with the discrete Fourier transform. Proceedings of the IEEE, 66(1):51-83, January 1978. Marple, S.L. 1987. Digital Spectral Analysis with Applications. Prentice-Hall, 1987. IC MC PROCEE DI N G S 199528 287