# Detection of Sinusoidal Components in Sounds Using Statistical Analysis of Intensity Fluctuations

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License. Please contact mpub-help@umich.edu to use this work in a way not covered by the license.

Pierre HANNA, SCRIME - LaBRI, Université de Bordeaux 1, F-33405 Talence Cedex, France, hanna@labri.fr

Myriam DESAINTE-CATHERINE, SCRIME - LaBRI, Université de Bordeaux 1, F-33405 Talence Cedex, France, myriam@labri.fr

## Abstract

This article presents a signal characterization method to detect the sinusoidal components of any signal. The method is based on a statistical study of the fluctuations of the power of the temporal envelope. Two algorithms are proposed, one more efficient, the other more precise. The main advantages are the use of only one analysis window and, especially, very low sensitivity to noise. Some experiments and results on synthetic sounds illustrate the high quality of this analysis with noisy sounds.

## 1. Introduction

Many representations of musical sounds, based on spectral models, consider signals as a sum of sinusoids whose amplitude and frequency evolve slowly with time. The existing analysis methods are often developed in order to extract sinusoidal components [1]. However, noisy sounds normally used in electro-acoustic music are not composed of periodic pulses only. When such analysis methods are applied to sounds whose noisy part is very intense, they extract artificial frequency components, which are not representative. Indeed, with one spectrum calculated from a short-time analysis window (a few milliseconds) using the Fourier transform, it is very difficult to differentiate peaks which correspond to real periodicities from the ones related to noise or transients. Moreover, when the noise amplitude gets close to the signal amplitude (low signal-to-noise ratio), some basic analysis methods cannot even detect the real sinusoidal components.

Hence we propose in this paper a new approach, mainly based on the study of the temporal representation of the sound and particularly on the fluctuations of the dynamic envelope. This approach leads to a signal characterization method which finds the parts of the spectrum where pure sinusoids are located. We also present some experiments and results with synthetic sounds.

## 2. Existing methods

In this section we give details about a few existing methods to detect sinusoidal components in any sound. We also present their main advantages and limits.

### 2.1. Basic method

The most classical method consists in testing every local maximum of a spectrum $S$, that is, every $S(k)$ such that $S(k-1) < S(k) > S(k+1)$. Obviously, peaks are rapidly chosen, but if sinusoids are mixed with noise whose amplitude is high (SNR < 0 dB), the peak amplitudes corresponding to sinusoids will not always be higher than the peaks related to noise. Furthermore, more peaks than the real number of pure sinusoids may be detected.

### 2.2. IRCAM measure of sinusoidality

The characterization proposed in [2] calculates the complex correlation between the frequency-shifted Fourier transform of the analysis window and each peak of the discrete Fourier transform of the signal. A coefficient gives a value between 0 and 1. The value 0 indicates the presence of noise while the value 1 indicates a sinusoidal component. Theoretically it can be used with any type of sound. However, it has certain drawbacks, which are presented in detail in [3]:

- Stationarity of the signal: The precision of this measure relies on the assumption that every property of the analyzed signal (frequency, amplitude) is constant in the analysis window. With real-world sounds, this assumption is rarely true and this leads to detection errors.
- Detection of non-sinusoidal components: The measure applied to peaks corresponding to side-lobes of sinusoidal components indicates pure sinusoids.
- Time versus frequency resolution: The use of the short-time Fourier transform leads to the usual problem of temporal versus frequency resolution. To calculate the measure, good frequency precision (large analysis window) is necessary, whereas the stationarity assumption requires good temporal precision (small analysis window).

### 2.3. PSOLA method

Peeters proposed in [3] some improvements of the IRCAM measure.

- Normalization of fundamental frequency: Assuming that the frequency modulations of the signal components are correlated with that of the fundamental frequency, the signal is resampled to minimize the influence of these frequency modulations.
- Phase variations measure: This new measure is based on the phase difference at two instants. This method leads to different and complementary results. For example, spurious side-lobe peaks are no longer detected.
- Neighboring peak subtraction: Each peak previously identified as a sinusoidal component is subtracted from the studied spectrum to cancel its influence on other peaks.

These improvements are efficient but rely on the assumption that sounds are either harmonic or pseudo-harmonic, and that they have low noise. Moreover, they require prior knowledge about the sinusoidality of the spectrum's components or about phase variations. This implies working with successive signal windows.

## 3. Analysis of the fluctuations of the temporal envelope

The sounds we propose to characterize are, in particular, noisy and not necessarily harmonic. Therefore we first define noise and then we use the consequences of this definition on the fluctuations of the temporal envelope.

### 3.1. Definition of noise

Thermal noises have been described in terms of a Fourier series [4]:

$$X(t) = \sum_{n=1}^{N} C_n \sin(\omega_n t + \phi_n)$$

where $N$ is the number of frequencies, $n$ is an integer, $\omega_n$ are the pulsations, which are equally spaced, $C_n$ are random variables distributed according to a Rayleigh distribution and $\phi_n$ are uniformly distributed random variables.

### 3.2. Temporal envelope

#### 3.2.1. Definition

A pure sinusoid, whose amplitude and frequency are fixed, is perceived as a steady sound even though it may have many cycles per second. This stability is described by the fluctuations of the temporal envelope. The temporal envelope $E(t)$ can be defined [5] as:

$$E(t) = \left| \sum_{n=1}^{N} C_n e^{i(\omega_n t + \phi_n)} \right|$$

We can easily extract the envelope of any signal by removing the negative-frequency components from the spectrum and applying an inverse Fourier transform.

The auditory system cannot detect fluctuations of an envelope that are too fast. Usual models therefore take this property into account: the fluctuations are attenuated by a low-pass filter [4]. However, in our approach, this filter is not really necessary because of the frequency resolution. The frequency band we want to analyze is kept very narrow in order to be as precise as possible. The frequency differences in this band, which define the envelope power [6], are therefore very small. So filtering high frequencies may not be very useful.

#### 3.2.2. Fluctuations of the envelope power

In this paper we study, in particular, statistical variations of the power of the dynamic envelope. The mean of the variance of this power is described as a function of the number $N$ of frequency components in the signal and the bandwidth $W$ of the signal [4]:

$$\langle V(E^2) \rangle = f(N, W) \qquad (1)$$

In particular, if $N$ increases, $\langle V(E^2) \rangle$ also increases. This variance is the measure we use in this article.

#### 3.2.3. Application

The previous equation shows that the measure we defined depends only on the frequency distribution and mainly on the parameter $N$ [6]. By considering the studied sound as a mix of some sinusoids and some white noise, and by applying the definition of noise proposed above (part 3.1), our signal can be described as a sum of sinusoids only. But one can show that the amplitude corresponding to a frequency component of the noise is lower than that of the sinusoids (even for negative signal-to-noise ratios). So, in a narrow band containing one pure sinusoid, we can approximate $N$ as 1. This implies that the variance of the envelope power will be smaller.

### 3.3. Approximation methods

In equation (1), the variance of the power of the dynamic envelope is an average. This average is temporal but also depends on the random phase values (see the definition in part 3.1). However, the variance that we can easily measure is only the variance of one instance of the signal. Experiments show that the variance values fluctuate a lot around their mean, which gives less precise results. We propose here two improvements. The first method involves averaging the variance by distributing component phases. The second one calculates the largest variance. Each one has advantages and drawbacks, but they are complementary.

#### 3.3.1. Phase distribution

Many realizations can be obtained from the original signal by modifying phases. From the definition of noise, phases are supposed to be uniformly distributed. After having calculated the Fourier transform, we can replace the values of the phase spectrum with random values, uniformly distributed between 0 and $2\pi$. Then we can synthesize one instance of the signal by inverse Fourier transform, and repeat this as many times as necessary.

For a more efficient algorithm, it is important to note that the positive-frequency part of the spectrum of a signal is also the spectrum from which its temporal envelope is computed. Hence only one Fourier transform has to be performed: the temporal envelope can be calculated and the phase spectrum can be modified simultaneously (see the algorithms in part 3.4).

#### 3.3.2. Setting phase constants

One can show that the largest fluctuations of the envelope power are obtained when all frequency components have the same phase. In the same way as the average variance described above, this maximum variance depends on the number $N$ of components. Hence, in theory, by setting every phase to the same constant value (for example 0), the measured envelope fluctuations vary with $N$. This is how sinusoids can be detected in a noisy sound.

#### 3.3.3. Frequency filtering

To detect one pure sinusoid in a band of any spectrum, we have to study only a small part of this spectrum. We have to analyze not only a small temporal window but also a small frequency window. We take advantage of the possibility of filtering a signal from its spectral representation. Once the Fourier transform is performed, we modify the spectrum to extract the narrow band we want to study and then, by inverse Fourier transform, we calculate the envelope power of this band only.
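The measure described in parts 3.2 and 3.3 can be illustrated with a minimal Python sketch (NumPy assumed). The function name `envelope_power_variance`, its parameters and the test bin `k0` are hypothetical and do not come from the article. The normalization of the variance by the squared mean power is also an assumption, made here only so that bands with different energies can be compared. Both the constant-phase and the random-phase variants are shown.

```python
import numpy as np


def envelope_power_variance(window, center_bin, half_width=1, small_len=256,
                            random_phases=False, instances=100, seed=0):
    """Normalized variance of the envelope power in a narrow frequency band."""
    x = np.asarray(window, dtype=float)
    n = len(x)
    spectrum = np.fft.fft(x)  # one Fourier transform of the large window

    # Keep only a few positive-frequency bins around center_bin. This both
    # band-pass filters the signal and removes the negative frequencies.
    lo = max(center_bin - half_width, 1)
    hi = min(center_bin + half_width, n // 2 - 1)
    magnitudes = np.zeros(n)
    magnitudes[lo:hi + 1] = np.abs(spectrum[lo:hi + 1])

    rng = np.random.default_rng(seed)
    variances = []
    for _ in range(instances if random_phases else 1):
        if random_phases:
            phases = rng.uniform(0.0, 2.0 * np.pi, size=n)
        else:
            phases = np.zeros(n)  # constant-phase variant
        # The modulus of the inverse transform is the envelope of the band.
        band = np.fft.ifft(magnitudes * np.exp(1j * phases))
        power = np.abs(band[:small_len]) ** 2  # small analysis window
        mean_power = power.mean()
        # Dividing by the squared mean power is an assumption (see lead-in).
        variances.append(power.var() / mean_power ** 2 if mean_power > 0 else 0.0)
    return float(np.mean(variances))


n = 1024
k0 = 68  # hypothetical bin, about 2930 Hz at a 44100 Hz sample rate
sine = np.cos(2.0 * np.pi * k0 * np.arange(n) / n)
noise = np.random.default_rng(1).standard_normal(n)

v_sine = envelope_power_variance(sine, k0)
v_noise = envelope_power_variance(noise, k0)
v_noise_avg = envelope_power_variance(noise, k0, random_phases=True, instances=50)
```

In this sketch a band dominated by one sinusoid yields a value close to zero, while a band containing only noise yields a clearly larger value, which is the behavior expected when $N$ is close to 1 versus larger.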

### 3.4. Algorithm and implementation

In this section, we present the two general algorithms to calculate the fluctuations of the dynamic envelope. Experiments show that some difficulties appear when implementing the two methods. Indeed, if the analysis window is too small (for example, 5.8 ms), the phase spectrum is not precise and the calculated phase values are not significant enough to synthesize other realizations of the same signal. That is why two analysis windows must be distinguished: the first one is large enough (for example, 23 ms) to give a good average variance or to modify phase values, while the second one is smaller, to give good time precision, and is used to calculate the variance of the envelope.

#### 3.4.1. Algorithm for average variance

Data: signal samples.

1. Take a large signal window.
2. For each of the chosen number of instances:
   1. Compute the FFT of the window.
   2. Set the negative-frequency part to zero.
   3. Take the frequency window (band-pass filter).
   4. Draw phase values randomly between 0 and $2\pi$.
   5. Modify the spectrum with these phase values.
   6. Compute the IFFT of the spectrum.
   7. Take a new, small signal window.
   8. Calculate the variance.
3. Average the variances calculated for all instances.

#### 3.4.2. Constant phase values

The algorithm is the same as the previous one except that there is only one instance and the phases are set to a constant value (generally 0).

## 4. Experiments and results

We have made several experiments with synthetic sounds (white noise is computed with random samples). We present here only three examples chosen among the critical cases. Each point of the presented curves is calculated with only 3 frequency values (bins). This is the minimum number needed to detect a local maximum. Here, all the possible points have been calculated, which is not always necessary. For each experiment we compare the two presented methods; solid lines denote random phases and dotted lines denote constant phases.
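Synthetic sounds of this kind (sinusoids mixed with white noise drawn as random samples) can be generated along the following lines. This is only a sketch in Python with NumPy: the helper names `amplitude_for_snr` and `harmonic_mix`, the 44100 Hz sample rate and the default parameter values are illustrative assumptions. The SNR is assumed here to be the ratio, in dB, of the power of one sinusoid to the total power of the noise, which the article does not state explicitly.

```python
import numpy as np


def amplitude_for_snr(snr_db, noise_power=1.0):
    """Amplitude of a sinusoid whose power A^2 / 2 gives the requested SNR."""
    return float(np.sqrt(2.0 * noise_power * 10.0 ** (snr_db / 10.0)))


def harmonic_mix(snrs_db, f0=2930.0, rate=44100, length=1024, seed=0):
    """White noise plus harmonics k * f0, one SNR value (in dB) per harmonic."""
    rng = np.random.default_rng(seed)
    t = np.arange(length) / rate
    mix = rng.standard_normal(length)  # white noise with unit nominal power
    for k, snr_db in enumerate(snrs_db, start=1):
        phase = rng.uniform(0.0, 2.0 * np.pi)  # arbitrary starting phase
        mix += amplitude_for_snr(snr_db) * np.sin(2.0 * np.pi * k * f0 * t + phase)
    return mix


# Seven harmonics with decreasing SNR (illustrative values, in dB).
test_signal = harmonic_mix([0, -3, -6, -9, -12, -15, -18])
```

With this convention, a sinusoid at 0 dB has the same power as the whole noise, so its amplitude is $\sqrt{2}$ times the noise standard deviation.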
We use a 1024-sample window in the first step, and then only 256 samples are used to compute the variance. For the random-phase method, we calculate 10000 instances, which takes some minutes (20 minutes in our examples). The constant-phase method, however, takes only a few seconds (5 seconds in our examples).

### 4.1. Harmonic sinusoids (different SNR values)

For the first example (figure 1), we have mixed 7 harmonic sinusoids ($k \times 2930$ Hz) with white noise. The signal-to-noise ratio is decreasing (SNR = 0, -3, -6, -9, -12, -18, -21). Other experiments show that one pure sinusoid is perfectly detected for SNR = -15 dB.

[Figure 1: Measure of sinusoidality with a mix of harmonic sinusoids and white noise. Each sinusoid amplitude is different, giving decreasing SNR: SNR = 0, -3, -6, -9, -12, -18, -21. Vertical axis: variance of the power envelope (0 to 0.8); horizontal axis: frequency in Hertz (0 to 22000).]

### 4.2. Sinusoids with amplitude modulation

The second example (figure 2) has been made with 7 harmonic sinusoids ($k \times 2930$ Hz) whose amplitude $A_k$ is modulated at 20 Hz by a term in $\sin(2\pi \cdot 20\, t)$. The cases of linearly increasing or decreasing amplitude have also been tested and have given good results.

[Figure 2: Measure of sinusoidality of 7 harmonic sinusoids with amplitude modulation (20 Hz) mixed with noise. Each sinusoid amplitude is different, giving decreasing SNR: SNR = 0, -3, -6, -9, -12, -15, -18. Vertical axis: variance of the power envelope (0 to 0.8); horizontal axis: frequency in Hertz (0 to 22000).]

### 4.3. Sinusoid with frequency modulation

The third example (figure 3) has been made with 7 harmonic sinusoids ($k \times 2930$ Hz) whose frequency $F_k$ is modulated at 20 Hz by a term in $\sin(2\pi \cdot 20\, t)$, the modulation depth being defined from the sample rate $R$ and $N = 256$. The result has been mixed with white noise. The cases of linearly increasing or decreasing frequency have also been tested and the results are good.

[Figure 3: Measure of sinusoidality of 7 harmonic sinusoids with frequency modulation (20 Hz) mixed with noise. Each sinusoid amplitude is different, giving increasing SNR: SNR = -12, -9, -6, -3, 0, 3, 6, 9. Vertical axis: variance of the power envelope (0 to 0.8); horizontal axis: frequency in Hertz (0 to 22000).]

## 5. Discussion

After having presented some examples of experiments and results, we explain in this part the advantages and drawbacks of the two methods that we propose in this paper.

### 5.1. Efficiency versus precision

Our technique to measure the sinusoidality can be performed in two different ways: by setting the phases to the same value, or by averaging over random phase values drawn from a uniform distribution. The second method obviously costs more time. Indeed, a large number of instances of the analyzed signal have to be synthesized for the average to be significant. Since an inverse Fourier transform is performed for each instance to calculate the variance of the envelope power, this operation becomes increasingly slow, and the calculation can take up to a few hours for a very large window. By comparison, setting the phases to the same value is very fast.

It is also important to take into account the precision given by each method. The constant-phase method sometimes appears to be less precise than the average method. This can be seen, for example, in figure 2, where spurious peaks are detected.

The efficiency of each method is also determined by the size of the large analysis window. The method becomes slower as this size becomes larger, though experiments show that increasing this window size increases the precision of the results.
The best choice depends on the nature of the studied sounds: very noisy sounds require a large window whereas noiseless harmonic sounds require a small one.

### 5.2. Advantages and drawbacks

The results presented above suggest that the proposed methods improve on the existing techniques thanks to the following properties:

- High frequency precision: Basing our method on the fluctuations of the temporal envelope makes it possible to study a frequency window whose width is only three bins. Indeed, only three bins are necessary to define significant envelope fluctuations. By comparison, methods based on the shape of the amplitude spectrum around the analyzed frequency require more than three bins in most cases.
- Only one analysis window: Our detection method needs only one analysis window, as opposed to the usual techniques based on phase variations. Here, it is not necessary to know the phase spectrum of the following window to establish the presence or the absence of a pure sinusoid.
- Accuracy in noisy environments: Different experiments show that the main advantage of our methods is their capacity to detect sinusoids mixed with noise, especially when the noise amplitude is higher than that of the sinusoidal component.

Of course, our methods have a few limits:

- Stationarity assumption: The precision is high when analyzing sinusoids whose amplitude is modulated but it decreases when the frequency is modulated.
- Efficiency: As already explained, although the random-phase method is very precise, it can be very slow. However, if one wants a high-precision analysis, this technique may be used.

## 6. Conclusion

We have presented a new approach to determine, from the spectrum of any type of sound, even a very noisy one, which peaks really correspond to sinusoidal components and which ones are related to noise or transients. The main application of these methods is to pre-analyze signals and identify the parts of the spectrum where sinusoids are located.
This information can then be used effectively by other techniques, such as the $\mathrm{DFT}^1$ method [1], which uses the signal derivative and extracts partial parameters (frequency, amplitude) very precisely.

## 7. References

[1] S. Marchand, Sound models for computer music: analysis, transformation, synthesis of musical sound, Ph.D. thesis, LaBRI, Université Bordeaux I, 2000.

[2] X. Rodet, "Musical sound signals analysis/synthesis: Sinusoidal+residual and elementary waveform models," Proceedings of the IEEE Time-Frequency and Time-Scale Workshop (TFTS'97), University of Warwick, Coventry, UK, 1997.

[3] G. Peeters and X. Rodet, "Signal characterization in terms of sinusoidal and non-sinusoidal components," Proceedings of the Digital Audio Effects Workshop (DAFX'98, Barcelona), 1998.

[4] W.M. Hartmann, Signals, Sound, and Sensation, Modern Acoustics and Signal Processing, AIP Press, 1997.

[5] W.M. Hartmann, S. McAdams, A. Gerzso, and P. Boulez, "Discrimination of spectral density," Journal of the Acoustical Society of America, vol. 79, no. 6, pp. 1915-1925, 1986.

[6] P. Hanna and M. Desainte-Catherine, "Influence of frequency distribution on intensity fluctuations of noise," Proceedings of the Digital Audio Effects Workshop (DAFX'01, Limerick, Ireland), pp. 125-129, 2001.