SOFTWARE FOR SPECTRAL ANALYSIS, EDITING, AND SYNTHESIS

Klingbeil, Michael

ABSTRACT

This paper describes the design and development of new software for spectral analysis, editing, and resynthesis. Analysis is accomplished using a variation of the traditional McAulay-Quatieri technique of peak interpolation and partial tracking. Linear prediction of the partial amplitudes and frequencies is used to determine the best continuations for sinusoidal tracks. A high performance user interface supports flexible selection and immediate manipulation of analysis data, cut and paste, and unlimited undo/redo. Hundreds of simultaneous partials can be synthesized in real time, and documents may contain thousands of individual partials dispersed in time without degrading performance. A variety of standard file formats are supported for the import and export of analysis data.

1. INTRODUCTION

Sinusoidal modeling has a proven track record of high quality resynthesis while offering numerous possibilities for novel sonic transformations [1, 3, 7, 8]. Software such as Lemur [4], AudioSculpt, and MetaSynth, to name but a few, has demonstrated the power and popularity of graphical user interfaces for spectral editing.

A new software application named SPEAR (Sinusoidal Partial Editing Analysis and Resynthesis) has been created to offer increased speed, flexibility, and ease of use in the domain of graphical spectral editing. The following goals were kept in mind throughout the design and development phases: editing should be as fast and as easy to understand as in a time domain waveform editor, listening to transformations should be possible with no intermediate synthesis or processing stage, and high quality analyses should require only a few parameter settings. Additionally, SPEAR interoperates well with other analysis-resynthesis software by offering both SDIF [11] and native format data exchange.
In order to offer a familiar and comfortable interface, SPEAR is written to run using the native graphics of the host operating system. Portability is made possible using wxWidgets (http://www.wxwidgets.org), a C++ GUI library which allows the software to be compiled for MacOS, Windows, and GTK Linux. Current builds of SPEAR run on MacOS 9, MacOS X, and Windows.

2. ANALYSIS

Audio analysis adheres to the basic peak interpolation and partial tracking methods as detailed in [7, 10]. To begin analysis the user must specify a minimum frequency spacing in hertz, $f_a$, which for a harmonic analysis should correspond to the fundamental frequency. This is used to determine a default window length and main lobe width, as well as the FFT size and analysis rate (hop size). For inharmonic or polyphonic sounds, $f_a$ will determine the minimum frequency spacing between partials that can be independently resolved. Given a window length in samples of $M = 4 f_s / f_a$ (where $f_s$ is the sampling rate), the desired FFT size is $N = 2^{\lceil \lg M \rceil + 1}$ (with $\lg$ the base-2 logarithm), resulting in a spectrum oversampling by a minimum factor of 2.

2.1. Peak Selection

The local maxima in the magnitude spectrum guide the search for sinusoidal peaks. Parabolic interpolation of frequency and phase [7, 10] using bins $n-1$, $n$, $n+1$ selected from the $N/2$ bins of an analysis frame yields frequency, phase, and amplitude estimates for the candidate peak. With a minimum main lobe width of $4 f_a$ and spectrum oversampled by a minimum factor of 2, each candidate peak will be sampled by at least 8 FFT bins. To aid the rejection of spurious peaks we may conservatively impose the restriction that the three magnitudes $a_{n-1}$, $a_n$, $a_{n+1}$ of a parabolic peak must also all exceed the magnitude of either neighboring bin: $a_{n-1} > a_{n-2}$ or $a_{n+1} > a_{n+2}$.

2.2. Amplitude Thresholds

Two types of amplitude thresholds are used in the peak picking and partial tracking process.
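Before turning to the thresholds in detail, the size calculations of Section 2 and the peak test of Section 2.1 can be illustrated with a short Python sketch. This is not SPEAR's source code: the function names, the rounding of M to an integer, and the choice to fit the parabola to dB magnitudes are assumptions made here for illustration, and phase interpolation is omitted. The neighbor test follows the "or" wording of Section 2.1 literally.

```python
def analysis_sizes(fa_hz, fs_hz):
    """Return (M, N): window length M = 4*fs/fa and FFT size N = 2**(ceil(lg M) + 1).

    Rounding M to the nearest integer is an assumption of this sketch.
    """
    m = int(round(4.0 * fs_hz / fa_hz))
    # (m - 1).bit_length() equals ceil(log2(m)) for m >= 1, without float error.
    n = 2 ** ((m - 1).bit_length() + 1)
    return m, n


def parabolic_peak(a_prev, a_mid, a_next):
    """Fit a parabola through magnitudes at bins n-1, n, n+1.

    Returns (offset, peak): the vertex position in bins relative to n and the
    interpolated magnitude. Assumes a_mid is a local maximum.
    """
    denom = a_prev - 2.0 * a_mid + a_next  # negative at a local maximum
    offset = 0.5 * (a_prev - a_next) / denom
    peak = a_mid - 0.25 * (a_prev - a_next) * offset
    return offset, peak


def candidate_peaks(mags_db, fs_hz, n_fft):
    """Return [(frequency_hz, magnitude_db)] for local maxima in one frame.

    mags_db holds the magnitudes (in dB, an assumption) of the lower half of
    an n_fft-point spectrum.
    """
    peaks = []
    for n in range(2, len(mags_db) - 2):
        lo2, lo1, mid, hi1, hi2 = mags_db[n - 2:n + 3]
        if not (mid > lo1 and mid >= hi1):
            continue  # bin n is not a local maximum
        # Spurious-peak test as worded in Section 2.1.
        if not (lo1 > lo2 or hi1 > hi2):
            continue
        offset, peak_db = parabolic_peak(lo1, mid, hi1)
        peaks.append(((n + offset) * fs_hz / n_fft, peak_db))
    return peaks
```

For example, `analysis_sizes(100.0, 44100.0)` gives M = 1764 and N = 4096, an oversampling factor of about 2.3, which satisfies the stated minimum of 2.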
For a peak to be considered a candidate for tracking, it must exceed an absolute threshold $T_d$ (default value of -96 dB). These peaks are matched with existing partial tracks as described in the next section. If no appropriate track continuation is found, the partial ends, so $T_d$ may be considered a "death" threshold for possible continuation of a partial. Unmatched peaks become candidates for initiating new partial tracks. A new track is started only if a peak exceeds a dynamic threshold level $T_b$, which is computed with a frequency dependent threshold curve $A_t(f_k)$ in combination with the maximum bin amplitude of the analysis frame, $a_{\max}$. The curve is given by

$$A_t(f_k) = a_L - a_R + a_R \, b^{f_k/20} \qquad (1)$$

where $f_k$ is the peak frequency in kHz, $b$ is a parameter controlling the shape of the curve, $a_L$ is the amplitude threshold at 0 kHz, and $a_R$ is the range of the curve in positive dB from 0 to 20 kHz. Figure 1 shows $A_t(f_k)$