SOFTWARE FOR SPECTRAL ANALYSIS, EDITING, AND SYNTHESIS
Michael Klingbeil
[email protected]
ABSTRACT
This paper describes the design and development of new
software for spectral analysis, editing and resynthesis.
Analysis is accomplished using a variation of the traditional McAulay-Quatieri technique of peak interpolation
and partial tracking. Linear prediction of the partial amplitudes and frequencies is used to determine the best continuations for sinusoidal tracks. A high performance user
interface supports flexible selection and immediate manipulation of analysis data, cut and paste, and unlimited
undo/redo. Hundreds of simultaneous partials can be synthesized in real-time and documents may contain thousands of individual partials dispersed in time without degrading performance. A variety of standard file formats
are supported for the import and export of analysis data.
1. INTRODUCTION
Sinusoidal modeling has a proven track record of high
quality resynthesis while offering numerous possibilities
for novel sonic transformations [1, 3, 7, 8]. Software
such as Lemur [4], AudioSculpt, and MetaSynth, to name
but a few, has demonstrated the power and popularity of
graphical user interfaces for spectral editing.
A new software application named SPEAR, Sinusoidal
Partial Editing Analysis and Resynthesis, has been created to offer increased speed, flexibility, and ease of use
in the domain of graphical spectral editing. The following
goals were kept in mind throughout the design and development phases: editing should be as fast and as easy to
understand as in a time domain waveform editor, listening
to transformations should be possible with no intermediate synthesis or processing stage, and high quality analyses should require only a few parameter settings. Additionally, SPEAR interoperates well with other analysis-resynthesis software by offering both SDIF [11] and native format data exchange.
In order to offer a familiar and comfortable interface,
SPEAR is written to run using the native graphics of the
host operating system. Portability is made possible using
wxWidgets (http://www.wxwidgets.org), a C++
GUI library which allows the software to be compiled for
MacOS, Windows, and GTK Linux. Current builds of
SPEAR run on MacOS 9, MacOS X, and Windows.
2. ANALYSIS
Audio analysis adheres to the basic peak interpolation and
partial tracking methods as detailed in [7, 10]. To begin
analysis the user must specify a minimum frequency spacing in hertz, $f_a$, which for a harmonic analysis should
correspond to the fundamental frequency. This is used to
determine a default window length and main lobe width,
as well as the FFT size and analysis rate (hop size). For inharmonic or polyphonic sounds, $f_a$ determines the minimum frequency spacing between partials that can be independently resolved. Given a window length in samples
of $M = 4 f_s / f_a$ (where $f_s$ is the sampling rate), the desired FFT size is $N = 2^{\lceil \lg M \rceil + 1}$, resulting in a spectrum
oversampled by a minimum factor of 2.
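The relationship between the frequency spacing, window length, and FFT size can be checked with a short sketch. The function name `analysis_sizes` is hypothetical, and rounding the window length up to a whole number of samples is an assumption; the text gives only the formulas for M and N.

```python
import math

def analysis_sizes(fs, fa):
    """Return (M, N) for sampling rate fs and minimum frequency spacing fa, both in Hz."""
    # Window length M = 4 * fs / fa; rounding up to an integer is an assumption.
    M = math.ceil(4 * fs / fa)
    # FFT size N = 2^(ceil(lg M) + 1). For M >= 1, (M - 1).bit_length()
    # equals ceil(log2(M)) and avoids floating-point rounding.
    N = 2 ** ((M - 1).bit_length() + 1)
    return M, N

M, N = analysis_sizes(44100, 100.0)
print(M, N, N / M)  # the ratio N / M is the oversampling factor, never below 2
```

Because the exponent rounds lg M up before adding one, N is at least 2M, which is the minimum oversampling factor of 2 stated above.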
2.1. Peak Selection
The local maxima in the magnitude spectrum guide the
search for sinusoidal peaks. Parabolic interpolation of frequency and phase [7, 10] using bins $n-1$, $n$, $n+1$ selected
from the $N/2$ bins of an analysis frame yields frequency,
phase, and amplitude estimates for the candidate peak.
With a minimum mainlobe width of $4f_a$ and the spectrum
oversampled by a minimum factor of 2, each candidate
peak will be sampled by at least 8 FFT bins. To aid the rejection of spurious peaks we may conservatively impose
the restriction that the three magnitudes $a_{n-1}$, $a_n$, $a_{n+1}$
of a parabolic peak must also all exceed the magnitude of
either neighboring bin: $a_{n-1} > a_{n-2}$ or $a_{n+1} > a_{n+2}$.
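The peak selection step might be sketched as follows. This is a minimal illustration, not SPEAR's implementation: the function name `find_peaks` is hypothetical, the interpolation uses the standard three-point quadratic fit on dB magnitudes (an assumption about the exact form used), and phase interpolation is omitted.

```python
def find_peaks(mag_db, fs, n_fft):
    """Return (freq_hz, amp_db) estimates for local maxima of a dB magnitude spectrum.

    mag_db holds the magnitudes of the first n_fft / 2 bins of one analysis frame.
    """
    peaks = []
    for n in range(2, len(mag_db) - 2):
        lo2, lo1, mid, hi1, hi2 = mag_db[n - 2:n + 3]
        # Bin n must be a local maximum.
        if not (mid > lo1 and mid >= hi1):
            continue
        # Spurious-peak test from the text: a[n-1] > a[n-2] or a[n+1] > a[n+2].
        if not (lo1 > lo2 or hi1 > hi2):
            continue
        # Fit a parabola through the three bins; p is the vertex offset in bins.
        # denom is strictly negative here because mid > lo1 and mid >= hi1.
        denom = lo1 - 2.0 * mid + hi1
        p = 0.5 * (lo1 - hi1) / denom
        amp_db = mid - 0.25 * (lo1 - hi1) * p
        peaks.append(((n + p) * fs / n_fft, amp_db))
    return peaks

# A synthetic spectrum shaped like a parabola with its vertex between bins.
spectrum = [-(n - 10.3) ** 2 for n in range(32)]
print(find_peaks(spectrum, fs=64, n_fft=64))
```

For a spectrum that is exactly parabolic near the maximum, the fit recovers the fractional bin position of the vertex; on real data the estimate is approximate.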
2.2. Amplitude Thresholds
Two types of amplitude thresholds are used in the peak
picking and partial tracking process. For a peak to be
considered a candidate for tracking, it must exceed an
absolute threshold $T_d$ (default value of -96 dB). These
peaks are matched with existing partial tracks as described
in the next section. If no appropriate track continuation
is found, the partial ends, so $T_d$ may be considered a
"death" threshold for possible continuation of a partial.
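As a small illustration of the absolute threshold, the sketch below keeps only peaks whose level exceeds the default of -96 dB. The names `amp_to_db` and `tracking_candidates` are hypothetical, and measuring dB relative to a full-scale amplitude of 1.0 is an assumption not stated in the text.

```python
import math

T_D_DB = -96.0  # default absolute ("death") threshold given in the text

def amp_to_db(amp):
    """Convert a linear amplitude to dB relative to 1.0 (assumed full scale)."""
    return 20.0 * math.log10(amp) if amp > 0.0 else float("-inf")

def tracking_candidates(peaks, t_d_db=T_D_DB):
    """Keep the (freq_hz, amp) peaks whose level exceeds the absolute threshold."""
    return [(f, a) for (f, a) in peaks if amp_to_db(a) > t_d_db]

frame_peaks = [(440.0, 0.5), (880.0, 1e-6), (1320.0, 0.0)]
print(tracking_candidates(frame_peaks))
```

Peaks that pass this test are only candidates; whether they continue an existing partial depends on the matching step.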
Unmatched peaks become candidates for initiating new
partial tracks. A new track is started only if a peak exceeds
a dynamic threshold level $T_b$ which is computed with a
frequency dependent threshold curve $A_t(f_k)$ in combination with the maximum bin amplitude of the analysis
frame, $a_{max}$. The curve is given by
$$A_t(f_k) = a_L - a_R + a_R \, b^{f_k/20} \qquad (1)$$
where $f_k$ is the peak frequency in kHz, $b$ is a parameter controlling the shape of the curve, $a_L$ is the amplitude
threshold at 0 kHz, and $a_R$ is the range of the curve in
positive dB from 0 to 20 kHz. Figure 1 shows $A_t(f_k)$