is useful to search for a possible fundamental frequency.
This fundamental frequency can also be used to do pitch-synchronous analysis. The size of the analysis window is
adjusted to the period of the fundamental frequency,
which gives the best time-frequency trade-off possible.
The SMS analysis makes use of re-analysis to obtain the
most accurate result. The fundamental frequency that
comes out of this process is considered to be the correct
one, and therefore the SMS analysis data contains just one
fundamental frequency per analysis frame, rather than a
group of possibilities with confidence factors.
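As a minimal sketch of how a window size can follow the fundamental period, the function below derives a window length in samples from an estimated fundamental frequency. The function name, the number of periods per window, and the choice of an odd window length are illustrative assumptions, not values taken from the SMS implementation.

```python
def pitch_synchronous_window(f0_hz, sample_rate, periods=4):
    """Return a window length in samples spanning `periods` fundamental periods.

    `periods=4` and the odd-length convention are assumptions made for
    this sketch; the SMS analysis may use different settings.
    """
    if f0_hz <= 0:
        raise ValueError("f0_hz must be positive")
    period_samples = sample_rate / f0_hz       # one fundamental period, in samples
    size = int(round(periods * period_samples))
    if size % 2 == 0:                          # odd length gives the window a center sample
        size += 1
    return size
```

A lower fundamental yields a longer window, so the frequency resolution scales with the spacing of the harmonics while the time resolution stays as fine as that spacing allows.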
In order to accomplish a musically meaningful
parameterization for sound transformation, the
deterministic plus stochastic model has been extended
with the extraction of high level attributes. These
attributes are calculated at each analysis frame from the
output of the basic SMS analysis (Serra, Bonada, 1998).
4. Representing SMS data with SDIF
The SDIF standard provides a representation of the data
commonly used for the modeling of sounds with sinusoids
and noise, with descriptors for fundamental frequency,
short-term Fourier transform and sinusoidal tracks. Not
surprisingly, since the SMS file format contains this data
as well, the SMS file format and the SDIF file format have
a lot in common, and it has been feasible as well as
attractive to use SDIF as an alternative storage format
within the SMS applications. Apart from giving users of
the SMS software the possibility of interchanging data
with other applications, it provides the possibility to read,
write and modify SMS data with external applications
with SDIF support, or to use the SDIF libraries to gain
access to the data directly, thereby facilitating
experimenting.
The SMS high level attributes, as well as a description of
the relationships between them, are not defined within the
SDIF standard. These are very specific to the SMS
analysis and synthesis, so it is not very likely that other
applications could make use of them. However, it might
be interesting to export these values as well, for
experimentation or visualization. In that case, it would be
useful if the SDIF standard provided a way to add
arbitrary data, each with a textual description.
Also, SMS can deal with segmentation into regions. The
proposal of Xavier Rodet, to make use of markers, would
allow this to be included in the SDIF export files. We
would like to see this become part of the SDIF standard.
The SDIF Frame Types, currently described by CNMAT
and IRCAM (http://www.cnmat.berkeley.edu/SDIF,
http://www.ircam.fr/sdif), which can be converted to
and from SMS data, are shown in Table 1.
The internal storage of the SMS data is done in frames,
which are grouped in regions, which are grouped in tracks.
Each SMS frame contains all data needed to synthesize
that particular frame, including sinusoidal tracks, residual,
and fundamental frequency, as well as several high level
attributes. In terms of storage format, the main difference
between the SMS format and the SDIF format is that
SDIF stores all information in separate frames per data
type, whereas SMS bundles all this data in single frames.
However, since the SDIF data is organized in streams, or
sequences of interleaved streams ordered in time, the
conversion from one format to the other is very
straightforward.
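The regrouping described above can be sketched with simple in-memory structures. This is only an illustration: the class names, field layout, stream numbering, and matrix row formats are hypothetical, and the code does not reproduce the SDIF binary layout or the API of any SDIF library.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SmsFrame:
    """Hypothetical bundled frame: all data for one analysis instant."""
    time: float
    fundamental: Optional[float] = None
    # Each track row: (index, frequency, amplitude, phase) -- assumed layout.
    tracks: List[Tuple[int, float, float, float]] = field(default_factory=list)
    residual: List[complex] = field(default_factory=list)  # residual spectrum bins


@dataclass
class SdifFrame:
    """Hypothetical single-type frame: one signature, one matrix."""
    signature: str
    time: float
    stream_id: int
    matrix: list


STREAM_IDS = {"1FQ0": 1, "1TRC": 2, "1STF": 3}  # assumed stream numbering


def sms_to_sdif(frames):
    """Split each bundled frame into per-type frames, interleaved in time order."""
    out = []
    for f in sorted(frames, key=lambda fr: fr.time):
        if f.fundamental is not None:
            out.append(SdifFrame("1FQ0", f.time, STREAM_IDS["1FQ0"], [[f.fundamental]]))
        if f.tracks:
            out.append(SdifFrame("1TRC", f.time, STREAM_IDS["1TRC"], [list(t) for t in f.tracks]))
        if f.residual:
            rows = [[c.real, c.imag] for c in f.residual]
            out.append(SdifFrame("1STF", f.time, STREAM_IDS["1STF"], rows))
    return out


def sdif_to_sms(sdif_frames):
    """Regroup per-type frames that share a time stamp into bundled frames."""
    by_time = {}
    for s in sdif_frames:
        frame = by_time.setdefault(s.time, SmsFrame(time=s.time))
        if s.signature == "1FQ0":
            frame.fundamental = s.matrix[0][0]
        elif s.signature == "1TRC":
            frame.tracks = [tuple(row) for row in s.matrix]
        elif s.signature == "1STF":
            frame.residual = [complex(re, im) for re, im in s.matrix]
    return [by_time[t] for t in sorted(by_time)]
```

Because both directions only regroup the same values by time stamp, a round trip through `sms_to_sdif` and `sdif_to_sms` returns the original frames in this simplified model.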
Table 1. SDIF frame types extracted from SMS data files.

Frame type   Type of data
1FQ0         Fundamental Frequency Estimate
1STF         Discrete Short-Term Fourier Transform
1TRC         Sinusoidal Tracks
The conversion from SMS to SDIF is lossless as long as
the high level attributes are not taken into consideration.
When we convert SMS to SDIF files, we make sure that
all high level attributes are incorporated first. This is
important, because the data is rather meaningless when the
extracted attributes are not accompanying it. It is possible
to incorporate the SMS high level attributes into the SDIF
data, thus canceling the effect of the extraction, and to
extract them again when needed. This does not result in any
noticeable loss of data. The conversion of an external
SDIF file that contains at least residual or sinusoidal track
data into SMS will result in a valid SMS synthesis. The only
problem here is that SMS lacks the notion of confidence in
the Fundamental Frequency Estimates, as specified by the
SDIF standard. Instead, it only uses one fundamental
frequency, and if the SDIF file contains several
estimates, only the one with the highest confidence will
be used.
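The selection rule for multiple estimates can be sketched as follows, assuming each estimate read from a Fundamental Frequency Estimate frame is available as a (frequency, confidence) pair. The function name and the pair representation are assumptions made for this example.

```python
def pick_fundamental(estimates):
    """Return the frequency of the estimate with the highest confidence.

    `estimates` is a list of (frequency_hz, confidence) pairs (assumed
    representation). Returns None when no estimate is available.
    """
    if not estimates:
        return None
    frequency, _confidence = max(estimates, key=lambda e: e[1])
    return frequency
```

For a frame holding the estimates `[(220.0, 0.6), (440.0, 0.9), (110.0, 0.2)]`, the sketch keeps 440.0 and discards the others, so the confidence values themselves are lost on import.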
As an SDIF frame can contain any number of matrices, it
would be possible to store the used matrices in one single
frame. However, this would make it more difficult for any
other application to handle the data, because it would
require looking inside all frames and going through all
matrices when looking for a specific frame. Therefore we
decided to follow the SDIF specification closely and store
each matrix in the appropriate frame.
The SMS analysis separates the sound into the sinusoidal
components and the residual. The residual is stored as a
Short-Term Fourier Transform in the SMS generated
SDIF file. Here we see a shortcoming of the SDIF
specification, or maybe of the use of interchange file
formats in general. The SDIF standard does not provide
any way to indicate that the stored Short-Term Fourier
Transform is the analyzed residual, rather than the
complete sound. Furthermore, as during the SMS analysis