Using the Sound Description Interchange Format within the SMS Applications
Maarten de Boer, Jordi Bonada, Xavier Serra
Audiovisual Institute - Pompeu Fabra University
Rambla 31, 08002 Barcelona, Spain
{mdeboer, jboni, xserra}@iua.upf.es
http://www.iua.upf.es
Abstract
Recently, we have seen increased use of and support for the Sound Description Interchange Format (SDIF),
including its integration into widely used environments such as MAX/MSP (Wright, Dudas, Khoury,
Wang, Zicarelli, 1999) and MPEG-4 (Wright, Scheirer, 1999). To follow and encourage this trend, we have
added support for importing and exporting SDIF files in the latest version of the SMS applications, a group of
applications for spectrum-modeling analysis and synthesis. In this paper we discuss the use of the SDIF standard
in the SMS applications. We give a brief introduction to SMS and SDIF, and examine the features and
limitations found in the SDIF standard when used to represent the SMS analysis data. We also present an
application for the graphical visualization of the SDIF data as extracted from SMS files, similar to the one used
in the SMS graphical tools.
1. Introduction
As the capabilities of storage and
communication media increase, and as the system
requirements for digital audio processing are met more
easily, the field of audio analysis and synthesis has to deal
with ever larger amounts of available data. Not only is it
important to be able to organize and classify this data, as
has been addressed by the efforts of the MPEG-7 group
(Herrera, Serra, Peeters, 1999), but there is also a clear
need for a common format for this data, one that can easily be
read and written by everybody and that specifies the
storage of the most commonly used content. This paper discusses the
use of such a format, the Sound Description Interchange
Format (SDIF), in a practical situation. The SMS
applications, developed at the Music Technology Group
of the Audiovisual Institute of the Pompeu Fabra
University, deal mainly with the kind of data that the
initial design of the SDIF standard has focused on. The
SMS applications are closed source, so until now the use of the SMS
data files was restricted to these applications themselves.
Since we saw a desire to experiment further with this data,
and to access it from other programs, we have added SDIF as an
import/export file format. Currently, the SDIF files
contain only a subset of the SMS data, as only the most
common and standardized descriptors are used, but future
extension of the SDIF standard could change that.
2. About SDIF
The Sound Description Interchange Format
(SDIF), whose creation started in 1995 as a collaboration between Xavier
Rodet (IRCAM) and Adrian Freed (CNMAT), is an
ongoing effort to create a file format that allows sound
analysis and synthesis researchers to interchange data.
Originally, it focused on spectral descriptions of sound,
but over the years SDIF has become more general, and
now includes other, non-spectral, descriptors for sound,
such as time-domain samples.
The SDIF format specification consists of two parts. First,
there is the specification of the actual file format and its
contents. Second, there is a list of standard data types and
their representation formats. Data is stored in a sequence of
frames, similar to the chunks in the IFF, AIFF or RIFF
formats. Each frame contains some common data, such as
a frame type identifier, a time tag, a stream identifier, and a
number of matrices of floating-point numbers, which
contain the actual data (Wright, 1999).
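The frame layout just described can be sketched as a small parser. The following is a minimal illustration, not the official SDIF library API; the assumed field layout (a big-endian 4-character type signature, int32 size, float64 time tag, int32 stream identifier, and int32 matrix count) follows the published specification, and "1TRC" (sinusoidal tracks) is one of the standard frame types.

```python
import struct

# Assumed SDIF frame header layout (big-endian, per the published spec):
# 4-char signature, int32 size, float64 time tag, int32 stream ID,
# int32 matrix count -- 24 bytes in total.
FRAME_HEADER = struct.Struct(">4sidii")

def parse_frame_header(buf):
    """Unpack one SDIF frame header from the start of buf."""
    sig, size, time, stream_id, n_matrices = FRAME_HEADER.unpack_from(buf)
    return {"signature": sig.decode("ascii"),
            "size": size,
            "time": time,
            "stream": stream_id,
            "matrices": n_matrices}

# Build a synthetic "1TRC" (sinusoidal tracks) header and parse it back.
raw = FRAME_HEADER.pack(b"1TRC", 64, 0.25, 1, 1)
header = parse_frame_header(raw)
print(header)
```

Each matrix inside the frame carries its own small header (matrix type, data type code, and row/column counts) followed by the data itself, so a full reader would loop `header["matrices"]` times after this point.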
3. About SMS
Spectral Modeling Synthesis (SMS) is a set of techniques
and software implementations for the analysis,
transformation and synthesis of musical sounds, based on
the decomposition of the sound into a deterministic plus a
stochastic part (Serra, 1997). The output of the SMS
analysis is a collection of frequency and amplitude values,
representing the partials of the sound (the sinusoidal, or
deterministic, component), and either filter coefficients
with a gain value, or spectral magnitudes and phases,
representing the residual sound (the non-sinusoidal, or
stochastic, component). The latter is obtained by
subtracting the re-synthesized sinusoidal component from
the original sound. To guide the tracking of the partials, it
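As a toy illustration of the deterministic-plus-stochastic split described above (a hypothetical signal with fixed partial values, not the actual SMS analysis code), the deterministic part can be resynthesised from frequency/amplitude pairs and subtracted from the original to leave the stochastic residual:

```python
import math
import random

sr = 8000  # sample rate in Hz (arbitrary choice for this sketch)
partials = [(440.0, 0.6), (880.0, 0.3)]  # (frequency Hz, amplitude) pairs
rng = random.Random(0)

# Deterministic component: a sum of sinusoids built from the partial data.
deterministic = [sum(a * math.sin(2 * math.pi * f * n / sr)
                     for f, a in partials)
                 for n in range(sr)]

# Toy "recorded" sound: the sinusoids plus a small noise floor.
noise = [0.05 * rng.gauss(0, 1) for _ in range(sr)]
original = [d + e for d, e in zip(deterministic, noise)]

# Stochastic component: subtract the resynthesised sinusoids.
residual = [o - d for o, d in zip(original, deterministic)]
rms = math.sqrt(sum(x * x for x in residual) / len(residual))
print(round(rms, 3))  # close to the injected noise level
```

In real SMS analysis the partial frequencies and amplitudes vary per frame and the subtraction is done against the actual recording, but the principle is the same: whatever the sinusoidal model cannot explain is kept as the residual.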