Using the Sound Description Interchange Format within the SMS Applications

Maarten de Boer, Jordi Bonada, Xavier Serra
Audiovisual Institute - Pompeu Fabra University
Rambla 31, 08002 Barcelona, Spain
{mdeboer, jboni, xserra}@iua.upf.es
http://www.iua.upf.es

Abstract

Recently, we have seen increased use of and support for the Sound Description Interchange Format (SDIF), including its integration into widely used environments such as MAX/MSP (Wright, Dudas, Khoury, Wang, Zicarelli, 1999) and MPEG-4 (Wright, Scheirer, 1999). To follow and encourage this trend, we have added support for importing and exporting SDIF files to the latest version of the SMS applications, a group of applications for spectral modeling analysis and synthesis. In this paper we discuss the use of the SDIF standard in the SMS applications. We give a brief introduction to SMS and SDIF, and examine the features and limitations of the SDIF standard when it is used to represent the SMS analysis data. We also present an application for the graphical visualization of SDIF data extracted from SMS files, similar to the one used in the SMS graphical tools.

1. Introduction

As the capabilities of storage and communication media increase, and as the system requirements for digital audio processing are met more easily, the field of audio analysis and synthesis has to deal with ever larger amounts of available data. Not only is it important to be able to organize and classify this data, as has been addressed by the efforts of the MPEG-7 group (Herrera, Serra, Peeters, 1999), but there is also a clear need for a common format for this data, one that can easily be read and written by everyone and that specifies how the most commonly used content is stored. This paper discusses the use of such a format, the Sound Description Interchange Format (SDIF), in a practical situation.
The SMS applications, developed at the Music Technology Group of the Audiovisual Institute of the Pompeu Fabra University, deal mainly with the kind of data that the initial design of the SDIF standard focused on. The SMS applications are closed source, so the use of the SMS data files was restricted to these applications themselves. Since there was a clear desire to allow further experimentation with this data, and to access it from other programs, we have added SDIF as an import/export file format. Currently, the SDIF files contain only a subset of the SMS data, since only the most common and standardized descriptors are used, but future extensions of the SDIF standard could change that.

2. About SDIF

The Sound Description Interchange Format (SDIF), started in 1995 as a collaboration between Xavier Rodet (IRCAM) and Adrian Freed (CNMAT), is an ongoing effort to create a file format that allows sound analysis and synthesis researchers to interchange data. Originally it focused on spectral descriptions of sound, but over the years SDIF has become more general, and now includes other, non-spectral descriptors of sound, such as time-domain samples. The SDIF format specification consists of two parts. First, there is the specification of the actual file format and its contents. Second, there is a list of standard data types and their representation formats. Data is stored in a sequence of frames, similar to the chunks in the IFF, AIFF or RIFF formats. Each frame contains some common data, such as a data type identifier, a time tag, a stream identifier, and a number of matrices of floating-point numbers, which contain the actual data (Wright, 1999).

3. About SMS

Spectral Modeling Synthesis (SMS) is a set of techniques and software implementations for the analysis, transformation and synthesis of musical sounds, based on the decomposition of the sound into a deterministic plus a stochastic part (Serra, 1997).
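As a concrete illustration of the frame layout described in Section 2 (a type identifier, a time tag, a stream identifier, and a matrix count preceding the data matrices), a minimal frame-header parser might look as follows. This is a sketch assuming the big-endian field order given in the published SDIF specification; it is not part of the SMS applications and does not handle the data matrices themselves.

```python
import struct

# Sketch of an SDIF frame header: a 4-byte type signature, the size in
# bytes of the frame data that follows, a float64 time tag, a stream
# identifier, and the number of data matrices in the frame.
# Big-endian, no padding, per the SDIF specification.
FRAME_HEADER = struct.Struct(">4sidii")

def parse_frame_header(data: bytes) -> dict:
    """Unpack the fixed-size header at the start of an SDIF frame."""
    sig, size, time_tag, stream_id, n_matrices = FRAME_HEADER.unpack(
        data[:FRAME_HEADER.size])
    return {
        "signature": sig.decode("ascii"),  # e.g. "1TRC" for sinusoidal tracks
        "size": size,                      # bytes of frame data after this field
        "time": time_tag,                  # time tag in seconds
        "stream": stream_id,               # stream identifier
        "matrices": n_matrices,            # number of data matrices
    }
```

A full reader would then loop over the matrices, each of which carries its own small header (matrix type, element type, row and column counts) followed by the floating-point data.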
The output of the SMS analysis is a collection of frequency and amplitude values, representing the partials of the sound (the sinusoidal, or deterministic, component), and either filter coefficients with a gain value or spectral magnitudes and phases, representing the residual sound (the non-sinusoidal, or stochastic, component). The latter is obtained by subtracting the re-synthesized sinusoidal component from the original sound. To guide the tracking of the partials, it
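The subtraction step described above can be sketched as follows: the frequency/amplitude pairs found by the analysis are re-synthesized as a sum of sinusoids, and the residual is what remains after subtracting that deterministic signal from the original. This is an illustrative sketch only; the actual SMS implementation additionally performs partial tracking across frames, phase matching and windowed overlap-add, all omitted here, and the function names are hypothetical.

```python
import math

def synthesize_partials(partials, n_samples, sample_rate):
    """Sum of sinusoids from (frequency_hz, amplitude) pairs (zero phase)."""
    out = [0.0] * n_samples
    for freq, amp in partials:
        for n in range(n_samples):
            out[n] += amp * math.sin(2.0 * math.pi * freq * n / sample_rate)
    return out

def residual(original, partials, sample_rate):
    """Stochastic component: original minus the re-synthesized partials."""
    deterministic = synthesize_partials(partials, len(original), sample_rate)
    return [o - d for o, d in zip(original, deterministic)]
```

In the real system the residual is then modeled further, as the text notes, either by filter coefficients with a gain value or by spectral magnitudes and phases.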