Using the Sound Description Interchange Format within the SMS Applications

Maarten de Boer, Jordi Bonada, Xavier Serra
Audiovisual Institute - Pompeu Fabra University
Rambla 31, 08002 Barcelona, Spain
{mdeboer, jboni, xserra}@iua.upf.es
http://www.iua.upf.es

Abstract

Recently, we have seen increased use and support of the Sound Description Interchange Format (SDIF), including its integration into widely used environments such as MAX/MSP (Wright, Dudas, Khoury, Wang, Zicarelli, 1999) and MPEG-4 (Wright, Scheirer, 1999). To follow and encourage this trend, we have added support for importing and exporting SDIF files in the latest version of the SMS applications, a group of applications for spectrum-modeling analysis and synthesis. In this paper we discuss the use of the SDIF standard in the SMS applications. We give a brief introduction to SMS and SDIF, and examine the features and limitations found in the SDIF standard when used to represent the SMS analysis data. We also present an application for the graphical visualization of the SDIF data as extracted from SMS files, similar to the one used in the SMS graphical tools.

1. Introduction

As the capabilities of storage and communication media increase, and as the system requirements for digital audio processing can be met more easily, the field of audio analysis and synthesis has to deal with larger amounts of available data. Not only is it important to be able to organize and classify this data, as has been addressed by the efforts of the MPEG-7 group (Herrera, Serra, Peeters, 1999), but there is also a clear need for a common format for this data that can easily be read and written by everybody and that specifies the storage of the most commonly used content. This paper discusses the use of such a format, the Sound Description Interchange Format (SDIF), in a practical situation.
The SMS applications, developed at the Music Technology Group of the Audiovisual Institute of the Pompeu Fabra University, deal mainly with the kind of data that the initial design of the SDIF standard has focused on. The SMS applications are closed source, so the use of the SMS data files has been restricted to these applications themselves. Because we saw a desire to experiment further with this data and to access it from other programs, we have added SDIF as an import/export file format. Currently, the SDIF files contain only a subset of the SMS data, as only the most common and standardized descriptors are used, but future extension of the SDIF standard could change that.

2. About SDIF

The creation of the Sound Description Interchange Format (SDIF), started in 1995 as a collaboration between Xavier Rodet (IRCAM) and Adrian Freed (CNMAT), is an ongoing effort to create a file format that allows sound analysis and synthesis researchers to interchange data. Originally, it focused on spectral descriptions of sound, but over the years SDIF has become more general, and it now includes other, non-spectral descriptors for sound, such as time-domain samples.

The SDIF format specification consists of two parts. First, there is the specification of the actual file format and contents. Second, there is a list of standard data types and their representation format. Data is stored in a sequence of frames, similar to the chunks in the IFF, AIFF or RIFF formats. Each frame contains some common data, such as a data type identifier, a time tag and a stream identifier, followed by a number of matrices of floating point numbers, which contain the actual data (Wright, 1999).

3. About SMS

Spectral Modeling Synthesis (SMS) is a set of techniques and software implementations for the analysis, transformation and synthesis of musical sounds, based on the decomposition of the sound into a deterministic plus a stochastic part (Serra, 1997).
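As a reminder of what this decomposition means, the model is commonly written as a sum of time-varying sinusoids plus a residual. The notation below is chosen here for illustration and is not taken from this paper: R is the number of partials, A_r(t) and \theta_r(t) are the instantaneous amplitude and phase of partial r, and e(t) is the residual (stochastic) component.

```latex
s(t) = \sum_{r=1}^{R} A_r(t)\,\cos\bigl[\theta_r(t)\bigr] + e(t)
```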
The output of the SMS analysis is a collection of frequency and amplitude values representing the partials of the sound (the sinusoidal, or deterministic, component), together with either filter coefficients with a gain value or spectral magnitudes and phases representing the residual sound (the non-sinusoidal, or stochastic, component). The latter is obtained by subtracting the re-synthesized sinusoidal component from the original sound. To guide the tracking of the partials, it is useful to search for a possible fundamental frequency. This fundamental frequency can also be used to do pitch-synchronous analysis: the size of the analysis window is adjusted to the period of the fundamental frequency, which gives the best possible time-frequency trade-off. The SMS analysis makes use of re-analysis to obtain the most accurate result. The fundamental frequency that comes out of this process is considered to be the correct one, and therefore the SMS analysis data contains just one fundamental frequency per analysis frame, rather than a group of possibilities with confidence factors.

In order to accomplish a musically meaningful parameterization for sound transformation, the deterministic plus stochastic model has been extended with the extraction of high level attributes. These attributes are calculated at each analysis frame from the output of the basic SMS analysis (Serra, Bonada, 1998).

4. Representing SMS data with SDIF

The SDIF standard provides a representation of the data commonly used for the modeling of sounds with sinusoids and noise, with descriptors for fundamental frequency, short-term Fourier transform and sinusoidal tracks. Not surprisingly, since the SMS file format contains this data as well, the SMS file format and the SDIF file format have a lot in common, and it has been feasible as well as attractive to use SDIF as an alternative storage format within the SMS applications. Apart from giving users of the SMS software the possibility of interchanging data with other applications, it makes it possible to read, write and modify SMS data with external applications that support SDIF, or to use the SDIF libraries to gain access to the data directly, thereby facilitating experimentation.

The SMS high level attributes, as well as a description of the relationships between them, are not defined within the SDIF standard.
These are very specific to the SMS analysis and synthesis, so it is not very likely that other applications could make use of them. However, it might be interesting to export these values as well, for experimentation or visualization. In that case, it would be useful if the SDIF standard provided a way to add arbitrary data, with a textual description of each item. Also, SMS can deal with segmentation into regions. The proposal of Xavier Rodet to make use of markers would allow this to be included in the SDIF export files. We would like to see this become part of the SDIF standard.

Table 1 shows the SDIF Frame Types, as currently described by CNMAT and IRCAM (http://www.cnmat.berkeley.edu/SDIF, http://www.ircam.fr/sdif), that can be converted to and from SMS data.

The internal storage of the SMS data is done in frames, which are grouped in regions, which are grouped in tracks. Each SMS frame contains all data needed to synthesize that particular frame, including sinusoidal tracks, residual and fundamental frequency, as well as several high level attributes. In terms of storage format, the main difference between the SMS format and the SDIF format is that SDIF stores all information in separate frames per data type, whereas SMS bundles all this data in single frames. However, since the SDIF data is organized in streams, or sequences of interleaved streams ordered in time, the conversion from one format to the other is very straightforward.

Table 1. SDIF frame types extracted from SMS data files.

Frame type   Type of data
1FQ0         Fundamental Frequency Estimate
1STF         Discrete Short-Term Fourier Transform
1TRC         Sinusoidal Tracks

The conversion from SMS to SDIF is lossless as long as the high level attributes are not taken into consideration. When we convert SMS to SDIF files, we make sure that all high level attributes are incorporated first. This is important, because the data is rather meaningless when the extracted attributes do not accompany it.
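The difference in bundling described above can be illustrated with a minimal Python sketch. It is not the SMS or CNMAT library code: the classes `SmsFrame` and `SdifFrame`, the stream numbering and the simplified matrix columns are all assumptions made for illustration, and further columns of the standard matrices (such as phase or confidence) are left out. Each bundled SMS-like frame is split into one frame per data type, and the frames are emitted in time order; the reverse function regroups frames that share a time tag.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SmsFrame:
    """Hypothetical stand-in for one bundled SMS analysis frame."""
    time: float                               # time tag in seconds
    fundamental: Optional[float]              # single f0 in Hz, no confidence
    partials: List[Tuple[int, float, float]]  # (track index, freq Hz, amplitude)
    residual_bins: List[Tuple[float, float]]  # (real, imaginary) per STFT bin


@dataclass
class SdifFrame:
    """Simplified SDIF-like frame: type, time tag, stream ID, matrices."""
    frame_type: str
    time: float
    stream_id: int
    matrices: List[List[List[float]]] = field(default_factory=list)


# Stream numbering is an arbitrary choice for this sketch.
STREAM_F0, STREAM_TRACKS, STREAM_RESIDUAL = 1, 2, 3


def sms_to_sdif(frames: List[SmsFrame]) -> List[SdifFrame]:
    """Split each bundled frame into per-type frames, ordered by time."""
    out: List[SdifFrame] = []
    for f in sorted(frames, key=lambda fr: fr.time):
        if f.fundamental is not None:
            out.append(SdifFrame("1FQ0", f.time, STREAM_F0, [[[f.fundamental]]]))
        if f.partials:
            rows = [[float(i), freq, amp] for i, freq, amp in f.partials]
            out.append(SdifFrame("1TRC", f.time, STREAM_TRACKS, [rows]))
        if f.residual_bins:
            rows = [[re, im] for re, im in f.residual_bins]
            out.append(SdifFrame("1STF", f.time, STREAM_RESIDUAL, [rows]))
    return out


def sdif_to_sms(sdif_frames: List[SdifFrame]) -> List[SmsFrame]:
    """Regroup per-type frames that share a time tag into bundled frames."""
    by_time = {}
    for fr in sdif_frames:
        sms = by_time.setdefault(fr.time, SmsFrame(fr.time, None, [], []))
        rows = fr.matrices[0]
        if fr.frame_type == "1FQ0":
            sms.fundamental = rows[0][0]
        elif fr.frame_type == "1TRC":
            sms.partials = [(int(r[0]), r[1], r[2]) for r in rows]
        elif fr.frame_type == "1STF":
            sms.residual_bins = [(r[0], r[1]) for r in rows]
    return [by_time[t] for t in sorted(by_time)]
```

In this sketch a round trip through `sms_to_sdif` and `sdif_to_sms` returns the original frames, which mirrors the lossless case in which only the three frame types of Table 1 are involved.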
It is possible to incorporate the SMS high level attributes into the SDIF data, thus canceling the effect of the extraction, and to extract them again when needed. This does not result in any noticeable loss of data.

The conversion into SMS of an external SDIF file that contains at least a residual or sinusoidal tracks will result in a valid SMS synthesis. The only problem here is that SMS lacks the notion of confidence in the Fundamental Frequency Estimates, as specified by the SDIF standard. Instead, it uses only one fundamental frequency, and if the SDIF file contains several estimates, only the one with the highest confidence will be used.

As an SDIF frame can contain any number of matrices, it would be possible to store all the matrices used in one single frame. However, this would make it more difficult for any other application to handle the data, because it would require looking inside all frames and going through all matrices when looking for a specific frame. Therefore we decided to follow the SDIF specification closely and store each matrix in the appropriate frame.

The SMS analysis separates the sound into the sinusoidal components and the residual. The residual is stored as a Short-Term Fourier Transform in the SMS-generated SDIF file. Here we see a shortcoming of the SDIF specification, or maybe of the use of interchange file formats in general. The SDIF standard does not provide any way to indicate that the stored Short-Term Fourier Transform is the analyzed residual, rather than the complete sound. Furthermore, as during the SMS analysis the Short-Term Fourier Transform is obtained both from the original sound and, after subtraction of the sinusoidal component, from the residual, it might be useful to store both analyses in the same SDIF file. The SDIF standard allows this, making use of stream identifiers, but here as well the possibility to indicate the difference between the two is missing.

5. SDIFDisplay

We are developing the SMS applications cross-platform, for both MS Windows and UNIX, in particular GNU/Linux. The SMS data can be visualized with the SMSTools software, which currently only runs under MS Windows. To provide similar functionality for UNIX, we have implemented a visualization tool that can display the data in SDIF files, called SDIFDisplay. It currently supports only the SDIF Standard Frame Types that are used by SMS, but it could be the basis of a full-featured SDIF graphical visualization/editing tool.

Internally, the SDIFDisplay program makes use of several C++ classes for the visualization of the different SDIF frame types, all inherited from a common visualization widget. These visualization classes can be embedded in a widget group that provides scrollbars, zooming, rulers and an adjustment for the color scale. The latter is very useful to enhance the visualization of values with a low amplitude, which otherwise would be hardly visible. It is possible to zoom in, allowing accurate inspection of the analysis results. Furthermore, it is possible to synchronize the scrolling and zooming of all displayed frame types.

To access the data, the SDIFDisplay program makes use of C++ classes written on top of the CNMAT SDIF C library, which provide easy, object-oriented access to the SDIF matrices and frames. SDIFDisplay makes use of FLTK, an open source and freely available C++ graphical user interface library for X (UNIX) and Microsoft Windows. SDIFDisplay is open source and freely available, and works under most UNIX flavors and Microsoft Windows.
Figures 1 and 2 show screenshots from the interface of the SDIFDisplay program.

6. Conclusion

We recommend further use and extension of the SDIF format. The main shortcomings we found while adding SDIF support to the SMS applications are the absence of time markers/regions, of a way to indicate the relationship between the different frame types, and of a way to add arbitrary application-specific data such that other applications would still be able to deal with it in a useful way. However, the SDIF standard is being used by more and more applications, and a proposal has been made for the future maintenance and extension of the SDIF standard (Wright, Chaudhary, Freed, Khoury, Wessel, 1999), even though the use of SDIF still lacks common practice. We hope to have contributed to the future of SDIF in a positive way by adding support for SDIF within the SMS applications.

We would like to observe that SDIF is indeed intended to be an interchange format, and that even though the format is flexible, applications that are capable of reading and/or writing SDIF files are likely to require their own application-specific file format as well.

The SMS-based applications developed by the Audiovisual Institute, plus the SDIF utilities presented here, are publicly available on the Web (http://www.iua.upf.es/sms).

7. References

Herrera, P., X. Serra, G. Peeters. 1999. "Audio Descriptors and Descriptor Schemes in the Context of MPEG-7", Proceedings of the ICMC99.

Serra, X. 1997. "Musical Sound Modeling with Sinusoids plus Noise". G. D. Poli and others (eds.), Musical Signal Processing, Swets & Zeitlinger Publishers, 1997.

Serra, X., J. Bonada. 1998. "Sound Transformations on the SMS High Level Attributes". Proceedings of 98 Digital Audio Effects Workshop, Barcelona, 1998.

Wright, M., R. Dudas, S. Khoury, R. Wang, D. Zicarelli. 1999. "Supporting the Sound Description Interchange Format in the Max/MSP Environment", Proceedings of the ICMC99, Beijing, China, 1999.

Wright, M., A. Chaudhary, A. Freed, S. Khoury, D. Wessel. 1999. "Audio Applications of the Sound Description Interchange Format", Proceedings of the Audio Engineering Society 107th Convention.

Wright, M. 1999. "SDIF Specification", "SDIF Standard Frame Types", http://www.cnmat.berkeley.edu/SDIF

Wright, M., E. Scheirer. 1999. "Cross-Coding SDIF into MPEG-4 Structured Audio", Proceedings of the ICMC99, Beijing, China, 1999.

Figure 1. Screenshot of SDIFDisplay, with a full view of data resulting from an SMS analysis.

Figure 2. Screenshot of SDIFDisplay zoomed in.