Using the Sound Description Interchange Format within the SMS Applications

de Boer, Maarten; Bonada, Jordi; Serra, Xavier


is useful to search for a possible fundamental frequency. This fundamental frequency can also be used to do pitch-synchronous analysis: the size of the analysis window is adjusted to the period of the fundamental frequency, which gives the best possible time-frequency trade-off. The SMS analysis makes use of re-analysis to obtain the most accurate result. The fundamental frequency that comes out of this process is considered to be the correct one, and therefore the SMS analysis data contains just one fundamental frequency per analysis frame, rather than a group of candidates with confidence factors.

In order to accomplish a musically meaningful parameterization for sound transformation, the deterministic plus stochastic model has been extended with the extraction of high level attributes. These attributes are calculated at each analysis frame from the output of the basic SMS analysis (Serra, Bonada, 1998).

4. Representing SMS data with SDIF

The SDIF standard provides a representation of the data commonly used for the modeling of sounds with sinusoids and noise, with descriptors for fundamental frequency, short-term Fourier transform and sinusoidal tracks. Since the SMS file format contains this data as well, the two formats have a lot in common, and it has been both feasible and attractive to use SDIF as an alternative storage format within the SMS applications.

Apart from giving users of the SMS software the possibility of interchanging data with other applications, SDIF makes it possible to read, write and modify SMS data with external applications that support SDIF, or to use the SDIF libraries to access the data directly, thereby facilitating experimentation.

The SMS high level attributes, as well as a description of the relationships between them, are not defined within the SDIF standard.
These are very specific to the SMS analysis and synthesis, so it is not very likely that other applications could make use of them. However, it might be interesting to export these values as well, for experimentation or visualization. In that case, it would be useful if the SDIF standard provided a way to add arbitrary data, with a textual description of each item. Also, SMS can deal with segmentation into regions. The proposal of Xavier Rodet to make use of markers would allow this to be included in the SDIF export files. We would like to see this become part of the SDIF standard.

The SDIF frame types currently described by CNMAT and IRCAM (http://www.cnmat.berkeley.edu/SDIF, http://www.ircam.fr/sdif) that can be converted from and to SMS data are shown in Table 1.

The internal storage of the SMS data is done in frames, which are grouped in regions, which are grouped in tracks. Each SMS frame contains all data needed to synthesize that particular frame, including sinusoidal tracks, residual, and fundamental frequency, as well as several high level attributes. In terms of storage format, the main difference between the two formats is that SDIF stores all information in separate frames per data type, whereas SMS bundles all this data in single frames. However, since the SDIF data is organized in streams, or sequences of interleaved streams ordered in time, the conversion from one format to the other is very straightforward.

Table 1. SDIF frame types extracted from SMS data files.

Frame type   Type of data
1FQ0         Fundamental Frequency Estimate
1STF         Discrete Short-Term Fourier Transform
1TRC         Sinusoidal Tracks

The conversion from SMS to SDIF is lossless as long as the high level attributes are not taken into consideration. When we convert SMS to SDIF files, we make sure that all high level attributes are incorporated first. This is important, because the data is rather meaningless when the extracted attributes do not accompany it.
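The regrouping described above, from one bundled SMS frame to separate SDIF frames per data type, can be illustrated with a minimal Python sketch. The classes `SmsFrame` and `SdifFrame`, the function `sms_to_sdif`, the stream numbering, the matrix column layouts and the fixed confidence of 1.0 are all assumptions made for illustration; they are not the actual SMS data structures or the API of any SDIF library.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SmsFrame:
    """Hypothetical bundled SMS frame: everything needed to synthesize one frame."""
    time: float                                    # frame time in seconds
    fundamental: float                             # single fundamental frequency in Hz
    tracks: List[Tuple[int, float, float, float]]  # (index, frequency, amplitude, phase)
    residual_stft: List[complex]                   # residual spectrum bins


@dataclass
class SdifFrame:
    """Hypothetical SDIF-like frame carrying one matrix of one data type."""
    signature: str   # frame type, e.g. "1FQ0"
    time: float
    stream_id: int   # assumed numbering: one stream per data type
    matrix: list     # matrix rows


def sms_to_sdif(frames: List[SmsFrame]) -> List[SdifFrame]:
    """Split each bundled SMS frame into one frame per data type, ordered in time."""
    out = []
    for f in frames:
        # SMS keeps a single fundamental, so the confidence is assumed to be 1.0.
        out.append(SdifFrame("1FQ0", f.time, 0, [[f.fundamental, 1.0]]))
        out.append(SdifFrame("1TRC", f.time, 1, [list(t) for t in f.tracks]))
        out.append(SdifFrame("1STF", f.time, 2,
                             [[b.real, b.imag] for b in f.residual_stft]))
    # Stable sort: frames with equal time keep their per-type order.
    out.sort(key=lambda s: s.time)
    return out
```

Each input frame yields three output frames that share its time stamp, so the interleaved result can be regrouped into bundled frames by collecting the output frames that carry the same time.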
It is possible to incorporate the SMS high level attributes into the SDIF data, thus canceling the effect of the extraction, and to extract them again when needed. This does not result in any noticeable loss of data.

The conversion into SMS of an external SDIF file that contains at least a residual or sinusoidal tracks will result in a valid SMS synthesis. The only problem here is that SMS lacks the notion of confidence in the Fundamental Frequency Estimates, as specified by the SDIF standard. Instead, it uses only one fundamental frequency, and if the SDIF file contains several estimates, only the one with the highest confidence will be used.

As an SDIF frame can contain any number of matrices, it would be possible to store all the matrices used in one single frame. However, this would make it more difficult for any other application to handle the data, because it would require looking inside all frames and going through all matrices when looking for a specific frame. Therefore we decided to follow the SDIF specification closely and store each matrix in the appropriate frame.

The SMS analysis separates the sound into the sinusoidal components and the residual. The residual is stored as a Short-Term Fourier Transform in the SMS generated SDIF file. Here we see a shortcoming of the SDIF specification, or maybe of the use of interchange file formats in general. The SDIF standard does not provide any way to indicate that the stored Short-Term Fourier Transform is the analyzed residual, rather than the complete sound. Furthermore, as during the SMS analysis 0
