Current Research in Music Technology at the Audiovisual Institute of the Pompeu Fabra University

Serra, Xavier


2 Audio processing and synthesis

Since the initial development of the Sinusoidal plus Residual model (also known as Spectral Modeling Synthesis) as part of the PhD thesis of Xavier Serra at Stanford University (Serra, 1989), and through the improvements carried out since then at the UPF by a number of researchers, the MTG has become recognized as a worldwide reference on spectral-based audio processing techniques. In fact, many of our research projects are based on, or related to, these signal processing techniques.

In the last ten years, and in the context of different research projects, there have been many improvements to the basic spectral model and its implementation. Some of the key developments have been based on the identification and extraction of audio features that are relevant for particular applications or particular sound families. New models have also been proposed, and efficient and flexible implementations of these models have been developed.

The development of a generic sound synthesis system has been a major goal of the MTG since the beginning. A recent example of this type of work is the SALTO project (Haas, 2001), in which a real-time wind instrument synthesizer was developed. This is an example of applying the basic Sinusoidal plus Residual model to a particular application and a particular family of sounds.

A very different spectral-based analysis/synthesis approach was required for the development of an automatic near-lossless time-stretching system (Bonada, 2000). This technology permits changing the duration of an audio sequence without modifying the timbre and pitch of its content. The basic Sinusoidal plus Residual model is not appropriate for such an application, and a model had to be developed for time-scaling any audio signal.

3 Audio content analysis

The efficient management of sound archives, whether personal or institutional, requires the use of indexes and categories that cover different levels of description.
In addition to the traditional, manually generated metadata (format, resolution, channels, author, year, performer, etc.), it is possible to automatically generate "descriptors" that capture the sonological or musical features embedded in the audio files. The automatic generation of descriptors uses traditional analysis and signal processing techniques, but it also adapts techniques from other fields such as artificial intelligence and data mining. Using these techniques, the signal content is extracted and descriptors at different levels of abstraction are generated, some more understandable than others for a "non-technical" user. Apart from the descriptors themselves, it is also interesting to study the relations between the different levels of description in order to obtain a continuous path from the physical signal to the symbolic labels used to represent its content.

The area of audio content analysis is the one in which the MTG is currently investing the most effort. We are working on many of the key issues of the field, including instrument classification (Herrera et al., 2003), melodic description (Gómez et al., 2003) and rhythmic description (Gouyon and Meudic, 2003).

A particular example of a framework in which we carry out this research is the SIMAC project. SIMAC addresses the study and development of innovative components for a music information retrieval system. Its key feature is the use and exploitation of semantic descriptors of musical content that are automatically extracted from music titles. These descriptors are generated in two ways: as derivations and combinations of lower-level descriptors, and as generalizations induced from manually annotated databases by the intensive application of data mining techniques. The project also aims to empower (i.e. add value to, improve the effectiveness of) music consumption behaviours, especially those that are guided by the concept of similarity.
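
The first of the two routes mentioned above, deriving a higher-level descriptor from lower-level ones, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the choice of spectral centroid and RMS energy as low-level descriptors, the function names, and the 1500 Hz brightness threshold are hypothetical and are not taken from the SIMAC descriptor set. A naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short example frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, sample_rate):
    """Low-level descriptor: magnitude-weighted mean frequency, in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0.0:
        return 0.0  # silent frame: no meaningful centroid
    bin_hz = sample_rate / len(frame)
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total


def rms_energy(frame):
    """Low-level descriptor: root-mean-square amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def brightness_label(centroid_hz, threshold_hz=1500.0):
    """Higher-level descriptor derived from the centroid.

    The threshold is an arbitrary illustrative value, not a published one.
    """
    return "bright" if centroid_hz >= threshold_hz else "dark"


def sine(freq_hz, sample_rate=8000, n=256):
    """Test signal: a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


if __name__ == "__main__":
    for freq in (250.0, 3000.0):
        frame = sine(freq)
        centroid = spectral_centroid(frame, 8000)
        print(freq, round(centroid, 1), round(rms_energy(frame), 3), brightness_label(centroid))
```

The second route, inducing labels from manually annotated databases, would replace the fixed threshold with a decision rule learned from examples; that step is not shown here.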
The knowledge gained about all of this (semantic descriptors, similarity, collection organization, retrieval patterns, etc.), regarding users as individuals but also as members of organized communities, becomes operational in prototypes for:

* Annotating music items and collections in a way that can be exploited by data mining algorithms,
* Organizing music collections, both sonically and visually,
* Discovering interesting music by exploiting content analysis and user profiling, and
* Interacting with these collections in ways that add value to owned music.

These inter-connectable components are devised, developed, and tested in connection with real communities of users and of content distributors.

4 Audio identification

A specific topic in content analysis is audio identification. In this area we have obtained very successful and practical results (Batlle et al., 2004). The technology developed at the MTG is based on a low-level audio analysis that extracts parameters related to the spectral characteristics of the sound and its temporal evolution. With this analysis, an audio signal is coded as a sequence of descriptors, where each descriptor is associated with a specific spectral distribution. This reduces the amount of

Proceedings ICMC 2004
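
The coding step described in Section 4, in which an audio signal becomes a sequence of descriptors that each stand for a spectral distribution, can be sketched with a simple nearest-prototype quantizer. This is an illustrative stand-in and not the MTG identification system: the three-band prototypes, the reference codes, the squared Euclidean distance and the position-by-position matching score are all assumptions chosen to keep the example small.

```python
# Hypothetical prototype spectral distributions over three frequency bands
# (low, mid, high). Each prototype index acts as one descriptor symbol.
PROTOTYPES = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]

# Hypothetical reference codes for two known recordings.
REFERENCES = {
    "song_a": [0, 0, 1, 2],
    "song_b": [2, 2, 1, 0],
}


def normalize(band_energies):
    """Scale non-negative band energies so they sum to 1 (a distribution)."""
    total = sum(band_energies)
    if total == 0:
        return [1.0 / len(band_energies)] * len(band_energies)
    return [e / total for e in band_energies]


def nearest_prototype(distribution, prototypes):
    """Index of the prototype with the smallest squared Euclidean distance."""
    def distance(proto):
        return sum((a - b) ** 2 for a, b in zip(distribution, proto))
    return min(range(len(prototypes)), key=lambda i: distance(prototypes[i]))


def encode(frames, prototypes):
    """Code a signal, given as per-frame band energies, as a symbol sequence."""
    return [nearest_prototype(normalize(f), prototypes) for f in frames]


def identify(query_code, references):
    """Name of the reference whose code agrees with the query at most positions."""
    def score(name):
        return sum(1 for a, b in zip(query_code, references[name]) if a == b)
    return max(references, key=score)


if __name__ == "__main__":
    # A noisy observation whose energy moves from low to mid to high bands.
    noisy_frames = [[9, 1, 2], [6, 3, 1], [2, 8, 1], [1, 1, 7]]
    code = encode(noisy_frames, PROTOTYPES)
    print(code, identify(code, REFERENCES))
```

A practical system would need a matching step that tolerates time shifts and distortions; the position-by-position comparison above is used only to keep the sketch short.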
