2 Audio processing and synthesis
From the initial development of the Sinusoidal plus
Residual model (also known as Spectral Modeling
Synthesis) as part of the PhD thesis of Xavier Serra at
Stanford University (Serra, 1989) and from the
improvements carried out since then at the UPF by a
number of researchers, the MTG is recognized as a
worldwide reference on spectral-based audio processing
techniques. In fact, many of its research projects are based
on, or related to, these signal processing techniques.
In the last ten years, and in the context of different research
projects, there have been many improvements to the basic
spectral model and its implementation. Some of the key
developments have been based on the identification and
extraction of audio features that are relevant for particular
applications or particular sound families. New models have
also been proposed, and efficient, flexible implementations
of these models have been developed.
The development of a generic sound synthesis system has
been a major goal of the MTG since the beginning. A recent
example of this type of work is the SALTO project (Haas,
2001) in which a real-time wind instrument synthesizer was
developed. This is an example of applying the basic
Sinusoidal plus Residual model to a particular application
and working with a particular family of sounds.
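As a rough illustration of the Sinusoidal plus Residual idea, the following sketch estimates a single stationary sinusoid from one frame by DFT peak picking, resynthesizes it, and subtracts it from the input to obtain the residual. It is not the SMS or SALTO implementation: the function names, the frame length, the test signal, and the assumption that the partial falls exactly on a DFT bin are ours, chosen only to keep the example short.

```python
import cmath
import math
import random


def dft(frame):
    """Naive O(N^2) DFT of a real frame (bins 0..N/2); fine for a short frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def sinusoid_plus_residual(frame):
    """Split a frame into its strongest sinusoid and whatever remains."""
    n = len(frame)
    spectrum = dft(frame)
    # Peak picking: strongest bin, ignoring DC and Nyquist.
    k = max(range(1, n // 2), key=lambda b: abs(spectrum[b]))
    amp = 2.0 * abs(spectrum[k]) / n
    phase = cmath.phase(spectrum[k])
    # Resynthesize the detected partial and subtract it from the input.
    sinusoid = [amp * math.cos(2 * math.pi * k * t / n + phase) for t in range(n)]
    residual = [x - s for x, s in zip(frame, sinusoid)]
    return k, amp, sinusoid, residual


# Hypothetical test signal: one partial on bin 8 plus low-level noise.
random.seed(0)
N = 128
noise = [0.01 * random.uniform(-1.0, 1.0) for _ in range(N)]
signal = [0.5 * math.cos(2 * math.pi * 8 * t / N + 0.3) + noise[t] for t in range(N)]
peak_bin, peak_amp, sinusoid, residual = sinusoid_plus_residual(signal)
```

A real analysis would track many time-varying partials across frames and model the residual separately; here the residual is simply the sample-wise difference.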
A very different spectral-based analysis/synthesis approach
was required for the development of an automatic near-lossless
time stretching system (Bonada, 2000). This
technology permits changing the duration of an audio
sequence without modifying the timbre and pitch of its
content. The basic Sinusoidal plus Residual model is not
appropriate for such an application, so a new model had to
be developed for time-scaling any audio signal.
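To make the notion of time-scaling concrete, here is a minimal sketch of a naive overlap-add (OLA) stretcher. This is a deliberately simple stand-in and not the model of Bonada (2000): frames are read at one hop size and written at another, which changes duration while keeping the local waveform. Naive OLA produces audible phase discontinuities. The function names, the frame length, and the 50% output overlap are assumptions made for illustration.

```python
import math


def hann(n):
    """Periodic Hann window; copies at 50 % overlap sum to one."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def ola_time_stretch(samples, ratio, frame_len=256):
    """Naive overlap-add: read frames at one hop, write them at another.

    ratio > 1 lengthens the signal, ratio < 1 shortens it.
    """
    synth_hop = frame_len // 2
    analysis_hop = max(1, round(synth_hop / ratio))
    window = hann(frame_len)
    n_frames = max(1, 1 + (len(samples) - frame_len) // analysis_hop)
    out = [0.0] * (synth_hop * (n_frames - 1) + frame_len)
    for i in range(n_frames):
        src = i * analysis_hop   # read position in the input
        dst = i * synth_hop      # write position in the output
        for j in range(frame_len):
            if src + j < len(samples):
                out[dst + j] += window[j] * samples[src + j]
    return out


# Hypothetical input: half a second of a 220 Hz tone at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 220.0 * t / SAMPLE_RATE) for t in range(4000)]
stretched = ola_time_stretch(tone, 2.0)
shortened = ola_time_stretch(tone, 0.5)
```

The output lengths are only approximately `ratio` times the input length, because the number of frames is rounded down to a whole number.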
3 Audio content analysis
The efficient management of sound archives, whether
personal or institutional, requires the use of indexes and
categories that cover different levels of description. In
addition to the traditional manually generated metadata
(format, resolution, channels, author, year, performer, etc.),
it is possible to automatically generate "descriptors" that
capture the sonological or musical features embedded in
the audio files. The
automatic generation of descriptors uses traditional analysis
and signal processing techniques, but it also adapts
techniques from other fields such as artificial intelligence or
data mining. Using these techniques, the signal content is
extracted and descriptors at different abstraction levels
(more or less understandable to a "non-technical" user) are
generated. Beyond the descriptors themselves, it is also
worth studying the relations between the different levels of
description in order to obtain a continuous flow from the
physical signal to the symbolic labels used to represent its
content.
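The progression from physical signal to symbolic label can be sketched with a toy example. Below, two low-level descriptors (RMS energy and spectral centroid) are computed from a frame, and a higher-level "bright"/"dark" label is derived from the centroid. The choice of descriptors, the 1000 Hz threshold, and all function names are illustrative assumptions, not the descriptors actually used at the MTG.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of a naive DFT for bins 0..N/2."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def rms(frame):
    """Low-level descriptor: root-mean-square energy."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def spectral_centroid(frame, sample_rate):
    """Low-level descriptor: magnitude-weighted mean frequency in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0.0:
        return 0.0
    bin_hz = sample_rate / len(frame)
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total


def brightness_label(centroid_hz, threshold_hz=1000.0):
    """Higher-level descriptor: a symbolic label (threshold is an assumption)."""
    return "bright" if centroid_hz >= threshold_hz else "dark"


SAMPLE_RATE = 8000
N = 256


def tone(freq_hz):
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(N)]


low = tone(250.0)
high = tone(2000.0)
low_label = brightness_label(spectral_centroid(low, SAMPLE_RATE))
high_label = brightness_label(spectral_centroid(high, SAMPLE_RATE))
```

The numeric descriptors are meaningful mainly to a technical user, while the derived label is the kind of description a non-technical user could search by.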
Audio content analysis is the area in which the MTG is
currently investing the most effort. We are working on
many of the key issues of the field, including instrument
classification (Herrera et al., 2003), melodic description
(Gómez et al., 2003) and rhythmic description (Gouyon
and Meudic, 2003).
A particular example of a framework in which we carry out
all this research is the SIMAC project. SIMAC addresses
the study and development of innovative components for a
music information retrieval system. The key feature is the
usage and exploitation of semantic descriptors of musical
content which are automatically extracted from music titles.
These descriptors are generated in two ways: as derivations
and combinations of lower-level descriptors and as
generalizations induced from manually annotated databases
by the intensive application of data mining techniques. The
project also aims to empower (i.e., add value to and improve
the effectiveness of) music consumption behaviours,
especially those guided by the concept of similarity. The
knowledge gained about all this (semantic descriptors,
similarity, collection organization, retrieval patterns, etc.),
regarding users both as individuals and as members of
organized communities, becomes operational as
prototypes for:
* Annotating music items and collections in a way
that can be exploited by data mining algorithms,
* Organizing music collections, both sonically and
visually,
* Discovering interesting music by exploiting
content analysis and user profiling, and
* Interacting with these collections in ways that add
value to owned music.
These inter-connectable components are devised, developed,
and tested in connection with real communities of users and
of content distributors.
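The two ideas above, inducing labels from manually annotated examples and retrieving music by similarity, can be sketched with a nearest-neighbour scheme over descriptor vectors. This is only an illustration under stated assumptions: the two-dimensional descriptors, the toy annotated collection, the Euclidean distance, and the function names are all hypothetical and are not taken from SIMAC.

```python
import math
from collections import Counter

# Hypothetical annotated collection: (tempo-like, brightness-like) descriptor
# vectors scaled to [0, 1], each with a manually assigned label.
ANNOTATED = [
    ((0.90, 0.80), "energetic"),
    ((0.85, 0.70), "energetic"),
    ((0.80, 0.90), "energetic"),
    ((0.20, 0.30), "calm"),
    ((0.10, 0.20), "calm"),
    ((0.25, 0.15), "calm"),
]


def distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query, collection, k=3):
    """Return the k annotated items closest to the query."""
    return sorted(collection, key=lambda item: distance(query, item[0]))[:k]


def induce_label(query, collection, k=3):
    """Majority vote among the k nearest annotated neighbours."""
    votes = Counter(label for _, label in most_similar(query, collection, k))
    return votes.most_common(1)[0][0]


label_a = induce_label((0.75, 0.85), ANNOTATED)
label_b = induce_label((0.15, 0.25), ANNOTATED)
```

The same `most_similar` function serves both purposes: its results can be shown directly as "similar music", or summarized by vote into a semantic label for an unannotated item.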
4 Audio identification
A specific topic in content analysis is audio identification.
In this area we have had very successful and practical
results (Batlle et al., 2004). The technology developed in
the MTG is based on a low-level audio analysis that extracts
parameters related to the spectral characteristics of the
sound and its temporal evolution. With this analysis an
audio signal is coded as a sequence of descriptors, where
each one of these descriptors is associated with a specific
spectral distribution. This reduces the amount of
Proceedings ICMC 2004