THE RHYTHM TRANSFORM:
TOWARDS A GENERIC RHYTHM DESCRIPTION
Enric Guaus, Perfecto Herrera
Music Technology Group, Institut Universitari de l'Audiovisual
Universitat Pompeu Fabra
{enric.guaus,perfecto.herrera}@iua.upf.es
http://www.iua.upf.es/mtg
ABSTRACT
In the past few years, automatic genre classification has
become one of the most interesting topics in the Music Information Retrieval (MIR) field. Musical genre is one of
the most valuable pieces of metadata when managing huge music
databases, and many successful efforts have been made to
compute it automatically. For this purpose, timbral and rhythmic descriptions of the audio are typically used. From
our point of view, some of these descriptors have been
designed under rather rigid constraints, and generalizing these algorithms to more flexible applications becomes a difficult task. In this paper, we present a method
for rhythmic description which can improve the efficiency
and flexibility of genre classifiers when used in combination with timbre descriptors.
1. INTRODUCTION
Musical genre has become one of the most
valuable pieces of metadata when managing huge music databases.
Broadcast radio stations and CD stores base their organization on musical genres. As a consequence, the Music Information Retrieval community has developed many different algorithms for automatic genre classification [1][2].
The problem arises when genre has to be defined. Neither musicians nor musicologists agree on a common
definition of genre. For simplicity, genre can be defined
as a class of music with a set of common properties that,
in the perception of the average listener, distinguish music in
that category from other songs [3]. Unfortunately, no
more precise definition is possible.
From a technical point of view, the basic structure of
most of these systems can be divided into three different
parts: feature extraction, pattern recognition and conditioning.
Feature extraction is the most important part of the process: the system
will classify genres according to whichever description the extracted features provide.
In general, timbre-related descriptors (MFCCs and their
derivatives) and rhythm-related descriptors (onsets, time
signature and beat) are used. Melodic and harmonic
features may also be included in more sophisticated forthcoming systems.
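As an illustration only (not the exact pipeline used in this paper), the following sketch shows how such timbre- and rhythm-related descriptors could be computed from an audio file with the librosa library; the function names are real librosa calls, but the specific parameter choices and feature summarization are assumptions.

```python
import numpy as np
import librosa

def extract_basic_features(path, sr=22050):
    # Load a mono audio signal at a fixed sample rate
    y, sr = librosa.load(path, sr=sr)

    # Timbre-related descriptors: MFCCs and their first derivatives,
    # summarized by their mean over the whole excerpt
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    d_mfcc = librosa.feature.delta(mfcc)
    timbre = np.concatenate([mfcc.mean(axis=1), d_mfcc.mean(axis=1)])

    # Rhythm-related descriptors: onset strength statistics and an
    # estimated tempo (BPM) from a standard beat tracker
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)
    tempo, _ = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
    rhythm = np.array([float(tempo), onset_env.mean(), onset_env.std()])

    return np.concatenate([timbre, rhythm])
```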
The second step is usually based on pattern recognition techniques. There are several general-purpose machine-learning and heuristic-based techniques that can be adapted
to this task; Hidden Markov Models and Neural Networks
are the usual "tools of the trade". Finally, some conditioning of the output data needs to be done.
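A minimal sketch of this second step is given below, assuming feature vectors such as those from the previous example and using a small scikit-learn neural network as a stand-in classifier; the hidden-layer size and cross-validation setup are assumptions, not the configuration used in this paper.

```python
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def evaluate_genre_classifier(X, y, folds=5):
    """X: (n_songs, n_features) descriptor matrix, y: genre labels."""
    # Standardize the descriptors and train a small neural network,
    # reporting cross-validated classification accuracy
    clf = make_pipeline(StandardScaler(),
                        MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000))
    scores = cross_val_score(clf, X, y, cv=folds)
    return scores.mean(), scores.std()
```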
As mentioned above, some systems use a rhythmic description for genre classification. It is commonly accepted
that rhythm is an important musical feature when describing genre. Important studies on the rhythmic properties
of music can be found in [4][5]. Beat tracking and meter
detection are also very active topics in the MIR community. However, problems appear when non-rhythmic
data, e.g. classical music or speech, have to be recognized.
The goal of this paper is to propose and discuss a
flexible rhythmic description and its inclusion in a genre
classification system in order to improve its robustness.
Speech data has been included in the database, forcing the
system to deal with a high variety of rhythms, while Dance
and Classical music provide a high variety of timbres. The paper is organized as follows: in Section 2, the proposed Rhythm Transform is explained and
its use in a real environment is shown. In Section
3 the classification scheme and the databases used are introduced, in Section 4 the experiments and results are discussed and, finally, the conclusions and future work can
be found in Section 5.
2. FEATURE EXTRACTION
2.1. The Rhythm Transform
As mentioned in the previous section, a number of different algorithms for rhythm feature extraction have been
developed (e.g. the Beat Histogram proposed by Tzanetakis
in [1] or the Beat Spectrum proposed by Foote in [6]).
Although some of them provide a general representation
of the rhythmicity of the input signal, manual tuning is
usually required. Other successful studies on rhythm are
based on the periodicities at specific BPM values (e.g. the beat tracking systems proposed by Scheirer
in [7] and Goto in [8]), but they do not provide rhythmic information beyond the BPM scale. Here, we use the
proposed Rhythm Transform for computing the rhythmic