THE RHYTHM TRANSFORM:
TOWARDS A GENERIC RHYTHM DESCRIPTION
Enric Guaus, Perfecto Herrera
Music Technology Group, Institut Universitari de l'Audiovisual
Universitat Pompeu Fabra
{enric.guaus,perfecto.herrera}@iua.upf.es
http://www.iua.upf.es/mtg
ABSTRACT
In the past few years, automatic genre classification has
become one of the most interesting topics in the Music Information Retrieval (MIR) field. Musical genre is one of
the most valuable pieces of metadata when managing huge music
databases, and many successful efforts have been made to
compute it automatically. For this purpose, timbral and rhythmic descriptions of the audio are typically used. From
our point of view, some of these descriptors have been
designed under rather rigid constraints, and generalizing these algorithms to more flexible applications becomes a difficult task. In this paper, we present a method
for rhythmic description which can improve the efficiency
and flexibility of genre classifiers when used in combination with timbre descriptors.
1. INTRODUCTION
Musical genre has become one of the most
valuable pieces of metadata when managing huge music databases.
Broadcast radio stations and CD stores base their organization on musical genres. As a consequence, the Music Information Retrieval community has developed many different algorithms for automatic genre classification [1][2].
The problem arises when genre has to be defined. Neither musicians nor musicologists agree on a common
definition of genre. For simplicity, genre can be defined
as a class of music with a set of common properties that,
in the perception of the average listener, distinguish music in
that category from other songs [3]. Unfortunately, no
more precise definition is possible.
From a technical point of view, the basic structure of
most of these systems can be divided into three different
parts: feature extraction, pattern recognition and conditioning.
Feature extraction is the most important part of the process: the system
will classify genres according to whichever description the extracted features provide.
In general, timbre-related descriptors (MFCCs and their
derivatives) and rhythm-related descriptors (onsets, time
signature and beat) are used. Melodic and harmonic
features may also be included in more sophisticated forthcoming systems.
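As an illustration only (not the exact pipeline used in this paper), the following sketch shows how such timbre- and rhythm-related descriptors could be computed from an audio file with the librosa library; the function names are real librosa calls, but the specific parameter choices and feature summarization are assumptions.

```python
import numpy as np
import librosa

def extract_basic_features(path, sr=22050):
    # Load a mono audio signal at a fixed sample rate
    y, sr = librosa.load(path, sr=sr)

    # Timbre-related descriptors: MFCCs and their first derivatives,
    # summarized by their mean over the whole excerpt
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    d_mfcc = librosa.feature.delta(mfcc)
    timbre = np.concatenate([mfcc.mean(axis=1), d_mfcc.mean(axis=1)])

    # Rhythm-related descriptors: onset strength statistics and an
    # estimated tempo (BPM) from a standard beat tracker
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)
    tempo, _ = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
    rhythm = np.array([float(tempo), onset_env.mean(), onset_env.std()])

    return np.concatenate([timbre, rhythm])
```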
The second step is usually based on pattern recognition techniques. There are several general-purpose machine-learning and heuristic-based techniques that can be adapted
to this task; Hidden Markov Models and Neural Networks
are the usual "tools of the trade". Finally, some conditioning of the output data needs to be done.
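A minimal sketch of this second step is given below, assuming feature vectors such as those from the previous example and using a small scikit-learn neural network as a stand-in classifier; the hidden-layer size and cross-validation setup are assumptions, not the configuration used in this paper.

```python
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def evaluate_genre_classifier(X, y, folds=5):
    """X: (n_songs, n_features) descriptor matrix, y: genre labels."""
    # Standardize the descriptors and train a small neural network,
    # reporting cross-validated classification accuracy
    clf = make_pipeline(StandardScaler(),
                        MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000))
    scores = cross_val_score(clf, X, y, cv=folds)
    return scores.mean(), scores.std()
```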
As mentioned above, some systems use a rhythmic description for genre classification. It is commonly accepted
that rhythm is an important musical feature when describing genre. Important studies on the rhythmic properties
of music can be found in [4][5]. Beat tracking and meter
detection are also very active topics in the MIR community. However, problems appear when non-rhythmic
data, e.g. classical music or speech, have to be recognized.
The goal of this paper is to propose and discuss a
flexible rhythmic description and its inclusion in a genre
classification system in order to improve its robustness.
Speech data has been included in the database, forcing the
system to deal with a high variety of rhythms, while Dance
and Classical music provide a high variety of timbres. The paper is organized as follows: in Section 2, the proposed Rhythm Transform is explained and
its use in a real environment is shown. In Section
3 the classification scheme and the databases used are introduced, in Section 4 the experiments and results are discussed and, finally, the conclusions and future work can
be found in Section 5.
2. FEATURE EXTRACTION
2.1. The Rhythm Transform
As mentioned in the previous section, a number of different algorithms for rhythm feature extraction have been
developed (e.g. the Beat Histogram proposed by Tzanetakis
in [1] or the Beat Spectrum proposed by Foote in [6]).
Although some of them provide a general representation
of the rhythmicity of the input signal, manual tuning is
usually required. Other successful studies on rhythm are
based on the periodicities at specific BPM values (e.g. the beat tracking systems proposed by Scheirer
in [7] and Goto in [8]), but they do not provide rhythmic information beyond the BPM scale. Here, we use the
proposed Rhythm Transform for computing the rhythmic