AN EFFICIENT OFF-LINE BEAT TRACKING METHOD FOR MUSIC WITH STEADY TEMPO

Bee Suan Ong & Sebastian Streich
Center for Advanced Sound Technologies, Yamaha Corporation
203 Matsunokijima, Iwata, Shizuoka, 438-0192, Japan
{beesuan,sstreich}@beat.yamaha.co.jp

ABSTRACT

This paper presents an efficient off-line approach for automatically estimating the tempo and the exact temporal locations of beats in rhythmic music signals with steady tempo. Unlike previous approaches, our method first finds an appropriate location within the analyzed signal at which to start beat detection. Consequently, the initial beat estimates reach a high level of reliability. The method then infers the beat locations in the rest of the signal using the global tempo estimate. The processing involves three stages: rhythm feature extraction, tempo estimation, and beat induction. The performance of the proposed method is evaluated on a database of 160 music excerpts from 13 different genres. The global tempo recognition rate reaches 92%, and the average p-score for beat tracking is 0.7. The results indicate that, despite its simplicity, the proposed approach succeeds in extracting rhythmic descriptions from music audio signals and is suitable for the targeted applications.

1. INTRODUCTION

Rhythmic descriptions are among the most basic and important elements that human listeners use to characterize music. The ability to capture such a musical representation benefits the manipulation and retrieval of music audio data in a variety of applications. One example is a mobile music player that automatically selects songs with a suitable tempo for jogging or working out. Another example is the beat-synchronous playback of tracks in an easy-to-use mixing application for dance music.

Over the last few years, the automatic extraction of rhythmic descriptions (particularly beat and tempo information) directly from music signals has received a lot of attention.
The current state of the art can be observed in the results of several recent MIREX events (http://www.music-ir.org/mirex/2008/index.php/Main_Page). According to [7], most approaches use either the signal energy over time [6] or the detected onsets [2] to derive the beat periodicity of any type of music signal. Collins [1] points out that style-specific approaches to beat tracking, very much like human listeners, can take advantage of conventions and characteristic properties of the material. By doing so, they can outperform their generalized equivalents by considerable margins within their domain. On the other hand, a method with an extreme degree of specialization also places strong restrictions on potential practical applications.

[Figure 1. Block diagram of the proposed method. Labels in the diagram: music input signal, rhythm feature extraction, tempo estimation, tempo, beat candidate filtering, beat grid determination, beat induction, beat locations.]

Our method takes a middle path: it requires the tempo to be constant throughout the track and limited to the range of 60 to 200 beats per minute (bpm), and the beat to be regular within a certain tolerance range. While this approach is doomed to fail for classical music and is likely to encounter problems in categories like Jazz or Latin, it is reasonable for the great majority of modern Pop, Rock, and Dance music. We further chose to exploit the advantages of an off-line (non-causal) system in order to achieve higher accuracy in the beat estimation. Our method is conceptually very similar to the one described by Ellis [4]. However, we rely on a simpler front-end and apply several simple selection criteria for the beat induction instead of the dynamic programming approach proposed there.

The rest of the paper is organized as follows: Section 2 explains our method for estimating tempo and beat locations. Section 3 presents and discusses the evaluation results. Finally, Section 4 presents our conclusions and future research plans.

2. METHOD

Our proposed approach involves three main processing stages, as shown in Figure 1. The first stage is rhythm feature extraction, which in our case is an onset detection function based on the differences between spectral frames. Using this feature we then