Recognition of Isolated Musical Patterns in the context of Greek Traditional Music using Dynamic Time Warping Techniques
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License. Please contact email@example.com to use this work in a way not covered by the license.

Aggelos Pikrakis (1), Sergios Theodoridis (1), Dimitris Kamarotos (2)
(1) Department of Informatics, University of Athens, Greece, pikrakis@di.uoa.gr, firstname.lastname@example.org
(2) IPSA-Aristotle University of Thessaloniki, email@example.com

Abstract

New sound processing tools open new possibilities in the analysis of musical structures, the modeling of an instrument's characteristics and musical pattern recognition. This paper presents an efficient method for recognizing isolated musical patterns played by the Greek Traditional Clarinet in a monophonic environment. A moving window technique is applied to the input signal. From each frame a frequency is selected which corresponds to a certain Fourier peak. The extracted sequence of frequencies is in turn treated as a signal. The logarithm of this signal is calculated and the resulting signal is given as input to a recognizer, which employs standard Dynamic Time Warping techniques in order to determine the input pattern. A recognition rate higher than 98% is achieved by our method.

1 Introduction

In this paper we present a new concept that can be applied to the semi-automated estimation of similarities between large numbers of musical patterns of a given monophonic instrument. The musical patterns are traditionally categorised according to their qualities, described in perceptual terms by musicians. In this case we examine musical patterns performed by clarinet players in the tradition of Greek Popular Music. The musical system itself and the techniques of the instrument players give this sound material a radically different structure compared with that of the Western equal-tempered tradition.
Some major differences are:

* The use of larger, formalised transitory patterns as a main element of the musical structure and not as an ornamental one, as is the case in the Western musical tradition.
* The existence of smaller "melismatic" transitional patterns between notes.
* The intervallic system itself (system of musical scales), which contains many musical intervals that are smaller than the well-tempered ones.
* The existence of special effects, mainly at the start or the end of sound production, which seriously alter the standard spectral characteristics of the instrument.

From a large number of transitory musical patterns, in different instrumental styles, we have selected ten typical models of groups that are usually encountered in practice. The criteria for this choice were the common use of these patterns in various Greek traditions and the differences in duration that occur in their use. This last musical characteristic, the elasticity of the musical pattern, which retains its musical function while its total length is stretched up to five times in some cases, is a major problem when trying to recognize this material. The goal of this paper is to develop a classification scheme for these ten types of musical patterns, in order to assist the musicologist in automatically searching for and locating such musical patterns in a database or digital recording.

2 Feature Generation

Our proposed scheme is based on the Dynamic Time Warping (DTW) technique. The first stage of our method is to extract the sequence of features from the musical pattern, in order to use it for recognition via the DTW methodology. To achieve this, a moving window Fourier transform is performed on the musical pattern. For each window the frequencies corresponding to the two most dominant peaks are selected as feature candidates.
From these two frequencies, the lower one is selected as the feature if it either corresponds to the most dominant peak or corresponds to a peak whose amplitude exceeds a threshold, expressed as a fraction of the amplitude of the most dominant peak. The value of this threshold was set to 0.25 after extensive experimentation. In any other case the higher of the two frequencies is selected as the feature. Furthermore, if the highest Fourier peak of a frame is lower than a pre-defined threshold, the frame is considered to be noisy and the extracted frequency is set to one.

Figure 1 shows the evolution of the frequency content of a musical pattern over time. It demonstrates that the frequency content of the signal is split into horizontal frequency bands. Ideally the dominant frequency should always be found in the same band. It would then suffice to choose the frequency corresponding to the dominant peak of each frame. However, this is not always the case. It turns out that the dominant peak can be found in either of the two lowest bands. The algorithm should therefore always track the lowest distinguishable frequency band, even if for some frames the dominant peak is not located in that band. This is the reason why the percentage threshold mentioned above was adopted in the feature extraction process.

Let v_i, i = 1, ..., N, be the extracted sequence of frequencies, where N is the number of frames into which the original input pattern is split during the application of the moving window technique (Figure 1). As a post-processing step the logarithm of v is calculated: p_i = log(v_i), i = 1, ..., N. This is because we are trying to imitate some aspects of the human auditory system, which is known to analyse an input pattern using a logarithmic frequency axis (Figure 2).

Figure 1: a) Contour plot of the spectrogram of a musical pattern. b) The extracted frequency vector. (Horizontal axis: frames.)

Figure 2: Post-processing step for two input patterns. The vertical axis corresponds to log(v_i). (Horizontal axis: frames.)

3 The Dynamic Time Warping Methodology

The last stage employs DTW in order to determine the input musical pattern. Our effort is focused on ten types of musical patterns, and for each type a reference (representative) pattern is chosen. For each one of the ten reference patterns r_j, j = 1, ..., 10, the sequence of feature frequencies is extracted and its logarithm is computed, leading to vectors V_{r_j}, against which the unknown pattern is compared. A similar procedure is applied to the input pattern, leading to a vector t_i, i = 1, ..., M, where M is the number of frames into which the unknown pattern is split during the application of the moving window technique.

Before proceeding, a constant c has to be calculated and subtracted from vector t, depending on the reference pattern V_{r_j} with which the unknown t is compared. The value of c is determined in the following way: let F_{r_j} be the starting frequency of V_{r_j} and F_t the starting frequency of t (after taking the logarithms). Then c = F_t - F_{r_j}, so that the shifted pattern t - c starts at F_{r_j}.

For each one of the above comparisons a matching cost and a best path are calculated. Recognition is achieved according to the comparison with the lowest matching cost (Figure 3). We performed extensive experiments regarding the constraints that should be employed for the DTW procedure. We focused especially on the Itakura and Sakoe-Chiba constraints and finally adopted the latter. The adopted constraints allow the best path to contain vertical or horizontal segments of arbitrary length (Figure 4).
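
The two stages described in Sections 2 and 3 can be sketched end to end as follows. This is only a minimal illustration under stated assumptions, not the implementation used in the paper: the naive DFT, the local-maximum peak picking, the `noise_floor` value, the frame length and hop, the absolute-difference local distance, the unnormalized matching cost and the sign convention for c (chosen so that the shifted test pattern starts at the starting log-frequency of the reference) are all assumptions, and every function name (`frame_frequency`, `log_frequency_sequence`, `dtw_cost`, `recognize`, `synth`) is hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0 .. len(frame)//2 (adequate for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def frame_frequency(frame, sample_rate, rel_threshold=0.25, noise_floor=1e-3):
    """Select one frequency (Hz) for a frame using the two-peak rule of Section 2."""
    mags = magnitude_spectrum(frame)
    # Assumption: a "peak" is a local maximum of the magnitude spectrum, DC excluded.
    peaks = [k for k in range(1, len(mags) - 1)
             if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]
    top = sorted(peaks, key=lambda k: mags[k], reverse=True)[:2]
    if not top or mags[top[0]] < noise_floor:
        return 1.0  # noisy frame: frequency set to one, so its logarithm is zero
    strongest, lower, higher = top[0], min(top), max(top)
    # Prefer the lower band if it is dominant or strong enough relative to the dominant peak.
    if lower == strongest or mags[lower] > rel_threshold * mags[strongest]:
        chosen = lower
    else:
        chosen = higher
    return chosen * sample_rate / len(frame)


def log_frequency_sequence(signal, sample_rate, frame_len=64, hop=64):
    """Moving-window analysis followed by the logarithm post-processing step."""
    return [
        math.log(frame_frequency(signal[s:s + frame_len], sample_rate))
        for s in range(0, len(signal) - frame_len + 1, hop)
    ]


def dtw_cost(ref, test):
    """DTW cost with predecessors (i-1, j), (i, j-1), (i-1, j-1) and no slope constraint."""
    inf = float("inf")
    n, m = len(ref), len(test)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(ref[i - 1] - test[j - 1])  # assumed local distance
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(references, test):
    """Return (label, cost) for the reference with the lowest matching cost."""
    best = None
    for label, ref in references.items():
        c = test[0] - ref[0]  # assumed sign: the shifted test starts where ref starts
        cost = dtw_cost(ref, [t - c for t in test])
        if best is None or cost < best[1]:
            best = (label, cost)
    return best


def synth(freqs, sample_rate, frame_len=64):
    """Hypothetical test signal: one pure sinusoid per frame."""
    out = []
    for f in freqs:
        out.extend(math.sin(2 * math.pi * f * i / sample_rate) for i in range(frame_len))
    return out


if __name__ == "__main__":
    sr = 8000
    refs = {
        "rising": log_frequency_sequence(synth([500, 625, 750, 875], sr), sr),
        "falling": log_frequency_sequence(synth([875, 750, 625, 500], sr), sr),
    }
    stretched = synth([1000, 1000, 1250, 1250, 1500, 1500, 1750, 1750], sr)
    print(recognize(refs, log_frequency_sequence(stretched, sr)))
```

In the demo, the test pattern is the rising reference played one octave higher and at half speed. Subtracting c removes the transposition on the logarithmic axis, and the unconstrained horizontal and vertical steps absorb the stretching, so the pattern is matched to the "rising" reference.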

4 Results

A total of 1000 musical samples of the ten musical types was created in a recording studio environment by four different musicians. A recognition rate higher than 98% was achieved by our method.

Figure 3: Comparing an input pattern x of musical type M3 with three reference patterns r1, r2, r3 representing musical types M1, M2, M3. The closest match is the one between x and M3. (Values printed above the three panels: 0.0989, 0.4109 and 0.0322; horizontal axis: test pattern x.)

Figure 4: Sakoe-Chiba constraints (no slope constraint). (The diagram shows node (i, j) with its neighbouring nodes (i-1, j), (i-1, j-1) and (i, j-1).)

5 References

J. Deller, J. Proakis, J. Hansen, "Discrete-Time Processing of Speech Signals", Macmillan, 1993.
H.F. Silverman, D.P. Morgan, "The Application of Dynamic Programming to Connected Speech Recognition", IEEE ASSP Magazine, July 1990.
J.C. Brown, "Musical fundamental frequency tracking using a pattern recognition method", J. Acoust. Soc. Am. 92(3), September 1992.
S. Karas, "Theoritikon - Methodos, on Greek Traditional Music", ed. Athens 1982.
D.R. Stammen, B. Pennycook, "Real-time Recognition of Melodic Fragments using the Dynamic Time Warping Algorithm", ICMC Proceedings 1993.