Page  559 ï~~Criterion of Data Complexity in Rhythm Recognition Andranick Tanguiane ACROE-LIFIA, Institut Polytechnique, 46, Av. Felix Viallet, F-38031 Grenoble Abstract Our approach to rhythm recognition is based on least complex representing data in terms of generative elements and their transformations. The complexity is understood by Kolmogorov, i.e. as the amount of memory needed to store the algorithm of the data generation. For our purposes, the time events are considered as generated by certain rhythmic patterns, while the time relationships between them determining a high-level pattern of tempo curve. The complexity is shared between the low-level rhythmic patterns and the algorithm of their transformations (elaboration and tempo deviations). This way the problem of rhythm recognition is brought to the optimal (least complex) two-level data representation. 1 Introduction By automatic notation of performed rhythm we understand a symbolic representation of time events with specified relative durations and tempo curve. Since the first publication on computer notation of performed music (Moorer 1977) this problem has been considered as one of the most important in music recognition. An error-analysis approach to rhythm recognition is developed in (Chafe et al. 1982; MontReynaud & Goldstein 1985), where the rhythm is treated as a pulse train distorted by tempo deviations and rhythmic elaboration (subdivisions of initial durations). In (Rosenthal 1988) rhythm is regardered as a mean of structural arranging musical data. A considerable progress in rhythm recognition is achieved with the help of neuron nets (Desain, Honing & de Rijk 1989) which are used to provide a simplified description of a sequence of time events. Our approach to rhythm recognition is based on the principle of correlativity of perception (Tanguiane 1989; 1990). By correlativity of perception we mean its capability to find out similar configurations of stimuli (i.e. structurally arranged groups) and to form high-level configurations with them in order to provide least complex data representations. The complexity of data is understood by Kolmogorov-as the amount of memory which is necessary to store the algorithm of their generation (Calude 1988). In our case, the low-level configurations are similar rhythmic patterns. The high-level configuration, determined by the time relationships between similar rhythmic patterns, is associated with the tempo curve. In other words, we consider repeating rhythmic patterns as reference units for tempo tracking. Such a hierarchical scheme of data representation is complemented with a feedback which guides the data representation in the least complex way. This corresponds to explaining rhythm function as simplifying music perception. 2 Overcoming Tempo and Rhythm Interdependence In music theory rhythm is understood as a pulse train, and tempo is defined to be its speed. Such a definition is ambiguous, since rhythmic patterns are recognized with respect to a certain tempo, while tempo being determined with respect to certain rhythmic patterns. Let us illustrate the ambiguity of rhythm and tempo with an example. Consider a sequence of durations (intervals between tone onsets) written down in two ways in fig. la and fig. lb. In fig. la the given sequence is interpreted as a single rhythmic pattern under a constant tempo, whereas in fig. 1b, as a ICMC 559

Page  560 ï~~repeated pattern under a tempo change. Since this sequence of durations is perceived rather as a single rhythmic pattern, the notation in fig. la is preferable. Now consider the same sequence of durations in the melodic context shown in fig. ic and fig. id. Since the melody repeat results in perceiving the sequence of time events as generated by a repeated pattern under a tempo change, the notation in fig. ld can be justified. A similar effect-the perception of longer or shorter durations as a tempo change-is used in fugues, where the theme is recognizable both in augmentation and in diminution. To explain such an ambiguity in tempo and rhythm, we refer to the principle of correlativity of perception. We assume that perception performs optimal data representation which may depend on the situation. Suppose that to code a duration not specifying the associated pitch, one byte is needed, but to code a duration with the associated pitch, two bytes are needed. Also suppose that to represent the events in terms of repetition, we refer to a repeat algorithm, which requires four bytes (the algorithm call, return point, number of repetitions, and new tempo instruction). Under these assumptions the complexity of our four representations is displayed in Table 1. One can see that the representation of the time events as a single rhythmic pattern under a constant tempo, corresponding to fig. la, is less complex out of the melodic context. On the contrary, in the given melodic context the representation with the reference to repeat algorithm, corresponding to fig. ld, is less complex. This meets the fact that melodic patterns, being more redundant, are more stable with respect to distortions like tempo deviations. Thus our approach is equivalent to considering repeated rhythmic patterns as reference time units which determine the perception of tempo. On the other hand, rhythmic patterns can be perceived as repeated only if they are commensurable in time with respect a certain tempo. Therefore, rhythm and tempo are interdependent concepts which cannot be defined separately. Nevertheless, we break this logical circle between the concepts of rhythm and tempo, introducing the criterion of least complex representation of the given sequence of time events. 3 Finding Similar Rhythmic Patterns To represent a sequence of time events we need collection of generative rhythmic patterns, description of their elaboration, and tempo curve. According to the principle of correlativity of perception, we have to find the optimal (least complex) representation. To obtain the desired representation, we apply correlation analysis to the time events under distortions of the time scale, similar to (Witkin 1983). To simplify the computational search for the desired representation, we suggest a method of directed search which we call the method of variable resolution. Let us illustrate the method with a simple example. Consider the following sequence of tone onsets: t = 0,10,19,30, where one unit is equal to 1/20 sec. (fig. 2a). Define the event function, putting _ 1 if t=0,1, 19,30, s(t)- j0 otherwise. We could try to detect period p in this sequence by peaks of the autocorrelation function C(p) = s(t-p)s(t), t but this method fails, since all time intervals are different and no period can be detected. In order to ignore small deviations in periodicity, we "blur" the events, replacing each "time point" by a "time spot" as shown in fig. 2b. Now the time intervals correlate. The greatest autocorrelation values for the blurred event function are displayed in the left section of table 2. Note that the autocorrelation peak at p = 0 is trivial. It also results in the peak at p = 1, since the events are blurred. The next peak occurs at the period centered at p = 10. After we have revealed period p, we can shift the associated correlated time events along the time scale, trying to increase the autocorrelation. The direction of such local scale distortions are determined by centers of the blurred events, recognized by an increase in autocorrelation. The shifts which provide the maximal autocorrelation (see fig. 2c and the right ICMC 560

Page  561 ï~~aI b) c) --1 1 =60 'J112 w l J=60.1=120 J=60 111111 d) J-6o J-12o Fig. 1 Table 1. The Complexity of Representation of Time Events in Fig. 1 Representation Representation Representation Representation corresponding corresponding corresponding corresponding to Fig. la to Fig. lb to Fig. ic to Fig. ld Complexity of data (amount of memory to store them) 6 bytes 3 bytes 12 bytes 6 bytes Complexity of algorithm of data transformations (amount of memory to store the call for the repetition algorithm) 0 bytes 4 bytes 0 bytes 4 bytes Total complexity of the representation 6 bytes 7 bytes 12 bytes 10 bytes a) I I 1 1 -40 - 10 19 30 b) 121000000012100000012100000000121 10 9 11 c) 121000000012100000001210000000121 10 10 10 Fig. 2 Table 2. The Most Salient Periods in the Sequence of Time Events in Fig. 2 in Fig. 2b in Fig. 2c Period Autocorrelation Period Autocorrelation p C(p) p C(p) 0 8 0 24 1 4 10 18 10 4 1 16 9 3 9 12 11 3 11 12 19 3 20 12 20 3 19 8 30 2 21 8 8 1 30 6 ICMC 561

Page  562 ï~~section of table 2) correspond to the desired distortions of the time scale. Our idea is to "pick up" the correlation under a poor resolution, and then restore the resolution step by step, keeping the correlation by appropriate adjustments of the scale. Note that this algorithm can be realized on a multilevel neuron net, each level of which corresponds to a certain scale resolution. Further we have to consider the hypothesis about the representability of the series of time events as a repeated rhythm. For that purpose we compare the complexity of direct coding with the complexity of the coding with a reference to repet algorithm. Obviously, the way of coding the tempo curve influences the final decision making. One of possible ways to code the tempo curve is to fix the instances when the tempo goes out of, say, the 5%-neighborhood of the current value. Such a heuristic algorithm meets zone nature of perception and its logarithmic sensitivity. The validity of our approach was proved by computer experiments on recognizing the snare drum part from "Bolero" by M. Ravel performed by striking the computer keyboard. The rhythm recurrency was recognized already by the end of the first measure, while the recurrency being detected both at the level of eights and crotches. The recurrency was also recognized at the level of sixteenth triplets, but with much greater irregularities, since the deviations in their relative durations were up to ratio 7:6:10. 4 Conclusions The main statements of the present paper are the following: " Our approach to rhythm recognition is based on some general ideas (principle of correlativity of perception) rather than on a particular knowledge. In a sense the model is "self-sufficient", since it chooses between alternative data representations. * The interdependence between concepts of tempo and rhythm is overcome by introducing the criterion of least complex data representation. " The computational model of rhythm recognition is based on correlation analysis of time events under distortions of the time scale. To find the required distortions which provide high correlation of time events, a method of variable resolution is proposed. References Calude C. (1988) Theories of Computational Complexity. Amsterdam: North-Holland. Chafe C., Mont-Reynaud B. & Rush L. (1982) Toward an Intelligent Editor of Digital Audio: Recognition of Musical Constructs. Computer Music J. 6(1), 30-41. Desain P., Honing H. & de Rijk K. (1989) A Connectionist Quantizer. Proc. 1989 Int. Computer Music Conf. San Francisco: Computer Music Association. 80-85. Mont-Reynaud B. & Goldstein M. (1985) On Finding Rhythmic Patterns in Musical Lines. Proc. 1985 Int. Computer Music Conf. San Francisco: Computer Music Association. 391-397. Moorer J.A. (1977) On the Transcription of Musical Sound by Computer. Computer Music J. 1(4), 32-38. Rosenthal D. (1988) A Model of the Process of Listening to Simple Rhythms. Proc. 14th Int. Computer Music Conf. Cologne: Feedback-Studio-Verlag. 189-197. Tanguiane A.S. (1989) A Principle of Relativity of Perception and Its Applications to Pattern Recognition in Analysis of Performed Music. Proc. First Int. Conf. on Music Perception and Cognition, Kyoto, Japan, 17-19 October, 1989. 261-266. Tanguiane A.S. (1990) A Model of Correlative Perception and Certain Problems of Pattern Recognition. Matematicheskoye Modelirovaniye 2(8), 90-111 (Russian). English version submitted to Music Perception. Witkin A.P. (1983) Scale-Space Filtering. Proc. 8th Imt. Joint Conf. on A rtificial Intelligence. Vol. 2. Karlsruhe, West Germany. 1019-1022. ICMC 562