Page  248 ï~~A Cognitive Model of Pattern Matching in Music Yuzuru Hiraga University of Library and Information Science 1-2 Kasuga, Tsukuba, 305 JAPAN jp Abstract This paper presents a scheme for p)attern matching in music. This scheme is oriented towards structural matching, which seeks for p)atterns reflecting the music structure. The underlying claim is that interaction with the memory of previously heard sequences is fundamental in music cognition. Matching of p)atterns is based on the common occurrence of component features, which enables handling a. wide range of similarities, and provides insight into the mental representation and processing of music. 1 Introduction The analysis/cognition of music structure requires not only the knowledge of general rules and principles, but, also interaction with the memory of preceding individual events. When we listen to music, we can easily recognize recurrent patterns of great diversity, or can identify a. known tune instantaneously. These examples suggest the manipulation of structural patterns - their identification, matching, storage and access - as important issues in music cognition research. Although their importance is well noticed (e.g. [Stammen & Pennycook, 1993], [Rowe & Li, 1994], [Katayose & Inokuchi, 1995]); reflection upon our music experience seems to indicate that the vast extent of pattern processing in music is yet to be explored. This paper is oriented toward the characterization of our cognitive capabilities through a computational viewpoint. This stance leads us to rethink the basic issues on the role and nature of pattern manipulation. In particular, we emphasize that patterns are directly related to how music is represented in our memory, and thus its processing is fundamental in music cognition. 2 Pattern in Music Consider the nursery rhyme Are you sleeping? (Frere Jacques?). Its initial sequence "CDECCDECEFG..." clearly has a, repeated pattern of "CDEC". Assuming the absence of articulation (as in a flat computer performance), segmentation rules based on local discontinuity cannot account for the segmentation at "C-C" (and may even falsely point at "E-C"); so segmentation is induced primarily from the matching of the two (identical) sequences. This means that matching is prevalent and autonomous in the cognitive process, to be regarded as a primitive operation working at the same level as the segment detectors. Unlike the exact. repetition above, general cases of matching manages to recognize a diverse range of similarity over a wide temporal range. This requires the matching process to be highly elaborate. The nature of the matching is also associated with the function of the sequences within the piece. The above refers to matching within the piece. Matching also occurs with known music., even starting at midpoint. Although their functions are quite different (the function of the latter is mainly identification), it would be reasonable to assume tha.t they involve the same or similar cognitive mechanism. This means that the matching procedure has a. vast amount of data, within its scope. 3 Alternative Views 3.1 Matching Strategy In a. standard approach to pattern matching, the pattern is represented in some form (e.g. an encoded numerical sequence) with an associated distance measure of similarity. This distance is taken to indicate the strength of the match, and the pattern is identified according to this value. An alternative view is to characterize matching as the sharing of common component features. For example, a diatonic transposition (e.g. CDE and EFG) can be matched on basis of having the same diatonic contour {+1, +1}, in spite of the difference in pitch and chromatic interval. Hiraga 248 ICMC Proceedings 1996

Page  249 ï~~These two views are not distinct in that exact match can be changed to allow some degree of tolerance, or the results of the component match can be evaluated with a distance measure. The essential difference is in that the latter is more inclined to taking a highly structurized view of the pattern, which is encapsulated in the distance value in the former. Another alternative is to base matching on transformations. Two patterns can be matched if one can be transformed into the other (or both can be transformed into the same pattern) by given transformation functions. This view emphasizes the difference rather than the similarity of the pa.tterns, and provides an excplanation of how they differ. 3.2 Matching Process Another common approach in matching is the use of search - the target pattern is matched against candidate patterns and the one with the highest rating is selected. A direct problem is that search is expensive, especially when the number of candidates gets large. An alternative view is to discard the notion of search altogether. This is to say that the processing of input is associated with the matching with existing patterns. This view is in agreement with the prevalence of the matching process. Here, the issue is shifted more from the matching process to the organization and access methods of memory. 4 Representation Here, we restrict our scope to melodies (monophonic note sequences), and provide only an informal account of the representation scheme. A music fragment 1l (i.e. sequence of notes) is represented as a set of component features: F. F is a (total) decomposition of Al if if contains sufficient information to reconstruct M. Otherwise it is partial, and an abstraction of M. Breaking each note into its pitch and duration results in a pitch sequence and rhythm sequence. The pitch sequence can be further decomposed into key and scale degree sequence, the key into tonic and mode (e.g. majior or minor), and scale degree sequence into representative element (pitch or scale degree) and diatonic contour. For examlple, the sequence "CDEC" can be represented as C major, starting in C (scale degree I) with contour {,+1, +1, -2}. ("*" indicates the position of representative element). The contour itself can be derived locally from interval values, independent of key identification. This relies on the construction and our general knowledge of scales. The terns scale is used here to mean not only diatonic scales, but any pitch collection with cognitive significance (e.g. chromatic scale, chords, octaves). A pitch sequence is called to be primitive if it is either a duplication of the same pitch, or stepwise ascending or descending motion (over some scale) '. In the rhythm domain, the counterpart of primitive sequences is (but not restricted to) periodic repetition. Primitive sequences provide the basis for forming hierarchical structures. The hierarchy here is recursive in that all levels are primitive sequences, where higher level sequences take the representative elements of the lower levels as elements. The "CDEC" sequence can be decomposed into "CDE" and "EC", implying an overall return imotion over C (note that overlap of boundary elements are allowed [Hiraga, 1993]). Representative elements are selected according to this criteria., which generally corresponds to the first or the last element of the sequence. In the rhythm domain, the counterpart of representative elemeltts is the sum of the member durations. The notion of similarity based on featlre equality can be applied to this framework to h'laracterize a wide class of matching instances. Tra ulsposition alters the degree of the rel)resentative (lement while modulation alters the tonic, hot l retaining the contour and rhythm. Augment at iol and dotted rhythm are examples of pitch ilvariant transformations. Higher level sequences iin a hierarchy can account for simple variations (1)y ignoring ornamental tones) or partial matching (& in a dominant and tonic ending). The diff-rn,,. then, can be accounted for by simple traiLs for n.i tion functions. Based on this observation, we view t hat, match occurs when some features are invariant. and is reinforced if the difference can be acco,lte for by some transformation. Matches in this sense are partially ordered. The problem here is what counts as a. matcl. In general, there seems to be no definitive criteria because much depends on context (e.g. where the lpa ttern occurs, how persistent it is) and even subjective judgment ([Meyer, 1973] has an extensive discussion). The tentative criteria, include, besides the strength of the match, temporal proximity and complexity of the lpattern itself. Primitive sequences hprovide a simlple measure of complexity. Equal beat rhythm is in general too weak to establish a. match, while simple rhythm 1)atterns like [2, 1, 1] is significant enough, regardless of the lpitch. This is to say that the size and dlelth of the hierarchical structure indicates the complexity, and hence the well-definedness of the It is plausible to allow for Â~2 intervals in special cases, as in a chord. ICMC Proceedings 1996 249 Hiraga

Page  250 ï~~pattern. 5 Matching Process The no-search strategy mentioned in section 3 requires that stored patterns are content addressable in a broad sense. There are various computational methods that realize this, including hash functions, neural nets, and convolution methods. Reducing the issue of pattern matching to that of feature equality is already effective in simplifying the task because it restricts candidates and makes the content indexing scheme applicable. This, however, is at the price of introducing multiple representations (at different levels), their derivation, and simultaneous matching. The greatest difficulty in matching is to determine the pattern boundaries [Rowe & Li, 1994; Katayose &; Inokuchi, 1995]. As mentioned in 2, matching itself helps determining the segmentation, so the problem is inherently circular. There is no doubt that segmentation cues like accents and articulation provide strong evidence for the breaking points, and may even override boundaries induced by matching [Lerdahl & Jackendoff, 1983]. This, however, does not eliminate matching itself because noticing the conflict leads to intended esthetic effects. Based on this consideration, we put the basis of pattern induction on the structuring based on l)rimitive sequences (especially in the pitch domain) at all levels. Since the boundaries of primitives sequences necessarily overlap, there is essential ambiguity on the breaking points. In the present scheme, this is handled by launching competing hypotheses of matching. In the Are You Sleeping? example, the initial "CDE" does not trigger matching because it is l)rimitive. The following two Cs both trigger matching because they are at the boundaries of l)rimitive sequences. The subsequent "DE" favors the latter C as the starting point, and the C at the eighth note establishes the match (because the end point in the initial "CDEC" meets the starting lpoint of the second). Note that the end p~oint of the initial, model lpattern is undetermined during the lprocess. The present scheme utilizes discrimination trees for content indexing. Patterns entered in the tree start from the root and branch at differing nodes. An inlput that. matches an existing pattern follows the lpath along the tree and identifies the match by the tag attached to the node. Entries at different levels of abstraction is made in different trees, so the overall relpresentation is multiple and redundant. Currently, a partial implelentation of this scheme is programmed and is seen to exhibit in tended behavior. It is partial in that hierarchical structures cannot be handled properly. This requires establishing link relationships among hierarchically related patterns. One aspect of improving this scheme is to introduce retrospective matching, in which matching is delayed until an assumed pattern is welldefined (in the sense of pattern complexity). A more fundamental revision of introducing an elaborate framework that eliminates spurious matching is a topic of future research. 6 Conclusion This paper )resented a scheme for pattern matching in music. The scheme is oriented towards structural matching, which seeks for patterns reflecting the music structure. The view presentedl here is that music is heard, analyzed and stored as a set of decomposed features, similarity is judgedI on basis of the common occurrence of those features, and that interaction with such memory plays an essential role in music cognition. The current scheme, however, still lacks the ela.borateness required by this view. For example, there are other al)l)lications of pattern matching as well, such as following amd analyzing erroneous or extremely expressive lpcr formance in real-time. This requires, among ot elm things, treatment of sporadic mismatches. Thein are some prosl)ects to handle this (e.g. pre-analyze for mistouched tones), which is a. topic of firmt,mme work. References, 1993 Hiraga, Y.: A Computational MlodIl of Music Cognition Based on Interacting Primitive Agents, Proceedings of ICM C 1993, pp.292-295. Katayose & Inokuchi, 1995 Katayose, H. and Inokuchi, S.: A Model of Pattern Processing for Music, Proceedings of ICMC 1995, l)l).505-506. Lerdahl & Jackendoff, 1983 Lerdahl, F. and Jac'kendoff, R.: A Generative Theory of Tonal Music, Cambridge, MA., MIT Press. Meyer, 1973 Meyer, L.: Explaininq Music, Chicago, IL., Univ. of Chicago Press. Rowe & Li, 1994 Rowe, R. and Li, Tang-Chun: Pattern Processing in M~usic, Proceedings,of ICMC 1994, l)l).60-62. Stammen L- Pennycook, 1993 Stammen, D.R. and Pennycook, B.: Real-Time Recognition of Melodic Fragments Using thne Dynamic Timewarp Algorithm, Proceedings of ICMC 1993, 1)1).232-235. Hiraga 250 ICMC Proceedings 1996