ADAPTIVE MELODIC SEGMENTATION AND MOTIVIC IDENTIFICATION
Greg Wilder
Orpheus Media Research, LLC
247 N 2nd St. #1D
Philadelphia, PA 19106-1307
email@example.com

ABSTRACT

The existence of meaningful patterns within musical contexts is undeniable, but the abstract nature of the language makes reliable detection of these relationships by machine a notoriously difficult task. This paper briefly discusses the unique difficulties associated with musical data mining and proposes an autonomous system that attempts to overcome them by applying observations from music cognition research as the basis for adaptive algorithms capable of isolating and describing fundamental schemata (motivic structures) in addition to their likely variations. Through application of this system, it may be possible to delineate the relevant vectors in order to abstractly represent the characteristics that grant thematic identity to a specific work, style, or genre, in effect providing a detailed musical fingerprint.

1. INTRODUCTION

An autonomous system capable of reliable motivic analysis could play a significant role in automated piracy detection, music recommendation and discovery, copyright infringement disputes, and model-based computer composition. Comparative analysis of the results might also be used to explore differences between individual composers' styles and to better understand trends throughout their development.

With this in mind, our present goal is to develop a series of algorithms capable of accurately segmenting musical input and mining the results for defining motives. The results may then be used to describe and compare fundamental structures and unique identity characteristics of any musical input, regardless of style or genre.

One of the most persistent difficulties with machine analysis has been the deeply complex and often conflicting rule sets that rule-based approaches require.
The abstracted nature of musical language requires the application of context-aware decision making to determine the prototypical structure and perceptual relevance of core musical ideas. To overcome this and related challenges, our proposed system relies on observations in music cognition as a basis for algorithms capable of maintaining awareness of emergent grammar trends at multiple levels. In addition, when live (human-generated) performance data is provided, the process adapts to the natural (and very often desirable) variations that occur. These stylistic nuances remain intact throughout processing and could serve as a way of examining individual performance interpretation.

2. RELATED WORK

With the exception of certain semiotic codes, music is an abstracted language that lacks specific instances and definitions with which to express concrete ideas. As a result of detailed study into hierarchical levels of cognitive awareness, Irene Deliège has suggested that, even when they occur in isolation, small but identifiable musical structures (e.g., motives) enable the listener to connect multiple related memories encompassing a broader scope than the original material alone would imply [2]. In other words, it would seem that characteristic motives form the cognitive basis of musical language definition and style grouping.

The founding principles of Gestalt perception suggest that we tend to order our experience in a manner that is regular, orderly, symmetric, and simple. Further, psychologists have defined "Gestalt Laws", which allow us to predict our interpretation of sensation. Of particular interest is the Law of Closure, which states that the mind may experience elements it does not directly perceive in order to complete an expected result.
Eugene Narmour's Implication-Realization Model is a detailed formalization of musical expectation based on Leonard Meyer's work on applied Gestalt psychology principles, and indirectly forms the basis for critical aspects of this work [5, 6, 7]. Composer/musicologist Fred Lerdahl and linguist Ray Jackendoff have attempted to codify the cognitive structures necessary for the perception of musical language, as well as the limits of human musical capacity as determined by our general cognitive functions. Lerdahl and Jackendoff concluded that musical discretization, or segmentation, is necessary for cognitive perception and the processing of musical grammar [3, 4].

2.1. Segmentation Approaches

Previous attempts to create autonomous segmentation algorithms have often depended on the application of complicated rule sets that rely on assumptions about specific style and language conventions. However, these rule-based approaches tend to create internal conflicts in real-world application scenarios. Even if these conflicts are resolved appropriately, the assumptions required to design the original rule base necessarily limit the analysis process with regard to style and genre. Also, certain discretization systems require destructive preprocessing of the input data to provide consistency within the samples. While this may make data
processing more straightforward, it alters the original input, compromising the integrity of the data and making the results less meaningful.

2.2. Musical Data Mining

Musical grammar naturally contains similar patterns throughout, but determining which of these have analytical value remains a significant challenge. Certain approaches attempt to filter results based on pattern frequency or length; however, this still ignores context considerations contained within the emergent musical data set [8, 9]. With few exceptions, the difficulty of identifying and incorporating the role of musical parallelism remains unaddressed [1]. This requires either pre-processing segmentation or a post-processing filtering algorithm capable of reliably identifying pattern start points so that beginning similarity can be analyzed.

3. MELODIC REPRESENTATION

The difficulties inherent in accurately representing music for transmission and analysis have plagued musicians since sounds were first notated. Our solution implements a multidimensional representation scheme that relies on the relative change found in simultaneous attribute signifiers between consecutive note events (NEs). Conducting analysis using Delta Functions (3.1) provides a context-based abstraction of the material that maintains focus on the musical processes present in the data stream, and allows the attribute layers to more easily align with key identifying characteristics.

3.1. Delta Functions

Delta functions are a normalized (0.0-1.0) representation of sequential event attribute change (between NEn, NEn+1 and NEn+1, NEn+2), and are used throughout segmentation and motive identification processing.

4. ADAPTIVE SEGMENTATION

The process of adaptive segmentation provides robust discretization of the musical surface and serves as a preprocessor to motive identification.
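The delta-function representation of 3.1 can be sketched as follows. This is a minimal reading in Python; the max-normalization step and the function name are our assumptions about how the 0.0-1.0 range is obtained, not details given in the paper.

```python
def delta_function(values):
    """Normalized (0.0-1.0) change between consecutive note-event (NE)
    attribute values for one layer; result has len(values) - 1 entries."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    peak = max(diffs) or 1.0  # guard against a fully static layer
    return [d / peak for d in diffs]

# Pitch layer in hertz for four consecutive NEs (A4, B4, C5, A4):
# four events yield three deltas, the largest change mapping to 1.0.
pitch_deltas = delta_function([440.0, 493.88, 523.25, 440.0])

# Eq. (1) then compares consecutive deltas, so a group of four NEs is
# represented by two data points, as the paper describes.
points = [1 - (d1 - d2) for d1, d2 in zip(pitch_deltas, pitch_deltas[1:])]
```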
While there is no way to universally define correct segmentation results, the relative success of any segmentation algorithm may be understood as its ability to produce reasonable solutions for commonly accepted styles of Western musical input.

4.1. Adaptive Thresholds

Attribute-specific adaptive threshold values assist in evaluating the strength of perceptually relevant boundary candidates; the thresholds are determined by scanning the attribute layers for points of uncharacteristic change, a direct application of Gestalt principles. Adapting threshold points for individual attribute layers provides system flexibility and ensures compliance with the prevailing data trends (context-awareness).

The initial attribute threshold candidate is set to the standard deviation of the data set and tested against all NEs. If the number of NEs passing under the threshold falls within the acceptable boundary range (currently 15%-45% of total NEs), the candidate is kept (Fig. 2).

Figure 2. Results that fall below the adapted threshold are identified as boundary candidates.

Otherwise, further adaptation occurs as follows. After ensuring the candidate produces a result below the lower boundary (15%), the algorithm establishes an incremental value (based on data trends for the current attribute) to be applied to the threshold until the number of resulting NEs passing the threshold test is within the acceptable range (15%-45%). Attribute thresholds are then applied and attribute boundary candidates identified. If an event vector remains static (e.g., all velocities are the same), the system discards the attribute data so as to preclude its influence on boundary weighting.

4.2. Weighting Factors

A normalized (0.0-1.0) Weighting Factor for each attribute boundary candidate is calculated based on its position within the previously determined attribute threshold range, as shown in Table 1.
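A minimal sketch of the threshold adaptation described in 4.1. The fixed `step` increment is an illustrative stand-in: the paper derives the increment from data trends for the current attribute.

```python
import statistics

def adapt_threshold(deltas, low=0.15, high=0.45, step=0.02):
    """Adapt one attribute layer's threshold: start from the standard
    deviation of the data set; if the fraction of NEs falling under it
    lies outside the 15%-45% acceptance range, drop below the lower
    boundary and grow incrementally until the range is reached."""
    def frac(t):
        return sum(d < t for d in deltas) / len(deltas)

    threshold = statistics.pstdev(deltas)
    if low <= frac(threshold) <= high:
        return threshold
    while frac(threshold) > low:   # ensure we start below 15%
        threshold -= step
    while frac(threshold) < low:   # adapt upward into the range
        threshold += step
    return threshold

# Example layer of normalized delta values
layer = [0.1, 0.12, 0.5, 0.55, 0.6, 0.62, 0.65, 0.7, 0.9, 1.0]
t = adapt_threshold(layer)
```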
pitch_weight = 1 - ((ΔNEn / P) * 0.01)    (2)

Range         ΔPitch   Position (P)   Weight
0.667-0.763   0.724    0.096          0.9246

Table 1. Sample Pitch Weight calculation results.

delta_attribute = 1 - (ΔNEn - ΔNEn+1)    (1)

For instance, any group of four consecutive NEs may be represented and compared using two normalized data points to define the attribute-specific delta between events (Fig. 1).

Figure 1. NEs with normalized delta values.

At present, delta functions are calculated for each consecutive NE pair with regard to pitch (hertz), contour, onset time (seconds), offset time (seconds), length (seconds), and velocity. General articulation identifiers may be inferred through a combination of length, offset-onset difference, and velocity.
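Equation (2) can be checked directly against the Table 1 row; the function name is ours.

```python
def pitch_weight(delta_ne, position):
    """Eq. (2): weighting factor for a boundary candidate, from its
    delta value and its position (P) in the adapted threshold range."""
    return 1 - (delta_ne / position) * 0.01

# Table 1 row: delta 0.724 at position 0.096
weight = pitch_weight(0.724, 0.096)  # ~0.9246
```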
Boundary candidates are then collected and filtered through a "bonus" system to appropriately promote their perceptual relevance before Boundary Identification (4.4) occurs.

4.3. Weighting Bonus

Drawing on Gestalt principles and Narmour's melodic expectation process model, we can define cognitively relevant musical conditions to guide decisions made by the weighting system. For instance, it may be said that the longer any perceptual process iterates or duplicates without closure, the more important (or final) the closural event becomes. Systematic application of this concept is best illustrated by way of an example.

Throughout the weighting process, the system links pitch contour with the delta_pitch attribute. As pitch contour remains constant (P)¹, equity is accumulated and then applied as a bonus when a change (R)² is detected. Additional weight is added for larger delta_pitch values whose contour continues unchanged. As a way of further connecting these attributes, the Contour Equity Bonus is only applied to the result if delta_pitch passes its previously adapted threshold value. The result of such a series of events is outlined in Fig. 3.

Figure 3. Demonstration of the Contour Equity Bonus.

The remaining attributes are treated similarly, with the strongest Weighting Bonus given to uncharacteristically large gaps (as defined by global and regional data trends) between NEs.

4.4. Boundary Identification

The weighting process produces confidence values that account for relative change in attributes between each NE. Using these results, a new threshold is calculated (using the standard deviation of the data set) and applied to determine segmentation boundary points (Fig. 4).

5. MOTIVIC IDENTIFICATION

In order to discover perceptually relevant motivic (or thematic) groupings, the previously identified segments require further analysis and possible adjustment.
To this end, motivic identification begins with a "finalizing" process (5.2), and in certain cases, a comparison-based regrouping/reduction of the segmentation results (5.3). Once segment reorganization is complete, the system searches for relevant motive candidates by comparing attribute similarity patterns (5.1) of the smaller segments to those found in larger segments (5.4). This process creates a "short list" of motivic candidates that is further reduced by examining contextual patterns (taking into account musical parallelism), leaving only the most perceptually relevant motives. Finally, the original data stream is scanned to identify motivic instances and their variation forms.

5.1. Variation Matrix

The Variation Matrix is a Euclidean-based distance matrix adapted to search for attribute similarity patterns while ignoring differences in sample size. The comparison of delta attribute patterns allows the system to determine the extent to which events within identified boundaries share common properties.

d[i][j] = Min(d[i-1][j] + 1, d[i][j-1] + 1, d[i-1][j-1] + cost)    (3)

distance = d[n][m] - SegmentDiff    (4)

Discounting sample size supports variation searches within identified boundaries, a prerequisite for similarity ballooning (see 5.3).

5.2. Segment Finalization

The Segment Finalization process attempts to discover and correct uncharacteristically large offset/onset gaps between consecutive NEs within currently defined segment boundaries. As before, this method adapts the required judgment criteria from prevailing data trends.

5.2.1. Boundary Split

If the split point occurs with a single NE on either side, the gap-isolated NE is removed from the current segment and added to the closest neighbor.

5.2.2. Mid-Point Split

NEs on either side of the mid-segment split are combined with their neighbor segments and tested against the remaining segments for multiple attribute vector similarity.
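A sketch of the Variation Matrix recurrence (Eq. 3) and distance (Eq. 4). Reading SegmentDiff as the difference in segment lengths, and using a small tolerance to decide when two delta values match, are our assumptions.

```python
def variation_distance(a, b, tol=0.05):
    """Levenshtein-style distance over two delta-attribute sequences
    (Eq. 3), with the segment-length difference subtracted (Eq. 4) so
    that sample size does not penalize the comparison."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if abs(a[i - 1] - b[j - 1]) <= tol else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[n][m] - abs(n - m)  # Eq. (4): discount the size difference
```

With this reading, a shorter pattern embedded at the start of a longer one scores the same as an exact match, which is what permits the variation searches ballooning relies on.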
New segments are created as necessary from residual material to accommodate unmatched groupings.

Figure 4. Results that fall below the adapted threshold are identified as boundaries.

¹ (P) = (P0 or P-) = Same or Similar Process (after Narmour)
² (R) = (R-) = Closural Reversal (after Narmour)

5.3. Similarity Ballooning

Similarity Ballooning is the combining of consecutive segment candidates based on attribute similarity comparisons to create larger (more complete) groupings of thematically related material. This process is triggered only when more than 15% of consecutive segment pairs contain fewer than four NEs each, and serves to reduce the overall number of segments, thus strengthening system awareness of phrase-level form. Similarity Ballooning compares selected attributes of defined segments (smaller than the median) for similarity using the Variation Matrix (5.1). If candidates are
considered a match, the system attempts to "balloon" the smallest candidate by combining it with its smallest neighbor. Ballooning is deemed successful when the newly expanded segment finds a similar match elsewhere within the data stream (Fig. 5).

Figure 5. Similarity Ballooning at work in Handel's "V'adoro, pupille" (Giulio Cesare). Grouping 1 provides the model for the ballooning process; the original segmentation breaks the motive into two segments; ballooning finds the Grouping 1 model and expands to match.

5.4. Motivic Candidate Identification

Motivic Candidates are initially determined through a basic examination of similarity relationships between identified segments. A sliding window (defined by the smaller candidate) is employed in the comparison of segments smaller than the median with all other segments containing more than 5 events, in search of a partial attribute match. This technique allows matches to include melodic variations within segment groupings, and addresses musical parallelism through windowed comparisons that take as their starting point the beginning of thematic segments. Once a potential match is discovered, ballooning occurs one event at a time (in the same direction) until a variation match is determined. Because the system is compiling a list of prototypical Motivic Candidates, exact matches (duplicates) are discarded. The candidate list is then further pruned so that individual NEs are permitted to serve as the starting point for at most a single candidate. In the case of multiple potential candidates, favor is given to the largest grouping.

5.5. Motive Identification

As uniquely defining characteristics of the motivic candidates emerge from the musical surface, dominance is verified (or denied) through their rate of discovery (or lack thereof) and placement within the original musical texture. The remaining number of motives is quite manageable, generally less than 10% of total_NEs.
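The windowed comparison described in 5.4 can be sketched as below; exact matching within a tolerance stands in for the full Variation Matrix comparison, and the names are ours.

```python
def find_partial_match(small, large, tol=0.05):
    """Slide a window the size of the smaller segment's delta pattern
    across a larger segment's pattern, returning the offset of the
    first window that matches within `tol`, or None."""
    w = len(small)
    for start in range(len(large) - w + 1):
        if all(abs(x - y) <= tol
               for x, y in zip(small, large[start:start + w])):
            return start
    return None

# A two-delta motive found inside a longer segment at offset 1
offset = find_partial_match([0.2, 0.8], [0.5, 0.21, 0.79, 0.3])
```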
After duplicates are removed, the system scans the original data searching for instances of each candidate in addition to their variant forms. A variable-window scanning technique allows the original segment boundary markers to be factored into the identification process.

6. CONCLUSIONS AND FUTURE WORK

The proposed system produces reasonable results with musical input covering a very wide range of styles; however, further evaluation is required before it may be reliably implemented in the proposed application areas. While length considerations prevent the inclusion of analysis examples, sample results are available at http://www.orpheusmediaresearch.com/.

Many fascinating research application questions remain. Might segments and motives comprise schemata that could be implemented in style-based scripts to better analyze and understand form? Multiple performances of the same pieces have yielded different results; could this method present a way of guiding interpretation? What can we learn by analyzing and comparing similar works from different composers? Might subtle stylistic or composer trends become evident?

7. ACKNOWLEDGEMENTS

The author wishes to thank his collaborator, Stuart Paul Duncan (Cornell University), for his work related to melodic segmentation, in addition to Dr. Eugene Narmour (University of Pennsylvania) and Dr. Jeremy Gill (Temple University) for their invaluable input.

8. REFERENCES

[1] Cambouropoulos, E., "Musical Parallelism and Melodic Segmentation: A Computational Approach", Music Perception, 23, 249-269. (2006)

[2] Deliège, I., Mélen, M., Stammers, D., and Cross, I., "Musical schemata in real-time listening", Music Perception, 14, 117-160. (1996)

[3] Jackendoff, R., Lerdahl, F., "The Human Music Capacity: What is it and what's special about it?", Cognition, 100, 33-72. (2006)

[4] Lerdahl, F., Jackendoff, R. A Generative Theory of Tonal Music. Cambridge: MIT Press. (1983)

[5] Meyer, Leonard B. Emotion and Meaning in Music. Chicago: University of Chicago Press. (1956)

[6] Narmour, E. The Analysis and Cognition of Basic Melodic Structures: The Implication-Realization Model. Chicago: University of Chicago Press. (1990)

[7] Narmour, E. The Analysis and Cognition of Melodic Complexity: The Implication-Realization Model. Chicago: University of Chicago Press. (1992)

[8] Rolland, Pierre-Yves, "Discovering patterns in musical sequences", Journal of New Music Research, 334-350. (1999)

[9] Rolland, Pierre-Yves, "FlExPat: Flexible extraction of sequential patterns", Proceedings of IEEE ICDM, San Jose, CA, 2001.

[10] Temperley, David. The Cognition of Basic Musical Structures. Cambridge, MA: MIT Press. (2001)

[11] Thom, B., Spevak, C., and Höthker, K., "Melodic Segmentation: Evaluating the Performance of Algorithms and Music Experts", Proceedings of ICMC, Gothenburg, Sweden, 2002.