Automatic Generation of Grouping Structure based on the GTTM

Masatoshi Hamanaka 1) 2), Keiji Hirata 3) and Satoshi Tojo 4)
1) Research Fellow of the Japan Society for the Promotion of Science, 2) National Institute of Advanced Industrial Science and Technology (AIST), 3) NTT Communication Science Laboratories, 4) Japan Advanced Institute of Science and Technology
m.hamanaka@aist.go.jp

Abstract

This paper describes an automatic grouping system that segments music into units such as phrases or motives, based on the Generative Theory of Tonal Music (GTTM, hereafter). The GTTM is considered one of the most promising theories of music with regard to computer implementation; however, it specifies no order for applying its rules, which therefore often conflict with one another. To solve this problem, we introduce adjustable parameters that enable us to assign priorities among the rules. Experimental results show that, with tuned parameters, our method outperformed the baseline performance by more than thirty percent. In addition, we show that the system displays the time-span tree derived from these grouping rules together with the metric information given by the input MusicXML.

1 Introduction

The purpose of this study is to automatically derive a time-span tree that assigns a hierarchy of 'structural importance' to the notes of a piece of music. The hierarchy is based on the Generative Theory of Tonal Music (GTTM) (Lerdahl and Jackendoff 1983). Automatic generation of a time-span tree from the musical surface enables us to analyze deeper structure (Hirata and Aoyagi 2003). It also provides a summarization of the music, which can serve as a representation for search and thus support music retrieval systems (Hirata and Matsuda 2003).

The GTTM is composed of four modules, each of which assigns a separate structural description to a listener's understanding of music. These four modules output the grouping structure, the metrical structure, the time-span reduction, and the prolongational reduction, respectively. The grouping structure is a hierarchical segmentation into motives, phrases, and sections. The result of grouping is used to derive a time-span tree, together with the metrical information in MusicXML.

In this paper, we describe a method for automatically articulating a sequence of notes into groups according to the GTTM. Previous segmentation methods have been unable to construct hierarchical grouping structures because they focus on detecting local boundaries of the melody (Stammen and Pennycook 1994; Temperley 2001; Cambouropoulos 2001; Ferrand, Nelson, and Wiggins 2003). Attempts have been made to implement several of the GTTM grouping rules in computer systems, but these methods have been incapable of resolving conflicts between the rules (Ida, Hirata, and Tojo 2001; Touyou, Hirata, and Tojo 2002).

Our GTTM-based segmentation system constructs hierarchical grouping structures in a top-down process that uses bottom-up detection of local boundaries. The system is equipped with adjustable parameters that enable us to control the strength of each rule. When a user changes the parameters, the hierarchical grouping structures change as a result of the new segmentation. With this system, we generate time-span trees by searching for plausible parameter sets.
2 Problems of applying grouping rules

The grouping structure is intended to formalize the intuition that tonal music is organized into groups, which are in turn composed of subgroups. These groups are presented graphically as several levels of arcs below a music staff. The GTTM provides two types of rules for grouping: grouping well-formedness rules (GWFRs) and grouping preference rules (GPRs). The GWFRs are necessary conditions for the assignment of a grouping structure and restrictions on those structures. When more than one structure satisfies the GWFRs, the GPRs indicate which structures are preferred over others. There are seven GPRs: GPR1 (alternative form), GPR2 (proximity), GPR3 (change), GPR4 (intensification), GPR5 (symmetry), GPR6 (parallelism), and GPR7 (time-span and prolongational stability). GPR2 has two cases: (a) slur/rest and (b) attack-point. GPR3 has four cases: (a) register, (b) dynamics, (c) articulation, and (d) length. In this section, we identify the problems the GPRs pose for computer implementation.

2.1 Conflict between rules

Because there is no strict order for applying the GPRs, conflicts between rules often arise and result in ambiguous analyses. Figure 1 shows a

simple example of the conflict between GPR2b (attack-point) and GPR3a (register). GPR2b states that a relatively greater interval of time between attack points initiates a grouping boundary. GPR3a states that a relatively greater pitch interval, surrounded by smaller neighboring intervals, initiates a grouping boundary. Because GPR1 (alternative form) strongly prefers that note 3 alone not form a group, a boundary cannot be perceived at both the 2-3 and the 3-4 transitions. To solve this problem, we use adjustable parameters S_GPRj (j = 2a, 2b, 3a, 3d, 4, 5, 6), with 0 ≤ S_GPRj ≤ 1, that enable us to control the strength of each rule.

Figure 1. Simple example of conflict between rules (GPR2b and GPR3a apply at adjacent transitions of notes 1-6).

2.2 Ambiguity in defining GPR4, 5, and 6

The GTTM does not resolve much of the ambiguity that arises in applying GPR4, 5, and 6. For example, GPR6 (parallelism) does not define the decision criteria for construing whether two or more segments are parallel. The same problem occurs with GPR4 (intensification) and GPR5 (symmetry). To solve this problem, we formalized the criteria for deciding whether each rule is applicable.

3 Automatic segmentation system based on the GTTM

Figure 2 shows the processing flow of the system: MusicXML is read, low-level boundaries are detected, and high-level boundaries are then found by top-down division, applying GPRs 1-6; the result is output as GroupingXML. As the primary input format we chose MusicXML (Recordare LLC 2004), because the format is expected to become a common interlingua for music notation, analysis, retrieval, and other applications. A hierarchical grouping structure is constructed top-down using bottom-up detection of local boundaries. We designed GroupingXML as the output format for segmentation results. In our experiments, we restrict the music to monophony in order to evaluate the performance of each rule correctly.

Figure 2. Processing flow of the system.

3.1 MusicXML

MusicXML is a music representation format based on XML (extensible mark-up language). It has attribute elements and note elements. The attribute element contains the musical attributes of a score, such as the key signature, time signature, and clef. The time signature includes a numerator (beats) and a denominator (beat-type). A note element has a pitch defined by step and octave elements. Every note also has a duration expressed in divisions of a quarter note. Several additional elements may be associated with a note: tied notes, slurs, fermatas, and arpeggios are represented as top-level children of the notation element, as are dynamics, ornaments, articulations, and instrument-specific technical indications. In this experiment, we prepared the input data in MusicXML with Finale™ and the Dolet™ for Finale plug-in (see http://www.recordare.com/), in which tags such as dynamics and articulations were not included.

3.2 Application of GPRs

In this section, we discuss the application of GPR1, GPR2a, GPR2b, GPR3a, GPR3d, GPR4, GPR5, and GPR6. We were unable to apply GPR3b (dynamics) and GPR3c (articulation) because the MusicXML used as system input does not contain dynamics or articulation elements. The degree of boundary for each rule at transition i is expressed as D_i^GPRj (j = 1, 2a, 2b, 3a, 3d, 4, 5, 6), with 0 ≤ D_i^GPRj ≤ 1. Our segmentation system has thirteen adjustable parameters: S_GPRj (j = 2a, 2b, 3a, 3d, 4, 5, 6), σ, W_r, W_s, W_l, T_GPR4, and T_low-level (Table 1).

Table 1: Thirteen adjustable parameters

  Parameter     Description
  S_GPRj        Strength of each rule (j = 2a, 2b, 3a, 3d, 4, 5, 6).
  σ             Standard deviation of the normal distribution for GPR5.
  W_r           Weight of priority of the same rhythm over the same register in parallel segments.
  W_s           Weight of priority of the end of a parallel segment over its start.
  W_l           Weight of priority of large parallel segments.
  T_GPR4        Threshold deciding whether GPRs 2 and 3 are relatively pronounced.
  T_low-level   Threshold deciding whether transition i is a low-level boundary.
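For concreteness, the full parameter set of Table 1 can be held in a single container. The following Python sketch is illustrative only (the class and field names are ours, not part of the system); the default values shown are the baseline settings reported in Section 4.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class GTTMParameters:
    """Illustrative container for the thirteen adjustable parameters of Table 1."""
    # Strength of each grouping preference rule, one entry per rule j.
    s_gpr: Dict[str, float] = field(default_factory=lambda: {
        "2a": 0.5, "2b": 0.5, "3a": 0.5, "3d": 0.5, "4": 0.5, "5": 0.5, "6": 0.5,
    })
    sigma: float = 0.05        # standard deviation of the normal distribution for GPR5
    w_r: float = 0.5           # same rhythm vs. same register in parallel segments
    w_s: float = 0.5           # end vs. start of a parallel segment
    w_l: float = 0.5           # priority of large parallel segments
    t_gpr4: float = 0.5        # threshold: are GPRs 2 and 3 relatively pronounced?
    t_low_level: float = 0.5   # threshold: is transition i a low-level boundary?
```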
GPRs 2, 3, and 4. GPRs 2, 3, and 4 are rules over a transition of four notes. The degree of boundary of each rule indicates whether transition i is heard as a group boundary (D_i^GPRj = 1) or not (D_i^GPRj = 0). GPR4 has an adjustable parameter T_GPR4 (0 ≤ T_GPR4 ≤ 1) that controls the threshold deciding whether GPRs 2 and 3 are relatively pronounced.
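These rules operate on four note-to-note quantities, rest_i, ioi_i, regi_i, and len_i, defined formally with the equations below. As an illustration, assuming notes have already been parsed out of MusicXML into (onset, duration, pitch) triples, the features can be computed as in the following sketch; the identifiers are ours.

```python
from typing import List, NamedTuple, Tuple

class Note(NamedTuple):
    onset: float     # onset time, in divisions of a quarter note
    duration: float  # duration, in divisions of a quarter note
    pitch: int       # MIDI note number

def transition_features(notes: List[Note]) -> Tuple[list, list, list, list]:
    """Per-transition features between successive notes: rest_i (offset-to-onset
    gap), ioi_i (inter-onset interval), regi_i (absolute pitch interval), and
    len_i (difference of durations)."""
    rest, ioi, regi, length = [], [], [], []
    for a, b in zip(notes, notes[1:]):
        rest.append(max(0.0, b.onset - (a.onset + a.duration)))
        ioi.append(b.onset - a.onset)
        regi.append(abs(b.pitch - a.pitch))
        length.append(b.duration - a.duration)
    return rest, ioi, regi, length
```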

The degrees of boundary for GPRs 2 and 3 mark strict local peaks in the corresponding features, and GPR4 fires where GPRs 2 and 3 are relatively pronounced:

$$D_i^{GPR2a} = \begin{cases} 1 & \text{if } rest_{i-1} < rest_i \text{ and } rest_i > rest_{i+1} \\ 0 & \text{otherwise} \end{cases} \quad (1)$$

$$D_i^{GPR2b} = \begin{cases} 1 & \text{if } ioi_{i-1} < ioi_i \text{ and } ioi_i > ioi_{i+1} \\ 0 & \text{otherwise} \end{cases} \quad (2)$$

$$D_i^{GPR3a} = \begin{cases} 1 & \text{if } regi_{i-1} < regi_i \text{ and } regi_i > regi_{i+1} \\ 0 & \text{otherwise} \end{cases} \quad (3)$$

$$D_i^{GPR3d} = \begin{cases} 1 & \text{if } len_{i-1} = 0 \text{ and } len_i \neq 0 \text{ and } len_{i+1} = 0 \\ 0 & \text{otherwise} \end{cases} \quad (4)$$

$$D_i^{GPR4} = \begin{cases} 1 & \text{if } P_i^{rest} > T_{GPR4} \text{ or } P_i^{ioi} > T_{GPR4} \text{ or } P_i^{regi} > T_{GPR4} \\ 0 & \text{otherwise} \end{cases} \quad (5)$$

where rest_i is the interval between the offset of the current note and the next onset, ioi_i is the inter-onset interval, regi_i is the pitch interval, and len_i is the difference of note durations. The relative prominences are the features normalized over the neighboring transitions:

$$P_i^{rest} = \frac{rest_i}{\sum_{j=i-1}^{i+1} rest_j}, \qquad P_i^{ioi} = \frac{ioi_i}{\sum_{j=i-1}^{i+1} ioi_j}, \qquad P_i^{regi} = \frac{regi_i}{\sum_{j=i-1}^{i+1} regi_j}$$

GPR5. GPR5 is the rule for symmetry in a grouping structure. We use a normal distribution with standard deviation σ as the symmetry level D_i^GPR5, so that there is a preference to subdivide a group into two parts of equal length:

$$D_i^{GPR5} = \exp\!\left(-\frac{\left(\sum_{j=start}^{i} ioi_j - \frac{1}{2}\sum_{j=start}^{end} ioi_j\right)^2}{2\sigma^2}\right) \quad (6)$$

where start is the first transition of the group and end is its last transition.

GPR6. GPR6 is the rule for parallelism in the grouping structure. GPR6 has three adjustable parameters: W_r (weight of priority of the same rhythm over the same register in parallel segments), W_s (weight of priority of the end of a parallel segment over its start), and W_l (weight of priority of large parallel segments), with 0 ≤ W_r, W_s, W_l ≤ 1. We define the parallelism level as

$$D_i^{GPR6} = G_i^{start} \times (1 - W_s) + G_i^{end} \times W_s \quad (7)$$

where G_i^start and G_i^end express the degree to which transition i starts and ends a parallel segment, respectively. Both are computed by matching candidate segments of length r, where r counts divisions of a quarter note (division denotes the duration of a quarter note): indicator functions compare the onset patterns, inter-onset intervals, and registral intervals of two segments (⌊·⌋, the integer part, aligns onsets to the division grid), matches in rhythm and in register are traded off by W_r, and each match is scaled by r^(1+W_l) so that larger parallel segments receive priority.

GPR1. GPR1 is designed to prevent a group from containing a single event; a boundary must therefore be stronger than its neighboring transitions. The strength of a low-level boundary is

$$B_i = \sum_{j \in \{2a,2b,3a,3d,6\}} D_i^{GPR_j} \times S_{GPR_j} \,\Big/\, \max_i \sum_{j \in \{2a,2b,3a,3d,6\}} D_i^{GPR_j} \times S_{GPR_j} \quad (8)$$

The degree of the GPR1 boundary indicates whether transition i is heard as a group boundary (D_i^GPR1 = 1) or not (D_i^GPR1 = 0):

$$D_i^{GPR1} = \begin{cases} 1 & \text{if } B_{i-1} < B_i \text{ and } B_i > B_{i+1} \\ 0 & \text{otherwise} \end{cases} \quad (9)$$

3.3 Detection of low-level boundaries

Low-level boundaries are detected from MusicXML using D_i^GPRj (j = 2a, 2b, 3a, 3d, 6). T_low-level is an adjustable parameter that controls the threshold deciding whether transition i is a low-level boundary:

$$D_i^{low\text{-}level} = \begin{cases} 1 & \text{if } B_i > T_{low\text{-}level} \text{ and } D_i^{GPR1} = 1 \\ 0 & \text{otherwise} \end{cases} \quad (10)$$

3.4 Detection of high-level boundaries

A hierarchical grouping structure is constructed top-down while local boundaries are found bottom-up. Within a group that contains low-level boundaries, the boundary for the next level, î, is selected iteratively by

$$\hat{i} = \operatorname*{arg\,max}_i \left( D_i^{low\text{-}level} \times \sum_{j \in \{2a,2b,3a,3d,4,5,6\}} D_i^{GPR_j} \times S_{GPR_j} \right) \quad (11)$$

and the group is divided at î; the process then recurses on the resulting subgroups.
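To summarize the low-level stage, equations (1)-(3) and (8)-(10) can be sketched in a few lines of Python. This is an illustrative reading of the equations as reconstructed above, not the system's actual code; GPR3d, GPR4, GPR5, and GPR6 are omitted for brevity.

```python
def local_peak(xs):
    """Equations (1)-(3): D_i = 1 where feature xs is a strict local peak."""
    return [1 if 0 < i < len(xs) - 1 and xs[i - 1] < xs[i] > xs[i + 1] else 0
            for i in range(len(xs))]

def low_level_boundaries(d, s_gpr, t_low_level):
    """Equations (8)-(10). `d` maps rule names ("2a", "2b", ...) to lists of
    per-transition degrees D_i^GPRj; returns the low-level boundary indices."""
    rules = [j for j in ("2a", "2b", "3a", "3d", "6") if j in d]
    n = len(d[rules[0]])
    # Equation (8): weighted sum of rule degrees, normalized by its maximum.
    raw = [sum(d[j][i] * s_gpr[j] for j in rules) for i in range(n)]
    top = max(raw) or 1.0
    b = [v / top for v in raw]
    d_gpr1 = local_peak(b)  # equation (9): GPR1 as a strict peak in B_i
    # Equation (10): threshold test combined with the GPR1 condition.
    return [i for i in range(n) if b[i] > t_low_level and d_gpr1[i] == 1]
```

For example, d = {"2a": local_peak(rest), "2b": local_peak(ioi), "3a": local_peak(regi)} feeds the features of the earlier sketch through equations (1)-(3).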
3.5 GroupingXML

We designed GroupingXML as the export format for hierarchical grouping structures. GroupingXML has group elements, note elements, and applied elements. Note elements appear in order of onset time and are connected to the notes in MusicXML using XPointer (W3C 2002) and XLink (W3C 2001). All note elements are nested inside hierarchical group elements. The applied elements are located between the end of one group tag and the start of the next, marking where the GPRs are applied. Figure 3 shows a simple example of GroupingXML. We developed a GroupingXML viewer to display grouping structures (Figure 4).
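As an illustration of this output stage, the sketch below serializes a grouping tree into GroupingXML-style nested elements. The element names follow Figure 3; the in-memory dictionary representation and the note ids in the example are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def group_to_xml(group, parent=None):
    """Serialize a grouping tree into nested <group> elements, with <note>
    children and <applied rule="..."/> markers after a subgroup's closing tag."""
    elem = ET.Element("group") if parent is None else ET.SubElement(parent, "group")
    for note_id in group.get("notes", []):
        ET.SubElement(elem, "note", id=note_id)
    for child in group.get("children", []):
        group_to_xml(child, elem)
        for rule in child.get("applied", []):
            ET.SubElement(elem, "applied", rule=rule)
    return elem

# Example mirroring the shape of Figure 3: GPR2a and GPR6 at the boundary.
tree = {"children": [
    {"notes": ["P1-1-1", "P1-1-2"], "applied": ["2a", "6"]},
    {"notes": ["P1-2-3", "P1-2-4"]},
]}
print(ET.tostring(group_to_xml(tree), encoding="unicode"))
```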

<group>
  <group>
    <note id="P1-1-1"/>
    <note id="P1-1-2"/>
    <note id="P1-1-3"/>
    <note id="P1-1-4"/>
    <note id="P1-2-1"/>
  </group>
  <applied rule="2a"/>
  <applied rule="6"/>
  <group>
    <note id="P1-2-3"/>
    <note id="P1-2-4"/>
    <note id="P1-2-5"/>
    <note id="P1-2-6"/>
    <note id="P1-3-1"/>
  </group>
</group>

Figure 3. Simple example of GroupingXML (GPR2a and GPR6 applied at the boundary between the two groups).

Figure 4. Screen snapshot of the GroupingXML viewer.

4 Experimental results

We evaluated segmentation performance using the F-measure, the weighted harmonic mean of precision P (the proportion of selected boundaries that are correct) and recall R (the proportion of correct boundaries that were identified):

$$F\text{-}measure = \frac{2 \times P \times R}{P + R} \quad (12)$$

We did not account for the propagation of low-level errors to higher levels; wrong answers were counted without regard to grouping level. This evaluation required correct grouping data. We collected one hundred 8-bar, monophonic, classical music pieces and asked experts in musicology to annotate them with groupings manually, faithfully with respect to the GPRs. These manual results were cross-checked by three other experts.

The segmentation changes depending on how the parameters are configured. To evaluate the baseline performance of our system, we used default parameters: S_GPRj = 0.5 (j = 2a, 2b, 3a, 3d, 4, 5, 6), σ = 0.05, W_r = 0.5, W_s = 0.5, W_l = 0.5, T_GPR4 = 0.5, and T_low-level = 0.5. Finding a plausible parameter set took about ten minutes per piece on average. As a result of configuring the parameters, segmentation performance exceeded the baseline F-measure by more than thirty percent (Table 2).

Table 2: F-measure for our method

  Melody                     Baseline      Our method with
                             performance   configured parameters
  1. Turkish March           0.09          0.95
  2. Wiegenlied              0.41          1.00
  3. Brindisi                0.03          0.90
  4. My dearest father       0.03          0.11
  5. The Nutcracker March    0.01          0.05
  Total (100 melodies)       0.32          0.67

5 Conclusion

We developed a segmentation system based on the GTTM that makes it possible to construct hierarchical grouping structures. Experimental results show that, after configuring the parameters, segmentation performance exceeded the baseline F-measure by more than thirty percent. At the current stage, the time-span tree is generated only from the GPRs together with the metric information given by MusicXML. We plan to improve the precision of the trees by implementing further details of the time-span tree generation rules.

References

Lerdahl, F., and R. Jackendoff. (1983). A Generative Theory of Tonal Music. Cambridge, Massachusetts: MIT Press.

Hirata, K., and T. Aoyagi. (2003). "Computational Music Representation on the Generative Theory of Tonal Music and the Deductive Object-Oriented Database." Computer Music Journal 27(3), 73-89.

Hirata, K., and S. Matsuda. (2003). "Interactive Music Summarization based on Generative Theory of Tonal Music." Journal of New Music Research 32(2), 165-177.

Stammen, D. R., and B. Pennycook. (1994). "Real-time Segmentation of Music using an Adaptation of Lerdahl and Jackendoff's Grouping Principles." In Proceedings of the International Conference on Music Perception and Cognition, pp. 269-270.

Temperley, D. (2001). The Cognition of Basic Musical Structures. Cambridge, Massachusetts: MIT Press.

Cambouropoulos, E. (2001). "The Local Boundary Detection Model (LBDM) and its Application in the Study of Expressive Timing."
In Proceedings of the International Computer Music Conference, pp. 290-293. Havana, Cuba: International Computer Music Association.

Ferrand, M., P. Nelson, and G. Wiggins. (2003). "Memory and Melodic Density: A Model for Melody Segmentation." In Proceedings of the XIV Colloquium on Musical Informatics (XIV CIM 2003), pp. 95-98. Firenze, Italy.

Ida, K., K. Hirata, and S. Tojo. (2001). "The Attempt of the Automatic Analysis of the Grouping Structure and Metrical Structure based on GTTM." SIG Technical Report, Vol. 2001, No. 42, pp. 49-54 (in Japanese).

Touyou, K., K. Hirata, S. Tojo, and K. Satoh. (2002). "Improvement of Grouping Rule Application in Implementing GTTM." SIG Technical Report, Vol. 2002, No. 47, pp. 121-126 (in Japanese).

Recordare LLC. (2004). "MusicXML 1.0 Tutorial." http://www.recordare.com/xml/musicxml-tutorial.pdf.

W3C. (2001). "XML Linking Language (XLink) Version 1.0." http://www.w3.org/TR/xlink/.

W3C. (2002). "XML Pointer Language (XPointer)." http://www.w3.org/TR/xptr/.