Page  00000001 FINDING SUBSEQUENCES OF MELODIES IN MUSICAL PIECES Kamil Adiloglu Berlin University of Technology Department of Electrical Engineering and Computer Science ABSTRACT The similarity neighborhood is a paradigmatic model investigating similar melodies of equal length depending on the number of appearances of them within a given piece. In the present paper, we propose some extensions to the similarity neighborhood model to identify also the subsequence relationships between melodies of different length. These relations are considered in two directions called the presence neighborhood and the content neighborhood. The presence neighborhood of a given melody is a set of longer melodies being similar to those containing the given melody within. Strong presence further requires that the corresponding short segments are similar as well (inheritance property). The content neighborhood, on the other hand, contains subsequences of a given melody. 1. INTRODUCTION The similarity neighborhood model[l] is a paradigmatic model searching a given piece exhaustively to find significant melodies within. These melodies are determined according to the number of appearances of them. Rather than restricting the search into equality of melodies, the similarity neighborhood model defines a similarity criterion of melodies. The method we propose in the present paper to recognize the subsequence relationships of melodies are inspired by the model presented by Buteau[2] and Mazzola [4]. Buteau and Mazzola proposed a topological model based on similarity of motives, which are arbitrary subsets of a given piece. Because of the asymmetric nature of this topology, i.e. neighborhoods contain motives of different length, they introduced two functions called presence and content to measure the presence of a motive in others. If the roles are inversed, the content of a motive is measured. Within the similarity neighborhood model, we restrict our domain to melodic segments, containing only consecutive notes, to get rid of the computational difficulties as well as the difficulties to music-theoretically interpret the results provided by the model of Buteau and Mazzola. In order to investigate the presence and content of a given melody within a given piece, we define two additional neighborhood concepts in Section 3, namely the presence neighborhood and the content neighborhood. Klaus Obermayer Berlin University of Technology Department of Electrical Engineering and Computer Science 2. THE SIMILARITY NEIGHBORHOOD MODEL The similarity neighborhood model[l] uses the similarities between melodies to identify the significant melodies of a given piece. The significance of a melody depends on the amount of repetition of the melody itself or any kind of variation of it. In order to decide whether two melodies are similar, we use a mathematical distance measure. Therefore some musical concepts are defined in mathematical terms. 2.1. Basic Definitions The difference in pitch between two notes is called a melodic interval. In the similarity neighborhood model, we measure the intervals chromatically. Furthermore the note onsets are ignored, and a melody is considered as a sequence of chromatical pitch values. This decision leads us to the following definitions. Definition 1 A melody m of length n is a sequence of n + 1 integers (to, 1,.., tn) E Zn+1. Its coordinates ti denote chromatic pitch. Definition 2 A segment of a melody m is a subsequence m = -(ti,..., ti+,) ofm, where i + n' <= n. Melodies are considered as trivial segments of themselves, namely m = mn. This allows us to consider segments of segments in a recursive way, depending on an ambient melody M, which is the melodic encoding of a piece. In the melodic structure of a given piece, melodies appear in different variations. In our representation, the augmentation and the diminution of a given melody are represented the same as the original melody. However, some variations differ from the original melody. Therefore, we need to define the shape of a melody, in order to distinguish these kinds of variations from mere chromatic transpositions. Definition 3 The shape of the melody m is defined as: P: n+l"+, zn, (1) P(m) (tl - to, t2 - tl,... t, - n-l). (2)

Page  00000002 In the chromatical pitch representation, the major and minor intervals are represented differently. For this reason, translations do not always have the same shapes. Furthermore, inversions and retrogrades do not have the same shapes either. Hence, examining the equality of two melodies does not suffice to recognize all possible variations of melodies. In order to obtain satisfying results, the similarity of melodies should be discussed. 2.2. The Similarity of Melodies The mathematical definition of the shape of a melody enables us to apply a mathematical distance measure to measure the similarity between two melodies. In the present article, we used Pearson's correlation coefficient [3] also called Pearson's r to measure the similarity of two shapes. The correlation coefficient can detect similarities between chromatic translation (max. positive correlation), chromatic inversion (max. negative correlation, because of negative shape), diatonic translation and inversion (high similarity) and melodic variations up to a chosen threshold. However, the retrograde cannot be detected by applying the correlation coefficient in its naive form, because the order of the intervals are reversed. One would need to consider the group action of shapes [4] to detect the retrograde. 2.3. The Significance of a Melody The significance is indicated by the occurrences of a given melody within a given piece. These occurrences are detected by measuring the similarities for the given melody throughout the given piece. The similarity neighborhood set of a given melody is constructed from the detected similar melodies within a given piece, which is formally defined as: Definition 4 The similarity neighborhood ofa given melody m within a given piece M is defined as: S~ Th M card(UR(m, M)) Signif icancen(m, M) =, (5) n This formula accounts for the number of occurrences of similar melodies to the given melody, divided by a normalization term, which enables us to compare the significance values of different length melodies. 2.4. The Prototypes The similarity neighborhood sets constructed, and the corresponding significance values are calculated for all of the melodies within a given piece. However, we look for melodies having relatively high significance values. In order to identify these kinds of melodies, we define a new term. Definition 6 A given melody m, whose significance value is higher than the threshold T is called a prototype, which is shown by Prtn (T) ={mSignificancem(m, M) > T}, (6) N T n c2 where c2 is a constant. (7) The set Prtn is the set of all prototypes of the length n. The melodies with higher significance values are called the prototypes of the given piece for a particular length n. 3. SEGMENT RELATIONS OF MELODIES From the music-theoretical point of view, a melodic analysis of a given piece explains how the melodic material is introduced and used throughout the given piece. Therefore the segment relationships between melodies should be identified as well. However, the similarity neighborhood model cannot provide that kind of information, because only equal length melodies can be compared. In order to investigate the segment relationships of melodies two additional neighborhoods are defined. 3.1. Presence Neighborhood A first account to what one may call the 'presence' of a melody m within a piece is given by its similarity neighborhood. In the same way, we can consider the 'presence' of a melody within subsequences of the piece. A particular interesting set of such subsegments are the prototypes containing m. The idea is to include also the similarity neighborhoods of these ambient segments. Definition 7 The set of all segments of length n' being similar to a prototype m' containing the given melody m is called the weak presence n '-neighborhood of m. The weak presence neighborhood of a given melody m is the union of all of the weak presence n '-neighborhood sets for: U" (m, M) R { M11: |Tr-(pA(m),up (M7)) | > RI}(3) 2ne1 _ where c1 is a constant. (4) The similarity of two melodies are determined by comparing the distance value with the threshold R. The similarity neighborhood of a given melody contains all of the similar melodies to the given melody within a given piece. In order to indicate the importance of the given melody within a piece, we define the significance of the given melody as follows: Definition 5 The significance ofa given melody m within a given piece M is the ratio of the cardinality of the similarity neighborhood set of the given melody to the normalization ratio N of the length N of the piece to the length n of the melody:

Page  00000003 Figure 1. The strong presence neighborhood relation between the long melodies m' and m", and the short m and ml. m i" n' E]length(m), length(M)]. These two sets are defined as follows: 3.2. Content Neighborhood Naively, the 'content' of a given melody is the set of all segments of the melody. These segments naturally appear within the given melody, whereever the given melody literally appears. However, some of the segments of the given melody also appear independently, and some similar occurrences of a melody do not necessarily contain similar subsegments. The content neighborhood aims to identify particularly the segments of a given melody up to similarity. Definition 9 The set of all segments of length n' similar to a prominent subsegment m' of a given melody m is called the weak content n'-neighborhood of the given melody. The weak content neighborhood of a given melody m is the union of all of the content n '-neighborhood sets for: n E [minimum melody length, length(m) [. These two sets are defined as follows: Presn' (m, M) Pres(m, M) U UR(m',M), m=m " m'EPt" UPres (m, M). (8) (9) The weak presence neighborhood does not guanrantee that similar melodies have the same weak presence neighborhood sets, because two melodies containing two given melodies does not always have the same melodies within their similarity neighborhood sets. Therefore we calculate the equivalence closure for the R-similarity relation, and study the stability of the classes under inclusion of segments. The segments that are contained within the weak presence neighborhood do not assure that a similar segments to the given melody are contained within them at the corresponding positions. Hence, in general, the inheritance property does not hold for the weak presence neighborhood. In order to assure the inheritance property (see Figure 1, we define the strong presence neighborhood as follows: Definition 8 The strong presence n '-neighborhood of a given melody m contains only those segments m " of length n 'from the weakpresence n '-neighborhood, which -in addition to being similar to a prominent segment m' containing m- have the property that the analogous subsegment in m" is also similar to m. The strong presence neighborhood of a given melody m is the union of all of the strong presence n '-neighborhood sets for: n' E]length(m), length(M)]. Pres! (m, M) (10) U {m" U (m', M)m" U(mM)}, m= m m' E PrLto' C7'(m,M) C(m, M) U UR(m',M), m'=m m'CP" U Co(m, M). (12) (13) We calculate also the higher order content neighborhoods by unifying the weak content neighborhood sets. In that way, we can identify also the melodies that are contained within other melodies, similar to the given melody. The weak content neighborhood contains only submelodies of a given melody and similar melodies to those submelodies. However, it is not considered whether the melodies containing those submelodies are similar in analogy to the given melody m or not. In other words, the inheritance property to m does generally not hold for the weak content neighborhood set. In order to guarantee that the inheritance property also holds, we define the strong content neighborhood set of a given melody as follows: Definition 10 The strong content n '-neighborhood of a given melody m contains only those similar segments from the weak content n '-neighborhood, so that the melodies analogously containing these segments are also similar to the given melody. The strong content neighborhood of the union set of a given melody m is the union of all of the strong content n '-neighborhood sets for: n E [minimum melody length, length(m) [. Cong (m,M) U {m" Uin*(m', M)|m" e UJ(m,M)}, m'=m" m'EPrt" (14) Pres(m,M) U {P (m, M)}. (11) Con(m, M) -= U{Con (m, M)}. (15)

Page  00000004 . -... i. r _----- ".._ ____ Figure 2. Prototype Melodies from the Inventio 14 in B flat Major 4. EXPERIMENTAL APPLICATION We tested our model on the Two Part Inventions of J. S. BACH [5]. In these experiments, the minimum length of a melody has been chosen to be 4 notes. The constants has been chosen to be the values 0.1 for ci and 1.0 for C2. To demonstrate the success of the model, we present the results of the Inventio No. 14 in B-Flat Major in a music-theoretical context. 4.1. Inventio 14 in B-Flat Major The soprano part introduces the first prototype melody of the piece, which lasts during the first three measures. This melody appears twice within the whole piece, second appearance is in the sixth measure in the bass part. A shorter prototype melody, which is a segment of the melody mentioned in the previuos paragraph, is repeated three times within the longer melody. The shorter melody is further decomposed in its content neighborhood into a nine notes long melody, and its inversion repeated immediately. The nine notes long melody is repeated six times in the first three measures of the piece. These appearances are alternating, ones in the original form, and ones in the inversion. This melody contains in its presence neighborhood sets the melodies mentioned in the previuos two paragraphs. At the beginning of the fourth measure, a six notes long melody is introduced, which appears alternately in both parts. This melody comes from the first six notes of the nine notes melody. Therefore, it is an element of the content neighborhood of the nine notes melody. The alternating movements are repeated in the beginning of the fifth measure for the last time. The six notes long melody appear then simultaneously in both parts, in the bass part in its original from, and in the soprano inverted. The sixth measure begins similar to the first measure. However the roles of the soprano part and bass part are reversed. The three consecutive measures beginning with the sixth measure are analogous to the first three measures of the piece, but the roles of the voices are exchanged. The following measures repeat the six notes melody a lot of times, both in an alternating manner and simultaneously. In the sixteenth measure, the nine notes long melody appears in both parts overlappingly. The nineteenth measure contains both the nine notes and the six notes melody, and the piece ends in the twentieth measure. Consequently, in the Inventio 14, we observe segment relations in four levels. All of these levels are shown in the Figure 2. 5. CONCLUSION Using the domain of melodic segments enabled the similarity neighborhood model as well as the extensions introduced in the present paper provide music-theoretically interpretable results. The similarity neighborhood model extracts the significant melodies, and the variations of these melodies throughout a given piece. From the paradigmatic music analysis point of view, identifying the similarities between melodies is important. From the music-theoretical viewpoint, the relations between the melodies of different length are explained to understand the development of a composition. The presence neighborhood sets indicate the presence of melodies within the longer melodies. The presence of a short melody within a long one, is the content of the long melody, if the roles of the melodies are inverted. This relation is explained by the content neighborhood sets of the long melody. These roles are interchangeable as long as the thresholds do not matter, i.e. R = 0. As soon as only prototypes are taken into account, these concepts are no longer interchangeable. 6. ACKNOWLEDGEMENTS The author Kamil Adiloglu was supported by the NaFOG Foundation. 7. REFERENCES [1] Adiloglu K., Noll T. and Obermayer K.: A Paradigmatic Approach for Melodic Segmentation: in www.cs.tu-berlin.de/~kamil. [2] Buteau C.: Reciprocity between Presence and Content Functions on a Motivic Composition Space in Tatra Mt.Math.Publ. 23, pp. 17-45, 2001. [3] Freund J. E. and Simon G.A.: Modern Elementary Statistics, Prentice Hall Inc, Englewood Cliffs, New Jersey, 1992. [4] Mazzola G.: The Topos of Music, Birkhuser, Basel, 2002. [5] Ratz E.: Einfiihrung in die Musikalische Formenlehre, Universal Edition, Vienna, 1973.