The use of melodic segmentation for content-based retrieval of musical data

Massimo Melucci and Nicola Orio
Università di Padova, Dipartimento di Elettronica e Informatica
Via Gradenigo 6/a - 35131 - Padova - Italy
{melo,orio}

ABSTRACT

The research work reported in this paper merges musical knowledge about melodic segmentation with content-based techniques borrowed from Information Retrieval. The aim of this work is to design and implement a musical IR system. The system gives the final user the possibility of accessing a musical collection using queries based on melodic excerpts, rather than on textual bibliographic data only. The basic idea is an assumption about the existence of a duality between textual and musical information retrieval. The notation-based representation of music suggests the possibility of exploiting the potential of IR, applying some of the standard principles that hold in the textual domain to the musical domain. An experimental study reports the results of indexing and retrieval tests using a well-known information retrieval statistical model. We believe that such a musical content-based indexing and segmentation technique may be integrated with current musical document data bank management systems to improve their effectiveness.

1 Information Retrieval

The growth of the Web and the advent of search engines have led researchers in many disciplines to use and learn about information retrieval (IR) technologies since the early nineties. However, research work in IR and the implementation of operational IR systems date back to the early fifties, when computers made it possible to automatically process abstracts and bibliographic records. The main aim of an IR system remains the same: to retrieve in real time, from a very large collection of unstructured documents, the small set of documents relevant to a user's query.
Retrieval is content-based, since the matching between queries and documents takes place on the basis of their semantic content. The core process of an IR system is indexing. Indexing is the process that creates an index from a document collection to represent the semantic content of the collection itself, as well as of the queries. It is indexing that transforms a set of keywords into a data structure with the representational power needed to describe semantic content. Every IR system function is based on the index, which encapsulates all query and document semantics; no operation involving semantic content takes place without it. There are many sources about IR, the main ones being [1, 2, 3].

The research work reported in this paper merges musical knowledge about melodic segmentation with content-based information indexing and retrieval techniques. The aim of this work is to design and implement a musical information retrieval system. The system gives the final user the possibility of accessing a musical collection using queries based on melodic excerpts, rather than on textual bibliographic data only. The basic idea is an assumption about the existence of a duality between textual and musical information retrieval. The notation-based representation of music suggests the possibility of exploiting the potential of information retrieval, applying some of the standard principles that hold in the textual domain to the musical domain.

2 Automatic Music Indexing and Retrieval

Musical language is complex and structured. A given composition may be described by its melody, harmony, rhythm, and style. Not all of these features are relevant for the typical final user; in addition, the user may not have the necessary knowledge of music language and notation to interact with a music information retrieval system based, for instance, on harmony or style.

- 120 - ICMC Proceedings 1999

In this work it is proposed to describe a composition using just its melodic profile. This feature seems to be the most suitable for a user, who may not know bibliographical information or harmonic structures, but may remember some of the musical phrases or the main melody of a composition. Musical content-based retrieval requires the indexing of documents and queries. However, the problem of this kind of indexing is twofold. Although listeners perceive music as a sequence of musical phrases, musical notation does not highlight "lexical units" like text words. Since musical language lacks separators, it is necessary to recognize the boundaries between different phrases of the melody and to point them out by introducing separators. Moreover, musical phrases may have a number of different variants, due for instance to modulations or to the presence of grace notes. Phrases with these variants, which may appear in the same composition as well as in different works (by the same composer or by different composers), are perceived as similar. A system for musical IR should take this peculiar characteristic into account. This paper presents a technique to index musical documents and queries to allow musical content-based retrieval. The technique works in three steps. The first step is an algorithm that takes a Standard MIDI File as input, extracts from the file all the melodies played by the different instruments/channels, and performs a segmentation based on the detection of the melodic surfaces associated with musical phrases. The output of the algorithm is a textual representation of the input MIDI file, which is then passed to the subsequent steps. The second step is the normalization of the musical phrases. Normalization can be performed both on note durations and on melodic intervals.
Hence musical phrases may be described at different levels of accuracy, to overcome the problem of the presence of variants. The third step is indexing. Through indexing, phrases are extracted from and assigned to documents and queries. These phrases are then used at search time as surrogates of the complete documents from which they have been extracted. In order to accomplish search tasks, phrases acquire the representational power that gives meaning to document and query content through the assignment of textual attributes or numerical weights to the phrases.

3 Melodic Segmentation for Automatic Music Indexing

To perform the segmentation of melodies into lexical units, that is into musical phrases, we used the Local Boundary Detection Model (LBDM) proposed by Cambouropoulos [4]. According to this model, listeners perceive the change from one musical phrase to another, that is the presence of boundaries in a melodic surface, whenever there are changes in the relationships among the notes. Hence a musical phrase can be bounded by a note with a duration, or forming an interval, different from the surrounding notes. The LBDM detects the boundaries of a melodic surface by giving a weight to each note in a melody: the weight is related to the probability that a boundary occurs after that note. The weight values depend on: (i) the relationship between the musical intervals that each note forms with the previous and the subsequent notes, (ii) the relationship between the durations of two subsequent notes, and (iii) the presence of musical rests. The notes with the highest weights are selected as boundaries in the melodic surface. In this way it is possible to introduce separators of lexical units, as in textual documents. It is hence possible to apply methods developed for textual IR, considering the detected melodic surfaces as the analogues of terms in textual documents.
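As an illustration, this kind of boundary weighting can be sketched in a few lines of Python. The sketch below is a deliberately simplified, hypothetical approximation - the actual weights and combination rules of Cambouropoulos' LBDM [4] differ - but it captures the three factors named above: interval change, duration change, and rests.

```python
# Simplified, LBDM-style boundary detection (illustrative only;
# the weighting scheme is an invented approximation, not the
# exact model of Cambouropoulos [4]).

def boundary_weights(notes):
    """notes: list of (pitch, duration, rest_after) tuples.
    Returns one weight per note; a higher weight means a phrase
    boundary is more likely to fall after that note."""
    weights = []
    for i, (pitch, dur, rest) in enumerate(notes):
        w = 0.0
        # (i) change between the interval to the previous note
        #     and the interval to the next note
        if 0 < i < len(notes) - 1:
            prev_iv = abs(pitch - notes[i - 1][0])
            next_iv = abs(notes[i + 1][0] - pitch)
            w += abs(next_iv - prev_iv)
        # (ii) change in duration with respect to the next note
        if i < len(notes) - 1:
            w += abs(notes[i + 1][1] - dur)
        # (iii) a rest after the note strongly suggests a boundary
        if rest:
            w += 4.0
        weights.append(w)
    return weights

def segment(notes, n_boundaries):
    """Cut the melody after the n notes with the highest weights."""
    ws = boundary_weights(notes)
    top = sorted(sorted(range(len(ws)), key=lambda i: -ws[i])[:n_boundaries])
    phrases, start = [], 0
    for i in top:
        phrases.append(notes[start:i + 1])
        start = i + 1
    if start < len(notes):
        phrases.append(notes[start:])
    return phrases
```

On a toy melody in which the third note is followed by a rest and carries a duration change, the highest weight falls on that note, so the melody is split into two phrases there.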
4 Normalization of Musical Phrases

In Section 2 it was highlighted that one of the problems in the indexing of musical documents is the presence of variants in musical phrases. The lexical units pointed out by the segmentation algorithm may not be equal, yet may be perceived as similar by the user, that is, they may carry the same content. In this work it is proposed to partially overcome this problem by introducing two kinds of normalization in the representation of musical phrases. Normalization regards both note durations and note intervals. First, a Pitch Transposition is performed: all the notes are transposed so that each musical phrase begins with the same note. Then a Duration Normalization can be performed: note durations are expressed as multiples of their Greatest Common Divisor, so that what becomes important is the relationship among durations. Since variants of musical phrases may involve small changes in the pitch contour, due for instance to modulations, a Pitch Normalization can also be performed: pitches are quantized into a number of different levels, related to the musical intervals (unison, from minor second to major third, from perfect fourth to major sixth, and so on). Finally, it is possible to consider only the pitches, when Duration Removal is performed.

The experimental study presented in this work was performed using four different combinations of these normalizations. In particular, the normalizations were combined so as to obtain an increasing generality of the musical phrases. The four combinations are: Pitch Transposition with durations unchanged (PT); Pitch Transposition and Duration Normalization (PTDN); Pitch Normalization and Duration Normalization (PNDN); Pitch Normalization and Duration Removal (PNDR).

5 Experimental Study

The experimental study we carried out aimed to describe the impact of the normalization technique on the results of musical document retrieval. We expect that the degrees of normalization applied at indexing time provide a means to tune the degree of exhaustivity and specificity of musical document retrieval. If this hypothesis is true, we can use those means to give the user, at query formulation time, the capability of deciding the quantity of extraneous musical material that is likely to be retrieved, and the quantity of pertinent musical material that is likely to be missed. The experiments have been conducted in a laboratory setting using a collection of musical documents and queries as a testbed. The experiments are expected to give us useful insights for designing and implementing a prototype musical IR system, which can then be employed in user-based experimentation. The set of test documents consists of 419 musical works, each stored in a MIDI file. Many works include textual descriptions about, for example, time, tonality, or title. While the full musical data has been used to index the documents, the textual data have not been used, as we were interested in studying musical content-based retrieval on the basis of musical data only. The documents are pieces of Tonal Western Music (TWM), and specifically complete musical works of the Baroque, Classical and Romantic periods, such as movements of Concertos or Sonatas of various lengths (from 3 up to 35 minutes) by six different composers (G.F. Handel, J.S. Bach, W.A. Mozart, L. van Beethoven, F. Liszt, P.I.
Tchaikovsky). We created 4 different versions of the same document set, corresponding to the 4 normalization criteria. In setting up the queries, we kept two objectives in mind. The first objective was to describe the statistical behaviour of the retrieval results, and therefore we needed a fairly large number of queries. The second objective was to understand the behaviour of the retrieval results from a musical point of view. We performed a quantitative analysis by computing numerical measures. The aim was to study how the quantity of retrieved musical documents changes as some independent factors change. We set up a query set to perform the quantitative analysis. The set consists of different versions of a set of 419 document incipits, which have been extracted automatically from the document set. One version has been created for each of six different query lengths, corresponding to six numbers of starting tokens - i.e. 419 queries have automatically been created using x starting phrases, with x = 1, 2, 3, 4, 5, 10. One version has also been created for each of the four normalization criteria. We were thus provided with (4 x 6) different versions of the set. The IR model employed to construct the indexes and retrieve the documents was the vector-space model (VSM). According to the VSM, both documents and queries are represented as k-variate vectors [2]. Table 1 reports the numerical parameters of the segmented and normalized queries. In the Table, the average number of unique phrases per query and the average number of queries indexed by each phrase are reported for each normalization level and for each incipit query length. Table 1 also summarizes the test results for the analysis performed using queries against the whole document set.
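For concreteness, the four normalizations of Section 4 can be sketched as follows. This is a hypothetical illustration: the phrase encoding as (MIDI pitch, duration in ticks) pairs, the reference note, and the exact quantization thresholds are our assumptions, not values published by the authors.

```python
# Hedged sketch of the four phrase normalizations of Section 4.
# Phrase encoding, reference pitch and quantization thresholds
# are illustrative assumptions.
from functools import reduce
from math import gcd

def pitch_transposition(phrase, ref=60):
    """Transpose so every phrase starts on the same note (here C4 = 60)."""
    shift = ref - phrase[0][0]
    return [(p + shift, d) for p, d in phrase]

def duration_normalization(phrase):
    """Express durations as multiples of their Greatest Common Divisor."""
    g = reduce(gcd, (d for _, d in phrase))
    return [(p, d // g) for p, d in phrase]

def quantize_interval(semitones):
    """Map an interval to a coarse level: 0 = unison,
    1 = minor 2nd..major 3rd, 2 = perfect 4th..major 6th, 3 = larger."""
    a = abs(semitones)
    if a == 0:
        level = 0
    elif a <= 4:
        level = 1
    elif a <= 9:
        level = 2
    else:
        level = 3
    return level if semitones >= 0 else -level

def pitch_normalization(phrase):
    """Replace pitches by quantized intervals from the previous note."""
    out = [(0, phrase[0][1])]
    for (p0, _), (p1, d1) in zip(phrase, phrase[1:]):
        out.append((quantize_interval(p1 - p0), d1))
    return out

def duration_removal(phrase):
    """Keep only the pitch dimension."""
    return [p for p, _ in phrase]
```

Applying Pitch Transposition alone corresponds to the PT criterion; composing the other functions yields PTDN, PNDN and PNDR, with each combination conflating more phrase variants than the previous one.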
For each normalization level, and for each incipit query length, two further values are reported: the average number of retrieved documents whose composer differs from the query's author, and the average number of retrieved documents (the last two rows of each group in Table 1). The reported values are directly correlated with the incipit query length, i.e. the longer the incipit query, the higher the number of retrieved documents and the number of different composers. This can be explained by the higher chance that a token of a long query occurs within a document than a token of a short one. It may also be explained by the fact that the number of documents indexed by an individual token is rather low, and that individual tokens index different sets of documents. A direct correlation can also be observed between the reported values and the level of normalization. Normalized tokens allow for the retrieval of a larger quantity of documents, possibly by different composers. Specifically, PNDR gives the highest increase in the number of retrieved documents and of different composers. Indeed, the normalizations have been designed to conflate similar phrases and periods together, in order to have larger sets of documents indexed by each token. The combination of two facts - the small decrease of the average number of unique tokens per query as normalization proceeds (Table 1), and the increase of the average number of documents indexed by an individual token - can explain the order of magnitude of the reported values. In particular, it is interesting to note that documents by around three different composers, i.e. around one tenth of the

                                     Incipit Query Length
                                  1     2     3     4     5    10
  PT    Av. phrases/query       1.0   2.0   3.0   3.9   4.8   9.2
        Av. queries/phrase      1.1   1.1   1.2   1.2   1.2   1.2
        N. docs different author 1.2   2.0   2.6   2.9   3.1   3.5
        N. retrieved docs       8.0  13.2  17.9  20.7  22.3  26.1
  PTDN  Av. phrases/query       1.0   2.0   3.0   3.9   4.8   9.1
        Av. queries/phrase      1.1   1.2   1.3   1.3   1.3   1.4
        N. docs different author 1.4   2.0   2.7   3.1   3.3   3.8
        N. retrieved docs       9.1  13.4  18.1  21.3  23.3  27.4
  PNDN  Av. phrases/query       1.0   2.0   2.9   3.8   4.6   8.6
        Av. queries/phrase      1.2   1.4   1.5   1.5   1.5   1.6
        N. docs different author 1.8   2.5   2.9   3.3   3.5   3.9
        N. retrieved docs      11.4  16.0  19.5  21.9  23.7  27.6
  PNDR  Av. phrases/query       1.0   1.9   2.8   3.6   4.4   7.8
        Av. queries/phrase      2.1   2.6   2.9   3.1   3.3   3.5
        N. docs different author 3.0   3.3   3.6   3.7   3.8   3.9
        N. retrieved docs      21.0  23.9  25.6  26.8  27.5  29.2

Table 1: Main results of the analysis developed on the test collection

total retrieved quantity, can be retrieved on average. It is also worth mentioning that the tested device - the level of normalization - allows for the retrieval of documents at different degrees of specificity and exhaustivity at every incipit query length.

6 Conclusions and future work

The results given by our approach are encouraging. If the query belongs to a musical work written by a composer included in the collection, then the top-ranked documents are usually written by the same composer. In particular, if the musical work itself is included in the collection, it is presented to the user with the highest score. As we lack the test collections usually available in textual IR, we had to construct an ad-hoc set of experimental documents and queries. We consider these laboratory-based experiments as the first step towards an experimental environment in which many "real" final users will be involved. The construction of a set of test relevance judgements, or the employment of a large number of human judges, were tasks too expensive and out of the scope of this research.
However, we think that a test collection for musical IR experimentation should be one of the next steps. We are going to design and implement a prototype system for musical IR based on our methodology. The prototype will be usable through a standard Web browser.

References

[1] C.J. van Rijsbergen. Information Retrieval. Butterworths, London, second edition, 1979.
[2] G. Salton and M.J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, New York, NY, 1983.
[3] W.B. Frakes and R. Baeza-Yates, editors. Information Retrieval: Data Structures and Algorithms. Prentice Hall, Englewood Cliffs, NJ, 1992.
[4] E. Cambouropoulos. Musical rhythm: a formal model for determining local boundaries. In M. Leman, editor, Music, Gestalt, and Computing, pages 277-293. Springer-Verlag, Berlin, 1997.