Melodic Modeling: A Comparison of Scale Degree and Interval

Yipeng Li, Department of Computer Science and Engineering, The Ohio State University
David Huron, School of Music, The Ohio State University

Abstract

Statistical models of Western melody were created, inspired by traditional music-theoretical conceptions. One model was based on pitch intervals; a second model was based on scale-degree successions. Both models distinguished between major and minor scale contexts, and both were used to calculate the aggregate probability for over 8,000 test melodies. Paired results showed that the scale-degree succession model produced lower aggregate probabilities than the interval model for the vast majority of test melodies. A de novo Markov model was also created using a scale-degree representation. This Markov model was shown to perform better than either of the two expert-inspired models.

Introduction

Musical modeling can serve a number of goals. For example, modeling might be used to help better understand the perception of music by providing testable hypotheses arising from theoretical presumptions. In addition, musical modeling can be useful in computational applications, such as automatic music classification and music information retrieval. In the psychology of music, melodic modeling has attracted attention as part of efforts to better understand musical expectation. A number of different views of expectation have been formulated over the past two decades, based on empirical research (Coons and Kraehenbuehl 1958; Goldstone 1979; Schellenberg 1996, 1997; Schellenberg et al. 2002; Schmuckler 1989, 1990, 1997; see Huron 2006 for a review). Some theories, such as that produced by Narmour (1990, 1992), are non-computational theories largely inspired by Gestalt principles of perception. Margulis (2005; Margulis and Levine 2004) has reformulated such principles in a more computational form.
In recent years, considerable evidence has been assembled in support of statistical learning for both pitch sequences (Saffran et al. 1999) and rhythm (Desain et al. 2003). These results have led to more data-driven models of expectation (Krumhansl et al. 1999; Eerola 2004; see Huron 2006 for a review).

In modeling any phenomenon, two general approaches might be distinguished. One approach is to solicit informed opinions from domain experts as to how they conceive of the phenomenon, and then to build models that reflect these professional conceptions. Since experts can sometimes hold partial or incomplete understandings of their domain, a second approach might instead use generic algorithmic tools that model the phenomenon with little regard for traditional knowledge (what might be called a "de novo" approach). In the case of melodic modeling, most research has applied de novo approaches that pay little attention to how experts view the phenomenon. In this paper we approach the problem of melodic modeling using the former approach; that is, we build models inspired by conceptions that broadly reflect two general ways musicians tend to conceive of melody. We then compare these two models and evaluate their effectiveness. In addition, in light of recent research successes related to statistical learning, we compare the results from these two models with those of a de novo Markov model. To anticipate our results, we will show that, while the musically informed models have some predictive success, a de novo approach employing a Markov model is actually superior.

A musical melody is commonly conceived as a sequence of pitches. For musicians, however, this is not the best way to conceive of a melody: a melody can be transposed to different pitch levels without losing its perceived identity. This suggests that a melody can be adequately described in relative, rather than absolute, terms.
For Western musicians, two systems are used to characterize relative pitch. One system regards melodies according to pitch distance, or interval. When a melody is transposed to a different pitch height, the intervals remain constant even though the pitches change. Apart from the interval conception, musicians also commonly think
of melody tones as tones in some scale, such as the Western major or minor scale. In this case, pitches can be represented as tonic sol-fa syllables (do, re, mi, fa, etc.). Musicians prefer the term "scale degree" for such scale-related representations. In Western music, scale degrees are denoted using the numbers 1 through 7. In the scale-degree succession conception, the parallel to the "melodic interval" is a pair of successive scale degrees. Where an interval representation might express a particular tone relationship as a "major sixth", the corresponding scale-degree succession representation might be "1-6". Another difference between interval and scale-degree representations is the tendency in the latter to collapse all tones into a single octave. For example, the scale-degree succession "4-2" might also represent the interval of a major sixth, since the second scale degree may occupy the octave above the initial tone. When musicians conceive of melodies in scale-degree terms, they often pay little attention to the octave in which the tones appear.

Apart from the distinction between interval and scale-degree succession, musicians also emphasize the difference between the major and minor modes. While there are a number of similarities between music in major and minor keys, musicians view the major and minor contexts as having a significant impact on the pattern of pitch successions. Accordingly, we might distinguish four different conceptual models: interval structures in the major mode, interval structures in the minor mode, scale-degree successions in the major mode, and scale-degree successions in the minor mode.

Approach

As noted, scale-degree succession and interval are the preeminent tonal representations used by Western musicians. In modeling musical melodies, one might ask whether one of these representations is superior to the other. There are a number of ways of comparing different models. A relatively straightforward approach is to compare the aggregate probability of melodic sequences as represented by the models. Using a sample of real melodies, we might use statistical models based on either scale-degree succession or interval representations to calculate the aggregate probability of a given melodic sequence. A model may be said to be more successful if it represents the same melody as being more predictable. This approach assumes that a more parsimonious representation of a phenomenon has a greater likelihood of capturing the actual underlying generative principles (an assumption that we acknowledge is open to debate).

In practical terms, we implemented the scale-degree succession model and the interval model as sets of probabilities based on an analysis of events in a large database of musical melodies. One set of statistical models of Western melody was based on interval-related frequencies of occurrence, and the second set was based on the frequencies of occurrence of various pairs of successive scale degrees (without regard to octave).

In concrete terms, we implemented the interval model as follows. Given a note sequence (N_1, N_2, ..., N_M), where N_i is the MIDI number of the i-th note and M is the number of notes in the melody, the interval model can be formulated as:

p(N_1, N_2, ..., N_M) = p(Δ_1, Δ_2, ..., Δ_{M-1}) = ∏_{i=1}^{M-1} p(Δ_i)    (1)

where Δ_i = N_{i+1} - N_i is the interval and p(N_1, N_2, ..., N_M) is the probability of the sequence. The intervals are assumed to be independent of each other; that is, only the first-order dynamics of the melody are considered. p(Δ_i) is approximated by the relative frequency of Δ_i over melodies of the same mode in the database described in the Musical Sample section. The scale-degree succession model can be formulated as:

p(N_1, N_2, ..., N_M) = ∏_{i=1}^{M-1} p(N_{i+1}, N_i)    (2)

During the evaluation, N_i is mapped to its scale degree based on the key of the musical piece.
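In code, the two formulations might be sketched as follows. This is a minimal illustration only: the function and variable names are ours, the probabilities are relative frequencies drawn from hypothetical count tables, the logarithms are natural logs (matching the aggregate log probabilities reported later), and unseen events, which would require smoothing in practice, are ignored.

```python
from collections import Counter
from math import log

def interval_log_prob(melody, interval_counts):
    """Equation (1): sum of ln p(delta_i) over successive note pairs,
    where delta_i = N_{i+1} - N_i in semitones (sign encodes direction)."""
    total = sum(interval_counts.values())
    logp = 0.0
    for cur, nxt in zip(melody, melody[1:]):
        logp += log(interval_counts[nxt - cur] / total)
    return logp

def succession_log_prob(degrees, pair_counts):
    """Equation (2): sum of ln p(d_i, d_{i+1}) over successive
    scale-degree pairs (octave information already collapsed)."""
    total = sum(pair_counts.values())
    logp = 0.0
    for pair in zip(degrees, degrees[1:]):
        logp += log(pair_counts[pair] / total)
    return logp
```

A melody is scored by the interval model directly from its MIDI numbers, while the succession model scores the corresponding sequence of scale degrees.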
Therefore p(N_{i+1}, N_i) captures the musician's conception of melody as scale-degree succession. Each pair of scale degrees is assumed to be independent of the other pairs. In this model p(N_{i+1}, N_i) is approximated by the relative frequency of the pair (N_{i+1}, N_i) over melodies with the same mode and tonic in the same database.

Since p(Δ_i) = Σ_{N_i} p(N_i + Δ_i, N_i), p(Δ_i) is essentially the marginal distribution of N_{i+1} given N_{i+1} - N_i = Δ_i. Because a marginal probability is always at least as great as any of the joint probabilities it sums over, i.e., p(Δ_i) ≥ p(N_{i+1}, N_i), one might expect the interval model always to outperform the scale-degree succession model. However, in the scale-degree succession model p(N_{i+1}, N_i) is estimated from melodies with the same tonic, while in the interval model p(Δ_i) is estimated from all melodies regardless of tonic. With the additional tonic information, the scale-degree succession model might be more informative and discriminative. Also, during the estimation of p(N_{i+1}, N_i) all notes are collapsed into a single octave; this mapping may or may not improve the predictive power of the scale-degree succession model, depending on the distribution of notes across octaves.

Musical Sample

Statistical measures were gathered using the Essen Folksong Collection (Schaffrath 1995). This collection consists
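The marginal-versus-joint relation can be verified with a toy joint distribution. The numbers below are invented purely for illustration; the point is that each joint entry is bounded above by the marginal of its interval.

```python
# Toy joint distribution p(N_{i+1}, N_i) over (next, current) MIDI pairs;
# the probabilities are invented for illustration only.
joint = {(62, 60): 0.3, (64, 62): 0.2, (60, 62): 0.5}

# p(delta) marginalizes the joint over every first note with that interval.
marginal = {}
for (nxt, cur), p in joint.items():
    marginal[nxt - cur] = marginal.get(nxt - cur, 0.0) + p

# p(delta) >= p(N_{i+1}, N_i) holds for every pair.
assert all(p <= marginal[nxt - cur] for (nxt, cur), p in joint.items())
```

Here both note pairs with interval +2 pool into a single marginal p(+2) = 0.5, so the interval model assigns each of them a probability no smaller than the joint model would.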

of over six thousand notations of traditional Germanic folksongs assembled from ethnomusicological sources spanning both eastern and western Europe. The encoded music is available in the Humdrum format (Huron 1995), which includes key designations for each melody. The melodies are encoded in a notation-like form analogous to a musical score; that is, the source music has not been interpreted or performed. The melodies were segregated according to mode: major-mode melodies (n = 5,416) and minor-mode melodies (n = 754). Separate statistics were calculated for major and minor.

For each melody, the musical interval between successive notes was calculated in semitones, with separate statistics tallied for ascending and descending intervals. Interval calculations were suspended for note pairs interrupted by a notated rest. In the case of scale degree, scale tones were determined with respect to either the major scale or the harmonic minor scale; accidentals (tones outside the scale) were also distinguished. We then calculated the frequency of each scale tone followed by each other scale tone. Octave information was collapsed into a single octave, so no explicit contour (up/down) information was retained. Each melody's encoding also identifies the tonic, and this tonic information was used to determine the scale-degree assignments. In total, the statistics for both scale-degree succession and interval were based on over 150,000 notes.

The test sample was drawn from an independent database containing over eight thousand Finnish folksongs (Eerola and Toiviainen 2004). As in the Essen Folksong Collection, this collection distinguishes melodies in major and minor modes. Some melodies in this collection have an indeterminate or unspecified key designation; these melodies were excluded from consideration.
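The scale-degree mapping described above might be sketched as follows. This is our own simplification, not the original encoding: it uses tonic-relative pitch-class arithmetic and an "X" label for accidentals.

```python
# Pitch-class offsets from the tonic for each scale degree.
MAJOR = {0: "1", 2: "2", 4: "3", 5: "4", 7: "5", 9: "6", 11: "7"}
HARMONIC_MINOR = {0: "1", 2: "2", 3: "3", 5: "4", 7: "5", 8: "6", 11: "7"}

def scale_degree(midi, tonic_pc, mode="major"):
    """Collapse a MIDI note to a scale degree (octave discarded).
    Tones outside the scale are labelled "X" (accidentals)."""
    table = MAJOR if mode == "major" else HARMONIC_MINOR
    return table.get((midi - tonic_pc) % 12, "X")
```

For example, with tonic pitch class 0 (C), G4 (MIDI 67) maps to degree 5 in both modes, while C#4 (MIDI 61) is flagged as an accidental.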
In total, the test sample contains 4,900 melodies in the major mode and 3,349 melodies in the minor mode. For each melody we calculated the aggregate probability using the appropriate major or minor interval model, and the appropriate major or minor scale-degree succession model.

Test Result

Table 1 illustrates sample outputs of the two models. The table shows the opening phrase from the folksong "Es taget minnecliche", a work in the key of C major. The pitches from the first phrase are shown in the top-most row, followed by the MIDI key-number equivalents. The third row shows the interval distances for successive pitches in semitones; ascending (+) and descending (-) intervals are distinguished. The fourth row shows the probabilities for each interval. The fifth row identifies the scale degrees according to the scale of C major, and the sixth row shows the corresponding probabilities for each scale degree. The seventh row shows the probabilities for the scale-degree dyads. For the first phrase of "Es taget minnecliche" the aggregate interval log probability is -18.1143, and the aggregate scale-degree succession log probability is -29.7107. The difference in probabilities suggests that an interval representation may be a better predictive model for this particular work.

Aggregate interval and scale-degree succession log probabilities were similarly calculated for each of the melodies in our test sample. Once again, different models were used depending on whether the melody was a priori classified as major or minor. For 98.4% of the major-mode melodies, the interval model proved superior to the scale-degree succession model; for 99.6% of the minor-mode melodies, the interval model again proved superior. These results show that the presence of tonic information in the scale-degree succession model is not sufficient to overcome the inherent advantage of the interval model.

As noted in the introduction, the initial motivation for this study was to examine the predictive efficacy of models derived from existing musical conceptions of melody. As an addendum, we decided to investigate the predictive efficacy of a de novo model based on the Markov models commonly used in other domains involving sequential prediction; Markov models and hidden Markov models have been widely applied in many computational tasks, such as pitch tracking (Wu et al. 2003). The representation used for this Markov model was based on scale-degree information. In effect, the Markov model adds the zeroth-order scale-degree probabilities to the first-order scale-degree succession probabilities. Using a first-order Markov model, a melody might be characterized as:

p(N_1, N_2, ..., N_M) = p(N_1) ∏_{i=1}^{M-1} p(N_{i+1} | N_i)    (3)

Note that N_i is mapped to its scale degree during the evaluation. The Markov model and the interval model characterize different aspects of a melody: the Markov model evaluates the probability of the second note of a pair conditioned on the first, while the interval model evaluates the marginal distribution of the second note given an interval. The conditional probability p(N_{i+1} | N_i) can be greater than, smaller than, or equal to the marginal distribution p(Δ_i), depending on how the second note of a pair is related to the first. Note also that the evaluation of the conditional probabilities requires the zeroth-order scale-degree information.

In order to evaluate this model, we again calculated the aggregate log probability for the 8,249 test melodies, this time comparing the Markov model against the earlier interval model. For 99.4% of the melodies using the major

mode, the scale-degree Markov model proved to be superior to the interval model. For 98.4% of the melodies using the minor mode, the scale-degree Markov model again proved to be superior. The addition of the zeroth-order scale-degree probabilities thus renders the model superior to the interval model.

Table 1: Sample outputs of the two models (first phrase of "Es taget minnecliche")

Pitch  MIDI  Interval  p(Δ)     Scale degree  p(N_i)   p(N_{i+1}, N_i)
G4     67    --        --       5             0.1815   --
G4     67    0         0.2005   5             0.1815   0.0409
C5     72    +5        0.0458   1             0.1058   0.0038
B4     71    -1        0.0870   7             0.1824   0.0456
A4     69    -2        0.2010   6             0.1470   0.0538
C5     72    +3        0.0519   1             0.1058   0.0087
D5     74    +2        0.1364   2             0.2065   0.0209
B4     71    -3        0.0595   7             0.1824   0.0333
A4     69    -2        0.2010   6             0.1470   0.0538

Discussion

In general, our study suggests that accuracy in melody prediction improves when the following types of information are available: (1) the tonic and mode (major/minor) of the predicted melody, (2) the zeroth-order probabilities of scale degrees, and (3) higher-order probabilities of scale-degree successions. Since we did not study a Markov model employing an interval representation, we do not know whether interval or scale degree is the superior representation for predicting melodies. In general, the relative success of Markov models is consistent with recent research pointing to the importance of statistical learning in music.

Acknowledgments

This research was supported in part by an AFOSR grant (F49620-04-1-0027) and an AFRL grant (FA8750-04-1-0093).

References

Coons, E., and D. Kraehenbuehl (1958). Information as a measure of structure in music. Journal of Music Theory 2, 127-161.
Desain, P., H. Honing, and M. Sadakata (2003). Predicting rhythm perception from rhythm production and score counts: The Bayesian approach. Paper presented at the Society for Music Perception and Cognition 2003 Conference.
Eerola, T. (2004). Data-driven influences on melodic expectancy: Continuations in North Sami yoiks rated by South African traditional healers. In S. D. Lipscomb, R. Ashley, R. O. Gjerdingen, and P.
Webster (Eds.), Proceedings of the 8th International Conference on Music Perception and Cognition, pp. 83-87. Evanston, Ill.: Causal Productions.
Eerola, T., and P. Toiviainen (2004). Suomen Kansan eSävelmät. Finnish Folk Song Database. Available:
Goldstone, J. A. (1979). A General Mathematical Theory of Expectation Models of Music. Ph.D. thesis, University of Southern California.
Huron, D. (1995). The Humdrum Toolkit: Reference Manual. Stanford, California: Center for Computer Assisted Research in the Humanities.
Huron, D. (2006). Sweet Anticipation: Music and the Psychology of Expectation. Cambridge, Massachusetts: MIT Press.
Krumhansl, C. L., J. Louhivuori, P. Toiviainen, T. Järvinen, and T. Eerola (1999). Melodic expectancy in Finnish folk hymns: Convergence of statistical, behavioral, and computational approaches. Music Perception 17(2), 151-195.
Margulis, E. H. (2005). A model of melodic expectation. Music Perception 21(4), 663-714.
Margulis, E. H., and W. H. Levine (2004). Melodic expectation: A priming study. In S. D. Lipscomb, R. Ashley, R. O. Gjerdingen, and P. Webster (Eds.), Proceedings of the 8th International Conference on Music Perception and Cognition, pp. 364-366. Evanston, Ill.: Causal Productions.
Narmour, E. (1990). The Analysis and Cognition of Basic Melodic Structures: The Implication-Realization Model. Chicago: University of Chicago Press.
Narmour, E. (1992). The Analysis and Cognition of Melodic Complexity: The Implication-Realization Model. Chicago: University of Chicago Press.
Saffran, J. R., E. K. Johnson, R. N. Aslin, and E. Newport (1999). Statistical learning of tone sequences by human infants and adults. Cognition 70, 27-52.
Schaffrath, H. (1995). The Essen Folksong Collection. D. Huron (Ed.). Stanford, California: Center for Computer Assisted Research in the Humanities.
Schellenberg, E. G. (1996). Expectancy in melody: Tests of the implication-realization model. Cognition 58, 75-125.
Schellenberg, E. G. (1997).
Simplifying the implication-realization model. Music Perception 14(3), 295-318.
Schellenberg, E. G., M. Adachi, K. T. Purdy, and M. C. McKinnon (2002). Expectancy in melody: Tests of children and adults. Journal of Experimental Psychology: General 131(4), 511-537.
Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. Music Perception 7(2), 109-150.
Schmuckler, M. A. (1990). The performance of global expectation. Psychomusicology 9, 122-147.
Schmuckler, M. A. (1997). Expectancy effects in memory for melodies. Canadian Journal of Experimental Psychology 51(4), 292-305.
Wu, M., D. L. Wang, and G. J. Brown (2003). A multipitch tracking algorithm for noisy speech. IEEE Transactions on Speech and Audio Processing 11, 229-241.