Modeling the Degree of Realized Expectation in Functional Tonal Music: A Study of Perceptual and Cognitive Modeling Using Neural Networks

Dan Gang*    Jonathan Berger†

* Computer Science Department, Hebrew University, Givat Ram, Jerusalem, Israel. ph: 972-2-658-5353, fax: 972-2-658-5439, email: dang@cs.huji.ac.il
† Center for Studies in Music Technology, Yale University, New Haven, CT 06520, USA. ph: 203-432-4164, fax: 203-432-7542, email: jberger@csmt.music.yale.edu

Abstract

We describe a model of music cognition based on fluctuations in the degree of realized expectation (DRE). We employ a neural network which receives representations of standard and anomalous chord progressions derived from opening periods of piano sonatas by Mozart and Haydn. To account for essential metric information we combine the sequential data with temporal information by integrating a sub-net with the sequential net. This sub-net imposes metric information on the different states of the sequential net. The system demonstrates how a rhetorically based norm provides a listener with a framework within which a normative DRE is established. The DRE is manipulated by composers to produce musical surprises. In analyzing the results of the network, we observe how much of the target is present in the distribution of the units' activation in the output layer. By considering this distribution, we interpret the implications expected by the output. Over time this forms a model of the dynamic process of expectations. Our model provides a quantitative measure of DRE, and a convincing and compelling model of the processes of recontextualization, refocussing and return to a high DRE that follows a musical surprise.
Gang & Berger, ICMC Proceedings 1996

1 Introduction

In this paper we describe our implementation of a model of music cognition based on fluctuations in the degree of realized expectation (DRE), described in [Ber92]. We employ a neural network which receives representations of standard and anomalous chord progressions derived from opening periods of piano sonatas by Mozart and Haydn.

Bharucha's pioneering work in neural-net-based expectancy modeling in music relies on comparing the target and the output. In [Bha91] the author states: "Learning musical signals is a task for which error signals are both available and necessary. They are available because each event is the target to be compared with the expectation generated prior to that event. They are necessary because the error signal plays an important role in the aesthetic or emotional response to music".

We expand on this insight by considering the difference between the target and the output (Bharucha's error signal) not just as the mathematical error that drives the learning process in the net, but as carrying a meaning of its own: a quantitative index of an emotional response to music. We propose that our model is (to borrow Bharucha's term) a human network which charts the dynamic changes in the listener's expectations. Furthermore, we believe that the model may account for how a listener corrects and recontextualizes expectations during the act of music audition. To this end we examine the durations, rates, and patterns of change after points of great disparity between the current element of a newly tested sequence and the net's prediction for the next element in the output layer.

Within our input data set of sequential harmonic progressions we classify three general situations:

- a normative progression in which a high DRE is maintained;
- a progression which contains an anomalous chord. Harmonic anomalies can be of two types: an event that rarely occurs in the tonal literature, or an event which may be a well-known syntactic feature but occurs in an unexpected context;
- a progression that places a normative chord within a normative sequence in an unexpected temporal position.

We distinguish between the latter two cases by naming the first a patent ambiguity and the second a latent ambiguity. We focus our attention on these two categories of surprise (patent and latent). By modeling them we try to create a mechanism with which to quantify the degree of surprise embodied in a particular event in the sequence. We then attempt to describe cognitive processes inherent in the model, specifically the processes of refocussing and rebuilding the context after a drop in DRE. We are thus interested not only in the general distribution of the DRE, but also in the local dynamic processes and the patterns which result from the prediction of the net in different states (or stages of surprise).

2 The network design

2.1 Representation and architecture of the network

In our first experiment we adopted a three-layer sequential network in which the state units represent the context of the current chord sequence, and the output layer represents the prediction of the net for the next chord. The output units represent the twelve pitch classes that make up the triads or tetrads in the sequence.
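The twelve-unit pitch-class representation can be illustrated with a minimal sketch. It is not the authors' code: the helper name encode_chord and the interval tables are hypothetical, and it assumes binary (0/1) unit values with pitch class 0 standing for the tonic, as Section 2.2 specifies.

```python
# Hypothetical interval tables (semitones above the chord root).
MAJOR_TRIAD = (0, 4, 7)
MINOR_TRIAD = (0, 3, 7)
DOMINANT_SEVENTH = (0, 4, 7, 10)  # a tetrad


def encode_chord(root_pc, intervals):
    """Return a 12-element 0/1 list; index i is pitch class i above the tonic."""
    units = [0] * 12
    for interval in intervals:
        units[(root_pc + interval) % 12] = 1
    return units


# Tonic triad (root on pitch class 0) and dominant seventh (root on pitch class 7).
tonic = encode_chord(0, MAJOR_TRIAD)
dominant7 = encode_chord(7, DOMINANT_SEVENTH)
```

Because pitch classes are counted from the tonic, the same vectors describe a progression in any key, which is presumably why a tonic-relative encoding suits training on sonatas in different keys.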
The output layer is fed back to the state units of the input layer. The current value of the state units is the sum of the value fed back from the output layer and a decayed value of the previous state.

In our second experiment we integrated a second sub-net with the chord sequence learning mechanism. The added sub-net provides a simple metrical organizer by supplying a constant periodic beat stream modulo the meter. The output layer of the metric sub-net represents the index of the beat within one measure. Each unit in this output layer is fully connected to the hidden layer of the sequential net, so that it influences the prediction of the next chord. Decomposing the task and giving meter its own sub-net will enable us to extend the functionality of this component to learn changes in the meter. Together, the two integrated sub-nets learn the sequence of chords while taking into account the metric index that is propagated from the metrical sub-net.

2.2 The set of learning examples

The learning examples consisted of the three classes of sequential harmonic progressions described above. Our initial data consisted of fifteen examples of harmonic progressions used as the training set. These examples were derived from opening periods of selected piano sonatas by Mozart and Haydn. All the progressions were quantized at quarter-note resolution. All the examples were in duple meter. Most of the sequences consisted of simple balanced periods with harmonic rhythms at the beat level and with changes in accordance with metric constraints. Chords were expressed as vectors of pitch classes where PC 0 was the tonic scale degree of the progression. Although we hope to refine our method of representation to include voice leading, our initial experiments simply represented root-position functional triadic events.

3 Training and running the net

3.1 The learning phase

For the learning phase, the net was given the fifteen learning examples in the set and learned to reproduce them. We present each chord in the sequence by representing each chord tone as one of twelve pitch classes in the state and output layers. The meter, in the second experiment, was represented by four units, one for each beat of a measure in common meter. We tested the performance of the network with several different learning parameters. Using 30 hidden units the net was able to learn the task quickly, reproducing the chord sequences with few errors within 100 epochs.

3.2 The generalization phase

In the generalization phase the network was given new sequences. Each element in the target sequence was compared against the current prediction of the net in the output layer. In analyzing the output, we consider the distribution of the units' activation in the output layer, noting how much of the target is present as well as the activations of other PCs. We interpret these amounts as a quantitative measurement of the DRE. Over time this forms a model of the dynamic process of expectancy fulfillment and disruption.

The new sequences for the generalization process were harmonic progressions that we deemed to contain disruptions in expectancy fulfillment. Some of these were created by us as controls. These included the introduction of a harmonic event unrelated to the tonal context (a B major chord following a C major chord, see Figure 1), and a radical departure from the duple harmonic rhythm of one or two chords per measure. The others were derived from Mozart and Haydn sonatas that contained harmonic anomalies of various sorts. A detailed discussion and examples of the results are given in [BG96].

[Figure 1: Simulation of expectancy fulfillment and disruptions. Time proceeds from the bottom up. The right column represents the input and the left column visualizes the net's prediction. Each of the black squares in each rectangular unit represents the PC of a chord tone. In the left column, the size of the squares is proportional to the strength of the units' activity (i.e. the expectations).]

4 Evaluation of Results

The hierarchical network architecture creates a compelling model of cognition (see also [GL95]). The metric index encoding differentiates between identical elements that are sounded in different metric positions. Thus, for example, a tonic triad that appears on a downbeat generates entirely different expectations than one that falls on an upbeat. Furthermore, the network appears to respond in meaningful ways to the functional role of specific harmonies. This is most clearly evident in the results of submitting a deceptive cadence as target input to the architecture. Here the culminating submediant is perceived as (or, more correctly, it generates the corresponding expectations of) a substitute tonic rather than as a subdominant. Since the network is sensitive not only to what changes should occur and how, but also to when they should occur [LK94], the architecture models the perception of both patent and latent ambiguities. Our model provides a quantitative measure of DRE, and a convincing and compelling model of the processes of recontextualization, refocussing and return to high DRE that follow a musical surprise.

5 Conclusions and discussion

The system demonstrates how a rhetorically based norm provides a listener with a framework within which a normative DRE is established. The DRE is manipulated by composers to produce musical surprises. Clearly, harmonic progression and harmonic rhythm are but two aspects of the music we consider. Melodic, rhythmic, dynamic, phrase and articulatory attributes contribute to contexts. These other attributes can themselves carry their own sets of implications. They can also support or detract from implications suggested by harmonic progression and/or the harmonic rhythm.
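To make the quantitative measure of DRE discussed in Sections 3.2 and 4 more concrete, the following is a minimal sketch rather than the authors' implementation. The paper gives no closed formula, so the function name realized_expectation and the choice to normalize by the total output activation are illustrative assumptions; the activation values are invented for the example. Vectors hold twelve pitch-class units with PC 0 as the tonic.

```python
def realized_expectation(prediction, target):
    """Share of the output activation that falls on the target's pitch classes.

    prediction: 12 non-negative activations from the output layer.
    target:     12 values, nonzero where the actual chord has a tone.
    Assumed measure (not from the paper): on-target activation / total activation.
    """
    total = sum(prediction)
    if total == 0:
        return 0.0  # the net expects nothing, so nothing can be realized
    on_target = sum(a for a, t in zip(prediction, target) if t)
    return on_target / total


# Invented activations: the net strongly expects a C major (tonic) triad.
prediction = [0.9, 0, 0, 0, 0.8, 0, 0, 0.9, 0, 0, 0, 0.1]
c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]  # PCs 0, 4, 7
b_major = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1]  # PCs 11, 3, 6 relative to C

dre_expected = realized_expectation(prediction, c_major)
dre_surprise = realized_expectation(prediction, b_major)
```

Under these assumptions the expected tonic triad yields a value near 1, while the B major chord used as a control in Section 3.2 yields a value near 0. Tracking this value event by event gives one possible picture of the drop in DRE at a surprise and the subsequent return to a high DRE.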
In a companion paper [BG96] we analyze and compare the results of our experiments with recent work on expectancy and cognitive modeling. Studies of musical expectancy reveal processes of pattern detection and comparison. The way a listener organizes and recalls music can be revealed by understanding the way one predicts the continuation or completion of a musical progression. Understanding the temporal nature of these dynamic processes raises numerous theoretical questions for future research.

References

[Ber92] J. Berger. A model of ambiguity in music. Journal of Computers in Music Research, 2.2, 1992.
[BG96] J. Berger and D. Gang. Modeling musical expectations: A neural network model of dynamic changes of expectation in the audition of functional tonal music. In Procs. of the Fourth International Conference on Music Perception and Cognition, Montreal, 1996.
[Bha91] J. J. Bharucha. Pitch, harmony and neural nets: A psychological perspective. In P. M. Todd and D. G. Loy, editors, Music and Connectionism. MIT, 1991.
[GL95] D. Gang and D. Lehmann. An artificial neural net for harmonizing melodies. In Procs. of the International Computer Music Conference, Banff, Canada, 1995.
[LK94] E. W. Large and J. F. Kolen. Resonance and the perception of musical meter. Connection Science, 6(2-3):177-208, 1994.