Page 206

The Musical Expectations of Self-Organizing Neural Networks

Michael Page, Dept. of Physics, University of Wales College of Cardiff, CARDIFF CF1 3TH, WALES, U.K.

Abstract: Much has been written about the role of expectation in music. Schmuckler (1989) points out that "almost all contemporary music theoretic analyses have adopted implicit or explicit ideas of expectation". Bharucha (1987) seeks to incorporate these ideas into a general connectionist framework and shows that such an approach is well suited not only to the generation of musical expectations but also to providing an explanation of how such expectations are related to exposure to a particular musical environment. Grossberg et al. have, over many years, developed self-organizing neural networks, with particular emphasis on biological plausibility. This paper shows in detail how such networks can be used to generate melodic musical expectations. Simple neuron-like units organized into a hierarchy of competitive cooperative fields linked by adaptive filters (networks of weighted connections) can learn to recognize, in a stable manner, statistically significant features in a stream of input patterns. The paramount requirement for stability of this process in arbitrary input environments led to the introduction of "top-down expectation patterns" generated by the addition of an adaptive filter from an upper level to a lower level. These expectation patterns can lead to Gestalt-like pattern completion effects and, in the case when the upper level is a so-called masking field, to "predictions" of the way in which a given sequence of events may be completed. Examples are presented showing how such a network, trained in an environment of simple Western tonal melodies, can learn to generate plausible melodic expectations.
Further work is suggested that incorporates harmonic and rhythmic expectations, leading to the development of a system for music generation that employs the knowledge embodied in the network connections rather than expert rules or tables of transition probabilities.

Introduction: The concept of musical expectation has engendered much discussion in the fields of both music theory and music psychology. Schmuckler (1989) points out that almost all contemporary music theories employ some concept of expectation. Theories such as those of Schenker (1935), Narmour (1977) and Lerdahl and Jackendoff (1983) have implicit ideas of expectation, whereas these ideas are made quite explicit, and even of central concern, in Meyer's (1956, 1973) theories of musical comprehension and meaning. Jones (1981, 1982) suggested a general framework for the psychology of musical expectation, introducing the concept of an "expectancy scheme" which "captures that private sense of anticipation experienced with an unfolding pattern". She suggests that "interesting" music results from the interplay of "Ideal Prototype Patterns" and "Ordinary Patterns" which deviate from the Ideal in "artful" ways. Dowling, Lung and Herrbold (1987) use the concept of "expectancy windows" to account for the "aiming of attention" at particular points in a melody, whilst Bharucha and Stoeckig (1986) study the expectancies generated by a chord within a "priming paradigm".

It is generally believed that an individual's musical expectations are most heavily influenced by their musical experience. Jones (1981) describes expectancy schemes being "refined through experience with a culture's musical conventions". Carlsen (1981) showed how differences in expectancy patterns were related to cultural milieu, and Kalmar and Balasko (1987) talk of the "musical mother tongue" of a set of nursery school children, their fluency being heavily dependent on their exposure to traditional musical material.

ICMC 206

Bharucha (1987) discusses the role of expectations in both Western music and Indian ragas and introduces a connectionist framework in which priming effects can be described as the spreading of activation over a network of neuron-like units. This approach incorporates concepts of musical memory, "bottom-up (acoustically driven) and top-down (cognitively driven) processes" and learning by passive exposure, without the need to resort to explicit grammatical rules. The network described, however, is somewhat ad hoc, and this paper will show how the study of musical expectation can be combined with mainstream connectionist (neural network) research.

Self-Organizing Neural Networks and Adaptive Resonance Theory: There are many types of neural networks in use today. I have chosen to use a class of networks developed over many years by Stephen Grossberg and his colleagues at the Boston University Center for Adaptive Systems. These networks, unlike other popular types of network such as the Back-Propagation network, have been developed with biological plausibility in mind, the emphasis being on issues of self-organization, real-time behavior and stable adaptation in unpredictable input environments. This has led to the introduction of Adaptive Resonance Theory (ART), which describes how networks can overcome the "stability/plasticity dilemma", i.e. the tension between the requirement to be able to learn in a changing environment and the necessity to prevent the erasure of established and potentially useful knowledge. It is a huge body of work and I shall only briefly outline those parts which bear directly on the aims of this paper. An excellent introduction to many of the issues introduced here can be found in Gjerdingen (1989).

An ART module consists of two layers or fields, F1 and F2.
Both fields consist of neurons arranged to interact in an "on-center off-surround" manner, i.e. each neuron has positive feedback to itself and negative feedback to other neurons in the same field, this feedback being instantiated by the flow of "activity" along weighted connections joining each neuron to itself and other neurons. The behavior of neurons connected in this way can be described by differential equations of the form:

$$\frac{dx}{dt} = -Ax + (B - x)E - (C + x)I$$

where x represents the activity of a neuron, which is bounded between the values B (> 0) and -C (< 0), A represents a decay term, and E and I are the total excitatory and inhibitory inputs to the neuron respectively. E consists of a combination of excitation from a lower field and the neuron's self-excitation, whereas I consists of the inhibition from neighboring neurons. By adjusting the nature of the interactions between neurons in a field, various behaviors leading to dynamic equilibrium can be simulated.

In the simplest ART pattern recognition module a pattern of activation is instantiated across the lower field's neurons, leading to excitation of neurons in the field above via bottom-up excitatory connections. The structure of the upper field leads to a competition for activation between the neurons in this field. The neuron receiving the largest amount of bottom-up excitation will win the competition and its activation will saturate at or near B. The activation of losing neurons will tend towards -C. The winning neuron is said to have classified or categorized the input pattern. This arrangement by itself is not sufficient to guarantee stable classification in an arbitrary input environment, so a further set of connections, top-down excitatory connections from F2 to F1, must be introduced. These encode prototype patterns corresponding to each "committed" upper level node.
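As a rough illustration of the shunting equation and the competition it supports, the following Python sketch integrates the equation with a simple Euler step for a field of three neurons. The quadratic feedback signal, the parameter values (A = 1, B = 1, C = 0.2, gain 10), the input values and the function names are all assumptions made for this example; none of them is taken from the simulations reported in this paper.

```python
def signal(x, gain=10.0):
    """Feedback signal sent by a neuron (ASSUMED faster-than-linear form)."""
    return gain * max(x, 0.0) ** 2


def run_competitive_field(inputs, A=1.0, B=1.0, C=0.2, dt=0.01, steps=2000):
    """Euler-integrate dx/dt = -A*x + (B - x)*E - (C + x)*I for every neuron."""
    x = [0.0] * len(inputs)
    for _ in range(steps):
        feedback = [signal(v) for v in x]
        total = sum(feedback)
        updated = []
        for i, v in enumerate(x):
            excite = inputs[i] + feedback[i]   # bottom-up input plus self-excitation
            inhibit = total - feedback[i]      # inhibition from the other neurons
            dx = -A * v + (B - v) * excite - (C + v) * inhibit
            updated.append(v + dt * dx)
        x = updated
    return x


# Hypothetical bottom-up excitation reaching three upper-field neurons.
activities = run_competitive_field([0.3, 0.9, 0.5])
winner = max(range(len(activities)), key=lambda i: activities[i])
print(winner, [round(a, 2) for a in activities])
```

With these assumed settings the neuron receiving the largest input settles close to B, while the other two are driven slightly below zero, in the direction of -C. Setting the derivative to zero gives the equilibrium x = (BE - CI)/(A + E + I), which makes the role of B and -C as bounds explicit.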
Thus, if a neuron in F2 wins the competition in response to a pattern across F1, its prototype pattern is compared with the input pattern and the classification is only accepted if there is a degree of match between the two which exceeds a parameter known as the vigilance. If the degree of match falls short of the vigilance then the activation of the winning neuron is reset to zero and held there until a new input pattern is presented. Meanwhile the competition is "re-run" until a satisfactory classification is found. This may use a previously uncommitted F2 neuron. When such a bottom-up/top-down match is achieved the circuit enters a resonance, during which the excitatory connections between active neurons can be adjusted (in this case strengthened), thus increasing the likelihood of the same F2 neuron classifying the same or a similar F1 pattern the next time it is presented. This process allows the categories, and the prototypical patterns corresponding to those categories, to be learnt. It should be noted that "top-down" feedback or priming was introduced to ensure that stability constraints were met, but that once established the feedback can influence the pattern of activation at F1. This can lead to Gestalt-like pattern completion effects and, in certain circumstances, "predictions" of the likely unfolding of a pattern in time.

There are various versions of the ART module: ART1 (Grossberg, 1987) allows the classification of binary input patterns; ART2 (Carpenter and Grossberg, 1987) classifies binary or analogue input patterns; and ART3 (Carpenter and Grossberg, 1990) models neurotransmitter dynamics and is well suited to the distributed classification of continuously varying analogue input patterns.

Dynamic Short Term Memory and Masking Fields: Grossberg (1982) has shown how F1 can be configured to behave as a dynamic short term memory (STM), which models STM in humans. Cohen and Grossberg (1987) describe a modification of F2 which they call a "masking field". In the masking field each F2 neuron is connected to a small subset of the F1 neurons, set J. The "size" of an F2 neuron is proportional to the size of set J, i.e. the number of F1 neurons to which it is connected, and larger neurons "dilute" the effect of their inputs to a greater degree. F2 neurons inhibit each other to a degree proportional to the overlap between their respective F1 subsets.
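The structure of a masking field can be caricatured in a few lines of Python. The sketch below is a one-pass scoring scheme, not Cohen and Grossberg's differential equations: it keeps the diluted inputs (weight 1/|J| per pathway) and the overlap-dependent inhibition described above, but the size-weighted output rule, the normalisation of the overlap, the inhibition strength `beta`, the note names and the function names are all assumptions introduced for illustration.

```python
def masking_scores(stm, nodes, beta=0.7):
    """Score each F2 node for one F1 pattern (a one-pass caricature).

    stm:   dict mapping an F1 item to its activity (0.0 to 1.0)
    nodes: dict mapping an F2 node name to the set J of F1 items it samples
    beta:  ASSUMED strength of the mutual inhibition
    """
    # Dilution: each of the |J| pathways into a node carries weight 1/|J|,
    # so a node receives its full input only when its whole subset is active.
    match = {name: sum(stm.get(item, 0.0) for item in subset) / len(subset)
             for name, subset in nodes.items()}
    # ASSUMPTION: output grows with node size and with the square of the match,
    # so a large, fully matched node can outweigh its smaller rivals.
    output = {name: len(nodes[name]) * match[name] ** 2 for name in nodes}
    scores = {}
    for name, subset in nodes.items():
        # Inhibition scales with the overlap between the two F1 subsets
        # (normalised here by the size of the receiving node's subset).
        inhibition = sum(len(subset & other) / len(subset) * output[rival]
                         for rival, other in nodes.items() if rival != name)
        scores[name] = output[name] - beta * inhibition
    return scores


def best(scores):
    """Return the name of the highest-scoring node."""
    return max(scores, key=scores.get)


# Hypothetical F2 nodes coding one-, two- and three-note groups (order is ignored).
NODES = {"C": {"C"}, "C-D": {"C", "D"}, "C-D-E": {"C", "D", "E"}}

for heard in (["C"], ["C", "D"], ["C", "D", "E"]):
    stm = {note: 1.0 for note in heard}
    print(heard, "->", best(masking_scores(stm, NODES)))
```

Under these assumptions the node whose subset exactly covers the active F1 items scores highest in each case, and once all three notes are present the two shorter nodes receive negative scores, which is the sense in which the largest node masks them.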
A masking field is able to "parse" a pattern of short term memory at F1 into familiar subgroups in a manner similar to the parsing of a sentence into its constituent words. In doing so, the activation of F2 neurons representing longer subgroups can "mask" the activation of neurons corresponding to shorter groups, e.g. the neuron corresponding to the word "myself" can mask neurons corresponding to the words "my", "self" and "elf" (Cohen and Grossberg, 1987), but the neurons responding to the words "my selfishness" will mask all of these neurons if the sentence develops in this way. Indeed, as the sentence develops towards this end, the masking field neurons corresponding to the longest utterance will be able to compete increasingly well, and the activation of the neurons representing the complete utterance "my selfishness" may precede the completion of the utterance. In this way, the top-down prototype pattern corresponding to this neuron is able to prime the STM field F1 to "expect" the pattern completion.

For illustration, I have described the way in which the STM/masking field module relates to modelling the perception of strings of phonemes, words, phrases, sentences etc.; indeed, this is the task for which the masking field was developed. It is not difficult, however, to see how a similar approach can be taken to model the phenomenon of, say, melodic expectation: there is a common theme of input streams, partitionable into familiar subgroups, creating expectation of pattern completion or resolution. The application of masking fields to issues of melodic expectation is the subject of my current work.

Current Work: I am currently running simulations consisting of a masking field which forms the top layer of an ART3 network. The input to the ART3 network is from a dynamic STM field as described in Grossberg (1982) which, in the preliminary simulations, is configured to have a "transient memory span" of 3 items.
The musical input consists of a set of twelve common nursery rhyme melodies, whose musical information has been simplified to make it suitable for input to the STM. The nature of these simulations and results from them will be discussed at the conference.

Further Work: The simulations described above highlight some shortcomings of the dynamic STM/masking field model in its basic form. These include issues relating to the representation of repeated notes, rhythm and accent in STM, the size of masking field required, the possibilities of inhibitory learning and the stability of learning in fast input environments. Some of these problems have been discussed in Marshall (1990) and Nigrin (1990) and I will be looking to incorporate their ideas in future simulations.

Conclusions: My initial work and the further ideas suggested in the above articles provide a solid framework for the discussion of models of musical expectation. This framework need not only apply to melodic expectation, but can be extended to describe the harmonic expectation experienced during the unfolding of a chord progression, and even to rhythmic/metric expectation. The fact that this approach is constructed on a basis of biological plausibility and self-organization adds to its appeal as an alternative to traditional AI approaches to music, such as those involving expert systems and systems based on probability generators. My ultimate goal is to use models of musical expectation to produce a system capable of spontaneous music generation.

References:

Bharucha, J.J. Music Cognition and Perceptual Facilitation: A Connectionist Framework. Music Perception, Fall 1987, Vol. 5, No. 1, 1-30.
Bharucha, J.J. and Stoeckig, K. Reaction Time and Musical Expectancy: Priming of Chords. J. of Experimental Psych., Human Perception and Performance, 1986, Vol. 12, No. 4, 403-410.
Carlsen, J.C. Some Factors which Influence Melodic Expectancy. Psychomusicology, Spring 1981, 12-29.
Carpenter, G.A. and Grossberg, S. ART 2: Self-Organization of Stable Category Recognition Codes for Analog Input Patterns. Applied Optics, Vol. 26, No. 23, 1987.
Carpenter, G.A. and Grossberg, S. ART3: Hierarchical Search Using Chemical Transmitters in Self-Organizing Pattern Recognition Architectures. Neural Networks, Vol. 3, No. 2, 129-152, 1990.
Cohen, M.A. and Grossberg, S. Masking Fields: A Massively Parallel Neural Architecture etc. Applied Optics, Vol. 26, 1866-1891, 1987.
Dowling, W.J., Lung, K.M. and Herrbold, S. Aiming Attention in Pitch and Time in the Perception of Interleaved Melodies. Perception and Psychophysics, 1987, 41(6), 642-656.
Gjerdingen, R.O. Using Connectionist Models to Explore Complex Musical Patterns. Computer Music Journal, Vol. 13, No. 3, Fall 1989, 67-75.
Grossberg, S. Studies of Mind and Brain. Dordrecht: D. Reidel, 1982.
Grossberg, S. Competitive Learning: From Interactive Activation to Adaptive Resonance. Cognitive Science, 11, 23-63, 1987.
Jones, M.R. Music as a Stimulus for Psychological Motion: Part I. Some Determinants of Expectancies. Psychomusicology, 1981, Vol. 1, No. 2, 34-51.
Jones, M.R. Music as a Stimulus for Psychological Motion: Part II. An Expectancy Model. Psychomusicology, 1982, Vol. 2, No. 1, 1-13.
Kalmar, M. and Balasko, G. Musical Mother Tongue and Creativity in Pre-school Children's Melody Improvisations. Bulletin of the Council for Research in Musical Education, Spring 1987, 77-86.
Lerdahl, F. and Jackendoff, R. A Generative Theory of Tonal Music. MIT Press, 1983.
Marshall, J.A. A Self-Organizing Scale Sensitive Neural Network. Proceedings of the IJCNN, San Diego, June 1990, Vol. III, 649-654.
Meyer, L.B. Emotion and Meaning in Music. University of Chicago Press, 1956.
Meyer, L.B. Explaining Music: Essays and Explorations. Univ. of California Press, 1973.
Narmour, E. Beyond Schenkerism: The Need for Alternatives in Musical Analysis. Univ. of Chicago Press, 1977.
Nigrin, A.L. The Stable Learning of Temporal Patterns with an Adaptive Resonance Circuit. Ph.D. Thesis, Duke University, 1990.
Schenker, H. Der Freie Satz, 1935, trans. Ernst Oster, New York: Longman, 1979.
Schmuckler, M.A. Expectation in Music: Investigation of Melodic and Harmonic Processes. Music Perception, Winter 1989, Vol. 7, No. 2, 109-150.