Neural Networks that Learn and Reproduce Various Styles of Harmonization

Johannes Feulner
Institut für Logik, Komplexität und Deduktionssysteme, Universität Karlsruhe, Germany
johannes@ira.uka.de

Abstract

An approach to extracting the common harmonic properties of a given set of music pieces is presented. The body of pieces is used to train a collection of feedforward neural networks. These networks can then be used to classify whether or not a piece of music belongs to that style with respect to its harmonization. Furthermore, they can be used to produce new harmonizations of melodic material in the style of the given set. The harmonic properties are extracted through an adaptive learning procedure incorporating the backpropagation learning algorithm.

1 Introduction

Though there are many books on harmony theory containing many rules of proper harmonization in certain styles of music, not one of these books gives a complete theory of any style. While some general guidelines like avoiding parallel fifths or using certain cadences at phrase endings are easily agreed upon, it is very difficult to give a detailed account of all the subtle situations that might possibly occur or might have to be avoided.

The adaptive properties of neural network models allow for the extraction of structures through a learning process. Rather than defining a set of rules [Ebcioglu, 1988], a corpus of concrete examples is presented to a set of networks that adapt themselves to reflect the relevant structures of this corpus. This brings the advantage that the corpus can easily be changed if one wants to deal with another style of music. There is no need to change the networks beyond rerunning the adaptation algorithm.

There is of course a price that has to be paid for this convenience. The information learnt by the networks is not explicitly available in human-understandable form. It is coded into the network structure.
However, rather than giving a description, the networks can reproduce what they have learnt by composing new pieces in the style they have adapted to, and they can evaluate how well a given piece adheres to a certain style. Whereas Todd [1991], Mozer [1991] and Freisleben [1992] tried this approach on melodies, we show that it is applicable to harmonies as well. We could thereby get rid of several restrictions that Bellgard and Tsang [1991] had to make when tackling this task in a similar spirit using Boltzmann machines.

2 Task Description

The task our model is to solve is that of learning the proper harmonization of any given melody in a style that is given as a set of concrete music pieces. There is no restriction on the pieces other than that they have a melody and that the accompanying harmonies can be extracted. Examples are folk songs annotated with guitar chords or four-part chorales as composed by J.S. Bach. The assumption is that the harmonization can be represented as a string of chords, with the possibility that one melody note may be accompanied by several consecutive chords.

The following examples are taken from HARMONET [Feulner et al., 1992]. They show how the harmonies of a chorale are extracted by first deleting passing and neighbouring notes and then reducing the chord skeleton to Riemann's standard notation (T for Tonic, D for Dominant, Tp for Parallel of the Tonic and so on). This leads to having one harmony per quarter beat (cf. Figure 1a).

ICMC Proceedings 1993

Figure 1: a (left): Extracting the harmonies of a chorale. b (right): Net topology with task-specific networks (harmonic function, inversion, characteristic dissonances).

Neural Networks to Learn Sequences of Harmonies

The task of learning the harmonies is split into three subproblems: learning the basic harmonic function (T, D, DP, and so on), learning the inversion (i.e. learning the proper bass note), and learning characteristic dissonances that may be added to the basic harmonic function (e.g. a seventh, sixth or fourth). For each task a separate network is used (cf. Figure 1b).

The networks always work on a local context, a window. They learn to predict the harmony at time $t$ given information about harmonies and the melody from time $t-x$ to time $t+x$ for some small value $x$. At any time $t$ the following information is extracted from its local context to form a training pattern for a network: The target to be learned is the harmony $H_t$ accompanying the melody quarter $s_t$. The input consists of the harmonic context to the left (the external feedback $H_{t-3}$, $H_{t-2}$, $H_{t-1}$) and the melodic context (quarters $s_{t-1}$, $s_t$, $s_{t+1}$). Furthermore, there is information about the position of $t$ relative to the beginning or end of a musical phrase ($phr_t$) and about whether $s_t$ is stressed or not ($str_t$).

[Diagram: the inputs $H_{t-3}$, $H_{t-2}$, $H_{t-1}$, $s_{t-1}$, $s_t$, $s_{t+1}$, $phr_t$ and $str_t$ feed the neural net, which outputs $H_t$.]
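The window extraction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `Pattern`, `at` and `make_patterns` are hypothetical, harmonies and melody quarters are represented as plain strings, the phrase position is an integer, and positions outside the piece are padded with `None`. None of these choices is specified in the text.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Pattern(NamedTuple):
    """One training pattern for time t (hypothetical container)."""
    left_harmonies: Tuple[Optional[str], ...]  # (H[t-3], H[t-2], H[t-1])
    melody: Tuple[Optional[str], ...]          # (s[t-1], s[t], s[t+1])
    phrase_pos: int                            # phr[t]
    stressed: bool                             # str[t]
    target: str                                # H[t], the harmony to be learned


def at(seq: Sequence[str], i: int) -> Optional[str]:
    """Return seq[i], or None outside the piece (None padding is an assumption)."""
    return seq[i] if 0 <= i < len(seq) else None


def make_patterns(
    harmonies: Sequence[str],
    melody: Sequence[str],
    phrase_pos: Sequence[int],
    stressed: Sequence[bool],
) -> List[Pattern]:
    """Cut one piece into windows, one pattern per melody quarter."""
    if not (len(harmonies) == len(melody) == len(phrase_pos) == len(stressed)):
        raise ValueError("all sequences must have one entry per quarter beat")
    patterns = []
    for t in range(len(melody)):
        patterns.append(
            Pattern(
                # harmonic context to the left of t
                left_harmonies=tuple(at(harmonies, t - k) for k in (3, 2, 1)),
                # melodic context around t
                melody=tuple(at(melody, t + k) for k in (-1, 0, 1)),
                phrase_pos=phrase_pos[t],
                stressed=stressed[t],
                target=harmonies[t],
            )
        )
    return patterns
```

Calling `make_patterns` on a piece with four quarter beats yields four patterns; the first has an all-`None` harmonic context because no harmony precedes it. How piece boundaries were actually handled is not stated in the text, so the padding here is only one possible choice.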

The $H_i$, the descriptions of the harmonies involved, are the most important parts of this context. Each $H_i$ consists of three parts: Most importantly, the harmonic function relates the key of the harmony to the key of the piece. The inversion indicates the bass note of the harmony. The characteristic dissonances are notes that are alien to the prevailing harmony, thus giving it additional tension or color. Each part is dealt with in a separate network. Thus there is a network for the harmonic function (N1), a network for the inversion (N4) and one network for characteristic dissonances (N5).

The coding of pitch is decisive for recognising musically relevant regularities in the training examples. This problem is discussed in Mozer [1991]. A different scheme is employed here, taking the harmonic necessities of homophonic music pieces into account. A note $s$ is represented as the set of harmonic functions that contain $s$, as shown below:

    Fct.  T   D   S   Tp  Sp  Dp  DD  DP  TP  d   Vtp SS
    C     1   0   1   1   0   0   0   0   0   0   0   0
    D     0   1   0   0   1   0   1   0   0   1   1   1
    E     ..

T, D, S, Tp etc. are standard musical abbreviations to denote harmonic functions.

The resulting representation is distributed with respect to pitch but local with respect to harmonic functions. This allows the network to anticipate future harmonic developments even though there cannot be a lookahead for harmonies yet uncomposed.

Besides the 12 input units for each of the pitches $s_{t-1}$, $s_t$, $s_{t+1}$, we need 12+5+3 = 20 input units for the three components of each of the 3 harmonies $H_{t-3}$, $H_{t-2}$, $H_{t-1}$, 9 units to code the phrase information $phr_t$ and 1 unit for the stress $str_t$. Thus the net has a total of 3*12 + 3*20 + 9 + 1 = 106 input units and 20 output units. One hidden layer with 70 units was specified. The nets were trained with RPROP, a highly efficient variant of the backpropagation algorithm [Braun and Riedmiller, 1992].

In a more advanced version (Figure 1b) three nets (N1, N2, N3) are used in parallel, each of which was trained with local contexts of a different size. The harmonic function for which the majority of these three nets votes is passed to two subsequent nets (N4, N5), which determine the chord inversion and characteristic dissonances of the harmony. Using local contexts of different sizes in parallel employs statistical information to solve the problem of choosing an appropriate size of the context.

Performance

The performance has been tested on various sets of music from different composers. Typically a set consisted of about 20 chorales that were cut into approximately 1000 windows. Each window was coded into a training pattern. Thanks to the RPROP algorithm the networks converged in less than 40 epochs. Training on one set takes less than 1 hour on a SPARC 10 computer. Harmonizing a chorale with a set of trained networks takes less than 40 seconds.

Figure 2 shows two harmonizations produced by Harmonet. The melodies were not part of the training set. An audience of music professionals judged the quality of these and other chorales produced to be on the level of an improvising church organist. The quality of results compares well to non-neural approaches (e.g. [Ebcioglu, 1988]). A more detailed description of technical details may be found in [Feulner et al., 1992].

It is possible to judge the adherence of a composition to a given set of music by simply comparing the network predictions with the actual harmonies used in the composition. Summing the differences and normalising by the number of harmonies gives a correlation value. The closer it is to zero instead of one, the better the composition conforms to this particular style.

Conclusions

The results presented show that neural networks are capable of dealing with musical real-world problems. An ill-defined domain such as music is ideally suited to the capability of neural networks to learn phenomena by adaptation where it is difficult or even impossible to give concise symbolic descriptions.
Using harmonic functions, an abstraction well known to musicians, as the basis for the harmonic representation of pitch makes it possible to let the network foresee the space of possible harmonic developments it has to decide upon.

References

[Bellgard and Tsang, 1991] Matthew I. Bellgard, Chi Ping Tsang. Harmonizing Music Using a Network of Boltzmann Machines. IEEE Computer, July 1991.

[Braun and Riedmiller, 1992] Heinrich Braun, Martin Riedmiller. RPROP: A Fast Adaptive Learning Algorithm. In International Symposium on Computer and Information Sciences VII 1992, Proceedings. E. Gelenbe, U. Halici, N. Yalabik (eds.), pp. 279-286, EHEI Press, Paris 1992.

[Ebcioglu, 1988] Kemal Ebcioglu. An Expert System for Harmonizing Four-part Chorales. Computer Music Journal, vol. 12(3): pp. 43-51.

[Feulner et al., 1992] Johannes Feulner, Hermann Hild, Wolfram Menzel. HARMONET: A Neural Net for Harmonizing Chorales in the Style of J.S. Bach. In Advances in Neural Information Processing 4 (NIPS 4), R.P. Lippmann, J.E. Moody, D.S. Touretzky (eds.), pp. 267-274, Kaufmann 1992.

[Freisleben, 1992] B. Freisleben. The Neural Composer: A Network for Musical Applications. Artificial Neural Networks, no. 2: pp. 1663-1666, Elsevier 1992.

[Mozer, 1991] Michael C. Mozer. Connectionist Music Composition Based on Melodic, Stylistic, and Psychophysical Constraints. In Music and Connectionism. P. Todd and G. Loy (eds.), pp. 195-211, MIT Press 1991.

[Todd, 1991] Peter M. Todd. A Connectionist Approach to Algorithmic Composition. In Music and Connectionism. P. Todd and G. Loy (eds.), pp. 173-194, MIT Press 1991.

Figure 2: Two chorales harmonized by Harmonet ("Christus, der ist mein Leben" and "Happy Birthday to You"). The chorales were not part of the training set.