Learning Musical Structure and Style by Recognition, Prediction and Evolution

Dominik Hörnel and Thomas Ragg
Institut für Logik, Komplexität und Deduktionssysteme, Universität Karlsruhe, D-76128 Karlsruhe, Germany
dominik@ira.uka.de

Abstract

We present an approach for modelling the recognition, prediction and evolution of musical structure and style based on unsupervised and supervised learning techniques. Unsupervised learning is proposed to classify musical structure, leading to a distributed representation of structural elements. Prediction in time may be learned by neural networks. A multi-scale neural network model is presented which uses the above representation scheme to pick up and reproduce global structure from music examples. To improve the extraction of style-dependent features, an evolutionary neural network approach is suitable. All of these elements are integrated into the system MELONET II, which is able to produce and harmonize simple folk-style melodies.

1 Prediction

Various neural network models have been developed to successfully predict musical sequences occurring locally in time, e.g. harmonic progressions or melodic variations. They are not able, however, to capture musical structure occurring over longer time periods. Consider the folk song depicted in figure 1. Although folk songs like this are built by very simple means, they already show many characteristics of melodies invented by composers [de la Motte, 1993; Hörnel, 1993]. Most significant is the hierarchical organization of musical material: structural elements like pitch, harmony, motif and phrase structure simultaneously occur and develop on different time scales. Top-down approaches will fail to model this kind of melody, however, if they do not take care of the linear progression of structure in time. To address this problem, some approaches connect neural networks operating on different time scales [Todd, 1991; Freisleben, 1992]; others [Mozer, 1991; Feulner et Hörnel, 1994] construct a reduced description of the sequence using hidden units that operate with different time constants in order to make global aspects more readily detectable. Although these approaches are able to pick up structure that cannot be learned by standard networks, the results still lack coherence. This leads to the conclusion that global structure must be represented in a more explicit way.

The multi-scale neural network model presented here is able to learn global structure from music examples. It comprises an assembly of mutually interacting neural networks operating on different time scales. The main idea of the approach is a combination of unsupervised and supervised learning techniques to perform the given task: unsupervised learning classifies and recognizes musical structure, while supervised learning is used for prediction in time. Figure 2 shows a simplified structure of the model. Classification leads to a distributed representation of superordinate structural elements, here motifs (motif recognition component). The arrangement of these elements is learned by a higher-scale network used for motif prediction. Motifs are then passed as "proposals" to another network working on a lower time scale. The task of this network is to implement the distributed representation into subordinate elements (e.g. intervals/notes) depending on the proposal and the time context (feedback) of the network (note prediction). The result of this implementation is returned to the higher-scale network to be considered at a later moment. Thus communication in the sense of proposal and response takes place between two independent networks, reflecting the hierarchical melody structure.
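This proposal/response cycle can be summarised in a short sketch. The sketch below is only illustrative and assumes hypothetical interfaces: supernet, subnet and recognise_motif stand for the trained motif-prediction network, the note-prediction network and the unsupervised motif-recognition component, and the eight-note measure length is an arbitrary choice, not the original implementation.

    # Illustrative sketch of the proposal/response cycle between the
    # motif-prediction network (supernet) and the note-prediction network
    # (subnet); recognise_motif is the unsupervised recognition component.
    def generate_melody(supernet, subnet, recognise_motif, seed_motifs, n_measures):
        motif_context = list(seed_motifs)      # distributed motif vectors so far
        melody = []                            # generated notes/intervals
        for _ in range(n_measures):
            proposal = supernet(motif_context)            # higher-scale "proposal"
            note_context = melody[-8:]                    # local time context (feedback)
            measure = []
            for _ in range(8):                            # e.g. eight notes per measure
                note = subnet(proposal, note_context)     # lower-scale implementation
                measure.append(note)
                note_context = (note_context + [note])[-8:]
            melody.extend(measure)
            # the realised measure is classified and returned as the "response"
            motif_context.append(recognise_motif(measure))
        return melody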
Figure 1: Structural elements (harmonic structure, motif structure, phrase structure) in the German folk song "Hänschen klein".

2 Recognition

In order to realize learning on different time scales as described above, we need an appropriate recognition component which is able to find a suitable classification of musical structure. Furthermore, we would like to have a distributed coding of the classified elements to represent similarity between them (as denoted by the symbols a, a' and B, B' in figure 1). We propose unsupervised learning to solve this problem. Possible learning algorithms are, for instance, agglomerative hierarchical clustering [Duda et Hart, 1973] or Kohonen's topological feature maps [Kohonen, 1990]. To use the first technique, an appropriate distance measure is needed. We have developed a system AMA (Automatic Motif Analysis) which determines the distance between small sequences of notes/intervals by functionally transforming one into the other. The functions used are standard music transformations like pitch transposition, inversion and reversion. The result of hierarchical clustering is a dendrogram that allows comparison of the classified elements on a distance scale. Figure 3 shows the result of motif classification (fixed motif length is one measure) using an interval representation.

Figure 2: Simplified structure of MELONET II (motif recognition, motif prediction and note prediction components connected by "proposal", "response" and feedback links).

Figure 3: Classification tree (dendrogram with distance scale) and corresponding distributed motif representation (n=3) for the melody "Hänschen klein".
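One way to realise such a transformation-based distance and the subsequent clustering is sketched below. This is an assumption for illustration rather than the published AMA algorithm: motifs are taken as equal-length interval sequences, the distance is the smallest element-wise mismatch under the named transformations, and SciPy's standard agglomerative clustering is used to build the dendrogram and cut it at a chosen distance level.

    # Illustrative transformation-based motif distance and clustering
    # (not the original AMA code); motifs are equal-length interval sequences.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    def transforms(motif):
        m = np.asarray(motif)
        yield m          # identity (transposition leaves intervals unchanged)
        yield -m         # inversion
        yield m[::-1]    # reversion (retrograde)
        yield -m[::-1]   # inverted reversion

    def motif_distance(m1, m2):
        # smallest element-wise mismatch between m2 and any transform of m1
        m2 = np.asarray(m2)
        return min(int(np.sum(t != m2)) for t in transforms(m1))

    def classify_motifs(motifs, level=2.5):
        n = len(motifs)
        d = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                d[i, j] = d[j, i] = motif_distance(motifs[i], motifs[j])
        tree = linkage(squareform(d), method='average')       # dendrogram
        return fcluster(tree, t=level, criterion='distance')  # classes at chosen level

Cutting the dendrogram at a level of about 2.5, as in figure 3, would then yield the motif classes used for the distributed representation.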

The distribution of motifs may be calculated as follows. Let d(m1, m2) be a (symmetric) distance measure computing the distance between two motifs m1 and m2. The average distance d(m, M) between a motif m and a set of motifs M is given by

    d(m, M) = Σ_{x ∈ M} d(m, x) / |M|,

where |M| is the number of motifs in M. Let K_1, ..., K_n be the n motif classes obtained from the classification dendrogram by selecting an appropriate level on the distance scale. Then the n-dimensional distribution vector v_i(m) is computed as follows:

    v_i(m) = δ_ij,                                                 if there is a class K_j with d(m, K_j) = 0;
    v_i(m) = (1 − d(m, K_i) / Σ_{j=1..n} d(m, K_j)) / (n − 1),     otherwise,

where δ_ij = 1 for i = j and 0 otherwise. This distribution formula determines how strongly the motif classes are activated by a given motif; the most active class is the one the motif belongs to, and the sum of all activations v_i(m) is 1. The motif distribution corresponding to the dendrogram is displayed in the upper left corner of figure 3. Three motif classes were distinguished (distance level ≈ 2.5).
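As a small illustration, the distribution vector can be computed directly from the class distances. The sketch assumes the normalisation reconstructed above; motif_distance stands for any symmetric motif distance, such as the one sketched in section 2.

    # Illustrative computation of the distributed motif representation v(m);
    # each class K_i is the set of motifs assigned to it by the clustering.
    def average_distance(m, K, motif_distance):
        return sum(motif_distance(m, x) for x in K) / len(K)

    def distribution_vector(m, classes, motif_distance):
        d = [average_distance(m, K, motif_distance) for K in classes]
        n = len(classes)
        if any(di == 0 for di in d):                     # exact member of some class
            return [1.0 if di == 0 else 0.0 for di in d]
        total = sum(d)
        return [(1.0 - di / total) / (n - 1) for di in d]   # activations sum to 1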
Figure 4 shows the development of the mean square error on the learning and test set during the learning process of the network used for note prediction. The learning/test set was obtained from 13/12 children's songs; classification was done using the learning set. The network with no motif but additional context information is not able to generalize well. The network with the distributed motif representation further reduces the error of the network with non-distributed motif information.

Figure 4: Learning and test error development for the network without motif information, with non-distributed motif information, and with distributed motif information.

3 Evolution

Neural networks have proved to be successful at learning musical tasks, e.g. finding harmonizations or inventing melodies. However, to design a neural network that successfully performs the learning task for several musical styles, further considerations must be made concerning the size and performance of the network. Fixed-size networks have considerable difficulties learning different musical styles depending on the corresponding learning/test sets. Large networks tend to learn by heart instead of extracting style-dependent features from the given examples; small networks will not be able to perform the learning task at all. The question arises how a neural network topology adhering to a given musical style can be found automatically.

We present an evolutionary neural network approach overcoming this difficulty. It combines the advantages of two powerful systems, HARMONET and ENZO, that have been developed at our institute. HARMONET [Feulner, 1993] comprises a collection of feedforward networks able to learn and reproduce harmonizations of given melodies, e.g. four-part chorale harmonizations in the style of J.S. Bach. ENZO [Braun et Weisbrod, 1993; Ragg et al., 1996] represents a hybrid approach for optimizing neural networks by evolution and learning, using genetic algorithms to add and remove weights and units from a given network. The optimization process may be controlled by several parameters such as the performance or the size of the network, but also by musical parameters.

The resulting system is able to evolve a given HARMONET network using the ENZO algorithm. We tested it on several learning/test sets representing various musical styles, e.g. baroque styles of harmonization from composers like Bach and Pachelbel. The experiments show that the evolved networks perform about 7% better on the learning and test set. The number of weights of the networks has been considerably reduced. Depending on the style, more or fewer units are removed from the input layer, reflecting the influence certain features do or do not have on the style of harmonization. Thus evolution leads to highly performant and considerably smaller networks. The table below shows the network topologies (input-hidden-output with two hidden layers) and performance (number of correctly classified harmonies) before/after evolution, starting with a feedforward topology 102-40-12.

    Style             Evolved topology   Original performance   New performance
    Bach major        98-38-12           78%                    86%
    Bach minor        91-37-12           82%                    88%
    Pachelbel major   85-37-12           80%                    88%

The style of harmonization provided by the evolved networks was judged by experts to be of high quality, coinciding well with the originals. Furthermore, small networks take much less time and memory to perform their tasks and can therefore easily be integrated into time-critical applications or consumer products.
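A toy sketch of the evolve-and-prune idea follows; it is not the actual ENZO implementation. A network's connectivity is encoded as a boolean weight mask, mutation mainly removes weights, and a fitness value combining test performance and network size drives selection. The train and evaluate callables are hypothetical stand-ins for retraining the masked network and measuring its rate of correctly classified harmonies.

    # Toy sketch of ENZO-style topology evolution (not the original system).
    import random

    def evolve(initial_mask, train, evaluate, generations=50, population=10,
               size_penalty=1e-4):
        # Fitness rewards test performance and mildly penalises remaining weights.
        def fitness(mask):
            net = train(mask)                  # retrain with this connectivity
            return evaluate(net) - size_penalty * sum(mask)

        # Mutation prunes existing weights more often than it re-adds removed ones.
        def mutate(mask):
            child = list(mask)
            for i, bit in enumerate(child):
                if bit and random.random() < 0.05:
                    child[i] = 0
                elif not bit and random.random() < 0.01:
                    child[i] = 1
            return child

        pool = [list(initial_mask)] + [mutate(initial_mask) for _ in range(population - 1)]
        for _ in range(generations):
            ranked = sorted(pool, key=fitness, reverse=True)
            survivors = ranked[: population // 2]
            pool = survivors + [mutate(random.choice(survivors)) for _ in survivors]
        return max(pool, key=fitness)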

4 Performance

The learning techniques presented until now have been integrated into the system MELONET II. It is an extension of MELONET [Feulner et Hörnel, 1994], a system that can harmonize and produce melodic variations that are bound to harmonic contexts. Given a short beginning of a melody, MELONET II invents a folk-song-style continuation and a harmonization in a style learned by the evolved HARMONET networks. The network used for motif prediction (supernet) makes a "proposal" based on the motif context. The note prediction network (subnet) makes its decisions based on this proposal and additional context information. The result is then classified by the motif recognition component and considered by the supernet when it makes its next prediction.

To test the generalization behavior of the network model, we trained it with about 20 folk songs and compared the results to melodies produced by a reduced version of the model consisting of the subnet only, which is equivalent to a simple standard network. The network model was able to perform the learning task, whereas the standard network could not learn some of the melodies. Furthermore, melodies produced by the network model show considerably better coherence, which demonstrates that unsupervised learning is able to find significant global structure that simplifies the learning task. Figure 5 shows a "folk song" composed by MELONET II.

5 Conclusion

We have presented learning approaches to model three important elements of music perception, i.e. recognition, prediction and evolution. A distributed representation of musical structure was integrated into a multi-scale neural network model in order to find global structure. Genetic algorithms were able to evolve musical style by improving the size and performance of the networks. Further research will have to consider structural elements of variable length to add more flexibility to the model.

References

[Braun et Weisbrod, 1993] H. Braun, J. Weisbrod. Evolving Feedforward Neural Networks. Proc. of the 1993 International Conference on Artificial Neural Nets and Genetic Algorithms.
[de la Motte, 1993] D. de la Motte. Melodie: Ein Lese- und Arbeitsbuch. dtv/Bärenreiter, 1993.
[Duda et Hart, 1973] R.O. Duda, P.E. Hart. Pattern Classification and Scene Analysis. J. Wiley and Sons, New York, 1973.
[Feulner, 1993] J. Feulner. Neural Networks that Learn and Reproduce Various Styles of Harmonization. Proc. of the 1993 International Computer Music Conference, ICMA, Tokyo, 1993.
[Feulner et Hörnel, 1994] J. Feulner, D. Hörnel. MELONET: Neural Networks that Learn Harmony-Based Melodic Variations. Proc. of the 1994 International Computer Music Conference, ICMA, Aarhus, 1994.
[Freisleben, 1992] B. Freisleben. The Neural Composer: A Network for Musical Applications. Artificial Neural Networks, no. 2, pp. 1663-1666, Elsevier, 1992.
[Hörnel, 1993] D. Hörnel. SYSTHEMA - Analysis and Automatic Synthesis of Classical Themes. Proc. of the 1993 International Computer Music Conference, ICMA, Tokyo, 1993.
[Kohonen, 1990] T. Kohonen. The Self-Organizing Map. Proceedings of the IEEE, vol. 78, no. 9, pp. 1464-1480, 1990.
[Mozer, 1991] M.C. Mozer. Connectionist Music Composition Based on Melodic, Stylistic, and Psychophysical Constraints. In Music and Connectionism, P. Todd and G. Loy (eds.), pp. 195-211, MIT Press, 1991.
[Ragg et al., 1996] T. Ragg, H. Braun, H. Landsberg. A Comparative Study of Optimization Techniques. Submitted to ICML 1996.
[Todd, 1991] P.M. Todd. A Connectionist Approach to Algorithmic Composition. In Music and Connectionism, P. Todd and G. Loy (eds.), pp. 173-194, MIT Press, 1991.
Figure 5: Folk-song-style melody composed by MELONET II and harmonized by HARMONET (Bach style); the first two measures of the melody were given and did not belong to the learning set.