Interactive music composer based on neural networks

Masako Nishijima and Kazuyuki Watanabe
Human Interface Laboratory, Fujitsu Laboratories Ltd.
1015 Kamikodanaka Nakahara-ku Kawasaki 211, JAPAN

ABSTRACT

Neuro-Musician is an interactive music composer. It uses neural networks to learn a musical style so that it can play an accompaniment or compose its own music. After being taught several pairs of input melodies and expected output melodies, Neuro-Musician can play a session with a human musician. Even if the musician plays something unfamiliar, Neuro-Musician responds with a novel and reasonable output. It may even play a completely unexpected phrase that is both novel and exciting. We experimented by having Neuro-Musician play a jam session with a jazz pianist. To compose improvisations, the neural networks had to learn approximately 30 eight-measure patterns.

1. INTRODUCTION

A computer with musical sense can help a human perform musical activities. Given a phrase, for example, such a computer composes appropriate examples of phrases that follow or substitute for it. These examples help a human compose or play. Neuro-Musician is a computer with musical sense, which it acquires by learning a musical style using neural networks. Musical styles are difficult to describe with rules. One of the best ways for a computer to acquire a musical style is to give it actual music. We attempted to teach instances of music to neural networks.

2. NEURAL NETWORK

A neural network can be considered as a simple model of the human brain. The network is taught some pairs of input data and corresponding desired output data. This teaching fixes the strengths of the connections between nodes in the network, which determine its behavior. The network can then calculate an output for any input. For example, when the network is given an input it knows, it generates the output it was taught.
As the neural network is taught pairs of inputs and desired outputs, it acquires generalized relationships between them. When given an input it does not know, it generates an appropriate output using these generalized relationships. The neural network we used is a three-layer hierarchical network that consists of an input layer, a hidden layer, and an output layer. Each layer has several neuron units (Figure 1). The neural network learns generalized relationships by the error backpropagation method [Rumelhart, et al. 1986].

Figure 1: Neural Network
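The learning procedure described above can be sketched in a few lines of code. This is a minimal illustration of a three-layer network trained by error backpropagation, not the networks used in Neuro-Musician; the layer sizes, learning rate, and training patterns are invented for the example.

```python
import numpy as np

# Minimal three-layer (input-hidden-output) network trained by error
# backpropagation [Rumelhart et al. 1986]. All sizes are illustrative.
rng = np.random.default_rng(0)

n_in, n_hid, n_out = 4, 6, 4
W1 = rng.normal(0, 0.5, (n_in, n_hid))   # input -> hidden weights
W2 = rng.normal(0, 0.5, (n_hid, n_out))  # hidden -> output weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    h = sigmoid(x @ W1)   # hidden layer activations
    y = sigmoid(h @ W2)   # output layer activations
    return h, y

def train(pairs, epochs=2000, lr=1.0):
    """Teach the network pairs of (input, desired output) patterns."""
    global W1, W2
    for _ in range(epochs):
        for x, t in pairs:
            h, y = forward(x)
            # Backpropagate the output error through both layers.
            d_out = (y - t) * y * (1 - y)
            d_hid = (d_out @ W2.T) * h * (1 - h)
            W2 -= lr * np.outer(h, d_out)
            W1 -= lr * np.outer(x, d_hid)

# Two taught patterns; after training, a known input reproduces its
# taught output, and a novel input yields an interpolated response.
pairs = [(np.array([1, 0, 0, 0.]), np.array([0, 1, 0, 0.])),
         (np.array([0, 0, 1, 0.]), np.array([0, 0, 0, 1.]))]
train(pairs)
_, y = forward(pairs[0][0])
print(np.round(y, 2))
```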

3. HISTORY

The "Neuro-Drummer" was our first research project and ran from 1988 to 1989 [Nishijima 1989, Rheingold 1991, Nishijima 1991]. We attempted to teach a sense of rhythm to a neural network, because rhythm is one aspect of musical style. After the neural network had learned about forty pairs of an input rhythm pattern and an output rhythm pattern, the professional drummer conceded that the Neuro-Drummer had improved greatly, and that it usually replied with interesting rhythms.

The "Neuro-Musician" was our next research project [SM 1992]. During this project, we taught a sense of melody to neural networks. Learning this sense was more difficult, because it involves many factors, including pitch, duration, harmony, and rhythm, and because these factors influence each other. The Neuro-Musician takes the place of one musician in an adlib session in which two players take turns playing. The first player plays a piece of music for several measures, and then the second player, the Neuro-Musician, replies to it. Each player needs to accept and adapt to the other player's musical style. Neuro-Musician's replies cannot be random; they must make musical sense and be artistically satisfying.

4. MUSIC RECOGNITION MODEL

We investigated how a professional musician approaches an adlib session, and what factors are important in playing a phrase eight or sixteen measures long. The musician we interviewed listed the following four major factors:

* Contour (outline) of the melody.
* Pitch change and rhythm (these are used to compose a melody that satisfies the contour).
* Note-on timing.
* Chord progression and available note scales.

When the musician plays an adlib session, he considers all these factors together. For each piece of music, the chord progression and available note scales can be determined. Relationships between the first three factors (contour, pitch and rhythm, and note-on timing) are determined when a musician plays phrases.
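As a concrete, purely hypothetical sketch, the four factors above might be carried as the following data structures. All names and values here are our own invention, chosen to match how the factors are sampled in this paper (chord-degree contour, sixteenth-note rhythm grid with velocity, timing offsets in MIDI clocks).

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical containers for the four factors (not the paper's code).

@dataclass
class MelodyFactors:
    # Contour: first, central, and final note of each measure, expressed
    # as chord degrees such as "root", "3rd", "5th", "7th" (see Figure 2).
    contour: List[Tuple[str, str, str]]
    # Pitch change: semitone intervals between consecutive notes.
    pitch_changes: List[int]
    # Rhythm on a sixteenth-note grid, with a velocity per onset.
    rhythm: List[Tuple[int, int]]          # (sixteenth slot, velocity)
    # Note-on timing: deviation of each onset from the score position,
    # in MIDI timing-clock ticks.
    timing_offsets: List[int] = field(default_factory=list)

@dataclass
class PieceContext:
    # Chord progression and available note scales are fixed per piece.
    chords: List[str]                      # e.g. ["Dm7", "G7"]
    scales: List[str]

phrase = MelodyFactors(
    contour=[("root", "5th", "5th"), ("3rd", "7th", "3rd")],
    pitch_changes=[7, 0, -2, 4, -4],
    rhythm=[(0, 96), (4, 80), (8, 90), (12, 70), (14, 85)],
    timing_offsets=[0, 2, -1, 3, 0],
)
print(len(phrase.contour))
```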
The musician does not determine these relationships logically, but unconsciously. A musician will naturally play example phrases when he explains these relationships.

Figure 2: Contour. (a) An original melody over Dm7 and G7; (b) its contour as chord degrees: root, 5th, 5th, 3rd, 7th, 3rd.

We made a music recognition model based on the factors above and on discussions with the musician. Our music recognition model is an algorithm that lets a computer process music examples the way a human musician would. We pay attention to three loose relationships: between the contour of a melody and the contour of the immediately following melody, between the contour of a melody and its pitch change and rhythm, and between the rhythm pattern and note-on timing. The three factors are sampled as follows:

* Contour is described by the first, central, and final notes in each measure using chord construction (Figure 2). In Figure 2, the first-measure contour is sampled as the root, 5th, and 5th of D minor 7th (Dm7) instead of D, A, and A#.

* Pitch means the difference between two consecutive notes. Rhythm is in units of a sixteenth note, with strength (velocity). The number of notes is also sampled from the melody; we call it the note density.

* Note-on timing is the sampled difference between punctual playing according to the musical score and the real playing, in units of the MIDI (Musical Instrument Digital Interface) timing clock.

5. IMPLEMENTATION OF MUSIC RECOGNITION MODEL

Three kinds of network were used to implement our model, and these networks cooperate to generate an output melody (Figure 3). The first network was taught pairs of an input melody contour and a desired output melody contour. This network consisted of 48 units in the input layer, 60 in the hidden layer, and 48 in the output layer. The second network was taught pairs of contour, note density, pitch, and rhythm (duration and accent). The second network consisted of four sub-networks. Each sub-network generated two-measure melody data and had 24 units in the input layer, 40 in the hidden layer, and 32 in the output layer. The third network was taught pairs of rhythm patterns and note-on timing. This network had 8 units in the input layer, 8 in the hidden layer, and 13 in the output layer.

Figure 3: Implementation of the Music Recognition Model

6. SYSTEM CONFIGURATION

This system ran on an FMR-70 32-bit personal computer and used MIDI to connect instruments to the computer. When a human musician plays an eight-measure melody, MIDI signals are generated and transmitted to the computer. The MIDI signals are converted to input data for the neural networks. The output from the neural networks is converted back into MIDI signals, and these signals are transmitted to the sound generator to produce music (Figure 4). Prerecorded bass and drum parts are synchronized with the system.

7. EXPERIMENT AND EVALUATION

We experimented by having Neuro-Musician play a jam session with a jazz pianist. To compose improvisations, the neural networks had to learn approximately 30 eight-measure patterns from the musician. The experiment was an eight-measure session, and the theme was "Satin Doll" by Duke Ellington. When the human plays eight measures of melody, the Neuro-Musician replies with eight measures. This experiment revealed an unexpected constraint on the music. We obtained a very exciting jam session between the jazz pianist and the Neuro-Musician, which shows that the output of the neural networks is effective in playing a jam session. The Neuro-Musician can play replies that the musician likes. However, it also produced unnatural replies.
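The three cooperating networks of Section 5 can be sketched as a pipeline from input contour to reply contour, then to pitch/rhythm data, then to note-on timing. The unit counts below are those quoted in Section 5; the weights here are random rather than trained, and the way data is sliced and padded between stages is our assumption, since the paper does not specify it.

```python
import numpy as np

# Sketch of the three cooperating networks from Section 5, using the
# quoted layer sizes (48-60-48; four 24-40-32 sub-networks; 8-8-13).
# Weights are random here; in the real system they were fixed by training.
rng = np.random.default_rng(1)

def make_net(n_in, n_hid, n_out):
    W1 = rng.normal(0, 0.3, (n_in, n_hid))
    W2 = rng.normal(0, 0.3, (n_hid, n_out))
    def run(x):
        s = lambda v: 1.0 / (1.0 + np.exp(-v))
        return s(s(x @ W1) @ W2)
    return run

contour_net = make_net(48, 60, 48)   # input contour -> reply contour
sub_nets = [make_net(24, 40, 32)     # contour etc. -> pitch/rhythm,
            for _ in range(4)]       # two measures per sub-network
timing_net = make_net(8, 8, 13)      # rhythm pattern -> note-on timing

def reply(input_contour):
    out_contour = contour_net(input_contour)
    melody = []
    for i, net in enumerate(sub_nets):
        chunk = out_contour[i * 12:(i + 1) * 12]
        # Zeros stand in for note density and other inputs (assumption).
        melody.append(net(np.concatenate([chunk, np.zeros(12)])))
    # Feed a rhythm-pattern slice to the timing network (assumption).
    timing = timing_net(melody[0][:8])
    return out_contour, melody, timing

c, m, t = reply(rng.random(48))
print(c.shape, len(m), m[0].shape, t.shape)
```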

Figure 4: System Configuration

We found several reasons why the computer did not generate humanized music in this trial. A computer can generate arbitrarily long or fast phrases, resulting in music that clearly did not come from a human player. This kind of music rarely pleases or excites human listeners. We must teach our neural networks some further human characteristics, such as limits on hand movement and breathing intervals. It is important, for example, to consider the relationship between the length of phrases in a performance and breathing intervals.

8. CONCLUSION

An interactive music composer was created by investigating the musical factors involved in an adlib jazz session with a human musician. Three kinds of neural network were able to generalize the relationships between musical factors by learning instances of music. In the future, we are going to refine our music recognition model by adding further human characteristics.

9. ACKNOWLEDGMENTS

We would like to thank Dr. Shuzo Morita, Koich Murakami, and Masanori Kakimoto for fruitful discussions during the development of the system. We are especially grateful to Takashi Miyazawa and Junichi Tamaru for their performances.

10. REFERENCES

[Rumelhart, et al. 1986] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning Internal Representations by Error Propagation," Parallel Distributed Processing, Vol. I, pp. 318-364, MIT Press, 1986.

[Nishijima 1989] M. Nishijima and Y. Kijima, "Learning on Sense of Rhythm with a Neural Network -THE NEURO-DRUMMER-," Proceedings of The First International Conference on Music Perception and Cognition, October 1989.

[Rheingold 1991] Howard Rheingold, "From Neuro-Drummers to First-Person Fantasies," VIRTUAL REALITY, pp. 294-299, SUMMIT BOOKS, 1991.

[Nishijima 1991] M. Nishijima and K. Murakami, "The Neuro-Drummer," Journal of the Society of Instrument and Control Engineers, Vol. 30, No. 4, pp. 344-347, April 1991 (in Japanese).

[SM 1992] "RETI NEURALI," STRUMENTI MUSICALI, pp. 54-59, Winter 1992 (in Italian).