# A Sound Synthesis by Recurrent Neural Network

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License. Please contact mpub-help@umich.edu to use this work in a way not covered by the license.


Ken'ichi Ohya

Nagano National College of Technology, Dept. of Electronics and Computer Science, 716 Tokuma, Nagano City, Nagano 381, JAPAN

E-mail: ohya@ei.nagano-nct.ac.jp

ABSTRACT: Some architectures of recurrent neural networks can be trained to learn spatiotemporal patterns [Pearlmutter, 1989] [Sato, 1990a] and chaotic dynamics [Sato, 1990b] [Sato, 1990c]. Adaptive nonlinear pair oscillators with local connections, APOLONN, is one such architecture and has been applied to speech synthesis [Sato, 1990c]. This paper describes an application of APOLONN to sound synthesis. An APOLONN is trained to learn the waveforms of an acoustic musical instrument, including fluctuations of amplitude and periodicity.

## 1 Introduction

Some architectures of recurrent neural networks composed of continuous-time, continuous-variable neuron models can be trained to learn spatiotemporal patterns [Pearlmutter, 1989] [Sato, 1990a] and chaotic dynamics [Sato, 1990b] [Sato, 1990c]. Adaptive nonlinear pair oscillators with local connections, APOLONN, is one such architecture and has been applied to speech synthesis [Sato, 1990c]. An APOLONN is trained to learn the waveforms of an acoustic musical instrument, including fluctuations of amplitude and periodicity.

This paper describes a sound synthesis system using an APOLONN. Since an APOLONN can also learn complex dynamics, it is not difficult to implement on this architecture a sound synthesis that uses the waveform of an acoustic instrument, including its natural fluctuations of amplitude and periodicity. A waveform of a piano tone, known as a mixture of attack noise, simple vibrations and their fluctuations, is used as the teacher signal. The fluctuations of the output are quite natural because of a feature of the architecture: it does not memorize the fluctuations of the original data, but learns the chaotic or nonlinear dynamics behind them.
## 2 Recurrent Neural Network and APOLONN

### 2.1 Recurrent Neural Network

As the model of an individual neuron, I use a continuous-time, continuous-variable neuron model. A fully connected pair of such neurons can generate very complex dynamic patterns depending on the values of the connection weights, and can be regarded as a kind of nonlinear oscillator. Even a single such pair can be used as a rhythm perception model [Ohya, 1994]. The dynamics of the output of this kind of neuron are given by

$$\tau_i \frac{du_i}{dt} = -u_i + f\Bigl(\sum_{j=1}^{n} w_{ij} u_j\Bigr) + I_i$$

where $u_i(t)$ is the output of the i-th unit at time $t$, $\tau_i$ a time delay constant, $f(x)$ a sigmoid function, $I_i$ an external input to the i-th unit, and $w_{ij}$ the connection weight from the j-th unit to the i-th unit.

ICMC Proceedings 1995
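The unit dynamics of Section 2.1 can be integrated numerically. The following is a minimal sketch, not the author's simulation code. It assumes the form $\tau_i \, du_i/dt = -u_i + f(\sum_j w_{ij} u_j) + I_i$, a forward-Euler step, and `tanh` as the sigmoid $f$. The function name `simulate_units`, the weights in `W_pair`, and the step size `dt` are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_units(W, tau, I, u0, dt=0.01, steps=5000):
    """Forward-Euler integration of tau_i du_i/dt = -u_i + f(sum_j w_ij u_j) + I_i.

    Assumption: f is tanh; the paper only says "a sigmoid function".
    W[i, j] is the connection weight from unit j to unit i.
    """
    u = np.array(u0, dtype=float)
    tau = np.asarray(tau, dtype=float)
    I = np.asarray(I, dtype=float)
    traj = np.empty((steps, u.size))
    for k in range(steps):
        du = (-u + np.tanh(W @ u) + I) / tau
        u = u + dt * du
        traj[k] = u  # state after the k-th update
    return traj

# One fully connected pair of units (illustrative weights, not from the paper).
W_pair = np.array([[1.5, -4.0],
                   [4.0,  1.5]])
traj = simulate_units(W_pair, tau=[1.0, 1.0], I=[0.0, 0.0], u0=[0.1, 0.0])
print(traj.shape)
```

Because `tanh` saturates, the state stays bounded. For these illustrative weights the origin is an unstable focus, so one would expect a sustained oscillation rather than a decay to rest, which matches the description of a pair as a nonlinear oscillator.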

### 2.2 APOLONN

Some architectures of recurrent neural networks can be trained to learn spatiotemporal patterns [Pearlmutter, 1989] [Sato, 1990a] and chaotic dynamics [Sato, 1990b] [Sato, 1990c]. Adaptive nonlinear pair oscillators with local connections, APOLONN, is one such architecture and has been applied to speech synthesis [Sato, 1990c]. An APOLONN consists of many pairs of oscillators (Fig. 1).

Figure 1: APOLONN

Each pair of oscillators is locally connected with its neighboring pairs, and all neurons are connected to one neuron, the output neuron. Each pair generates various kinds of complex patterns depending on its parameters, such as $\tau$ or its connection weights. Since each pair is locally connected to the neighboring pairs, the oscillations of one pair are not independent of those of another. The total system can therefore produce even more complex nonlinear patterns that are rich in frequencies.

## 3 Simulations and Results

### 3.1 Simulations

An APOLONN is trained to learn the waveforms of an acoustic musical instrument, including fluctuations of amplitude and periodicity. (See [Sato, 1990c] for details of the learning algorithm.) A waveform of a piano tone (A3, 440 Hz), known as a mixture of attack noise, simple vibrations and their fluctuations, is used as the teacher signal. The data were sampled in 16-bit integer format at a sampling rate of 44.1 kHz. The part used, shown in Fig. 2, is a relatively flat portion 46 periods after the attack. 5016 samples were used for the training. To look into the chaotic dynamics of the data, a 3-dimensional phase space trajectory is also shown (Fig. 4).

The learning process was carried out by software simulation on a Sun SPARCstation 10. 20 pairs of oscillators were used in the simulation. The $\tau_i$ of each pair was set slightly different from that of the neighboring pair: the ratio of $\tau_i$ between two neighboring pairs was 0.9.
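The setup described above (20 pairs, a ratio of 0.9 between the time constants of neighboring pairs, local connections, and one output neuron) can be sketched as follows. This is an illustration under stated assumptions, not the configuration used in the paper: the base value `tau0`, the open (non-ring) chain, full connections between all neurons of neighboring pairs, a shared time constant within each pair, and the index layout are all hypothetical, as are the names `pair_time_constants` and `local_connection_mask`.

```python
import numpy as np

N_PAIRS = 20      # number of oscillator pairs (from the paper)
TAU_RATIO = 0.9   # ratio of tau between neighboring pairs (from the paper)

def pair_time_constants(n_pairs=N_PAIRS, ratio=TAU_RATIO, tau0=1.0):
    """Geometric sequence of time constants, one per pair.

    Assumption: tau0 (the first pair's time constant) is arbitrary here.
    """
    return tau0 * ratio ** np.arange(n_pairs)

def local_connection_mask(n_pairs=N_PAIRS):
    """Boolean mask; mask[i, j] is True where a weight from unit j to unit i is allowed.

    Assumed layout: units 2p and 2p+1 form pair p; the last unit is the output neuron.
    Assumed topology: each pair connects to itself and to pairs p-1 and p+1 (open chain).
    """
    n = 2 * n_pairs + 1
    mask = np.zeros((n, n), dtype=bool)
    for p in range(n_pairs):
        for q in (p - 1, p, p + 1):
            if 0 <= q < n_pairs:
                mask[2 * p:2 * p + 2, 2 * q:2 * q + 2] = True
    mask[n - 1, :n - 1] = True  # every oscillator neuron feeds the output neuron
    return mask

taus = pair_time_constants()
mask = local_connection_mask()
print(taus[:3], mask.shape, int(mask.sum()))
```

A mask of this kind would typically be multiplied element-wise with the weight matrix so that only the local connections and the connections to the output neuron are trained.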

### 3.2 Results

After 990 iterations, the error signal became sufficiently small. The output of the recurrent neural network, shown in Fig. 3, indicates that the APOLONN can learn a complex temporal pattern. The 3-dimensional phase space trajectory, shown in Fig. 5, shows that the APOLONN has learned even the fluctuations of the original sound data. By changing some parameters, such as $\tau$ and the connection weights, new sound data is easily synthesized.

Figure 2: input data: A3, 5016 samples (25016-30031)

Figure 3: output (iter=990)

Figure 4: phase space trajectory of training data

Figure 5: phase space trajectory of output

## 4 Summary

I proposed a sound synthesis method using an APOLONN, which is one of the architectures of recurrent neural networks. Acoustic piano sound data was used for the learning process. After the learning process, the APOLONN was able to generate acoustic piano sound data with the fluctuations of the original data. Sound synthesis can easily be done by changing some parameters, such as $\tau$ and the connection weights. This synthesis is characterized by its natural treatment of the fluctuations of the sound.

## References

[Murakami, 1991] Murakami, Y., & Sato, M. (1991). "A recurrent network which learns chaotic dynamics". Proc. of ACNN'91, pp. 1-4.

[Ohya, 1994] Ohya, Ken'ichi. (1994). "A Rhythm Perception Model by Neural Rhythm Generators". Proceedings of the 1994 International Computer Music Conference, pp. 129-130.

[Pearlmutter, 1989] Pearlmutter, B. A. (1989). "Learning state space trajectories in the recurrent neural network". Neural Computation, 1 (2), pp. 263-269.

[Sato, 1990a] Sato, M. (1990). "A learning algorithm to teach spatiotemporal patterns to recurrent neural networks". Biological Cybernetics, 62, pp. 259-263.

[Sato, 1990b] Sato, M., Murakami, Y., & Joe, K. (1990). "Chaotic dynamics by recurrent neural networks". Proceedings of the International Conference on Fuzzy Logic and Neural Networks, pp. 601-604.

[Sato, 1990c] Sato, M., Joe, K., & Hirahata, T. (1990). "APOLONN brings us to the real world: Learning nonlinear dynamics and fluctuations in nature". Proceedings of the International Joint Conference on Neural Networks, San Diego, I, pp. 581-587.