An Object Oriented ARTMAP System for Classifying Pitch

Ian Taylor and Mike Greenhough
Department of Physics and Astronomy, University of Wales College of Cardiff
email: ijt@uk.ac.cf.cm

Abstract

Pitch-determining systems must tolerate wide variations in the spectrum of musical signals. Earlier approaches have involved algorithms based on certain pitch-perception theories. Here, an alternative system is described which uses an Adaptive Resonance Theory neural network called ARTMAP. Such a network can be trained to emulate subharmonic summation but, given a wider variety of training examples, it can learn to cope with more spectrally ambiguous musical signals. The ART networks have been programmed as a collection of objects which can be connected together easily and applied to numerous musical and other pattern-recognition tasks.

1 Introduction

Since musical-instrument sounds have a wide variety of spectra, a pitch-determining system must be able to cope with wide variations in the amplitudes of, and degrees of inharmonicity in, the frequency components. The success of many proposed methods is often qualified by the need to set parameters, in an ad hoc fashion, so that the results fit empirical data from psychoacoustic experiments, e.g. [Duifhuis, Willems and Sluyter, 1982] and [Terhardt, Stoll and Seewann, 1982]. Other, computer-based, systems have been developed ([Brown, 1992] and [Piszczalski and Galler, 1979]) which use methods similar to subharmonic summation [Hermes, 1988]. Although these can be effective, they may have difficulties with spectra which depart significantly from the harmonic ideal. The Adaptive Resonance Theory (ART) neural network topology presented here is capable of classifying pitches and chords from a distribution of 'semitone-bins' derived from a Fourier spectrum of harmonics. Such networks can fit psychoacoustic data themselves by associating input signals with the desired output states.

2 Outline of Pitch-Processing System

The acoustic input is initially sampled at just over 8 kHz for one eighth of a second. This is then Fourier transformed to produce an amplitude spectrum. The spectrum is mapped on to a distribution of semitone-bins, each characterised by a single intensity level representing the total activity within that bin. This is calculated in a similar fashion to that in [Sano and Jenkins 1989]. The semitone-bin distribution is then presented to the ARTMAP neural network.
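The following is a minimal sketch, in Python rather than the authors' Objective-C, of the kind of spectrum-to-semitone-bin mapping described above. The reference pitch (A4 = 440 Hz), the MIDI-numbered bin range and the simple amplitude summation are illustrative assumptions, not the exact scheme of [Sano and Jenkins 1989].

    # Sketch: collapse an FFT amplitude spectrum into semitone bins.
    # Parameters (sample rate, bin range, 440 Hz reference) are assumptions.
    import numpy as np

    def semitone_bins(signal, sample_rate=8192, low_midi=36, high_midi=96):
        """Return one intensity value per semitone (MIDI notes low_midi..high_midi)."""
        spectrum = np.abs(np.fft.rfft(signal))                  # amplitude spectrum
        freqs = np.fft.rfftfreq(len(signal), 1.0 / sample_rate)
        bins = np.zeros(high_midi - low_midi + 1)
        valid = freqs > 0                                       # skip the DC component
        midi = 69 + 12 * np.log2(freqs[valid] / 440.0)          # frequency -> MIDI number
        for note, amp in zip(np.rint(midi).astype(int), spectrum[valid]):
            if low_midi <= note <= high_midi:
                bins[note - low_midi] += amp                    # total activity in the bin
        return bins / (bins.max() or 1.0)                       # normalised for the network

The resulting vector is the semitone-bin distribution that is presented to the ARTMAP network.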
3 Adaptive Resonance Theory

Adaptive Resonance Theory (ART) was introduced by Grossberg [1976a and 1976b] and has led to a number of neural network models. These include ART1 [Carpenter and Grossberg 1987a], ART2 [Carpenter and Grossberg 1987b], ART3 [Carpenter and Grossberg 1990] and ARTMAP [Carpenter, Grossberg and Reynolds 1991]. ART1 self-organises recognition codes for binary input patterns; ART2 does the same for analogue input patterns. ART3 extends ART2 with a model of the chemical synapse that solves the memory-search problem of ART systems. ART1, ART2 and ART3 are unsupervised neural networks.

Any ART module consists of two fields, F1 and F2, connected by two sets of adaptive connections: bottom-up connections, F1 -> F2, and top-down connections, F2 -> F1. The input pattern is presented to the F1 field, which normalises and contrast-enhances features of the pattern. F2 activation is then calculated by multiplying the F1 pattern with the bottom-up weights, and lateral inhibition in the F2 field finds a winning F2 node. The degree of match between the top-down expectation pattern of the winning F2 node and the F1 pattern is then evaluated in a vigilance test to determine whether it is sufficient. If it is, learning occurs in both the top-down and bottom-up connections of the winning F2 node; otherwise the winning F2 node is reset and the search continues.
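This search cycle can be summarised by a short sketch (Python, for illustration only; the choice and match functions are simplified ART1-style rules, not the full field dynamics of [Carpenter and Grossberg 1987a]):

    # Schematic ART search: choose an F2 node, apply the vigilance test,
    # and reset/continue until a node resonates or none remain.
    import numpy as np

    def art_search(input_pattern, bottom_up, top_down, vigilance=0.7):
        """Return the index of the F2 node that wins and passes the vigilance test."""
        I = np.asarray(input_pattern, dtype=float)
        candidates = list(range(len(bottom_up)))              # committed F2 nodes
        while candidates:
            scores = [I @ bottom_up[j] for j in candidates]   # bottom-up activation
            winner = candidates[int(np.argmax(scores))]       # lateral inhibition: take the largest
            # Vigilance test: fraction of the input accounted for by the
            # winner's top-down expectation.
            match = np.minimum(I, top_down[winner]).sum() / (I.sum() or 1.0)
            if match >= vigilance:
                return winner                                 # resonance: learning would occur here
            candidates.remove(winner)                         # reset this node and search on
        return None                                           # no match: commit an uncommitted node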

ARTMAP is a supervised neural network which consists of two unsupervised ART modules, ARTa and ARTb, and an inter-ART associative memory called a map-field (see Figure 1).

Figure 1. A Predictive ART, or ARTMAP, system includes two ART modules linked by an "inter-ART associative memory". Internal control structures actively regulate learning and information flow [Carpenter, Grossberg and Reynolds 1991].

In our implementation ARTa self-organises pitch information and ARTb self-organises pitch names. ARTa and ARTb are linked by fully-connected adaptive connections between ARTa's F2 layer and the map-field, and by non-adaptive bidirectional one-to-one connections from the map-field to ARTb's F2 layer. The ARTb network self-organises the 'predictive consequence', or 'desired output', patterns for each input pattern presented to ARTa. A pair of vectors ap and bp are presented to ARTa and ARTb simultaneously. The ARTa and ARTb networks then choose suitable F2 categories, as described above, and the map-field checks whether ARTa's choice correctly predicts the choice at ARTb. If it does, outstar learning between Fa2 and the map-field takes place, i.e. learning takes place between the map-field node corresponding to the winning Fb2 node and the Fa2 pattern; connections to all other Fb2 nodes are inhibited. If it does not, the map-field increases ARTa's vigilance so that ARTa cannot choose the same category again, and the search continues until either a suitable Fa2 category is found or ARTa chooses an uncommitted node, in which case learning can always take place.

4 Neural Network Objects

The power and flexibility of our system stem from the implementation of various ART networks as a collection of objects (written in Objective C on a NeXT workstation). These include ART1, ART2, ART2-A [Carpenter, Grossberg and Rosen 1991] and a map-field [Carpenter, Grossberg and Reynolds 1991], all of which can be created dynamically at run-time by a set of simple instructions and then connected together to produce more complicated neural network topologies tailored to the task in hand. The user needs no knowledge of the internal workings of each network. Knowledge of network parameters such as vigilance is required, but their effect on the network's performance is well defined and guidelines can be given for their use. Here, an ARTMAP network (and object) is created by connecting two unsupervised neural networks, ART2-A and ART1, by a map-field, making the resulting system capable of supervised learning (see Figure 1). The ARTMAP object so created can in turn be used in the same way as the other neural network objects to create higher-level network topologies that may require the use of many ARTMAP networks. Recently, a general ARTMAP object has been created which can link together any two of the above ART modules by a map-field at run-time. This is accomplished by allocating objects dynamically, so that only the ART modules being used are resident in computer memory.
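As a rough illustration of how such objects compose, the sketch below links two ART modules through a map-field and trains with the match-tracking rule described above. It is written in Python with hypothetical class and method names (categorise, predicts, match_value and so on); the paper does not give the actual Objective-C interfaces.

    # Sketch: an ARTMAP object built from two ART modules and a map-field.
    # All module interfaces are hypothetical, for illustration only.
    class ARTMAP:
        def __init__(self, arta, artb, mapfield):
            self.arta, self.artb, self.mapfield = arta, artb, mapfield

        def train(self, ap, bp, vigilance_step=0.001):
            """Present an (input, desired-output) pair, as in Figure 1."""
            kb = self.artb.categorise(bp)              # ARTb's F2 choice for the target
            rho = self.arta.base_vigilance
            while True:
                ka = self.arta.categorise(ap, vigilance=rho)
                if self.mapfield.predicts(ka, kb) or self.arta.is_uncommitted(ka):
                    self.mapfield.learn(ka, kb)        # outstar learning, Fa2 -> map-field
                    self.arta.learn(ap, ka)
                    self.artb.learn(bp, kb)
                    return ka
                # Map-field mismatch: raise ARTa's vigilance just above the current
                # match so the same Fa2 category cannot be chosen again (match tracking).
                rho = self.arta.match_value(ap, ka) + vigilance_step

        def classify(self, ap):
            """Inference: choose an Fa2 category and read out the map-field prediction."""
            ka = self.arta.categorise(ap, vigilance=self.arta.base_vigilance)
            return self.mapfield.prediction(ka)

    # Hypothetical usage, mirroring the composition described in the text:
    # artmap = ARTMAP(ART2A(theta=0.0065), ART1(), MapField())
    # artmap.train(semitone_bin_pattern, pitch_name_pattern)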
5 Pitch and Chord Classification by Multi-ART Networks

Two of our classifying experiments are outlined here. The first compares an ARTMAP network trained with pitches from various instruments, to instil a tolerance to their spectral variations, with a network trained to emulate subharmonic summation. The second demonstrates how multi-ART networks can be connected to determine pitches and chords from a guitar. Each network is initially trained with a set of pitched notes and then tested on pitches that were not in the training set.

5.1 Determining Pitch on a Variety of Instruments

Network 1 was trained with a total of 198 examples of pitched notes from the C-major scale on 10 different instruments in the range C3 to C6. Instruments were chosen to cover a wide variety of spectral shapes, so that the network could pick out the characteristic features of each pitched note and yet acquire an insensitivity to timbre. These instruments included: soprano, contralto and tenor voices, saxophone and French horn, violin, acoustic guitar and sitar, piano, and some examples of whistling.

In order to assess the system's robustness, the majority of testing examples were chosen from different instruments, but some were taken from instruments in the training set and sung or played with vibrato. The test set consisted of 443 patterns altogether. The network was also tested on the 198 training patterns, which showed that it had learned all the pitch examples. Test-set instruments included: alto and bass voices, classical, 12-string and electric guitar, recorder, clarinet, mandolin and steel drums, as well as some synthetic sounds (produced by computer) which consisted of a harmonic series without a fundamental. Other test examples were: soprano, contralto (x2) and tenor (x2) voices either sung with vibrato or taken from different singers, saxophone and violin with vibrato, and piano played softly.

The network was trained and tested a number of times with different values of base-rate vigilance (ARTa's lowest vigilance level), number of training presentations and levels of θ (noise compression). Changing ARTa's vigilance base-rate alters the number of Fa2 categories the network creates. If it is set too low, too few categories are formed and the network learns too broad a category for each pitch. If it is set too high, the network learns the individual pitch patterns in too much detail and so does not generalise well. θ is set so that it cuts out as much noise as possible without affecting the harmonic information in the spectrum. A number of training presentations are required to allow the network to gain a tolerance to spectral-shape variations; this is known as slow learning in ART networks. In a number of experiments, training and testing on this pitch data, we found that no more than 30 training presentations were required to give the desired results. The network was also run 24 times to find the optimum level of θ; a level of between 0.006 and 0.008 was found to be best. This was consistent with our intuitive estimate relating θ to the noise-to-signal ratio (roughly 0.05) and the number of input nodes, which here gives θ of approximately 0.0065.

The results of this experiment are summarised in the table below for different levels of vigilance:

    Vigilance   % Correct Classification   % Correct Classification
                of Absolute Pitch          of Chroma
    0.5         96.61                      99.77
    0.6         96.84                      99.77
    0.7         98.19                      99.77
    0.8         96.84                      99.55
    0.9         96.61                      99.10

Network 2 was trained to emulate the method of subharmonic summation. This was accomplished by training it with a complete harmonic series for each pitch, setting the amplitude of the nth harmonic to hn = 0.84^(n-1), as used in [Hermes, 1988], so that higher harmonics contribute less than lower ones. After training, the network has learned a normalised version of this pattern, so that each output node receives a total input equal to the sum of the amplitudes of the frequency components in the input spectrum which match the learned template for each pitch.
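A small sketch (Python, for illustration) of the kind of idealised harmonic-series training pattern described above, with the nth harmonic weighted by 0.84^(n-1); the semitone-bin indexing and the number of harmonics are assumptions.

    # Sketch: idealised training pattern for one pitch, used to make Network 2
    # emulate subharmonic summation. Bin range and harmonic count are assumptions.
    import numpy as np

    def harmonic_training_pattern(f0, n_harmonics=10, low_midi=36, high_midi=96):
        """Harmonic series on fundamental f0 (Hz), nth harmonic weighted 0.84**(n-1)."""
        pattern = np.zeros(high_midi - low_midi + 1)
        for n in range(1, n_harmonics + 1):
            note = int(round(69 + 12 * np.log2(n * f0 / 440.0)))  # nearest semitone bin
            if low_midi <= note <= high_midi:
                pattern[note - low_midi] += 0.84 ** (n - 1)       # Hermes-style weighting
        return pattern / (pattern.max() or 1.0)                   # normalised, as learned

    # e.g. the template for C4 (about 261.6 Hz):
    # c4_template = harmonic_training_pattern(261.63)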
The results of testing this network on the training and test sets are summarised below:

                    % Correct Classification   % Correct Classification
                    of Absolute Pitch          of Chroma
    Training Set    96.96                      99.49
    Test Set        97.06                      98.19

Network 1 classified all the training examples correctly; this was to be expected, as it was trained on them. Interestingly, however, on the test set, which neither network had seen before, Network 1 performed better than Network 2 (the method of subharmonic summation) in both absolute-pitch and chroma classification. This shows the ability of the ARTMAP network to pick out the features of harmonic pitch patterns which are important in the pitch-determining process, rather than simply matching pitches against the harmonic series.

5.2 Chord Classification on an Electric Guitar

This experiment used two ARTMAPs linked together (as in Figure 1). The first was trained to recognise pitches in a two-octave chromatic scale from C4 upwards on an electric guitar. Tolerance to varying timbre was instilled by playing each note (with a plectrum) both sul tasto and sul ponticello, and on as many different strings as possible. Subsequent testing showed that the network could correctly classify all notes on the guitar in this range, regardless of string or plucking position. The pitch information from this network was then fed into a second ARTMAP network which was trained to recognise chords. This network was trained on minor, major, dominant-seventh, major-seventh and minor-seventh chords with three different root pitches, strummed with a plectrum mid-way between the sul ponticello and sul tasto positions. In testing, all chords were identified correctly irrespective of strumming position. Furthermore, the output activation of the winning node was similar for chords played in both strumming positions, reflecting the network's acquired insensitivity to timbre when classifying.
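The chaining of the two networks can be sketched as below, reusing the hypothetical ARTMAP interface from the Section 4 sketch; the pattern shapes and the classify method are assumptions for illustration, not the authors' code.

    # Sketch: feed the pitch network's output into the chord network.
    # pitch_net and chord_net are assumed to be trained ARTMAP objects with the
    # hypothetical interface sketched in Section 4.
    import numpy as np

    def classify_chord(semitone_bin_pattern, pitch_net, chord_net):
        """Two-stage classification: semitone bins -> pitch pattern -> chord label."""
        # Stage 1: the pitch ARTMAP reports which pitches are present, e.g. as a
        # vector over the two-octave range from C4 upwards.
        pitch_pattern = pitch_net.classify(semitone_bin_pattern)
        # Stage 2: the chord ARTMAP maps that pitch pattern to a chord category.
        return chord_net.classify(np.asarray(pitch_pattern, dtype=float))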
6 Summary

It has been shown that an ARTMAP neural network can classify pitches and chords from a variety of musical instruments. The implementation of ART networks as objects has proved to be flexible, enabling them to be connected easily to produce multi-ART networks for the classification of pitch and chords. Such an approach can also be applied to other musical problems as well as to other pattern-recognition tasks.

References

J. C. Brown 1992. Musical fundamental frequency tracking using a pattern recognition method. J. Acoust. Soc. Am., 92(3), 1394-1402.

G. A. Carpenter and S. Grossberg 1987a. A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 37, 54-115.

G. A. Carpenter and S. Grossberg 1987b. ART 2: Self-organization of stable category recognition codes for analog input patterns. Applied Optics, 26(23), 4919-4930.

G. A. Carpenter and S. Grossberg 1990. ART 3: Hierarchical search using chemical transmitters in self-organizing pattern recognition architectures. Neural Networks, 3, 129-152.

G. A. Carpenter, S. Grossberg and J. H. Reynolds 1991. ARTMAP: Supervised real-time learning and classification of nonstationary data by a self-organizing neural network. Neural Networks, 4, 565-588.

G. A. Carpenter, S. Grossberg and D. B. Rosen 1991. ART 2-A: An adaptive resonance algorithm for rapid category learning and recognition. Neural Networks, 4, 493-504.

H. Duifhuis, L. F. Willems and R. J. Sluyter 1982. Measurement of pitch in speech: An implementation of Goldstein's theory of pitch perception. J. Acoust. Soc. Am., 71, 1568-1580.

S. Grossberg 1976a. Adaptive pattern classification and universal recoding, I: Parallel development and coding of neural feature detectors. Biological Cybernetics, 23, 121-134.

S. Grossberg 1976b. Adaptive pattern classification and universal recoding, II: Feedback, expectation, olfaction, and illusions. Biological Cybernetics, 23, 187-202.

D. J. Hermes 1988. Measurement of pitch by subharmonic summation. J. Acoust. Soc. Am., 83, 257-264.

M. Piszczalski and B. F. Galler 1979. Predicting musical pitch from component frequency ratios. J. Acoust. Soc. Am., 66, 710-720.

H. Sano and B. K. Jenkins 1989. A neural net model for pitch perception. Computer Music Journal, 13(3), 41-48.

I. Taylor, M. Page and M. Greenhough 1993. Neural networks for processing musical signals and structures. Acoustics Bulletin, 18(4) (in print).

E. Terhardt, G. Stoll and M. Seewann 1982. Algorithm for extraction of pitch and pitch salience from complex tonal signals. J. Acoust. Soc. Am., 71, 679-688.