NEAREST CENTROID ERROR CLUSTERING FOR RADIAL/ELLIPTICAL BASIS FUNCTION NEURAL NETWORKS IN TIMBRE CLASSIFICATION

Tae Hong Park
Tulane University
Music Department
102 Dixon Hall
New Orleans, LA 70118 USA

Perry Cook
Princeton University
Computer Science Department and Music Department
Princeton, NJ 08544 USA

ABSTRACT

This paper presents a neural network approach to the classification of musical instrument sounds using Radial and Elliptical Basis Functions. In particular, we discuss a novel automatic network fine-tuning method called Nearest Centroid Error Clustering (NCC), which determines a robust number of centroids for improved system performance. 829 monophonic sound examples from the string, brass, and woodwind families were used. A range of performance techniques, dynamics, and pitches was employed in training and testing the system, resulting in 71% correct individual instrument classification (12 classes) and 88% correct instrument family classification (3 classes).

1. INTRODUCTION

Examples of Radial Basis Function networks (RBFNs) can readily be found in pattern classification applications such as speech recognition and prediction [14, 3], phoneme recognition [1], and face recognition [7]. However, they have not been sufficiently explored in automatic timbre recognition research. Considering that there is, to our knowledge, only one study using RBFNs [6] and none using Elliptical Basis Function networks (EBFNs) in machine-based timbre classification, this paper may provide some insight into the prospects and possibilities of RBFNs/EBFNs for automatic timbre classification. This paper does not elaborate on the feature extraction algorithms or explain RBFNs/EBFNs in depth (details can be found in [16]); rather, it focuses on the NCC method, which automatically fine-tunes the network by spawning additional, finer centroids to improve system performance.

2. SYSTEM OVERVIEW

The architecture of the system is built around a bottom-up model with a front-end feature extraction module and a back-end neural network training and classification module. A sampling frequency of 22.05 kHz and 2-second excerpts containing the attack and steady-state portions were used for each of the 829 monophonic samples (86% Siedlaczek Library [2], 14% personal collection). The 12 features used for the 12 instruments (elec. bass 30, violin 105, cello 102, viola 75, clarinet 100, flute 99, oboe 55, bassoon 35, French horn 56, trumpet 82, tuba 32 examples) were spectral shimmer, spectral jitter, spectral spread, spectral centroid, LPC noise, inharmonicity, attack time, harmonic slope, harmonic expansion/contraction, spectral flux shift, temporal centroid, and zero-crossing rate (see [16] for details). Various performance articulations were present in the majority of the samples, including pizzicato, spiccato, sordino, long/sustained/short, détaché, espressivo, vibrato/non-vibrato, pianissimo, piano, mezzo-forte, forte, and fortissimo, with pitches ranging over 1-3 octaves.
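The feature definitions themselves are given in [16]; as a rough illustration only, the sketch below computes two of the features named above (spectral centroid and zero-crossing rate) for a single analysis frame. The frame length, Hann window, and the synthetic test tone are assumptions made here for illustration, not the paper's actual analysis settings.

# Illustrative sketch of two listed features; parameters are assumptions.
import numpy as np

FS = 22050  # sampling rate used in the paper (22.05 kHz)

def spectral_centroid(frame, fs=FS):
    """Amplitude-weighted mean frequency of one windowed frame."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    if spectrum.sum() == 0:
        return 0.0
    return float(np.sum(freqs * spectrum) / np.sum(spectrum))

def zero_crossing_rate(frame):
    """Fraction of adjacent samples whose signs differ."""
    signs = np.sign(frame)
    return float(np.mean(signs[:-1] != signs[1:]))

# Example: a 2-second synthetic tone standing in for one sound excerpt.
t = np.arange(0, 2.0, 1.0 / FS)
excerpt = 0.5 * np.sin(2 * np.pi * 440.0 * t)
frame = excerpt[:1024]
print(spectral_centroid(frame), zero_crossing_rate(frame))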
3. RBFN/EBFN OVERVIEW

3.1. RBFN/EBFN Characteristics

The basic structure of an RBFN/EBFN system is shown in Figure 1. Among its main attributes are the weights located at the outputs of the basis functions and the characteristic single hidden layer.

Figure 1. Basic RBF/EBF network: inputs x1, x2, ..., xi feed M basis functions, whose weighted outputs are summed to give the output y(x).

Exploiting this configuration of activation functions and weights, RBF/EBF networks can map non-linear input spaces to linear activation outputs, effectively modeling complex patterns that Multi-Layer Perceptrons (MLPs) can only achieve through multiple hidden layers [11]. Each basis function consists of a unique centroid, spread, and activation function (a Gaussian type was used in this paper). The objective of the training phase is to adjust the weights and basis function parameters so as to reduce the error between the known network outputs and the actual computed outputs. This is done via gradient descent and back-propagation (see [16] for details).
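As a minimal sketch of the single-hidden-layer structure in Figure 1, the code below evaluates Gaussian basis functions followed by a weighted linear sum. A scalar spread per unit corresponds to a radial basis function; allowing a separate spread per input dimension (a diagonal-covariance Gaussian) is one common way to realize an elliptical unit, which is the reading assumed here. Centroids, spreads, and weights are random placeholders, and the gradient descent / back-propagation training described above is not implemented.

# Forward-pass sketch only; all parameter values are placeholders.
import numpy as np

def gaussian_basis(x, centroid, spread):
    """Gaussian activation; `spread` may be a scalar (RBF-style) or a
    per-dimension vector (elliptical/EBF-style)."""
    d = (x - centroid) / spread
    return np.exp(-0.5 * np.sum(d * d))

def network_output(x, centroids, spreads, weights, bias=0.0):
    """y(x) = sum_j w_j * phi_j(x) + bias  (linear output layer)."""
    phi = np.array([gaussian_basis(x, c, s)
                    for c, s in zip(centroids, spreads)])
    return float(weights @ phi + bias)

rng = np.random.default_rng(0)
n_features, n_basis = 12, 8                       # 12 features as in Section 2
centroids = rng.normal(size=(n_basis, n_features))
spreads = np.full((n_basis, n_features), 1.0)     # per-dimension -> elliptical
weights = rng.normal(size=n_basis)

x = rng.normal(size=n_features)                   # one feature vector
print(network_output(x, centroids, spreads, weights))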