NEAREST CENTROID ERROR CLUSTERING FOR
RADIAL/ELLIPTICAL BASIS FUNCTION NEURAL
NETWORKS IN TIMBRE CLASSIFICATION
Tae Hong Park
Tulane University
Music Department
102 Dixon Hall
New Orleans, LA 70118
USA

Perry Cook
Princeton University
Computer Science Department
and Music Department
Princeton, NJ 08544
USA
ABSTRACT
This paper presents a neural network approach for
classification of musical instrument sounds through
Radial and Elliptical Basis Functions. In particular, we
discuss a novel automatic network fine-tuning method
called Nearest Centroid Error Clustering (NCC) which
determines a robust number of centroids for improved
system performance. 829 monophonic sound examples
from the string, brass, and woodwind families were
used. A number of different performance techniques,
dynamics, and pitches were utilized in training and
testing the system, resulting in 71% correct individual-instrument
classification (12 classes) and 88% correct
instrument-family classification (3 classes).
1. INTRODUCTION
Examples of Radial Basis Functions can be readily
found in pattern classification applications such as speech
recognition and prediction [14, 3], phoneme recognition
[1], and face recognition [7]. However, they have not
been sufficiently explored for automatic timbre
recognition research. Considering that, to our
knowledge, there exists only one study using RBFNs [6]
and none using EBFNs in machine-based timbre
classification, this paper may provide some insight into
the prospects and possibilities of RBFN/EBFNs in automatic timbre
classification. This paper does not elaborate on feature
extraction algorithms or explain RBFN/EBFNs in depth
(details can be found in [16]) but rather focuses on the
NCC method which automatically fine-tunes the
network by spawning additional finer centroids to
improve performance of the system.
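The NCC details are developed later in the paper, but the general idea of spawning finer centroids in error-prone regions can be sketched roughly as follows. This is an illustration under our own assumptions, not the paper's specification: the function name, the k-means splitting of the misclassified set, and the majority-vote labeling of spawned centroids are all ours.

```python
import numpy as np

def spawn_error_centroids(X, y, centroids, centroid_labels, k_new=2, n_iter=10, seed=0):
    """Illustrative sketch (NOT the paper's exact NCC algorithm): gather
    samples whose nearest centroid carries the wrong class label, split
    that error set into k_new sub-clusters with a few k-means iterations,
    and append the sub-cluster means as finer centroids."""
    rng = np.random.default_rng(seed)
    # nearest-centroid assignment for every training sample
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)
    wrong = centroid_labels[nearest] != y              # misclassified samples
    err_X, err_y = X[wrong], y[wrong]
    if len(err_X) < k_new:
        return centroids, centroid_labels              # nothing to spawn
    new_c = err_X[rng.choice(len(err_X), k_new, replace=False)].copy()
    for _ in range(n_iter):                            # plain k-means refinement
        assign = np.linalg.norm(err_X[:, None] - new_c[None], axis=2).argmin(1)
        for j in range(k_new):
            if np.any(assign == j):
                new_c[j] = err_X[assign == j].mean(axis=0)
    # label each spawned centroid by the majority class of its members
    assign = np.linalg.norm(err_X[:, None] - new_c[None], axis=2).argmin(1)
    new_lab = np.array([np.bincount(err_y[assign == j]).argmax()
                        if np.any(assign == j) else err_y[0]
                        for j in range(k_new)])
    return np.vstack([centroids, new_c]), np.concatenate([centroid_labels, new_lab])
```

In this sketch the spawned centroids simply refine the region where the coarse nearest-centroid assignment disagrees with the ground-truth labels.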
2. SYSTEM OVERVIEW
The architecture of the system is built around a bottom-up model with a front-end feature extraction module and
a back-end neural network training and classification
module. A sampling frequency of 22.05 kHz and 2
second excerpts with attack and steady-state portions
were used for each of the 829 monophonic samples
(86% Siedlaczek Library [2], 14% personal collection).
The 12 features that were used for the 12 instruments
(elec. bass 30, violin 105, cello 102, viola 75, clarinet
100, flute 99, oboe 55, bassoon 35, French horn 56,
trumpet 82, tuba 32 examples) included spectral
shimmer, spectral jitter, spectral spread, spectral
centroid, LPC noise, inharmonicity, attack time,
harmonic slope, harmonic expansion/contraction,
spectral flux shift, temporal centroid, and zero-crossing
rate (see [16] for details). Various performance
articulations were present in the majority of the samples
including pizzicato, spiccato, sordino,
long/sustained/short, détaché, espressivo, vibrato/non-vibrato,
pianissimo, piano, mezzo-forte, forte, and
fortissimo, with pitches spanning one to three octaves.
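As a rough illustration of the kind of descriptors involved, two of the simpler features listed above can be sketched as follows. These are illustrative textbook definitions, not necessarily the exact formulations used in [16]:

```python
import numpy as np

def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    return float(np.mean(np.signbit(x[:-1]) != np.signbit(x[1:])))

def spectral_centroid(x, sr=22050):
    """Magnitude-weighted mean frequency of the spectrum, in Hz."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    return float(np.sum(freqs * mag) / np.sum(mag))
```

For a pure sine tone the spectral centroid sits at the tone's frequency, and the zero-crossing rate is roughly twice the frequency divided by the sampling rate; noisier, brighter timbres push both measures upward.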
3. RBFN/EBFN OVERVIEW
3.1. RBFN/EBFN Characteristics
The basic structure of an RBFN/EBFN system is shown
in Figure 1. The main attributes of such a system are the
weights located at the outputs of the basis functions and
the characteristic single hidden layer.
[Figure: network diagram with inputs x1, x2, …, feeding M basis functions φ(·), whose weighted sum forms the output y(x)]
Figure 1. Basic RBF/EBF Network
Exploiting this configuration of activation functions and
weights, RBF/EBF networks map non-linear input
spaces to linear output activations, effectively
modeling complex patterns that Multi-Layered
Perceptrons (MLPs) can only capture through multiple
hidden layers [11]. Each basis function consists of a
unique centroid, spread, and activation function (a
Gaussian type was used in this paper). The
objective in the training phase is to adjust the weights
and basis-function parameters to reduce the error
between the target network outputs and the actual
computed outputs. This is accomplished via gradient
descent and back-propagation (see [16] for details).
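A minimal sketch of this structure, assuming Gaussian basis functions with fixed centroids and spreads and gradient descent on the output weights only (the paper's full training also adapts the basis-function parameters themselves; see [16]):

```python
import numpy as np

def rbf_forward(X, centers, widths, W, b):
    """Hidden layer: Gaussian bases phi_j(x) = exp(-||x - c_j||^2 / (2 s_j^2));
    output layer: linear combination y = Phi @ W + b."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * widths ** 2))
    return phi @ W + b, phi

def train_output_weights(X, T, centers, widths, lr=0.1, epochs=500, seed=0):
    """Sketch: fit only the output weights by gradient descent on the
    squared error between targets T and computed outputs."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(len(centers), T.shape[1]))
    b = np.zeros(T.shape[1])
    for _ in range(epochs):
        y, phi = rbf_forward(X, centers, widths, W, b)
        err = y - T                      # error vs. target outputs
        W -= lr * phi.T @ err / len(X)   # gradient of mean squared error
        b -= lr * err.mean(axis=0)
    return W, b
```

Because the output layer is linear in the basis activations, this sub-problem is convex; the non-linearity of the model lives entirely in the hidden layer of basis functions.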