Figure 3. timbreID in a training configuration.
should be struck a few times at different dynamic levels.
For each strike, an onset detector like bonk~ will send a
bang message to bfcc~, the Bark-frequency cepstral analysis object. Once a training database has been accumulated
in this manner, bfcc~'s output can be routed to timbreID's
second inlet, so that any new instrument onsets will generate
a nearest match report from the first outlet. A match result
is given as the index of the nearest matching instance as assigned during training. For each match, the second outlet
reports the distance between the input feature and its nearest match, and the third outlet produces a confidence measure based on the ratio of the first and second best match
distances.
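The three values reported for each match can be illustrated with a short Python sketch. This is not timbreID's implementation: the function names are hypothetical, only the single nearest neighbor is considered, and the confidence formula (one minus the ratio of the best to the second-best distance) is an assumption, since the text says only that confidence is based on that ratio.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length feature lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_match(database, feature):
    """Return (index, distance, confidence) for the closest training instance.

    The index is the position of the instance in the training database,
    mirroring the idea of an index assigned during training.
    """
    ranked = sorted(
        (euclidean(instance, feature), index)
        for index, instance in enumerate(database)
    )
    best_distance, best_index = ranked[0]
    if len(ranked) < 2:
        confidence = 1.0  # no competing instance to compare against
    elif ranked[1][0] == 0:
        confidence = 0.0  # best and second best are both exact matches
    else:
        # Assumed formula: near 1 when the best match is far closer than
        # the runner-up, near 0 when the two are almost equally close.
        confidence = 1.0 - best_distance / ranked[1][0]
    return best_index, best_distance, confidence
```

Under this assumed formula, a low confidence value flags inputs that fall almost midway between two training instances.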
For many sound sets, timbreID's clustering function will
automatically group features by instrument. A desired number of clusters corresponding to the number of instruments
must be given with the "cluster" message, and an agglomerative hierarchical clustering algorithm will group instances
according to the current similarity metric settings. Afterward,
timbreID will report the associated cluster index of the nearest match in response to classification requests.
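The clustering step can be sketched in Python as a bottom-up merge that stops once the requested number of clusters remains. The text does not say which linkage criterion timbreID uses, so average linkage with Euclidean distance is assumed here purely for illustration, and all names are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(cluster_a, cluster_b, instances):
    """Mean pairwise distance between members of two clusters (assumed linkage)."""
    total = sum(
        euclidean(instances[i], instances[j])
        for i in cluster_a
        for j in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))

def agglomerate(instances, num_clusters):
    """Merge the two closest clusters repeatedly until num_clusters remain.

    Returns a list of clusters, each a list of instance indices.
    """
    if num_clusters < 1:
        raise ValueError("num_clusters must be at least 1")
    clusters = [[i] for i in range(len(instances))]  # one cluster per instance
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], instances)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return clusters
```

For example, `agglomerate(features, 3)` would correspond to requesting three clusters for a three-instrument sound set; classification could then report the cluster containing the nearest match rather than the instance index itself.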
Once training is complete, the resulting feature database
can be saved to a file for future use. There are four file
formats available: timbreID's binary .timid format, a text
format for users who wish to inspect the database, ARFF
format for use in WEKA¹, and .mat format for use in either
MATLAB or GNU Octave.
3.1. timbreID settings
Nearest match searches are performed with a k-nearest neighbor strategy, where K can be chosen by the user. Several
other settings related to the matching process can also be
specified. Four different similarity metrics are available:
Euclidean, Manhattan (taxicab), Correlation, and Cosine Similarity. For feature databases composed of mixed features,
feature attribute normalization can be activated so that features with large ranges do not inappropriately weight the
distance calculation. Specific weights can be dynamically
assigned to any attribute in the feature list in order to explore the effects of specific proportions of features during
timbre classification or sound set ordering. Alternatively,
the feature attributes used in nearest match calculations can
be restricted to a specific range or subset. Or, the attribute
columns of the feature database can be ordered by variance,
so that match calculations will be based on the attributes
with the highest variance.
¹ WEKA is a popular open source machine learning package described
in [4].
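The matching options described in this section can be made concrete with a minimal Python sketch. It shows generic textbook forms of the four metrics, min-max attribute normalization, per-attribute weights, and ordering of attributes by variance. The exact formulas, scaling, and function names are assumptions and may differ from what timbreID computes internally.

```python
import math

def euclidean(a, b, weights=None):
    w = weights or [1.0] * len(a)
    return math.sqrt(sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b)))

def manhattan(a, b, weights=None):
    w = weights or [1.0] * len(a)
    return sum(wi * abs(x - y) for wi, x, y in zip(w, a, b))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])

def normalize_columns(database):
    """Rescale every attribute column to the range 0..1 (min-max, assumed)."""
    columns = list(zip(*database))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]  # avoid divide by zero
    return [
        [(v - lo) / span for v, lo, span in zip(row, lows, spans)]
        for row in database
    ]

def order_by_variance(database):
    """Return attribute indices sorted from highest to lowest variance."""
    columns = list(zip(*database))
    def variance(col):
        mean = sum(col) / len(col)
        return sum((v - mean) ** 2 for v in col) / len(col)
    return sorted(range(len(columns)), key=lambda i: variance(columns[i]), reverse=True)
```

Note that correlation and cosine similarity grow with similarity, whereas the Euclidean and Manhattan measures grow with dissimilarity, so a nearest match search built on the former would need to pick the largest value (or convert it to a distance first).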
Further aspects of timbreID's functionality are best illustrated in context. The following section describes four of
the example patches that accompany the timbreID package.
4. APPLICATIONS
4.1. Vowel recognition
Identification of vowels articulated by a vocalist is a task
best accomplished using the cepstrum~ object. Under the
right circumstances, cepstral analysis can achieve a rough
deconvolution of two convolved signals. In the case of a
sung voiced vowel, glottal impulses at a certain frequency
are convolved with a filter corresponding to the shape of the
vocalist's oral cavity. Depending on fundamental frequency,
the cepstrum of such a signal will produce two distinctly
identifiable regions: a compact representation of the filter
component at the low end, and higher up, a peak associated
with the pitch of the note being sung. The filter region of the
cepstrum should hold its shape reasonably steady in spite
of pitch changes, making it possible to identify vowels no
matter which pitch the vocalist happens to be singing. As
pitch moves higher, the cepstral peak actually moves lower,
as the so-called "quefrency" axis corresponds to period,
the inverse of frequency. If the pitch is very high, it will
overlap with the region representing the filter component,
and destroy the potential for recognizing vowels regardless
of pitch.²
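The inverse relationship between pitch and the position of the cepstral peak can be checked with a little arithmetic. The Python sketch below assumes that cepstral index n corresponds to a quefrency of n samples, a 44.1 kHz sample rate, and a filter region occupying roughly the first 40 coefficients. All three are illustrative assumptions rather than properties stated for cepstrum~, and in practice interference may begin well before the strict overlap computed here.

```python
def pitch_peak_index(fundamental_hz, sample_rate=44100):
    """Approximate cepstral index of the pitch peak.

    The peak sits at the period of the note, i.e. sample_rate / frequency
    samples, so a higher pitch yields a lower index.
    """
    return round(sample_rate / fundamental_hz)

def overlaps_filter_region(fundamental_hz, filter_end=40, sample_rate=44100):
    """True if the pitch peak falls inside the assumed filter region."""
    return pitch_peak_index(fundamental_hz, sample_rate) <= filter_end

# A3 (220 Hz) and A4 (440 Hz): doubling the pitch halves the peak index.
low_note = pitch_peak_index(220.0)
high_note = pitch_peak_index(440.0)
```

Under these assumptions the pitch peak only enters the first 40 coefficients for fundamentals above roughly 1100 Hz, which is consistent with the text's remark that the problem arises when the pitch is very high.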
These limitations acknowledged, a useful pitch-independent vowel recognition system can nevertheless be
arranged very easily using timbreID objects. Figure 4 shows
a simplified excerpt of an example patch where cepstral coefficients 2 through 40 are sent to timbreID's training inlet every time the red snapshot button is clicked. Although
identical results could be achieved without splitting off a
specific portion of the cepstrum³, pre-processing the feature
² These qualities of cepstral analysis can be observed by sending
cepstrum~'s output list to an array and graphing the analysis continuously
in real-time.
³ The alternative would be to pass the entire cepstrum, but set timbreID's