A Timbre Analysis And Classification Toolkit For Pure Data

Brent, William


Figure 3. timbreID in a training configuration.

should be struck a few times at different dynamic levels. For each strike, an onset detector like bonk~ will send a bang message to bfcc~, the Bark-frequency cepstral analysis object. Once a training database has been accumulated in this manner, bfcc~'s output can be routed to timbreID's second inlet, so that any new instrument onsets will generate a nearest match report from the first outlet. A match result is given as the index of the nearest matching instance as assigned during training. For each match, the second outlet reports the distance between the input feature and its nearest match, and the third outlet produces a confidence measure based on the ratio of the first and second best match distances.

For many sound sets, timbreID's clustering function will automatically group features by instrument. A desired number of clusters corresponding to the number of instruments must be given with the "cluster" message, and an agglomerative hierarchical clustering algorithm will group instances according to the current similarity metric settings. Afterward, timbreID will report the associated cluster index of the nearest match in response to classification requests.

Once training is complete, the resulting feature database can be saved to a file for future use. There are four file formats available: timbreID's binary .timid format, a text format for users who wish to inspect the database, ARFF format for use in WEKA¹, and .mat format for use in either MATLAB or GNU Octave.

¹ WEKA is a popular open source machine learning package described in [4].

3.1. timbreID settings

Nearest match searches are performed with a k-nearest neighbor strategy, where K can be chosen by the user. Several other settings related to the matching process can also be specified. Four different similarity metrics are available: Euclidean, Manhattan (taxicab), Correlation, and Cosine Similarity.
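As a rough illustration of the matching step described above, the following Python sketch runs a plain nearest-match search with the four named metrics. It is not timbreID's source code. The function names, the use of one minus similarity for the correlation and cosine cases, and the confidence formula 1 - d1/d2 are all assumptions made for this example; the text states only that confidence is based on the ratio of the two best match distances. Only the K = 1 case is shown.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Taxicab distance: sum of absolute attribute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus cosine similarity (assumed convention for this sketch)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # similarity is undefined for a zero vector; treat as unrelated
    return 1.0 - dot / (norm_a * norm_b)


def correlation_distance(a, b):
    """One minus Pearson correlation: cosine distance of mean-centred vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_distance([x - mean_a for x in a], [y - mean_b for y in b])


def nearest_match(database, feature, metric=euclidean):
    """Return (index, distance, confidence) for the closest training instance.

    The confidence value is a hypothetical formula, 1 - d1/d2, where d1 and d2
    are the best and second-best distances. It is 0.0 when there is no
    runner-up with a positive distance to compare against.
    """
    ranked = sorted((metric(inst, feature), i) for i, inst in enumerate(database))
    best_dist, best_idx = ranked[0]
    if len(ranked) > 1 and ranked[1][0] > 0.0:
        confidence = 1.0 - best_dist / ranked[1][0]
    else:
        confidence = 0.0
    return best_idx, best_dist, confidence


if __name__ == "__main__":
    # Three made-up training instances with two attributes each.
    training = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
    print(nearest_match(training, [1.0, 0.0]))
    print(nearest_match(training, [1.0, 0.0], metric=manhattan))
```

Passing a different `metric` function changes only the distance calculation, which mirrors the idea that the similarity metric is a setting independent of the search itself.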
For feature databases composed of mixed features, feature attribute normalization can be activated so that features with large ranges do not inappropriately weight the distance calculation. Specific weights can be dynamically assigned to any attribute in the feature list in order to explore the effects of specific proportions of features during timbre classification or sound set ordering. Alternatively, the feature attributes used in nearest match calculations can be restricted to a specific range or subset. Or, the attribute columns of the feature database can be ordered by variance, so that match calculations will be based on the attributes with the highest variance.

Further aspects of timbreID's functionality are best illustrated in context. The following section describes four of the example patches that accompany the timbreID package.

4. APPLICATIONS

4.1. Vowel recognition

Identification of vowels articulated by a vocalist is a task best accomplished using the cepstrum~ object. Under the right circumstances, cepstral analysis can achieve a rough deconvolution of two convolved signals. In the case of a sung voiced vowel, glottal impulses at a certain frequency are convolved with a filter corresponding to the shape of the vocalist's oral cavity. Depending on fundamental frequency, the cepstrum of such a signal will produce two distinctly identifiable regions: a compact representation of the filter component at the low end, and higher up, a peak associated with the pitch of the note being sung. The filter region of the cepstrum should hold its shape reasonably steady in spite of pitch changes, making it possible to identify vowels no matter which pitch the vocalist happens to be singing. As pitch moves higher, the cepstral peak actually moves lower, as the so-called "quefrency" axis corresponds to period, the inverse of frequency.
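The relationship between pitch and the position of the cepstral peak can be checked numerically with a toy model. The Python sketch below is independent of cepstrum~ and makes several simplifying assumptions: the "glottal" source is a decaying impulse train (the decay keeps the log spectrum free of zeros), the "vocal tract" is a single one-pole filter, and the real cepstrum is computed as the inverse transform of the log magnitude spectrum. All function and parameter names are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; the input length must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def real_cepstrum(signal):
    """Inverse transform of the log magnitude spectrum.

    The log magnitude of a real signal is real and even, so its inverse DFT
    equals the real part of its forward DFT divided by N.
    """
    n = len(signal)
    log_mag = [math.log(abs(v) + 1e-12) for v in fft(signal)]
    return [v.real / n for v in fft(log_mag)]


def toy_vowel(period, n=1024, decay=0.8, pole=0.9):
    """Decaying impulse train (source) passed through a one-pole filter."""
    excitation = [0.0] * n
    amp = 1.0
    for i in range(0, n, period):
        excitation[i] = amp
        amp *= decay
    out, prev = [], 0.0
    for v in excitation:
        prev = v + pole * prev  # y[i] = x[i] + pole * y[i-1]
        out.append(prev)
    return out


def pitch_quefrency(ceps, min_q=20):
    """Index of the largest cepstral value above the filter region."""
    return max(range(min_q, len(ceps) // 2), key=lambda q: ceps[q])


if __name__ == "__main__":
    for period in (64, 40):  # a shorter period means a higher pitch
        ceps = real_cepstrum(toy_vowel(period))
        print("period", period, "-> peak at quefrency", pitch_quefrency(ceps))
        print("  first filter coefficients:", [round(c, 3) for c in ceps[1:5]])
```

In this toy case the low-quefrency coefficients stay nearly the same for both periods, while the peak sits at the period in samples and so moves toward the filter region as the pitch rises.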
If the pitch is very high, it will overlap with the region representing the filter component, and destroy the potential for recognizing vowels regardless of pitch.²

Having acknowledged these limitations, a useful pitch-independent vowel recognition system can nevertheless be arranged very easily using timbreID objects. Figure 4 shows a simplified excerpt of an example patch where cepstral coefficients 2 through 40 are sent to timbreID's training inlet every time the red snapshot button is clicked. Although identical results could be achieved without splitting off a specific portion of the cepstrum³, pre-processing the feature

² These qualities of cepstral analysis can be observed by sending cepstrum~'s output list to an array and graphing the analysis continuously in real-time.

³ The alternative would be to pass the entire cepstrum, but set timbreID's
