A Timbre Analysis And Classification Toolkit For Pure Data
particular sound set. Figure 1 demonstrates how to generate a feature list composed of MFCCs, spectral centroid, and spectral brightness. Subsets of mel-frequency cepstral coefficients (MFCCs) are frequently used to represent the spectral envelope economically, while spectral centroid and brightness provide information about the distribution of spectral energy in a signal. Each time the button in the upper right region of the patch is clicked, a multi-feature analysis snapshot composed of these features is produced.

Figure 1. Generating a mixed feature list.

Capturing the temporal evolution of audio features requires some additional logic. In Figure 2, a single feature list is generated from 5 successive analysis frames, spaced 50 milliseconds apart. The attack of a sound is reported by bonk~ [6], turning on a metro that fires once every 50 ms before turning off after almost a quarter second. Via list prepend, the initial moments of the sound's temporally evolving MFCCs are accumulated to form a single list. Once the fifth mel-frequency cepstrum measurement has been added, the complete feature list is allowed to pass through a spigot for routing to timbreID, the classification object described below in section 3. Recording changes in MFCCs (or any combination of features) over time provides detailed information for the comparison of complex sounds.

These patches illustrate some key differences from the Pd implementation of libXtract, a well-developed multi-platform feature extraction library described in [2]. Extracting features in Pd using the libXtract~ wrapper requires subpatch blocking, Hann windowing, and an understanding of libXtract's order of operations. For instance, to generate MFCCs, it is necessary to generate a magnitude spectrum with one object, then chain its output to a separate MFCC object.
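The frame accumulation shown in Figure 2 can also be sketched outside Pd. The following Python fragment is only an illustration under stated assumptions, not the code of the toolkit's objects: the function names, the 256-sample window, the naive DFT, and the 1200 Hz brightness boundary are all hypothetical choices, and brightness is assumed here to mean the share of spectral magnitude at or above the boundary frequency. MFCCs are left out to keep the sketch short; centroid and brightness stand in for "any combination of features". Starting from a given onset sample (the role bonk~ plays in the patch), it takes 5 snapshots spaced 50 ms apart and concatenates them into one list.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Hann-window a frame and return magnitudes for bins 0..N/2 (naive DFT)."""
    n = len(frame)
    windowed = [x * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        mags.append(abs(acc))
    return mags


def spectral_centroid(mags, sr, n):
    """Magnitude-weighted mean frequency in Hz."""
    total = sum(mags)
    if total == 0.0:
        return 0.0
    return sum(k * sr / n * m for k, m in enumerate(mags)) / total


def spectral_brightness(mags, sr, n, boundary_hz):
    """Share of total magnitude at or above boundary_hz (assumed definition)."""
    total = sum(mags)
    if total == 0.0:
        return 0.0
    high = sum(m for k, m in enumerate(mags) if k * sr / n >= boundary_hz)
    return high / total


def time_evolving_features(signal, sr, onset, n=256, frames=5,
                           hop_ms=50.0, boundary_hz=1200.0):
    """Concatenate per-frame features from `frames` snapshots after `onset`."""
    hop = int(round(sr * hop_ms / 1000.0))
    features = []
    for f in range(frames):
        start = onset + f * hop
        frame = signal[start:start + n]
        if len(frame) < n:
            raise ValueError("signal too short for the requested frames")
        mags = magnitude_spectrum(frame)  # computed once, shared by both features
        features.append(spectral_centroid(mags, sr, n))
        features.append(spectral_brightness(mags, sr, n, boundary_hz))
    return features


# Hypothetical test signal: 500 Hz for the first 0.1 s, then 2000 Hz.
sr = 8000
signal = [math.sin(2.0 * math.pi * (500.0 if t < 800 else 2000.0) * t / sr)
          for t in range(sr // 2)]
feats = time_evolving_features(signal, sr, onset=0)
print(len(feats))  # 5 frames x 2 features = 10 values
```

The resulting list interleaves centroid and brightness frame by frame, so a change in the sound between the first and last snapshot shows up as a change along the list. Note that this sketch computes one spectrum per frame and shares it between the two features; this is a convenience of the example, not a description of how the Pd objects are organized.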
The advantage of libXtract's cascading architecture is that the spectrum calculation occurs only once, yet two or more features can be generated from the results. While timbreID objects are wasteful in this sense (each object redundantly calculates its own spectrum), they are more efficient with respect to downtime. As mentioned above, features are not generated constantly, only when needed. Further, from a user's perspective, timbreID objects require less knowledge about analysis techniques, and strip away layers of patching associated with blocking and windowing. In order to have maximum control over algorithm details, all feature extraction and classification functions were written by the author, and timbreID has no non-standard library dependencies.

Figure 2. Generating a time-evolving feature list.

3. THE CLASSIFICATION OBJECT

Features generated with the objects described in section 2 can be used directly as control information in real-time performance. In order to extend functionality, however, a multipurpose classification external is provided as well. This object, timbreID, functions as a storage and routing mechanism that can cluster and order the features it stores in memory, and classify new features relative to its database. Apart from the examples package described in the following section, an in-depth help patch accompanies timbreID, demonstrating how to provide it with training features and classify new sounds based on training. Figure 3 depicts the most basic network required for this task. Training features go to the first inlet, and features intended for classification go to the second inlet.

Suppose the patch in Figure 3 is to be used for percussive instrument classification. In order to train the system, each instrument