possible within Pd using GEM for two- and three-dimensional
plotting. In the provided example, the axes of the space can
be assigned to a number of different spectral features, including zero-crossing rate, amplitude, frequency, or any of 47 Bark-frequency cepstral coefficients. By editing the analysis sub-patch, additional features can be included. Figure 6 shows speech grains plotted in a space where values of the second and third BFCCs are mapped to the x- and y-axes, respectively. RGB color can be mapped to any available feature as well.
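To make the mapping concrete, the following sketch (written in Python purely as an illustration, not as timbreID's implementation; the feature indices, normalization scheme, and example data are all hypothetical) shows how each grain's feature list can be reduced to x/y coordinates and an RGB color:

# A minimal sketch: map each grain's feature list to plot
# coordinates and a color. Indices and data are hypothetical.

def normalize(v, lo, hi):
    """Scale a raw feature value into the 0..1 plotting range."""
    return (v - lo) / (hi - lo) if hi > lo else 0.5

def map_grains(feature_lists, x_idx, y_idx, rgb_idx):
    """Return (x, y, (r, g, b)) for each grain's feature list."""
    n = len(feature_lists[0])
    # Per-feature min/max over the whole database, for normalization.
    lo = [min(f[i] for f in feature_lists) for i in range(n)]
    hi = [max(f[i] for f in feature_lists) for i in range(n)]
    points = []
    for f in feature_lists:
        x = normalize(f[x_idx], lo[x_idx], hi[x_idx])
        y = normalize(f[y_idx], lo[y_idx], hi[y_idx])
        rgb = tuple(normalize(f[i], lo[i], hi[i]) for i in rgb_idx)
        points.append((x, y, rgb))
    return points

# Hypothetical grains: x/y from the second and third coefficients,
# color from the remaining three.
grains = [[0.2, -1.3, 4.0, 0.5, 2.2, -0.1],
          [0.8,  2.1, 1.5, 0.3, 0.9,  1.7],
          [0.5,  0.4, 2.8, 0.4, 1.5,  0.8]]
print(map_grains(grains, x_idx=1, y_idx=2, rgb_idx=(3, 4, 5)))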
Mousing over a point in the space plays back its corresponding grain, enabling exploration aimed at identifying regions of timbral similarity. The upper-left region of Figure 6 contains a grouping of "sh" sounds, while the lower central region contains a cluster of "k" and "ch" grains. Other phonemes can be located as well. To explore dense regions of the plot, keyboard navigation can be enabled to zoom with respect to either axis (or both simultaneously) and to move up, down, left, or right in the space.
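The mouse-over behavior amounts to a nearest-point lookup in the plotted space. A minimal sketch of such a lookup follows (again Python for illustration; the pick radius and playback stub are hypothetical, not the patch's actual mechanism):

import math

def pick_grain(points, mouse_x, mouse_y, radius=0.02):
    """Return the index of the nearest grain within `radius` of the
    mouse position, or None if nothing is close enough."""
    best, best_d = None, radius
    for i, (x, y, _rgb) in enumerate(points):
        d = math.hypot(x - mouse_x, y - mouse_y)
        if d < best_d:
            best, best_d = i, d
    return best

# Hypothetical plotted points and a mouse position near the second one.
points = [(0.10, 0.20, (1, 0, 0)), (0.50, 0.52, (0, 1, 0))]
idx = pick_grain(points, 0.505, 0.510)
if idx is not None:
    print(f"play grain {idx}")  # stand-in for triggering grain playback

Zooming along one axis can then be treated as rescaling the normalized coordinates before plotting and picking.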
Figure 7. 2400 string grains mapped with respect to amplitude and fundamental frequency.
Figure 7 shows a plot of string sample grains mapped according to RMS amplitude and fundamental frequency. Because the frequencies in this particular sound file fall into
discrete pitch classes, its grains are visibly stratified along
the vertical dimension.
Mapping is achieved by recovering features from timbreID's database with the "featurelist" message, which is sent with a database index indicating which instance to report. The feature list for the specified instance is then sent out of timbreID's fifth outlet and used to determine the instance's position in feature space.
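This exchange can be pictured as a simple request/response: a "featurelist" message carrying an index goes in, and the stored feature list for that instance comes back. The sketch below emulates the pattern in Python (the class and data are hypothetical stand-ins for the timbreID object, not its actual code):

class FeatureDatabase:
    """Stand-in for timbreID's internal instance database."""
    def __init__(self):
        self.instances = []               # one feature list per grain

    def add(self, features):
        self.instances.append(list(features))

    def featurelist(self, index):
        # Analogous to sending a "featurelist <index>" message and
        # receiving the reply from the object's fifth outlet.
        return self.instances[index]

db = FeatureDatabase()
db.add([0.2, -1.3, 4.0])
db.add([0.8, 2.1, 1.5])

features = db.featurelist(1)              # request instance 1
x, y = features[1], features[2]           # chosen axes, as in Figure 6
print(x, y)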
5. CONCLUSION
This paper has introduced some important features of the
timbreID analysis/classification toolkit for Pd, and demonstrated its adaptability to four distinct tasks. Pd external source code, binaries, and the example patches described above are all available for download at the author's website: www.williambrent.com. The remaining patches in the example package (a cepstrogram plotting interface and a percussion classification system that identifies instruments immediately upon attack) were not described. The example patches are simple in some respects and are intended to
be starting points that can be expanded upon by the user.
Future development will focus on adding new features to the set of feature extraction objects, implementing
a kD-tree for fast searching of large databases in order to
make concatenative synthesis more efficient, and developing
strategies for processing multiple-frame features of different
lengths in order to compare sounds of various durations.