Another key question is how to make these systems usable for musicians. With the exception of [4], the software described in the literature is not publicly available.
Where it is available, it does not interoperate with other
music software in a way that is appealing to musicians,
for example using MIDI or OSC. One commercially available system exists, the Nord Modular G2 patch mutator [10], but it is an interactive GA where the fitness
measure is dictated by the user.
In the present work, both of these questions are addressed. Firstly, Mel Frequency Cepstrum Coefficients
are used as audio feature vectors, forming the core of the
fitness function. The MFCC is considered a suitable measure as it is pitch independent and based on a perceptual
model. It is well established in speech recognition [5] and
more recently it has been used as a measure of musical
similarity [2]. Secondly, a DSP plug-in architecture familiar to many musicians, Steinberg's VST, is used to allow
our system to access any VSTi compatible synthesizer for
the synthesis engine.
The computer music software that has been developed
falls in the category of "composer's assistant".
2. IMPLEMENTATION
2.1. Overview
SynthBot is an unsupervised programmer for software synthesizers. It takes as input a target sound file and a software synthesizer, and returns the set of parameters for the
synthesizer which produce a sound as similar to the target as possible. MFCCs are used to compare sounds in a way that approximates human hearing, and the inverse (reciprocal) of the sum squared error
between the target and candidate MFCCs is used to determine the candidate fitness. The application is primarily
implemented in the Java programming language, allowing
for rapid prototyping and simple GUI development.
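The fitness measure described above can be illustrated with a minimal Python sketch. It is not the SynthBot implementation, which is written in Java and C. It assumes that MFCCs are stored as a list of frames, each frame a list of coefficients, and the names `sum_squared_error`, `fitness` and the small constant `eps` are hypothetical additions for illustration.

```python
def sum_squared_error(target, candidate):
    """Sum of squared differences over all frames and coefficients."""
    if len(target) != len(candidate):
        raise ValueError("target and candidate need the same number of frames")
    total = 0.0
    for t_frame, c_frame in zip(target, candidate):
        if len(t_frame) != len(c_frame):
            raise ValueError("frames need the same number of coefficients")
        total += sum((t - c) ** 2 for t, c in zip(t_frame, c_frame))
    return total


def fitness(target, candidate, eps=1e-12):
    """Reciprocal of the sum squared error between two MFCC sequences.

    eps is an assumption of this sketch (not stated in the text); it avoids
    division by zero when the candidate matches the target exactly.
    """
    return 1.0 / (sum_squared_error(target, candidate) + eps)


if __name__ == "__main__":
    # Toy MFCC data: two frames of two coefficients each.
    target_mfccs = [[1.0, 2.0], [3.0, 4.0]]
    candidate_mfccs = [[1.0, 2.0], [3.0, 6.0]]
    print(fitness(target_mfccs, candidate_mfccs))
```

A candidate whose MFCCs lie closer to the target yields a smaller error and therefore a larger fitness value.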
The first incarnation of SynthBot works only with VSTi
software synthesizers. VSTi plug-ins are written in C++
and compiled natively. They are represented as bundles in
Mac OS X, dynamically linked libraries (DLLs) in Windows, and shared objects (SOs) in UNIX/Linux. The SynthBot interface for VSTi plug-ins is thus necessarily also
written in C++, as the software synthesizer libraries are
only available in that language. A Java Native Interface
(JNI) wrapper exposes native VSTi functionality to Java
and to the bulk of the SynthBot logic. In this way, SynthBot acts as a VST host, able to control and communicate
with VST synthesizers. There is no other known Java-based VST host.
The MFCCs are computed using a custom library called
BoomMFCC, which is also a native library written in the
C programming language and made available to SynthBot
via a JNI interface wherein the FFTW library [6] is used
to compute discrete Fourier transforms. BoomMFCC is
optimized for computing MFCCs in multithreaded environments,
allowing SynthBot to improve performance on
modern multicore processors. BoomMFCC was developed following trials of available MFCC libraries such as
COMIRVA [11] and LibXtract [3]. The use of BoomMFCC allows a performance increase of almost two orders
of magnitude over these packages. Over the course of
development, as various methods were tested, MFCC computation times were reduced from about 250 milliseconds to
roughly 5 milliseconds on a modern machine. Performance figures
depend heavily on MFCC parameters and implementation
details.
2.2. Sound Synthesis
Since the system is compatible with any VSTi plug-in, it
is capable of working with any sound synthesis algorithm
available in this plug-in format. In the evaluations presented here, the freely available mda synthesizer plug-ins
mdaDX10 and mdaJX10 [9] were used. The mdaDX10 is
a single-modulator FM synthesizer with 16 parameters and
the mdaJX10 is a subtractive synthesizer with two tuned oscillators,
one noise oscillator and 40 parameters.
2.3. Parameter Search
In the VSTi standard, each synthesizer parameter is represented as a real number (a float) between zero and
one, inclusive. Modern synthesizers may have hundreds
of parameters, making the search space high dimensional.
Even basic characteristics of the search space, such as continuity or variance, are unknown. A stochastic search algorithm is therefore used to search effectively for the best
parameters. In the present case, this is a genetic algorithm
(GA), though others are possible, such as particle swarm
optimization (PSO) or simulated annealing (SA).
The GA population begins in a random state. Each individual of the population is represented as an array with
length equal to the number of parameters of the synthesizer. Individuals are assessed by loading their parameters into the synthesizer and generating a candidate sound
with the same length as the target by sending a fixed MIDI
note-on message to the synthesizer. The MFCCs of the
candidate are computed and the reciprocal of the squared
distance (sum squared error) to the target MFCCs is used
to characterize its fitness. A fitness-proportional
roulette wheel, as described in [13], is generated and used
to select the individuals who will be able to contribute to
the next generation. Fitter individuals are more likely to
contribute. Crossover occurs between each pair of chosen individuals by exchanging subarrays at a uniformly
randomly chosen crossover point. Each parameter of the
resultant arrays is then mutated by adding a Gaussian random variable with zero mean and variance of 0.05. As
parameters are constrained to between zero and one, such
a mutation will be minor most of the time, while naturally
allowing for larger and rarer deviations.
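One generation of the search described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the SynthBot code: the function names are hypothetical, the `evaluate` callable stands in for rendering a candidate through the synthesizer and scoring its MFCCs, and clamping mutated values back into [0, 1] is an assumption, since the text does not say how out-of-range values are handled.

```python
import math
import random

# Standard deviation matching the stated mutation variance of 0.05.
MUTATION_STD = math.sqrt(0.05)


def random_population(size, n_params, rng):
    """Random initial state: every parameter uniform in [0, 1)."""
    return [[rng.random() for _ in range(n_params)] for _ in range(size)]


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    pick = rng.uniform(0.0, sum(fitnesses))
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return individual
    return population[-1]  # guard against floating-point round-off


def crossover(a, b, rng):
    """Exchange subarrays at a uniformly chosen crossover point."""
    if len(a) < 2:
        return a[:], b[:]
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(individual, rng):
    """Add zero-mean Gaussian noise to each parameter.

    Clamping to [0, 1] is an assumption of this sketch.
    """
    return [min(1.0, max(0.0, p + rng.gauss(0.0, MUTATION_STD)))
            for p in individual]


def next_generation(population, evaluate, rng):
    """Build a new population of the same size from the current one."""
    fitnesses = [evaluate(ind) for ind in population]
    children = []
    while len(children) < len(population):
        a = roulette_select(population, fitnesses, rng)
        b = roulette_select(population, fitnesses, rng)
        for child in crossover(a, b, rng):
            children.append(mutate(child, rng))
    return children[:len(population)]


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = [0.2, 0.8, 0.5, 0.1]  # toy target parameter set

    def toy_evaluate(ind):
        # Stand-in fitness computed on parameters directly; the system
        # described in the text scores MFCCs of rendered audio instead.
        return 1.0 / (1e-9 + sum((p - h) ** 2 for p, h in zip(ind, hidden)))

    population = random_population(20, len(hidden), rng)
    for _ in range(30):
        population = next_generation(population, toy_evaluate, rng)
    print(max(toy_evaluate(ind) for ind in population))
```

The toy fitness in the usage example exists only to make the sketch runnable without a synthesizer; any function returning a non-negative score per individual could be substituted.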
3. EVALUATION
SynthBot has been evaluated in two ways. Firstly, it has
been evaluated technically to assess its search behaviour