SYNTHBOT: AN UNSUPERVISED SOFTWARE SYNTHESIZER PROGRAMMER

Yee-King, Matthew; Roth, Martin

Another key question is how to make these systems usable for musicians. With the exception of [4], the software described in the literature is not publicly available. Where it is available, it does not interoperate with other music software in a way that is appealing to musicians, for example via MIDI or OSC. There is one commercially available system, the Nord Modular G2 patch mutator [10], but this is an interactive GA in which the fitness measure is dictated by the user. The present work addresses both of these questions. Firstly, Mel Frequency Cepstral Coefficients (MFCCs) are used as audio feature vectors, forming the core of the fitness function. The MFCC is considered a suitable measure because it is pitch independent and based on a perceptual model. It is well established in speech recognition [5] and has more recently been used as a measure of musical similarity [2]. Secondly, Steinberg's VST, a DSP plug-in architecture familiar to many musicians, is used to allow our system to access any VSTi-compatible synthesizer as its synthesis engine. The computer music software that has been developed falls into the category of "composer's assistant".

2. IMPLEMENTATION

2.1. Overview

SynthBot is an unsupervised programmer for software synthesizers. It takes as input a target sound file and a software synthesizer, and returns the set of synthesizer parameters that produces a sound as similar to the target as possible. MFCCs are used to compare sounds in a way that approximates human hearing, and the inverse of the sum squared error between the target and candidate MFCCs determines the candidate's fitness. The application is implemented primarily in the Java programming language, allowing for rapid prototyping and simple GUI development. The first incarnation of SynthBot works only with VSTi software synthesizers. VSTi plug-ins are written in C++ and compiled natively.
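The fitness measure named in the overview, the reciprocal of the sum squared error between target and candidate MFCCs, can be sketched in Java. This is a minimal illustration under stated assumptions, not SynthBot's actual code: it assumes the MFCCs of both sounds are already available as frame-by-coefficient `double[][]` arrays of equal shape (the candidate is rendered at the same length as the target), and the class name `MfccFitness` and the small `EPSILON` guard against division by zero are hypothetical additions of ours.

```java
/**
 * Hypothetical sketch of an MFCC-based fitness measure:
 * fitness = 1 / (sum squared error between target and candidate MFCC frames).
 */
final class MfccFitness {

    // Assumed guard so that an exact match does not divide by zero.
    private static final double EPSILON = 1e-12;

    /** Sums squared differences over every frame and every coefficient. */
    static double sumSquaredError(double[][] target, double[][] candidate) {
        if (target.length != candidate.length) {
            throw new IllegalArgumentException("frame counts differ");
        }
        double sse = 0.0;
        for (int f = 0; f < target.length; f++) {
            if (target[f].length != candidate[f].length) {
                throw new IllegalArgumentException("coefficient counts differ in frame " + f);
            }
            for (int c = 0; c < target[f].length; c++) {
                double d = target[f][c] - candidate[f][c];
                sse += d * d;
            }
        }
        return sse;
    }

    /** Reciprocal of the SSE: a closer match yields a larger fitness. */
    static double fitness(double[][] target, double[][] candidate) {
        return 1.0 / (sumSquaredError(target, candidate) + EPSILON);
    }

    public static void main(String[] args) {
        double[][] target = {{1.0, 2.0}, {3.0, 4.0}};
        double[][] close  = {{1.0, 2.0}, {3.0, 5.0}}; // differs by 1 in one coefficient
        double[][] far    = {{0.0, 0.0}, {0.0, 0.0}}; // differs in every coefficient
        System.out.println(sumSquaredError(target, close));                // prints 1.0
        System.out.println(sumSquaredError(target, far));                  // prints 30.0
        System.out.println(fitness(target, close) > fitness(target, far)); // prints true
    }
}
```

Taking the reciprocal turns an error to be minimized into a score to be maximized, which is the form a fitness-proportional selection scheme expects.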
VSTi plug-ins are represented as bundles in Mac OS X, dynamically linked libraries (DLLs) in Windows, and shared objects (SOs) in UNIX/Linux. The SynthBot interface for VSTi plug-ins is therefore also written in C++, as the software synthesizer libraries are only accessible from that language. A Java Native Interface (JNI) wrapper exposes native VSTi functionality to Java and to the bulk of the SynthBot logic. In this way, SynthBot acts as a VST host, able to control and communicate with VST synthesizers. There are no other known Java-based VST hosts. The MFCCs are computed using a custom library called BoomMFCC, which is also a native library written in the C programming language and made available to SynthBot via a JNI interface; it uses the FFTW library [6] to compute discrete Fourier transforms. BoomMFCC is optimized for computing MFCCs in multithreaded environments, allowing SynthBot to take advantage of modern multicore processors. BoomMFCC was developed after trials of available MFCC libraries such as CoMIRVA [11] and LibXtract [3]. Using BoomMFCC increases performance by almost two orders of magnitude over these packages. Over the course of development and testing of various methods, MFCC computation times were reduced from about 250 ms to roughly 5 ms on a modern machine. These performance figures depend heavily on MFCC parameters and implementation details.

2.2. Sound Synthesis

Since the system is compatible with any VSTi plug-in, it can work with any sound synthesis algorithm available in this plug-in format. In the evaluations presented here, the freely available mda synthesizer plug-ins mdaDX10 and mdaJX10 [9] were used. The mdaDX10 is a single-modulator FM synthesizer with 16 parameters, and the mdaJX10 is a subtractive synthesizer with two tuned oscillators, one noise oscillator, and 40 parameters.

2.3. Parameter Search

In the VSTi standard, each synthesizer parameter is represented as a real number (a float) between zero and one, inclusive. Modern synthesizers may have hundreds of parameters, making the search space high-dimensional. Even basic characteristics of the search space, such as continuity or variance, are unknown. A stochastic search algorithm is therefore used to search effectively for the best parameters. In the present case this is a genetic algorithm (GA), though others are possible, such as particle swarm optimization (PSO) or simulated annealing (SA).

The GA population begins in a random state. Each individual in the population is represented as an array whose length equals the number of synthesizer parameters. Individuals are assessed by loading their parameters into the synthesizer and generating a candidate sound of the same length as the target by sending a fixed MIDI note-on message to the synthesizer. The MFCCs of the candidate are computed, and the reciprocal of the squared distance (sum squared error) to the target MFCCs characterizes its fitness. A fitness-proportional roulette wheel, as described in [13], is generated and used to select the individuals that will contribute to the next generation; fitter individuals are more likely to contribute. Crossover occurs between each pair of chosen individuals by exchanging subarrays at a uniformly random crossover point. Each parameter of the resulting arrays is then mutated by adding a Gaussian random variable with zero mean and a variance of 0.05. As parameters are constrained to lie between zero and one, such a mutation will be minor most of the time, while still allowing for larger, rarer deviations.

3. EVALUATION

SynthBot has been evaluated in two ways. Firstly, it has been evaluated technically to assess its search behaviour
