Cluster-Weighted Sampling for Synthesis and Cross-Synthesis of Violin
Family Instruments
Bernd Schoner, Chuck Cooper*, Neil Gershenfeld
Physics and Media Group
MIT Media Laboratory
20 Ames Street
Cambridge, MA 02139
{schoner,cmc,gersh}@media.mit.edu
Abstract
A digital model of a violin/cello is presented
that computes violin sounds in real time given
gesture input from a violinist. The model
is derived from recorded data using cluster-weighted sampling, a probabilistic inference
technique based on kernel density estimation
and wavetable synthesis. Novel interface technology is presented that records the gesture input of
acoustic instruments and also serves as a stand-alone controller for digital sound engines.
1 Introduction
In this paper, we show how to synthesize the sound of
violin-family instruments from a player's gestural input. The input-output model describes the mapping
between the physical input time series and a perceptual parameterization of the output time series. It is trained
on a recorded data set consisting of the violinist's gesture input, such as bow and left-hand finger position, along with the audio output. The audio signal is
reconstructed from the estimated parameterization and
from audio samples stored in memory.
The probabilistic network architecture cluster-weighted modeling (CWM) was developed earlier to
model, characterize, and predict input-output time series. For the purpose of synthesizing musical signals, we
have extended the framework to cluster-weighted sampling (CWS), which generates sound output directly
from the sensor input. CWS uses pre-sampled sounds
as a basis for function approximation. In the training
step, brief sound segments that best represent a certain
playing situation are selected. In the synthesis step, the
segment that is most likely given the current input data
is used as sound output.
*Present address: Plangent Systems Corporation, 25 Fairfield
St., Newton, MA 02460.
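The synthesis step described above amounts to picking, for each new input frame, the stored segment whose cluster best explains the current gesture data. A minimal sketch of that selection under simplifying assumptions (diagonal Gaussian clusters, one segment per cluster; the actual cluster parameters in CWS are learned from the training data):

```python
import numpy as np

def select_segment(x, means, variances, weights, segments):
    """Return the stored sound segment whose cluster is most likely
    given the current gesture input x. Hypothetical sketch of the
    CWS selection step: each cluster has a diagonal Gaussian input
    domain and an associated pre-sampled segment."""
    diff = x - means                                  # (K, D)
    # log of weight_k * N(x; mu_k, var_k) per cluster
    log_lik = (-0.5 * np.sum(diff**2 / variances, axis=1)
               - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
               + np.log(weights))
    return segments[int(np.argmax(log_lik))]
```

In practice the segments would be short arrays of audio samples rather than labels, and consecutive selections would be cross-faded to avoid discontinuities.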
Our approach to musical synthesis lies conceptually
between physical modeling and global wavetable synthesis. We do not model the instrument in terms of
first-principles governing equations, but our goal is to
build a model that, from the point of view of a listener
or player, appears to obey the same physical laws as
the acoustic instrument. At the same time, we choose
to represent the audio data as sequences of samples.
2 Previous and related work
[SCDG99] showed how to synthesize violin sounds using
a data-driven machine-learning approach. Given a sinusoidal sound representation, the inference technique
CWM was used to describe the mapping from real-time
player input to a parameterization of the relevant harmonic
components of the violin sound. The frequency-domain parameters were translated into real-time sound through
additive synthesis.
CWM is a mixture density estimator for function approximation. The framework has been discussed in detail in [SCDG99, GSM99]. Here we extend the general
framework to create cluster-weighted sampling (CWS),
a sampling (wavetable) synthesis approach that uses
the probabilistic structure of CWM to allocate and parameterize samples.
Wavetable synthesis has become the predominant
synthesis technique for commercial digital instruments
[Mas98]. [Roa95] and [Mas98] provide extensive reviews
of the fundamental algorithms as well as the ad hoc
techniques behind wavetable synthesis. The technique
works well for instruments with a low-dimensional control space and without aftertouch, such as the piano.
However, the input space for instruments like the violin
tends to be too complex to cover all the possible outputs with sampled data. In this paper, we demonstrate
how to parameterize the audio efficiently and reconstruct sound from samples and estimated parameters.
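The elementary operation underlying all of the wavetable techniques reviewed in [Roa95, Mas98] is reading a stored table at an arbitrary rate. A minimal sketch of such an oscillator with linear interpolation (a generic building block, not the specific reconstruction procedure of this paper):

```python
import numpy as np

def wavetable_osc(table, freq, sr, n):
    """Generate n samples by reading a single-cycle wavetable at
    frequency freq (Hz) for sample rate sr, using linear
    interpolation between adjacent table entries."""
    size = len(table)
    # fractional read position advances by freq*size/sr per sample
    phase = (freq * size / sr) * np.arange(n)
    i0 = np.floor(phase).astype(int) % size
    i1 = (i0 + 1) % size
    frac = phase - np.floor(phase)
    return (1 - frac) * table[i0] + frac * table[i1]
```

Pitch-shifting a sample by varying the read rate in this way also changes its timbre, which is one reason a purely sample-based approach breaks down for instruments with continuous control, and why the estimated parameterization is needed on top of the stored samples.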