Cluster-Weighted Sampling for Synthesis and Cross-Synthesis of Violin Family Instruments

Bernd Schoner, Chuck Cooper*, Neil Gershenfeld
Physics and Media Group, MIT Media Laboratory
20 Ames Street, Cambridge, MA 02139
{schoner,cmc,gersh}@media.mit.edu

Abstract

A digital model of a violin/cello is presented that computes violin sounds in real time given gesture input from a violinist. The model is derived from recorded data using cluster-weighted sampling, a probabilistic inference technique based on kernel density estimation and wavetable synthesis. Novel interface technology is presented that records the gesture input of acoustic instruments and also serves as a standalone controller for digital sound engines.

1 Introduction

In this paper, we show how to synthesize the sound of violin-family instruments from a player's gestural input. The input-output model describes the mapping between physical input time series and a perceptual parameterization of the output time series. It is trained on a recorded data set consisting of the violinist's gesture input, such as bow and left-hand finger position, along with audio output data. The audio signal is reconstructed from the estimated parameterization and from audio samples stored in memory.

The probabilistic network architecture cluster-weighted modeling (CWM) was developed earlier to model, characterize, and predict input-output time series. For the purpose of synthesizing musical signals, we have extended the framework to cluster-weighted sampling (CWS), which generates sound output directly from the sensor input. CWS uses pre-sampled sounds as a basis for function approximation. In the training step, brief sound segments that best represent a given playing situation are selected. In the synthesis step, the segment that is most likely given the current input data is used as sound output.

* Present address: Plangent Systems Corporation, 25 Fairfield St., Newton, MA 02460.
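The synthesis step described above, selecting the stored sound segment that is most likely given the current input, can be sketched as a maximum-posterior lookup over Gaussian input kernels. This is a minimal illustration, not the paper's implementation; all cluster parameters and the gesture features (bow velocity, bow pressure) are hypothetical values chosen for the example.

```python
import numpy as np

# Hypothetical cluster parameters: each cluster carries a mean and
# diagonal variance over a 2-D gesture space (e.g. bow velocity,
# bow pressure) plus an index into a bank of stored audio segments.
cluster_means = np.array([[0.2, 0.1], [0.6, 0.5], [0.9, 0.8]])
cluster_vars = np.array([[0.05, 0.05], [0.05, 0.05], [0.05, 0.05]])
cluster_weights = np.array([0.3, 0.4, 0.3])  # prior cluster probabilities
sample_index = [0, 1, 2]                     # stored audio segment per cluster

def most_likely_segment(x):
    """Return the index of the stored audio segment whose cluster has
    the highest posterior probability given the gesture input x."""
    diff = cluster_means - x
    # Log-likelihood of x under each diagonal-covariance Gaussian kernel
    log_lik = (-0.5 * np.sum(diff**2 / cluster_vars, axis=1)
               - 0.5 * np.sum(np.log(2 * np.pi * cluster_vars), axis=1))
    log_post = log_lik + np.log(cluster_weights)  # unnormalized log-posterior
    return sample_index[int(np.argmax(log_post))]

# A gesture near the second cluster's mean selects that cluster's segment.
print(most_likely_segment(np.array([0.58, 0.52])))  # → 1
```

In the real system the segment would then be streamed to the audio output; pitch and loudness parameters estimated by the model reshape the raw sample, rather than the segment being played back unchanged.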
Our approach to musical synthesis lies conceptually between physical modeling and global wavetable synthesis. We do not model the instrument in terms of first-principles governing equations, but our goal is to build a model that, from the point of view of a listener or player, appears to obey the same physical laws as the acoustic instrument. On the other hand, we choose to represent the audio data as sequences of samples.

2 Previous and related work

[SCDG99] showed how to synthesize violin sounds with a data-driven machine-learning approach. Given a sinusoidal sound representation, the inference technique CWM was used to describe the mapping from real-time player input to a parameterization of the relevant harmonic components of violin sound. The frequency-domain parameters were translated into real-time sound through additive synthesis.

CWM is a mixture density estimator for function approximation. The framework has been discussed in detail in [SCDG99, GSM99]. Here we extend the general framework to create cluster-weighted sampling (CWS), a sampling (wavetable) synthesis approach that uses the probabilistic structure of CWM to allocate and parameterize samples.

Wavetable synthesis has become the predominant synthesis technique for commercial digital instruments [Mas98]. [Roa95] and [Mas98] provide extensive reviews of the fundamental algorithms as well as the ad hoc techniques behind wavetable synthesis. The technique works well for instruments with a low-dimensional control space and without aftertouch, such as the piano. However, the input space for instruments like the violin tends to be too complex to cover all possible outputs with sampled data. In this paper, we demonstrate how to parameterize the audio efficiently and reconstruct sound from samples and estimated parameters.
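The use of CWM as a mixture density estimator for function approximation can be illustrated in one dimension: each cluster carries a Gaussian input kernel and a local model, and the prediction is the posterior-weighted blend of the local models. The sketch below assumes local linear models and hand-picked parameters; it is a toy instance of the idea, not the trained violin model.

```python
import numpy as np

# Two hypothetical clusters, each with a Gaussian kernel in input space
# and a local linear model f_k(x) = a[k] + b[k] * x.
means = np.array([0.0, 1.0])      # cluster centers
variances = np.array([0.1, 0.1])  # input kernel variances
priors = np.array([0.5, 0.5])     # cluster prior probabilities
a = np.array([0.0, 2.0])          # local intercepts
b = np.array([1.0, -1.0])         # local slopes

def predict(x):
    """Blend the local linear models, weighted by the posterior
    probability of each cluster given the input x."""
    lik = (np.exp(-0.5 * (x - means) ** 2 / variances)
           / np.sqrt(2 * np.pi * variances))
    post = priors * lik
    post /= post.sum()                      # normalize to a posterior
    return float(np.sum(post * (a + b * x)))  # expected output

# Near a cluster center, the blend follows that cluster's local model;
# between centers, the output interpolates smoothly.
print(predict(0.0), predict(1.0))
```

CWS keeps this probabilistic machinery but replaces the local parametric output models with pointers to recorded sound segments, which is what makes the approach a sampling technique rather than a purely parametric one.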