Analysis-by-Performance: Gesturally-Controlled Voice Synthesis as an Input for Modelling of Vibrato in Singing
Proceedings of the International Computer Music Conference 2011, University of Huddersfield, UK, 31 July - 5 August 2011

Nicolas d'Alessandro
Media and Graphics Interdisciplinary Centre, University of British Columbia, Vancouver, BC
nda@magic.ubc.ca

Thierry Dutoit
Institute for New Media Art Technology, University of Mons, Belgium
thierry.dutoit@umons.ac.be

Christophe Ooge
Institute for New Media Art Technology, University of Mons, Belgium
christophe.ooge@umons.ac.be

Sidney Fels
Media and Graphics Interdisciplinary Centre, University of British Columbia, Vancouver, BC
ssfels@ece.ubc.ca

ABSTRACT

In this paper we introduce Analysis-by-Performance, a new methodology for addressing several signal modelling issues. This approach studies the gestural behaviour of a performer while he/she imitates a given sound effect with an appropriate digital musical instrument. New insights observed in the performing gestures eventually lead to a new sound production model. The Analysis-by-Performance technique is applied to the study of vibrato in singing. Indeed, for several years the HANDSKETCH digital instrument has given performers the ability to imitate vibrato in singing in a highly natural and expressive way. Results from gestural analysis of the HANDSKETCH practice are presented and a new vibrato model based on glottal flow parameters is proposed.

1. INTRODUCTION

As with any other model-based engineering application, parametric sound synthesis faces two main issues, related to modelling and parameter estimation. Modelling typically implies making assumptions about the sound production mechanism. Estimating the parameters of the model requires tuning algorithms to match the output of the model to recorded waveforms in the time and spectral domains. Then comes the classical design trade-off between modelling and estimation errors: the finer the model, the higher the number of parameters to be tuned, and the greater the chance of failing to estimate these parameters correctly.

Voice modelling is a particularly critical case, in which appropriate tuning of the sound production model makes a huge difference to synthesis results. Indeed, vocal sounds result from a complex coupling between mechanical and aerodynamical oscillations [8]. Most of the actual laryngeal behaviour is still very difficult to observe, both on the vocal apparatus [11] and in recorded audio signals [3]. Consequently, voice modelling has been addressed in two very different ways. On the one hand, spectral models and concatenative synthesis produce high-quality sounds but are usually detached from any refined laryngeal description, making the integration of expressive control difficult. On the other hand, articulatory synthesis gives full access to physiological properties, but tuning complexity has a significant impact on the overall synthesis quality.

In this paper we introduce a new approach to signal modelling, called Analysis-by-Performance. We present it as the extension of the well-known Analysis-by-Synthesis technique [1] to a performative level. If a performer can convincingly imitate a given sound effect on an appropriate digital musical instrument, we assume that observing the performed gestures brings new insights into the studied sound effect. These new insights eventually lead to proposing a new production model. We explain this new approach in Section 2.

In order to assess our methodology, we take the modelling of vibrato in singing as a case study. Vibrato is a sustained vibrating quality of the sound encountered in a large number of musical instruments. In singing, vibrato is achieved by a complex laryngeal modulation [2].
Consequently, the fine tuning of vibrato in singing synthesis suffers from the above-mentioned voice modelling issues. Section 3 gives a more detailed background on the modelling of vibrato in singing and discusses modelling issues.

The Analysis-by-Performance concept is motivated by the natural and expressive vibrato imitations that can be achieved on the HANDSKETCH musical instrument, a digital device for performing singing synthesis [6]. Although the HANDSKETCH voice synthesizer is based on a simple source-filter model, the acquired skills of the performer allow him to produce a high-quality vibrato effect. The HANDSKETCH musical instrument and its imitative practice are described in Section 4. The impact of these performing gestures on low-level glottal source parameters is then observed, leading to a new production model for vibrato in singing, as presented in Section 5.
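As a point of reference for the vibrato case study introduced above, the following minimal sketch generates a fundamental-frequency contour with a purely sinusoidal modulation. This is a common simplified baseline and not the glottal-flow-based model proposed in this paper. The function name `vibrato_f0`, its parameters, and the numeric values (base pitch, rate, extent, frame rate) are illustrative assumptions and do not come from the paper.

```python
import math


def vibrato_f0(base_hz, rate_hz, extent_cents, duration_s, frame_rate_hz=200):
    """Return a list of F0 values (Hz) with a sinusoidal vibrato.

    Hypothetical helper for illustration only.
    extent_cents is the peak deviation from base_hz, expressed in cents.
    """
    n_frames = int(round(duration_s * frame_rate_hz))
    contour = []
    for i in range(n_frames):
        t = i / frame_rate_hz  # time of this frame, in seconds
        # Sinusoidal pitch deviation, in cents, around the base pitch.
        deviation_cents = extent_cents * math.sin(2.0 * math.pi * rate_hz * t)
        # Convert the deviation from cents to a frequency ratio.
        contour.append(base_hz * 2.0 ** (deviation_cents / 1200.0))
    return contour


# Assumed example values: 220 Hz base pitch, 5.5 Hz rate, 50 cents peak extent.
contour = vibrato_f0(base_hz=220.0, rate_hz=5.5, extent_cents=50.0, duration_s=1.0)
```

Such a contour modulates pitch only. It leaves every other source parameter fixed, which is the kind of simplification that a model based on glottal flow parameters is meant to go beyond.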