A real-time singing voice analysis/synthesis system

Peter Pabon
Royal Conservatory, Institute for Sonology
Juliana van Stolberglaan 1, 2595 CA Den Haag, Holland
aun4nl 1 koncon I

Abstract

This paper presents an interactive computer/software environment for the parametric modelling of the singing voice. The system combines real-time voice synthesis with real-time voice analysis. The synthesizer and the analyser run at the same time on one Motorola DSP 96000 processor. The setup was developed for both performance and pedagogical objectives. The system can be used for choral synthesis, voice recognition, voice transformation and pitch tracking, as well as for singing voice training, voice evaluation, voice classification, research and demonstration.

If the voice is used to control a process, such as sound synthesis, then efficient schemes exist for measuring simple parameters like pitch and level. For these two parameters the relation between percept and physical parameter is rather straightforward, and therefore the "intuitive control" is usually correct, but this is still a very elementary application. As soon as we want vowels or even more complex voice features as controls, we run into problems. More elaborate detection schemes are needed to recognize these qualities efficiently. Moreover, features like formant frequencies are extremely variable. For instance, natural shifts as a function of pitch and loudness, although not perceived as a vowel change, can still represent large changes in the traced formant values. Even if some physically defined parameter is correctly traced, it can still conflict with our intuition or expectation. Usually, a rather precise analysis, that is, an exact modelling of many features together, is needed before efficient controls can be defined.
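As a rough illustration of how the two simple parameters mentioned above, pitch and level, can be measured from one frame of a voice signal, consider the following Python sketch. It is not the detection scheme of the system described here: the autocorrelation method, the function name `estimate_f0_and_level`, the 60 to 1000 Hz search range and the test tone are all assumptions made for illustration, and the level is an uncalibrated RMS value in dB rather than a true SPL.

```python
import math


def estimate_f0_and_level(frame, sample_rate, f0_min=60.0, f0_max=1000.0):
    """Estimate (f0_hz, level_db) for one frame of a voiced signal.

    Hypothetical helper for illustration; not the paper's detection scheme.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove any DC offset

    # Level: RMS in dB relative to an amplitude of 1.0.
    # This is an uncalibrated stand-in for sound pressure level.
    rms = math.sqrt(sum(s * s for s in x) / n)
    level_db = 20.0 * math.log10(max(rms, 1e-12))

    # Pitch: lag of the largest autocorrelation value in the allowed range.
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        value = sum(x[i] * x[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag, level_db


# Usage: a 220 Hz test tone with amplitude 0.5 (illustrative values).
sample_rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 220 * i / sample_rate) for i in range(1024)]
f0_hz, level_db = estimate_f0_and_level(tone, sample_rate)
```

For a clean tone like this the estimate should land close to the true pitch, limited by the integer lag resolution. Real singing needs more care (octave errors, unvoiced frames), which is one reason the more complex voice features discussed here call for more elaborate schemes.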
In general, we first need to be complete before we can be specific; in fact, the efficient interpretation of the output of a musical instrument is synonymous with modelling that instrument.

The analyser extracts a series of voice parameters in real time. It starts with fundamental frequency (F0) and sound pressure level (SPL); next, a series of F0/SPL-related perturbation and noise (time-domain) parameters is calculated, followed by a period-by-period spectral analysis. Together, this ensemble of parameters provides a rather complete summary of the vocal quality.

The synthesizer is of the source-filter type. A four-parameter functional source model is followed by an eight-formant resonance filter. Additional inputs allow noise sources to perturb several stages of the synthesis chain.

In our setup, analyser and synthesizer are closely integrated at all levels. Both the synthesis and the analysis are done on a period-by-period basis. This specific choice gives us the option to transform quality features at the period level, which allows very subtle voice (quality) transformations.

The system has two basic modes: synthesis-after-analysis and analysis-after-synthesis. The first mode supports options like voice transformation and voice recognition. The second mode is implemented to check the system and to improve the model. Due to the direct coupling, the quality of the analyser is directly reflected in that of the synthesizer (and the other way around). Moreover, we can always obtain precise values for the synthesis settings in terms of analysis parameters, and vice versa. This allows the user to experiment with the translation of specific vocal qualities, and the immediate checks also help in the development of a feasible musical performance application.

(This paper will be provided as addenda upon receipt. (editor))

Audio Analysis and Re-Synthesis, p. 356, ICMC Proceedings 1994
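The source-filter structure described in the abstract (a source followed by a bank of formant resonances) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the system's implementation. The abstract does not specify the four source parameters, the filter topology or any formant values, so the sketch uses a plain impulse train as a stand-in source, a cascade of standard two-pole digital resonators, and made-up formant frequencies and bandwidths. All function names are hypothetical.

```python
import math


def resonator_coefficients(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients of a two-pole resonator with unity gain at 0 Hz."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * freq_hz * t))
    a = 1.0 - b - c
    return a, b, c


def apply_resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Filter a signal with y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    a, b, c = resonator_coefficients(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize(f0_hz, formants, duration_s, sample_rate):
    """Impulse-train source followed by a cascade of formant resonators."""
    n = int(round(duration_s * sample_rate))
    period = max(1, int(round(sample_rate / f0_hz)))
    # Stand-in source: one unit impulse per pitch period.
    # This is NOT the four-parameter source model of the paper.
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq_hz, bandwidth_hz in formants:
        signal = apply_resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal


# Eight (frequency, bandwidth) pairs in Hz; illustrative values only.
FORMANTS = [(700, 80), (1100, 90), (2500, 120), (3500, 150),
            (4500, 200), (5500, 250), (6500, 300), (7500, 350)]
voice = synthesize(110.0, FORMANTS, 0.1, 16000)
```

Because the source here is generated one pitch period at a time, the formant list could in principle be changed at each period boundary, which is the kind of period-level control the abstract refers to. The sketch keeps the formants fixed for simplicity.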