A NEW DIGITAL SYSTEM FOR SINGING SYNTHESIS ALLOWING EXPRESSIVE CONTROL

Gunilla Carlsson, Sten Ternström, Johan Sundberg & Tamas Ungvary
Dept. of Speech Communication and Music Acoustics, KTH, Box 700 14, S-100 44 Stockholm, Sweden

Abstract

An analog singing synthesizer, MUSSE, an important tool for research on singing, was built in our department in 1976. Recently we have completed the construction of a new, digital version of MUSSE. The synthesizer can be controlled either by on-line controllers or from a score-file complemented with lyrics, chords, and phrase markers. The score-file is processed by a system of context-dependent pronunciation rules producing reasonably natural-sounding consonants and vowels. In addition, the system contains a set of rules providing expressive deviations from the score depending on the musical context. MUSSE can also be controlled interactively from the computer console, using graphical emulations of a control panel and a musical keyboard, or from external MIDI devices. A Sentograph ad modum Clynes, i.e., a touch-controlled input device sensitive to force in three dimensions, has proven particularly useful for attaining expressive sound effects.

Introduction

Singing synthesis is an important tool for examining the perceptual relevance of acoustic characteristics of musical sounds such as the singing voice. Over the years, an analog singing synthesizer, MUSSE (Music and Singing Synthesis Equipment), has proven very useful in our research [1],[2]. The first, analog synthesizer has recently been succeeded by a new digital MUSSE. This synthesizer resides in a portable laptop computer, and the real-time computation of the sound wave is performed by a TMS320C30 floating-point signal processor. Digital filters are used to model the resonances of the vocal tract, the so-called formants.
The new singing synthesis model combines experience from the previous analog singing synthesizer with the speech synthesis model constructed by the speech group in our department [3]. The MUSSE synthesizer can be controlled interactively from a graphical panel or keyboard on the computer screen, from external MIDI devices such as the Sentograph, or from a score-file processed by pronunciation and performance rules.
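The paper does not give the filter equations behind the formant modeling mentioned above, but a common textbook realization of a digital formant filter is the second-order recursive resonator used in Klatt-type synthesizers. The sketch below (with illustrative formant frequencies and bandwidths, not MUSSE's actual parameter values) shows how a cascade of such resonators could implement a serial vowel branch:

```python
import cmath
import math

def klatt_resonator(freq_hz, bw_hz, fs):
    """Coefficients of a second-order recursive resonator,
    y[n] = A*x[n] + B*y[n-1] + C*y[n-2], normalized to unity gain at DC."""
    r = math.exp(-math.pi * bw_hz / fs)            # pole radius from bandwidth
    B = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs)
    C = -r * r
    A = 1.0 - B - C                                # unity gain at 0 Hz
    return A, B, C

def filter_signal(coeffs, x):
    """Run one resonator over a sample sequence."""
    A, B, C = coeffs
    y, y1, y2 = [], 0.0, 0.0
    for s in x:
        out = A * s + B * y1 + C * y2
        y.append(out)
        y1, y2 = out, y1
    return y

def cascade(formants, x, fs):
    """Serial formant branch: pass the source through each resonator in turn."""
    for freq, bw in formants:
        x = filter_signal(klatt_resonator(freq, bw, fs), x)
    return x

def gain_at(coeffs, freq_hz, fs):
    """Magnitude response of one resonator at a given frequency."""
    A, B, C = coeffs
    z = cmath.exp(-2j * math.pi * freq_hz / fs)
    return abs(A / (1.0 - B * z - C * z * z))
```

For example, `cascade([(700.0, 130.0), (1100.0, 110.0)], pulses, 16000)` would impose two /a/-like formants on a glottal-pulse sequence; a full vowel branch would chain seven such filters.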

Fig. 1 The singing synthesis model for the new digital MUSSE synthesizer: a glottal pulse source feeds a vowel branch (upper) with 7 serial formant filters, and a noise source feeds a fricative branch (lower) with two noise filters for consonants. The shape of the glottal pulses can be controlled.

Interactive Control

Individual synthesis control parameters can be changed manually from a panel on the computer screen, programmed in the Microsoft Windows environment. There is also a facility for exploring any two parameters pairwise in a 2D diagram. An interesting use of the 2D diagram is to change vowel quality by altering the first two formants. By changing the upper formants, the personal voice quality can be altered. For instance, the center frequency of formants 3, 4 and 5 (the so-called singer's formant) is a useful indicator of the voice timbre [4]. For the vibrato, a sinusoidal waveform is used by default, but any other waveform is applicable. Both the amplitude and the frequency of the vibrato can be controlled. Measurements on nonprofessional singers have revealed small random changes in frequency, called flutter [5]. Such frequency instability is also modeled, for synthesis of non-operatic singing. The idea of letting the touch-sensitive Sentograph control the MUSSE synthesizer was introduced by Tamas Ungvary. The original Clynes design [6], which senses the horizontal and vertical components of finger force (often referred to as "touch" or "pressure" in keyboard jargon), has been improved at the Dept. of Psychology, Uppsala University, Sweden. The new design allows independent encoding of finger force in all three dimensions. The three output voltages of the Sentograph are patched into a Fadermaster [7], a control device with eight sliders, modified so as to convert the input voltages into MIDI signals.
The modified device transmits MIDI data when the sliders are moved and when it receives a changing input voltage from the connected Sentograph. The transmitted data can control any parameter of the singing synthesis. The effects can be very expressive.

Rule Control

For synthesizing a song with lyrics, a great number of control parameters must be
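The paper does not specify how the modified Fadermaster encodes its input voltages as MIDI. As an illustration of the general idea only, the sketch below maps three normalized Sentograph force components onto MIDI control-change messages; the controller numbers, channel, and 0-1 force normalization are assumptions made for the example, not the actual device's mapping:

```python
def force_to_cc(force, channel=0, cc_numbers=(1, 2, 3)):
    """Encode three normalized force components (0.0-1.0) as three
    MIDI control-change messages (status byte 0xBn, controller, value).
    The controller numbers and normalization are illustrative assumptions."""
    messages = []
    for cc, value in zip(cc_numbers, force):
        value = min(max(value, 0.0), 1.0)       # clamp to the assumed sensor range
        data = round(value * 127)               # scale to the 7-bit MIDI data byte
        status = 0xB0 | (channel & 0x0F)        # control-change status byte
        messages.append(bytes([status, cc, data]))
    return messages
```

A receiver listening on those controllers could then route each force axis to any synthesis parameter, e.g. vibrato amplitude, loudness, or a formant frequency.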

changed continuously. This would be impossible to achieve by hand. Instead, notes, lyrics, chords, and phrase markers can be written into a score-file, which is processed by a system of rules [8],[9]. The resulting control parameters are fed to the MUSSE synthesizer. The tool for developing this system results from fruitful cooperation with Björn Granström and Rolf Carlson of the speech group at our department, who adapted their text-to-speech conversion environment, RULSYS, to a note-to-tone system for music [10]. The rules are context-dependent, thus enabling coarticulation. We have used analysis-by-synthesis to develop a rule system producing expressive deviations in synthesized performance. The rules have been formulated in collaboration with Prof. Lars Frydén, a professional violinist. Thus, he has taught MUSSE to sing more musically, in much the same way that he would teach a musician. The rules not only modify nominal values of properties such as note duration and amplitude but also, in the case of pronunciation rules, change formant frequencies.

Fig. 2 Note-to-tone conversion in singing synthesis using rules: a score (words and music) is processed by the rule system (RULSYS) into control parameters for the MUSSE synthesizer, which produces the singing voice.

Choir Synthesis

Even though RULSYS and the MUSSE synthesizer are monophonic, they may also be used to explore aspects of choral sounds, by mixing several synthetic voices. An experiment was recently made in which choral experts were asked to adjust pitch and formant scatter in choir sounds to their preferred and tolerated levels [11]. In order to synthesize polyphonic singing with lyrics, the score is passed through two rule systems. First, musical rules and part synchronization are applied by the RULLE program [12]. RULLE contains no phonetic rules but supports polyphony and can generate score files for input to RULSYS.
Then RULSYS, stripped of musical rules, applies phonetic rules to the text, with timing constrained to the values computed by RULLE. The individual voices are synthesized one at a time and finally mixed into an ensemble.
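The pitch-scatter manipulation studied in [11] can be illustrated with a toy mixing sketch. Real MUSSE voices are full formant syntheses; in the example below, plain detuned sinusoids stand in for the individual voices, and the Gaussian detuning model, voice count, and scatter magnitude are assumptions made purely for illustration:

```python
import math
import random

def choir_mix(f0_hz, n_voices, scatter_cents, duration_s=0.5, fs=16000, seed=1):
    """Mix several randomly detuned sine 'voices' into a unison 'choir'.
    Each voice gets a Gaussian pitch offset (std. dev. scatter_cents)
    and a random phase; the mix is normalized by the number of voices."""
    rng = random.Random(seed)
    n = int(duration_s * fs)
    mix = [0.0] * n
    for _ in range(n_voices):
        cents = rng.gauss(0.0, scatter_cents)    # per-voice detuning in cents
        f = f0_hz * 2.0 ** (cents / 1200.0)      # cents -> frequency ratio
        phase = rng.uniform(0.0, 2.0 * math.pi)
        for i in range(n):
            mix[i] += math.sin(2.0 * math.pi * f * i / fs + phase)
    return [s / n_voices for s in mix]           # keep the mix in [-1, 1]
```

Varying `scatter_cents` over a listening session is the kind of adjustment the choral experts made when setting their preferred and tolerated scatter levels; a scatter of zero collapses the ensemble toward a single phasey voice.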

Acknowledgment

This work has been supported by the Bank of Sweden Tercentenary Foundation.

References

[1] Sundberg, J. (1987): The Science of the Singing Voice. DeKalb, Illinois: Northern Illinois University Press.
[2] Larsson, B. (1977): "Music and singing synthesis equipment (MUSSE)." Speech Transmission Laboratory, Quarterly Progress and Status Report, 1/1977, pp. 38-40.
[3] Carlsson, G. & Neovius, L. (1990): "Implementations of synthesis models for speech and singing." Speech Transmission Laboratory, Quarterly Progress and Status Report, 2-3/1990, pp. 63-67.
[4] Dmitriev, L. & Kiselev, A. (1979): "Relationship between the Formant Structure of Different Types of Singing Voices and the Dimensions of Supraglottic Cavities." Folia Phoniatrica 31, pp. 238-241.
[5] Ternström, S. & Friberg, A. (1989): "Analysis and simulation of small variations in the fundamental frequency of sustained vowels." Speech Transmission Laboratory, Quarterly Progress and Status Report, 3/1989, pp. 1-14.
[6] Clynes, M. (1980): "The Communication of Emotion: Theory of Sentics." In: Emotion, Theory, Research and Experience. Volume 1: Theories of Emotion, R. Plutchik & H. Kellerman (eds.), New York: Academic Press.
[7] Lennard, V. (1989): "Fadermaster." Music Technology, October 1989.
[8] Sundberg, J. (1989): "Synthesis of Singing by Rule." In: Mathews, M. & Pierce, J. (eds.), Current Directions in Computer Music Research, MIT Press, Massachusetts.
[9] Carlsson, G. (1988): "The KTH program for synthesis of singing." Thesis work at Dept. of Speech Communication and Music Acoustics, Royal Institute of Technology, Stockholm, Sweden.
[10] Carlson, R. & Granström, B. (1975): "A phonetically oriented programming language for rule description of speech." In: Speech Communication (G. Fant, ed.), Vol. 2, pp. 245-253. Stockholm: Almqvist & Wiksell.
[11] Ternström, S. (1991): "Subjective evaluation of voice scatter in unison choir sounds." Speech Transmission Laboratory, Quarterly Progress and Status Report (forthcoming).
[12] Friberg, A. (1991): "Generative Rules for Music Performance: A formal description of a rule system." Computer Music Journal 15/2, pp. 56-71.