Trends on the synthesis of the singing voice: technical problems and perspectives

Anastasia Georgaki
Music Department, School of Philosophy
National and Kapodistrian University of Athens
georgaki@music.uoa.gr

ABSTRACT
In the first section of this paper we review the various projects on the synthesis of the singing voice, as well as new commercial software synthesizers. In the second section we present the most common technical problems in the synthesis and control of the singing voice, and in the last part we describe the features and functions of a "cognitive" vocal synthesizer that could be accessible to composers, musicians and musicologists as a creative tool for composing, performing, and interpreting forgotten voice techniques of various cultures and bygone times.

1. Introduction
"I am now occupied in finding the way to make organ pipes pronounce syllables. I have already come upon the vowels a, e, o and u, but this gives me a great deal of trouble; and then I found the syllables ve and fl. I do not know whether I shall have the leisure to find the other consonants, because of the various experiments that must be made on this subject, which are costly; I shall leave the rest to those who wish to go further." (Marin Mersenne, Musurgia Universalis)

Since late antiquity, scientists have been interested in the conception of automata such as talking machines, focusing on the speaking voice and its uses; the voice has always been the privileged model of automatophons [Tubach J.P., 1989; Georgaki, 1998a]. In the last twenty years there has been special research interest in the synthesis of the singing voice. The design of synthesizers has been quite fruitful, thanks to the exploitation of data extracted from specialized analyses of the singing voice, and to a considerable effort by scientists to separate the speaking voice from the singing voice by focusing on their acoustical and cultural differences: frequency, displacement of the formants, vibrato, attack, spectral envelope, etc. [Benade, 1986; Fant, 1973; Sundberg, 1989].

In this article we would like, on the one hand, to briefly present and evaluate current research in various institutes on the synthesis of the singing voice, in order to outline the technical problems and the present orientation of research; on the other hand, we would like to propose our ideas about the features of a vocal synthesizer that could be used in various domains of music research, creation and performance.

2. From MUSSE to VOCALOID
Since 1980 various projects have been carried out all over the world focusing on the synthesis of the singing voice, with English as the principal language² [Chowning, 1981; Sundberg, 1989; Rodet et al., 1984, 1985, 1987, 1988; Gael, 1990; Rossitier et al., 1994; Berndtsson, 1995; Carlsson et al., 1991; Tisato et al., 1991; Cook, 1989, 1993; Depalle et al., 1994; Paabon, 1994; Lomax, 1996; Pierucci et al., 1997; Macon, 1997; Gibson, 1998; Meron, 1999; H.-L. Lu, 2002; Bonada and Loscos, 2003; Yoongmoo, 2003]. In fact, every project has its own goals and directions.
Among the many projects that use a multitude of techniques and rules for the analysis and synthesis of the singing voice, we have selected for this presentation those that take a comprehensive view of singing synthesis (so that we can speak of 'models' that meet the expectations of a synthesizer not only from the acoustic point of view but also from the phonetic one). Based on our research, we concluded that the most complete vocal synthesizers in this domain are MUSSE/RULSUS (Sweden, J. Sundberg), CHANT (France, X. Rodet) and SPASM/SINGER (United States, P. Cook), for reasons such as their original analysis/synthesis technique, their appropriate rules for singing, their interface for musical applications, their performability and usefulness to users, and their prospects [Georgaki, 1998a].

² The languages that have been studied in singing synthesis are English, French, Swedish, Italian, Spanish and Japanese.

Proceedings ICMC 2004

In order to outline the current research on singing synthesis, we would like to make some remarks on the advantages and status of each synthesizer, along with its applicability in contemporary music:

a) One of the most complete "cognitive" synthesizers is MUSSE/RULSUS [Sundberg, 1987; 1989], as it is equipped with many rules describing classical singing. It allows users to produce their own sung phonemes, words and phrases of reasonably good quality, with room for vocal expressivity [Gael, 1990; Carlsson, 1991; Berndtsson, 1995], and it is also used for the synthesis of polyphonic choral singing. One of its assets is the ability to control the fundamental frequency and the formants with more speed and precision than human singers, but its control interface still needs improvement before it becomes accessible to composers and musicians.

b) Other research projects, like CHANT [Rodet et al., 1985, 1995], have been oriented towards more artistic applications and equipped with a suitable interface and environment, in order to offer software tools to composers or to reconstruct vanished voices of the past, such as that of the castrato [Depalle et al., 1994]. More precisely, CHANT [Rodet et al., 1984] is a program designed for compositional rather than scientific needs, as it stresses the continuity between sound processing and synthesis by filtering synthetic or real voices and instruments. In recent years, research on concatenating the high-quality vowels obtained with CHANT through an ABS/OLA³ system (Rodet, 2002) has suggested also concatenating transition and consonant units.

c) The SPASM/Singer model [Cook, 1993] is more advantageous than the others because of its control environment: through a user-friendly interface (the shape of the vocal tract displayed on the screen), it offers the user subtle control over parameters of the vocal signal that relate directly to vocal pedagogy and the physiology of speech (position of the tongue or lips, shape of the vocal tract, vocal effort, etc.). Lastly, SPASM/Singer can be controlled by a special interface, SqueezeVox [Cook, 2000], which brings the control of singing synthesis closer to musicians' abilities.

d) Despite the importance of these research projects, the majority are based on the analysis and synthesis of the classical singing technique, with some exceptions such as [Tisato, 1991; Rodet, 1985; Kamarotos et al., 1994], who have studied extra-European and traditional voice techniques (diphonic singing, Tibetan singing, or traditional Greek singing).

e) New projects carried out in academic institutes in recent years propose new, less laborious approaches to singing synthesis⁴ and promote score-to-singing synthesis as the future of singing software development [Lomax, 1996; Macon, 1997; Meron, 1999; H.-L. Lu, 2002; Bonada and Loscos, 2003; Yoongmoo, 2003]. The first commercial score-to-singing software synthesizers appeared only recently, from 2003 onward (VOCALOID⁵ by Yamaha and CANTOR⁶ by Virsyn), and promise to inaugurate a new era.

In any case, all these projects, and especially the models we have discussed at greater length, differ not only in their synthesis technique or implemented rules (describing singing) but also in their control interface and resulting sound (every model has its own particular voice signature).
They also differ in their performability and in their applications in the computer music field (composition, psychoacoustics, vocal training and education, or use as a powerful performance tool). Some acoustic examples will give evidence for these observations.

³ Analysis-by-Synthesis/Overlap-Add.
⁴ These new techniques are mostly based on concatenation: sampled synthesis or sinusoidal models with a MIDI output.
⁵ VOCALOID uses Frequency-Domain Singing Articulation Splicing and Shaping, a singing-voice synthesis system developed by Yamaha. With this system, the "singing articulations" needed to reproduce vocals (collections of voice snippets, such as phrases, and of vocal-expression variations such as vibrato) are collected from custom-produced recordings of accomplished singers and stored in a database after conversion into the frequency domain.
⁶ At the Frankfurt Musikmesse/Prolight+Sound 2004, Virsyn presented CANTOR (Mac/Win, http://www.kvrvst.com/get/984.html), a new 8-part vocal synthesizer that lets users enter words in English and play them melodically from a MIDI keyboard in real time. According to the manufacturer, CANTOR's Voice editor lets the user edit the character of the virtual singer by defining the base spectrum for vowels and consonants. The application also includes a Phoneme editor and offers real-time control over vibrato rate and depth as well as over the gender of the singing voice.

3. Technical problems in encoding the expressive singing voice
How difficult is it to imitate the human "psyche"? (Georgaki, 1998). The unique acoustic qualities that result from a combination of innate physical factors and expressive characteristics of performance reflect our vocal identity. Although all voices, regardless of language, can produce the common sounds necessary for understanding and communication, each voice possesses distinctive features independent of phonemes and words. These phonemes, together with their concatenation, articulation, prosody and variation, form a particular acoustic signal that differs from person to person. The new synthesizer VOCALOID offers three singers, Lola, Miriam and Leon. In the future it will be enriched with more singers, but will these non-human voices ever attain the flexibility of a real voice? What are the obstacles to synthesizing natural vocal sounds and concatenating them smoothly?

3.1. Vowels, consonants and singing phrases
The first and most basic technical problems are related to the complexity of the vocal signal, and more specifically to:
a) the huge quantity of parameters and data describing a complex singing-voice model, combined with the inability of machines to process them satisfactorily (for example, full coverage of the French language requires sampling about 2200 sound units for a single voice type) [Depalle, 1994];⁷
b) the specificity of the voice with respect to the biological functions of the human body (organic and psychological), which affect the timbre, intensity and articulation of the voice. For example, random microvariations due to stress or other factors influence the periodicity of the vocal signal [Fonagy, 1987; Rodet, 1994];
c) the fact that every language has its own phonetic rules and phonemes, which makes the creation of an international phonetic database very difficult (a large vocabulary of phonemes and diphones in several languages, with interdisciplinary connections between them) and, for obvious reasons, hinders the commercialisation of a vocal synthesizer.

⁷ This is one of the major reasons for distinguishing the vocal signal from an ordinary instrumental signal, as the voice is closely connected to the human being, not only from the physiological point of view but also from the acoustic one. For other acoustic signals it is not necessary to describe the formant trajectories or the microvariations of the signal in the same detail (in the case of the voice, these are very closely related to the biological function of the vocal apparatus).

3.1.1. From formant synthesis to the concatenation method
Apart from these general technical problems, which relate to the conception of a 'universal'⁸ synthesizer, let us focus on current scientific research and its problems.
a) Researchers are still experimenting with several analysis-synthesis methods in order to find the combination of techniques that best describes the vocal signal during singing and can be adapted to musical applications (Rodet, 2002). The analysis-synthesis techniques for the vocal signal have not yet been refined: in some cases researchers develop new techniques addressed to particular features of the vocal signal, and in others they use existing classical sound synthesis techniques. For example, one spectral-envelope estimation technique, although it allows the precise extraction of the high-frequency formants of female voices, poses many difficulties at the analysis stage [X. Rodet, 1992]. On the other hand, we are confronted with the non-linearity of the vocal signal [P. Cook, 1996].
b) The second technical problem is the insufficiency of the data extracted by analysis concerning the nature of the vocal signal and the behaviour of the vocal cords and resonators; more specifically, in the case of physical modelling, data on the dynamic movement of the tongue and articulators, the airflow in the vocal tract (interior shape of the vocal tract, tongue and articulators, vocal cords), or the facial muscles that support expression during singing [Cook, 1996].
c) The unit selection and concatenation method⁹ is a promising new method based on a large database of recordings of a singing voice, segmented into units of one to several successive phones. For a given score with its lyrics, a sequence of phones aligned on notes must be produced, along with the specific characteristics of the performance (pitch, duration, timbre, etc.) (Rodet, 2002).

⁸ In the words of Ferruccio Busoni (L'esthétique musicale, Minerva, Paris, 1989): "Let us praise the innovators and the liberators, however slight their power. For what machine, that men might discover and set working, could make thousands of voices resound? Where is, or will there ever be, the technique that could play the thousand registers of the universal organ?"
⁹ This method was first applied to speech synthesis.

3.1.2. The singing vowels and consonants

After a detailed study of several vocal models (or synthesis techniques), especially SPASM, CHANT and MUSSE [Georgaki, 1998a], we can conclude that:
a) the synthesis of vowels is very satisfactory (especially at intermediate pitches), even if we sometimes perceive a synthetic quality due to the formant trajectories. The only problem still to be solved for vowels is the dynamic simulation of the signal and, especially, the control of the transition from one vowel to another.
b) The synthesis of consonants, static or dynamic, remains an open problem because of the complexity of the signal itself. Consonants are signals that evolve very quickly in time, and the appropriate tools for analysing them have not yet been perfected. The notion of formant trajectory cannot be preserved for the control of a consonant; the only solution is to extend, during the emission of a consonant, the formants corresponding to each vowel concerned. Formant models (e.g. CHANT) pose many problems in this case, and researchers are trying to find a way to control the formant trajectories related to consonants. The most difficult consonants to synthesize are the fricatives and the plosives (b, d, g). Consonants such as (f, s, ch) are noisy; to synthesize them, researchers may compute random or pseudo-random numbers (by multiplication) and test the energy distribution [Depalle, 1996]. The result of the synthesis also depends on the conditions and quality of the recording and on the rapid evolution of the formants in time.

The next step in our discussion of the technical problems of singing synthesis is the joining of consonants and vowels in singing phrases, as well as the way energy is distributed between the diphones that determine the meaning of a word [Zera et al., 1984; Rodet et al., 1988].

3.1.3. Adjunction between vowels and consonants in singing phrases
To synthesize a word or a complete singing phrase (in singing, the problem of intonation is already defined by the melody), apart from the transition problems (consonant to vowel, or from one diphone to the next), we would like to underline two other major problems:
a) the control of the evolution of the fundamental frequency. To describe the passage from one note to another (setting aside the problems posed by the consonants), we must take care not only of the formant trajectories but also of the vibrato rate, which, together with jitter¹⁰, determines the naturalness of the vowels. If we solve the problem of trajectories between phonemes, we will be able to interpolate very easily between different registers.
b) We cannot, of course, ignore the affinities of formant variations, which affect the quality of the synthesized singing phrase because of:
- the unique identity of the formants of every individual;
- the constant evolution of articulation during singing;
- the dependence of formant frequency on fundamental frequency and intensity [Fant, 1973; Sundberg, 1987];
- the dependence of the transfer function on the dimensions of the vocal tract [Fant, 1973; Sundberg, 1987].
Regarding musical applications, the demand from composers for quasi-vocal timbres should urge researchers to invest more in the use and control of formant bands for voiced sounds, and of noise for consonants.

3.2. Controlling voice timbres and singing phrases
Researchers are still preoccupied with problems concerning the control of a synthesized singing phrase:
a) at a first level, the most important property they try to ensure when manipulating parameters in the frequency domain (frequencies, bandwidths and amplitudes of formants) is that the control must be coherent with direct perception.
b) at a second level, researchers are concerned with rules for the evolution of formants, rules for the interactions between parameters, and the question of how to receive control signals from physical peripherals and feed them into the synthesizer.
c) at a third level, they try to improve the control of glottal oscillation, of the respiratory muscles and of other acoustic mechanisms. Researchers must also construct rules for the tension of the vocal cords during singing, as well as rules describing the interaction between vocal effort (muscles, cords, etc.) and vocal result (articulators).

¹⁰ Jitter is a non-periodic modulation of the phonation frequency, that is, random variation of the waveform from one cycle to the next. Vocal jitter is minimal in a professional singer's voice, but it affects the quality of the perceived voice.
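The control problems listed above (vibrato and jitter on the fundamental frequency, and the evolution of the formants between vowels) can be illustrated with a minimal Python sketch of two control trajectories. It is not taken from MUSSE, CHANT, SPASM or any other system discussed here. The vibrato rate (5.5 Hz), vibrato extent (±50 cents), the per-frame Gaussian jitter (real jitter varies per glottal cycle) and the formant values for /a/ and /i/ are all illustrative assumptions, and formants are interpolated linearly in Hz purely for simplicity.

```python
import math
import random


def f0_contour(f0_hz, duration_s, frame_rate=200.0, vibrato_rate_hz=5.5,
               vibrato_extent_cents=50.0, jitter_cents=5.0, seed=0):
    """Return one F0 value (Hz) per control frame for a sustained note.

    Vibrato: sinusoidal deviation of +/- vibrato_extent_cents.
    Jitter (assumed model): independent Gaussian deviation per frame, in cents.
    """
    rng = random.Random(seed)  # seeded so the contour is reproducible
    n_frames = int(round(duration_s * frame_rate))
    contour = []
    for i in range(n_frames):
        t = i / frame_rate
        cents = vibrato_extent_cents * math.sin(2 * math.pi * vibrato_rate_hz * t)
        if jitter_cents > 0:
            cents += rng.gauss(0.0, jitter_cents)
        contour.append(f0_hz * 2 ** (cents / 1200.0))
    return contour


def interpolate_formants(start_hz, end_hz, n_frames):
    """Linearly interpolate a list of formant frequencies between two vowels."""
    if n_frames < 1:
        raise ValueError("n_frames must be at least 1")
    if n_frames == 1:
        return [list(end_hz)]
    frames = []
    for i in range(n_frames):
        a = i / (n_frames - 1)  # 0 at the first vowel, 1 at the second
        frames.append([(1 - a) * s + a * e for s, e in zip(start_hz, end_hz)])
    return frames


if __name__ == "__main__":
    # Illustrative, approximate F1-F3 values, not measured singing data.
    vowel_a = [730.0, 1090.0, 2440.0]
    vowel_i = [270.0, 2290.0, 3010.0]
    pitch = f0_contour(220.0, 1.0)
    formants = interpolate_formants(vowel_a, vowel_i, 50)
    print(f"{len(pitch)} F0 frames, F0 range {min(pitch):.1f}-{max(pitch):.1f} Hz")
    print("first/last formant frames:", formants[0], formants[-1])
```

Setting `jitter_cents=0` gives a perfectly regular vibrato, which tends to sound mechanical; small non-zero values are one simple way to approach the irregularity the text associates with naturalness.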

As an accordion player, I have imagined a versatile instrument that "breathes" like the voice, so as to shape by hand the time-varying transitions of vowels and diphones (Georgaki, 1998). I have tested vocal sounds on my MIDI accordion (Victoria), producing expressive vocal-like sounds by manipulating the breath controller, but I could not produce singing phrases: I had to rebuild my MIDI accordion and add more controllers on the left hand for processing the formants and concatenating the diphones. New controllers, SqueezeVox and COWE¹¹, have recently been designed at Princeton University by Perry Cook's team; they allow more sophisticated real-time control of vocal synthesis, even though they are still under development and, because of the many parameters involved, are not yet a natural "fit". In brief, the problem of control can be treated not only from the physiological point of view but also from the biological one, which, in collaboration with researchers in the cognitive sciences, will provide the necessary information about the influence of psychological states on singing interpretation and the importance of body anatomy (resonators, tongue, vocal cords, etc.) in singing.

4. Towards the "instrumentalisation" of the voice: conception of a vocal synthesizer
Having outlined some aspects of the problems of singing synthesis, we would like, in this last part of the paper, to discuss the conception of a vocal synthesizer in instrument form [Cadoz, 1994; Dufourt, 1996], formalising the vocal "gesture", which could be used not only by composers but also by performing musicians¹². The multitude of models and techniques does not provide the musical world with a solution.
What is needed is a complete model that would allow the user to reproduce, from a written musical phrase, good-quality voices that can be extended over several registers and treated with different techniques; this has not yet been achieved. Let us imagine, for example, how we could synthesize a phrase from an aria (for example 'Lasciatemi morire', Cl. Monteverdi, L'Arianna) in different registers, and also in a voice technique other than classical singing. The first operation is possible with the new software synthesizers VOCALOID and CANTOR, but we are still waiting for the second. Moreover, our expectations of this vocal synthesizer are not restricted to the domain of research (using the computer as a tool for simulation and extrapolation, hybridisation of singing phrases or resurrection of forgotten voice techniques); we also expect it to be a powerful tool for composers of vocal works and in the domain of performance. More specifically, after all these research projects on singing synthesis, we need to start conceiving hardware 'instruments' designed for vocal synthesis during musical performance. In line with the research into the "lost instrument" [Dufourt, 1996], our ambition here is to discuss the possible form of a hardware vocal synthesizer that could reproduce a large palette of vocal sounds, control prerecorded voices, imitate a cultural vocal model, give the composer the possibility of investigation and experimentation (sound chimeras), and give the performer freedom of expression.

¹¹ http://soundlab.cs.princeton.edu/research/controllers/squeezevox
¹² Who 'dream' of an electronic vocal instrument for performance.
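The idea of singing the same phrase "in different registers" while keeping the vowel recognisable can be illustrated with classic source-filter formant synthesis, sketched below in plain Python. This is a generic cascade formant synthesizer (Klatt-style two-pole resonators driven by an impulse train), not the method of VOCALOID, CANTOR or any system cited in this paper; the formant frequencies and bandwidths in `VOWEL_A` are illustrative assumptions. Because the formant filters do not depend on F0, changing the register changes only the period of the source.

```python
import cmath
import math


def resonator_coeffs(freq_hz, bw_hz, sr):
    """Two-pole digital resonator, unity gain at 0 Hz.

    Difference equation: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]
    """
    r = math.exp(-math.pi * bw_hz / sr)
    b = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / sr)
    c = -r * r
    a = 1.0 - b - c
    return a, b, c


def cascade_gain(freq_hz, formants, sr):
    """Magnitude response of the cascaded formant filters at one frequency."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sr)
    gain = 1.0
    for f, bw in formants:
        a, b, c = resonator_coeffs(f, bw, sr)
        gain *= abs(a / (1.0 - b * z_inv - c * z_inv * z_inv))
    return gain


def synthesize_vowel(f0_hz, formants, sr=16000, duration_s=0.5):
    """Impulse-train source (crude glottal stand-in) through the formant cascade."""
    n = int(sr * duration_s)
    period = sr / f0_hz
    source, next_pulse = [], 0.0
    for i in range(n):
        if i >= next_pulse:
            source.append(1.0)
            next_pulse += period
        else:
            source.append(0.0)
    signal = source
    for f, bw in formants:  # apply each resonator in turn (cascade)
        a, b, c = resonator_coeffs(f, bw, sr)
        y1 = y2 = 0.0
        out = []
        for x in signal:
            y = a * x + b * y1 + c * y2
            out.append(y)
            y2, y1 = y1, y
        signal = out
    return signal


# Illustrative (frequency, bandwidth) pairs in Hz for an /a/-like vowel.
VOWEL_A = [(730.0, 90.0), (1090.0, 110.0), (2440.0, 170.0)]

if __name__ == "__main__":
    low = synthesize_vowel(110.0, VOWEL_A)   # the same vowel...
    high = synthesize_vowel(220.0, VOWEL_A)  # ...one octave higher
    print(len(low), len(high))
```

Keeping the formants fixed while transposing is only a first approximation: as noted in section 3.1.3, real formant frequencies depend on the fundamental frequency and intensity, so a usable instrument would also need rules that move the formants with register.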
From this point of view, we want to discuss the possible conception of a musical tool at the borders of an instrument (with a suitably flexible interface and control), in order to make the relation between computer music research and creation/performance more evident (leaving aside programming languages, etc.). Since it is not our objective here to describe the exact form of the final synthesizer, we will limit the discussion to our expectations. We would like to witness the creation of a synthesizer that could provide the most concrete model of the synthesized voice in the field of music. Our first idea, which has been discussed with some researchers in the field, is the conception of a vocal synthesizer that could:
- reproduce not only a wide palette of timbres (by means of diphones) but also cover a wide range of registers while preserving timbre homogeneity between them;
- be equipped with the appropriate rules of different vocal techniques and the corresponding modal systems (ancient Greek modes, ecclesiastical modes, Indian modes, etc.);
- combine singing techniques and languages (for example, the vocal technique of Byzantine singing with the Portuguese language);
- be used not only as a studio instrument but also as a performance one (like analog and digital synthesizers).
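To suggest how "appropriate modal systems" might be stored as rules inside such a synthesizer, the sketch below encodes the authentic ecclesiastical modes as rotations of the diatonic step pattern. The mode names follow the medieval church convention, which does not coincide with ancient Greek usage; ancient Greek or Indian modal systems would need their own tables. The function and table names are hypothetical.

```python
# Semitone steps of the natural (white-key) scale starting on C.
NATURAL_NOTES = ["C", "D", "E", "F", "G", "A", "B"]
NATURAL_STEPS = [2, 2, 1, 2, 2, 2, 1]

# Finals of the authentic ecclesiastical modes (medieval naming).
MODE_FINALS = {"dorian": "D", "phrygian": "E", "lydian": "F", "mixolydian": "G"}


def mode_steps(mode):
    """Step pattern (in semitones) of an authentic mode, read from its final."""
    k = NATURAL_NOTES.index(MODE_FINALS[mode])
    return NATURAL_STEPS[k:] + NATURAL_STEPS[:k]


def mode_scale(mode, final_midi):
    """MIDI note numbers of one octave of the mode, starting on final_midi."""
    notes = [final_midi]
    for step in mode_steps(mode)[:-1]:
        notes.append(notes[-1] + step)
    return notes


if __name__ == "__main__":
    # The same mode can be placed on any final, e.g. to suit a voice's register.
    print(mode_scale("dorian", 62))  # final D4
    print(mode_scale("dorian", 50))  # same mode one octave lower
```

Storing modes as step patterns rather than fixed pitches lets the register be chosen independently of the modal rule, which matches the wish above to combine techniques, modes and registers freely.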

In order to enrich this 'virtual singer' with elements concerning not only timbre but also technique and language, we must extract data by analysing different vocal models (Tibetan singing, Byzantine singing, Algerian singing, Mongolian singing, etc.) and implement all these techniques on the computer.

To justify our ideas about the possible conception of a vocal synthesizer, we will discuss the construction of this instrument according to the classical instrumental model of the acoustician E. Leipp [Leipp, 1989], which depends on:
a) perceptive imperatives,
b) fabrication imperatives,
c) anatomo-physiological imperatives,
d) commercial imperatives,
e) the liberty domains concerning pitch, intensity, timbre and form,
f) the possibility of implementing new vocal techniques in a cognitive form.

According to the perceptive imperatives, music is meant to be heard, and we must avoid exceeding certain limits that could deform the vocal character. We must not neglect our auditory system and the Fletcher-Munson curves. Morozov's experiments [Sundberg, 1981] found that in sung female and male voices the identification of vowels is easier in the low tessitura. If male voices go above G4 and female voices above B5, sung vowels and phrases become unintelligible. The lower we descend in the registers, the smaller the frequency difference; as we ascend towards higher frequencies, the difference between the frequencies may reach 500 Hz.

According to the fabrication imperatives, we must take care not only of the innumerable technical constraints involved in building an ordinary synthesizer, but also of the control buttons, which must be conceived differently from those of an ordinary 'instrumental synthesizer'.
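The perceptive limits attributed above to Morozov's experiments (G4 for male voices, B5 for female voices) can be turned into frequencies with the equal-tempered formula f = 440 · 2^((m − 69)/12), where m is the MIDI note number. The sketch assumes scientific pitch notation with A4 = 440 Hz; if the limits were originally stated in another octave-numbering convention (the French one numbers octaves one lower), the thresholds would shift by an octave. The helper names and the hard cutoff are illustrative simplifications, not a rule from the cited study.

```python
import re

# Semitone offset of each natural note within an octave (C = 0).
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ACCIDENTALS = {"": 0, "#": 1, "b": -1}


def note_to_hz(name, a4_hz=440.0):
    """Convert a note name in scientific pitch notation (e.g. 'G4') to Hz."""
    match = re.fullmatch(r"([A-G])([#b]?)(-?\d+)", name)
    if match is None:
        raise ValueError(f"not a note name: {name!r}")
    letter, accidental, octave = match.groups()
    midi = 12 * (int(octave) + 1) + NOTE_OFFSETS[letter] + ACCIDENTALS[accidental]
    return a4_hz * 2 ** ((midi - 69) / 12)


# Upper limits for vowel intelligibility quoted in the text
# (assumed to be given in scientific pitch notation).
INTELLIGIBILITY_LIMIT_HZ = {"male": note_to_hz("G4"), "female": note_to_hz("B5")}


def vowels_intelligible(note, voice):
    """True if the note lies at or below the quoted limit for that voice."""
    return note_to_hz(note) <= INTELLIGIBILITY_LIMIT_HZ[voice]


if __name__ == "__main__":
    for voice, limit in INTELLIGIBILITY_LIMIT_HZ.items():
        print(f"{voice}: {limit:.1f} Hz")
```

Under these assumptions the limits come out at about 392.0 Hz (G4) and 987.8 Hz (B5), which a synthesizer could use, for example, to warn the user when a sung line leaves the range in which vowels remain identifiable.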
In our case we need an appropriate control environment to control not only the timbre of vowels (fundamental frequency, amplitude, formants and their trajectories, frequency modulation, etc.) but also the combination of consonants and vowels into phrases, as well as the joining of preset vocal diphones. Following this line of thought, the ordinary keyboard that controls the pitches of synthetic instrumental sounds is insufficient, because one of the most important factors in simulating a singing voice is 'artificial respiration'. We therefore propose that this synthesizer be designed more like an accordion (equipped with a blow control): like the church organ, the accordion is driven by air (through free reeds), and since it is carried close to the body, it can give the performer more expressivity and freedom of movement.

The anatomo-physiological imperatives can be overlooked in all synthesizers, since a simple movement can produce every audible scale. The only advantage this new instrument offers with respect to human anatomo-physiological constraints is the possibility of overcoming problems that a human being cannot. For example, to produce vocal sounds at 100 Hz with an intensity level high enough to be perceptible (Fletcher-Munson curves), human lungs are insufficient. A synthesizer thus lets us construct extremely loud or extremely soft voices through appropriate equalisation of the registers or through a simple control gesture on the parameters (F0, formant positions, amplitude and bandwidth).

According to the commercial imperatives, we observe that the electronic instrument trade often has only a superficial relationship with musical research; the compromise between researchers and large instrument manufacturers (Yamaha, Korg, Roland, etc.) must, if not avoided, at least be based on new terms, in order to push forward the development of tools addressed not only to popular musicians but also to composers of contemporary music. Additionally, the fact that every language has its own phonetic rules and phonemes hinders, for obvious reasons, the commercialisation of a vocal synthesizer. If a vocal synthesizer is to be commercialised, first, the problems of control (language, timbres, techniques) must be resolved conveniently, and second, the designer must preserve the sophistication of the instrument by addressing people who have the knowledge to handle it artistically or scientifically (people working in electroacoustic and computer music, institutions, composers and performers, phoneticians, singers, etc.). The new software VOCALOID by Yamaha and CANTOR by Virsyn seem promising for score-to-singing synthesis, but not yet in terms of voice quality.

Other parameters that could affect the performability of the synthesizer are the liberty domains concerning pitch, intensity, timbre and form. First, regarding pitch, we expect the synthesizer to offer the user pitch modulation, register changes and easy transitions from the low to the high register.
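The "artificial respiration" proposed above, a blow control in the manner of a MIDI accordion's breath controller, could be connected to a synthesizer roughly as in the following sketch. It assumes a standard MIDI Breath Controller (control change number 2) and maps it to an output gain through a power curve, with one-pole smoothing to avoid audible steps. The class name, curve exponent and 20 ms smoothing time are hypothetical choices, not part of any instrument described in this paper.

```python
import math

CONTROL_CHANGE = 0xB0  # status nibble of a MIDI Control Change message
BREATH_CC = 2          # MIDI controller number for Breath Controller


class BreathEnvelope:
    """Turns breath-controller data into a smoothed amplitude in [0, 1]."""

    def __init__(self, smoothing_ms=20.0, control_rate_hz=1000.0, curve=2.0):
        # One-pole low-pass coefficient for the given time constant.
        self.coeff = math.exp(-1000.0 / (smoothing_ms * control_rate_hz))
        self.curve = curve
        self.target = 0.0
        self.gain = 0.0

    def handle_message(self, status, data1, data2):
        """Update the target from a 3-byte MIDI message; ignore other messages."""
        if (status & 0xF0) == CONTROL_CHANGE and data1 == BREATH_CC:
            x = max(0, min(127, data2)) / 127.0
            self.target = x ** self.curve  # gentler response at low pressure

    def tick(self):
        """Advance one control period and return the current gain."""
        self.gain = self.coeff * self.gain + (1.0 - self.coeff) * self.target
        return self.gain


if __name__ == "__main__":
    env = BreathEnvelope()
    env.handle_message(0xB0, BREATH_CC, 100)  # player blows harder
    gains = [env.tick() for _ in range(100)]  # 100 ms at 1 kHz control rate
    print(f"gain after 100 ms: {gains[-1]:.3f}")
```

The returned gain would multiply the synthesized voice; the same smoothed breath value could, in principle, also drive other effort-related parameters such as formant bandwidths, but that mapping is left open here.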

We must also examine the case of polyphony (how many voices can be produced simultaneously, in the same or in different timbres?). Regarding intensity and dynamics, we expect better control than on an ordinary synthesizer, in order to enhance the expressive quality of singing. We have often been confronted with the expressive limitations of most commercial electronic instruments. In addition, control by touch sensitivity or pedals is insufficient in the case of the voice, as described above. Regarding the freedom to choose timbres, whether through appropriate controllers or by selecting a pre-constructed timbre-diphone from the database, we can create instrumental sounds that approach the vocal sound by extrapolation, hybridisation or interpolation. Finally, regarding the liberty of forms, the performer should be able to control with virtuosity three multidirectional parameters (timbre, intensity, pitch) in a convenient way (we must not forget the excellent example of Clara Rockmore playing the Moog/Theremin). An ergonomic study should help instrument designers interested in designing a 'cognitive' instrument. Finally, an interesting musical instrument is not necessarily a complicated one that can play many notes (like the piano); it can also be an instrument that allows a number of effects and different musical forms.

Our last expectation of this vocal synthesizer concerns ethnomusicological research: to what degree could this synthesizer be an appropriate tool for integrating several voice techniques, which could be studied and modelled with the aid of ethnomusicologists, phoneticians and system engineers, or for studying forgotten voice techniques of other cultures and times (e.g. ancient Greek singing)?

All tools are extensions of human intention. Musical instruments, which are in essence an extension of the human voice and touch, are conceived to combine the emotion of the speaking and singing voice with the dexterity that can be achieved with the fingers and hands. Yet the most immediate musical expression is still the voice, the most difficult instrument to master. To overcome this difficulty, we dream of an instrument that will allow every one of us (even without the gift of a resonant voice) to model our own vocal environment with our hands!

Conclusion
Although systems and software that synthesize singing voices exist, the work involved in making them sound real has been complicated and onerous. Even after much fine-tuning, synthesizing vocals indistinguishable from real singing was still virtually impossible. In recent years, new commercial models (VOCALOID, CANTOR) have presented new aspects of singing synthesis, and special controllers (like SqueezeVox) have made it easier to perform. By solving the technical problems in this domain and enriching the models with different vocal techniques and languages, researchers may in the future direct their work towards the construction of a "universal" vocal synthesizer that is more accessible to composers, musicians and musicologists. Future challenges include improving synthesizer models, automatically estimating model parameter values from recordings, developing learning techniques for automatic rule construction and, finally, gaining a better understanding of the technical, acoustical and interpretive aspects of the singing voice.

REFERENCES
Berndtsson, Gunilla (1995) Systems for synthesising singing and for enhancing the acoustics of music rooms. Dissertation, KTH, Department of Speech Communication and Music Acoustics, Royal Institute of Technology, Stockholm.
Bonada J., Loscos A. (2003) Sample-based singing voice synthesizer by spectral concatenation. In Proceedings of the Stockholm Music Acoustics Conference (SMAC 2003), Stockholm, Sweden.
Carlson G., Ternström S., Sundberg J. (1991) A new digital system for singing synthesis allowing expressive control. In ICMC'91 Proceedings, Montreal.
Chowning John (1981) Computer synthesis of singing voice. In ICMC'81 Proceedings, La Trobe University, Melbourne.
Cook Perry (1993) SPASM, a real-time vocal tract physical model controller; and Singer, the companion software synthesis system. Computer Music Journal, 17(1). MIT Press.
Cook Perry R. (1996) Singing voice synthesis: history, current work, and future directions. Computer Music Journal, 20(3). MIT Press.
Cook Perry R., Leider C. (2000) Making the computer sing. In Proceedings of the XIII Colloquium on Musical Informatics, L'Aquila, Italy, September.
Cook Perry R., Leider C. (2000) Squeeze Vox: a new controller for vocal synthesis models. In Proceedings of the International Computer Music Conference, Berlin, August.
Proceedings ICMC 2004

Depalle Philippe, Garcia G., Rodet X. (1994) A virtual castrato. In ICMC'94 Proceedings, Aarhus, Denmark.
Dufourt Hugues (1996) L'instrument philosophe. Entretien avec P. Szendy. Cahiers de l'Ircam, no. 7, Paris.
Fant Gunnar (1973) Speech sounds and features. MIT Press, Cambridge, MA.
Gael Richard (1990) Rules for fundamental frequency transition in singing synthesis. Department of Speech Communication and Acoustics, Royal Institute of Technology, Stockholm.
Georgaki Anastasia (1998a) Problèmes techniques et enjeux esthétiques de la voix de synthèse dans la recherche et création musicales. PhD thesis, EHESS/IRCAM, Paris.
Georgaki Anastasia (1998b) Synthesis of the singing voice: links between research and creation. In Proceedings of the First Symposium on Music and Computers, Ionian University, Corfu.
Gibson I. S., Howard D. M., Tyrell A. M. (1998) Real-time singing synthesis using a parallel processing system. In Proceedings of the IEE Colloquium on Audio and Music Technology: The Creative Challenge of DSP, IEE Digest 98/470, pp. 8/1-8/6.
H.-L. Lu (2002) Toward a high-quality singing synthesizer with vocal texture control. PhD thesis, Stanford University.
Howell P., Harvey N. (1975) Voice techniques. In Musical Structure and Cognition, ed. P. Howell, I. Cross and R. West. Academic Press, New York.
Lomax Ken (1996) The development of a singing synthesizer. In JIM'96 Proceedings.
Lomax Ken (1997) The analysis and synthesis of the singing voice. PhD thesis, Oxford.
Kamarotos D., Diamantopoulos T., Philippis G. (1993) IGDIS: a modern Greek text-to-speech/singing program for the SPASM/Singer instrument. In ICMC'93 Proceedings, Tokyo.
Leipp Émile (1989) Acoustique et musique. Masson, Paris.
Macon Michael (1997) A singing voice synthesis system based on sinusoidal modelling. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing.
Macon M., Jensen-Link L., Oliverio J., Clements M., George E. B. (1997) Concatenation-based MIDI-to-singing voice synthesis. 103rd Meeting of the Audio Engineering Society, New York.
Mellody (2001) Signal analysis of the female singing voice: features for perceptual singer identity. PhD thesis, University of Michigan.
Meron Y. (1999) High quality singing synthesis using the selection-based synthesis scheme. PhD thesis, University of Tokyo.
Paabon Peter (1994) A real-time singing voice analysis/synthesis system. In ICMC'94 Proceedings, Aarhus, Denmark.
Pierucci P., Paladin A. (1997) Singing voice analysis and synthesis system through glottal excited formant resonators. In ICMC'97 Proceedings, Thessaloniki.
Puckette Miller (1991) Music and speech synthesis using nonlinear distortion and amplitude modulation. Journal of audio society.
Risset J.-C. (1991) Timbre analysis by synthesis: representations, imitations, and variants for musical composition. In Musical Signals and Representations, ed. De Poli et al.
Rodet X. et al. (1984) The CHANT project: from the synthesis of the singing voice to synthesis in general. Computer Music Journal, 8(3), pp. 15-31. MIT Press.
Rodet X., Depalle Ph. (1992) Spectral envelopes and inverse FFT synthesis. CNMAT, Paris.
Rodet X., Depalle Ph., Poirot G. (1988) Diphone sound synthesis. In Proceedings of the International Computer Music Conference, Köln, FRG.
Rodet X. (2002) Synthesis and processing of the singing voice. In Proceedings of the 1st IEEE Benelux Workshop on Model Based Processing and Coding of Audio (MPCA-2002), Leuven, Belgium.
Rossitier D., Howard D. (1994) Voice source and acoustic output qualities for singing synthesis. In ICMC'94 Proceedings, Aarhus, Denmark.
Sundberg Johan (1981) To perceive one's own voice and another person's voice. In Research Aspects of Singing, pp. 80-84. Royal Swedish Academy of Music, Stockholm.
Sundberg Johan (1989) Synthesis of singing by rule. In Current Directions in Computer Music Research, ed. Max Mathews and John Pierce. MIT Press.
Tisato Graziano, Maccarini Andrea Ricci (1991) Analysis and synthesis of diphonic singing. Bulletin d'Audiophonologie, vol. VII, no. 5-6, pp. 619-648. Ann. Sc. Univ. Franche-Comté.
Tubach J.P. (1989) La parole et son traitement automatique. Calliope, Masson, Paris.
Yamaha Corporation Advanced System Development Center (2003) New Yamaha VOCALOID singing synthesis software generates superb vocals on a PC. http://www.vocaloid.com
Youngmoo Edmund Kim (2003) Singing voice analysis/synthesis. PhD thesis, Program in Media Arts and Sciences, School of Architecture and Planning, MIT, Cambridge, MA.
Zera Jan, Gauffin Jan, Sundberg Johan (1984) Synthesis of selected VCV syllables in singing. In ICMC'84 Proceedings, IRCAM, Paris.