users, as well as their prospects [Georgaki,
1998a].
In order to outline the current research on
synthesis of the singing voice we would like to
make some remarks concerning the advantages
and status of every synthesizer, along with its
applicability in contemporary music:
a) One of the most complete "cognitive"
synthesizers is MUSSE/RULSYS [Sundberg,
1987; 1989], as it is equipped with many rules
that describe classical singing. It lets the
user produce his or her own sung
phonemes, words and phrases of reasonably
good quality, allowing for vocal expressivity
[Gael, 1990; Carlsson, 1991; Berndtsson, 1995],
and it is also used for the synthesis of
polyphonic choral singing. One of its strengths is
its capacity to control the fundamental
frequency and the formants with more speed
and precision than human singers can, but its
control interface still needs to be improved in order
to be more accessible to composers and
musicians.
b) Other research projects, like CHANT [Rodet et
al., 1985; 1995], have been oriented towards
more artistic applications and are equipped with the
proper interface and environment, in order to
provide composers with software tools or to
reconstruct ambivalent voices of the past
[Depalle et al., 1994]. More precisely, CHANT
[Rodet et al., 1984] is a software program
designed for compositional needs rather than for
scientific ones, as it stresses the continuity
between sound processing and synthesis by
filtering synthetic or real voices and
instruments. In recent years, research on
concatenating the high-quality vowels obtained
with CHANT by means of the ABS/OLA³ system
[Rodet, 2002] has suggested also concatenating
transition and consonant units.
c) The SPASM/Singer model [Cook, 1993]
has an advantage over the others
because of its control environment: it
offers the user fine control over the
parameters of the vocal signal that are related
directly to vocal pedagogy and the physiology
of speech (the position of the tongue or lips, the shape of
the vocal tract, the vocal effort, etc.) through a
user-friendly interface (the shape of the vocal
tract is displayed on the screen). Lastly, SPASM/Singer can
be controlled by a special interface, the SqueezeVox
[Cook, 2000], which brings the control of
singing voice synthesis closer to
musicians' abilities.
d) Despite the importance of the research
projects on the singing voice, the majority of
them are based on the analysis and
synthesis of the classical singing technique,
with some exceptions such as [Tisato,
1991; Rodet, 1985; Kamarotos et al., 1994], who
have studied non-European voice
techniques (diphonic singing, Tibetan singing,
or traditional Greek singing).
e) The new projects carried out in
recent years in academic institutes propose new
approaches to singing synthesis⁴ that are less
laborious than the previous ones and promote
score-to-singing synthesis as the future of
singing software development [Lomax 1996;
Macon 1997; Meron 1999; H.L-Lu 2002;
Bonada and Loscos 2003; Yoongmoo 2003].
The first commercial score-to-singing software
synthesizers have appeared only recently, since
2003 (VOCALOID⁵ by Yamaha and
CANTOR⁶ by Virsyn), and promise the
inauguration of a new era.
In any case, all these projects, and especially
the models to which we have referred
more extensively, differ not only in their
synthesis technique or the implemented rules
(describing singing) but also in the control
interface and the resulting sound (every model
has its own particular voice signature). They
also differ in the performability of the model
and in its applications in the computer music field
(composition, psychoacoustics, vocal training
and education, or use as a powerful tool for
performance).
Some acoustic examples will give evidence
for these observations.
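
To make concrete what "controlling the fundamental frequency and the formants" means in practice, the following is a minimal source-filter sketch in Python. It is not the method of MUSSE/RULSYS, CHANT, or SPASM/Singer; it only assumes a generic cascade of two-pole resonators driven by an impulse train. The function names, the vibrato parameters, and the formant values for an /a/-like vowel are illustrative assumptions, not values taken from the systems discussed above.

```python
import math


def pulse_train(f0_hz, duration_s, sample_rate, vibrato_rate_hz=0.0,
                vibrato_depth=0.0):
    """Impulse train at f0, optionally modulated by a sinusoidal vibrato.

    vibrato_depth is a fraction of f0 (0.01 means +/- 1 percent).
    """
    n = int(duration_s * sample_rate)
    source = [0.0] * n
    phase = 0.0
    for i in range(n):
        t = i / sample_rate
        f = f0_hz * (1.0 + vibrato_depth *
                     math.sin(2.0 * math.pi * vibrato_rate_hz * t))
        phase += f / sample_rate
        if phase >= 1.0:          # one pulse per completed period
            phase -= 1.0
            source[i] = 1.0
    return source


def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients of a two-pole resonator with unity gain at 0 Hz."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    c = -r * r
    b = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    a = 1.0 - b - c
    return a, b, c


def synthesize_vowel(f0_hz, formants, duration_s=0.5, sample_rate=16000,
                     vibrato_rate_hz=5.5, vibrato_depth=0.01):
    """Filter the pulse train through one resonator per (freq, bandwidth)."""
    signal = pulse_train(f0_hz, duration_s, sample_rate,
                         vibrato_rate_hz, vibrato_depth)
    for freq_hz, bandwidth_hz in formants:
        a, b, c = resonator_coeffs(freq_hz, bandwidth_hz, sample_rate)
        y1 = y2 = 0.0
        filtered = []
        for x in signal:
            y = a * x + b * y1 + c * y2   # y[n] = a x[n] + b y[n-1] + c y[n-2]
            filtered.append(y)
            y2, y1 = y1, y
        signal = filtered
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]     # normalize to the range [-1, 1]


# Assumed, illustrative formant (frequency, bandwidth) pairs in Hz.
A_LIKE_VOWEL = [(700.0, 110.0), (1220.0, 120.0), (2600.0, 160.0)]
samples = synthesize_vowel(220.0, A_LIKE_VOWEL)
```

Changing `f0_hz` alters the pitch while leaving the vowel colour largely intact, and changing the formant list alters the vowel at a fixed pitch; this separation is what allows a synthesizer to move both faster and more precisely than a singer. The returned samples could, for instance, be scaled to 16-bit integers and written with Python's standard `wave` module in order to listen to the result.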
3. Technical problems in encoding the
expressive singing voice
How difficult is it to imitate the human
"Psyche"? [Georgaki, 1998]. The unique
4 These new techniques are mostly based on
concatenation: sampled synthesis or sinusoidal models with
a MIDI output.
5 VOCALOID uses Frequency-Domain Singing
Articulation Splicing and Shaping, a vocal (singing-voice)
synthesizing system developed by Yamaha. With this
system, the "singing articulations" (collections of voice
snippets, such as of phrases, and snippets of vocal
expression variations like vibrato) needed to reproduce
vocals are collected from custom-produced recordings of
accomplished singers and put into a database after
conversion into the frequency domain.
6 Virsyn presented at the Frankfurt Musikmesse
Prolight+Sound 2004 a new 8-part vocal synthesizer,
CANTOR (Mac/Win) (http://www.kvrvst.com/get/984.html), which lets users enter words in
English and play them melodically from a MIDI keyboard
in real time. According to the manufacturer, Cantor's Voice
editor lets you edit the character of the virtual singer by
defining the base spectrum for vowels and consonants. The
application also includes a Phoneme editor and offers real-time control over vibrato rate and depth as well as the
gender of the singing voice.
3 Analysis-by-Synthesis/Overlap-Add
Proceedings ICMC 2004