Voice source and acoustic output qualities for singing synthesis

David Rossiter and David M Howard
Parallel and Signal Processing Applications Research Group, Department of Electronics, University of York, Heslington, York, Y01 5DD, ENGLAND
Electronic mail: {dpr,dh}

Abstract

This paper describes previous and current research into voice source and acoustic output qualities of the singing voice which may support a more advanced model for synthesis. Our research indicates that control over the closed quotient of the vocal folds and the distribution of energy across frequency are of considerable relevance for singing synthesis, but that the relationship between vocal training and the development of these qualities is not straightforward.

Keywords: voice source, larynx, acoustic output, singing synthesis

1 Introduction

Fundamental differences between the male and the female voice are well known within the field of vocal synthesis. For example, the female voice has a naturally higher fundamental frequency than the male voice and consequently has relatively fewer harmonic partials within a given frequency range.

The popular model for synthesis used by tools such as SPASM [Cook, 1993] and the Klatt synthesiser [Klatt, 1980] is known as the source-filter model of the human voice. In this model the action of the larynx is regarded as producing a periodic source signal which is subsequently altered by the filtering actions of the articulatory organs in the supralaryngeal vocal tract. Basic synthesis can be achieved by applying a bank of filters representative of the articulatory actions of the vocal tract to a periodic signal representative of the glottal wave.

2 Voice analysis

Analysis of voice data has revealed two trends which appear to be highly relevant: one at the voice source level and one relating to the acoustic output.
" Voice source The ratio of how long the vocal folds are in contact relative to how long they are apart in each cycle (called the 'closed quotient' (CQ) of the laryngeal period) is shown to change across different levels of vocal ability and between different styles of singing. Research has demonstrated different patterns of CQ for different styles of male [Howard, 1992], and female [Estill, 1988][Evans, 1993] singing, and across different levels of training [Howard, 1990]. " Acoustic output Sundberg [1974] identified a spectral presence, termed the 'singers formant', in the region of 2100Hz to 3800Hz in the acoustic output of trained singers which was less evident for untrained singers. Several studies have identified this phenomena in different contexts, including the analysis of tenors from CD recordings [Howard & Lindsey, 1989]. 3 Developmental trends These studies do not reveal how these parameters may change as a singer develops. This has important implications for singing synthesis. For example, it is likely that an assumption would be made that synthesis of two voices, one more trained than the other, would involve a higher level of CQ and of spectral energy in the singers formant range for the more trained voice. However, it may be that there are natural fluctuations in the developing voice which imply that such an assumption cannot reliably be made. In order to examine whether these trends could be discernable in a developing singer, a male subject with no previous singing experience was recorded over a period of 27 months during which he attending singing lessons. The subject was required to sing a two octave scale up and down. Analysis was made of the voice source through the use of an electrolaryngograph, which measures the degree of vocal fold contact. From the output of this device the average level of CQ was derived. 
Analysis was also made of the distribution of energy in the vocal output of the subject in the region 2 kHz to 4 kHz relative to the rest of the vocal spectrum. The results for these measures are shown in figures 1 and 2 respectively.

Figure 1: Developments in CQ with training

Figure 2: Developments in ratio with training

The subject exhibited a general increase in CQ and in energy in the singer's formant band, although not as a linear function of training. This implies that synthesis of the male singing voice may select an appropriate level of CQ and singer's formant as a function of vocal training, but that a strictly linear relationship cannot be assumed.

4 An example implementation

One tool for exploring the applicability of CQ and the singer's formant to singing synthesis is the CSound system. An example of electrolaryngograph signals demonstrating low and high CQ periods is shown in figure 3, together with approximate CSound representations of glottal flow and the wavetable directives used to create them. The instrument shown in figure 4 was designed for basic investigations into the synthesis of both voice source and singer's formant trends. Control over energy in the singer's formant region is enabled through the constant isf within the instrument. The higher this value, the greater the emphasis of the spectral region associated with the singer's formant.

    sr      =       48000                           ; sampling rate
    kr      =       4800                            ; control rate
    ksmps   =       10                              ; sr/kr
    nchnls  =       1                               ; mono output

            instr   1                               ; define instrument
    isf     =       100                             ; change value for different levels of singer's formant
    aenv    linseg  0, p3*0.05, 0.2, p3*0.9, 0.2, p3*0.05, 0   ; envelope
    avib    oscil   0.02*p4, 5.5, 1                 ; vibrato
    a1      oscil   150, p4+avib, 2                 ; voice source
    a2      reson   a1+avib, 780, 100               ; formant 1
    a3      reson   a1+avib, 1000, 75               ; formant 2
    a4      reson   a1+avib, 3000, 75               ; formant 3
    a5      reson   a1+avib, 3300, 75               ; formant 4
    asf     reson   a1+avib, 3000, 200              ; singer's formant
    a6      =       (a2-a3+(a4)-(a5)) + (asf*isf)   ; add formants
            out     a6*aenv                         ; output audio
            endin                                   ; end of definition

Figure 4: Example CSound instrument

5 Perceptual results

In relative terms, the low CQ wavetable resulted in a soft sound which gave the impression of breathiness. In contrast, the high CQ wavetable produced a sharper, more 'professional' sound. A relative increase in the professional quality of the synthesised sound was also noted with spectral enhancement of the singer's formant region (for example, isf=100 in contrast to isf=1), although there appears to be some loss of vowel quality.

6 Conclusions

Our research indicates that control over the closed quotient of the vocal folds and the distribution of spectral energy would support more natural singing synthesis, but that the relationship between vocal training and the development of these qualities is not straightforward.
This must be considered in the design of synthesis instruments.

7 Acknowledgements

Thanks to Michael Clarke for useful comments. This work was funded by SERC research grant GR/H14878. Further reports are available from, in directory /pub/distrib/voice.

References

[Cook, 1993] COOK, P.R. SPASM, a real-time vocal tract physical model controller; and Singer, the companion software synthesis system. Computer Music Journal, 17:1, pp 30-44, 1993

[Estill, 1988] ESTILL, J. Belting and classic voice quality: some physiological differences. Medical Problems of Performing Artists, Philadelphia: Hanley and Belfus, pp 37-43, March 1988

[Evans, 1993] EVANS, M. & HOWARD, D.M. Larynx closed quotient in female belt and opera qualities: a case study. Voice, 2(1), pp 7-14, 1993

[Howard, 1992] HOWARD, D.M. Quantifiable aspects of different singing styles - a case study. Voice, 1(1), pp 47-62, 1992

[Howard et al., 1990] HOWARD, D.M., LINDSEY, G.A., & ALLEN, B. Toward the quantification of vocal efficiency. Journal of Voice, 4:3, pp 205-212, 1990 (see also Errata: Journal of Voice, 5, pp 93-95, 1991)

[Howard & Lindsey, 1989] HOWARD, D.M. & LINDSEY, G. Spectral features of renowned tenors in CD recordings. Proceedings of the International Conference on Speech Research, pp 17-20, 1989

[Klatt, 1980] KLATT, D.H. Software for a cascade/parallel formant synthesizer. Journal of the Acoustical Society of America, 67, pp 971-995, 1980

[Sundberg, 1974] SUNDBERG, J. Articulatory interpretation of the 'singing formant'. Journal of the Acoustical Society of America, 55, pp 838-844, 1974

Figure 3: Voice source measurement and CSound implementation (low CQ and high CQ examples, each showing the electrolaryngograph signal, the example CSound glottal flow waveform, and the CSound wavetable directive)

Acoustics, ICMC Proceedings 1994