Synthesis of the laryngeal source of throat singing using a 2×2-mass model

Ken-Ichi Sakakibara*1, Hiroshi Imagawa*2, Seiji Niimi*3, Naotoshi Osaka*1
kis@brl.ntt.co.jp, imagawa@m.u-tokyo.ac.jp, niimi@iuhw.ac.jp, osaka@brl.ntt.co.jp

*1 NTT Communication Science Laboratories, 3-1, Morinosato Wakamiya, Atsugi-shi, Kanagawa, 243-0198, Japan
*2 Department of Speech Physiology, The University of Tokyo, 7-3-1, Hongou, Bunkyo-ku, Tokyo, 113-0033, Japan
*3 Speech and Hearing Center, International University of Health and Welfare, 2600-6, Kitakanemaru, Ohtawara, 324-0011, Japan

Abstract

Singing voices have various timbres. Throat singing and some other Asian traditional singing voices have a pressed timbre that differs significantly from that of the European classical singing voice. In our previous study on throat singing, vibration of the false vocal folds as well as of the vocal folds was observed and was found to be essential to the pressed timbre. This paper describes a 2×2-mass model as a physical model, defines an adduction parameterization of its parameters, and presents a simulation of vocal fold and false vocal fold vibrations in the larynx. Furthermore, a visual simulator of the laryngeal movements is demonstrated. Using this model, the vibration patterns of the two laryngeal voices of throat singing (the pressed and kargyraa voices) and of the normal pressed voice have been simulated. The results show the possibility of synthesizing various timbres for singing.

1 Introduction

The singing voice has numerous variations of timbre. There are considerable differences, for instance, between European classical singing voices, such as bel canto and German lied, and Asian traditional pressed singing voices, such as throat singing, Japanese Youkyoku, and Korean Pansori. The laryngeal source is an essential factor in determining the timbre of the singing voice, especially for the pressed quality.
In general, the pressed quality is obtained by excessive adduction of the supraglottal structure. The laryngeal adjustments in Asian traditional pressed singing are very different from those in European classical singing [4, 5, 7].

Synthesizing such varying timbres in singing voices requires a flexible laryngeal source model. A glottal waveform model allows us to control its parameters to approximate the perception of voice [6, 8]. A physical model, on the other hand, allows us to control its parameters according to the physical and physiological mechanisms of laryngeal adjustment. Based on physiological observations, we have constructed a 2×2-mass model as a physical model, devised by attaching a two-mass model for the false vocal folds to the ordinary two-mass model for the vocal folds [2, 8].

In this paper, after summarizing the physiological observations in throat singing, we describe the mechanism of the 2×2-mass model and its adduction parameterization. We also present a visual simulation tool for the model. Finally, using the model, we simulate the laryngeal sources of throat singing and of the normal pressed voice.

2 Laryngeal source in throat singing

2.1 Throat singing

Throat singing is a traditional singing style of the people who live around the Altai mountains. Khoomei in Tyva and Khöömij in Mongolia are representative styles of throat singing. Throat singing is sometimes called biphonic singing or overtone singing, because two or more distinct pitches (musical lines) are produced simultaneously in one tone. One is a low sustained fundamental pitch, called a drone, and the second is a whistle-like harmonic that resonates high above the drone. The production of the high-pitched overtone is mainly due to the pipe resonance of the cavity from the larynx to the point of articulation in the vocal tract [1]. The laryngeal voice of throat singing, on the other hand, has a special pressed timbre and supports the generation of the overtone.
The laryngeal voices of throat singing can be classified as pressed and kargyraa, based on the listener's impression, the acoustical characteristics, and the singers' own observations of voice production. The pressed voice is the basic laryngeal voice in throat singing and is used as the drone. The kargyraa voice is a very low-pitched voice whose range lies outside the modal register.

2.2 False vocal folds

The false vocal folds (ventricular folds) are a pair of soft and flaccid folds which attach to the anterolateral surface of the arytenoid cartilages (Fig. 1). While the vocal folds (VFs) have a mechanism that changes their stiffness, thickness, and length through muscle action (mainly that of the thyroarytenoid muscle), the false vocal folds (FVFs) are incapable of becoming tense, since they contain very few muscle fibres. The FVFs are capable of moving with the arytenoid cartilages. They are also abducted and adducted by the action of certain laryngeal muscles. In normal phonation, they do not vibrate [9].

Fig. 1: Coronal section through the larynx (labels: vocal tract, false vocal fold, false glottis, laryngeal ventricle, vocal fold, glottis, trachea)

2.3 Physiological observation of laryngeal movements

Here, we summarize the results of the physiological observation of laryngeal movements reported in [8], which used simultaneous recording of high-speed digital images, EGG, and sound waveforms. The common features of the pressed and kargyraa voices are an overall constriction of the supraglottal structures and vibration of the FVFs. The difference lies in the narrowness of the constriction and the manner of FVF vibration. In the pressed voice, the FVFs vibrate at the same frequency as the VFs, and the two vibrate in opposite phase. In the kargyraa voice, the FVFs can be assumed to close once for every two periods of closure of the VFs; this closing blocks the airflow and contributes to the generation of the subharmonic tone of kargyraa.

3 Physical model

3.1 Two-mass model

The VF vibrations are modeled via the two-mass model [3], which makes it possible to simulate the movements of the upper and lower portions of the VFs in different phases. The model parameters are defined as follows.
$m_1, m_2$: paired masses of the upper and lower portions of the VF;
$d_1, d_2$: thicknesses;
$k_1, k_2$: stiffnesses;
$r_1, r_2$: viscous resistances;
$\zeta_1, \zeta_2$: damping ratios, which satisfy $r_1 = 2\zeta_1\sqrt{m_1 k_1}$ and $r_2 = 2\zeta_2\sqrt{m_2 k_2}$;
$k_c$: stiffness of the linear coupling spring between the upper and lower portions;
$l_g$: length of the glottis;
$A_{g1}, A_{g2}$: cross-sectional areas between the masses;
$A_{g01}, A_{g02}$: cross-sectional areas between the masses at rest.

A tension parameter $Q$, which controls the pitch of the synthesized sound, parameterizes several model parameters related to the physical properties of the VF as follows:

$$k_i = k_{i0} Q, \quad m_i = m_{i0}/Q, \quad d_i = d_{i0}/Q \quad (i = 1, 2), \qquad k_c = k_{c0} Q,$$

where $k_{i0}$, $m_{i0}$, $d_{i0}$, and $k_{c0}$ are initial values.

3.2 2×2-mass model

For a physical simulation of the VF and FVF vibrations, we have proposed a 2×2-mass model as a self-oscillating model of VF and FVF vibrations [8]. The model (Fig. 2) was devised by attaching a two-mass model for the FVFs to the ordinary two-mass model for the VFs, with a laryngeal ventricle space between the two. The laryngeal ventricle is assumed to be a cylinder that does not deform. The mechanical transmission of vibrations between the VFs and the FVFs is not considered. The area function of the vocal tract, which interacts acoustically with the VF vibration, varies in time with the FVF vibrations. The control parameters for the FVFs, $m'_i$, $d'_i$, $k'_i$, $r'_i$, $\zeta'_i$, $A'_i$, $A'_{0i}$ for $i = 1, 2$, and $k'_c$, are defined in the same way as in the two-mass model. $S_v$ is the transverse cross-sectional area and $h_v$ is the height of the laryngeal ventricles.

Fig. 2: Coronal section of the 2×2-mass model

We adopt a two-mass model rather than a one-mass model for the FVFs because the FVFs are as thick as the VFs, and because a two-mass model reproduces the movement of a one-mass model if the coupling stiffness $k'_c$ is set sufficiently large.
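The tension parameterization of Section 3.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the class and function names are hypothetical, the numerical values are the VF constants quoted in Section 5.1, and the "natural frequency" is that of each mass taken alone, ignoring the coupling spring and the aerodynamic forces.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TwoMassParams:
    """Hypothetical container for two-mass parameters in CGS units."""
    m: Tuple[float, float]   # masses m1, m2 [g]
    d: Tuple[float, float]   # thicknesses d1, d2 [cm]
    k: Tuple[float, float]   # stiffnesses k1, k2 [dyn/cm]
    kc: float                # coupling stiffness [dyn/cm]


# Initial values (Q = 1), taken from the VF constants of Section 5.1.
VF_INITIAL = TwoMassParams(
    m=(0.125, 0.025),
    d=(0.25, 0.05),
    k=(80.0e3, 8.0e3),
    kc=25.0e3,
)


def apply_tension(p0: TwoMassParams, q: float) -> TwoMassParams:
    """Apply k_i = k_i0*Q, m_i = m_i0/Q, d_i = d_i0/Q, k_c = k_c0*Q."""
    if q <= 0.0:
        raise ValueError("the tension parameter Q must be positive")
    return TwoMassParams(
        m=tuple(m / q for m in p0.m),
        d=tuple(d / q for d in p0.d),
        k=tuple(k * q for k in p0.k),
        kc=p0.kc * q,
    )


def natural_frequencies_hz(p: TwoMassParams) -> Tuple[float, ...]:
    """Uncoupled natural frequency sqrt(k/m) / (2*pi) of each mass."""
    return tuple(math.sqrt(k / m) / (2.0 * math.pi) for m, k in zip(p.m, p.k))


if __name__ == "__main__":
    for q in (0.5, 1.0, 2.0):
        freqs = natural_frequencies_hz(apply_tension(VF_INITIAL, q))
        print(f"Q = {q}: " + ", ".join(f"{f:.1f} Hz" for f in freqs))
```

Because stiffness is multiplied by $Q$ while mass is divided by $Q$, the uncoupled natural frequency of each mass scales linearly with $Q$, which is consistent with $Q$ acting as a pitch control. The product $m_i k_i$ is unchanged by the scaling, so for fixed damping ratios the viscous resistances $r_i$ do not depend on $Q$.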
3.3 Adduction parameter for the false vocal folds

As stated above, the FVFs contain few muscle fibres and, unlike the VFs, their physical properties essentially do not change. It is therefore meaningless to define a tension parameter for the FVFs, and some other parameterization is necessary. It is a physiological fact that the FVFs are adducted by the action of certain laryngeal muscles, but it is unclear whether their physical properties, such as mass and stiffness, are changed by the adduction. Taking into account the changing shape of the FVFs, we introduce, as one possible parameterization of the model parameters, an adduction parameter $Q'$ for the FVFs that satisfies

$$k'_i = k'_{i0} Q', \quad m'_i = m'_{i0}/Q', \quad A'_{0i} = A'_{0i,0}/Q' \quad (i = 1, 2), \qquad k'_c = k'_{c0},$$

where $k'_{i0}$, $m'_{i0}$, and $A'_{0i,0}$ are initial values. The validity of this parameterization cannot be discussed until detailed measurements of the physical properties of the FVFs, made on fresh excised human larynges, become available.

4 Visual environment

A visual simulation tool called VibLaVie (vibrated larynx viewer) is implemented on a Windows PC. Fig. 3 shows its main panel.

Fig. 3: Parameter setting

Default initial values are provided, but users can set arbitrary initial parameters using the initial parameter setting panel. After setting the initial parameters, users can also set piecewise-linear envelopes that describe the time variation of the parameters. Fig. 4 shows the displacements of the masses, the laryngeal airflow, and a synthesized mouth-output sound obtained by convolving the laryngeal airflow with a vocal tract resonator whose formant parameters can also be set by users. Fig. 5 shows the VF and FVF vibration visualization panel, in which users can see the vibrations in the larynx. This visual environment is very useful for simulating the model, which has many interacting parameters and behaves as a chaotic complex system.

5 Experiments

5.1 Basic parameter setting

We set the initial values of the VF parameters as follows: $P_s = 20$ cmH2O, $l_g = 1.4$ cm, $A_{g01} = A_{g02} = 0.02$ cm$^2$, $d_1 = 0.25$ cm, $d_2 = 0.05$ cm, $m_1 = 0.125$ g, $m_2 = 0.025$ g, $k_1 = 80$ kdyn/cm, $k_2 = 8$ kdyn/cm, $\zeta_1 = 0.1$, $\zeta_2 = 0.6$, $k_c = 25$ kdyn/cm. These constants are the same as those in [3], which were deduced from physiological measurements.
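As a quick consistency check on these constants, the following Python sketch converts them to CGS base units and evaluates the viscous resistances from the relation $r_i = 2\zeta_i\sqrt{m_i k_i}$ of Section 3.1. The dictionary layout and the function name are hypothetical, and the conversion factor for cmH2O is the conventional one (98.0665 Pa, i.e. 980.665 dyn/cm$^2$), which the paper itself does not state.

```python
import math

# Assumed conversion factors (not given in the paper).
CMH2O_TO_DYN_PER_CM2 = 980.665  # conventional centimetre of water
KDYN_TO_DYN = 1.0e3

# VF constants of Section 5.1, expressed in g, cm, dyn.
VF = {
    "Ps": 20.0 * CMH2O_TO_DYN_PER_CM2,             # subglottal pressure [dyn/cm^2]
    "lg": 1.4,                                      # glottal length [cm]
    "Ag0": (0.02, 0.02),                            # rest areas [cm^2]
    "d": (0.25, 0.05),                              # thicknesses [cm]
    "m": (0.125, 0.025),                            # masses [g]
    "k": (80.0 * KDYN_TO_DYN, 8.0 * KDYN_TO_DYN),   # stiffnesses [dyn/cm]
    "zeta": (0.1, 0.6),                             # damping ratios
    "kc": 25.0 * KDYN_TO_DYN,                       # coupling stiffness [dyn/cm]
}


def viscous_resistance(zeta: float, m: float, k: float) -> float:
    """r = 2 * zeta * sqrt(m * k), in dyn*s/cm for CGS inputs."""
    return 2.0 * zeta * math.sqrt(m * k)


R1, R2 = (
    viscous_resistance(z, m, k)
    for z, m, k in zip(VF["zeta"], VF["m"], VF["k"])
)

if __name__ == "__main__":
    print(f"Ps = {VF['Ps']:.1f} dyn/cm^2")
    print(f"r1 = {R1:.2f} dyn*s/cm, r2 = {R2:.2f} dyn*s/cm")
```

Keeping every quantity in one unit system before integrating the equations of motion avoids silent factor-of-1000 errors between kdyn/cm and dyn/cm.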
We also set the initial values of the parameters for the FVFs and the laryngeal ventricles as follows: $l'_g = 1.3$ cm, $A'_{01} = A'_{02} = 0.04$ cm$^2$, $d'_1 = d'_2 = 0.22$ cm, $m'_1 = m'_2 = 0.11$ g, $k'_1 = k'_2 = 50$ kdyn/cm, $\zeta'_1 = \zeta'_2 = 0.4$, $k'_c = 80$ kdyn/cm, $S_v = 2$ cm$^2$, $h_v = 0.5$ cm. These constants are not based precisely on physiological measurements. However, the length and width of the false glottis and the thickness of the FVFs were estimated from images and are not far from the real values. MRI verified that the laryngeal ventricle space exists in throat singing phonation. The vocal tract is assumed to be a uniform pipe, 16 cm long and 5 cm$^2$ in cross-section.

5.2 Results and discussions

We chose several values of the adduction parameter $Q'$ between 0 and 1.0. The results are shown in Fig. 6. For each $Q'$, the horizontal displacements of $m_1$, $m_2$, $m'_1$, and $m'_2$ are shown at the top, and the laryngeal airflow (volume velocity) $U_g$ is shown at the bottom. In the top panel, the solid, dashed, dotted, and dashed-dotted lines show the displacements of $m_1$, $m_2$, $m'_1$, and $m'_2$, respectively.

Fig. 4: Sound synthesis

Fig. 5: Laryngeal movement visualization
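The adduction parameterization of Section 3.3 can be applied to these FVF initial values as in the Python sketch below. It is a minimal illustration, not the authors' code: the names are hypothetical, the list of $Q'$ values is the set of panel titles in Fig. 6, and leaving the coupling stiffness $k'_c$ unscaled is an assumption made here.

```python
from typing import Dict, Tuple, Union

Params = Dict[str, Union[float, Tuple[float, float]]]

# FVF initial values of Section 5.1 in CGS units.
FVF_INITIAL: Params = {
    "k": (50.0e3, 50.0e3),   # stiffnesses k'_1, k'_2 [dyn/cm]
    "m": (0.11, 0.11),       # masses m'_1, m'_2 [g]
    "A0": (0.04, 0.04),      # rest areas A'_01, A'_02 [cm^2]
    "kc": 80.0e3,            # coupling stiffness k'_c [dyn/cm]
}

# Values of Q' taken from the panel titles of Fig. 6.
Q_PRIME_VALUES = (0.05, 0.1, 0.35, 0.5, 0.6, 0.7, 0.85, 1.0)


def apply_adduction(p0: Params, q_prime: float) -> Params:
    """Apply k' = k'_0 * Q', m' = m'_0 / Q', A'_0 = A'_00 / Q'."""
    if q_prime <= 0.0:
        raise ValueError("the adduction parameter Q' must be positive")
    return {
        "k": tuple(k * q_prime for k in p0["k"]),
        "m": tuple(m / q_prime for m in p0["m"]),
        "A0": tuple(a / q_prime for a in p0["A0"]),
        "kc": p0["kc"],  # assumption: coupling stiffness left unscaled
    }


if __name__ == "__main__":
    for q in Q_PRIME_VALUES:
        p = apply_adduction(FVF_INITIAL, q)
        print(
            f"Q' = {q:<4}: k' = {p['k'][0] / 1e3:6.2f} kdyn/cm, "
            f"m' = {p['m'][0]:5.3f} g, A'_0 = {p['A0'][0]:5.3f} cm^2"
        )
```

Under this parameterization a small $Q'$ gives a wide false glottis at rest (0.8 cm$^2$ at $Q' = 0.05$) and $Q' = 1.0$ gives the narrowest one (0.04 cm$^2$), so increasing $Q'$ corresponds to closer approximation of the FVFs. The guard against $Q' \le 0$ is needed because the mass and area are divided by $Q'$.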

Fig. 6: Results of the simulation for variable Q' (panels for Q' = 0.05, 0.1, 0.35, 0.5, 0.6, 0.7, 0.85, and 1.0; horizontal axis: time in ms)

The pressed voice without vibration of the FVFs is observed when $Q' = 0.05$ and $0.1$. In general, this type of phonation is observed in normal phonation and in some Japanese traditional singing voices, in which the false glottis is somewhat wider than in throat singing [4]. The simulation based on the 2×2-mass model is in good agreement with these observations.

A period-triple kargyraa, in which the FVFs vibrate once every three periods of VF vibration, is observed when $Q' = 0.35$. In this pattern, the pitch of the subharmonic tone should be perceived an octave and a perfect fifth lower than that of the basic phonation. Some throat singers are known to be able to sing the period-triple kargyraa.

The normal kargyraa vibration occurs when $Q' = 0.5$. For this vibration, the shape of the laryngeal airflow also agreed with that of the laryngeal airflow estimated by inverse filtering [8].

When $Q' = 0.6$, the vibration is not periodic or might have a very long period (> 1 s). When $Q' = 0.7$, the period-triple kargyraa is observed again.

The pressed voice, in the realm of throat singing, is observed when $Q' = 0.85$ and $1.0$. The phase difference between the VF and FVF vibrations at $Q' = 1.0$ differs from that at $Q' = 0.85$. The shape of the simulated laryngeal airflow was in agreement with the laryngeal airflow estimated by inverse filtering [8].

According to the physiological observations [7, 8], the vibration patterns depend on how closely the FVFs are approximated. The pressed voice vibration was observed in the close approximation, and the kargyraa voice vibration in the middle approximation. The results of the simulation also agree with these physiological observations.

6 Conclusions

We simulated laryngeal movements for throat singing using a 2×2-mass model. The results were in good agreement with physiological observations. By using the model, it is possible to synthesize various laryngeal voices. As future work, we should measure realistic physical properties of the FVFs, improve the model, and investigate the details of the model from the viewpoint of physics and chaos.

Acknowledgments

We thank Seiji Adachi, Takafumi Hikichi, Kiyoshi Honda, Emi Z. Murano, Sayoko Takano, Niro Tayama, and Masahiko Todoriki for their helpful discussions.

Bibliography

[1] S. Adachi and M. Yamada. An acoustical study of sound production in biphonic singing xoomij. J. Acoust. Soc. Am., Vol. 105, No. 5, pp. 2920-2932, 1999.
[2] H. Imagawa, K.-I. Sakakibara, T. Konishi, E. Z. Murano, and S. Niimi. Throat singing synthesis by a laryngeal voice model based on vocal fold and false vocal fold vibrations. Proc. of Study Group on Musical Info. of IPSJ, Vol. 01-MUS-39, pp. 71-78, 2001. In Japanese.
[3] K. Ishizaka and J. L. Flanagan. Synthesis of voiced sounds from a two-mass model of the vocal cords. Bell System Tech. J., Vol. 51, No. 6, pp. 1233-1268, 1972.
[4] N. Kobayashi, Y. Tohkura, S. Tenpaku, and S. Niimi. Acoustic and physiological characteristics of traditional singing in Japan. Tech. Rep. IECE, Vol. SP89-147, pp. 39-45, 1990.
[5] T. C. Levin and M. E. Edgerton. The throat singers of Tuva. Scientific American, Vol. Sep-1999, pp. 80-87, 1999.
[6] H.-L. Liu and J. O. Smith III. Glottal source modelling for singing voice synthesis. In Proc. ICMC 2000, pp. 90-97. ICMA, 2000.
[7] K.-I. Sakakibara, S. Adachi, T. Konishi, K. Kondo, E. Z. Murano, M. Kumada, M. Todoriki, H. Imagawa, and S. Niimi. Analysis of vocal fold vibrations in throat singing. Tech. Rep. Musical Acoust. of Acoust. Soc. Jpn., Vol. 19, No. 4, pp. 41-48, 2000. In Japanese.
[8] K.-I. Sakakibara, T. Konishi, K. Kondo, E. Z. Murano, M. Kumada, H. Imagawa, and S. Niimi. Vocal fold and false vocal fold vibrations and synthesis of khöömei. In Proc. ICMC 2001. ICMA, 2001.
[9] W. R. Zemlin. Speech and hearing science: anatomy and physiology. Allyn and Bacon, 4th edition, 1998.