A MULTI-MODAL CONDUCTING SIMULATOR

Satoshi USA*1 and Yasunori MOCHIDA*2
*1*2 Graduate Course in Informatics, Kogakuin University, 1-24-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo, 163-8677, Japan
*1 YAMAHA Corp. Musical Instrument Research Laboratories, 10-1 Nakazawa-cho, Hamamatsu-shi, Shizuoka, 430-8650, Japan

Abstract: This paper describes a multi-modal conducting simulator which behaves like an orchestra in real time. The system recognizes important and universal conducting elements conforming to the grammar of conducting: i.e. the beginning and ending of a piece, Einsatz (cue for a player or a part) with the eyes, beat timing including Agogik (rubato, changes in tempo), the beat number in a measure, fermata, Dynamik (dynamics), some aspects of articulation (i.e. espressivo - staccato), and breathing. The system also simulates the specific sounding delay and autonomy of an actual orchestra. The right-hand conducting gesture is recognized with HMMs (Hidden Markov Models). The beat recognition process has been built on a fuzzy model of actual orchestra musicians' recognition.

Key-words: Conducting, Multi-Modal, HMM, Fuzzy, MIDI, Gesture Recognition, Human Interface

1 Introduction

Trials of machine conducting recognition systems are under way. In early research, tempo, dynamics or timbre were controlled with special sensors [M.V.Mathews and D.L.Barr'88, D.Keane and P.Gross'89, C.Roads'95]. Conducting gesture detection has been tried with various sensors: e.g. a gripped pressure sensor [M.Hasimoto, K.Iwata and S.Nisida'91], a temperature sensor [Iwashita, S.Ohonuki, T.Hiuga and T.Hosono'94], a pen input device [N.Ide, S.Kato, T.Higuchi and S.Horiuchi'95], a mouse interface [Y.Ito and H.Kiyono'96], and a hand-held marker captured by a camera [H.Morita, S.Ohteru and S.Hashimoto'89, P.Carosi]. The left-hand gesture was detected by a data glove to control musical parameters, i.e.
slur, portamento and pianissimo [H.Morita, S.Hashimoto and S.Ohteru'91, T.Machover and J.Chung'89]. As for gesture smoothness, the Q values of the local peaks of the gesture signal were extracted [T.Harada, H.Morita, S.Ohteru and S.Hashimoto'91]. With image processing, however, a delay of more than 100[msec] is inevitable in detecting the beat timing because of the image updating rate [H.Sawada and S.Hashimoto'96], and cameras are over-sensitive to environmental light, especially flashlight. The sampling rate of the data glove is also insufficient for real-time beat detection. In recent research, therefore, accelerometers have been adopted [H.Sawada, S.Hashimoto and T.Matsushima'97]. Beat timing and volume have been controlled [T.Tsuneyama'91]. Ten kinds of gesture patterns (i.e. circle, star, triangle) were distinguished by a neural network and used as system commands, i.e. start and stop [H.Sawada, S.Hashimoto and T.Matsushima'97, P.Hartono, K.Asano, W.Inoue and S.Hashimoto'94, H.Sawada, S.Ohkura and S.Hashimoto'95]. The Digital Baton detected not only acceleration but also the orientation of, and grip pressure on, the baton to control various parameters [T.Marrin and J.Paradiso'97, J.A.Paradiso'97]. A new 3-dimensional position sensor was applied to a conducting system [H.Katayose, T.Kanamori and S.Inokuchi'97]. We applied the HMM technique to conducting gesture recognition in the narrow sense [S.Usa and Y.Mochida'98.7]. Since HMM recognition can deal with the variance of a gesture, it is effective for conducting recognition. In speech recognition, the HMM method is adopted by many systems and has become the standard technique because of its high performance, while research using neural networks has declined significantly [S.Furui'95]. The conducting system has been developed into a multi-modal simulator [S.Usa and Y.Mochida'98.8]. The user's gaze point and breathing are detected with an eye camera and a breath sensor, respectively.
The system simulates the orchestra response to the user's breathing and Einsatz, so the user can experience how an orchestra responds. Various gesture-control methods for electronic tone generators have been tried [C.Roads'95, T.Suzuki'94]. The conducting technique, on the other hand, has been systematized through historical refinement as the grammar of conducting, and is used by orchestras and conductors all over the world. Accordingly, our system conforms to the basic grammar of conducting [M.Rudolf'50].

2 About Conducting

About 90% of the messages in human communication are non-verbal [T.Kurokawa'94]. A conductor likewise uses both hands, eyes, breathing, face and body, independently or simultaneously, to convey his intention to the orchestra musicians instead of words. Conducting is therefore a typical form of multi-modal communication [S.Usa and Y.Mochida'98.7]. According to the grammar of conducting [M.Rudolf'50], however, an orchestra can be led

completely without the left hand, and most conducting elements can be directed with the right hand alone. The important, universal and basic conducting elements prescribed in the grammar of conducting, or taught in colleges of music, are listed in Table 1. Problems in conducting recognition are explained below.

Beginning a Piece: To help the orchestra musicians start playing a piece simultaneously, and to direct the beginning tempo and Dynamik in advance, a conductor beats one extra beat before beginning the performance. The preparatory-beat gesture starts from the position of attention. The extra beat corresponds, for example, to the breathing by wind instrument players or singers. The conductor had better breathe lightly.

Einsatz is a cue for a player or a part, and is one of the important directions by a conductor. Toward the player who should begin his performance, a conductor cues with the eyes, left hand, baton or body direction. The left hand is often used in opera for distant singers, but the cue with the eyes is the best manner in orchestra performance.

Examples of right-hand beat-patterns (2-dimensional loci of the conducting gesture) are shown in Fig. 1 [M.Rudolf'50]: 2-beat and 3-beat patterns in their espressivo (rich expression) and staccato (notes played separately) forms. The beat-patterns have some variations.

[Fig. 1 Examples of Conducting Beat-Patterns: (a) 2-beat espressivo, (b) 2-beat staccato, (c) 3-beat espressivo, (d) 3-beat staccato; "halt" marks where the baton halts and (n) denotes the n-th beat point.]

The Number of Conducting Beats: The meter of a piece and the number of conducting beats in a measure are not necessarily equal. In quadruple and duple time, 8-beat, 4-beat, 2-beat or 1-beat conducting may be used; in triple time, 6-beat, 3-beat or 1-beat, for example [A.Ohtsuka'66]. Although a waltz is in triple time, it is conducted in 1-beat in most cases.
Four-quarter (4/4) time is sometimes conducted alla breve (2/2). Because the number of conducting beats is not decided by tempo and meter alone, and because there are no specified rules, a conductor must tell it to the orchestra musicians in advance [A.Ohtsuka'66]; often it is understood tacitly. The conducted beat number in a measure should be recognized in order to distinguish a change of the number of beats in a measure from a sudden change of tempo.

Table 1 Conducting Elements and Gestures (universal and important elements; "O" marks the elements whose timing is directed)

- beginning a piece (O): One extra beat is directed before the beginning of the performance, "in tempo" (constant tempo), to direct the beginning tempo and Dynamik. It corresponds to the breathing. The conductor had better breathe lightly.
- Einsatz (cue) (O): Before the entrance, the cue is directed, mainly with the eyes, to the corresponding part player.
- ending a piece (cutting-off) (O): The baton is halted clearly, or clicked with the wrist.
- meter and tempo (O): Directed with the conducting beat-pattern (N-beat). The first (down) beat in a measure is indispensable.
- beat timing (pulse) (O): The moment when the baton passes through the beat point defined on the beat-pattern. The sounding timing, not always at the lowest point of the pattern, depends on the orchestra.
- Agogik (rit.-accel., tempo rubato) (O): Agogik should be directed before the beat at which the tempo change starts. The number of beats in a measure is changed when the tempo changes widely.
- ritenuto (to make the tempo slow suddenly): At the specific beat, wait a moment without halting the baton. The baton movement becomes slow immediately after the beat.
- fermata (to pause the music) (O): With forte, the baton is lifted gradually and shaken delicately; with piano, the baton is simply stopped. The end timing of the fermata is directed with a cut-off.
- Dynamik (forte-piano) (O): The size of the gesture pattern. Dynamics: e.g. f, mf, mp, p, cresc., dim., sfz.
- accento (accents) (O): The preparatory beat is directed one beat before the accento beat (with breathing in), and the accento is beaten strongly with breathing out.
- articulation: staccato (O): The baton halts clearly at each beat. The gesture is quick and straight.
- articulation: espressivo (legato) (O): The gesture is curved and smooth. The degree of staccato or legato can be directed by the baton alone.
- articulation: non-espressivo (O): Its gesture characteristic is between staccato and espressivo.
- articulation: marcato: The baton halts at each beat. Its gesture is gentler than staccato.
- articulation: tenuto: The baton stops at each beat. It is lighter than marcato.
- phrasing: The conducting gesture force becomes weak at the end of a phrase.

We do not deal here with a conductor's character, personality, charisma, spirit, and congeniality.

Beat Timing: As in most of the preceding research, the local peak timing of the time-domain acceleration was adopted as the beat timing in this system; however, all the skilled conductors pointed out that the system produced sound too early, or that the conducting feeling was unnatural. An actual orchestra produces sound with a specific delay from the corresponding conducted beat. The delay is understood tacitly between the conductor and the orchestra musicians. Generally, German orchestras respond to a conductor's beat slowly, and American orchestras respond quickly. Among the many factors

for the delay, the closest is tempo: as the tempo becomes slower, the delay becomes longer. In the trial phase of this research, the relation between delay and tempo which gives the most natural feeling to a Japanese conductor was investigated in trial playing. A setting in which the sound production is delayed 100[msec] after the recognition process at tempo M.M. = 50, with no delay inserted at tempo M.M. = 110, proved satisfactory; in this case, the conductor presumed that the system was a Japanese orchestra. Most unskilled beginning conductors feel that an actual orchestra responds slowly. Since an inexperienced conductor tends to follow the orchestra sound, his performance tempo often becomes gradually slower [S.Usa and Y.Mochida'98.7].

Articulation of the performance is directed with the beat-pattern (gesture locus) between the beat points.

Autonomy of an Orchestra: The orchestra musicians can perform well without always staring at the conductor, because both the conductor and the orchestra know the musical score and the timings that must be directed, e.g. the beginning and ending (cut-off) of a piece, Agogik, and Einsatz before an important entrance. The orchestra obeys these directions completely. On the other hand, there are cases in which conducting is not necessary: e.g. a section of a march in which both tempo and musical feeling are constant. The autonomy of an orchestra therefore varies widely, even within one piece.

Breathing is important not only for wind instrument players and singers but for all music performers. It is taught in some conducting departments of music colleges. Even H. v. Karajan practiced breathing intentionally with his orchestras [R.Chesterman'90]: breathing in at the extra beat or in preparation for an attack, and breathing out at the down beat, for example.

3 Beat Recognition Process

3.1 System Composition

The machine recognition of conducting is difficult.
This is because, even for the same intention, both the time-domain structure and the wave characteristics vary depending on many factors, e.g. the conductor, and on the connection with the preceding and following parts. To cope with this variation and to recognize the conducting intention, soft-computing techniques have been applied [S.Usa and Y.Mochida'98.8]. The system composition is shown in Fig. 2. Since the right-hand conducting gestures are defined as the 2-dimensional beat-patterns shown in Fig. 1, two orthogonal electrostatic accelerometers (TOPRE TPR70G-100) are attached to a baton.

[Fig. 2 System Composition of the Conducting Simulator: the baton's 2D acceleration sensors feed agent-1 (recognition with HMM); agent-2 forms a beat occurrence expectation based on the playing tempo; agent-3 checks the beat position on the score (MIDI & control marks); agent-4 integrates each agent's output; a MIDI player is controlled with tempo, beat timing, beat number, dynamics, play/stop, Einsatz, breath and articulation.]

The accelerometers' raw output includes unnecessary frequency components, which interfere with the detection of the gesture characteristics for conducting recognition. The raw output is therefore passed through a band-pass filter. Examples of the filtered output of the accelerometers are shown in Fig. 3; they show careful conducting in accordance with the beat-patterns in Fig. 1.
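The band-pass filtering step can be sketched in a few lines. This is a minimal causal filter (a one-pole high-pass cascaded with a one-pole low-pass), not the paper's actual filter design; the 100 Hz rate matches the 10[msec] processing cycle described in section 3.1, while the 0.5-15 Hz pass band is an assumed range for conducting gestures, not a value given in the paper.

```python
import math

def bandpass(samples, fs=100.0, f_low=0.5, f_high=15.0):
    """Crude real-time band-pass for one accelerometer axis:
    a one-pole high-pass (removes the DC/gravity offset) cascaded
    with a one-pole low-pass (removes high-frequency noise)."""
    dt = 1.0 / fs
    a_hp = 1.0 / (2 * math.pi * f_low * dt + 1.0)                      # high-pass coeff.
    a_lp = (2 * math.pi * f_high * dt) / (2 * math.pi * f_high * dt + 1.0)  # low-pass coeff.
    out, hp_prev_in, hp_prev_out, lp_prev = [], 0.0, 0.0, 0.0
    for x in samples:
        hp = a_hp * (hp_prev_out + x - hp_prev_in)   # high-pass stage
        hp_prev_in, hp_prev_out = x, hp
        lp_prev = lp_prev + a_lp * (hp - lp_prev)    # low-pass stage
        out.append(lp_prev)
    return out

# A constant input (gravity offset) is rejected: the output decays to ~0.
y = bandpass([1.0] * 300)
```

Because each sample is processed causally as it arrives, this shape of filter is compatible with the 10[msec] per-sample loop; a zero-phase (forward-backward) filter would not be.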

Page  00000004 Einsatz by eyes is detected with a goggles type eye camera (TAKEI-Kiki-Kogyo ophthalmograph TKK2901). It detects 1 [G] the users watching point in every 16 [msec]. The user conducts [G with gazing an orchestra image on a 29 inches monitor display..... set at 1 meter distant from the user. The computer detects an 1 2 [sec] -1 +1[G] orchestra part gazed by the user. (a) 2beat: espressivo There are some breath sensing methods. But a thermistor set in a nostril can't detect breath through a mouth. The MIDI 1 +I 1[G] breath controller can't detect breathing in. The system, [G therefore, adopts an elastic sensor (NIHON KODEN: breath. i. x pick-up TR-751 IT) which detects vary of users chest girth. 1 2 se] -1 +1[G] A series of the process including A/D conversion, band pass (b) 2-beat: stacato filtering, recognition calculation, and music data replaying is repeated every 10[msec]. The beat recognition without pre- 1. process needs only 20[msec]. The MIDI format music data are [G replayed in accordance with recognized conducting. The x replayed MIDI data were changed into sound with a tone 1 2 [sec] 1 0 +1[G] generator YAMAHA MU-80. (c) 3-beat: espressivo 3.2 Beat Recognition Process (an Induction based on Actual Musicians' Recognition) I p,! - G 1 1 +1[G] The beat recognition process has build based on interviews [G I with conductors and orchestra players. (a) The orchestra \ x musicians follow the conductor more easily if they know the 1 2 [sec] 1 [G] performing piece, (b) still easier if they are accustomed to the (d) 3-beat: stacato piece. (c) Usually, both the conductor and the orchestra musicians have the information about the playing piece in Fig.3 Filtered Output of 2D Accelerometer advance. (d) They are playing with expectation of coming left:absolute time-domain wave. right:2D loca for a cycle beats and the beat number in a measure. (e) Therefore, musicians can perform even if they are not always watching the conductor. 
Because of these factors, it is believed that orchestra musicians recognize conducting not only by (I) watching the conducted beat-pattern (gesture recognition in the narrow sense), but also by using (II) a beat occurrence expectation based on the playing tempo and (III) information from the musical score being played.

Table 2 The Correspondence of the Conducting Recognition Means

- Human musicians' means: gesture recognition in the narrow sense (watching the beat-pattern conducted by the right hand).
  Objects of recognition: beat number in a measure and beat-pattern; beat timing (including tempo, Agogik, fermata); meter; articulation (staccato - espressivo); dynamics.
  Means on this system: HMM (agent-1). The HMMs, learned in advance, correspond to each beat in a measure (No.1 = 1st of 2-beat, No.2 = 2nd of 2-beat, No.3 = 1st of 3-beat, ...); local peak timing of the absolute acceleration and the peak interval; meter in the MIDI data (default), or key input by the user; high-frequency magnitude and local valley depth; magnitude of the absolute acceleration.
- Human musicians' means: beat occurrence expectation based on the playing tempo.
  Objects of recognition: beat occurrence timing (tempo, Agogik, fermata).
  Means on this system: fuzzy production rules (agent-2), e.g. if (the time elapsed since the last beat is short) then (the beat occurrence probability is low).
- Human musicians' means: information from the musical score.
  Objects of recognition: meter, beat number in a measure, beginning & ending, etc.
  Means on this system: fuzzy production rules (agent-3), e.g. if (the playing point is close to the beat on the score) then (the probability that the conducted beat corresponds to the note on the score is high).
- Human musicians' means: synthesis of the means listed above.
  Objects of recognition: the objects listed above.
  Means on this system: fuzzy production rules (agent-4), e.g. if (on agent-1, the likelihood of the HMM for the third beat is large) and (on agent-2, the beat occurrence probability is not low) and (on agent-3, the probability that the input corresponds to the third beat is high) then (the third-beat occurrence probability is high), etc.

The recognition process of the system is constructed on this model of human musicians' recognition.
Their correspondence is shown in Table 2. The process consists of processing units, called agents, corresponding to the above

factors. Each agent has a simple function: (i) agent-1 recognizes the gesture with HMMs, (ii) agent-2 expects beat occurrence based on the playing tempo, (iii) agent-3 calculates the distribution of beat occurrence possibility on the musical score being played, and (iv) agent-4 synthesizes the agents' outputs.

3.3 HMM Recognition (agent-1)

Beat Number: The HMM is a Markov process with transition probabilities [M.Okochi'86]. The 2-dimensional accelerometers' output is quantized into 32 kinds of labels according to five characteristic parameters: (1) the difference between the vertical acceleration magnitude at the newest local peak of the time-domain absolute acceleration signal and that at the peak one beat before; (2) whether the horizontal acceleration magnitude at that local peak is larger than a threshold; (3) the present rotation direction of the 2D acceleration in the x-y plane; (4) the rotation direction one beat before; and (5) the rotation direction two beats before. One of the 32 kinds of labels is output every 10[msec], and the label stream is input to the HMM recognition process. Sixteen kinds of HMM are set, one for each conducting beat gesture to be recognized; the gestures correspond to the beat number in a measure, e.g. the fifth HMM is for the third beat of 3-beat. For the input label stream, the likelihood of each HMM is calculated. The result recognized by agent-1 is the number of the HMM providing the largest likelihood when a local peak occurs on the time-domain absolute acceleration signal. These HMMs are left-to-right and discrete-type without state skipping, and they output labels at transitions. The definition of the HMM is shown in expression (1). The variance of the gesture characteristic parameters is absorbed by the HMM output probabilities, and the variance of the time structure by the HMM transition probabilities. The likelihoods are calculated with the forward algorithm [M.Okochi'86].
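The forward-algorithm computation just mentioned can be sketched as follows for a transition-emitting (Mealy-type) discrete HMM, i.e. one that outputs a label at each transition. The 2-state, 2-label toy model below is illustrative only; it is not the paper's 5-state, 32-label configuration.

```python
def forward_likelihood(labels, pi, p, q, final_states):
    """Forward algorithm for a transition-emitting discrete HMM:
    pi[i] is the initial-state probability, p[i][j] the transition
    probability Si -> Sj, and q[i][j][l] the probability that label l
    is output on that transition. Returns the total probability of the
    label stream, summed over all state paths ending in final_states."""
    n = len(pi)
    alpha = list(pi)                          # alpha[j] = Pr(prefix seen, now in Sj)
    for l in labels:
        alpha = [sum(alpha[i] * p[i][j] * q[i][j][l] for i in range(n))
                 for j in range(n)]
    return sum(alpha[j] for j in final_states)

# Toy left-to-right model: 2 states, 2 labels (NOT the paper's HMM).
pi = [1.0, 0.0]
p  = [[0.5, 0.5],                  # S1 -> S1 / S1 -> S2
      [0.0, 1.0]]                  # S2 self-loop only (no state skipping)
q  = [[[1.0, 0.0], [0.0, 1.0]],    # label 0 on S1->S1, label 1 on S1->S2
      [[0.5, 0.5], [0.5, 0.5]]]
print(forward_likelihood([0, 1], pi, p, q, {1}))   # -> 0.25
```

The recursion sums over paths incrementally, so its cost grows linearly with the stream length T rather than exponentially, which matters here because T varies widely with tempo.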
The likelihood Pr(L|M) that the HMM number M defined by expression (1) outputs a label stream L = l1, l2, ..., lT is calculated with equation (2). The label stream length T varies widely depending on the tempo. The HMMs were trained in advance with the Baum-Welch algorithm [M.Okochi'86]. All initial HMMs before the learning process were the same, with all output probabilities and all transition probabilities set to the same value. About 100 teacher samples were used for the training of each HMM [S.Usa and Y.Mochida'98.7].

Definition of the conducting recognition HMM:
  N = 5: the number of states (S1, ..., S5)
  K = 32: the number of labels (l1, l2, ..., l32)
  p_ij: probability of the transition from state Si to state Sj
  q_ij(lk): probability that the label lk is output at the transition from Si to Sj
  pi_1 = 1, pi_i = 0 (i = 2, ..., 5): probability that the initial state is Si
  C = 16: the number of recognition objects
  I_f = {S5}: the set of final states (the last state is S5)    (1)

  Pr(L|M) = sum over i0, i1, ..., iT (with iT in I_f) of
            pi_i0 * p_i0i1 * q_i0i1(l1) * p_i1i2 * q_i1i2(l2) * ... * p_i(T-1)iT * q_i(T-1)iT(lT)    (2)

3.4 The Fuzzy Production Rules (agent-2, 3, 4: Using the Information of the Score and the Performance)

The fuzzy production rules are written in if-then form as shown in Table 2. Examples of the membership functions for agent-2 are shown in Fig. 4. For fuzzy reasoning, the max-min process is used; for de-fuzzification, the center of gravity is used. These are the most general fuzzy reasoning methods. To recognize the beat finally, agent-4 synthesizes each agent's output, i.e. each HMM likelihood from agent-1, the beat occurrence expectation based on the playing tempo from agent-2, and each beat occurrence possibility on the musical score from agent-3. The recognition result is the beat number with the largest possibility at the local peak of the absolute acceleration.
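The agent-2 rule with max-min reasoning and centre-of-gravity de-fuzzification can be sketched as follows. The membership function shapes (a 0-400[msec] ramp for "the time elapsed since the last beat is short", a downward ramp for "the beat occurrence probability is low") are assumptions in the spirit of Fig. 4, not values taken from the paper.

```python
def mu_short(t_ms, t0=0.0, t1=400.0):
    """Membership of 'the time elapsed since the last beat is short'.
    The 0-400 ms ramp is an assumed shape, not a value from the paper."""
    if t_ms <= t0:
        return 1.0
    if t_ms >= t1:
        return 0.0
    return (t1 - t_ms) / (t1 - t0)

def mu_low(prob):
    """Membership of 'the beat occurrence probability is low' (downward ramp)."""
    return max(0.0, 1.0 - prob)

def beat_occurrence_probability(t_ms, steps=101):
    """One rule, max-min style: clip the consequent set at the rule's
    firing strength (min), then de-fuzzify by the centre of gravity
    of the clipped set."""
    w = mu_short(t_ms)                        # firing strength of the antecedent
    xs = [i / (steps - 1) for i in range(steps)]
    ys = [min(w, mu_low(x)) for x in xs]      # min-clipped consequent
    s = sum(ys)
    return sum(x * y for x, y in zip(xs, ys)) / s if s else 0.5

p_early = beat_occurrence_probability(50.0)   # just after a beat: lower
p_late = beat_occurrence_probability(350.0)   # near the next expected beat: higher
```

With several rules, the max-min process takes the pointwise maximum of the clipped consequent sets before the centroid step; a single rule suffices to show the direction of the effect, i.e. the inferred occurrence probability is suppressed immediately after a beat.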
The Number of Conducting Beats: Although the MIDI music data contain the meter information, the number of conducting beats in a measure is not written in them. In this system's default setting, when the MIDI music data are loaded, the number of beats in a measure is automatically set equal to the meter, and the beat gesture occurrence probability of agent-3 is distributed over the beat positions of the musical score in the form of fuzzy membership functions, as shown in Fig. 5(a). To direct note timings other than beat timings, the beat gesture occurrence probability should be set manually as shown in Fig. 5(b). The number of conducting beats in a measure can be changed arbitrarily by numerical key input, even during a performance.

[Fig. 4 Examples of Fuzzy Membership Functions for agent-2: "the time elapsed since the last beat is short" over the time since the last beat [msec], and "the beat occurrence probability is low" over the occurrence probability.]
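The distribution of Fig. 5(a) can be sketched as a membership function over score positions: a peak centred on each notated beat, falling off with distance. The triangular shape and its width are assumptions for illustration; the paper gives only the general form.

```python
def score_beat_membership(t, beat_times, width=0.1):
    """Fuzzy possibility that a conducted beat at score position t
    (measured in beats) corresponds to a notated beat: a triangular
    membership function centred on each beat position, in the manner
    of Fig. 5(a). The triangle width 0.1 is an assumed value."""
    return max(max(0.0, 1.0 - abs(t - b) / width) for b in beat_times)

# Default setting for one 3/4 measure: beats at score positions 0, 1, 2,
# matching "the number of beats in a measure" = "the meter".
beats = [0.0, 1.0, 2.0]
on_beat = score_beat_membership(1.0, beats)    # exactly on the 2nd beat
off_beat = score_beat_membership(0.5, beats)   # halfway between beats
```

To direct timings other than beats, as in Fig. 5(b), one would simply place the peaks at the manually chosen positions instead of at every notated beat.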

A preceding study [N.Takeshi, S.Haruyama and T.Kobayashi'96] verified that the HMM technique is effective for recognizing the gestures of unspecified people. With this system, 2- and 3-beat recognition was first tried with the agent-1 HMM recognition only. The beat number recognition rate for careful gestures by the provider of the HMM learning data was 96.13%, and the rate for gestures by other people was 92.49%. These recognition rates are not sufficient to play through even a short piece, because a piece has hundreds of beats. Then, with agents 2, 3 and 4 in addition to the HMM, 8 musical pieces in 2-, 3- or 4-beat were recognized. The beat number recognition rate for careful gestures by the HMM learning data provider was 99.74%, and the rate for other people was 98.95%. Using agents 2, 3 and 4, some pieces can be played through without recognition error. Accordingly, for conducting recognition, not only the HMM but also the musical score information and the playing performance information are necessary [S.Usa and Y.Mochida'98.7]. Fuzzy reasoning is a suitable solution for conducting recognition because it (a) easily realizes human know-how in if-then form, (b) can deal with many complicated parameters, and (c) does not need strict analytic models.

[Fig. 5 Examples of the Beat Occurrence Probability Distribution on a Score: (a) when "the number of beats in a measure" = "the meter"; (b) to direct timings other than beats.]

4 Einsatz, Breathing, etc. [S.Usa, Y.Mochida'98.8]

4.1 Einsatz with the Eyes

An example of the gazing point of a user is shown in Fig. 6. The orchestra image is divided into 8 areas. The user watches the oboe part in area No. 4 one beat before the beginning of the famous melody played by the English horn in the second movement of the Symphony No. 9 by A. Dvorak. The system simulates the orchestra response to the Einsatz.

[Fig. 6 An Example of the Gazing Point (the bold white line shows its locus); the orchestra image is divided into 8 areas (No. 0 - 7).]
Some Einsatz marks are written in the control marks file in advance; the marks file and the music data file form a pair. An Einsatz mark consists of the area number (0 - 7) and the timing at which it should be watched, and the corresponding part (MIDI channel) number. The user should watch the appropriate part in the orchestra image on the 29-inch monitor at the appropriate timing. When an Einsatz mark is read, if the user's watching point is inside the area specified in the Einsatz mark, the system continues the performance; otherwise, the system judges it a mistaken Einsatz and stops replaying the part specified in the Einsatz mark for one measure.

4.2 Breathing

A part of the score of the Pastoral Symphony and an example of the corresponding breathing by a user are shown in Fig. 7. As with the Einsatz marks, some breath marks are written in the control marks file in advance. A breath mark consists of the direction of breathing (in or out) and its timing: breathing in at the end of a fermata and breathing out at the beat of the next note, for example. When a breath mark is read, if the output of the breath sensor and the breath mark agree, the system continues the performance; otherwise, the system judges the breathing incorrect and stops the replay.

[Fig. 7 An Example of Breathing.]

4.3 Autonomy

Agogik and fermata: Agogik is realized by replaying the MIDI music data following the conducted beat input. To simulate

the varying autonomy of an orchestra, autonomy marks can additionally be written in the control marks file. An autonomy mark consists of the limits of the tempo change, the order of the low-pass filter for the tempo calculation, and the length of replay that continues when the gesture input stops. When the autonomous degree is 0%, the system is glued to the conducting: when the conducting gesture stops, the system stops the replay simultaneously and waits for the next beat input, so a fermata or pause can be prolonged arbitrarily by halting the baton. When the autonomous degree is 100%, the system ignores the conducting and plays with the tempo in the music data.

4.4 Other Processes [S.Usa, Y.Mochida'98.7]

Beginning a Piece: The system starts a performance with the aforementioned extra beat. The timing and Dynamik of the extra beat can be confirmed by sound.

Delay Insertion: A delay is inserted between the moment when the local peak occurs on the time-domain absolute acceleration and the moment when the system replays the corresponding notes. The inserted delay is defined as a function of the playing tempo. In equation (3), delay denotes the inserted time [msec], and tempo denotes the playing tempo (i.e. the number of beats per minute). Based on the investigation mentioned earlier, coefficient is set to (-5/3) and constant to (550/3):

  delay = coefficient * tempo + constant    (3)

Dynamik: Dynamik is in proportion to the absolute acceleration.

Articulation: Articulation and the nuance of the performance are directed by the beat-pattern between the beats. As shown in Fig. 1, in staccato the gesture is sharp and the baton almost halts momentarily at each beat, while in espressivo the gesture is smooth and the baton is always moving. To recognize the degree between staccato and espressivo, therefore, two parameters are detected: the gesture smoothness and the degree of the halt. General MIDI music data on sale contain musical score information, e.g.
notes, Dynamik, tempo, meter, bars, orchestration, and the instruments and their placement (panning). The preparatory elements should be written in the MIDI music data or in the control marks file in advance; the addition of control marks corresponds to a part of the rehearsal information sharing between a conductor and an orchestra. Since the MIDI standard does not define a regulation for an articulation parameter, the recognized articulation is ignored in the replay.

5 Summary

This multi-modal conducting simulator recognizes many conducting elements and behaves like an orchestra. With this system, users can enjoy a virtual conducting experience. This trial development is the technical base of a new music performance system.

The system still has problems. The eye camera obstructs part of the user's view, needs troublesome adjustment before every use, has low resolution, and is over-sensitive to the user's motion. A gaze point detection method using a camera instead of worn sensors has been announced [T.Chino, K.Fukui, Q.Yamaguchi, K.Suzuki and K.Tanaka'98], but it works only at point-blank distance. The breath sensor is also over-sensitive to the user's motion; the user's motion is therefore restricted. The kinds of recognized articulation and nuance, their reflection in the sound, and the recognition rate are not yet sufficient.

After these problems are solved, the system will be able to provide helpful information to conducting students, i.e. the conducting gesture, eye point and timing, and breathing of a teacher or of oneself, and control marks will be generated automatically from the teacher's conducting information. Since unskilled conducting students are not accustomed to the specific sounding delay of an actual orchestra, the system can be a profitable training system.

Acknowledgments

The authors greatly appreciate associate professor Mr.
Kotaro Sato, department of conducting, faculty of music, Tokyo National University of Fine Arts and Music, for giving us various knowledge about conducting and advice based on his performance career and practical instruction background. They also appreciate Mr. Ishimura, the president, Mr. Okumura, Mr. Wachi and Mr. Kato, the director and general manager, Mr. Suzuki, the general manager, and Mr. Nagai, the senior engineer, of Yamaha Corp., for giving us the chance to do this research.

References

P. Carosi, "Light Implemented Conductor Baton, Light Baton, Computer Music Department of CNUCE/CNR, Pisa, Italy," ICMA Video Review Vol. #1
R. Chesterman, "Conductors in Conversation," Robson Books Ltd (1990) (translated by Nakao, p.44, Yosen-sha, Tokyo, 1995, in Japanese)
T. Chino, K. Fukui, Q. Yamaguchi, K. Suzuki, K. Tanaka, "GazeToTalk: A Non-verbal Interface System with Speech Recognition, Gaze Detection, and Agent CG," Interaction 98, Inform. Proces. Soc. Japan (1998, in Japanese)

S. Furui, "Speech Recognition," J. of the Inst. of Electr., Inform. and Comm. Engi., Vol.78, No.11, pp.1114-1118 (1995, in Japanese)
T. Harada, H. Morita, S. Ohteru, S. Hashimoto, "Control of Musical Performance Using a Baton," 42nd meeting of Inform. Proc. Soc. Japan, 3N-5, pp.1-313-314 (1991, in Japanese)
P. Hartono, K. Asano, W. Inoue, S. Hashimoto, "Adaptive Timbre Control Using Gesture," ICMC Proc., pp.151-158 (1994)
M. Hasimoto, K. Iwata, S. Nisida, "A Study of Man-Machine Interface in Synthesizer System," J. of Soc. of Precise Engi., JSPE-57/10, pp.118-123 (1991, in Japanese)
N. Ide, S. Kato, T. Higuchi, S. Horiuchi, "Musical Information Software with Pen Input -Karajan Kun-," Electronics Life, Aug. '95, Japan Broadcast Publishing, Tokyo, pp.99-106 (1995, in Japanese)
Y. Ito, H. Kiyono, "Music Conducting Software Magicbaton," PFU Tech. Rev., 7, 2, pp.56-61 (1996, in Japanese)
Iwashita, S. Ohonuki, T. Hiuga, T. Hosono, "The Study on the Control of a Synthesizer by the Man-machine Interface," College of Science and Tech., Nihon Univ., annual report, H.5, pp.677-678 (1994, in Japanese)
H. Katayose, T. Kanamori, S. Inokuchi, "Motion Capture Sensor: DigitEye3D, Principles and Fundamental Spec," J. of the Inst. of Electr., Inform. and Comm. Engi., Vol.J80-D2, No.10, pp.2889-92 (1997, in Japanese)
D. Keane, P. Gross, "The MIDI Baton," Proc. of ICMC, pp.151-154 (1989)
T. Kurokawa, "Non Verbal Interface," p.31, Ohm-sha, Tokyo (1994, in Japanese)
T. Machover, J. Chung, "Hyper-Instruments: Musically Intelligent and Interactive Performance and Creativity Systems," ICMC Proc., pp.186-190 (1989)
T. Marrin, J. Paradiso, "The Digital Baton: a Versatile Performance Instrument," Proc. ICMC97, pp.313-316 (1997)
M.V. Mathews, D.L. Barr, "The Conductor Program and Mechanical Baton," Stanford Univ. CCRMA Dept. of Music Report No. Stan-M-47, May (1988)
H. Morita, S. Ohteru, S.
Hashimoto, "Computer Music System Which Follows a Human Conductor," ICMC Proc., pp.207-210 (1989)
H. Morita, S. Hashimoto, S. Ohteru, "A Computer Music System that Follows a Human Conductor," IEEE Computer, Vol.24, No.7, pp.44-53 (1991)
A. Ohtsuka, "The Standard Encyclopedia of Music a-te," pp.749-752, Ongaku no Tomo Sha, Tokyo (1966, in Japanese)
M. Okochi, "Speech Recognition Based on Hidden Markov Models," J. Acoust. Soc. Japan, Vol.42, No.12, pp.936-941 (1986, in Japanese)
J.A. Paradiso, "Electronic Music: New Ways to Play," IEEE Spectrum, Dec. 1997, pp.18-30 (1997)
C. Roads, "The Computer Music Tutorial," The M.I.T. Press, Massachusetts (1995)
M. Rudolf, "The Grammar of Conducting -A Practical Study of Modern Baton Technique-," Schirmer Books, New York (1950) (translated by A. Ohtsuka, Ongaku no Tomo Sha, Tokyo, 1968, in Japanese)
H. Sawada, S. Ohkura, S. Hashimoto, "Gesture Analysis Using 3D Acceleration Sensor for Music Control," ICMC Proc., pp.257-264 (1995)
H. Sawada, S. Hashimoto, "Gesture Recognition Using Acceleration Sensor and Its Application for Musical Performance Control," Trans. of the Inst. of Electr., Inform. and Comm. Engi., A, Vol.J79-A, No.2, pp.452-459 (1996)
H. Sawada, S. Hashimoto, T. Matsushima, "A Study of Gesture Recognition as Human Interface," Interaction 97, Inform. Proces. Soc. Japan, pp.25-32 (1997, in Japanese)
T. Suzuki, "Music and Computer Science," J. Inform. Proces. Soc. Japan, Vol.35, No.9, pp.830-835 (1994, in Japanese)
N. Takeshi, S. Haruyama, T. Kobayashi, "HMM-based Human Gesture Recognition," Tech. Report of the Inst. of Electr., Inform. and Comm. Engi., PRMU96-8, Vol.96, No.40, pp.53-59 (1996, in Japanese)
T. Tsuneyama, "Computer Orchestra System KARAYAN," INTEC Tech. Report No.33, pp.57-69 (1991, in Japanese)
S. Usa, Y. Mochida, "A Conducting Recognition System on the Model of Musicians' Process," Journal of the Acoustical Society of Japan, 19-04 (E) (1998.7)
S. Usa, Y.
Mochida, "A Multi-Modal Conducting Simulator", Journal of Japan Society for Fuzzy Theory and Systems (1998.8, in Japanese)