A COMPUTER MUSIC SYSTEM FOR HUMAN SINGING

Wataru Inoue, Shuji Hashimoto, Sadamu Ohteru
Department of Applied Physics, School of Science and Engineering
WASEDA UNIVERSITY
3-4-1, Okubo, Shinjuku-ku, Tokyo 169, JAPAN
E-mail: firstname.lastname@example.org, ohteru@cfi.waseda.ac.jp

Abstract

This paper is concerned with an automated accompaniment and modulation system for human singing. One purpose of our system is to produce an adaptive accompaniment that follows the singing in real time, so that singers can change the singing tempo according to their own emotional feeling. The other purpose is to control the pitch of the singing voice in real time: when a singer sings out of tune, the system monitors the pitch of the singing and adjusts it to the correct one.

1. Introduction

For the past ten years, we have been engaged in real-time man-machine interfaces for musical performance [Morita et al., 1991][Sato et al., 1991][Harada et al., 1992]. As music is an art on the time axis, real-time processing is of supreme importance in a computer music performance system. Among the many interesting investigations, one of the cases where real-time processing is most seriously required is automated accompaniment. Many attempts have been reported on this matter [Dannenberg, 1984][Dannenberg and Mont-Reynaud, 1987][Naoi et al., 1989][Wake et al., 1992][Horiuchi et al., 1992]. In most of these systems, the melody line is input from a keyboard played by the performer, which frees the system from physical sound-signal processing such as pitch detection and lets it focus on how to generate the accompaniment with adequate timing and a suitable arrangement. It would certainly be effective, however, to allow acoustic sound input in applications of computerized musical accompaniment.
However, work on real-time acoustic sound processing for automated accompaniment has not been reported much [Vercoe, 1984][Vercoe and Puckette, 1985][Takeuchi et al., 1993], although there are many works on sound processing in the field of acoustics [Yamaguchi and Ando, 1977][Niihara et al., 1986][Kuhn, 1990]. Recently, some Pitch-to-MIDI devices have become commercially available. However, they are not sufficient, especially for vocal input: as the human voice is more complicated than instrument sounds, abundant problems must be overcome. On the other hand, the accompaniment system called "Karaoke" developed in this country, a sort of MMO (Music Minus One), has become popular throughout the world. In the Karaoke system, however, singers must sing in the tempo of the recorded accompaniment; that is, they cannot sing in their favorite tempo. In this paper, we introduce a computer music system for human singing which is composed of two parts: an automated accompaniment part [Inoue et al., 1992] and an automated pitch modulation part [Inoue et al., 1993]. The system accompanies your song following your favorite tempo, and also compensates for your voice when you sing out of tune.

ICMC Proceedings 1993
Figure 1: Flow diagram of the system.

2. System Configuration

The system hardware consists mainly of a digital signal processor unit (Mitec Corp. MSP77230 with an NEC DSP chip µPD77230), a digital sound processor (YAMAHA Corp. SPX90II), a MIDI instrument (E-mu Systems, Inc. Proteus), and a personal computer (NEC PC-9801) that controls the entire system. The scores of the melody and accompaniments and the lyrics of the song are maintained as a knowledge base in the system software. Figure 1 shows a flow diagram of the system. The system first extracts the singing pitch as a MIDI note number with the help of the DSP unit in real time. To accompany the singing automatically, the personal computer obtains information about the tempo, measure, and key of the singing by comparing the MIDI note number of the singing with the melody score, and thus starts to follow the song according to the detected measure and key. The accompaniments are adaptively generated on the MIDI instrument according to the singing tempo. To compensate for the singing voice automatically in real time, the personal computer detects the difference between the MIDI note number of the singing and the melody score; the pitch of the singing voice is then modulated to the pitch of the melody score by the digital sound processor according to the detected difference.

3. Singing Voice Analysis

The DSP extracts the fundamental frequency of the singing voice by the "peak to peak" method. The singing voice is quantized to 12-bit digital data at a sampling rate of 20 kHz. To improve the precision of the pitch extraction, the singing voice is first passed through a lowpass filter. Figure 2 shows an example of filtering the singing voice. Peak detection is then performed only on waves exceeding a threshold, to eliminate the small peaks due to harmonics.
Figure 2: Filtering the singing voice ((b): output through the lowpass filter; amplitude threshold shown against time in msec).
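As a rough illustration of the analysis in Section 3, the sketch below estimates the fundamental from peak-to-peak distances on a lowpass-filtered signal and quantizes it to the nearest halftone. The one-pole filter coefficient, the threshold value, and the A4 = 440 Hz = MIDI 69 convention are assumptions of this sketch; the paper does not specify the DSP's actual filter or thresholds.

```python
import math

def extract_midi_note(samples, fs=20000, threshold=0.3):
    """Estimate the fundamental by a peak-to-peak method and
    quantize it to the nearest halftone (MIDI note number)."""
    # Simple one-pole lowpass to suppress harmonics (a stand-in for
    # the paper's lowpass filter, whose cutoff is not specified).
    filtered, y, alpha = [], 0.0, 0.05
    for x in samples:
        y += alpha * (x - y)
        filtered.append(y)

    # Detect local maxima that exceed the threshold.
    peaks = [i for i in range(1, len(filtered) - 1)
             if filtered[i] > threshold
             and filtered[i - 1] < filtered[i] >= filtered[i + 1]]
    if len(peaks) < 2:
        return None  # no pitch detected

    # Average peak-to-peak distance gives the period in samples.
    period = (peaks[-1] - peaks[0]) / (len(peaks) - 1)
    f0 = fs / period
    if not 80.0 <= f0 <= 1000.0:  # detectable range stated in the paper
        return None

    # Nearest halftone, assuming the A4 = 440 Hz = MIDI 69 convention.
    return round(69 + 12 * math.log2(f0 / 440.0))
```

For example, a clean 261.63 Hz sine (middle C) sampled at 20 kHz should map to MIDI note 60, while a 50 Hz tone falls below the detectable range and is rejected.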
Table 1: The singing pitch and MIDI note number.

Tone Name   MIDI Note Number   Fundamental Frequency (Hz)   Range of Frequency (Hz)
C           48                 130.81                       127.14 - 134.70
C#          49                 138.59                       134.70 - 142.71
D           50                 146.83                       142.71 - 151.20
D#          51                 155.56                       151.20 - 160.19
E           52                 164.81                       160.19 - 169.71
F           53                 174.61                       169.71 - 179.81
F#          54                 185.00                       179.81 - 190.50
G           55                 196.00                       190.50 - 201.83
G#          56                 207.65                       201.83 - 213.83
A           57                 220.00                       213.83 - 226.54
A#          58                 233.08                       226.54 - 240.01
H           59                 246.94                       240.01 - 254.29
C           60                 261.63                       254.29 - 270.91

The detectable fundamental frequency range is approximately 80 Hz to 1,000 Hz, which covers bass to soprano. The obtained pitch is quantized to the nearest halftone interval and sent to the personal computer as the corresponding MIDI note number of Table 1.

4. Automated Accompaniment

The score data of the melody and accompaniments consist of the MIDI note number, the length of the note, the MIDI velocity, a flag for the chord, and the MIDI channel. That is, the data of an event is five bytes in total, short enough to leave adequate time for processing. The system identifies the current measure of the melody score by monitoring not the absolute pitch values but the changes in the pattern of the singing pitch, since people may be singing a little out of tune. From this information, the system predicts the tempo of the singing. The tempo of the accompaniments is adjusted to the detected tempo of the singing by a linear prediction method using a man-machine interaction model [Sawada et al., 1992]. The system thus starts to follow the singing according to the detected measure and key, and adaptively generates the accompaniments on the MIDI instrument according to the established tempo. With this system, a singer need not sing from the beginning of the song; he or she can start from an arbitrary paragraph of the lyrics. If the singer starts in the middle of the song, the system compares the singing pattern with some distinctive parts of the song maintained in the system software, and begins to accompany from the recognized paragraph.

5. Automated Pitch Modulation

When a singer sings along with the accompaniment generated by the system, the system monitors the difference between the melody score and the MIDI note number extracted by the DSP unit. According to this pitch difference, the system adjusts the singing pitch to the correct one (that is, the pitch of the melody score) in real time, using the pitch change program of the digital sound processor [YAMAHA]. The digital sound processor can change the pitch in halftone steps within a range of ± one octave. The tone color of the singing voice is noticeably distorted by the pitch-shift modulation of the digital sound processor; to restore the original tone color, we apply a filter to the output of the digital sound processor. This function of the system is not an automated accompaniment but a real-time compensation of the singing pitch, to help a singer who cannot sing in tune because of lack of skill or an inadequate vocal compass.
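The tempo prediction of Section 4 relies on a linear prediction method based on a man-machine interaction model [Sawada et al., 1992] whose details are not given in this paper. Purely as an illustrative stand-in, a first-order linear extrapolation of recent inter-beat intervals could look like the following; the `history` window and the onset-time input format are assumptions of this sketch.

```python
def predict_tempo(onset_times, history=4):
    """Predict the next tempo (BPM) by linearly extrapolating the
    trend of recent inter-beat intervals.  This is an illustrative
    stand-in, not the paper's man-machine interaction model."""
    if len(onset_times) < 3:
        return None  # not enough beats to estimate a trend
    intervals = [b - a for a, b in zip(onset_times, onset_times[1:])]
    recent = intervals[-history:]
    # Average per-beat change of the interval over the window.
    trend = (recent[-1] - recent[0]) / (len(recent) - 1)
    next_interval = recent[-1] + trend
    if next_interval <= 0:
        return None
    return 60.0 / next_interval  # seconds per beat -> beats per minute
```

With steady onsets every 0.5 s the prediction stays at 120 BPM; with shortening intervals (the singer speeding up) the predicted tempo rises above the current one.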
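The quantization bands of Table 1 and the clamped correction of Section 5 can be sketched in a few lines. The band edges in Table 1 appear to be arithmetic midpoints between adjacent fundamentals (e.g. (123.47 + 130.81) / 2 = 127.14), so the sketch below reproduces them under that assumption; the helper names are hypothetical, and the ± one octave clamp follows the SPX range stated in the paper.

```python
A4_HZ, A4_NOTE = 440.0, 69  # standard MIDI tuning convention

def note_to_hz(n):
    """Fundamental frequency of MIDI note n in equal temperament."""
    return A4_HZ * 2 ** ((n - A4_NOTE) / 12)

def note_band(n):
    """(lower, upper) frequency band assigned to note n, taken as the
    arithmetic midpoints to the neighboring notes, as in Table 1."""
    return ((note_to_hz(n - 1) + note_to_hz(n)) / 2,
            (note_to_hz(n) + note_to_hz(n + 1)) / 2)

def correction_shift(sung_note, score_note):
    """Halftone shift sent to the pitch shifter so the sung note lands
    on the score note, clamped to the +/- one octave SPX range."""
    return max(-12, min(12, score_note - sung_note))
```

For note 48 (C), `note_to_hz` gives 130.81 Hz and `note_band` gives the table's 127.14 - 134.70 Hz range; a singer two halftones flat gets a +2 shift, and differences beyond an octave are clamped to ±12.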
6. Results and Conclusions

In the experiments, the system could satisfactorily adjust to the tempo of the human singing. Figure 3 shows one of the results of the automated accompaniment. The automated pitch modulation system could shift the singing to the correct pitch, although some problems remained unsolved: the tone color distortion could not be sufficiently eliminated by the output filter.

Figure 3: The result of the automated accompaniment (tempo of the singing, tempo of the accompaniments, and phase difference over time, for slow and fast tempi).

The proposed system can be used not only for automatic accompaniment but also for vocal music education. Furthermore, the system may promote a new style of vocal music, as a real-time voice-operated instrument. We are now going to introduce speech recognition into the system. The system will then be able to determine the singing tempo by comparing the results of speech recognition with the lyrics of the song, and to determine the difference between the singing pitch and the pitch of the melody score from the pitch extraction. The future system will accompany the human singing with a full range of tempo changes, allowing the singer to sing in a wide compass, from bass to soprano.

References

[Morita et al., 1991] Morita, H., Ohteru, S., and Hashimoto, S., "A Computer Music Performer that Follows a Human Conductor," Computer, Vol. 24, No. 7, IEEE, 1991.
[Sato et al., 1991] Sato, A., Hashimoto, S., and Ohteru, S., "Singing and Playing in Musical Virtual Space," ICMC Proc., 1991.
[Harada et al., 1992] Harada, T., Sato, A., Hashimoto, S., and Ohteru, S., "Real Time Control of 3D Sound Space by Gesture," ICMC Proc., 1992.
[Dannenberg, 1984] Dannenberg, R.B., "An On-Line Algorithm for Real-Time Accompaniment," ICMC Proc., 1984.
[Dannenberg and Mont-Reynaud, 1987] Dannenberg, R.B., and Mont-Reynaud, B., "Following an Improvisation in Real-Time," ICMC Proc., 1987.
[Naoi et al., 1989] Naoi, K., Ohteru, S., and Hashimoto, S., "Automatic Accompaniment Using Real Time Assigning Note Value," Convention Record of Acoustical Society of Japan, Spring 1989 (in Japanese).
[Wake et al., 1992] Wake, S., Kato, H., Saiwaki, N., and Inokuchi, S., "The Session System Reacting to the Sentiment of Player," Japan Music and Computer Science Society, Proc. of Summer Symposium '92, 1992 (in Japanese).
[Horiuchi et al., 1992] Horiuchi, Y., Fujii, A., and Tanaka, H., "A Computer Accompaniment System Considering Independence of Accompanist," Japan Music and Computer Science Society, Proc. of Summer Symposium '92, 1992 (in Japanese).
[Vercoe, 1984] Vercoe, B., "The Synthetic Performer in the Context of Live Performance," ICMC Proc., 1984.
[Vercoe and Puckette, 1985] Vercoe, B., and Puckette, M., "Synthetic Rehearsal: Training the Synthetic Performer," ICMC Proc., 1985.
[Takeuchi et al., 1993] Takeuchi, N., Katayose, H., and Inokuchi, S., "Virtual Performer: Adaptive KARAOKE System," Convention Record of Information Processing Society of Japan, Spring 1993 (in Japanese).
[Yamaguchi and Ando, 1977] Yamaguchi, K., and Ando, S., "Application of Short-Time Spectral Analysis to Natural Musical Instrument Tones," Journal of Acoustical Society of Japan, Vol. 33, No. 6, 1977 (in Japanese).
[Niihara et al., 1986] Niihara, T., Imai, M., and Inokuchi, S., "Transcription of Sung Song," Proceedings of the International Conference on Acoustics, Speech and Signal Processing, IEEE, 1986.
[Kuhn, 1990] Kuhn, W.B., "A Real-Time Pitch Recognition Algorithm for Music Applications," Computer Music Journal, Vol. 14, No. 3, 1990.
[Inoue et al., 1992] Inoue, W., Hashimoto, S., and Ohteru, S., "Automated Accompaniment System for Singing," Japan Music and Computer Science Society, Proc. of Summer Symposium '92, 1992 (in Japanese).
[Inoue et al., 1993] Inoue, W., Hashimoto, S., and Ohteru, S., "Automated Pitch Modulation System for Singing," Convention Record of Information Processing Society of Japan, Spring 1993 (in Japanese).
[Sawada et al., 1992] Sawada, H., Isogai, M., Hashimoto, S., and Ohteru, S., "Man-Machine Interaction in Musical Accompaniment System," Convention Record of Information Processing Society of Japan, Spring 1992 (in Japanese).
[YAMAHA] "DIGITAL SOUND PROCESSOR SPX90II Manual," YAMAHA Corp.