Voice Pitch Changing by Linear Predictive Coding Method to Keep the Singer's Personal Timbre
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
Yoshinari SASAHIRA, Shuji HASHIMOTO
Department of Applied Physics, School of Science and Engineering, WASEDA UNIVERSITY
3-4-1 Okubo, Shinjuku-ku, Tokyo 169, JAPAN
ICMC Proceedings 1995

Abstract
Various approaches have been tried for changing the pitch of musical sounds. Most of these methods are based on manipulation of the sampling rate. They work rather well for the sounds of musical instruments but not for human voices: they change not only the pitch of the voice but also its timbre, so it is often difficult to identify the singer when listening to the output of a conventional pitch changer. We must change only the pitch of the singer's voice while preserving its timbre. This paper describes a new method of voice pitch changing that introduces Linear Predictive Coding (LPC), a technique used in speech analysis-synthesis and data compression.

1 Introduction
The problem of voice pitch changing has been researched for years. In early systems, the human voice was changed by altering the re-recording speed of a music tape. This method, however, changes the pitch, the timbre, and also the duration of the reproduced sound, so it is not suitable for a pure voice pitch changing system. Instead, musicians have used it to create timbre changes. The advent of digital machinery and tools transformed voice pitch changing. Today, as seen in most commercially available systems, conventional pitch changing methods are mostly based on manipulation of the sampling rate. These methods work well for instrumental sounds but not for human voices, because they shift not only the fundamental frequency but also the whole frequency spectrum.
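Why sampling-rate manipulation alters the timbre can be checked with a minimal Python sketch. This is a hypothetical illustration, not code from the paper: it models a conventional pitch changer as reading the samples `ratio` times faster with linear interpolation and playing them back at the original rate. The function names `resample_pitch_shift` and `dft_mag`, the test tone, and the 1000 Hz component standing in for a formant are all assumptions chosen for illustration.

```python
import math

def resample_pitch_shift(x, ratio):
    """Conventional pitch change by sampling-rate manipulation (sketch):
    read the input `ratio` times faster with linear interpolation and play
    it back at the original rate. Every frequency component is multiplied
    by `ratio`, and the duration shrinks by a factor of 1/ratio."""
    n_out = int((len(x) - 1) / ratio) + 1
    y = []
    for k in range(n_out):
        pos = k * ratio
        i = int(pos)
        frac = pos - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]
        y.append((1.0 - frac) * x[i] + frac * nxt)
    return y

def dft_mag(x, freq, fs):
    """Magnitude of the length-normalised DFT of x evaluated at `freq` Hz."""
    w = 2.0 * math.pi * freq / fs
    re = sum(v * math.cos(w * n) for n, v in enumerate(x))
    im = sum(v * math.sin(w * n) for n, v in enumerate(x))
    return math.hypot(re, im) / len(x)

fs = 22050
# Hypothetical test tone: 220 Hz fundamental plus a 1000 Hz component
# standing in for a formant (0.2 s, so both fall on exact DFT bins).
x = [math.sin(2 * math.pi * 220 * n / fs) + 0.5 * math.sin(2 * math.pi * 1000 * n / fs)
     for n in range(4410)]
y = resample_pitch_shift(x, 2.0)   # one octave up
for f in (220, 440, 1000, 2000):
    print(f, round(dft_mag(y, f, fs), 3))
```

In the output, the energy sits at 440 Hz and 2000 Hz while 220 Hz and 1000 Hz are empty: the component standing in for the formant has moved together with the fundamental, so the spectrum envelope, and with it the timbre, has changed.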
In human voices, the spectrum envelope carries the phonetic and personal features of the singer. Therefore, the spectrum envelope must not be changed if the singer's personal timbre is to be preserved; this is an indispensable condition for more natural voice pitch changing. Recently, several analysis-synthesis methods for sound processing that treat the pitch information and the spectrum envelope separately have been reported [Settel, Z. & Lippe, C.], [Nieberle, R.C. & Warstat, M.]. In this paper, we describe a voice pitch changing system that introduces the Linear Predictive Coding (LPC) method to maintain the singer's personal timbre.

2 Algorithm

2.1 Linear Predictive Coding
The linear prediction method has been widely used in various fields since it was first proposed by N. Wiener [Wiener, N.]. Itakura and Saito [Itakura, F. & Saito, S.] and Atal and Schroeder [Atal, B.S. & Schroeder, M.R.] were the first to apply it to speech analysis-synthesis. In LPC, each sample is predicted by a linear combination of the previous samples as follows:
$$\hat{S}_t = -\sum_{i=1}^{p} \alpha_i S_{t-i} \tag{1}$$

where $\alpha_i$, $S_{t-i}$ and $\hat{S}_t$ represent the coefficients of the linear predictor, the previous data and the predicted data, respectively, and $p$ is the order of the predictor. The coefficients of the linear predictor are determined by the least-mean-square criterion using the autocorrelation function $R_k$ (lag $k$) of the past data. This leads to the following set of simultaneous linear equations:

$$
\begin{pmatrix}
R_0 & R_1 & R_2 & \cdots & R_{p-1} \\
R_1 & R_0 & R_1 & \cdots & R_{p-2} \\
R_2 & R_1 & R_0 & \cdots & R_{p-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
R_{p-1} & R_{p-2} & R_{p-3} & \cdots & R_0
\end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \vdots \\ \alpha_p \end{pmatrix}
= -
\begin{pmatrix} R_1 \\ R_2 \\ R_3 \\ \vdots \\ R_p \end{pmatrix}
\tag{2}
$$

In speech transmission, the coefficients of the linear predictor and the prediction error $e_t = S_t - \hat{S}_t$, called the "residual error", are sent separately to the receiving side, where decoding is done to recover the original voice. The original voice can be restored exactly from only the residual error and the coefficients of the linear predictor:

$$S_t = -\sum_{i=1}^{p} \alpha_i S_{t-i} + e_t \tag{3}$$

2.2 Voice Pitch Changing
If the prediction of the sound signal goes well, the power spectrum of the residual error will be like that of white noise. However, the residual error still preserves the vibration period of the vocal cords: the predictor combines only the 8-10 previous samples, a span much shorter than one pitch period, so it cannot model that periodicity. LPC therefore extracts two vital pieces of information for voice pitch changing: the spectrum envelope of the original voice, represented by the predictor coefficients, and the pitch information of the singer's voice, contained in the residual error. By changing the pitch of the residual error, the pitch of the decoded voice can be changed, while the timbre of the original voice is preserved as long as the predictor coefficients are left unchanged. The system for timbre-preserving pitch changing can therefore be illustrated as in Figure 1.

[Figure 1: Voice Pitch Changing using LPC Method. Block diagram from Input to Output.]
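Equations (1)-(3) can be tried out with the following pure-Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the autocorrelation is computed over one frame without a window, equation (2) is solved with the Levinson-Durbin recursion (a standard solver for this Toeplitz system; the paper does not say which solver was used), samples before the start of the frame are taken as zero, and the test signal `toy_vowel` (an impulse train through a single two-pole resonator) is hypothetical.

```python
import math

def autocorr(x, p):
    """R_k = sum_t x[t] * x[t-k] for k = 0..p (autocorrelation method, no window)."""
    return [sum(x[t] * x[t - k] for t in range(k, len(x))) for k in range(p + 1)]

def lpc_coefficients(x, p):
    """Solve eq. (2) for alpha_1..alpha_p with the Levinson-Durbin recursion,
    using the sign convention of eqs. (1) and (3)."""
    r = autocorr(x, p)
    a = [0.0] * (p + 1)          # a[i] holds alpha_i; a[0] is unused
    err = r[0]                   # prediction-error power of the order-0 predictor
    for i in range(1, p + 1):
        if err <= 0.0:           # silent or degenerate frame: stop early
            break
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(p + 1)]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:]

def residual(x, alpha):
    """e_t = S_t - S^_t = S_t + sum_i alpha_i S_{t-i}; samples before t = 0 are zero."""
    p = len(alpha)
    return [x[t] + sum(alpha[i - 1] * x[t - i] for i in range(1, min(p, t) + 1))
            for t in range(len(x))]

def synthesize(e, alpha):
    """Eq. (3): S_t = -sum_i alpha_i S_{t-i} + e_t."""
    p = len(alpha)
    s = []
    for t, e_t in enumerate(e):
        s.append(e_t - sum(alpha[i - 1] * s[t - i] for i in range(1, min(p, t) + 1)))
    return s

def toy_vowel(n, fs=22050, period=100, formant=700.0, bandwidth=100.0):
    """Hypothetical test signal: impulse train (f0 = fs / period, about 220 Hz)
    through a two-pole resonator standing in for one formant."""
    r = math.exp(-math.pi * bandwidth / fs)
    c1, c2 = 2 * r * math.cos(2 * math.pi * formant / fs), -r * r
    y = []
    for t in range(n):
        src = 1.0 if t % period == 0 else 0.0
        y1 = y[t - 1] if t >= 1 else 0.0
        y2 = y[t - 2] if t >= 2 else 0.0
        y.append(src + c1 * y1 + c2 * y2)
    return y

x = toy_vowel(800)
alpha = lpc_coefficients(x, 16)    # order p = 16 as in Table 1
e = residual(x, alpha)
s = synthesize(e, alpha)
print(max(abs(a - b) for a, b in zip(s, x)))           # reconstruction error (rounding only)
print(sum(v * v for v in e) / sum(v * v for v in x))   # residual power relative to signal
```

Because `synthesize` runs the exact inverse of the filter applied in `residual`, the reconstruction error is only floating-point rounding, in line with the statement that the voice can be restored from the residual error and the coefficients alone, and the residual power comes out far below the signal power.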
3 Experiment

3.1 Basic Experiment
First, we verified that the LPC method recovers an original voice perfectly. The specification of the tested voice is as follows.

TABLE 1 Specification of the Tested Voice
  Sampling rate               22050 Hz
  Input and output data       16 bit
  Order p of the predictor    16
  Tested voice                male (age 22), vowel

Second, we changed the pitch of the residual error and fed it to the decoding part of the system to re-synthesize the voice. Over a short period of 20-30 milliseconds, the waveform of the human voice can be regarded as stationary, so the coefficients of the linear predictor were renewed every 400 samples, which corresponds to 18.1 milliseconds at 22050 Hz.

3.2 Timbre Recognition Experiment
We conducted a survey with 10 people (aged 18-22). The voices used in this questionnaire are described in Table 2, and their spectra are shown in Figure 2.

TABLE 2 Voices used in the questionnaire
  Voice A   vowel "i" voice of a male (the original voice)                  about 220 Hz
  Voice B   vowel "i" voice of another male                                  about 440 Hz
  Voice C   pitch-changed voice, proposed method applied to Voice A         about 440 Hz
  Voice D   pitch-changed voice, conventional method                         about 440 Hz
  Voice E   vowel "i" pronounced by the male of Voice A                      about 440 Hz

[Figure 2: Spectrum of Voices used in the questionnaire. Panels (a) Voice A, (b) Voice B, (c) Voice C, (e) Voice E; horizontal axis: frequency (kHz).]
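The frame-wise procedure of Section 3.1 (new coefficients every 400 samples, pitch change applied to the residual only, decoding with the unchanged coefficients as in Figure 1) might be sketched as below. The paper does not state how the pitch of the residual was changed, so `shift_residual_frame`, which reads each residual frame `ratio` times faster and wraps around to keep the frame length, is a hypothetical stand-in; `pitch_change_lpc` and the test input are likewise assumptions. Filter state is carried across frames through the input and output histories.

```python
import math
import random

def lpc_coefficients(x, p):
    """Autocorrelation method + Levinson-Durbin; returns alpha_1..alpha_p
    in the convention S^_t = -sum_i alpha_i S_{t-i} of eq. (1)."""
    r = [sum(x[t] * x[t - k] for t in range(k, len(x))) for k in range(p + 1)]
    a, err = [0.0] * (p + 1), r[0]
    for i in range(1, p + 1):
        if err <= 0.0:
            break
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(p + 1)]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:]

def shift_residual_frame(e, ratio):
    """Hypothetical residual pitch shifter: read the frame `ratio` times faster
    (linear interpolation), wrapping around so the frame keeps its length."""
    n = len(e)
    out = []
    for k in range(n):
        pos = (k * ratio) % n
        i = int(pos)
        frac = pos - i
        out.append((1.0 - frac) * e[i] + frac * e[(i + 1) % n])
    return out

def pitch_change_lpc(x, ratio, p=16, frame=400):
    """Frame-wise LPC pitch changer after Figure 1: new coefficients every
    `frame` samples (400 / 22050 Hz = 18.1 ms), pitch-shifted residual, and
    decoding with the unchanged coefficients so the envelope is kept."""
    y = []
    for start in range(0, len(x), frame):
        stop = min(start + frame, len(x))
        alpha = lpc_coefficients(x[start:stop], p)
        # Encoder: e_t = S_t + sum_i alpha_i S_{t-i}, with earlier input as history.
        e = [x[t] + sum(alpha[i - 1] * x[t - i] for i in range(1, min(p, t) + 1))
             for t in range(start, stop)]
        # Change the pitch of the residual only.
        e = shift_residual_frame(e, ratio)
        # Decoder, eq. (3): S_t = -sum_i alpha_i S_{t-i} + e_t, continuing from earlier output.
        for e_t in e:
            t = len(y)
            y.append(e_t - sum(alpha[i - 1] * y[t - i] for i in range(1, min(p, t) + 1)))
    return y

fs = 22050
rng = random.Random(0)
# Hypothetical input: two harmonics of 220.5 Hz plus a little noise.
x = [math.sin(2 * math.pi * 220.5 * t / fs) + 0.5 * math.sin(2 * math.pi * 661.5 * t / fs)
     + 0.01 * rng.uniform(-1.0, 1.0) for t in range(4000)]
y = pitch_change_lpc(x, 2.0)   # intended one octave up, as for Voice C
```

With `ratio = 1.0` the shifter is the identity and the decoder simply inverts the encoder, so the input comes back up to rounding. Wrapping the residual inside a 400-sample frame keeps the duration unchanged, but it can introduce discontinuities at frame edges whenever the frame does not hold a whole number of the new pitch periods; a practical system would need smoother splicing there.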
The questionnaire procedure was as follows. First, each subject heard the voices from Voice A to Voice E, and then answered the following questions. The results of the experiment are shown in Table 3.
Q1: Select as many voices as you can whose timbre is definitely not Voice A's timbre.
Q2: From the remaining voices, select the one voice whose timbre most resembles Voice A.

[TABLE 3: The Result of the Experiment of Distinguishing Individual Human Voice. Rows: Voice A (the original 220 Hz voice), Voice B (the other male's 440 Hz voice), Voice C (our method), Voice D (previous method), Voice E (A's 440 Hz voice); columns: subjects 1-10 and totals.]

The symbols mean the following:
◎ This voice has the closest features to Voice A among the samples.
○ It may be the same person's voice as Voice A.
× It cannot be A's voice.
? Cannot distinguish.

Voice D (the pitch-changed voice using the conventional method) was clearly recognized as an unnatural human voice. On the other hand, 50% of the subjects recognized Voice C (the pitch-changed voice using the LPC method) as having the same features as Voice A (the original voice), because the spectrum envelope is preserved, as shown in Figure 2.

4 Conclusion
We proposed a new algorithm for voice pitch changing and constructed an experimental system based on it. The experimental results showed the validity of the algorithm: unlike with the conventional system, the identity of the singer could be determined much more easily from the pitch-changed voice. The proposed method can be used not only for an adaptive KARAOKE system [Inoue, W. et al.] but also for a new instrument driven by the voice.

5 References
Settel, Z. & Lippe, C. "Real-Time Musical Applications using FFT-based Resynthesis", Proceedings of the 1994 International Computer Music Conference, pp. 338-343.
Nieberle, R.C. & Warstat, M. "Implementation of an analysis/synthesis system on a DSP56001 for general purpose sound processing", Proceedings of the 1992 International Computer Music Conference, pp. 26-29.
Wiener, N. "Extrapolation, Interpolation, and Smoothing of Stationary Time Series", MIT Press, Cambridge, Massachusetts (1966).
Itakura, F. & Saito, S. "Analysis synthesis telephony based on the maximum likelihood method", Reports of the 6th Int. Cong. Acoust., C-5-5 (1968).
Atal, B.S. & Schroeder, M.R. "Predictive coding of speech signals", Reports of the 6th Int. Cong. Acoust., 0-3-4 (1968).
Inoue, W., Hashimoto, S. & Ohteru, S. "Adaptive Karaoke System", Proceedings of the 1994 International Computer Music Conference, pp. 70-77.