# Alteration of the Vibrato of a Recorded Voice


ARFIB D.\*, DELPRAT N.\*\*

\* C.N.R.S., Laboratoire de Mécanique et d'Acoustique, 31 Chemin Joseph Aiguier, 13402 Marseille Cedex 09, France <arfib@lma.cnrs-mrs.fr>

\*\* Laboratoire de Modélisation en Mécanique, Université Pierre et Marie Curie, Tour 66, Case 162, 4 place Jussieu, 75252 Paris Cedex 05, France <delprat@ccr.jussieu.fr>

ICMC Proceedings 1999, pp. 186–189

## ABSTRACT

A method of vibrato analysis and resynthesis, based on time-frequency analysis and source-resonance modelling, is described. It allows the alteration of the vibrato of a recorded voice independently of the other sound parameters. The different steps of the algorithm are illustrated with time-frequency diagrams and musical examples.

## I. INTRODUCTION

We present here a signal processing method for the vibrato alteration of a recorded voice. The musical purpose is to perform intimate transforms in order to change the voice expressivity [1] by removing or accentuating the vibrato without altering the other features of the sound. The vibrato can be defined as a frequency modulation of the fundamental frequency, which typically depends on two parameters: the extent (the depth of the modulation) and the rate (the frequency of the modulation) [2].

*figure 1: extent and rate of a vibrato*

Vibrato alteration therefore amounts to imposing a new frequency modulation law. This implies a pitch detection, the extraction of a vibrato curve and the estimation of the ratio of the new pitch curve to the original one. It is then possible to reconstruct the sound by performing only a variable time resampling whose increment depends upon this ratio. This method is very well suited to signals whose spectrum has no formant. But in the case of the voice, we have to take into account the effect of the formant structure, which induces a different amplitude modulation on each harmonic as the fundamental frequency varies.
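To make the two parameters concrete, a vibrato can be sketched as a sinusoidal frequency modulation law applied to a carrier. The following Python snippet is only an illustration; the function name and all numeric values (carrier, extent, rate, sample rate) are our own assumptions, not taken from the paper:

```python
import numpy as np

def vibrato_tone(f0=440.0, extent=8.0, rate=6.0, dur=1.0, sr=16000):
    """Sine tone with vibrato: instantaneous frequency
    f(t) = f0 + extent * sin(2*pi*rate*t),
    where extent is the depth of the modulation (Hz) and
    rate is its speed (Hz). Illustrative values only."""
    t = np.arange(int(dur * sr)) / sr
    inst_freq = f0 + extent * np.sin(2.0 * np.pi * rate * t)
    phase = 2.0 * np.pi * np.cumsum(inst_freq) / sr  # integrate f(t)
    return np.sin(phase), inst_freq

tone, inst_freq = vibrato_tone()
```

The instantaneous frequency here oscillates between 432 Hz and 448 Hz six times per second, which is the kind of modulation law the algorithm below extracts and then replaces.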
Therefore the formants have to be moved, and in order to keep their correct time-frequency localisation, our choice is to move them before the resampling. This has already been done on academic signals, whose synthesis parameters are known in advance; the challenge now is to perform it on a recorded voice. The complete algorithm which has been developed is based on source-resonance modelling [3], time-frequency analysis (Gabor transform) [4] and standard filtering techniques. Its different steps and some of the specific techniques used are described in section II. Finally, musical examples illustrate the application of the algorithm to vibrato modification and to rate correction in the case of the time-stretching of a solo voice.

## II. THE ALGORITHM

Unlike academic signals, the vibrato parameters and formant localisation of a recorded sound are a priori unknown, and a significant analysis process has to be performed before the resynthesis. The whole algorithm, which can be decomposed into three main parts, is summarised in figure 2.

### II.1 Main steps of the algorithm

The algorithm begins with a time-frequency analysis of the voice sound with the help of the Gabor transform (short-time Fourier transform) [5]. This gives two time-frequency diagrams: the phase and modulus diagrams. From the modulus diagram, a source-resonance separation is performed by means of a real cepstrum method [6] that gives

the spectral envelope (resonance spectrum) and the source signal (excitation spectrum).

*fig. 3: sonagram of a singing voice (top: signal; bottom: time-frequency modulus)*

The key point of the process is then to detect the fundamental frequency of the source signal and to separate it into a pitch curve and a modulation law. At this step, it is possible to model the vibrato curve and to give new values to the vibrato extent, rate or shape. The ratio curve is the ratio of the new pitch curve to the old one. The last part consists in the reconstruction of the sound with a new vibrato curve (and hence a new source signal) while keeping the formant structure in the same place. The change of the fundamental frequency is done with the help of a variable resampling, and the formant structure is applied by means of an analysis-synthesis technique (Fourier filtering). Rather than resampling first and applying the formants second, it has been shown that it is theoretically equivalent and practically better to move the formants in the opposite direction before the resampling. The reason is that the first step takes place in the frequency domain, where the inverse formant move is easy to perform; the time resynthesis and the resampling can then be done in the last step.

### II.2 Basic tools

Let us now give some details on the techniques used.

a) the pitch detection

It is performed from the inverse FFT of the source spectrum modulus limited to the positive frequencies. This inverse FFT gives an analytic signal [7].
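A minimal sketch of this detection step, following our reading of the method: the modulus of the spectrum of one frame is restricted to positive frequencies and inverse-transformed; the real part of the resulting analytic signal peaks at multiples of the period. The window choice and the `min_lag`/`max_lag` search bounds are our assumptions:

```python
import numpy as np

def detect_period(frame, min_lag=20, max_lag=400):
    """Coarse period detection (in samples) from the inverse FFT of the
    spectrum modulus restricted to positive frequencies. min_lag and
    max_lag bound the range of plausible periods (an assumption)."""
    n = len(frame)
    mag = np.abs(np.fft.fft(frame * np.hanning(n)))
    mag[n // 2 + 1:] = 0.0               # keep positive frequencies only
    analytic = np.fft.ifft(mag)          # complex analytic signal
    return min_lag + int(np.argmax(analytic.real[min_lag:max_lag]))

# a test frame with a period of 100 samples (harmonics 1..4)
t = np.arange(2048)
frame = sum(np.sin(2 * np.pi * m * t / 100.0) / m for m in range(1, 5))
```

This returns only the coarse integer lag; the paper refines it with the zero-crossings of the imaginary part, as described next.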
The real part of the signal thus obtained displays a series of peaks which allow one to estimate the period of the sound (provided, of course, that it is periodic), and the imaginary part shows a zero-crossing at the same points, which confirms the detected period value and allows an easy interpolation for a precise estimate. The curve given by the successive period estimations is the pitch curve, which displays the time evolution of the pitch.

b) the vibrato curve modelling (figure 4)

It is now possible to estimate the fundamental frequency and its frequency modulation law by considering that the former corresponds to the mean frequency and the latter to the oscillatory part of the curve. As in the source-resonance filtering, the problem is one of information selection (on one hand the vibrato oscillations, on the other the pitch curve). In particular, this means that the choice of the filter cut-off frequency is of great importance. As an example, figure 5 shows the vibrato extraction of a voice note.

*figure 5: measured pitch curve, separated pitch curve and separated vibrato curve*
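The separation just described can be sketched as a low-pass filtering of the measured pitch curve: what the filter keeps is the slow pitch curve, and the residual is the vibrato curve. The moving-average filter, the curve sample rate and the cut-off value below are our illustrative assumptions, not the paper's actual filter:

```python
import numpy as np

def split_pitch_vibrato(measured, frame_rate=100.0, cutoff=3.0):
    """Split a measured pitch curve into a slow pitch curve (low-pass,
    here a simple moving average) and an oscillatory vibrato curve
    (the residual). The cut-off frequency is the critical parameter."""
    win = max(1, int(round(frame_rate / cutoff)))
    kernel = np.ones(win) / win
    # pad with edge values so the average is defined at the borders
    padded = np.pad(measured, (win // 2, win - 1 - win // 2), mode="edge")
    pitch = np.convolve(padded, kernel, mode="valid")
    return pitch, measured - pitch

t = np.arange(200) / 100.0                       # 2 s of pitch curve
measured = 440.0 + 8.0 * np.sin(2 * np.pi * 6.0 * t)
pitch, vib = split_pitch_vibrato(measured)       # ~440 Hz and ~8 Hz-deep vibrato
```

With a 3 Hz cut-off, a 6 cycles/second vibrato falls almost entirely into the residual; a cut-off too close to the vibrato rate would leak oscillations into the pitch curve, which is exactly the selection problem mentioned above.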

To model the vibrato curve, we need to handle the rate and the extent of this curve independently. An analytic signal is constructed with the help of a Hilbert transform, its modulus being the extent and the derivative of its phase being the rate. It is then possible to model the extracted vibrato curve with specific functions in order to generate a new vibrato curve.

*figure 6: modulus and phase of the analytic signal*

The resampling is a very basic operation: the step increment, instead of being one, is changed according to the ratio of the new pitch curve to the old one.

c) the sound resynthesis

At this stage, two operations have to be performed: the formant move and the variable resampling. As we have applied a short-time Fourier transform and a source-resonance separation with the help of a cepstrum technique, the formant move consists in modifying the formant structure according to the formula

A(ω) = As(ω) · Ar(ω/ratio) / Ar(ω)

where As and Ar are respectively the modulus of the source and of the resonance. The temporal signal is classically reconstituted by means of an overlap-add technique with the new modulus and the preserved phase.

### II.3 Vibrato modifications

Different vibrato alterations can be performed by changing the depth and/or the frequency modulation of the vibrato curve. For example, the alteration of the vibrato extent will accentuate or reduce the vibrato depth; in the same way, setting it to zero will totally remove the vibrato. This alteration follows the separation of the pitch curve and the modelling of the vibrato curve: at this point we have a modulus and a phase value, so it is easy to reconstruct a new vibrato curve and a new pitch curve.

*figure 7: accelerated and slowed vibratos*

*figure 8: change of the vibrato rate*

It is also possible to change the vibrato rate (figure 8). The process is not as simple as the previous one.
In effect, the analytic signal associated with the frequency modulation signal has to be calculated with the Hilbert transform, and its phase has to be multiplied by a factor proportional to the new rate. This can give rise to two artefacts. The first one depends upon the correct separation between the modulus and the phase during the analysis of the vibrato curve: if a small amount of oscillation at the old vibrato rate remains in the modulus, it will be audible after the reconstruction. The second is that vibrato is not an independent process for music perception, and it is not so easy to synchronise with the flow of notes.
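A sketch of this rate change on the vibrato curve itself, using an FFT-based Hilbert transform. The even-length assumption, the function names and the test values are ours:

```python
import numpy as np

def analytic_signal(v):
    """FFT-based Hilbert transform (assumes even-length input)."""
    n = len(v)
    h = np.zeros(n)
    h[0] = h[n // 2] = 1.0
    h[1:n // 2] = 2.0                    # double positive frequencies
    return np.fft.ifft(np.fft.fft(v) * h)

def change_rate(vib, factor):
    """Multiply the unwrapped phase of the analytic signal of the vibrato
    curve by factor (new rate / old rate); the modulus, i.e. the extent
    envelope, is kept unchanged."""
    a = analytic_signal(vib)
    return np.abs(a) * np.cos(np.unwrap(np.angle(a)) * factor)

t = np.arange(400) / 100.0                       # 4 s at 100 samples/s
vib = 8.0 * np.sin(2 * np.pi * 6.0 * t)          # 6 Hz vibrato curve
faster = change_rate(vib, 2.0)                   # about 12 Hz, same extent
```

Any residual oscillation left in the modulus by an imperfect separation would survive here at the old rate, which is the first artefact described above.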

However, the technique of vibrato rate modification is particularly well suited to the time-stretching of a solo voice. Although the time-frequency algorithms that slow down a sound (without transposition) give very good results for speaking voices [8], [9], [10], the psycho-acoustical rendering is not satisfactory for singing voices with vibrato. Because the transformation is global, the vibrato rate is also divided, and the result does not sound very natural (for example, a vibrato of 6 cycles/second becomes a vibrato of only 3 cycles/second when slowed by a factor of two). One could think of rectifying the vibrato after the time-stretching process, but it has proved better to change the vibrato rate first and then perform the time-stretching.

### II.4 The "Nightingale" song: vibrato extraction and modification of a real signal; application to time-stretching

We have worked with a recorded voice (an excerpt of a Russian song called "The Nightingale", rendered by Nathalie Dessay, a soprano voice). The figures shown in this article come from a single note of this excerpt. The sound examples that will be presented are the following:

- the removal of the frequency vibrato
- the compression and extension of the extent of the vibrato
- the change of rate of the vibrato
- the rectification of the vibrato for time-stretching processing

## III. CONCLUSION

The selective modification of a single sound parameter generally needs to take into account several interactions between that parameter and other sound features [11]. Concerning the vibrato modification of a recorded voice, the difficulties are manifold. First of all, the vibrato curve extraction is related to the time-frequency modulation law detection, which is a delicate non-stationary analysis problem [12].
Then the connection between the formant structure of the voice and the vibrato amplitude of the spectral components imposes strong constraints on the resynthesis. On the other hand, the standard methods we have used have some disadvantages (or limits) which can introduce estimation errors. For example, the pitch detector is very sensitive to reverberation, which makes it difficult to detect the pitch curve accurately, and the extraction of the vibrato curve is very dependent upon a good choice of the analysis parameters. Some applications of the algorithm have not yet been investigated, in particular in the sound morphing area (such as the interpolation between two musical renderings). But in any case, the study of vibrato alteration is an interesting step towards a better understanding of the different and seemingly independent attributes that govern the musical transformations of sounds.

## REFERENCES

[1] Sundberg J., *The Science of the Singing Voice*, Northern Illinois University Press, 1987.

[2] Prame E., "Measurement of the vibrato rate of ten singers", in *Vibrato*, P. Dejonckere, M. Hirano, J. Sundberg (eds.), Singular Publishing Group, San Diego, 1995.

[3] Oppenheim A., "Speech analysis-synthesis system based on homomorphic filtering", *J. Acoust. Soc. Am.*, 45, 1969.

[4] Flandrin P., *Temps-fréquence*, Éditions Hermès, 1993.

[5] Portnoff M. R., "Time-frequency representation of digital signals and systems based on short-time Fourier analysis", *IEEE Trans. Acoust., Speech, Signal Processing*, vol. ASSP-28, pp. 55–69, 1980.

[6] Oppenheim A. and Schafer R., *Digital Signal Processing*, Prentice Hall, Englewood Cliffs, New Jersey, 1975.

[7] Justice J., "Analytic signal processing in music computation", *IEEE Trans. Acoust., Speech, Signal Processing*, vol. ASSP-27, pp. 179–190, 1979.

[8] Arfib D., Delprat N., "Musical transformations through modifications of time-frequency images", *Computer Music Journal*, vol. 17 (2), MIT Press, 1993.

[9] Moorer A., "The use of the phase vocoder in computer music applications", *J. Audio Eng. Soc.*, no. 26, pp. 42–45, 1978.

[10] Arfib D., "Analysis, transformations and resynthesis of musical sounds with the help of a time-frequency representation", in *Representations of Musical Signals*, G. De Poli, A. Piccialli and C. Roads (eds.), MIT Press, 1991.

[11] Arfib D., Delprat N., "Selective transformations of sounds using time-frequency representations: an application to the vibrato modification", *AES 104th Convention*, preprint 4652 (P5-2), Amsterdam, May 1998.

[12] Delprat N., "Global frequency modulation law extraction from the Gabor transform of a signal: a first study of the interacting components case", *IEEE Trans. Speech and Audio Processing*, vol. 5, 1997.