Multimedia Application of Time Compress/Stretch of Sound by Granulation

Takebumi ITAGAKI, Donald KNOX
Department of Engineering, Glasgow Caledonian University
Cowcaddens Road, GLASGOW G4 0BA, UK
ICMC Proceedings 1999

Abstract
The technique of time-stretch/compression by granulation was originally proposed for frame-rate conversion of cinematic films by optical methods. Various musicians have since implemented the same technique as a method of sound manipulation. This paper describes an application of sound granulation to variable-speed video playback without transposition of the original pitch.

1. Introduction
The technique for time-stretch/compression of sound by granulation (or windowing) was originally proposed in 1946 [Gabor 1946]. Its original aim was frame-rate conversion of cinematic films by optical methods. Despite its conceptual simplicity, its signal-processing demands restricted its realisation in digital media until high-power DSP chips and CPUs became cheaply available [Truax 1988 and 1994, Itagaki et al. 1995]. The technique has been applied to sound processing mainly for music composition [Roads 1978, Xenakis 1971]; it has not, however, been exploited for its original purpose in digital media. Our group implements the technique for the sound tracks of slow-motion and fast-forward video.

In the variable-speed functions of video playback systems, audio output is usually muted during the operation [Watkinson 1994]. If the audio were output, its pitch would be transposed. For example, in a slow-motion sequence, as the tape speed slows down by repeating frames, the pitch of the sound track is lowered; in the opposite case, fast forward, the pitch is shifted upward. These effects are usually accompanied by distortion of the harmonic content caused by the pitch shift, as also occurs when samples are transposed in wavetable synthesis.
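The pitch transposition described above can be illustrated with a minimal sketch of tape-style variable-speed playback. This is not the authors' code: the function names (`sine`, `varispeed`, `estimate_freq`) are hypothetical, linear interpolation stands in for the tape transport, and pitch is estimated crudely from zero crossings. Reading the input at half speed doubles the duration and roughly halves the measured frequency; double speed roughly doubles it.

```python
import math


def sine(freq_hz, dur_s, sr):
    """Return dur_s seconds of a unit-amplitude sine at freq_hz, sampled at sr Hz."""
    return [math.sin(2 * math.pi * freq_hz * i / sr) for i in range(int(dur_s * sr))]


def varispeed(x, speed):
    """Tape-style variable-speed playback (hypothetical helper).

    Advances through the input by `speed` samples per output sample, with linear
    interpolation between neighbouring samples. speed < 1 is slow motion and
    speed > 1 is fast forward; the output is still played at the original rate,
    so duration and pitch change together.
    """
    out = []
    pos = 0.0
    while pos < len(x) - 1:
        i = int(pos)
        frac = pos - i
        out.append(x[i] * (1.0 - frac) + x[i + 1] * frac)
        pos += speed
    return out


def estimate_freq(x, sr):
    """Rough pitch estimate: upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if a < 0 <= b)
    return crossings * sr / len(x)


if __name__ == "__main__":
    sr = 8000
    tone = sine(440, 1.0, sr)  # 1 s test tone at 440 Hz
    for speed in (0.5, 1.0, 2.0):
        played = varispeed(tone, speed)
        # Length scales by 1/speed; estimated pitch scales by speed.
        print(speed, len(played), round(estimate_freq(played, sr)))
```

Running it shows the measured pitch falling to roughly 220 Hz in slow motion and rising to roughly 880 Hz in fast forward, which is the artefact that leads video systems to mute the audio.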
Using the sound granulation method for time-stretch/compression, slow-motion and fast-forward video sequences can be replayed with sound at usable speeds, at the original pitch and without heavy distortion of the timbral structure.

2. Sound Granulation
The sound granulation technique was originally proposed for narrow-bandwidth communication and for frame-rate conversion of cinematic films by optical methods [Gabor 1946]. The basic idea is quite similar to quantum-wave theory in physics: sound may be described as a sequence of elementary acoustic elements, just as in cinema and video a rapid sequence of static images gives the impression of moving objects. Gabor described these elementary acoustic elements as "acoustic quanta" or "grains" [Gabor 1947]. This hypothesis was later verified mathematically by Bastiaans [Bastiaans 1980 and 1985].

Truax applied the technique to the granulation of sampled sound in real time, a process that stretches and compresses the sound in a manner identified as variable-rate time shifting [Truax 1988]. Time compression and expansion by granulation are achieved by extracting windowed grains from a stream of sound and reordering them in time. Bastiaans called the method a "sliding window" [Bastiaans 1985], because the granulation window moves in the time domain.

A stream of sound is granulated using a trapezoidal window. The degree of time compression or time stretch regulates the movement of the granulation window. By overlapping two streams of granulated sound at the mid-point of the ramps, amplitude discontinuity is avoided; phase discontinuity is not, because the granulation is asynchronous and performs no analysis of the original sound. This is the main artefact of time manipulation by granulation. By choosing an optimal grain size and other granulation parameters, the result is very smooth.

[Figure 1: Timing chart for x2 stretch (two granulation channels)]
[Figure 2: Timing chart for x2 compression (channels ch1 and ch2)]
[Figure 3: Original speech; frequency (Hz) against time (samples)]
[Figure 4: x2 compressed speech; frequency (Hz) against time (samples)]

3. Application for Video
The basic idea was "back to the original", since Gabor developed the granulation technique for frame-rate conversion of cinematic film [Gabor 1946]. In the variable-speed functions of video playback systems (fast forward or slow motion), however, audio output is usually muted during the operation [Watkinson 1994]. In other words, the frame rate of the picture is manipulated, but the speed of the sound track is not.
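The two-channel granulation of Section 2 can be sketched as an overlap-add of trapezoid-windowed grains. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `trapezoid` and `granulate_stretch` are hypothetical, the 5 ms linear ramp is an assumed value (the paper specifies only a 10 to 20 msec window), and consecutive grains alternate between two notional channels so that each falling ramp overlaps the next rising ramp. The read position of each grain advances at 1/ratio of the write position, which is how the degree of stretch or compression "regulates the movement of the granulation window".

```python
import math


def trapezoid(length, ramp):
    """Trapezoidal window: linear rise over `ramp` samples, flat top, linear fall."""
    w = []
    for n in range(length):
        if n < ramp:
            w.append(n / ramp)
        elif n >= length - ramp:
            w.append((length - n) / ramp)
        else:
            w.append(1.0)
    return w


def granulate_stretch(x, ratio, sr, grain_ms=20.0, ramp_ms=5.0):
    """Change the duration of x by `ratio` (2.0 = x2 stretch, 0.5 = x2 compression).

    Grains are written at a fixed output hop. Even-numbered grains form one
    channel and odd-numbered grains the other; each grain's falling ramp overlaps
    the next grain's rising ramp, so the windows sum to 1 and the amplitude stays
    continuous. Grain phases are NOT aligned (no analysis of the input), so phase
    discontinuities remain, as described in the text. The first ramp fades in and
    the last grain or so is cut off at the buffer ends.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    grain = round(grain_ms * sr / 1000)
    ramp = max(1, round(ramp_ms * sr / 1000))
    if grain < 2 * ramp:
        raise ValueError("grain must be at least twice the ramp length")
    hop = grain - ramp                # output distance between consecutive grains
    w = trapezoid(grain, ramp)
    out = [0.0] * int(len(x) * ratio)
    j = 0
    while j * hop < len(out):
        dst = j * hop                 # where grain j is written in the output
        src = int(dst / ratio)        # where grain j is read: the window slides at 1/ratio
        for n in range(grain):
            if dst + n >= len(out) or src + n >= len(x):
                break                 # stop at the end of either buffer
            out[dst + n] += w[n] * x[src + n]
        j += 1
    return out


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 400 * i / sr) for i in range(sr)]  # 1 s at 400 Hz
    slow = granulate_stretch(tone, 2.0, sr)   # x2 stretch: twice as long
    fast = granulate_stretch(tone, 0.5, sr)   # x2 compression: half as long
    print(len(tone), len(slow), len(fast))
```

With a 400 Hz test tone at 8 kHz the grain offsets happen to be whole periods, so the output is a clean 400 Hz sine of the new length; for arbitrary material the grains meet with the phase discontinuities the paper identifies as the main artefact.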
As a first step towards implementation, the speed/rate manipulation is performed in non-real time. First, the picture and the sound track of a video clip are separated. In commercial video formats, one frame carries about 30 to 40 msec of sound track (PAL 25 frames/sec, NTSC 30 frames/sec). This is longer than the optimum grain size for time manipulation [Itagaki 1998], so sound granulation using a 10 to 20 msec window is performed on the sound track following the method explained above. The picture is manipulated by repeating frames (for stretching) or removing frames (for compression). The processed sound track and the processed picture are then reunited. As we do not currently interpolate the picture, the choice of stretch/compression ratio is restricted.

4. Summary
Initial results suggest that applying sound granulation to variable-speed video is viable. The granulation technique preserves local signal periodicity and provides an intelligible audio cue source. Our final goal is to develop the system to provide full professional jog and shuttle control of video with pitch-preserved sound. A real-time implementation of the system is also anticipated. As two-channel granulation requires about 20-30 MIPS [Bartoo et al. 1994, Itagaki 1998], this could be achieved with a single inexpensive DSP chip.

References
Bartoo, T., D. Murphy, R. Ovans, and B. Truax. 1994. "Granulation and Time-Shifting of Sampled Sound in Real-Time with a Quad DSP Audio Computer System." In Proceedings of the 1994 ICMC (Århus, Denmark). San Francisco, CA: ICMA. pp. 335-337.
Bastiaans, M. J. 1980. "Gabor's Expansion of a Signal into Gaussian Elementary Signals." Proceedings of the IEEE 68(4): 538-539.
Bastiaans, M. J. 1985. "On the Sliding-Window Representation in Digital Signal Processing." IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-33(4): 868-873.
Gabor, D. 1946. "Theory of Communication." Journal of the Institution of Electrical Engineers 93(26): 429-457.
Gabor, D. 1947. "Acoustical Quanta and the Theory of Hearing." Nature 159(4044): 591-594.
Itagaki, T., P. D. Manning, and A. Purvis. 1995. "An Implementation of Real-Time Granular Synthesis onto a Multi-Processor Network." In Proceedings of the 1995 ICMC (Banff, Canada). San Francisco, CA: ICMA. pp. 493-494.
Itagaki, T., P. D. Manning, and A. Purvis. 1997. "Distributed Parallel Processing: Lessons Learned from a 160-Transputer Network." Computer Music Journal 21(4): 42-54.
Itagaki, T. 1998. "Real-time Sound Synthesis on a Multi-processor Platform." Ph.D. Thesis, University of Durham.
Roads, C. 1978. "Automated Granular Synthesis of Sound." Computer Music Journal 2(2): 61-62.
Truax, B. 1988. "Real-Time Granular Synthesis with a Digital Signal Processor." Computer Music Journal 12(2): 14-26.
Truax, B. 1994. "Discovering Inner Complexity: Time Shifting and Transposition with a Real-time Granulation Technique." Computer Music Journal 18(2): 38-48.
Watkinson, J. 1994. The Art of Digital Video. Boston: Focal Press.
Xenakis, I. 1971. Formalized Music. Bloomington: Indiana University Press. ISBN: 0-253-32378-9.