# CONCATENATION AND STRETCH/SQUEEZE OF MUSICAL INSTRUMENTAL SOUND USING SOUND MORPHING

Skip other details (including permanent urls, DOI, citation information)This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License. Please contact mpub-help@umich.edu to use this work in a way not covered by the license. :

For more information, read Michigan Publishing's access and usage policy.

Page 00000001 CONCATENATION AND STRETCH/SQUEEZE OF MUSICAL INSTRUMENTAL SOUND USING SOUND MORPHING Naotoshi Osaka NTT Communication Science Laboratories* 3-1 Morinosato Wakamiya, Atsughi-shi Kanagawa, 241-0193, Japan ABSTRACT Sound morphing is one of the successful synthesis technologies for recent computer music, and several studies have been reported on this subject. We have proposed a sound morphing algorithm based on a sinusoidal model. In this paper, a correspondence search algorithm is improved, which is core technology of morphing based on a sinusoidal model. Then wider applications of the algorithm are introduced. It can be used as concatenation of two sounds if morphing interval is as short as 100 msec. It is also used as stretch and squeeze of a sound, if the algorithm is applied to two copies from the same sample. Moreover, adding a musical expression to an instrumental sound are discussed. 1. INTRODUCTION Sound morphing is a synthesis technology that continuously interpolates two timbres and becomes one of the important study items in not only computer music but also in speech research. The author has been developing the technology based on both sinusoidal model [1] and physical model [2]. This paper discusses an improved algorithm and wider application of the technology to various sound modifications along with time alignment, such as stretch/squeeze of a sound, and addition of music expression. In image morphing, the first step is to find the corresponding points between the two images. In commercial software, this correspondence is done manually. Similarly in sound morphing, there are many cases when correspondence of represented parameters for two sounds should be found and interpolated. Since so many numbers of parameters are needed in a sinusoidal model, it is not practical to find correspondence manually. Therefore, certain algorithm to find the correspondence is indispensable. However, most of the reported technology avoids to solve the problem. Peak to peak match algorithm proposed by [3] works but is an ad hoc algorithm. Fits and Haken also reported an algorithm based on a sinusoidal representation [4]. Their method divides the frequency bands after normalizing by fo. Sinusoidal match is done only in the correspondent band, and matching is not done across the different normalized frequency band, which limits the combination of sinusoids. Our algorithm faces the correspondence problem, and tries to find an optimal solution. We call it an "optimum partner search problem." In the previous study, there were cases of not being able to find the correspondence when sinusoids were placed in complicated manner between facing frames to be interpolated. The new version can find the correspondence in any kinds of sinusoidal placement. 2. SEARCH ALGORITHM OF CORRESPONDENCE USING DYNAMIC PROGRAMMING The algorithm is modified from the original one [1] so that it finds appropriate correspondence for various patterns of peak displacements among two successive frames. Let xi, (i=0,1,.../-1) and yj, (/=0,1,...,J-1) denote vectors of two groups. I and J are the numbers of vectors, respectively, and usually I is not equal to J. The problem is to decide the combination of close resemblance for xi-yj pairs so that combination is an optimum as a whole. It is solved by adopting a criterion that minimizes total cost among pairs, using dynamic programming. Total cost is denoted as follows: I-1 To (I) = min C(x, Ywi(k)) i=o0 (1) where wi(k) is a window function for x, to reduce the search range of y vectors less than distance dth, and is defined as specific members in y group. Figure 1 shows a window function and cumulative cost function. Vectors of each group are lined up in certain order, practically frequency order. wi(k) is defined as kth candidates of y group within the window, which ith vector of x takes into consideration, and is denoted as an absolute number of y in the order defined above. ni is a total number of y vectors in the window w;. wi (0)= 0, i= 0,1,...I -1 (2) Case D(xi, ywi(k)) ~ dth wi (k)Ej, 1< k< ni, j = 0,1,...,J-1 (3) Otherwize wi (k) = * Currently, with School of Engineering, Tokyo Denki University. E-mail: osaka@im.dendai.ac.jp

Page 00000002 yJ-1 where, Final condition T( (I) = min I) (8) (9) yit+ wi (3)= j+1 yi wi- 1(3)= wi (2)= j yi-1 wi-1(2)= wi (1)= j-1 yi-2 wi-1(1)= j-2 k* represents k index in the preceding window, w11y, to point Ywiyk), and f* indicates how far to look back from the present position i. These are defined recursively as follows: i Y yo Fig.1 Cumulative cost Tk*(i-1) used in the algorithm (l*,k*) - F(w,I,k,n,l) Initial condition l=1 Case i-l<0 k~=0, *=IJ Case n1-1 ~0 (10) (11) Xi Xi-1 Case wi- (1) ~ w. (k) -1 < whi (n,-1) (l' = 1, k* = w 'i-l (w, (k)- 1)) (l*,k*) Case w-1 (n1) < wi(k) -1 ( 1*=1, k = n1t*) Case wi (k) -1< wi- (1) F(w,I,k, n,l +1) (12) Fig. 2 Comparison of k* and k*' acquired from functions F and F' Tk(i) is a cumulative cost function up to the distance from xi to Ywis. Cost function is defined basically in terms of distance between correspondent vectors, and is defined as D(x1,y7~i). However, it also incorporates value of the case when xi cannot find a partner, dunIt. T'o(i,k) is defined as a cumulative cost for previous x vector, xi-1 dealing with up to ywi(k), when xi does not find any correspondent vectors. The dimension of the cumulative cost function T is increased by one (3 dimension), if and only if the other party is not found. Recursive condition is given below: Initial condition T0(0)= d(0), (4) for k = 1,2,...,no Tk (0) = min(D(xo, yw0(k), Tk- (0)) (5) Case i= 0 Case n11 =0 (I*, k*) = F(w, I,k, n,1+ 1) Moreover, To(i)k is defined as: To(0) = d,,it. F"k*'(~*,~ 1 T'0 (i, k)= d,,it x(l* -1)+ min o, otherwise T'o (i -l,k* ) Similar to F, k* and l* are defined as follows: (l*,k*') F' (w,I,k,n,1) Initial condition: l=1 Case i-l< 0,k~0=, /*=I Case nt-1 ~0 (13) (14) (15) (16) Ca for k = 1,2,..., n ka 1 Tk-1(i), k = 1 m Tk (i) = min D(xY, ki(k) ~ Tk*(i - f*) D(xi, w~i(ki)+ 0o(0).se i 2 1 for k = 1,2,.. n. kTk Tk-1(i), k(=1 o T (i)= min D(xiYwi(k) T(I - l*)+ daii D (xi, Yi(k))+ 0 (i - I*, k*) + dazz (6) Case wi-1(1) ~ wi (k) K wi-_(ni-1) * ~=, k*' = w:-1(wi(k))) ( Case w-1 (n-1) K w.i(k) (18) (1', k,') = (1 = 1, k* = nt-t* Case w (k) < w-1 (1) F(w,I,k,n,l +1) (17) (7)

Page 00000003 Case ni1 = 0 (l*,k*)= F'(w,I,k,n,l + 1) (19) X2 XI xo While xi points the kth candidate of y vector in wi; wi(k), k*' is defined as an index when preceding x, Xi-1, points the same y vector, k* remains the same definition. Fig.2 compares both F and F'. dnull is defined appropriately by a user in the same dimension as a distance. Different values of dnull give the different optimum combination. Fig. 3 gives three examples of such a case. Three characteristics of the algorithm are; 1) A distance function to sort vector members in order to use a window function and cost function to find correspondence of x and y vectors do not have to be the same distance function. 2) Total cost from x to y and y to x are not the same and does not satisfy reflexive law, or reflexive law is not proved. 3) When sorting is done using a distance function defined in a window function, optimum combination is derived so as not to cross over the correspondent member, that is, combination of, say, 3-7 and 4-5 do not happen. Fig. 4 depicts applied results of the algorithm to peak to peak match problem. For a) through c), distance function to regulate the window function is defined in one dimension using frequency, as well as the function used to line up peaks ( a) and b)). On the other hand, in c), two dimensional distance function in terms of frequency and amplitude is used in calculating distance of x and ith candidate of a partner, that is, D(x,ywH(k)). In each of a) to c), matching is done to the near candidate, having nothing to do with the umber of members. In c), we can tell that matching is done, taking amplitude into consideration. 3. APPLICATION TO THE CONCATENATION OF SOUND FRAGMENTS AND EXPRESSION ADDITION TO THE SUSTAINED INSTRUMENTAL SOUNDS The algorithm stated in the previous chapter is used for both peak to peak match of the two succeeding frames in sinusoidal model, and morphing of two original sounds. Other applications introduced here are the special cases of morphing when transferring time from one sound to another is very short; concatenation. Three functions are introduced here: 1. Concatenation of two original sounds with different timbres. 2. Concatenation of two original sounds with the same timbre. 3. Stretch/Squeeze of a single sound. For 1 and 2, if morphing is done in as short as about 100 msec, it is no longer morphing, but becomes concatenation perceptually. Fig. 5 shows these functions. yi Case 0 none dnuil< a<b Case 1 - dnuli< 3b-2a Case 2 ----....... dnuit> 3b-2a Fig.3 Different combination results for various values of dnuii (1) depicts a concatenation of two different timbres. This can be applied to speech synthesis. 3.1. Addition of music expression by controlling concatenation interval (2)-1 of fig. 5 depicts a concatenation of two samples with the same timbre. Concatenation of different expression for the same instrument is possible. In string instruments, concatenation timing and interval can control the expression of slurring and bowing. (2)-2 of fig.5 shows another example of a concatenation for different pitch sounds. This can be interpreted as an implementation of portament. 3.2. Stretch and squeeze of sound (3)-1 and 2 of fig. 5 depict the stretch and squeeze of a sound, respectively. Stretch requires two copies of a sample. Concatenation is done for the overlapped interval, and the sound is stretched as long as the time delay of the latter copy. On the other hand, squeezing is done for the truncated copies of an original sound. The former uses the head part of the copy, while the latter uses the tail part of the copy. These functions are useful in music composition when sound resources are limited. So far these technologies have been applied to a real computer music pieces: "Nubatama II", limited sound resources for a tiny baby are used, and sound interval are controlled appropriately by stretch and squeeze. All the functions introduced from (1) through (3), including intrinsic morphing function. 4. CONCLUSION Improved version of a correspondence search algorithm, using dynamic programming, is proposed, which is a core technology of sound morphing based on a sinusoidal model. The algorithm is rather complicated, but works for complicated combinations of sinusoidal patterns for real sounds. Variety of applications besides morphing are introduced, starting peak to peak match of succeeding two frame of a sample, and including music expression addition and sound duration control. Finding other music applications using this algorithm is a topic for the further study.

Page 00000004 Q 600 -xi* 500-- 400 -300--- 200 -0. --------------- -------- ---------- ------- -- ----- -y -z Q Xi a) scalar x and y, rep- b) Another example of c) vector x and y, repreresenting only scalar x and y senting frequency frequencies and amplitude Fig. 4 Examples of peak to peak match Timbre A 2000 ms Timbre morphing 100 ms Timbre concatenation (1) Musically meaningful timber morphing Performance A Performance B Morphing interval 400-100 ms - (2)-i Different performance concatenation Pitch A Pitch B Morphing interval 40000 Portament* 400-10 ms concatenation (2)-2 Different pitch concatenation Sample A Sample A Morphing interval (3)-i Stretching a sample Head part of sample A \Tail part of sample A Morphing interval (3)-2 Squeezing a sample Fig. 5 Various applications of sound morphing 5. REFERENCES [1] Osaka, N. "Timbre interpolation of sounds using a sinusoidal model," Proc. of ICMC '95, pp. 408-411, Banff, Canada, 1995. [2] Hikichi, T. and Osaka, N. "Sound timbre interpolation based on physical modeling," Acoustical Science and Technology, pp. 101-111, Feb. 2001. [3] McAulay, R. and Quatieri, T., "Speech analysis/synthesis based on a sinusoidal representation," IEEE Trans. on ASSP, vol. ASSP-34, No4. pp. 744-754, Aug. 1986. [4] Fits, K., Haken, L., Lefvert, and S., O'Donnell, M. "Sound morphing using Loris and the reassigned bandwidth enhanced additive sound model: Practice and applications," Proc. Of ICMC '02, pp. 393-400, Gothenburg, 2002.