Page 110

Visualization and Predictive Modelling of Musical Signals using Embedding Techniques

Jeff Pressing, Chris Scallan, Neil Dicker
Department of Music, La Trobe University, Melbourne, Australia
Musjlp@lure.latrobe.edu.au

Abstract

A standard technique of embedding is applied to musical signals.

Foundations

An auditory signal A(t) is in one sense a one-dimensional function of time. Yet we accept as axiomatic that, for speech or music signals, this one dimension conceals a greater number of perceptual dimensions. In the case of music, these extra dimensions are formed by pitch, timbre, dynamics, independent parts, and so forth. Hence we may interpret music as a sample of one variable, amplitude, of a more complex system. Under this interpretation, we may model a musical signal using chaotic time series analysis, which has advantages over traditional Fourier or ARMA (autoregressive moving average) techniques for certain kinds of signals. These advantages are most likely for signals whose underlying dynamics is that of a strange attractor lying on an M-dimensional manifold (Casdagli 1989), but the method need not be confined to such signals. A number of methods are available for interpolation and nonlinear prediction; so far we have focussed on radial basis functions. Chaotic time series analysis can also supply a visual representation of the multidimensional aspects of sound.

Central to the technique is Takens' Embedding Theorem (Takens 1981), which says that certain global properties of an N-dimensional signal may be determined by "embedding" any one dimension of the signal. Embedding works as follows. Consider our time series above, now sampled at regular time points $t_n = n\tau$, with $\tau = 1/(\text{sampling rate})$. Takens' theorem states that for almost all times, and some $m \ge 2M + 1$, there is a smooth map $f: \mathbb{R}^m \to \mathbb{R}$ satisfying eq. (1) for $0 \le n < \infty$:

$$A([n+m]\tau) = f\big(A(n\tau), A([n+1]\tau), \ldots, A([n+m-1]\tau)\big) \tag{1}$$

There are specific but rather general conditions under which this holds, which we will not discuss here (Takens 1981). Equation (1) means that some global properties of the full N-dimensional system can be found in tuples of the delayed single variable. Hence we may form the vector (m-tuple)

$$\boldsymbol{\xi}_n^m = \big(A(n\tau), A([n+1]\tau), \ldots, A([n+m-1]\tau)\big) \tag{2}$$

These vectors $\boldsymbol{\xi}_n^m$ lie on a set of some dimension $D_C(m)$, and this dimension is a measure of the complexity of the sound in question: it is the so-called correlation dimension, which can be estimated as follows.

7A.2 110 ICMC Proceedings 1993
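The m-tuples of eq. (2), on which the estimate of $D_C(m)$ operates, can be formed with a few lines of NumPy. This is a minimal sketch, not the authors' implementation: the function name `delay_embed` is hypothetical, and its `delay` parameter (in samples) is an assumption that generalizes the one-sample spacing of eq. (2) to the longer delays, such as 60 samples, used for the embedding plots in the figures.

```python
import numpy as np

def delay_embed(signal, m, delay=1):
    """Return delay vectors xi_n = (A[n], A[n+delay], ..., A[n+(m-1)*delay]).

    signal : 1-D array of samples A(n*tau)
    m      : embedding dimension
    delay  : lag between coordinates in samples (eq. (2) corresponds to delay=1);
             this parameter is an illustrative assumption
    Returns an array of shape (N, m), one vector per row,
    with N = len(signal) - (m - 1) * delay.
    """
    a = np.asarray(signal, dtype=float)
    n_vectors = len(a) - (m - 1) * delay
    if n_vectors <= 0:
        raise ValueError("signal too short for this m and delay")
    # Column k holds the signal shifted forward by k * delay samples.
    return np.column_stack(
        [a[k * delay : k * delay + n_vectors] for k in range(m)]
    )
```

For example, `delay_embed(samples, m=3, delay=60)` returns one 3-D point per row, ready for plotting or for the pair-distance computation that the correlation integral requires.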

If there are N vectors $\boldsymbol{\xi}_n^m$ in our sound file, form all the $N(N-1)/2$ possible pairs $(\boldsymbol{\xi}_i^m, \boldsymbol{\xi}_j^m)$ of such points. Let $r_{ij} = \lVert \boldsymbol{\xi}_i^m - \boldsymbol{\xi}_j^m \rVert$ be the Euclidean distance between each pair of points. Then define $C_m(l)$ to be the proportion of pairs of points satisfying $r_{ij} < l$. This is the correlation integral. From it we compute the correlation dimension, defined by

$$D_C(m) = \lim_{l \to 0} \frac{\log C_m(l)}{\log l} \tag{3}$$

In practice, $D_C(m)$ varies with m only until m reaches a certain value $m^*$; it then stabilizes at the true dimensionality of the system, D. Hence the limiting slope (for small l) of a log-log plot of $C_m(l)$ versus l, taken with $m \ge m^*$, determines D. We have computed correlation dimensions for a number of sounds, and they appear to provide a useful index of complexity. An example is seen in Figure 1, with m = 3. The soundfile here is an excerpt from Steve Reich's Music for a Large Ensemble. The computed dimension is fractal (non-integer), with $D \approx 2.3$.

Another global parameter characterizing the chaotic time series is the Lyapunov exponent, which indicates the rate of divergence of nearby trajectories on the strange attractor. If this divergence proceeds as $e^{\lambda n}$, then the Lyapunov exponent is

$$\lambda = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \ln \lvert A'(i\tau) \rvert$$

Lyapunov exponents are also useful in inferring soundfile complexity.

Following a suggestion of Gordon Monro, we have applied coloration to plots of embedded sound files. A 2-D plot of $A(t + n\tau)$ versus $A(t)$ for the sound file can be colorized by applying a color map based on the number of "hits" per pixel. Color maps can be monotonic or recursive. A 3-D plot of $A(t + 2n\tau)$ versus $A(t + n\tau)$ versus $A(t)$ may be colorized by projective drawing and hits-per-pixel coloration, or shown as a 2-D plot of $A(t + n\tau)$ versus $A(t)$ with coloration determined by $A(t + 2n\tau)$. Shading, texture, and other 3-D parameters can be used for $m \ge 4$. By successive display of embedding frames, videos of soundfile embeddings can be constructed.
These videos proceed by successively incrementing the delay, the length of the file segment analyzed, or the analysis starting point, and some seem to us quite beautiful.

General Properties of the Embedding Graphs

Figure 2 shows a 2-D embedding for a sine tone $A(t) = \sin(\omega t)$ with reverberation, using only black and white due to publication practicalities. The emergent shape is that of a modified Lissajous figure. The main figure is an ellipse oriented along the axes x', y', which are obtained by counterclockwise rotation of the original axes by $\pi/4$. It can readily be shown that the normalized semi-axis lengths along x' and y' are, respectively,

$$a = \frac{\sin n\tau\omega}{\sqrt{1 - \cos n\tau\omega}}, \qquad b = \frac{\sin n\tau\omega}{\sqrt{1 + \cos n\tau\omega}}, \qquad \text{with } a^2 + b^2 = 2.$$

The coloration function, which is just the probability density in the $(A(t), A(t + n\tau))$ plane, is

$$F \propto \big\{\big(1 + K\omega^2[1 - A^2(t)]\big)\big(1 + K\omega^2[1 - A^2(t + n\tau)]\big)\big\}$$

Hence figure shape emphasizes frequency information and coloration emphasizes amplitude information. The effect of reverberation is to produce phase-shifted copies of the signal, broadening the width of the ellipse. The decay to zero creates the inward spiral culminating in the dense center. For certain delay values, these parallel loops separate.

Embedding plots for some other simple waveforms show interesting results. A triangle wave, since $A(t + n\tau)$ and $A(t)$ are piecewise linearly related, produces a rectangle; a square wave produces a square. Combinations of simple waveforms in simple frequency ratios yield a variety of complex trajectories for different delay lengths, one of which is shown in Figure 3. Figure 4 shows the laugh of a 2-year-old girl, rich in noise and formants.

A Gaussian noise source produces a joint probability density function of the form

$$p\big(A(t), A(t + n\tau)\big) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{A^2(t) + A^2(t + n\tau)}{2\sigma^2}\right),$$

where $\sigma$ is the standard deviation, since successive points on the curve are uncorrelated. Hence the embedding plot consists of circularly symmetric rings of color. For a general soundfile, each time segment produces a corresponding embedding-space trajectory whose recurrent properties are indicated by increased coloration. Retrogressions (time reversals) or symmetric functions of time produce patterns that are mirrored across the $\theta = \pi/4$ line. Patterns produced by music are complex, variegated and multidimensional. Short transient bursts produce characteristic trajectories. Long soundfiles of complex sources average away significant structure, yielding globular clusters, although greater resolution is possible with higher dimensions. Plots of first and second successive differences (e.g. $A(t + \tau) - A(t)$) highlight regions of rapid change.

The method has also been extended by first transforming the auditory signal with the STFT and then applying the embedding to the Fourier transform $F(\omega)$, using frequency as the delaying variable.
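The frequency-delay embedding just described can be sketched for a single STFT frame. The sketch is illustrative only and rests on assumptions the text does not state: a Hann window, the magnitude $|F(\omega)|$ rather than the complex transform, and an additive delay measured in FFT bins. The function name `frequency_delay_embed` and its parameters are hypothetical.

```python
import numpy as np

def frequency_delay_embed(frame, m=3, bin_delay=1):
    """Embed one STFT frame using frequency (FFT bin index) as the delaying variable.

    Assumptions (not from the paper): Hann window, magnitude spectrum of the
    real FFT, and an additive delay of `bin_delay` bins between coordinates.
    Returns rows (|F[k]|, |F[k+d]|, ..., |F[k+(m-1)d]|) with d = bin_delay.
    """
    x = np.asarray(frame, dtype=float)
    # Magnitude spectrum of the windowed frame: len(x)//2 + 1 bins.
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    n_vectors = len(spectrum) - (m - 1) * bin_delay
    if n_vectors <= 0:
        raise ValueError("frame too short for this m and bin_delay")
    # Column k is the spectrum shifted up by k * bin_delay bins.
    return np.column_stack(
        [spectrum[k * bin_delay : k * bin_delay + n_vectors] for k in range(m)]
    )
```

Applying this to each successive frame gives one frequency-delay embedding per time slice, which can then be displayed in sequence like the time-domain embeddings.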
We have in this case also investigated embeddings of the form $(F(\omega), F(\lambda\omega), F(\lambda^2\omega))$, and similarly in four dimensions, where $\lambda > 1$, with interesting results. Alternatively, the succession of time-slice analyses produced by the STFT can be treated as a 2-dimensional signal sampled at the STFT frame rate and time-delayed to produce a 4-dimensional embedding space, which may be projected into 3 dimensions in various ways for viewing.

References

Casdagli, M. 1989. Nonlinear prediction of chaotic time series. Physica D 35: 335-356.

Takens, F. 1981. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, Warwick 1980, Lecture Notes in Mathematics 898, pp. 366-381. Berlin: Springer.

Figure 1: Correlation integral for an excerpt from Steve Reich's Music for a Large Ensemble. Correlation dimension $D \approx 2.3$ implies fractal structure.
Figure 2: C3 sine tone with reverberation. Delay = 60 samples.
Figure 3: Three sine tones (C3, G3, E4). Delay = 112 samples.
Figure 4: Laugh of a 2-year-old girl. Delay = 1405 samples.
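The correlation integral $C_m(l)$ plotted in Figure 1, and the slope estimate of $D_C(m)$ from eq. (3), can be sketched in Python with NumPy. This is a minimal brute-force sketch under stated assumptions, not the authors' code: the function names are hypothetical, the input is assumed to be an (N, m) array of delay vectors as in eq. (2), and the $O(N^2)$ pair count limits it to a few thousand vectors (long soundfiles would need subsampling or a neighbor-search structure).

```python
import numpy as np

def pair_distances(vectors):
    """Euclidean distances r_ij for all N(N-1)/2 pairs i < j of rows of `vectors`."""
    v = np.asarray(vectors, dtype=float)
    if v.ndim == 1:
        v = v[:, None]  # treat a 1-D array as N points in one dimension
    if len(v) < 2:
        raise ValueError("need at least two vectors")
    # Row i contributes its distances to every later row j > i.
    return np.concatenate(
        [np.linalg.norm(v[i + 1:] - v[i], axis=1) for i in range(len(v) - 1)]
    )

def correlation_integral(vectors, radii):
    """C_m(l): proportion of pairs with r_ij < l, evaluated for each l in `radii`."""
    r = np.sort(pair_distances(vectors))
    # side="left" counts the distances strictly less than each l.
    return np.searchsorted(r, np.asarray(radii, dtype=float), side="left") / len(r)

def correlation_dimension(vectors, radii):
    """Least-squares slope of log C_m(l) against log l over the supplied radii.

    The radii are assumed to lie in the small-l scaling region; choosing that
    region is left to the caller (an assumption, not a rule from the paper).
    """
    radii = np.asarray(radii, dtype=float)
    c = correlation_integral(vectors, radii)
    keep = c > 0  # log C is undefined where no pair is closer than l
    slope, _intercept = np.polyfit(np.log(radii[keep]), np.log(c[keep]), 1)
    return slope
```

In use, the slope would be computed for increasing m and read off once it stops changing (m at or above $m^*$), as described above; points spread evenly along a line should give a value near 1, and points filling a square a value near 2.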