Page  00000538 OLW: An Extension to MSP for the Implementation of Overlap-Windowing Techniques Francois Thibault, Philippe Depalle Music Technology Group, McGill University, Montreal email: thibault,depalle Abstract Many important audio DSP algorithms are based on overlap-windowing techniques. Despite the fact that MSP provides a basic mechanism to handle overlapping windows, there is still a need for an explicit and flexible way to handle such data flow. This paper presents an extension to MSP called OLW (for OverLap-Windowing), which allows an easy implementation of techniques such as STFT, LPC, pitch detection, and formant tracking. In this paper we present the OLW framework that includes OLW objects for data flow transfer, a set of OLW objects dedicated to sound analysis, and a very simple Software Development Kit that allows the implementation of new OLW objects. We also discuss the smooth integration of OLW within MSP, and show that OLW objects behave as standard MSP objects when not used in an overlap-windowing context. 1 Introduction This paper presents an extension to the MAX/MSP environment called OLW (for OverLapWindowing) that allows an explicit implementation of overlap-windowing techniques. Continuous research and development efforts made in conjunction with computing power increase over the last decade have allowed a drastic evolution of real-time music and audio programming environments such as MSP (Zicarelli 1998), jMax (Dechelle et al. 1998), Pd (Puckette 1997), and SuperCollider (McCartney 1998). Despite the availability of such powerful environments to the computer music community, a fairly small number of powerful DSP analysis methods have been developed, likely due to a significant complexity gap that results from a lack of appropriate and explicit overlap-windowing technique management. The current overlap-windowing techniques that are available in MSP fall into two categories. The first consists of programming ad-hoc external objects that hide the overlap-windowing management to the user. This is the case for important third party objects such as fiddle- (Puckette and Zicarelli 1998), cpt (Cirotteau et al. 2001), and lpc- (designed at IRCAM). Although these objects work well, they do not allow access to intermediate steps of the DSP algorithms, and they do not provide the user with the appropriate level of modularity for an easy implementation of various complex signal-processing techniques. The second category implements unconventional approaches using existing MSP tools. Good examples of such designs lead to interesting spectral processing applications (Settel and Lippe 1998), (Dobrian 1999). However, this approach results in complex patches and non-intuitive designs. Furthermore, since these types of patches do not benefit from a flexible data flow management, patches must be rebuilt each time the user changes structural parameters such as the overlap factor. On the other hand, Pd supports overlapwindowing techniques through two built-in objects called block- and reblock- (Puckette 1997). The most recent release of MSP includes the pfft- object (Dobrian 2001), constituting a first step towards using overlap-windowing techniques. However pfftis essentially limited to Fourier-based techniques. In spite of these improvements, a general-purpose approach that would provide more flexible access to signal data flow still lacks in MSP. The OLW framework described in this paper, shows equivalent results to those devised in Pd, using data transfer concepts similar to the ones presented by Wright et al. in the integration of SDIF format within MSP (1999). 2 Overlap-Windowing Technique MSP implements a vector approach to signal processing in which sound signals are processed by blocks of a given size. The block size (called the "vector size" in MSP) can be changed to favor either computation efficiency (with large vector sizes), or low latency (with small vector sizes). Despite careful selection of this parameter, a compromise must be made between the two, depending on the particular demands of the application. For example, for a parameter extracted from a block of 64 samples at a sampling rate of 44.1 kHz, the refresh rate is less than 538

Page  00000539 2 milliseconds, providing sufficient tracking resolution in most cases. However, most signal analysis methods require a much larger frame length on which to operate, often on the order of thousands of samples. This means that a vector size of 2048 (still at a 44.1 kHz sampling rate), updates the parameter every 45 milliseconds, and represents a minimum delay of the same amount for real-time applications. Therefore, for high vector sizes, the rate at which analysis is performed (the reciprocal of the window length) is much too low to yield an analysis time resolution that is appropriate for the tracking of fast time-evolving parameters. This parameter under-sampling may result in aliasing problems. Overlapping windows can simultaneously provide both good analysis rate and large window sizes. The idea is to work with the full window size at a step increment (smaller then window size) which generates redundant data transmission and causes a data flow increase (see Fig-1). This technique is extensively used in signal analysis, and more particularly in the phase vocoder (Kahrs and Brandenburg 1998). r i [ _t Figure 1: An input size of Vs samples (the MSP vector size) is accumulated and N samples (the frame length) are being outputted every I samples (the step increment). Successive windows overlap thus causing an N/I times increase in data flow. 3 OLW Framework 3.1 Introduction The OLW framework proposes a solution that integrates the management of overlapping windows described in the previous section with MSP. In the framework design, two important aspects were considered: the first is the practical implementation of the step increment, given that MSP uses data blocks of fixed vector sizes. The second is to manage the increased data flow that results from the overlapwindowing technique when N is greater than I (as it is usually the case in most DSP algorithms). In order to maintain a coherence between OLW and MSP and to guarantee a smooth transition between the two systems, we have decided to constrain the step increment of the overlapwindowing technique to equal the MSP vector-size. Although this might seem excessively restrictive, it is not problematic as most of the DSP techniques that are developed around overlapping windows provide the user with a final set of data whose cardinality is less than or equal to the step increment of intermediate computations. 3.2 OLW Chain In order to use a patch that includes OLW objects, we must design and maintain an OLW chain that symbolizes the data communication network between OLW objects, and that describes the evaluation tree. Since OLW objects are extended MSP objects, and since MSP already manages the evaluation tree, we have decided to let use MSP to manage the DSP chain, but only in regards to the description of the evaluation tree. Therefore, OLW objects are still connected using MSP patch cords and have names ending with a '~ However, the MSP signal data associated with the MSP patch cords is not used. Instead, OLW buffer references and sizes are propagated along the chain through a mechanism that is described below. 33 MSP/OLW Data Flow Transfer Amongst the list of objects that are provided for the OLW environment, OLWin- and OLWout- are the most important as they 'regulate' the data flow transfers between the MSP contiguous vector size 'world', and the overlapping window 'world'. Standard MSP connection < (contiguous data) OLWin 5T21 OLW framework data connection (overlapping data) Standard MSP connection (contiguous data) Figure 2: The data flow transfer between MSP and OLW is shown in a "bypass" patch that does nothing other than enter and leave the OLW framework. OLWin- buffers data in order to manage the overlapping process and provides the subsequent objects with proper pointer references to the sequences of windows that are delivered every I samples (i.e. every vector size). Note that since OLWin~ is the entry point for an OLW subpart of a patch, it also acts as a 'master object' that activates the OLW data transfer chain on top of the usual MSP DSP chain. When activating the MSP DSP engine, OLWin- sends the configuration information as a 'connect' message out of its second outlet and forces objects connected to its outlets to configure themselves,in turn sending a 'connect' message to their descendants. At the end of an OLW subpart of a patch, OLWout- reduces the data flow from N samples every I samples to I samples every I samples according to several modes. The first of these, called 539

Page  00000540 'rebuffer', simply uses the first I samples from the N given samples, to fill a standard MSP buffer of size I (the vector size). This mode is useful for any object that generates a small set of data from large windows (Ipc analysis, autocorrelation, etc.). The second mode, called OLA, performs the overlap-add method (OLA) (Quatieri 2002) and constitutes the appropriate tool for such tasks as the implementation of the resynthesis section of a phase-vocoder (Quatieri 1989), or SOLA techniques (Kahrs and Brandenburg 1998). Additional modes of extraction/reconstruction such as the Griffin and Lim method (1984), can be implemented in the future. 3.4 OLW object design The integration of the system is demonstrated by the possibility of OLW objects to operate as standard MSP objects when not used in an OLW subpart context. The OLW objects that can also be used effectively in a constant data flow environment, automatically detect if they are used in an OLW subpart of the patch or in a regular MSP object fashion. They then create appropriate references to data buffers that are provided to the dsp_perform method (Zicarelli 1998). Such OLW 'dual' objects are doubly beneficial since they also provide new tools for MSP itself. 4 Example 4.1 Signal autocorrelation Let us now illustrate the description of the OLW framework using a simple signal processing example that highlights the basic structure and functioning of OLW, as well as the versatility of OLW objects used as standard MSP objects. The choice of autocorrelation provides a good example because of its widespread usage in audio applications as the first step for analysis tools such as fundamental frequency estimation (Quatieri 2002), linear prediction analysis (Quatieri 2002), and power spectral density estimation (Oppenheim and Schafer 1989). In order to get a consistent statistical analysis, one has to use a number of samples (i.e. a window size N) much greater than the higher order (K) needed. By choosing a step increment I (i.e. a vector size) greater than K, one can easily implement the autocorrelation process using the OLW framework, and the following sequence of objects (see left side of Fig-3). First, OLWin- extracts windows of N (1024 in the example) samples every I samples (128). In the example, K is chosen to be 65 (note that this is less then I). The OLWwinmult~ weights the window with the specified window shape (i.e. rectangular) and an arbitrary length of 640 samples. Next, OLWautocorrestimates the first K autocorrelation coefficients r (k) for k = 0,..., K. Finally, the OLWout- object sends the first I samples of the window (N) into a 'traditional' MSP I-sized block, before the autocorrelation function is displayed by a MSP scope- object. On the right side of the patch, OLWautocorr- is used in a standard MSP patch, and provides up to I autocorrelation coefficients per block of contiguous data, computed from the same number of samples. Signal Autocorelation In an 'OLW patch' In a standard MSP patch Figure 3: Simple example demonstrating the use of OLW for the implementation of the autocorrelation of a cosine wave in both OLW and standard MSP. When compared to ad-hoc designed external objects (such as fiddle-, lpc~, and cpt~) the proposed OLW modular architecture shows net advantages as it allows developers to factorize common functionalities, using the overlapping window process managed by OLWin-. For example, the same OLWin- object is able to feed several analysis algorithms, thus bringing greater clarity to the implementation and avoiding internal duplications of data. 5 OLW Environment 5.1 Set of objects As OLW has been designed primarily to implement overlapping windows, DSP algorithms for musical applications, and more specifically for analysis tools, it is essential to provide the user with basic analysis methods, implemented as OLW objects. Hence, Fourier transform, linear predictive coding, cepstrum analysis and power estimation are included in the OLW basic set of objects. In practice, the design of a specific musical application often requires additional signal manipulations. Therefore, a collection of standard mathematical functions was selected and rewritten as OLW objects to facilitate the development of OLW abstractions (following the model of MSP abstractions). 540

Page  00000541 5.2 Additional features In order to improve the flexibility and accuracy of the DSP algorithms to be developed, OLW allows the user to implement patches that support multiple window sizes. For example, one can imagine an algorithm that filters a given signal, separating in two signals comprised of low- and high-frequency components, and processes them respectively through large and small-frame sizes OLW subpart. Accordingly, these subparts work at a specific overlap factor to fulfill the unique step increment OLW basic feature. Some DSP algorithms also benefit from lower sampling rate. For example, the development of a real-time controller that uses the first three formants of a singing voice to control a synthesizer does not require the usual audio sampling rates (such as 44.1, or 48 kHz), rather one fourth of these values. Therefore, a mechanism which allows one sampling rate per OLWin-...OLWout- pair is being implemented. This will bring the advantages of multirate systems implementation (Crochiere and Rabiner 1983) to the environment. 53 OLW Software Development Kit For the sake of completeness and generality, a very simple software development kit is available as an application programming interface for handling overlapping windows. It is comprised of the new dedicated OLW structure built on the basic MSP object structure, along with a handful of API functions accessible to the user. In addition to using basic MAX functions (main, new,free, etc.) and MSP functions (dsp_perform, dsp_add, etc.) (Zicarelli 1998), designing OLW objects consists of writing an appropriate configuration method that reacts to the MSP DSP chain activation in order to set the buffers and compute DSP-related variables. The design of a perform method that is actually able to process a specified arbitrary amount of data is obviously required. 5.4 Further improvements Future implementations should allow an arbitrary step increment size even when using several instances of the OLW framework in the same patch. Allowing multiple sample rates as in multi-rate systems is already under development and should provide greater flexibility for the framework. A further extension of the current set of objects implementing other analysis techniques should also be made available to fully demonstrate the design capacity. 5.5 Conclusions In this paper, we presented an extension to the MAX/MSP programming environment called OLW, which allows an explicit implementation of overlapwindowing techniques. This framework has proven to be a highly efficient approach to prototyping and implementing complex DSP techniques that run in real-time under MSP. It is our hope that OLW will facilitate and contribute to the development of powerful DSP techniques such as sound analysis within the MSP environment. References Cirotteau, D., D. Fober, S. Letz, and Y. Orlarey. 2001. "Un Pitch Tracker Monophonique." Proceedings of the Journees d'Informatique Musicale, Bourges, France, pp. 217-223. Crochiere, R. E., and L. R. Rabiner. 1983. Multirate Digital Signal Processing. Upper Saddle River, NJ: Prentice Hall. Dechelle, F., R. Borghesi, M. De Cecco, E. Maggi, B. Rovan, and N. Schnell. 1999. "jMax: An Environment for Real Time Musical Applications." Computer Music Journal 23(3):50-58. Dobrian C. 2001. "MSP documentation." Dobrian C. 1999. "Programming New Real-time Possibilities with MSP." Proceedings of the Digital Audio Effects Conference, Trondheim, pp. 50-54. Griffin, D.W., J. S. Lim. 1984. " Signal Estimation from Modified Short-Time Fourier Transform." IEEE Trans. on Acoustics, Speech, and Signal processing, Vol. ASSP32, No. 2, April 1984, pp. 236-243. Kahrs, M., and K. Brandenburg. 1998. Ed. Applications of Digital Signal Processing to Audio and Acoustics. Dordrecht,: Kluwer Academic Publishers. McCartney, J. 1998. "Continued Evolution of the Supercollider Real-time Synthesis Environment." Proceedings of the International Computer Music Conference. International Computer Music Association, Ann Arbor, Michigan, pp. 133-136. Oppenheim, A. V., and R. W. Schafer. 1989. Discrete-Time Signal Processing. Saddle River, NJ: Prentice Hall. Puckette, M. S., M.T. Apel, and D. Zicarelli. 1998. "Realtime Audio Analysis Tools for Pd and MSP." Proceedings of the International Computer Music Conference. International Computer Music Association, Ann Arbor, Michigan, pp. 109-112. Puckette, M. S. 1997. "Pure Data." Proceedings of the International Computer Music Conference. International Computer Music Association, pp. 224-227. Quatieri, T. F. 2002. Discrete-Time Speech Signal Processing, Principles and Practice. Upper Saddle River, NJ: Prentice Prentice Hall. Settle, Z., and C. Lippe, 1998. "Real-time Frequency-domain Digital Signal Processing on the Desktop." Proceedings of the International Computer Music Conference. International Computer Music Association, pp. 142-149. Wright, M., R. Dudas, S. Khoury, R. Wang, and D. Zicarelli. 1999. "Supporting the Sound Description Interchange Format in the Max/MSP Environment." Proceedings of the International Computer Music Conference. International Computer Music Association, pp.182-185. Zicarelli, D. 1998. "An Extensible Real-time Signal Processing Environment for MAX." Proceedings of the International Computer Music Conference. International Computer Music Association, pp. 463 - 466. 541

Page  00000542 An Algebra for Block Diagram Languages Yann Orlarey, Dominique Fober, St~phane Letz Grame, Centre National de Cri~ation Musicale 9 rue du Garet, BP 1185 69202 Lyon Cedex, France Abstract We propose an algebraic approach to block diagram construction as an alternative to the classical graph approach inspired by dataflow models. The proposed algebra is based on three binary operations of construction sequential, parallel and recursive constructions. These operations can be seen as high level connection schemes that set several connections at once in order to combine two block diagrams to form a new one. Algebraic representations have interesting application for visual languages based on block diagrams and are useful to specify the formal semantic of this languages. 11 Introduction The datafiow approach proposes several well known models of distributed computations (see [2] and [1] for historical papers, and [6] and [3] for surveys) and many block diagram languages are more or less directly inspired by these models. Due to their generality, datafiow have been used as a principal for computer architecture, as model of concurrency or as high level design models for hardware [4], the semantic of these various models can be quite complex and depends of many technical choices like, synchronous or asynchronous computations, deterministic or non-deterministic behavior, bounded or unbounded communication FIFOs, firing rules, etc. Because of this complexities, the task of defining the formal semantic of block diagram languages based on datafiow models is not trivial and the vast majority of our datafiow inspired music languages don't have an explicit formal semantic. Provide a formal semantic is not just an academic question. Because of the great stabil ity of the mathematical language, it is probably the best chance we have to preserve the meanting of our tools over a long period of time, and therefore the musics based on them, in a world of rapidly evolving technologies. To solve the problem we propose a block diagram algebra (BDA), an algebraic approach to block diagram construction as an alternative to the classical graph approach inspired by datafiow models. The idea is to use high level construction operations to combine and connect together whole block diagrams, instead of individual connections between blocks. Having defined a set of construction operations general enough to build any block diagram, the formal semantic can be specified in a modular way by rules, associated to each construction operation, that relate the meaning of the resulting block diagram to the meaning of the block diagrams used in the construction. There are several techniques to describe the semantic of a program. Since we are mostly interested by what is computed by a block diagram and not so much by how it is computed, we will adopt a denotational approach, which describes the meaning of a program by the mathematical object it denotes, typically a mathematical function. Moreover, in order to make things concrete and to simplify the presentation, we will restrict ourself to the domain of real time sound synthesis and processing. 2 Block diagram terms Viewed as graphs, block diagrams are defined by a set of primitive blocks and a set of connections between these blocks. In the algebraic approach adopted here, block diagrams are viewed as terms of a language ID 542

Page  00000543 described by the following syntactic rule: dEll::= bEIB I (di:d2) (d1,d2) (d- ~d2) We suppose elsewhere defined a set lB~ of primitive blocks corresponding to the basic functionalities of the system, and such as for each b E lB we know the number of input ports ins(b) and output ports outs(b). Among these primitive blocks we consider two particular elements called identity '"~ and cut "!". Here is an informal description of these elements as well as the three binary operations of composition we propose. 2.1 Identity"L_"'and Cut "(!" As shown by figure 1 identity '1" is essentially a simple wire and cut "!" is used to terminate a connection. Identity: I' Cut:! Figure 2: (B: C) sequential composition of B and C Figure 3: sequential composition of B and C when k Figure 1: the _ and primitive 2.2 Sequential composition ":" The sequential composition of B and C is obtained by connecting the outputs of B to the inputs of C according to the scheme of figure 2. In its strict version, sequential connection is only allowed if the number of inputs of C is an exact multiple of the number of outputs of B: outs(B) * k ins(C) where k C N*. If k 1 we can simplify the diagram as in figure 3. It is convenient, but not essential in terms of generality of the algebra, to extend the sequential composition to the reverse case where the number of outputs of B is an exact multiple of the number of inputs of C: outs(B) k * ins(C). The inputs of C act as output bus for the outputs of B as in figure 4. Figure 4: sequential composition when outs(B) ins (c) 543

Page  00000544 Another possible extension, but that we are not considering here, when the numbers of outputs and inputs are not related by an integer factor is described by figure 5. Figure 7: (B~C) recursive composition of B and C 2.5 Examples We present two short examples of block diagram description. 2.5.1 Example 1 The example of figure 8 is typical of situation where you have an input stage, several parallel transformations combined together and an output stage. It is described by the following expression: A: (B,C,D): E Figure 5: A second extension to sequential composition 2.3 Parallel composition "," The parallel composition of B and C is notated (B, C). It is represented figure 6. Figure 6: (B, C) parallel composition of B and C 2.4 Recursive composition "'Y Recursive composition, notated B ~ C, is essential for building block diagrams with feedbacks capable of computing signals defined by recursive equations. As shown by figure 7, the outputs of B are connected back to the inputs of C and the outputs of C are connected to the inputs of B. The operation is only allowed if outs(B) > ins(C) and ins(B) > outs(C). For practical reasons we incorporate directly into the semantic of the ~ operation the 1-sample delays (represented by small yellow boxes on the diagrams) needed for the recursive equations to have a solution. Figure 8: Several transformations in parallel 2.5.2 Example 2 The diagram of figure 9 is a little bit more complex to describe. The first step is to rearrange the connections as in figure 10. We see clearly now two places in the diagram where our wires have to cross. So the next thing to do is to describe an "X" block diagram allowing two wires to 544

Page  00000545 Figure 9: a typical block diagram with feedbacks Figure 10: Same example after rearranging the connections cross. The definition is given by the following formula and correspond to figure 11: X =(_,_):(!,_,_,!) Figure 11: The block diagram X = (_,_): (!,_,_,!) allows two wires to cross The diagram is made of two selectors in parallel. The first selector:!,_ selects the second of its two inputs and the second selector: _,! selects the first of its inputs. This technique is easy to generalize to define any n x m matrix of connections by composing in parallel m selectors, each selector being a parallel composition of one and n- 1!. Using X, the definition of the diagram of figure 10 is now straight forward: ((_,Xj_): (BC)) ~X 3 Generality of the BDA As we just said any n x m matrix of connection between two block diagrams of respectively n outputs and m inputs can be represented using m selectors of the form (!,...,!,,!,...!) in parallel. Therefore any acyclic graph can be represented by using an appropriate matrix between each stage of the graph. In presence of cycles all the connections that create cycles can be handled with a ~ operation and the remaining (acyclic) graph with connection matrix. We give here an indirect proof that the BDA can represent any block diagram by giving its equivalence with the Algebra of Flownomials (AoF). Proposed by Gh. Stefanescu [5] the AoF can represent any directed flowgraphs (blocks diagrams in general including flowcharts) and their behaviors. It is based on three operations and various constants used to describe the branching structure of the flowgraphs. They all have a direct translation into our BDA as shown table 1. AoF BDA par. comp. A --B A,B seq. comp A.B A: B feedback AT (A ~ _): (!,_outs(A)-) identity I transposition X (_,_):(!,_,!) ramification Ak n: n*k A identification Vn*k: n Table 1: Correspondences between the algebra of Flownomials and our block diagram algebra. Note: _n is an abreviation that means the composition of n identity in parallel. 4 Well typed terms As we have seen section 2, depending of the number of input and output ports of the blocks diagrams involved, not every operation is allowed. We can formalize these constraints as a small type system. We define the type of a block diagram d to be defined by its number of inputs n and outputs m. We will write d: n -- m to specify that diagram d has type n -- m. The type system is defined by the following inference rules: (prim)b n - b: n -m 545

Page  00000546 (id) "1 -- 1 6.1 Definitions and notations (cut)!:1 -- 0 B:n-- ~m C:m*k-* p k > 1 (seq) (B: C): n -- p B: n mn m *k C: m- p k > 1 (seq')B:C):n-p (B: C): n -+p 6.1.1 Signals A signal s is modeled as discrete function of time s: N --*R For a signal s, we will write s(t) the value of s at time t. We call 5 the set of all signals B: n -- n m C: o -> p (par) (B, C): n+o - m + p B: v+n -- u+m C: u-- v (B ~ C): n - u + m For the rest of the paper we will assume well typed terms. 5 Number of inputs and outputs of a block diagram We can now define precisely the outs() and ins() functions on well typed terms. For outs() we have: 6.1.2 Delayed signals We will write x-1 the signal x delayed by one sample and such that: x-(0) x-1(t + 1) 0 x(t) 6.1.3 Tuple of signals We will write: 1. (xl,...,xn): a n-tuple of signals of ~Sn, 2. (): the empty tuple, single element of So, 3. (xi,...,xn)k: the tuple (xl,...,xn) repeated k times. 6.1.4 Signal Processors We define a signal processor p as a function from a ntuple of signals to a m-tuple of signals: p:;n: - m outs(_) outs(!) outs(B: C) outs(B, C) outs(B ~ C) 1 0 outs(C) outs(B) + outs(C) outs(B) And for ins(): We call IP the set of all signal processors: P U n___ n,mEN ins(_) ins(!) ins(B: C) ins(B, C) ins(B ~ C) 1 ins(B) ins(B) + ins(C) ins(B) - outs(C) 6 Semantic of block diagrams In this section we will see how to compute the semantic of a block diagram from the semantic of its components. We will adopt a denotational approach and describe this semantic by a mathematical function that maps input signals to output signals. 6.1.5 Semantic function The semantic function [[.]]: D - IP associates to each well typed block diagram d E D the signal processor p E IP it denotes. It is such that [d]] = p: ins(d) ___ outs(d) 6.2 The semantic function [[.]] The semantic function [[.]] is defined by the following rules 546

Page  00000547 6.2.1 Identity [[]Ux = x 6.2.2 Cut 6.2.3 Sequential composition case outs(B) * k = ins(C) At a user interface level, algebraic block can be used in block diagram editor as an equivalent textual representation in addition to the graphic representation. They can be used also to simplify and enforce a structured representation of visual diagrams that is easier to follow and understand for the user. Algebraic representations have also the advantage, compared to graph representations, to be easier to manipulate and analyze formally. They can be used as an adequate internal representation for compilers and optimizers that need to do smart things like abstract interpretation, specialization, partial evaluation, etc.. References [[:C]] (x,...,x ) where (yi,...,yp) (Si,...,sm) (y,..,yp) [[B (xi,...,xn) case outs(B) = k *ins(C) [[B: C]](xl,...xn) = where (yl,...,yp) = (S,...,Sm) = 6.2.4 Parallel composit [[B,C]] (xi,...xn,si,...,s where (yi,...,yn (ti,...,t [1] J. B. Dennis and D. P. Misunas. A computer archi(Y1,... Y m) tecture for highly parallel signal processing. In Pro[[C (... k-.meedings of the ACM 1974 National Conference, C~J K ~oSl+'m/)'"" * J=~O(Sm+/'mages 402-409. ACM, November 1974. [[B]] (xi,...,x,) [2] G. Kahn. The semantics of a simple language for ion parallel programming. In Proceedings of the IFIP Congress 74. North-Holland, 1974. o) = (Y,...,Ym,ti,...,tp) [3] E. A. Lee and T. M. Parks. Dataflow process net) = B1]] (xi,...,xn) works. In Proceedings of the IEEE, volume 83, ) = [C1 (si,...,so) pages 773-801, May 1995. -I L..,, v 6.2.5 Recursive composition [[B C]] (xi,...,xn) = (yi,...,m) where (yl,...,ym) = [B](r,...,rXl,...,Xn) (r,...,r) = [[C](Yl1.-1y ) 7 Conclusion The contribution of the paper is a general block diagram algebra, based on two constants and three operations, and its denotational semantic. This algebra is powerful enough to represent any block diagram while allowing a compact representation in many situations. Algebraic representations of block diagrams have several interesting applications for visual programming languages. First of all they are useful to formally define the semantic of the language and, as stated in the introduction, there is a real need for such formalizations if we want our tools (and the musics based on them) to survive. [4] Najjar, Lee, and Gao. Advances in the dataflow computational model. PARCOMP: Parallel Computing, 25, 1999. [5] Gheorghe Stefanescu. The algebra of flownomials part 1: Binary flownomials; basic theory. Report, Technical University Munich, November 1994. [6] Robert Stephens. A survey of stream processing. Acta Informatica, 34(7):491-541, 1997. 547

Page  00000548 Clock Skew Compensation over a High Latency Network Dominique Fober, Stephane Letz, Yann Orlarey GRAME Research Laboratory 9 rue du Garet, BP 1185, 69202 LYON Cedex 01, France [fober, letz, orlarey] Abstract Exchange of time stamped events between different stations raises the problem of the clock frequencies difference as soon as one station try to compensate for the transmission delay and to render the events with a minimum time distortion. We propose a simple, efficient and low cost method to compensate for the clock frequencies difference. This method relies only on regular time stamped packets transmissions and may be used in many cases. It provides good performances to the receiver station in regard of the sender reference time even on a heavily loaded communication channel. It operates also very efficiently on a low latency local network. 1. Introduction In the musical domain, real-time rendering of time ordered events transmitted over a high delay network is a tricky task as time ordering is part of the musical information itself and should be preserved with a maximum accuracy. Obviously, playing the transmitted events at reception time is not a solution as the transport latency variation will introduce an important time distortion which may reach several hundred of milliseconds on the Internet. Therefore, any suitable real-time transmission protocol should include mechanisms to compensate for the latency variation. Such mechanisms are based or equivalent to buffering technique [1, 2, 3, 4] which consists in delaying the events rendering by a fixed value, greater than the maximum latency variation expected. However, if the receiver and sender clock frequencies differ, this buffering technique will sooner or later reach its limit: * if the receiver clock is running faster than the sender clock, the receiver will consume the events faster than produced, exhausting at term the provisions made to compensate for the transport latency, * if it is running slower, the number of buffered events will increase with time, exhausting at term the receiver memory space. For the rest of the paper, we'll refer to the clock frequencies difference as the "clock skew". Clock synchronization has been subject of many works. As it is one of the most basic problem in distributed systems, many Clock Synchronization Algorithms (CSA) have been developed [5, 6, 7] to ensure that physically dispersed process will acquire a common notion of time using local physical clocks and message exchange over a communication network. Beside distributed systems point of view, protocols such as NTP or SNTP [8, 9] have been designed, intended to synchronize clocks over the Internet. Partially based on NTP, time synchronization has also been approached in the context of audio streams [10]. Our proposed method differs from the protocol approaches for the following reasons: * it operates without requiring a master clock, * it doesn't require any transaction to operate and therefore may be used with an existing protocol such as MWPP [11]. The clock skew detection is based on distributed systems algorithms, modified to take account of peerto-peer connections and of the high transmission delay introduced by the Internet. The rest of this paper is structured as follow: section 2 presents the clock skew detection algorithm, section 3 is dedicated to its implementation, section 4 deals with performances issues, and section 5 summarizes and outlines the future developments of this work. 2. Clock skew detection algorithm 2.1 Related technologies In distributed systems, a common approach to establish a common time base consists in using faulttolerant interactive convergence algorithms. These systems are generaly made up of several nodes, located on the same network. The principle of these algorithms is the following: * at regular intervals called "resynchronization intervals", all the nodes of the system exchange their clock values in order to determine a clock correction term. Communication between the 548

Page  00000549 nodes is generaly achieved using broadcast of time stamped messages over the network. * for each received message, a node will compute a clock deviation as the difference between the message reception and transmission dates, minus the current evaluation of the transport latency. At the end of the resynchronization interval, each node will have collected a set of clock deviations, which are then used to compute a clock correction term. Several methods exists, we reused the following: * the Fault Tolerant Midpoint Algorithm (FTMA): operates on a sorted set of clock deviations, it assumes that some nodes of the system are sending faulty values: these faulty values are therefore discarded and the correction term is then computed as the arithmetic mean of the lower and higher remaining values. * the Adaptative Exponential Fault Tolerant Midpoint Algorithm (AEFTMA): operates a smoothing of the correction terms produced by FTMA using exponential smoothing technics, where the weighting factor is dynamically computed according to the current correction term. Compared to these works, our situation constitutes a special case from several points of view: * only two nodes are involved in the skew detection process, * they don't have to agree on a common time base: each node is independantly estimating its clock deviation compared to its peer node, * as the latency variation is used to detect the clock skew, we cannot consider that there are faulty messages. 2.2 Peak latency filtering We have made several transport latency measurements using differents network paths. It appears that this latency globally fits in a constant range, with more or less frequent peaks. The range width and the peaks amplitude may greatly vary depending on the Internet service provider and the network path. 2510 200 ] "150 - 100 - 50 - 0. I E t i C~Ck z~s Ski~w i~ IL " -i:i Ba a: I ~" timwe ~~5m n Figure 1: continuous latency variation measured over 5 mn The figure 1 shows 5 minutes of continuous measurements done over a 42,6 kbps modem line connection via a free Internet service provider, at a time renowned for being a high traffic period. The measurement has been done using time stamped UDP packets sent every 200 milliseconds. The slow latency increase slope is due to the clock skew and detection of the skew may be viewed as measuring this slope. The intended solution consists in peak filtering and in computing the convergence point of the remaining values. Our algorithm, named Peak Tolerant Midpoint Average algorithm (PTMA) operates similarly to FTMA with the following differences: * the sorted FTMA clock deviation vector made up of the n nodes deviations collection is transformed into a sorted latency variation vector made up over time, * the discarded values count is not limited to the k Byzantine tolerance, * the arithmetic mean is computed using all the remaining latency variations (and not only the lowest and highest). Let w be the time window size of the latency variation collection and k be the number of values discarded per window. At the time t, the sorted latency variation vector is ~,=[S,-..6.. 1, ~I i and the average midpoint latency variation is then computed as: LV, = t- w+k/2 w-k Intuitively, the algorithm operation may be viewed as a drastic selection on the latency variations: it tends to discard all the lowest and highest variations; but also as a selection on the variation history as it retains only a discontinuous subset of the sliding temporal window. Applied to the 549

Page  00000550 measured latency variations, we obtain what we call the skew profile. 2.3 Skew profile smoothing However, in a more chaotic transmission context, the skew profile obtained with PTMA is varying significantly around the real clock skew. Therefore we have extended PTMA to the Exponential Peak Tolerant Midpoint Average algorithm (EPTMA) by simply applying exponential smoothing to the PTMA output. Assume that L V is the skew profile value at time t, EPTMA computes the corresponding smoothed value by: L,,, = a.LV, + (1 - a).LV,',T where the weight factor a is such as 0 < a < 1. Choosing a small a value gives more weight to the passed values and minimizes the effect of any pulse while this effect is more persistent. 2.4 Initialization speeding-up Acuracy of the results is highly dependant of the parameters used by the algorithm. In particular, choosing a large value for the temporal PTMA window size W and a small value for the retained values R will increase the long term acuracy but may also delay significantly the first results produced by the algorithm. Therefore, we improved PTMA in order to produce values earlier. Let n be the number of values currently pushed in the window: as long as n is lower than W, we dynamically compute the retained values r as: R r,,= R- (n - W which produces a parabolic curve from 0 to R when n varies from 0 to W. Always to speed up the initialization process, we choosed to use a weighting factor a = 1 for the first value pushed into EPMTA. 3. Implementation Detection of the clock skew requires only to receive packets time stamped with their transmission date at a sufficient rate to guaranty a good acuracy. Input values of the algorithm are latency variations which may be simply measured as follow: * at first packet reception, a receiver host evaluates its initial apparent clocks offset (ACO1) as the difference between the current local time and the packet time stamp. * then for any following packet, the latency variation L V, is: LV, = ACO, -ACOi (n > 1) where A CO is the apparent clock offset for the packet n. Let CORR, be the correction term returned by EPTMA for the packet n, the corrected local time is then: TCORR = T- CORR, where T is the current local time. 4. Performances Actually, the system behave like shown in figure 2: at initialization stage (time 0), the receiver and sender clock offset has a given value (150 for the example) which will continue to move up to the stable stage. Note that on the given example, the clock deviation is zoomed 10 times. &n (r 500 450 400 350 300 250 200 150 100 50 0 ns) cio ck deviation (zoomed x 10) stable stage - | stt ble deviation time 5mn Figure 2: system behavior At the stable stage time, the additionnal clock deviation becomes equal to: K = k + E(eptma) where k is the stable constant deviation and E(eptma) represents the error due to the skew profile evaluation. We choosed to characterize the algorithm performances with both k and e(eptma) values: the k value is important because it is to be added to the provisions made for the latency compensation and the error denotes the stability of the system. The results presented below come from a simulation of the system on a single station, using however both latencies generated by an abstract model of a network path behavior and latencies collected from real network transmissions. 3.1 Internet measurements For the measurments below, we made use of the following parameters: * packets transmission rate: 200 ms * PTMA window size: 200 values * retained values: 20 values * EPTMA weighting factor a: 0.01 First results have been collected using an abstract model of a network path behavior with a clock 550

Page  00000551 deviation of 2 for 10000 time units. Figure 3 illustrates an example of the transport conditions which may be qualified as bad. Sn (ms) 300 200 00 time -7mn Figure 3: example of simulated latencies. Results of 100 successive simulations over 5 minutes are presented by the table 1: it shows the maximum values reached, the average values and the standard deviations. Sn (ms) 1000 900 - peaks over 3s. 800 700 600 600 stable 4tage 2 500 400 300 200 100 time -7mn Figure 4: worst case latency It appears that the system is self-adapting to medium or long term changes of the network behavior: it smoothly switch from one stable stage to another one. Note also that the overall error E(eptma) has always been lower than ~2. 3.2 Local network measurements The algorithm operates also very efficiently on a low latency local network. Table 3 presents the corresponding results. Latency is characterized by a range of 1 ms delay with peaks up to 5 ms. max average std dev k E(eptma) 20.6 _ -1.25 6.79 -0,67 5.06 0.20 Table 1: simulation of a busy network Smax Saverage std dev k E(eptma) 0,7 -0.05 0,69 -0.01 0,02. 0.02 max average Istd dev k E(eptma) 23.25 -0.65 8.11 -0.49 8.32 0.17 Table 2: real world latencies We have also measured the performances using real latencies variations collected on the Internet. The estimated clock deviation was 1 for 10000 time units. Other parameters remain unchanged. Results are presented by the table 2. It appears that in both cases, the system presents a good stability and that the remaining clock deviation may be considered as low in regard of the expected latency over the Internet. Using real world data, we have ignored the worst transport conditions encountered because the latency variation has exceeded reasonable values for a correct time rendering (over several seconds). However, it is interesting to examine the system behavior in this exceptionnal context. Figure 4 presents the clock deviation for one of these worst cases. Note that the deviation deviation is still zoomed 10 times. Table 3: local network performances The parameters used changed from previous experiments: * packets transmission rate: 200 ms * PTMA window size: 30 * retained values: 10 * EPTMA weighting factor a: 0.2 The receiver clock deviation was 2 for 10000 time units. The system shows a very good stability and a remaining clock deviation under one time unit. 4. Conclusion We have proposed a new algorithm to compensate for the clock skew on a high delay network. It runs very efficiently on the Internet but also on a local network. Its presents many advantages compared to previous works: * it doesn't require any transaction to operate and therefore it may be used independantly of the transport protocol, * it doesn't rely on a master/slave scheme: each receiver is independantly evaluating its clock skew relatively to the sender, 551

Page  00000552 * consequence of the previous point: as the compensation scheme is attached to a given realtime stream, a network of any complexity can operate correctly and in particular, a receiver may handle several incoming streams with different clock skews characteristics. This algorithm has been implemented in the form of MidiShare [12] drivers and provides real-time communication over Internet or over a local network to the system client applications. It may be probably more improved: in particular concerning the way to handle the initalization stage (the first seconds of a transmission) which is critical in regard of the future efficiency. Acknowledgements This research was partly supported by the Mil Productions Company (Villefranche/Saone - France). We would like to thank it for its support. References [1] Fober D. Real time, musical data flow on Ethernet and MidiShare software architecture. Proceedings of the ICMC, 1994, ICMA, San Francisco, pp. 447-450 [2] M. Goto, R. Neyama, Y. Muraoka. RMCP: Remote Music Control Protocol. Proceedings of the ICMC, 1997 ICMA San Francisco, pp.446-449 [3] J.P. Young, I. Fujinaga. Piano master classes via the Internet. Proceedings of the ICMC, 1999 ICMA San Francisco, pp.135-137 [4] D. Fober, Y. Orlarey, S. Letz. Real Time Musical Events Streaming over Internet. Proceedings of the International Conference on WEB Delivering ofMusic, 2001, pages 147-154 [5] T.K. Srikanth, S. Toueg. Optimal Clock Synchronization. Journal of the ACM, vol. 34, pp. 626D645, July 1987. [6] M.M. de Azevedo and D.M. Blough, OFault-Tolerant Clock Synchronization for Distributed Systems with High Message Delay Variation, Fault-Tolerant Parallel and Distributed Systems, D. Pradhan and D. Avresky, eds., pp. 268D277, IEEE CS Press, 1995. [7] R. Ostrovsky, B. Patt-Shamir. Optimal and efficient clock synchronization under drifting clocks. Proceedings of the eighteenth annual ACM symposium on Principles of distributed computing, 1999, pp. 3 - 12 [8] D. L. Mills. Network Time Protocol (NTP) version 3. IETF, RFC 1305, 1992 [9] D. L. Mills. Simple Network Time Protocol (SNTP) version 4. IETF, RFC 2030, 1996 [10] E. Brandt R.B. Dannenberg. Time in Distributed RealTime Systems. Proceedings of the ICMC, 1999 ICMA San Francisco, pp.523-526 [11] J. Lazzaro, J. Wawrzynek. The MIDI Wire Protocol Packetization (MWPP). Internet-Draft. IETF, 2002, (work in progress) [12] Y. Orlarey, H. Lequay. MidiShare: a Real Time multitasks software module for Midi applications - Proceedings of the ICMC 1989, ICMA San Francisco, 1989, pp.234-237 552