Time-varying estimation of parameters in rule systems for music performance

Zanon, Patrick; De Poli, Giovanni; Dorigatti, Alessandro

CSC - DEI, University of Padova, Italy
{patrick,depoli}@dei.unipd.it, excawind@inwind.it

Abstract

Most performance rule systems compute deviations by using a set of weighted rules. In this paper we describe a method of parameter (or weight) estimation that lets the rule system generate a performance as similar as possible to a human interpretation of a given score. The method allows either a global or a time-varying estimation on different time scales.

1 Introduction

The analysis of deviations in music performance has led to the formulation of models that describe their structure, with the aim of automatically synthesizing what the player unconsciously adds to the notation of the score. Several rule systems, with different characteristics and various degrees of flexibility, have been proposed, but all of them aim to cover as wide a range of "expressive" variations as possible. These rules are mostly developed by analysis by synthesis or by machine learning algorithms [5]. The most important is the KTH rule system [2]. Its rules can be weighted by the so-called k parameters, which allow them to model performances more closely and to adapt to different situations. Moreover, tuning the weights can be used to model some emotional expression [1].

Weighting parameters are normally estimated by trial and error with an analysis-by-synthesis approach. Friberg [3] attempted automatic estimation by iteratively minimizing a distance measure between performances, varying the parameter values so as to approach a given performance. However, the performance style or strategy can change over the course of the piece, so a time-varying estimation of the parameters is of interest.
In this work we use a suitable pre-Hilbert space (a linear space equipped with an inner product; a complete pre-Hilbert space is a Hilbert space) to represent the scores, which allows an optimal estimation of the parameters on different temporal scales and lets us follow their variation in time.

2 Estimation Methodology

We observed that the models considered start from the nominal performance (a literal, mechanical interpretation of the score) and introduce, additively, duration and volume variations on some notes according to the rules, each weighted by a characteristic multiplicative coefficient $k_j$. The idea is to estimate the parameters so that the interpretation generated by the rule system is as similar as possible to a given one (e.g. a human performance), which will be called the sample performance. This suggests the way to proceed: by representing the performances as suitable vectors and formalizing their distance with a particular formulation of the Euclidean norm, we gain access to the results of the theory of pre-Hilbert spaces, and in particular to the projection theorem, which gives the best approximation in the least-squares sense.

This can be done with some definitions. Every performance of $n$ notes is represented as a vector $p$ lying in a $(3n - 1)$-dimensional space $P$, whose elements are $n$ sound intensities (sound levels $L_i$), $n$ durations ($D_i$), and $n - 1$ time intervals between notes (inter-onset intervals $I_i = O_{i+1} - O_i$, where $O_i$ is the onset time of the $i$-th note). The vector space $P$ as it stands does not make evident the variations introduced by the performer or by the rules of the expressiveness model. Therefore, when it is necessary to refer to the deviations from the nominal performance $p_N$, we use the symbol $\Delta p = p - p_N$.
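As a minimal sketch of this representation, the following Python snippet packs a list of notes into the (3n - 1)-dimensional vector and computes the deviation from a nominal performance. The `Note` record, the function names, the ordering of the components (levels, then durations, then inter-onset intervals), and all numeric values are illustrative assumptions, not taken from the paper.

```python
from typing import List, NamedTuple


class Note(NamedTuple):
    """Hypothetical note record; the field names are illustrative."""
    onset: float     # onset time O_i, in seconds
    duration: float  # duration D_i, in seconds
    level: float     # sound level L_i, in dB


def performance_vector(notes: List[Note]) -> List[float]:
    """Pack n notes into [L_1..L_n, D_1..D_n, I_1..I_{n-1}]."""
    levels = [nt.level for nt in notes]
    durations = [nt.duration for nt in notes]
    # Inter-onset intervals I_i = O_{i+1} - O_i (one fewer than the notes).
    iois = [b.onset - a.onset for a, b in zip(notes, notes[1:])]
    return levels + durations + iois


def deviation(p: List[float], p_nominal: List[float]) -> List[float]:
    """Deviation from the nominal performance, component by component."""
    if len(p) != len(p_nominal):
        raise ValueError("performances must have the same number of notes")
    return [a - b for a, b in zip(p, p_nominal)]


# Made-up three-note example: nominal (mechanical) vs. played.
nominal = [Note(0.0, 0.5, 70.0), Note(0.5, 0.5, 70.0), Note(1.0, 1.0, 70.0)]
played = [Note(0.0, 0.45, 72.0), Note(0.55, 0.5, 70.0), Note(1.0, 1.2, 68.0)]

p_n = performance_vector(nominal)
p = performance_vector(played)
delta_p = deviation(p, p_n)
```

With three notes the vectors have 3 * 3 - 1 = 8 components. A late second onset shows up as a longer first inter-onset interval and a shorter second one, which is why the intervals rather than the raw onsets are stored.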
With this notation, each rule of the model is represented by a $(3n - 1)$-dimensional vector $r_j$, whose components are the deviations that would be applied to $p_N$ using a unit weight. Thus the model can generate the following performances:

$$p_g = p_N + \sum_{j=1}^{m} k_j r_j \qquad (1)$$

where $m$ is the number of applied rules. Measured as deviations from $p_N$, all $p_g$ lie in a vector subspace of dimension $m$ whose basis is $R = [r_1 \ldots r_m]$.

The Euclidean norm should take into account the characteristics of the human ear, so several psychoacoustic principles were used. On the basis of these studies, we chose to measure variations of intensity in dB and variations of duration as a percentage of the nominal duration or inter-onset interval. At this point a criterion must be selected for combining loudness variations and time variations. Many possibilities were tested, but the best results were obtained by weighting these different kinds of variation so as to balance the effects of the just noticeable variations for the human ear (we used 0.5 dB on average for sound levels and approximately 5% for durations). Thus the norm used in our space is:

$$\|\Delta p\|^2 = \alpha_L^2 \sum_{i=1}^{n} \Delta L_i^2 + \alpha_D^2 \sum_{i=1}^{n} \frac{\Delta D_i^2}{D_{N,i}^2} + \alpha_I^2 \sum_{i=1}^{n-1} \frac{\Delta I_i^2}{I_{N,i}^2} \qquad (2)$$
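The weighted norm and the projection step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the weights on the level, duration, and inter-onset terms are taken to be the reciprocals of the just noticeable differences quoted above, the parameters are obtained by solving the normal equations of the least-squares problem, and the rule vectors and numeric values are made up. All function names are hypothetical.

```python
from typing import List, Sequence

JND_LEVEL_DB = 0.5   # just noticeable level change quoted in the text
JND_TIME_REL = 0.05  # just noticeable relative time change (5%)


def component_weights(p_nominal: Sequence[float], n: int) -> List[float]:
    """Per-component scale factors for a vector [L (n), D (n), I (n-1)].

    Assumption: each weight is the reciprocal of the corresponding JND, and
    time deviations are divided by the nominal duration or interval.
    """
    w_level = [1.0 / JND_LEVEL_DB] * n
    w_time = [1.0 / (JND_TIME_REL * x) for x in p_nominal[n:]]
    return w_level + w_time


def inner(u, v, w) -> float:
    """Weighted inner product: sum_i (w_i u_i)(w_i v_i)."""
    return sum((wi * ui) * (wi * vi) for ui, vi, wi in zip(u, v, w))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("rule vectors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in reversed(range(m)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - s) / aug[r][r]
    return x


def estimate_k(delta_p, rules, w) -> List[float]:
    """Project delta_p onto span(rules) via the normal equations G k = b."""
    gram = [[inner(rj, rl, w) for rl in rules] for rj in rules]
    rhs = [inner(rj, delta_p, w) for rj in rules]
    return solve(gram, rhs)


# Made-up example with n = 2 notes (vectors have 3n - 1 = 5 components).
n = 2
p_nominal = [70.0, 70.0, 0.5, 0.5, 0.5]
r1 = [1.0, -1.0, 0.0, 0.0, 0.0]     # hypothetical rule acting on levels
r2 = [0.0, 0.0, 0.02, -0.02, 0.01]  # hypothetical rule acting on timing
w = component_weights(p_nominal, n)
delta_sample = [2 * a - 0.5 * b for a, b in zip(r1, r2)]  # built with k = (2, -0.5)
k = estimate_k(delta_sample, [r1, r2], w)
residual = [d - k[0] * a - k[1] * b for d, a, b in zip(delta_sample, r1, r2)]
distance = inner(residual, residual, w) ** 0.5
```

Because the sample deviation here is constructed from the two rule vectors, the estimate recovers the weights used to build it and the residual distance is zero up to rounding. For a real performance the residual is generally nonzero and measures how far the sample lies from what the rules can generate. Restricting the sums to a window of notes would give a local rather than global estimate.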