Time-varying estimation of parameters in rule systems for music performance
Patrick Zanon, Giovanni De Poli, Alessandro Dorigatti
CSC - DEI University of Padova - Italy
{patrick,depoli}@dei.unipd.it, excawind@inwind.it
Abstract
Most performance rule systems compute deviations by using a set of weighted rules. In this paper we describe
a method of parameter (or weight) estimation, so that the rule system can generate a performance that is
as similar as possible to a human interpretation of a given score. The method allows both a global and
a time-varying estimation on different time scales.
1 Introduction
The analysis of deviations in music performance has led to
the formulation of several models that describe their structure, with the aim of automatically synthesizing
what the player unconsciously adds to the notation of the
score. Several rule systems, with different characteristics and various degrees of flexibility, have been proposed,
all aiming to cover as wide a range of "expressive" variations as possible. These rules have
mostly been developed by analysis-by-synthesis or by
machine learning algorithms [5]. The most important is
the KTH rule system [2]. Different rules can be weighted
by the so-called k parameters, which allows them to model
performances more closely and to adapt the rules to different situations. Moreover, tuning the weights can be used to
model emotional expression [1]. The weighting parameters are normally estimated by trial and error
within an analysis-by-synthesis approach. Friberg [3] attempted
automatic estimation by iteratively minimizing
a distance measure between performances, varying the
parameter values so as to approach
a given performance. However, the performance style or
strategy can change over the course of a piece, so a time-varying
estimation of the parameters is desirable. In this
work we use a suitable pre-Hilbert space¹ to represent
the scores, which allows an optimal estimation of the parameters
on different time scales and follows their variations over time.
2 Estimation Methodology
The models considered here start from the
nominal performance (a literal, mechanical rendering
of the score) and add duration and volume deviations to some notes according to
the rules, each rule being weighted by its own multiplicative coefficient $k_j$. The idea is to estimate the parameters so that the interpretation generated by the rule
system is as similar as possible to a given one (e.g.
a human performance), which we call the sample performance. This suggests an approach: if
performances are represented by suitable vectors and the
distance between them is formalized through a suitably weighted
Euclidean norm, the results of the theory of pre-Hilbert spaces
become available, in particular the projection theorem,
which gives the best approximation in the least-squares sense.

¹ A pre-Hilbert space is a linear space equipped with an inner
product. A complete pre-Hilbert space is a Hilbert space.
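For reference, this is what the projection theorem gives in this setting (a standard result, written here in generic notation rather than the notation of the following paragraphs). Let $x$ be an element of a pre-Hilbert space with inner product $\langle\cdot,\cdot\rangle$ and let $r_1,\dots,r_m$ be linearly independent. The coefficients $k_1,\dots,k_m$ minimizing $\|x - \sum_j k_j r_j\|$ are those for which the residual is orthogonal to every $r_i$, which yields the normal equations:

```latex
\Big\langle x - \sum_{j=1}^{m} k_j r_j,\; r_i \Big\rangle = 0,\quad i = 1,\dots,m
\;\Longleftrightarrow\;
\sum_{j=1}^{m} \langle r_i, r_j \rangle\, k_j = \langle x, r_i \rangle,\quad i = 1,\dots,m .
```

The Gram matrix $G_{ij} = \langle r_i, r_j \rangle$ is invertible when the $r_j$ are linearly independent, so the minimizer is unique. Completeness of the whole space is not needed: a finite-dimensional subspace is always complete, which is why a pre-Hilbert space suffices.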
This requires some definitions. Every performance of $n$ notes is represented as a vector $p$ lying in
a $(3n-1)$-dimensional space $P$, whose elements
are: $n$ sound intensities (sound levels $L_i$), $n$ durations
($D_i$), and $n-1$ time intervals between notes (inter-onset
intervals $I_i = O_{i+1} - O_i$, where $O_i$ is the onset time
of the $i$-th note). As it stands, the space $P$ does not make
explicit the deviations introduced by the performer or by
the rules of the expressiveness model. Therefore, when we
refer to deviations from the nominal
performance $p_N$, we use the symbol $\Delta p = p - p_N$. With
this notation, each rule of the model is represented by a
$(3n-1)$-dimensional vector $r_j$, whose components are the
deviations that would be applied to $p_N$ using a unit
weight. The model can thus generate the
following performances:
$$p_g = p_N + \sum_{j=1}^{m} k_j r_j \qquad (1)$$
where $m$ is the number of applied rules. All the corresponding deviations $\Delta p_g = p_g - p_N$ lie in
the $m$-dimensional vector subspace $\mathcal{R}$ whose basis is $R = [r_1 \ldots r_m]$.
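A minimal sketch of this representation and of equation (1), in Python with NumPy. The ordering of the components (all levels, then all durations, then all inter-onset intervals) and the function names `performance_vector` and `generate_performance` are assumptions made for illustration; the two rule vectors in the usage lines are hypothetical, not actual KTH rules.

```python
import numpy as np


def performance_vector(levels, durations, onsets):
    """Stack a performance of n notes into a (3n - 1)-dimensional vector.

    Assumed layout: [L_1..L_n, D_1..D_n, I_1..I_{n-1}], where
    I_i = O_{i+1} - O_i are the inter-onset intervals.
    """
    levels = np.asarray(levels, dtype=float)
    durations = np.asarray(durations, dtype=float)
    onsets = np.asarray(onsets, dtype=float)
    n = len(levels)
    if len(durations) != n or len(onsets) != n:
        raise ValueError("levels, durations and onsets must each have n entries")
    iois = np.diff(onsets)  # n - 1 inter-onset intervals
    return np.concatenate([levels, durations, iois])


def generate_performance(p_nominal, rules, k):
    """Equation (1): p_g = p_N + sum_j k_j r_j.

    `rules` is a sequence of m rule vectors r_j (deviations for unit weight),
    stacked as the columns of R; `k` holds the m weights k_j.
    """
    R = np.column_stack(rules)  # shape (3n - 1, m)
    return p_nominal + R @ np.asarray(k, dtype=float)


# Usage: three notes, nominal levels 60 dB, durations and IOIs of 0.5 s.
p_N = performance_vector([60, 60, 60], [0.5, 0.5, 0.5], [0.0, 0.5, 1.0])
r_cresc = np.array([0, 1, 2, 0, 0, 0, 0, 0], dtype=float)    # hypothetical crescendo rule
r_final = np.array([0, 0, 0, 0, 0, 0.1, 0, 0], dtype=float)  # hypothetical final-lengthening rule
p_g = generate_performance(p_N, [r_cresc, r_final], [2.0, 0.5])
```

Here $p_g$ raises the second and third notes by 2 and 4 dB and lengthens the last note from 0.5 s to 0.55 s. Because the rules act additively, $p_g - p_N$ is exactly $Rk$, i.e. a vector in the span of the rule vectors.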
The norm should take into account the characteristics of the human ear, so several psychoacoustic principles were used. On the basis of these studies, we chose
to measure intensity deviations in dB and
duration deviations as a percentage of the nominal duration or inter-onset interval. A criterion is then needed for
combining loudness and timing deviations in a single measure. Several possibilities were
tested, and the best results were obtained by weighting
the two kinds of deviation so as to balance
their just-noticeable differences for the human ear (we used
about 0.5 dB on average for sound levels and approximately 5%
for durations). The norm used in our space is therefore:
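$$\|\Delta p\|^2 = \alpha_L \sum_{i=1}^{n} \Delta L_i^2 + \alpha_D \sum_{i=1}^{n} \left(\frac{\Delta D_i}{D_{N,i}}\right)^2 + \alpha_I \sum_{i=1}^{n-1} \left(\frac{\Delta I_i}{I_{N,i}}\right)^2 \qquad (2)$$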
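Since norm (2) is induced by a diagonally weighted inner product, the projection theorem turns a global estimate of the weights $k_j$ into a weighted least-squares problem. The Python/NumPy sketch below computes the per-component weights of (2), the norm, and the projection of a sample deviation $\Delta p_s$ onto the rule subspace. The function names, the component ordering, and the choice $\alpha = 1/\mathrm{JND}^2$ (so that a deviation of one just-noticeable difference contributes 1 to the squared norm) are assumptions made here, not values given in the text; the sketch covers only a single global estimate, not the time-varying one.

```python
import numpy as np


def norm_weights(nominal_durations, nominal_iois, alpha_L, alpha_D, alpha_I):
    """Per-component weights w such that ||dp||^2 = sum(w * dp**2) matches (2).

    Assumed layout of dp: [dL_1..dL_n, dD_1..dD_n, dI_1..dI_{n-1}],
    with level deviations in dB and timing deviations in seconds.
    """
    D_N = np.asarray(nominal_durations, dtype=float)
    I_N = np.asarray(nominal_iois, dtype=float)
    return np.concatenate([
        np.full(len(D_N), float(alpha_L)),  # alpha_L * dL_i^2
        alpha_D / D_N**2,                   # alpha_D * (dD_i / D_N,i)^2
        alpha_I / I_N**2,                   # alpha_I * (dI_i / I_N,i)^2
    ])


def weighted_norm(dp, w):
    """||dp|| induced by the inner product <x, y> = sum(w * x * y)."""
    dp = np.asarray(dp, dtype=float)
    return float(np.sqrt(np.sum(w * dp**2)))


def estimate_k(dp_sample, R, w):
    """Project dp_sample onto span(R) in the weighted norm (weighted least squares).

    Solves the normal equations G k = b with G_ij = <r_i, r_j>, b_i = <dp_sample, r_i>.
    """
    R = np.asarray(R, dtype=float)
    dp_sample = np.asarray(dp_sample, dtype=float)
    G = R.T @ (w[:, None] * R)
    b = R.T @ (w * dp_sample)
    return np.linalg.solve(G, b)


# Assumed JND-based weights: 0.5 dB for levels, 5% (0.05) for durations and IOIs.
w = norm_weights([0.5, 0.5, 0.5], [0.5, 0.5],
                 alpha_L=1 / 0.5**2, alpha_D=1 / 0.05**2, alpha_I=1 / 0.05**2)
R = np.column_stack([
    [0, 1, 2, 0, 0, 0, 0, 0],    # hypothetical crescendo rule
    [0, 0, 0, 0, 0, 0.1, 0, 0],  # hypothetical final-lengthening rule
])
dp_sample = np.array([0.3, 2.1, 3.8, 0.01, 0.0, 0.06, 0.0, -0.01])  # illustrative deviations
k = estimate_k(dp_sample, R, w)
```

With these weights, a 0.5 dB change in one note's level and a 5% change in one note's duration both have norm 1, and the residual $\Delta p_s - Rk$ is orthogonal to every rule vector in the weighted inner product, as the projection theorem requires.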