Inductive Learning of General and Robust Local Expression Principles

Gerhard Widmer
Department of Medical Cybernetics and Artificial Intelligence, University of Vienna, and
Austrian Research Institute for Artificial Intelligence, Vienna
email: gerhard@ai.univie.ac.at

Abstract

The paper presents a new approach to inducing general rules of expressive performance from real performance data via inductive machine learning. A new learning algorithm is briefly presented, and then an experiment with a very large data set (performances of 13 Mozart piano sonatas) is described. The new learning algorithm succeeds in discovering some extremely simple and general principles of musical performance (at the level of individual notes), in the form of categorical prediction rules. These rules turn out to be very robust and general: when tested on performances by a different pianist and even on music of a different style (Chopin), they exhibit a surprisingly high degree of predictive accuracy.

1 Introduction

The goal of the research described here is to use computational methods from machine learning and data mining to extract general quantitative models of expressive music performance (i.e., rules for tempo, timing, dynamics, articulation, etc.) from real performances by human musicians. We aim at using large amounts of 'real-world' performances in order to arrive at empirically valid models. In fact, for our investigations we have prepared what is most likely the largest set of precisely measured performances ever studied in performance research. Our motivation is to contribute both new computational analysis methods and new insights to expressive performance research.

Previous work (Widmer 2000) has shown that it is indeed possible to find a certain amount of structure in such complex data.
In an experimental study with an extended collection of piano performances (the same as used here), various machine learning algorithms learned to predict the performer's expressive choices (e.g., lengthen or shorten a particular note) at a local note-to-note level. It was shown that the learners were able to predict the performer's choices with better than chance probability. However, the learned models were extremely complex. For instance, a decision tree discriminating between accelerando (note shortening) and ritardando (note lengthening) with 58% accuracy had 3,037 leaves (which corresponds to 3,037 classification rules)! This is clearly not desirable, as the purpose of the project is to discover intelligible rules that provide new insight.

Also, it seems obvious that the level of individual notes is not sufficient as the sole basis for complete models of performance. Some of the performer's choices may be explainable by reference to single notes and their local context, but others may only become predictable if one relates them to higher levels of the musical structure of the piece (or explains them as part of a more abstract expression pattern such as a gradual crescendo). In other words, it is unreasonable to expect that one can find a note-level model that completely discriminates between different categories of performer actions.

The new study presented here was motivated by these observations. We still remain at the note level, but develop a new approach to rule discovery. We abandon the idea of complete coverage/discrimination and instead aim at finding partial rule models that 'only explain what can be explained' at the note level. It will be shown that one can discover small numbers of very simple rules that still cover a substantial number (though by no means all) of the instances of a given category of expressive variation (e.g., note lengthening) and distinguish them quite well from the opposite classes (e.g., shortening).
(For instance, we will show that 4 simple rules are sufficient to cover 22.89% of the instances of note lengthening in our large data set.) This is achieved via a new data mining approach to automated rule discovery.

2 Data and Target Concepts

The data used in the present study consist of recordings of 13 complete piano sonatas by W.A. Mozart (K.279, 280, 281, 282, 283, 284, 330, 331, 332, 333, 457, 475, and 533), performed by a Viennese concert pianist (Roland Batik) on a Bösendorfer SE290 computer-monitored concert grand piano. The piano measurements (hammer impact times and pedal movements), together with the notated score in machine-readable form, provide us with all the information needed to compute the details of timing, dynamics, and articulation. The resulting dataset consists of more than 106,000 performed notes and represents some four hours of music.

The experiments described here were performed on the melodies (mostly the soprano parts) only, which gives an effective training set of 41,116 notes. Each note is described by a number of attributes that represent both intrinsic properties (such as scale degree, duration, metrical position) and some aspects of the local context (e.g., melodic properties like the size and direction of the intervals between the note and its predecessor and successor notes, and rhythmic properties like the durations of surrounding notes and some abstractions thereof).

With respect to expressive performance, the dimensions we are looking at are (local) timing, dynamics, and articulation. In trying to learn expression rules, we will not look at numeric prediction (i.e., exactly how long or loud a note is played), but rather at categorical decisions. The target classes we wish to predict are defined as follows:

* in the timing dimension, a note N is assigned to class lengthen if the inter-onset interval (IOI) between the start of N and the next note is lengthened relative to (a) its predecessor and (b) the current tempo (computed over the last 20 events); class shorten is defined analogously;
* in dynamics, a note N is considered an example of class louder if it was played louder (i.e., with higher MIDI velocity) than its predecessor, and also louder than the average level of the piece; class softer is defined analogously;
* in articulation, three classes were defined: staccato if a note's ratio of performed vs. notated duration is less than 0.8, legato if the ratio is greater than 1, and portato otherwise; we will only try to learn rules for the classes staccato and legato.
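The articulation and dynamics definitions above are precise enough to be stated as code. The following Python sketch is an illustration only, not the procedure actually used in the study: the function names and the list-based input are assumptions, and the timing classes are left out because the text does not spell out how the current tempo over the last 20 events is computed.

```python
from statistics import mean


def articulation_class(performed_dur, notated_dur):
    """Classify a note by its ratio of performed to notated duration."""
    ratio = performed_dur / notated_dur
    if ratio < 0.8:
        return "staccato"
    if ratio > 1.0:
        return "legato"
    return "portato"


def dynamics_classes(velocities):
    """Label each note as 'louder', 'softer', or None (neither class).

    `velocities` holds the MIDI velocities of one piece's melody notes
    in score order (an assumed input format).
    """
    piece_average = mean(velocities)
    labels = [None]  # the first note has no predecessor to compare with
    for prev, cur in zip(velocities, velocities[1:]):
        if cur > prev and cur > piece_average:
            labels.append("louder")
        elif cur < prev and cur < piece_average:
            labels.append("softer")
        else:
            labels.append(None)
    return labels
```

Notes labelled None here correspond to the cases that are neither examples nor counter-examples of a dynamics class.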
A performed note is considered a counter-example to a given class if it belongs to one of the competing classes. (Note that by the above definitions, there will be notes that are neither examples nor counter-examples of some concept.)

3 The Inductive Learning and Discovery Procedure

From the perspective of machine learning and data mining, the problem is to find partial descriptive models of categories such as 'situations where a note is lengthened' vs. 'situations where a note is shortened'. Descriptive means that the models should characterize classes of situations that are treated in a similar way by the performer, and these descriptions should preferably be simple and musically meaningful. We will use rule-based models and algorithms for learning classification rules from data. Partial means that we do not expect the rules to be able to cover and describe all of (or even a large part of) the instances of a given category observed in the data. We are looking for rule sets that capture only a (possibly small) part of all observations, but describe these in meaningful terms. Moreover, given the nature of our data and target phenomena, we cannot expect to find rules with very high levels of discriminative accuracy - we cannot assume the artist to be perfectly consistent and predictable.

The search for partial models necessitates a special approach to rule learning, since this is not the standard type of problem addressed in 'classical' classification settings. Consequently, we have developed a new machine learning algorithm named PLCG that is geared towards finding simple, robust classification rules in complex, noisy data. The algorithm and its properties are described and analyzed in more detail in (Widmer 2001a). Let us simply state the general approach here. For each expression dimension (timing, dynamics, articulation), PLCG does the following:

1. Separate the data into subsets according to tempo (slow/fast) and time signatures (2/2, 2/4, 3/4, 4/4, 3/8, 6/8) of the pieces.
2. Learn partial rule models from each of these 2 x 6 subsets separately. This is done with a rule learning algorithm of the 'sequential set covering' variety (Fürnkranz 1999) that was specially devised and implemented for this project. The resulting sets of rules will most likely be overly specific (specialized with respect to tempo and time signature).
3. Merge all these learned rule sets into one large set.
4. Perform a hierarchical clustering of the rules into a tree of clusters of similar rules, according to a syntactic/semantic rule similarity measure specially defined for our application.
5. For each of these clusters, compute the least general generalization of all the rules in the cluster (i.e., a generalization that subsumes all the rules and is no more general than necessary). The result is one rule per cluster. The resulting tree represents generalizations of various degrees of the original rules.
6. From this generalization tree, select those rules that optimize a given trade-off function between coverage (the number of cases covered by the rule) and precision (the percentage of covered cases that actually do belong to the rule's predicted class - in other words, the percentage of correct predictions made by the rule).

The goal of the entire procedure is to arrive at rules that both cover a significant number of cases (and thus describe a significant 'sub-concept') and are still reasonably accurate in distinguishing positive examples from counter-examples. This is achieved via the process of learning many specialized rule sets (step 2 above), finding rules in these sets that seem to describe similar sub-concepts (step 4), generalizing these to varying degrees (step 5), and selecting, from these alternative generalizations, those rules for the final model that optimize some criteria concerning coverage and accuracy (step 6). Again, more detail can be found in (Widmer 2001a).

4 Experimental Results

This section briefly discusses some of the rules discovered in the experiments and gives a quantitative evaluation of the rules along three dimensions: (1) we quantify their degree of fit (coverage and precision) on the training data, and assess their generality by (2) testing them on some of the same pieces performed by a different pianist, and (3) testing them on pieces of a different style (Chopin).

4.1 Some Simple Principles Discovered

The PLCG algorithm was run on the complete Mozart performance data set (41,116 notes), for each of the three expression dimensions timing, dynamics, and articulation. The final sets of rules selected (from a total of 383 specialized rules) consist of 6 rules for local timing, 6 rules for local dynamics, and 5 rules for articulation. These seem to represent very general (and mostly very simple) principles; some of them cover or "explain" a surprisingly large number of the pianist's expressive choices. We cannot discuss all 17 rules here; that is the topic of a forthcoming paper (Widmer 2001b).
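Steps 5 and 6 of the PLCG procedure from Section 3 can be illustrated in a few lines of Python. This is a minimal sketch under strong simplifying assumptions, not the PLCG implementation described in (Widmer 2001a): rules are restricted to conjunctions of upper-bound tests ('attribute < threshold'), the clusters of similar rules are taken as given, and the selection criterion (fixed minimum coverage and minimum precision) is a hypothetical stand-in for the trade-off function, which is not specified here. All function and parameter names are illustrative.

```python
def covers(rule, note):
    """A rule is a dict {attribute: threshold}; it fires if every
    listed attribute of the note lies below its threshold."""
    return all(note[attr] < limit for attr, limit in rule.items())


def least_general_generalization(rules):
    """Keep only the conditions shared by all rules and relax each
    threshold to the loosest one, so the result subsumes every rule
    without being more general than this rule language requires."""
    shared = set(rules[0]).intersection(*rules[1:])
    return {attr: max(r[attr] for r in rules) for attr in sorted(shared)}


def evaluate(rule, positives, negatives):
    """Return (TP, FP, precision) of a rule on labelled notes."""
    tp = sum(covers(rule, n) for n in positives)
    fp = sum(covers(rule, n) for n in negatives)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return tp, fp, precision


def select(candidates, positives, negatives, min_tp, min_precision):
    """Hypothetical selection step: keep candidates that are both
    general enough (TP) and accurate enough (precision)."""
    chosen = []
    for rule in candidates:
        tp, _, precision = evaluate(rule, positives, negatives)
        if tp >= min_tp and precision >= min_precision:
            chosen.append(rule)
    return chosen
```

In this simplified rule language, generalizing two specialized rules typically raises coverage and lowers precision, which is the trade-off the selection step has to balance.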
In the following, we will only give a brief description of the six rules for local timing (i.e., for categories lengthen and shorten). A summary of all the rules is given in Table 4.

Below, the rules are listed in the representation language used by the learning system and are briefly paraphrased. The numbers TP/FP following the paraphrase denote the number and percentage of positive examples in the training data correctly covered (true positives TP) and the number of cases where a rule makes an erroneous prediction, e.g., predicts a lengthening when the pianist actually shortened the note (false positives FP); the percentages are relative to the true numbers of positive examples and counter-examples, respectively. π is the rule's precision - the proportion of cases (out of the total number of cases in which the rule did make a prediction) where the rule's prediction was correct; in other words, π = TP/(TP + FP). We also give a separate evaluation of the rules on slow and fast pieces, as the two sometimes differ substantially.

Rules predicting a note (IOI) lengthening:

In the domain of local timing, there emerged two rules that appear to represent very strong, general principles, along with a third rule that describes an interesting class of situations. The most general discovered rule is

RULE TL1: lengthen IF abstr_dur_context = equal-longer
"Lengthen the middle note in a "cumulative" (Narmour 1977) 3-note rhythm situation (i.e., given two notes of equal duration followed by a longer note, lengthen the note that precedes the final, longer one)."
slow: 1,020 (27.89%) / 153 (2.59%), π = .870
fast: 1,645 (16.87%) / 752 (5.14%), π = .686
all: 2,665 (19.87%) / 905 (4.40%), π = .746

This is an extremely simple principle that is also surprisingly general and quite precise, especially in the slow pieces: there, TL1 covers 1,020 cases (27.89% of all examples of lengthening in the data) correctly, while predicting a non-observed lengthening in only 153 instances (2.59% of the cases where the pianist did the opposite). For fast pieces, TL1 predicts a substantial number of positive instances correctly, but makes a higher number of wrong predictions (5.14%).

A second very simple rule that emerged very strongly and that is obviously related to TL1 is

RULE TL2: lengthen IF next_dur_ratio < 0.334
"Lengthen a note if it is followed by a substantially longer note (i.e., the ratio between its duration and that of the next note is < 1:3)."
slow: 725 (19.82%) / 132 (2.23%), π = .846
fast: 1,063 (10.90%) / 439 (3.00%), π = .708
all: 1,788 (13.33%) / 571 (2.78%), π = .758

A variant of this rule, with substantially higher positive coverage but also more incorrectly predicted negative instances (particularly in the fast pieces), is

RULE TL2a: lengthen IF next_dur_ratio < 0.99 & metr_strength < 2
"Lengthen a note if it is followed by a longer note and if it is in a metrically weak position."
slow: 1,121 (30.65%) / 246 (4.16%), π = .820
fast: 1,651 (16.93%) / 918 (6.27%), π = .643
all: 2,772 (20.67%) / 1,164 (5.66%), π = .704

(In the quantitative experiments to be described below, we will use TL2a for slow pieces only.)

Clearly, rules TL1 and TL2/TL2a are strongly related, but they also partly complement each other; taken together, they cover 2,965 (22.11%) of the positive examples, which is substantially more than either of them covers in isolation. It is remarkable that essentially one simple principle as embodied in rules TL1 and TL2 - lengthen a note if it is followed by a longer one - is sufficient to account for more than one fifth of all the significant lengthenings observed in a large body of performance data. We consider this a surprising discovery that merits more detailed investigation.

Of course, variants of this principle have been observed before. For instance, rule TL1 and rule TL2 cover all the cases of lengthening the last in a sequence of short notes before a (half) cadence that were observed by Palmer (1996) in a Mozart performance, and which were interpreted there as a strategy to draw attention to the upcoming cadence. [1]

In addition to the apparently very strong principles embodied in TL1 and TL2, there was one weaker, less predictive rule that emerged from the learning process.

RULE TL3: lengthen IF dir_next = up & int_next > p4 & metr_strength < 2 & int_prev < maj2
"Lengthen a note if it precedes an upward melodic leap of more than a perfect fourth, if it is in a metrically weak position, and if it is preceded by (at most) stepwise motion (int_prev < maj2)."
slow: 95 (2.60%) / 38 (0.64%), π = .714
fast: 164 (1.68%) / 94 (0.64%), π = .636
all: 259 (1.93%) / 132 (0.64%), π = .662

TL3 obviously represents a tendency rather than a strong rule.
It is not as clear-cut as TL1 and TL2 and makes a relatively large number of wrong predictions, but it still distinguishes significantly between cases of lengthening and shortening. TL3 appears particularly noteworthy because it turns out to have an interesting parallel in the articulation rules that were discovered.

[1] The particular timing pattern was discussed by Palmer under the heading Performer-specific Expression. Our discovery shows that at least our pianist applies the same principle, and very consistently so.

What TL3 describes is a tendency to slightly delay the target note of an upward leap, by lengthening the IOI occupied by the initial note of the leap. According to the articulation rule AS3 discovered in the same experiment, this usually seems to go along with a slight "staccato" (more appropriately: the insertion of a micropause before the target note of the leap), which amplifies the sense of delay and separation:

RULE AS3: staccato IF int_next > p4 & dir_next = up & metr_strength < 2
"Insert a micropause after a note if it precedes an upward leap larger than a perfect fourth and is metrically weak."
slow: 307 (6.27%) / 161 (2.31%), π = .656
fast: 930 (5.39%) / 239 (1.99%), π = .796
all: 1,237 (5.59%) / 400 (2.11%), π = .756

The preparation of leaps via timing and articulation has also been studied previously. For instance, the KTH rule set (Friberg 1995) features a pair of rules named Leap Tone Duration (LTD) and Leap Articulation (LA) that pertain to this type of situation: Leap Articulation inserts a micropause between the notes of a leap, and Leap Tone Duration shortens the first and lengthens the second note (IOI) of a leap for upward leaps, and does the opposite for downward leaps. Note that rule TL3 learned by our system predicts the opposite of Friberg's LTD rule - it calls for a lengthening of the first IOI in upward leaps (and this is supported by our performance data).
This suggests that - at least in Mozart piano music - the Leap Tone Duration rule should be used with a negative k parameter. Regarding downward jumps, there does not seem to be a general trend in our set of performances.

Category   #rules   True Positives   False Positives   Precision
lengthen   4        3069 (22.89%)    1234 (6.00%)      .713
shorten    2         397 (2.98%)      179 (0.87%)      .689
louder     3        1318 (11.33%)     591 (3.24%)      .690
softer     3         625 (6.63%)      230 (1.14%)      .731
staccato   4        6916 (31.25%)    1089 (5.74%)      .864
legato     1         687 (7.42%)      592 (1.86%)      .537

Table 1: Classification accuracy of learned rulesets on training data (13 Mozart sonatas).

Rules predicting a note (IOI) shortening:

Note shortening and local speedups seem much more difficult to predict. The learning algorithm did not find any really general rule that covers a substantial number of positive instances and is strongly discriminative. Rules with a reasonably large coverage also tend to be overly general, predicting a shortening in many cases where the pianist did not apply one. At best, these rules seem to represent general tendencies that would need to be made more specific in order to be useful as prescriptive rules. One rule with a reasonably positive TP/FP ratio (at least for slow pieces) is

RULE TS1: shorten IF prev_dur_ratio < 0.67 & next_dur_ratio > 1.0
"Shorten a note (IOI) N in a sequence PN-N-NN if N is longer than its predecessor PN and longer than its successor NN (more precisely, if the duration ratio PN:N < 2:3 and N:NN > 1:1)."
slow: 354 (9.59%) / 175 (2.94%), π = .669
fast: 489 (5.09%) / 527 (3.61%), π = .481
all: 843 (6.34%) / 702 (3.42%), π = .546

This rule expresses a tendency to shorten long notes between shorter ones - a sort of smoothing action. We use this rule only for slow pieces in the following experiments.

Apart from such weak tendencies, the system was able to discover only rather specialized rules. The one with the clearest musical interpretation is

RULE TS2: shorten IF tempo = fast & meter = 3/8 & prev_dur_ratio > 2.0 & dur < 0.5 & next_dur_ratio < 0.99
"Shorten a note (IOI) in fast pieces in 3/8 time if the duration ratio between previous note and current note is larger than 2:1, the current note is at most a sixteenth (its duration is < 0.5 beat units), and it is again followed by a longer note."
slow: 0 (0.00%) / 0 (0.00%)
fast: 43 (0.45%) / 4 (0.03%), π = .915
all: 43 (0.32%) / 4 (0.02%), π = .915

This rule describes a well-known phenomenon that has been observed in empirical studies before, namely, the common shortening of sixteenth notes following a dotted eighth (and usually again followed by a longer note), as in the famous theme of the A major sonata K.331 (Gabrielsson 1987). Most of the cases covered by rule TS2 do indeed fall into this dotted-eighth-sixteenth note category.
Note that in our case, the rule explicitly limits this prediction to fast pieces (as, e.g., in the third movement of the Sonata K.280). In the slow ones (e.g., the beginning of K.331) our pianist tended not to apply this shortening in a consistent manner (in only 16 out of 50 cases did the system find a significant shortening).

Note also that rule TS2 calls for the duration ratio to be larger than 2:1. It has been observed by others that duration ratios of 2:1 are usually blurred by performers by lengthening the shorter (or shortening the longer) note (see also rule TS1 above). The difference in performing 2:1 vs. 3:1 duration ratios has been linked to considerations of categorical perception, i.e., the need to make these two rhythmic patterns more easily distinguishable for listeners (Sundberg 1993). It is nice to see this principle emerge automatically from real performance data via machine learning.

Space limitations do not permit a detailed discussion of the other rules learned. Table 4 lists the entire rule set, along with information concerning individual rule coverage and precision. More details will be given in (Widmer 2001b).

4.2 Coverage and Precision on the Training Performances

In trying to assess the quality of the rule sets as partial models, the first aspect we look at is the fit (coverage and precision) of the rule sets on the training performances they were learned from. Table 1 gives the overall coverage and precision of the three induced rule sets on the training data, separately for each prediction category. As can be seen, the categories for which we managed to find rules of high coverage (and still reasonably high discriminative power) are IOI lengthening and staccato or micropause insertion. The rules for dynamic attenuation (class softer) exhibit reasonable precision, but cover fewer cases. The other three categories turned out to be more difficult to predict at the note level.
The remarkable result is that for the former three categories, such a high proportion of all observed occurrences can be predicted by so few (and simple) rules.

4.3 Generality I: Testing on a Different Pianist

The purpose of the next experiment was to assess the degree of performer-specificity of the discovered rules, by testing them on performances of the same pieces, but by a different artist. The renowned pianist Philippe Entremont has also recorded some of Mozart's sonatas on a Bösendorfer SE290. We managed to obtain and process his renditions of the following pieces: Sonatas K.282 and K.283 complete, plus the second movements of K.279, K.280, K.281, K.284, and K.333. The resulting set of performed soprano notes comprises 8,105 notes.

Table 2 summarizes the predictive accuracy of our learned rules on the Entremont performances. Comparing this to Table 1, we find no significant degradation in coverage and accuracy (except in category softer). On the contrary, for some categories (lengthen, louder, staccato) the coverage of positive examples is higher than on the original training set. The discriminative power of the rules (captured by the precision values) remains roughly at the same level. This (surprising?) result testifies to the generality of the discovered principles (and the merits of our rule discovery method).

Category   #rules   True Positives   False Positives   Precision
lengthen   4        596 (29.27%)     242 (7.62%)       .711
shorten    2         90 (4.10%)       45 (1.49%)       .667
louder     3        210 (13.12%)      87 (2.85%)       .707
softer     3         53 (3.32%)       45 (1.65%)       .541
staccato   4        861 (39.28%)     228 (5.71%)       .791
legato     1        131 (4.63%)       57 (1.70%)       .697

Table 2: Classification accuracy of learned rulesets on test data (Mozart performances by P. Entremont).

We are currently extending the Entremont data set with recordings of three additional complete sonatas that are not present in the original training set (K.309, 310, and 311). That will provide further insight into the true generality of the rules.

Category   #rules   True Positives   False Positives   Precision
lengthen   4        1752 (69.06%)    327 (10.94%)      .843
shorten    2        1472 (53.20%)    110 (4.01%)       .930
louder     3         601 (25.13%)    285 (11.06%)      .678
softer     3           0 (0.00%)       0 (0.00%)       -
staccato   4         950 (32.40%)    166 (5.92%)       .851
legato     1          17 (0.85%)      27 (0.73%)       .386

Table 3: Classification accuracy of learned rulesets on test data (performances of 2 Chopin pieces by 22 pianists).
4.4 Generality II: Testing on a Different Style

An additional experiment tested the generality of the discovered rules with respect to musical style. The rule sets were applied to two pieces by Frédéric Chopin (the first 20 bars of the Etude Op. 10, No. 3 in E major, and the first 45 bars of the Ballade Op. 38 in F major), and their predictions compared to performances (on a Bösendorfer SE290) of these pieces by 22 skilled pianists. The soprano parts ('melodies') of these 44 performances amount to 6,088 notes. The coverage and precision achieved by our simple rule sets are given in Table 3.

This result is even more surprising. The categories softer and legato turn out to be basically unpredictable, and the rules for class louder cover a high percentage of positive examples, but also exhibit a rather high level of false predictions. But the results for the other classes (lengthen, shorten, and staccato) are extremely good, better in fact than on the original (Mozart) data from which the rules had been learned!

The high coverage (TP) values, especially of the timing rules, are remarkable. A closer look at the Chopin test pieces shows that in the timing dimension, three out of the six learned rules are sufficient to jointly produce these high TP rates of 69.06% and 53.20%. Both of these test pieces have a rather regular rhythmic structure, and we plan to test the rules on a more diverse set of Chopin pieces. Remember also that the data represent a mixture of 22 different pianists. When looking at how well the rules fit individual pianists, we find that some of them are predicted extremely well (e.g., pianist #15: timing/lengthen: TP = 89/122 (72.95%), FP = 4/129 (3.10%), π = .957; timing/shorten: TP = 71/120 (59.17%), FP = 3/132 (2.27%), π = .959).

In summary, we feel that these initial results are quite remarkable.
They strongly indicate that it is possible to discover some basic performance principles (in complex, 'real-world' data) that are both fairly precise and general across a range of performers and musical styles.
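As a consistency check, the precision figures in Table 1 can be recomputed from the reported TP and FP counts using the definition π = TP/(TP + FP). The short Python sketch below does this; the counts are copied from Table 1, while the dictionary layout and the function name are illustrative choices.

```python
# (TP, FP, reported precision) per category, copied from Table 1.
TABLE_1 = {
    "lengthen": (3069, 1234, 0.713),
    "shorten": (397, 179, 0.689),
    "louder": (1318, 591, 0.690),
    "softer": (625, 230, 0.731),
    "staccato": (6916, 1089, 0.864),
    "legato": (687, 592, 0.537),
}


def precision(tp, fp):
    """Fraction of a rule set's predictions that were correct."""
    return tp / (tp + fp)


# Print the recomputed value next to the reported one for each category.
for category, (tp, fp, reported) in TABLE_1.items():
    recomputed = round(precision(tp, fp), 3)
    print(f"{category:9s} recomputed={recomputed:.3f} reported={reported:.3f}")
```

The same function applies unchanged to the counts in Tables 2 and 3, except for the softer row of Table 3, where no prediction was made and the precision is undefined.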

Rule    Action    Conditions                                Pos. coverage      Precision
                                                            (slow+fast)        slow   fast   total
TL1     lengthen  IF abstr_dur_context = equal-longer       2,665 (19.87%)     .870   .686   .746
TL2     lengthen  IF next_dur_ratio < 0.334                 1,788 (13.33%)     .846   .708   .758
TL2a*   lengthen  IF next_dur_ratio < 0.99                  1,121 (8.36%)      .820   -      .820
                   & metr_strength < 2
TL3     lengthen  IF dir_next = up                            259 (1.93%)      .714   .636   .662
                   & int_next > p4
                   & metr_strength < 2
                   & int_prev < maj2
TS1*    shorten   IF prev_dur_ratio < 0.67                    354 (2.66%)      .669   -      .669
                   & next_dur_ratio > 1.0
TS2**   shorten   IF tempo = fast                              43 (0.32%)      -      .915   .915
                   & meter = 3/8
                   & prev_dur_ratio > 2.0
                   & dur < 0.5
                   & next_dur_ratio < 0.99
DL1     louder    IF dir_prev = up                            747 (6.42%)      .847   .761   .782
                   & int_prev > p4
                   & metr_strength > 2
DL2     louder    IF mel_contour = updown                     890 (7.65%)      .734   .731   .731
                   & int_prev > min3
                   & metr_strength > 2
DL3**   louder    IF prev_dur_ratio < 0.5                     359 (3.09%)      -      .709   .709
                   & dir_prev = up
                   & metr_strength > 3
DS1     softer    IF prev_dur_ratio > 5.0                     377 (4.00%)      .764   .675   .710
DS2     softer    IF dir_prev = down                          173 (1.83%)      .745   .811   .783
                   & int_prev > maj3
                   & metr_strength < 1
                   & dur_prev > 0.33
DS3     softer    IF dir_prev = down                          169 (1.79%)      .840   .797   .813
                   & int_prev > p5
                   & metr_strength < 1
AS1     staccato  IF marked_staccato = yes                  3,071 (13.88%)     .916   .938   .934
AS2     staccato  IF int_next = unison                      2,929 (13.23%)     .981   .996   .934
AS3     staccato  IF int_next > p4                          1,237 (5.59%)      .656   .796   .756
                   & dir_next = up
                   & metr_strength < 2
AS4     staccato  IF next_dur_ratio < 0.4                   1,215 (5.49%)      .571   .809   .717
                   & dir_prev = down
AL1     legato    IF staccato = no                            687 (7.42%)      .593   .513   .537
                   & mel_contour = updown

Table 4: Summary of discovered rules and their individual coverage and precision (separately for slow and fast pieces) on the training data (*: used for slow pieces only; **: for fast pieces only).
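To make the rule notation concrete, the six timing rules of Table 4 can be written as predicates over a note's attributes. The Python sketch below is illustrative only: the dictionary-based note representation, the encoding of interval sizes in semitones (p4 = 5, maj2 = 2), and the helper `fired_timing_rules` are assumptions, since the internal data format of the learning system is not described here. The restriction of TL2a and TS1 to slow pieces and of TS2 to fast pieces follows the markers in Table 4.

```python
P4, MAJ2 = 5, 2  # interval sizes in semitones (assumed encoding)

# Each entry: (rule name, predicted class, tempo restriction, condition).
TIMING_RULES = [
    ("TL1", "lengthen", None,
     lambda n: n["abstr_dur_context"] == "equal-longer"),
    ("TL2", "lengthen", None,
     lambda n: n["next_dur_ratio"] < 0.334),
    ("TL2a", "lengthen", "slow",
     lambda n: n["next_dur_ratio"] < 0.99 and n["metr_strength"] < 2),
    ("TL3", "lengthen", None,
     lambda n: n["dir_next"] == "up" and n["int_next"] > P4
     and n["metr_strength"] < 2 and n["int_prev"] < MAJ2),
    ("TS1", "shorten", "slow",
     lambda n: n["prev_dur_ratio"] < 0.67 and n["next_dur_ratio"] > 1.0),
    ("TS2", "shorten", "fast",
     lambda n: n["meter"] == "3/8" and n["prev_dur_ratio"] > 2.0
     and n["dur"] < 0.5 and n["next_dur_ratio"] < 0.99),
]


def fired_timing_rules(note):
    """Return (rule name, predicted class) for every rule that applies."""
    return [
        (name, action)
        for name, action, tempo, condition in TIMING_RULES
        if tempo in (None, note["tempo"]) and condition(note)
    ]


# Hypothetical short note after a much longer one in a fast 3/8 piece.
example_note = {
    "tempo": "fast", "meter": "3/8", "dur": 0.25,
    "prev_dur_ratio": 3.0, "next_dur_ratio": 0.5,
    "abstr_dur_context": "other", "metr_strength": 1,
    "dir_next": "up", "int_next": 2, "int_prev": 1,
}
print(fired_timing_rules(example_note))  # [('TS2', 'shorten')]
```

The helper returns every rule that fires rather than a single prediction, because the rule set is a partial model: for many notes no rule applies, and the sketch makes no assumption about how conflicting predictions would be resolved.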

5 Conclusion

The results presented here, though promising, are only a small first step towards the ambitious goal of a comprehensive computational model of expressive performance. What we have discovered to date are a few isolated rules that may indeed represent quite general and robust local expression principles. Their generality across performers and styles needs more empirical validation. If they do turn out to be sufficiently reliable, these rules may form the nucleus of a more complex, multi-level model of performance.

Our next steps in this line of research are quite clear. First, we plan to perform additional experiments to evaluate the rules on different performers and different types of music. Second, we plan to extend the representation of the music with some important structural dimensions that are currently lacking, notably harmony. There are some promising algorithms for automated harmony analysis (e.g., Temperley and Sleator 1999) that we are currently looking at. This may lead to the discovery of a few additional note-level principles. Third, the next large step will be to go beyond the level of individual notes and look directly for structural performance regularities at higher levels of musical organization, in particular, phrase structure. We will take a more detailed look at the empirical validity of some published models of phrase-level timing and dynamics (Todd 1989; Todd 1992) vis-a-vis our complex performance data. The step after that will be to study ways of combining note-level and phrase-level models (of varying degrees of abstraction) into one comprehensive model that explains as much of the observed performance patterns as possible.

5.1 Acknowledgments

This research is part of the START programme Y99-INF, financed by the Austrian Federal Ministry for Education, Science, and Culture. I would like to thank the pianists Roland Batik and Philippe Entremont for allowing us to use their performances, and the L. Bösendorfer company, Vienna, and in particular Fritz Lachnit for providing the data and technical help.

References

Friberg, A. (1995). A Quantitative Rule System for Musical Performance. Ph.D. thesis, Department of Speech Communication and Music Acoustics, Royal Institute of Technology (KTH), Stockholm, Sweden.

Fürnkranz, J. (1999). Separate-and-conquer rule learning. Artificial Intelligence Review 13(1).

Gabrielsson, A. (1987). Once again: The theme from Mozart's piano sonata in A major (K.331). In A. Gabrielsson (Ed.), Action and Perception in Rhythm and Music, Stockholm. Royal Swedish Academy of Music.

Narmour, E. (1977). Beyond Schenkerism: The Need for Alternatives in Music Analysis. Chicago, IL: University of Chicago Press.

Palmer, C. (1996). Anatomy of a performance: Sources of musical expression. Music Perception 13(3), 433-453.

Sundberg, J. (1993). How can music be expressive? Speech Communication 13, 239-253.

Temperley, D. and D. Sleator (1999). Modeling meter and harmony: A preference rule approach. Computer Music Journal 23(1), 10-27.

Todd, N. M. (1989). Towards a cognitive theory of expression: The performance and perception of rubato. Contemporary Music Review 4, 405-416.

Todd, N. M. (1992). The dynamics of dynamics: A model of musical expression. Journal of the Acoustical Society of America 91, 3540-3550.

Widmer, G. (2000). Large-scale induction of expressive performance rules: First quantitative results. In Proceedings of the International Computer Music Conference. International Computer Music Association.

Widmer, G. (2001a). Discovering strong principles of expressive music performance with the PLCG rule learning strategy. In Proceedings of the 11th European Conference on Machine Learning (ECML'01), Berlin. Springer Verlag.

Widmer, G. (2001b). Machine discoveries: Some simple, robust local expression principles. Forthcoming.