Page  00000286 PerfComp: A Heuristic-Based Program for Analysis of MIDI Performance Files Timothy M. Walker Division of Mathematics, Computer and Natural Sciences, Ohio Dominican University walkert@ohiodominican.edu Abstract Quantitative studies of music performance often need to compare a performance file to a representation of the score in order to code and analyze patterns of timing, dynamics, and errors. A computer application is presented which uses a heuristic decision-making process to distinguish between types of errors. The program also calculates the timing curve for the performance, relative to the beats defined in the score. The algorithms used to determine error types are defined and discussed, and some practical implications of the program are noted. 1 Introduction Empirical studies of music performance often store performance data as MIDI (Musical Instrument Digital Interface) files. MIDI provides a convenient and ubiquitous format for representing musical information, and there is a wealth of software, both commercial and homegrown, for manipulating MIDI data. PerfComp, the program discussed here, provides a mechanism for determining the relative timing and error rates for a performance file as compared to a reference score file. The specific formatting needs for performance research will be briefly presented. Next, the logic will be described by which PerfComp decides between different possible interpretations of a performance. Pseudocode will be used in order to illustrate the algorithms used within the program. Some parameters for influencing the behavior of PerfComp will then be discussed, including an explanation of the output. 2 Variables and Formats With a goal of studying the detailed structure of a musical performance, a researcher must decide what features are relevant, and how to quantify and measure them. Typically, empirical studies have used three fundamental properties as their primary dependent variables: errors, patterns of timing, and patterns of dynamics (e.g., Desain and Honing 1994; Drake and Palmer 1993; Palmer 1989; Palmer and van de Sande 1995; Pfordresher 2003; Repp, Windsor, and Desain 2002; Shaffer 1984; Sloboda 1983). Note error types and locations are defined by pitches played by the performer other than those notated in the score, and can consist of substitutions, additions, or deletions of notes. Note errors can also be categorized by the nature of a wrong note; for example, a substitution or an addition may or may not be drawn from the musical key of the notated piece (this distinction obviously does not apply to deleted notes, since they were not played at all). Patterns of timing are defined as millisecond-level deviations from the beat locations implied by a strictly-quantified interpretation of the score, and patterns of dynamics are defined as deviations in perceived or measured loudness of notes, usually measured as intensity of the acoustic onset. In particular, patterns of timing deviations are used more than any other variable as a quantitative measure of performance. PerfComp reads and writes text files, which results in an easily-interpretable format for the performance data. Individual notes within a file can be examined, and attributes of performances can be easily graphed or exported for statistical analysis. The input to PerfComp is structured as a fixed-width, tab-delimited, text file obtained from a translation of the original MIDI file1. Each performed note corresponds to one row of data, with seven columns representing: MIDI Channel Number; Onset Time (all times represented in milliseconds); Offset Time; Duration (i.e. offset minus onset); MIDI Pitch Value (a number between 0 and 127 representing by semitones the musical note and octave which was played, 60 being middle C); Key Velocity (a measure of loudness as a number between 0 and 127 representing the intensity of each note); and Interonset Interval (101, the difference between the onset of the current note and the onset of the next note; defined as zero for the last note in the file). 3 Algorithmic Identification of Errors With the data arranged as columns of numbers, performances can be analyzed in terms of any of the columns. However, variability in the performance needs to be considered in relation to the patterning of notes in the 1. The input text files can be easily obtained by using a modified version of the MIDI translation program mftext, which the author can provide to any interested parties. 286

Page  00000287 musical score (e.g. Bora, Tufan, and Bilgen 2000; Heijink et al. 2000; Muiller and Mazzola 2003). Therefore, PerfComp takes as input two files in the text format described above: a performance file and a comparison file which is created as a strictly-quantized representation of the performed score. The program then provides a heuristic-based analysis of the performance. The main difficulty is that incorrect notes can produce an ambiguous interpretation of the performance; a note which does not match the score might truly have been played incorrectly or it might be a note from the score but played at the wrong time. PerfComp considers a few alternatives for each mismatch and decides whether to categorize the event as a substitution (i.e. a wrong note), an addition (i.e. an extra note), or a deletion (i.e. a note from the score was not played). The fundamental mechanism is to search for a nearby matching note in the comparison file, which would indicate the possibility that the performance has additions or deletions. However, the presence of a matching note does not guarantee that the performed note was not a substitution, and PerfComp looks at both global and local properties of the performance file in order to reach an optimal interpretation. 3.1 File Pre-Processing At the global level, PerfComp allows for a bias towards either additions or deletions. It accomplishes this by preprocessing both files and counting the number of notes. The difference in number of notes in the two files (performance minus comparison) is stored in a variable called offset. If there are more notes in the performance file (i.e. if offset > 0), PerfComp will look for additions before deletions. Conversely, if there are more notes in the comparison file (i.e. if offset < 0), PerfComp will look for deletions before additions. Furthermore, offset is dynamically adjusted as errors are encountered, to represent a running count of the discrepancy between remaining notes in the performance file and the comparison file, and is also proportionately interpreted relative to the current processing location in the file. Therefore, the expectations of the program may change as data is processed. A user-supplied parameter, called minLook, can (for non-zero values) ensure that a minimum number of notes will be searched in both directions (i.e. for both additions and deletions) for possible matches, regardless of the value of offset. Pre-processing the files also allows PerfComp to interpret note onsets in terms of beats in the musical score; a user-supplied parameter gives the number of musical beats in the score2, an overall tempo is determined from the first and last onset in the file, and each note in the performance file is 2. Actually, the parameter is the number of beats minus one, since a beat length isn't calculated for the last note in the score. assigned a floating-point number defining its fractional beat position relative to its corresponding note in the comparison file. Additionally, PerfComp calculates the overall octave (register) in which the performance was played, and later takes that into account when comparing notes from the performance and comparison files. Therefore, playing the piece an octave up or down from the notation won't result in all the notes being flagged as incorrect. However, the octave of individual notes is tracked, so a single note played an octave up or down will be flagged as incorrect. 3.2 Looking for Matching Note Values After the files have been read and pre-processed, PerfComp works through the files from beginning to end, comparing one note at a time from the performance file to the comparison file and keeping a running track of the current position and relevant state information. Two variables, lookAheadPerf and lookAheadComp, represent respectively the number of notes in the performance file and the comparison file which will be searched forward from the current position for potential matches. Note that a match that is found ahead in the performance file would indicate potential addition(s), while a match that is found ahead in the comparison file would indicate potential deletion(s). The following code illustrates how the variables are defined and how the general search process works (i and j are indexes for the current position in, respectively, the performance file and the comparison file, tmpi and tmpj are temporary indexes used to search forward from the current position, and found is a variable which will become a non-zero distance between the current position and the matching note if one is obtained): IF (minLook > offset) lookAheadPerf = minLook; ELSE lookAheadPerf = offset; IF (minLook > -(offset)) lookAheadComp = minLook; ELSE lookAheadComp = -(offset); {... ) WHILE ((NOT found) AND ((tmpi < (i + lookAheadPerf)) OR (tmpj < (j + lookAheadComp)))) IF (offset > 0) {first look one note ahead for addition...} {...then look one note ahead for deletion} ELSE {first look one note ahead for deletion...} {...then look one note ahead for addition} If no matching note is found within the above search (i.e. if found = 0), then the current note in the performance file is simply marked as a substitution and the variables are incremented to look at the next note. Even if a matching note is found, however, it might not be determined to be the intended note of the performer. The program calculates two metrics: one based on the matching note (i.e. treating the notes in between as additions or deletions); and one based on 287

Page  00000288 the current (incorrect) note (i.e. treating the current note as a substitution, and therefore the match as being spurious). 3.3 The Metrics Each metric has three components: (1) the extent to which the additions or deletions would decrease (or increase) the running count of remaining note surplus or deficit (in other words, how using the match would change the dynamic value of offset; this is a non-zero factor only for the matching note, since using the current note as an incorrect substitution would obviously not change the running count); (2) the number of correctly performed notes following the note in question (after an adjustment in position for the case of additions or deletions); and (3) the difference in the beat locations between the note that should have been played (the correct note from the comparison file) and the note in question (either the current or the matching note). If the overall metric (the combination of all three factors) yields a better score for the matching note, then it is used and the notes in between are marked as additions or deletions. If the metric is better using the current note, then it is treated as an incorrect substitution and the matching note is ignored. The code below shows the computation and comparison of the metrics and the effect that using the match has on the value of offset. The variable offsetReduct is the amount of change to offset by using the match, and can be positive or negative; offsetMetric, numCorrectFound, and beatDiffFound are the components of foundMetric, which treats the match as a true indication of additions or deletions; numCorrectCurr and beatDiffCurr are the components of currentMetric, which treats the incorrect note as a substitution; GetNumCorrect() is a function which returns the number of correct notes in the performance file following a given position; GetBeat( is a public function of the file objects which returns the beat location for a given note; and pfile and cfile are objects representing, respectively, the performance file and the comparison file. offsetReduct = Ioffsetl - |offset - foundl; offsetMetric = (|offsetl + 1) * offsetReduct / (# pfile notes left); numCorrectCurr = GetNumCorrect(pfile, i, cfile, j); beatDiffCurr = pfile->GetBeat(i) - cfile->GetBeat(j); IF (found > 0) {numCorrectFound = GetNumCorrect(pfile, tmpi, cfile, j); beatDiffFound = pfile->GetBeat(tmpi) - cfile->GetBeat(j);} ELSE {numCorrectFound = GetNumCorrect(pfile, i, cfile, tmpj); beatDiffFound = pfile->GetBeat(i) - cfile->GetBeat(tmpj);} currentMetric = numCorrectCurr - (beatDiffCurr)^2; foundMetric = numCorrectFound + offsetMetric - (beatDiffFound)^2; IF (foundMetric > currentMetric) {offset = offset - found; IF (found > 0) {Mark all the notes in between as additions} ELSE {Mark all the notes in between as deletions}} ELSE {Mark the current note as a substitution} 3.4 Components of the Metrics As the above section of code shows, the offset component of the metric is defined as a function of the current position in the piece and both the direction and distance of the matching note. The variable offsetReduct will be positive if the matching note would reduce the discrepancy between remaining notes in the performance file and the comparison file, and will be negative if it would increase the discrepancy (the absolute values are necessary in order for this to be true regardless of the direction of the bias). The numerator of offsetMetric is the product of abs(offset)+1 and offsetReduct in order to weight more heavily reductions in large discrepancies. The +1 term ensures that increases in the discrepancy (i.e. negative values of offsetReduct) will be weighted for cases in which the current value of offset is zero. The denominator of offsetMetric is a count of the remaining number of notes in the performance file. This serves to weight a change in the discrepancy more heavily (either positively or negatively) if it is towards the end of the piece, since the ends of the two files should be reached at the same time. The final value of offsetMetric can be either positive or negative (the sign will be the same as that of offsetReduct); positive values support an interpretation of the matching note to give additions or deletions, while negative values support an interpretation of the error as a substitution. The components to weight the number of correctly performed notes are fairly straightforward, and simply count the number of correct notes which would follow the current error if it is coded as a substitution (numCorrectCurr) or as additions or deletions (numCorrectFound). These components cannot be negative, and larger values tend to indicate that the given interpretation of the error is valid. The components to weight the difference in beat locations are similarly simple, and take the difference between the beat location of the note that should have been played (from the comparison file) and the beat location of the note that was played (from the performance file). This is calculated for coding as a substitution (beatDiffCurr) and as additions or deletions (beatDiffFound), and larger differences would tend to indicate that the performed note should not be interpreted at the given location in the score. Once all the components have been calculated, the overall metric is defined as the sum of the proper numCorrect variable and offsetMetric (offsetMetric is always zero for currentMetric, since a coding of substitution wouldn't change offset), minus the square of the proper beatDiff variable. Overall, a large positive value is best. Therefore, the error is coded as additions or deletions only if foundMetric > currentMetric; otherwise, the error is coded as a substitution. After the error has been properly coded, the indexes are updated and the program moves on to consider the next note. 288

Page  00000289 3.5 Interpretation and Output Note that values of minLook can influence the identification of errors; small values tend to limit the range over which note matches are found and therefore bias the result towards more substitution codings. Also, the relative weightings of the components in each metric would obviously affect the probabilities of decisions. The squared beatDiff values, for example, already serve to discourage PerfComp from matching notes from different measures in the files. This behavior could be made more or less influential by changing the weighting of the components. PerfComp writes an output file for each input performance file. The output includes all seven columns from the input file, as described above (in section 2), plus six additional columns: Note Name (e.g., "C" or "G#", etc.); Note Octave (the register in which the note was played); Beat (e.g., a Beat of 4.5 would mean the note occurred on the upbeat before the fifth downbeat); Comparison Beat (the Beat number for the equivalent note in the comparison file); Beat Difference (Comparison Beat minus Beat, which gives the extent to which the note occurred either early or late relative to the tempo of the entire piece); and Error (the intended Note Name value if substitution, "ADD" if addition, "DEL" if deletion, or "ok" if note is correct). Note that Beat Difference can be used as a main dependent variable for performed timing; positive values reflect a speeding up while negative values reflect a slowing down. It is not defined for additions and deletions since there are not comparison notes; flags of 999 and -999 are used in the output, and these lines should be removed before analyses. In addition to the output file, there is a "verbose" option of PerfComp which provides all of the calculated metrics, components, and corresponding variables (found, numCorrect, beatDiff, etc.). This can be a useful tool for monitoring and interpreting the operation of the program as it encounters errors. For example, if the values of the two metrics are close to one another for an erroneous note, then the interpretation of that note might be legitimately ambiguous. Additionally, it is often useful to know that a matching note value has been found, even if the match is not identified as evidence of additions or deletions (i.e., currentMetric > foundMetric, so the error is marked as a substitution and the two matching notes are considered to be distinct). A possible interpretation of this situation is that the performer has anticipated a future note (or repeated a past note), similar to linguistic "Spoonerisms" in which speakers exchange phonemes. The verbose output of PerfComp provides an easy mechanism for tracking these and similar situations. 4 Summary Using a computer algorithm for error categorization provides an efficient, objective method of producing analyzable data. Although there are typically only a few errors, if any, in a single performance, empirical studies often contain a multitude of files across experimental factors, participants, etc. Tools such as PerfComp can be scripted along with other standard text-based programs to produce hundreds or thousands of analysis files, literally in seconds. Additionally, the calculation of fractional deviation from the beat location makes the output useful for timing analysis, independent of any error analysis. This program was intended to be as practical and flexible as possible, by applying an intuitive heuristic approach dedicated to the task of performance comparisons and by providing easilyinterpretable output and feedback. PerfComp was written in C++ as a Unix command-line tool reading from standard input and writing to standard output. This architecture makes it convenient to pipe and script, but the program could easily be ported to other platforms. References Bora, U., S. Tufan, and S. Bilgen. 2000. A tool for comparison of piano performances. Journal of New Music Research 29(1), 85-99. Desain, P., and H. Honing. 1994. Does expressive timing in music performance scale proportionally with tempo? Psychological Research 56, 285-292. Drake, C., and C. Palmer. 1993. Accent structures in music performance. Music Perception 10(3), 343-378. Heijink, H., P. Desain, H. Honing, and L. Windsor. 2000. Make me a match: An evaluation of different approaches to scoreperformance matching. Computer Music Journal 24(1), 43-56. Muiller, S., and G. Mazzola. 2003. The extraction of expressive shaping in performance. Computer Music Journal 27(1), 47 -58. Palmer, C. 1989. Structural representations of music performance. Proceedings of the Cognitive Science Society, 349-356. Hillsdale, NJ: Erlbaum. Palmer, C., and C. van de Sande. 1995. Range of planning in music performance. Journal of Experimental Psychology: Human Perception and Performance 21(5), 947-962 Pfordresher, P. Q. 2003. Auditory feedback in music performance: Evidence for a dissociation of sequencing and timing. Journal of Experimental Psychology: Human Perception and Performance 29(5), 949-964. Repp, B. H., W. L. Windsor, and P. Desain. 2002. Effects of tempo on the timing of simple musical rhythms. Music Perception 19(4), 565-593. Shaffer, L. H. 1984. Timing in solo and duet piano performances. Quarterly Journal of Experimental Psychology 36A, 577-595. Sloboda, J. A. 1983. The communication of musical metre in piano performance. Quarterly Journal of Experimental Psychology 35A, 377-396. 289