Modeling piano performance: Physics and cognition of a virtual pianist

Richard Parncutt
Department of Psychology, Keele University
r.parncutt@keele.ac.uk, http://www.keele.ac.uk/depts/ps/rpbiog.htm

Abstract

The musical quality of computer-generated performances of piano scores might be improved by incorporating a physical model of the pianist's body as it interacts with the instrument. Issues considered include the various aspects of musical structure (as conceived by a virtual pianist) that interact with expression, and the expressive function and algorithmic determination of fingering (as used by a virtual pianist).

1 Introduction

Computer simulations of expressive piano performance are becoming increasingly sophisticated, but they do not yet produce consistently satisfying results. In this regard, musical expression is lagging behind timbral synthesis: most instrumental timbres can now be convincingly synthesized by physical models (Smith, 1996). What's wrong? Have researchers been neglecting vital aspects of the problem?

Most recent computer-based attempts to create expressive musical performances (Clynes, 1985, 1986; Friberg, 1991; Honing, 1990; Mazzola, 1994; Sundberg, 1988; Todd, 1989) have focused on the role of musical structure (Clarke, 1988). But perhaps the relationship between structure and expression is only part of the challenge.

Some new and creative approaches have recently appeared. Clarke (1995) attempted to explain aspects of performance expression in semiotic terms, citing Shaffer's (1992) proposal that a musical performance is a kind of narrative, in which events are determined by the musical structure, and the characters or personalities of protagonists are created by the performer through expressive interpretation. Such arguments, while convincing at a qualitative level, do not immediately lend themselves to application in a computer-based performance system.
How about the physical means by which musicians create their interpretations - their bodies, brains, ears, lungs, lips, fingers? Could the modeling of piano performance, like timbral synthesis, benefit from the introduction of physical models? The model of temporal integration and integrated energy flux proposed by Todd (1994) and developed by Clarke (1995) may be regarded as a physical model of expression based on the physiology of the auditory periphery. But the most important physical models of expression address music's apparent ability to imply movement (Shove & Repp, 1995). Todd's (1995) model of the kinematic implications of rhythm and phrasing may be regarded as a physical model of expression based on musical structure as conceived by the pianist. Similarly physical in their origins and implications are Truslit's implied motional forms (Repp, 1993), and Langner & Kopiez's (1995) and Large & Kolen's (1994) banks of coupled, resonating oscillators.

I will not consider these alternative approaches in this paper. Instead, I will consider the possibility of modeling perhaps the most conspicuously "physical" element in the system: the body of the pianist as it interacts with the instrument. The idea of a synthetic performer has a long history in computer music circles (e.g., Vercoe, 1984), but the idea of including the physical properties of the performer's body in such a model is rather newer. The musical output of Vercoe's (1988) virtual accompanist was determined entirely by the notated and acoustic structure of the music. Similarly, Jean-Claude Risset's piece Duet for One Pianist (Risset & Van Duyne, 1996) involved only MIDI interaction between a live performer and a computer. Here, "virtual pianist" alludes to a hypothetical model that combines both physical and cognitive aspects.
I will consider possible influences on expression both of the pianist's conception of the musical structure and of the pianist's body as it interacts with the instrument.

2 Structural Models

Let us first overview the various structural models of expressive performance that have been developed in recent years. Most have been biased toward certain aspects of structure, while neglecting others. The tendency for performers to speed up at the start of phrases and slow down at the end of phrases, sections, or whole pieces, and for timing patterns to reflect the hierarchical structure of phrases at different levels, has been studied in considerable quantitative detail (Repp, 1992; Sundberg & Verrillo, 1980; Todd, 1985), and was the primary focus of the model of Todd (1989). The frustratingly idiosyncratic model of Clynes (1985, 1986) was mainly concerned with modulation of timing and dynamics as a function of metrical position, again at different hierarchical levels. In contrast, Sundberg's (1988) model focuses on the musical surface (leap articulation, harmonic and melodic charge, etc.). A major recent advance in the structural modeling of expression (Widmer, 1995) has allowed a system of rules, similar to that of Sundberg and colleagues, to be derived directly and objectively by comparing performance data with structural analyses of the music based on the theories of Lerdahl & Jackendoff (1983) and of Narmour (1977).

Clearly, there is no easy way out. A general structural model of expression would need to consider, in a coherent fashion, a wide variety of different aspects of musical structure, appropriately balancing surface and deeper levels (Mazzola & Zahorka, 1994). A recent attempt at a unified model (Parncutt, 1997) is based on a broad definition of an accent as any relatively salient event, or any event that attracts the attention of a listener more than surrounding events (see also Parncutt, 1994).
In both speech and music, listeners need to get a feel for the importance of individual events

(syllables, notes) relative to each other if they are to correctly infer the underlying structure and meaning. Musical accents may be classified following Lerdahl and Jackendoff (1983):

STRUCTURAL (between-category) accents
* time: grouping, metrical
* pitch: melodic (contour), harmonic
* loudness: dynamic
* timbre: instrument/orchestration

EXPRESSIVE (within-category) accents
* time: onset time (agogic), duration (articulatory), amplitude envelope
* pitch: intonation
* loudness: stress
* timbre: coloration

Grouping accents occur at starts and ends of note groups at different levels, from phrases through sections to whole pieces. Metrical accents are similarly hierarchical, and may be identified and quantified both within and between beats, and within and between measures. Melodic accents may be divided into turns and skips (Drake & Palmer, 1993), of which turns are more important (Huron & Royal, 1996). Turns are peaks and valleys of the melodic contour (peaks being more important; Thomassen, 1982). Skips are disjunct intervals between consecutive tones; the wider the interval preceding the tone, the stronger is its accent, and rising skips produce stronger accents than falling ones. Harmonic accents occur either "horizontally", at sudden changes of harmony, or "vertically", at points of acoustic dissonance. Dynamic accents are explicitly marked in the score. Timbral accents occur at changes of instrumentation or articulation.

The various kinds of expressive accent listed above represent the means that musicians have for bringing out structural accents. A performer may manipulate onset and offset times, amplitude envelopes, precise pitch (intonation), physical intensity (stress), and timbre. Of these, only stress, agogic accent, and articulatory accent are available to the pianist. Based on the above taxonomy, expressive music performance may be regarded as a process whereby expressive accents reinforce structural accents.
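To make the taxonomy concrete, the turn and skip accents just described lend themselves to simple algorithmic treatment. The following Python sketch assigns a rough melodic-accent strength to each note of a MIDI pitch sequence; the skip threshold and all numeric weights are illustrative assumptions of my own, not values from any published model.

```python
def melodic_accents(midi_pitches):
    """Assign a rough melodic-accent strength to each note of a melody.

    Illustrative sketch only: turns (contour peaks and valleys, peaks
    weighted higher) and skips (disjunct intervals, wider and rising
    intervals stronger).  All weights are arbitrary assumptions.
    """
    SKIP_THRESHOLD = 3  # semitones; intervals this wide or wider count as skips
    accents = [0.0] * len(midi_pitches)

    for i in range(1, len(midi_pitches)):
        interval = midi_pitches[i] - midi_pitches[i - 1]

        # Skip accent: the wider the preceding interval, the stronger the
        # accent; rising skips weighted more heavily than falling ones.
        if abs(interval) >= SKIP_THRESHOLD:
            direction = 1.0 if interval > 0 else 0.7
            accents[i] += direction * abs(interval) / 12.0

        # Turn accent: contour peak (stronger) or valley (weaker).
        if i < len(midi_pitches) - 1:
            prev, cur, nxt = midi_pitches[i - 1:i + 2]
            if cur > prev and cur > nxt:
                accents[i] += 1.0  # peak
            elif cur < prev and cur < nxt:
                accents[i] += 0.5  # valley

    return accents
```

For the fragment C4-E4-D4 ([60, 64, 62] in MIDI), the middle note receives both a rising-skip accent and a peak accent, and so dominates its neighbours.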
This broad concept is consistent with the variety of "good" interpretations of a single piece. A musical score typically includes many different kinds of structural accent, of various strengths or saliences (see Parncutt, 1997 for examples). During practice and performance, performers are constantly making decisions - largely intuitively and unconsciously - as to which accents should be emphasized, and how and to what extent. Current knowledge does not yet allow us to create a viable scientific model of the underlying cognitive processes. Consequently, computer-based simulations of musical performance, if they are to be musically convincing, really cannot avoid involving the user in the interpretative process. It appears for the moment to be impossible to arrive at a good interpretation solely by means of abstract and supposedly general principles.

The user needs to be invited to choose which accents to emphasize, and by how much. The user should also be free to manipulate associated modulations of timing and dynamics. In the case of dynamics, stresses may be large or small, sudden or gradual. A "gradual stress" would involve a gradual increase in loudness for notes leading to the accent, and a gradual decrease afterwards. In the case of timing, agogic accents may involve a slowing of local tempo before the event and a speeding up after it; and articulatory accents may involve a temporary increase in legato for tones near a given event. The creation of user-friendly interfaces for such adjustments, including "pop-up", "nudgeable" local and global curves for modulation of timing and dynamics in the vicinity of specific accents, would be a straightforward programming exercise. After each change to the interpretation, the result would be synthesized, listened to, and appraised by the user, who would then make further changes.
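As a simple illustration of what such an interface might compute behind the scenes, the following Python sketch implements a "gradual stress": note velocities rise linearly toward an accented note and fall away afterwards. The window width and peak gain are arbitrary assumptions, standing in for the user-nudgeable curves described above.

```python
def apply_gradual_stress(velocities, accent_index, peak_gain=1.2, width=3):
    """Scale note velocities so that loudness rises gradually toward an
    accented note and falls away afterwards.

    peak_gain and width are illustrative defaults; in the proposed
    interface they would be adjusted by the user per accent."""
    out = []
    for i, v in enumerate(velocities):
        distance = abs(i - accent_index)
        if distance <= width:
            # Linear interpolation from 1.0 at the edge of the window
            # to peak_gain at the accented note itself.
            gain = 1.0 + (peak_gain - 1.0) * (1.0 - distance / width)
        else:
            gain = 1.0
        out.append(v * gain)
    return out
```

An analogous function with a local tempo curve in place of the velocity gain would realize the agogic accents described above (slowing before the event, speeding up after it).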
After many such iterations, the user would arrive at an interpretation of the piece that is not only musically acceptable (assuming that the user has sufficient musical skills, talent, and patience) but also unique and personal. A relatively theory-free music interpretation system has recently been developed by Dalgarno (1997), with encouraging results. Dalgarno's system could be made more flexible by adding elements of the theory set out above, while always allowing the user the option of skipping the theory altogether and directly manipulating the surface parameters.

Learning to play an instrument invariably requires thousands of hours of practice (Sloboda & Howe, 1991), including the development of technique by seemingly endless repetition of scales, arpeggios, exercises, and studies. There is no compelling reason why some performers should not now take advantage of appropriate technology to reduce the time they spend on technical work to allow more time for interpretation. Moreover, computer-assisted musical interpretation would be attractive for musicians with physical disabilities that otherwise prevent them from reaching high levels of technical expertise (Dalgarno, 1997).

3 A Physical Model of Piano Fingering

Miller (1996) showed that fingering can affect the sound of a performance. Pianists played an unfamiliar piece by Beethoven using three different fingerings: their own, that of the composer, and a fingering devised by the experimenter. When presented with audio recordings, independent listeners consistently preferred the first performances to the second, and the second to the third. The note-by-note mechanism by which fingering affects performance has not yet been systematically investigated. But interview data (Clarke et al., 1997) suggest that fingering may affect performance in specific ways that would be directly amenable to computer modeling.
For example, legato (as measured by the degree of temporal overlap between successive tones; Repp, 1995) is more likely to be maintained within hand positions than between them (that is, at changes of hand position), and between strong fingers than between weak

fingers. Notes played with the thumb tend to be played more loudly and held down for longer than notes played by other fingers. In fast passages, articulation tends to be clearer (the onset time of a note correlating more closely with the offset time of its predecessor, and key velocities being more consistent) when stronger fingers are used (1, 2, 3 rather than 4, 5).

How could these ideas be applied to computer-based interpretation of piano music? First, one would need to decide on a fingering. Second, performance parameters could be modified to take account of fingering effects. A suitable fingering could be decided for the piece, either directly by the user, or by a model. Better, these two approaches could be combined, the model generating a list of possible fingerings and the user deciding among them. Best of all, successive "performances" of the piece could be made subtly different by having the model prescribe a probability value for each fingering; the user could then adjust the values. Such changes in fingering would generate aleatoric variations in performance parameters, and may shed light on otherwise inexplicable "unintentional" or "random" fluctuations.

A model has recently been advanced for the prediction of fingerings of melodies (Parncutt, Sloboda, Clarke, Raekallio, & Desain, 1997). The fingerings used by keyboard players are determined by a range of ergonomic (anatomical/motor), cognitive, and music-interpretive constraints. The model attempts to encapsulate only the most important ergonomic constraints; it may thus be regarded as a kind of physical model, based on the physiology of the fingers and hands. The model begins by generating all possible fingerings for a melodic fragment, limited only by maximum practical spans between finger-pairs. Many of the fingerings generated in this way seldom occur in piano performance.
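As a rough sketch of this pipeline, the following Python enumerates span-limited fingerings for an ascending right-hand fragment and ranks them with a toy two-rule cost in the spirit of the model's weighted rule system. The span table, the thumb-passing condition, and the rule weights are simplifying assumptions of my own, not the published values, so the sketch over-generates somewhat relative to the model.

```python
from itertools import product

# Rough maximum comfortable spans (in semitones) between right-hand finger
# pairs.  These numbers are illustrative assumptions; the published model
# tabulates measured practical spans for every finger pair.
MAX_SPAN = {(1, 2): 10, (1, 3): 12, (1, 4): 14, (1, 5): 15,
            (2, 3): 5, (2, 4): 7, (2, 5): 9,
            (3, 4): 5, (3, 5): 7, (4, 5): 5}

def pair_ok(pitch1, pitch2, f1, f2):
    """Is finger pair (f1, f2) plausible for this interval?  Simplified:
    fingers must ascend with ascending pitches unless the thumb passes
    under (f2 == 1), and the interval must fit the maximum span."""
    if f1 == f2:
        return False
    if pitch2 > pitch1 and not (f2 > f1 or f2 == 1):
        return False
    return abs(pitch2 - pitch1) <= MAX_SPAN[tuple(sorted((f1, f2)))]

def candidate_fingerings(pitches):
    """Enumerate right-hand fingerings of an ascending melodic fragment,
    limited only by the pairwise constraints above."""
    return [f for f in product(range(1, 6), repeat=len(pitches))
            if all(pair_ok(pitches[i], pitches[i + 1], f[i], f[i + 1])
                   for i in range(len(pitches) - 1))]

def difficulty(fingering, weights=None):
    """Toy weighted-rule cost: only two rules are sketched, with
    made-up weights, standing in for the full 12-rule system."""
    w = weights or {"weak_finger": 1.0, "three_to_four": 1.0}
    cost = w["weak_finger"] * sum(1 for f in fingering if f in (4, 5))
    cost += w["three_to_four"] * sum(
        1 for a, b in zip(fingering, fingering[1:]) if (a, b) == (3, 4))
    return cost
```

For the fragment C4-E4-G4 ([60, 64, 67] in MIDI), ranking the candidates by difficulty places fingerings such as 123 near the top and weak-finger fingerings such as 345 near the bottom.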
For example, the melodic fragment C4-E4-G4 may be fingered 121, 123, 124, 125, 131, 134, 135, 141, 145, 212, 213, 214, 215, 231, 234, 235, 241, 245, 312, 313, 314, 315, 341, or 345. Next, the difficulty of each fingering is estimated using a rule system. Each rule is named after a technical difficulty that a pianist might seek to avoid; e.g., the "Weak-Finger Rule" accounts for the tendency to avoid the weaker fingers 4 and 5, and the "Stretch Rule" accounts for the avoidance of unnecessary stretches between fingers. The other rules are called Small-Span, Large-Span, Position-Change-Count, Position-Change-Size, Three-Four-Five, Three-to-Four, Four-on-Black, Thumb-on-Black, Five-on-Black, and Thumb-Passing. The sum of all 12 rule contributions is the predicted difficulty or "cost" of a specific fingering. The relative importance of each rule may be adjusted by applying a weight (or linear coefficient) to it; different pianists and musical styles may require different relative weightings. For full details of the model see Parncutt et al. (1997) or Sloboda, Parncutt, Clarke, & Raekallio (1997). Once fingerings have been assigned difficulty estimates, they are ranked in order of calculated difficulty. The fingering with the lowest difficulty is predicted to be used most often in performance. The fingerings that pianists actually use or recommend are expected to appear among the least difficult calculated fingerings.

4 A Physical Model of the Pianist's Body

Fingering, of course, is only one aspect of the physics of a virtual pianist. The sound of a piano performance depends also on how fingers are used to play notes - not to mention the role of the wrists, forearms, upper arms, back, and torso. A recent series of interviews clarified the complex relationship between fingering, interpretation, and the physical interaction between the pianist and the keyboard (Clarke et al., 1997).
A model of piano performance incorporating a virtual pianist could improve on Parncutt et al.'s (1997) model of fingering in two ways. First, it might do away with some of the existing 12 rules and instead estimate difficulty directly from a physical model of the pianist's hands and arms in their relationship with the keyboard. For example, certain fingerings make it physically easier to play consistently loudly, legato, or with "arm weight". In a second stage (closely related to the first), the model could take into account the physics of the body in modeling the execution of the notes. Both stages could take advantage of recent advances in related disciplines such as motor control and robotics.

A physical modeling approach could eventually account for all physical aspects of piano performance. A model including room acoustics could determine when and how the pedal is used, as investigated by Repp (1996). A model knowing about physical properties of the hands, including interactions between fingers and between hands (cf. Parncutt, Sloboda, & Clarke, 1997), could account for the timing of arpeggiated chords (Repp, in press, a) and for the complex combination of expressive intentions and motor constraints that determine onset asynchronies in notated simultaneities (Palmer, 1996; Repp, in press, b). A physical model of the pianist's body could even inform the process by which pianists get used to the touch of unfamiliar pianos, as "information flows from the instrument to the performer, via sound, and also through a bi-directional information flow from the haptic senses (tactile, kinesthetic, force, etc.)" (Van den Berghe, De Moor, & Minten, 1995, p. 1). Even the relationship between acoustic interpretation and gross bodily movements of the performer might be amenable to modeling. Observers can more reliably recognize the expressive intent of a pianist by vision than by hearing (Davidson, 1993, 1994).
Specific parts of the pianist's body, such as the hands and upper torso, are related to specific forms of expression; for the pianist that Davidson studied, hand lifts occurred at rests or sustained notes, and sharp forward movements of the head occurred at cadence points. Larger movements were consistently associated with higher degrees of expressiveness. Modeling of body movements might explicate otherwise mysterious cyclic developments in tempo and dynamics occurring partially independently of the musical structure in pieces such as Chopin's Prelude in E minor, for which an abundance of performance data already exists (Clarke, 1995; Sloboda, Lehmann, & Parncutt, 1997).

5 Discussion

At first sight, the various physical constraints on performance may appear as hindrances to the realization of a desired musical effect. Surely if a computer model

knows about the performer's subliminal concept of the musical structure and intentions for conveying the structure to the audience, that should be sufficient to produce a convincing simulation of a real performance? Aren't physical constraints exactly what computer modeling can succeed in liberating us from?

These questions cannot be answered until convincing computer-generated realizations of musical scores become everyday reality. Meanwhile, some insight may be gained by considering the parallel case of timbral synthesis. Synthesized musical instrument sounds require a great deal of physical complexity to fool trained ears. If any audible component of the sound is missing, the result usually sounds musically worse than the original. Yet much of this complexity does not seem to have any particularly musical point to it. To take a simple example, for what intrinsically musical reason does a clarinet tone need a noisy onset? Apparently, what sounds good is often no more than what sounds familiar from good musical performances. In other words, the subjective quality of synthesized sound is determined by auditory association.

Much the same principle may be expected to hold in the case of simulated expressive piano performance. Just as the individual tones of the performance need to sound genuine in their acoustic complexity, so too, I would venture, should the timing and dynamics of the performance reflect the complexity of the physical processes that enable that performance to be realized by a real human being sitting at a real piano keyboard - and that includes the constraints that limit, often frustratingly, what real performers can do.

6 References

Clarke, EF 1988. "Generative Principles in Music Performance," in JA Sloboda (Ed.), Generative Processes in Music. Oxford.
Clarke, EF 1995. "Expression in Performance: Generativity, Perception and Semiosis," in J Rink (Ed.), The Practice of Performance. Cambridge U P.
Clarke, EF, R Parncutt, M Raekallio & JA Sloboda 1997. "Talking Fingers: An Interview Study of Pianists' Views on Fingering," Musicae Scientiae 1.
Clynes, M 1985. "Music Beyond the Score," Communication & Cognition 19, 169-194.
Clynes, M 1986. "Generative Principles of Musical Thought," Communication & Cognition AI 3, 185-223.
Dalgarno, G 1997. "Creating an Expressive Performance Without Being Able to Play," Brit J Music Ed.
Davidson, JW 1993. "Visual Perception of Performance Manner in the Movements of Solo Musicians," Psychol Mus 21, 103-113.
Davidson, JW 1994. "Which Areas of a Pianist's Body Convey Information about Expressive Intention?," J Hum Mov Stud 6, 279-301.
Drake, C & C Palmer 1993. "Accent Structures in Music Performance," Mus Percept 10, 343-378.
Friberg, A 1991. "Generative Rules for Music Performance," Comp Mus J 15 (2), 56-71.
Honing, H 1990. "POCO: An Environment for Analysing, Modifying, and Generating Expression," Proc ICMC.
Huron, D & M Royal 1996. "What is Melodic Accent?," Mus Percept 13, 489-516.
Langner, J & R Kopiez 1995. "Entwurf einer neuen Methode der Performanceanalyse," Jahrbuch Musikpsychologie 12, 9-27.
Large, EW & JF Kolen 1994. "Resonance and the Perception of Musical Meter," Connect Sci 6, 177-208.
Lerdahl, F & R Jackendoff 1983. A Generative Theory of Tonal Music. Cambridge, MA: MIT Press.
Mazzola, G & O Zahorka 1994. "Tempo Curves Revisited," Comp Mus J 18 (1), 40-52.
Miller, MF 1996. "Piano Fingerings and Musical Expression." Poster, Soc Mus Theory, Baton Rouge, LA.
Narmour, E 1977. Beyond Schenkerism. U Chicago P.
Palmer, C 1996. "On the Assignment of Structure in Music Performance," Mus Percept 14, 23-56.
Parncutt, R 1994. "A Perceptual Model of Pulse Salience and Metrical Accent," Mus Percept 11, 409-464.
Parncutt, R 1997. "Accents and Expression in Piano Performance," in W Auhagen et al. (Eds.), Systemische Musikwissenschaft. Krefeld: Dohr.
Parncutt, R, JA Sloboda, EF Clarke, M Raekallio & P Desain 1997. "An Ergonomic Model of Fingering," Mus Percept.
Parncutt, R, JA Sloboda & EF Clarke 1997. "Interdependence of Right and Left Hands in Sight-Read, Written, and Rehearsed Fingerings," Proc Euro Soc Cog Sci Music, Uppsala.
Repp, BH 1992. "Diversity and Commonality in Music Performance," J Acoust Soc Am 92, 2546-2568.
Repp, BH 1993. "Music as Motion: A Synopsis of Truslit (1938)," Psychol Mus 21, 48-72.
Repp, BH 1995. "Acoustics, Perception, and Production of Legato Articulation," J Acoust Soc Am 97, 3872-3874.
Repp, BH 1996. "Pedal Timing and Tempo in Expressive Piano Performance," Psychol Mus 24, 199-221.
Repp, BH in press, a. "Some Observations on Pianists' Timing of Arpeggiated Chords," Psychol Mus.
Repp, BH in press, b. "Patterns of Note Onset Asynchronies in Expressive Piano Performance," J Acoust Soc Am.
Risset, J-C & S Van Duyne 1996. "Real-Time Performance Interaction with a Computer-Controlled Acoustic Piano," Comp Mus J 20 (1), 62-75.
Shaffer, LH 1992. "How to Interpret Music," in MR Jones & S Holleran (Eds.), Cognitive Bases of Musical Communication. Washington, DC: Am Psychol Assoc.
Shove, P & BH Repp 1995. "Musical Motion and Performance," in J Rink (Ed.), The Practice of Performance. Cambridge U P.
Sloboda, JA & MJA Howe 1991. "Biographical Precursors of Musical Excellence," Psychol Mus 19, 3-21.
Sloboda, JA, A Lehmann & R Parncutt 1997. "Perceiving Intended Emotion in Concert-Standard Performances of Chopin's Prelude No. 4 in E Minor," Proc Euro Soc Cog Sci Mus, Uppsala.
Sloboda, JA, EF Clarke, R Parncutt & M Raekallio 1997. "Determinants of Fingering Choice in Piano Sight-Reading," J Exp Psychol: Hum Perc Perf.
Smith, JO III 1996. "Physical Modeling Synthesis Update," Comp Mus J 20 (2), 44-56.
Sundberg, J 1988. "Computer Synthesis of Music Performance," in JA Sloboda (Ed.), Generative Processes in Music. Oxford.
Sundberg, J & V Verrillo 1980. "On the Anatomy of the Retard," J Acoust Soc Am 68, 772-779.
Thomassen, J 1982. "Melodic Accent: Experiments and a Tentative Model," Am J Psychol 12, 546-560.
Todd, NPMcA 1989. "A Computational Model of Rubato," Contemp Mus Rev 3, 69-88.
Todd, NPMcA 1994. "The Auditory 'Primal Sketch'," J New Mus Res 23 (1), 25-70.
Todd, NPMcA 1995. "The Kinematics of Musical Expression," J Acoust Soc Am 97, 1940-1949.
Van den Berghe, G, B De Moor & W Minten 1995. "Modelling a Grand Piano Key Action," Comp Mus J 19 (2), 15-22.
Vercoe, B 1984. "The Synthetic Performer in the Context of Live Performance," Proc ICMC.
Vercoe, B 1988. "Connection Machine Tracking of Polyphonic Audio," Proc ICMC.
Widmer, G 1995. "Modeling the Rational Basis for Musical Expression," Comp Mus J 19 (2), 76-96.