ACQUISITION OF VIOLIN INSTRUMENTAL GESTURES USING A COMMERCIAL EMF TRACKING DEVICE

Esteban Maestre, Jordi Bonada, Merlijn Blaauw, Alfonso Perez, Enric Guaus
Music Technology Group, Universitat Pompeu Fabra
Ocata 1, Barcelona, SPAIN
{emaestre, jbonada, mblaauw, aperez, eguaus}

ABSTRACT

This paper presents a method for the acquisition of violin instrumental gesture parameters using a commercial two-sensor 3D tracking system based on electro-magnetic field (EMF) sensing. The methodology described here is suitable for acquiring instrumental gesture parameters of any bowed-string instrument, and has been devised with attention to intrusiveness, flexibility, and robustness. After reviewing related work in the field, we give an overview of the application context, pointing out the basic needs to be fulfilled for our research purposes. Then, we present the steps for calibrating the system, followed by details on the computation of a number of relevant instrumental gesture parameters. The use of some of the extracted parameters to perform score-performance alignment and automatic database annotation is also outlined. Finally, we conclude by stating the next steps in using the acquired data, along with further developments of the methodology.

1. INTRODUCTION

Interaction between performer and instrument during a performance becomes more complex for musical instruments with continuous excitation, as is the case of the bowed-string family, which is often considered among the most articulate and expressive families of musical instruments. Capturing violin instrumental gesture parameters (i.e. those directly involved in sound production mechanisms) is of great interest: a number of works have already dealt with this issue, with the aim of applying results both to performance and to research (see Section 2).
Our research focuses on the study of instrumental gestures in traditional violin performance in order to apply them to synthesis. For this purpose, it is important to acquire relevant timbre-related parameters with sufficient accuracy without affecting performance habits. In this paper, we present a new approach for acquiring violin instrumental gesture parameters in real time. The approach is based on electro-magnetic field (EMF) sensing with two wired sensors, using one of the commercially available solutions. The main requirements we foresaw when devising the methodology were: (1) to gather as many instrumental gesture parameters as possible, (2) to be accurate, (3) to be easily attached to any violin or bow, and (4) to have a small effect on the playability of the instrument. As a first step towards our research goals, we use the acquired parameters for automatically annotating a performance database by applying a custom score-performance alignment procedure.

The paper is structured as follows. First, we give an overview of related work in the field, highlighting the main differences with the approach presented here. Then, we put our work into context, pointing out our research purposes and discussing the fulfillment of our application requirements. Section 4 outlines the calibration procedure, while Section 5 gives the details of the computation of a number of instrumental gesture parameters. In Section 6, we present the score-alignment procedure devised for annotating our database. We conclude by discussing further developments of the methodology and future uses of the gathered data.

2. RELATED WORK

Research on capturing the gestures involved in the sound production mechanisms of bowed-string instruments has led to diverse successful approaches, found in the literature since the 1980s.
In [1] and [2], Askenfelt presents methods for measuring bow motion and bow force using diverse custom electronic devices attached to both the violin and the bow. The bow transversal position is measured by means of a thin resistance wire inserted among the bow hairs, while for the bow-bridge distance the strings are electrified, so that the contact position with the resistance wire among the bow hairs is detected. For the bow pressure, four strain gages (two at the tip and two at the frog) are used.

A different approach was taken in [6], where Paradiso and Gershenfeld measured bow displacement by means of oscillators driving antennas (electric field sensing). In a first application carried out for cello, a resistive strip attached to the bow was driven by an antenna mounted behind the bridge, which also resulted in a wired bow. Later, in the violin implementation of this methodology, which led to a first wireless measurement system for bowing parameters, the antenna worked as the receiver, while two oscillators placed in the bow worked as drivers. What they call bow pressure is measured using a force-sensitive resistor below the forefinger (or between the bow hair and the wood at the tip).

These approaches, while providing means of measuring the relevant bowing parameters, did not allow tracking performer movements. Furthermore, the custom electronic devices that needed to be attached to the instrument turned out to be somewhat intrusive, and made it hard to switch instruments at the performer's request.

More recent implementations of violin bowing parameter measurement introduced some important improvements, resulting in less intrusive systems. In [9] and [10], Young measured downward and lateral bow pressure with foil strain gages, while bow position with respect to the bridge was measured in a similar way as in [6]. The strain gages are permanently mounted around the midpoint of the bow stick, and the force data is collected and sent to a remote computer via a wireless transmitter mounted at the frog. Besides requiring additional hardware attached to the violin, the system forces the performer to use the highly customized bow.

In [7], Rasamimanana performs wireless measurements of bow acceleration by means of accelerometers attached to the bow, and uses force-sensitive resistors (FSRs) to obtain the strain of the bow hair as a measure of bow pressure. This system has the advantage that it can be easily attached to any bow. On the other hand, it needs considerable post-processing in order to obtain motion information, since it measures only acceleration. This was carried out afterward by Schoonderwaldt [8], who combined the use of video cameras with the measurements given by the accelerometers in order to reconstruct bow velocity profiles.
When looking into the literature for approaches similar to the one presented in this paper, we only find the work by Goudeseune [3], who used a commercial EMF device (based on the same principles as the one we use in our application) for tracking some low-level movement parameters and using them to control some synthesis features in a performance scenario. The procedure for extracting movement or gestural parameters was not very elaborate, as he just used speeds or positions/rotations of the sensors in the violin or bow without trying to extract relevant instrumental gesture parameters.

3. APPLICATION CONTEXT

The main objective of the work presented in this paper is to propose a new methodology for the acquisition of violin instrumental gesture parameters from performance recordings, and also for the automatic construction of an annotated database including instrumental gesture parameters along with captured audio and video streams of the recorded performance. The gesture data in our annotated database will allow us to analyze instrumental gestures in order to use them to control sample-based and physical model-based violin synthesis algorithms. Thus, we can define three main contexts for the work presented here. We consider the first one to be the most important (it is the focus of the paper), since it serves as a starting point for further research:

* Database construction. Acquisition, pre-processing, and annotation of a violin performance database, including relevant instrumental gesture parameters.
* Retrieval. Retrieving samples from a violin performance database, based on results and annotations of the analysis of instrumental gesture parameters, and using the selected samples in a concatenative sample-based violin synthesizer.
* Mapping. Controlling a violin synthesizer based on physical models from the streams corresponding to instrumental gestures (e.g. bow speed) that have been acquired, stored and/or (re-)rendered.

3.1. Application requirements

We have to consider several requirements in our application context. First of all, since our aim is to capture the most relevant aspects of violin performance focusing on instrumental gestures, we have to define the list of parameters to measure. Extensive and detailed studies [1] [2] have shown the most important timbre-related bowing parameters to be bow transversal position, bow transversal velocity, bow to bridge distance, and bow pressing force. In addition, we consider the actual violin and bow movements to be relevant, because they also belong to the gestures occurring during the performance process.

Another requirement is the possibility of using the acquired data for characterizing and modelling such performance gestures, particularly instrumental gestures. Hence, the accuracy and robustness of our measurements should be as good as possible, with a high sampling rate. Furthermore, the acquisition system and procedure must prove to be non-intrusive, so that the performance process remains unaffected, allowing the performer to play as usual and avoiding any modification of the sound. Equally important, the system must be flexible and portable, so that it can be easily adapted to any violin with any bow in order to capture the instrumental gestures of performers playing their own instruments. Moreover, we aim at obtaining as many instrumental gesture parameters as possible while keeping the data processing computationally cheap.

After considering all the previous requirements and concerns, we decided to use the Polhemus Liberty system, a 6DOF tracking system based on electro-magnetic fields (EMF), consisting of two small wired receiving sensors (about 0.5 cm in diameter) and a transmitting source. Each sensor provides 3DOF for position and 3DOF for orientation/rotation at a 240 Hz sampling rate, with static accuracies of 0.75 mm and 0.15° RMS, respectively, within a range of 1.5 m from the source.
We attach one sensor to the violin body and the other to the bow (see Figure 2) and, from their position and orientation, compute the position of the ends of each string and of the hair ribbon at any time. From such positions, we are able to obtain a number of relevant instrumental parameters (see Section 5) extending the four stated above. In order to minimize the intrusiveness of our system, we place the bow sensor at the bow's center of gravity (see Figure 2) and attach its wire to the performer's right forearm, so that it does not disturb the performer, especially considering its small weight of 6 g compared to a typical bow weight of 70 g.

Figure 1. Schematic representation of the acquisition process. The two main steps that are described in this paper have been highlighted: motion description and score-performance alignment.
[Figure 1 blocks: inputs (pickup, microphone, camera, 6DOF sensors); motion description (bow displacement, bow velocity, bow acceleration, bow pressing force parameter, bow-bridge distance, string estimation, finger position estimation); audio description (energy, pitch, aperiodicity); MusicXML score; score-performance alignment; manual annotation; annotated DB.]

3.2. Acquisition process overview

The acquisition process consists of three main steps: data capturing, data processing, and database annotation. The acquisition data flow is represented schematically in Figure 1. In our performance recordings, after calibrating the measurement system (see Section 4), we obtain data from two audio sources (microphone and electric violin pickup), a video camera, and the two 6DOF sensors of the EMF tracker. Once the raw data streams are captured and synchronized, some data processing is applied in order to obtain, in real time, a number of relevant instrumental gesture parameters along with some audio descriptors. Finally, the annotation of the database is carried out off-line by means of score-performance alignment plus some additional manual annotation (e.g. finger being used).
In this work, we place emphasis (see Figure 1) on motion description (see Section 5) and score-performance alignment (see Section 6). Once the data acquisition has been carried out, we store instrumental gesture data at different processing levels following the principles and namespace extensions of the recently proposed Gesture Description Interchange Format (GDIF), as we describe in [5].

4. SYSTEM CALIBRATION

The calibration process consists of three steps. The first two steps provide the means for obtaining the positions of each one of the string ends for any position/rotation of the violin. The third step allows tracking the hair ribbon ends for any position/orientation of the bow. The whole calibration process takes no longer than 10 minutes, including the attachment of the sensors to the instrument.

We use two sensors, S1 and S2. For the whole calibration process, as well as during performance, sensor S1 is attached to the violin body. Sensor S2 is used during the second calibration step as a marker for annotating the positions of the string ends relative to the position and rotation of sensor S1 (placed in the violin body), while during the third calibration step it is placed in the bow, where it remains fixed during performance. Sensor placements (see Figure 2) have been chosen to minimize the effect on the performance, as confirmed by several performers.

Figure 2. Detail of violin and bow placement of sensors during a performance recording.

4.1. The β coordinates

The whole process of tracking the position of the strings and the hair ribbon is based on basic 3D algebra (e.g. translation and rotation operations). Basically, we apply translation and rotation to some known β coordinates (annotated during calibration) representing, for instance, the positions of string ends or hair ribbon ends expressed in the coordinate system of the violin or bow sensor, respectively.
Such operations allow tracking the position of string ends or hair ribbon ends when the violin or bow sensors are moving (as in performance). Next we outline how such β coordinates are obtained, as well as the operations used for tracking string and hair ribbon ends.

Figure 3. Schematic representation of coordinate systems of source and sensor, along with relevant parameters involved in the obtention of the β coordinates.

Looking at Figure 3, we can see two coordinate systems, corresponding to the origin (source) and to the sensor. We can obtain the β coordinates in two steps. The first step is to obtain the vector $\mathbf{b}$, going from the origin of the coordinate system V to the position of the point p. For doing so, we subtract vector $\mathbf{p}_V$, which corresponds to the origin of vector $\mathbf{b}$, from vector $\mathbf{a}$, as expressed in

$$\mathbf{b} = \mathbf{a} - \mathbf{p}_V \qquad (1)$$

The second step is to rotate the vector $\mathbf{b}$ so that it becomes expressed in terms of the coordinate system V, thus obtaining the β coordinates. This is expressed in

$$\boldsymbol{\beta}_p = M_R^{-1}\,\mathbf{b} \qquad (2)$$

where $M_R^{-1}$ is the inverse of the rotation matrix of V, computed from the Euler angles of the rotation of V. Now, for any new position $\mathbf{P}'$ and rotation $R'$ of the coordinate system of the sensor, we are able to obtain the new position of the point p, referred to the origin coordinate system, by applying the new rotation matrix and translation, as expressed in

$$\mathbf{p}' = M_{R'}\,\boldsymbol{\beta}_p + \mathbf{P}' \qquad (3)$$

where $M_{R'}$ is the new rotation matrix computed from the new Euler angles of the sensor, and $\mathbf{P}'$ is the new position of the sensor.

4.2. Sensor calibration step

This is the first step of the calibration procedure. Due to the amorphous shape of the sensors, the manufacturer is not able to give details about the exact point of the sensor body for which the device is tracking position and orientation. This introduces a source of error in the range of the diameter of the sensor body (around 0.5 cm), which is particularly problematic when using the sensor as a marker. Thus, we have devised the first calibration step in order to resolve this uncertainty.

We attach sensor S2 on top of a square plastic box (we call it the calibration box), as illustrated in Figure 4, and define a reference vertex that will be used (instead of the amorphous sensor) as a marker for annotating the positions of string ends in the strings calibration step (see Section 4.3). The idea is to find the coordinates of the reference vertex in the coordinate system of sensor S2 (i.e. the β coordinates of the vertex). For doing so, we put the reference vertex into a small hole (diameter below 0.5 mm) on a surface (see Figure 4), and capture position and orientation data of the sensor while performing 3-axis rotations of the calibration box for some time (around 10 seconds is enough). As the distance between the sensor and the reference vertex remains constant, the positions measured during the capture define a sphere with the reference vertex as its center. Thus, we are able to find that center by applying least-squares fitting techniques.

After that, the estimated center is used for computing the β coordinates for every recorded frame's position $P_{S2}$ and rotation $R_{S2}$ of sensor S2, defining (see Figure 3) point p as the estimated center, vector $\mathbf{p}_V$ as going from the source to the sensor, and vector $\mathbf{a}$ as going from the source to the estimated center. Then, we compute the vertex's β coordinates as the average over all frames.

Figure 4. Schematic view of the first calibration step. Sensor S2 is attached to the calibration box. The aim is to find the β coordinates of the reference vertex.
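As an illustration of equations (1) to (3) and of the sphere-center estimation used in the sensor calibration step, the following is a minimal sketch in plain Python. It is not the implementation used for the measurements: the function names are hypothetical, rotation matrices are assumed to be already available as 3x3 nested lists (the conversion from the tracker's Euler angles depends on its angle convention and is left out), and the sphere is fitted with a simple algebraic least-squares formulation chosen here for brevity.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]

def beta_coords(a, p_v, rot):
    """Eqs. (1)-(2): express the source-frame point `a` in the sensor frame."""
    b = [a[i] - p_v[i] for i in range(3)]          # eq. (1)
    return mat_vec(transpose(rot), b)              # eq. (2): inverse rotation = transpose

def track_point(beta, p_new, rot_new):
    """Eq. (3): source-frame position of a point with known beta coordinates."""
    r = mat_vec(rot_new, beta)
    return [r[i] + p_new[i] for i in range(3)]

def solve_linear(a, b):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_sphere_center(points):
    """Algebraic least-squares sphere fit: |x|^2 = 2 c.x + d, solved for c (and d)."""
    ata = [[0.0] * 4 for _ in range(4)]
    atb = [0.0] * 4
    for x, y, z in points:
        row = [2 * x, 2 * y, 2 * z, 1.0]
        rhs = x * x + y * y + z * z
        for i in range(4):
            atb[i] += row[i] * rhs
            for j in range(4):
                ata[i][j] += row[i] * row[j]
    cx, cy, cz, _ = solve_linear(ata, atb)         # normal equations
    return [cx, cy, cz]

def vertex_beta(frames, center):
    """Average the beta coordinates of `center` over (position, rotation) frames."""
    betas = [beta_coords(center, pos, rot) for pos, rot in frames]
    return [sum(b[i] for b in betas) / len(betas) for i in range(3)]
```

In this sketch, `fit_sphere_center` would receive the sensor positions recorded while rotating the calibration box, and `vertex_beta` would then average the per-frame β coordinates of the estimated center.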
Now we are able to express the exact position of the calibration box's reference vertex in the source coordinate system (by means of equation (3)), and use it as a marker in the strings calibration step (see below).

4.3. Strings calibration

The strings calibration step consists of obtaining the β coordinates of the eight string ends, referred to sensor S1, placed in the violin body. For doing so, we use the reference vertex of the calibration box (see Section 4.2) as a marker. For each string end (four corresponding to the bridge ends, and another four corresponding to the fingerboard ends), we annotate both position and rotation of the violin's sensor S1, together with the position of the reference vertex. With these annotated values, we compute the eight β coordinates as outlined in Section 4.1, defining (see Figure 3) point p as the end of the string, vector $\mathbf{p}_V$ as going from the source to the violin sensor (sensor S1), and vector $\mathbf{a}$ as going from the source to the reference vertex of the calibration box.

Figure 5. Schematic view of the second calibration step (strings calibration). Sensor S1 remains attached to the violin body.

Once we have obtained the β coordinates of each of the eight string ends, we easily get the new position $\mathbf{p}'$ of each string end for a given new violin coordinate system by applying equation (3), where $M_{R'}$ is the rotation matrix computed from the Euler angles of sensor S1, and $\mathbf{P}'$ is the position of sensor S1.

4.4. Hair ribbon calibration

In the hair ribbon calibration, we first remove sensor S2 from the calibration box and attach it to the bow, close to the bow's center of gravity, as illustrated in Figures 2 and 6. Sensor S2 will remain there during performance. As was done for the strings, we need to obtain the β coordinates (now referred to the bow's coordinate system) of the hair ribbon ends. This is carried out separately in two analogous steps. First, we position the bow in such a way that the tip end of the hair ribbon is placed at the bridge end of any string, and annotate position and rotation of both sensors S1 and S2. With the position and rotation of sensor S1, we are able to obtain (see previous section) the position of the string end. Then, we obtain the β coordinates of the tip end by means of the steps in Section 4.1, defining (see Figure 3) point p as the end of the string, vector $\mathbf{p}_V$ as going from the source to the bow sensor (sensor S2), and vector $\mathbf{a}$ as going from the source to the string end. Then, we repeat the process with the frog end of the hair ribbon (see Figure 6). From then on, from the computed β coordinates, we are able to obtain, again by applying equation (3), the position $\mathbf{p}'$ of the hair ribbon ends for a given position $\mathbf{P}'$ and rotation matrix $M_{R'}$ of sensor S2. We are aware of the non-zero width of the hair ribbon, and we decided to use the middle point of the hair ribbon end (see Figure 6) during calibration.

Figure 6. Schematic view of the third calibration step (hair ribbon calibration). Sensor S1 remains attached to the violin body. We illustrate here the placement of sensor S2 on top of the bow, as well as the exact places of contact between the hair ribbon ends (middle point of the hair ribbon width) and the bridge end of the string.

5. MOTION DESCRIPTION

By applying the procedure outlined in the previous section, we are able to obtain both the ends of each one of the strings and the ends of the hair ribbon at any time during the performance. We now give details on the methods we apply for computing several instrumental gesture parameters: a number of bowing parameters, the automatic estimation of the string being played, an estimation of the finger position, and a flag telling us whether the performer is actually playing.

5.1. Bowing parameters computation

Responding to our requirement to obtain relevant timbre-related bowing parameters, we compute bow transversal position $P_B$, bow transversal velocity $V_B$, bow acceleration $A_B$, bow to bridge distance $D_{BB}$, and a bow pressing force parameter $P_F$.
All of these parameters are computed using the position of the hair ribbon ends and the position of the ends of the string being played. These two pairs of points represent two (ideally intersecting) lines in 3D space, so we base our computations on obtaining the intersection of the hair ribbon and the string being played. Assuming the string being played to be known (it is estimated automatically, see Section 5.2), we first find the shortest path $S_P$ between the ideal (no deformation, see Figure 7) hair ribbon and the ideal (no deformation) string. Since both the string and the hair ribbon suffer deformations, the segment $S_P$ joining the two lines will have a non-zero length. We find this segment by requiring it to be perpendicular to both the ideal hair ribbon and the ideal string. Once we have found the end points of $S_P$ on the measured hair ribbon ($S_{P,H}$) and on the measured string ($S_{P,S}$), we compute the bowing parameters as detailed below.
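The shortest segment between two non-parallel 3D lines has a standard closed-form solution, which the following sketch applies to the hair ribbon and string lines. The function name and argument order are illustrative choices, not taken from the system described here; points are plain 3-element lists in the source coordinate system.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def shortest_segment(hair_frog, hair_tip, string_bridge, string_nut):
    """Return the end points of S_P: (point on hair line, point on string line).

    Both lines are treated as infinite and straight (no deformation); the
    returned segment is perpendicular to both of them.
    """
    u = sub(hair_tip, hair_frog)          # hair ribbon direction
    v = sub(string_nut, string_bridge)    # string direction
    w0 = sub(hair_frog, string_bridge)
    a, b, c = dot(u, u), dot(u, v), dot(v, v)
    d, e = dot(u, w0), dot(v, w0)
    denom = a * c - b * b                 # zero only for parallel lines
    if abs(denom) < 1e-12:
        raise ValueError("hair ribbon and string lines are parallel")
    s = (b * e - c * d) / denom           # parameter along the hair line
    t = (a * e - b * d) / denom           # parameter along the string line
    on_hair = [hair_frog[i] + s * u[i] for i in range(3)]
    on_string = [string_bridge[i] + t * v[i] for i in range(3)]
    return on_hair, on_string

def segment_length(on_hair, on_string):
    """Length of S_P, i.e. the distance between its two end points."""
    return math.dist(on_hair, on_string)
```

Distances from these two end points to the measured ends of the hair ribbon or string can then be taken with the same `math.dist` call.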

Figure 7. Measured string and hair ribbon segments, computed from their extracted end points, versus their actual configuration. Deformations have been exaggerated in order to illustrate the importance of segment $S_P$. (Legend: $H_M$, measured hair ribbon; $H_A$, actual hair ribbon; $S_M$, measured string segment; $S_A$, actual string; $S_P$, shortest path between the two lines.)

* The bow transversal position $P_B$ is computed as the Euclidean distance between $S_{P,H}$ and the measured frog end of the hair ribbon.
* The bow transversal velocity $V_B$ is computed as the derivative of the bow position/displacement, $dP_B/dt$.
* The bow transversal acceleration $A_B$ is computed as the derivative of the bow transversal velocity, $dV_B/dt$.
* The bow to bridge distance $D_{BB}$ is computed as the Euclidean distance between $S_{P,S}$ and the measured bridge end of the string.
* For the bow pressing force parameter $P_F$, we use the length of the segment $S_P$, which is perpendicular to the string and to the hair ribbon. This segment gets longer for larger deformations due to pressing force. Thus, we compute the Euclidean distance between $S_{P,S}$ and $S_{P,H}$, expressed in cm.

We are aware that the length of segment $S_P$ does not correspond linearly to the actual force applied by the hair ribbon to the string. We have not yet carried out exhaustive tests to find a linearization function (which would depend strongly on bow transversal position and tilt), but only a qualitative evaluation. For that evaluation, we implemented a pressing force measurement method similar to the one used in [7], and observed a high correlation with our pressing force parameter $P_F$. Therefore, we believe it may become an interesting timbre-related parameter.

5.2. String automatic extraction

In order to automatically detect the string being played, we have devised a method based on measuring the angle between the line joining both hair ribbon ends and the violin plane. The procedure is as follows. First, we obtain the plane of the violin from the ends of the outer strings (four points in 3D space) by means of least-squares fitting techniques. Then, we project the hair ribbon onto the violin plane and compute the angle $\alpha$ between the hair ribbon and its projection, as illustrated in Figure 8. From the computed angle, we are able to obtain a good estimation of the string being played by defining sectors in a circumference, as also depicted in Figure 8.

For the annotation of the angle limits $\alpha_{EA}$, $\alpha_{AD}$ and $\alpha_{DG}$, we carry out a short recording (once the system has been calibrated, as a final calibration step), asking the performer to play a known sequence of strings while we compute the angle $\alpha$. The sequence of strings is the following:

E → E+A → A → A+D → D → D+G → G

Then, we manually segment the recorded sequence and extract the limits by averaging the obtained angles along the segments where two strings were played, so that the angle limits are extracted in a semi-automatic manner. For one of the recordings carried out, we obtained the following angle limits: $\alpha_{EA} = 15°$, $\alpha_{AD} = -2°$, and $\alpha_{DG} = -19°$. Due to the asymmetric shape of both the bridge and the top nut, the angle limits are not centered around zero, but slightly shifted. In order to avoid glitches in the estimation of the string, we applied a 2° hysteresis cycle at the decision boundaries, resulting in a cleaner signal.

Figure 8. Automatic detection of the string being played. The left figure illustrates the angle $\alpha$ between the violin plane and the hair ribbon. The right figure depicts the circumference sectors and angle limits assigned to each of the strings.
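The angle-based string detection described above can be sketched as follows. Several details are assumptions made for illustration only: the plane normal is taken as given (it would come from the plane fit), larger angles are assumed to correspond to the E string, the limits are those reported for one recording, and the 2° hysteresis is interpreted as a band of total width 2° centred on each limit.

```python
import math

STRINGS = ["G", "D", "A", "E"]       # assumed order of sectors for increasing angle
LIMITS = [-19.0, -2.0, 15.0]         # degrees: D/G, A/D and E/A limits (one recording)

def hair_plane_angle(frog, tip, normal):
    """Signed angle in degrees between the hair ribbon line and the violin plane.

    The angle between a line and its projection onto a plane equals the
    arcsine of the normalized dot product of the line direction and the normal.
    """
    d = [tip[i] - frog[i] for i in range(3)]
    d_len = math.sqrt(sum(x * x for x in d))
    n_len = math.sqrt(sum(x * x for x in normal))
    sine = sum(d[i] * normal[i] for i in range(3)) / (d_len * n_len)
    return math.degrees(math.asin(max(-1.0, min(1.0, sine))))

def classify(angle, limits=LIMITS):
    """Sector index without hysteresis: number of limits lying below the angle."""
    return sum(angle > lim for lim in limits)

def track_string(angles, limits=LIMITS, band=2.0):
    """Classify a sequence of angles, switching sector only once the angle has
    moved past a limit by half the hysteresis band."""
    if not angles:
        return []
    half = band / 2.0
    current = classify(angles[0], limits)
    out = []
    for a in angles:
        while current < len(limits) and a > limits[current] + half:
            current += 1                 # crossed the upper limit of the sector
        while current > 0 and a < limits[current - 1] - half:
            current -= 1                 # crossed the lower limit of the sector
        out.append(STRINGS[current])
    return out
```

With this interpretation, an angle hovering around 15° does not toggle between A and E: the estimate changes only when the angle exceeds 16° or drops below 14°.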
Once we automatically extract the string being played, we use it for the computation of several bowing parameters (see Section 5.1). For the particular cases of bow transversal velocity and acceleration, we instead use the ends of a hypothetical fifth string placed in the middle, between strings A and D (we compute its ends from the ends of strings A and D). The reasons for doing so are that we consider bow speed and acceleration to be independent of the string being played, and that we avoid glitches in the velocity and acceleration contours due to sudden string changes.

5.3. Other parameters

We obtain two additional instrumental gesture parameters from the captured data. The first one is an estimation of the finger position, obtained by using the estimation of the string being played together with the fundamental frequency contour extracted from audio. This parameter could be highly useful for correcting the fingerboard end position of the string (replacing it by the computed finger position), therefore leading to a more accurate computation of the bowing parameters. However, we are not yet applying it for that purpose. Since the fundamental frequency of an ideal string is inversely proportional to its vibrating length, we compute the finger position as the distance $D_{NF}$ from the top nut by means of equation (4), where $L_S$ is the total string length, $f_{0S}$ is the fundamental frequency of the open string, and $f_0$ is the fundamental frequency extracted from audio.

$$D_{NF} = L_S\left(1 - \frac{f_{0S}}{f_0}\right) \qquad (4)$$

As a second additional instrumental gesture parameter, we use the distance between the hair ribbon and string segments (i.e. the length of the segment $S_P$, see Figure 7) and the bow transversal velocity $V_B$ (see Section 5.1) for computing a playing state estimator (playing/not playing), which is helpful as a flag for recording segmentation purposes. We have decided to activate this flag when $V_B \neq 0$ and $|S_P| > 0.25$ cm, obtaining satisfactory results by applying some hysteresis at the decision boundaries.

5.4. Results

It turns out to be difficult to formally assess the reliability of the whole acquisition methodology, but we give some figures here. In order to quantitatively evaluate the accuracy of the measurements, we carried out a test in which we compare, in static conditions, real and extracted values of two parameters: bow transversal position $P_B$ and bow to bridge distance $D_{BB}$. For doing so, we painted several ticks on the strings (starting from the bridge end, every cm up to 5 cm) and on the hair ribbon (starting from the frog end, every 15 cm up to 60 cm), and computed errors for several combinations of bow transversal position and bow to bridge distance.
We obtained average absolute errors in the range of 0.20 cm for bow transversal position, and in the range of 0.25 cm for bow to bridge distance, which we consider low enough for our research purposes. Apart from the intrinsic accuracy of the tracker (see Section 3.1), we believe that error propagation during the computations carried out in motion description is the main source of error. Figure 9 shows acquired instrumental gesture parameters for a scale played at different dynamics and articulations.

Figure 9. Acquired instrumental parameters for a scale played at different dynamics and articulations. From top: audio, bow position $P_B$, bow velocity $V_B$, bow acceleration $A_B$, pressing force parameter $P_F$, bow-bridge distance $D_{BB}$, string being played $ST_{est}$, playing descriptor, and finger position $D_{NF}$.

6. SCORE-PERFORMANCE ALIGNMENT

As a first step for the annotation of our database, we convert the performed scores (Sibelius format) into MusicXML files including all the annotations. Then, we parse the MusicXML file and obtain all score annotations, including MIDI pitch, onset, durations, dynamics, bow direction, accents and articulation type (legato or staccato), as annotation files, by means of a custom parser developed for our application.
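To give an idea of what extracting nominal note data from MusicXML involves, here is a minimal sketch based on Python's standard `xml.etree.ElementTree` module. It is not the custom parser mentioned above: the function name is hypothetical, and the sketch assumes a monophonic, single-part `score-partwise` file with integer semitone alterations, ignoring ties, chords, dynamics, bowing marks and articulations.

```python
import xml.etree.ElementTree as ET

STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def parse_notes(musicxml_text):
    """Return a list of (onset, duration, midi_pitch) tuples in quarter notes.

    Rests advance the time position but are not returned.
    """
    root = ET.fromstring(musicxml_text)
    part = root.find("part")              # first (assumed only) part
    notes = []
    position = 0.0                        # running time in quarter notes
    divisions = 1                         # <duration> units per quarter note
    for measure in part.findall("measure"):
        for elem in measure:
            if elem.tag == "attributes":
                div = elem.findtext("divisions")
                if div is not None:
                    divisions = int(div)
            elif elem.tag == "note":
                if elem.find("grace") is not None:
                    continue              # grace notes carry no <duration>
                duration = int(elem.findtext("duration")) / divisions
                pitch = elem.find("pitch")
                if pitch is not None:
                    step = pitch.findtext("step")
                    alter = int(pitch.findtext("alter", default="0"))
                    octave = int(pitch.findtext("octave"))
                    midi = 12 * (octave + 1) + STEP_TO_SEMITONE[step] + alter
                    notes.append((position, duration, midi))
                position += duration
    return notes
```

Onsets and durations come out in quarter notes; converting them to seconds would additionally require the tempo indications of the score.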
Such transcription provides the nominal onset/offset times of the notes present in the score but, due to the possible deviations introduced by the performer, we need to make sure that our performance database annotations are well aligned to the score. For doing so, we correct the note change times (offset/onset time of consecutive notes) coming from the score transcription by using some of the extracted instrumental gesture parameters along with a measure of aperiodicity obtained by means of the YIN [4] fundamental frequency estimation algorithm. The procedure can be summarized as follows. First, we define a window $w_{tc_i} = \{tc_i - \Delta_{l,i},\; tc_i + \Delta_{r,i}\}$ around each note change time $tc_i$ found in the score transcription. Assuming that the performer's deviations on note change times will not exceed half the duration of the pair of notes involved (preceding and subsequent), we define the left and right widths $\Delta_{l,i}$ and $\Delta_{r,i}$ of the time window as half of the duration of the preceding and subsequent notes, respectively. Within the time window $w_{tc_i}$, we find the times of any bow direction change $t_{bc}$ from zero-crossings of the bow velocity $V_B$ (see equation (5)), any string change $t_{ste}$ from local maxima of the absolute value of the derivative of the string estimation $ST_{est}$ (see equation (6)), and pitch transitions $t_{f0}$ from local maxima of the aperiodicity function $AP_0$ obtained by means of the YIN [4] fundamental frequency estimation algorithm (see equation (7)). We have illustrated in Figure 10 the detections obtained from a recording excerpt.

$$t_{bc} = \arg_t \{ V_B(t) = 0 \} \qquad (5)$$

$$t_{ste} = \arg_t \left\{ \left| \frac{d\,ST_{est}(t)}{dt} \right| > 0 \right\} \qquad (6)$$

$$t_{f0} = \arg\max_t AP_0(t) \qquad (7)$$

Figure 10. Detected bow direction changes, string changes, and pitch transitions. From top: bow position $P_B$, string estimation $ST_{est}$, and aperiodicity function $AP_0$. At the bottom we observe the audio signal with note transitions, dashed for nominal times and solid for detected times.

Once the detection process has been carried out, we annotate the obtained change time for each time window. For those windows in which more than one of the three indicators led to a note change detection, we give priority to bow direction changes, followed by string changes, and finally by fundamental frequency transitions. We have not yet carried out a formal evaluation of the proposed alignment methodology, but we have reviewed the obtained corrections, observing satisfactory improvements to the note boundaries of the score transcription.

7. CONCLUSION

We have presented a new method for real-time acquisition of violin instrumental gesture parameters using a commercial two-sensor 3D tracking system based on electro-magnetic field (EMF) sensing. We have paid attention to intrusiveness, flexibility, and robustness. Our approach is able to obtain a number of relevant instrumental gesture parameters with low-cost processing of the raw data coming from the tracking system. It has proved to be minimally intrusive for the performer while being easily ported to any violin or bow, allowing each performer to use her/his own instrument. We have detailed calibration, the computation of instrumental gesture parameters, and the score-performance alignment procedure used for annotating our performance database.

The methodology outlined here serves us for the automatic creation of a database of traditional violin performance. We are already developing methods for applying the obtained results (annotated database) to high-level analysis and modeling of violin instrumental gestures. The developed models will allow us in the near future to drive a sample-based concatenative violin synthesizer, while rendering of instrumental gesture parameter contours will make it possible to control physical model-based violin synthesis. We also aim at designing a system capable of automatically recognizing violin instrumental gestures from acquired parameters. As a main improvement of the methodology, we plan to carry out a formal study of the correlation between our pressing force parameter and the actual force applied to the bow.

8. ACKNOWLEDGEMENTS

This work has been supported by Yamaha Corp. We thank violinist Oriol Saña for his long performances. We would also like to thank Carlos Spa and Nils Peters for their disinterested help in calibration and accuracy tests, and Cristina Gonzalez for reviewing the manuscript.

9. REFERENCES

[1] A. Askenfelt. Measurement of bow motion and bow force in violin playing. Journal of the Acoustical Society of America, 80, 1986.
[2] A. Askenfelt. Measurement of the bowing parameters in violin playing. II. Bow-bridge distance, dynamic range, and limits of bow force. Journal of the Acoustical Society of America, 86, 1989.
[3] C.M.A. Goudeseune. Composing with parameters for synthetic instruments. PhD Thesis, Univ. Illinois, 2001.
[4] A. de Cheveigné and H. Kawahara. YIN, a fundamental frequency estimator for speech and music. Journal of the Acoustical Society of America, 111:4, 2002.
[5] E. Maestre, J. Janer, A.R. Jensenius, and J. Malloch.
Extending gdif for instrumental gestures: the case of violin performance. International Computer Music Conference, Submitted, 2007. [6] J. A. Paradiso and N. A. Gershenfeld. Musical applications of electric field sensing. Computer Music Journal, 21, 1997. [7] N. Rasamimanana. Gesture analysis of bow strokes using an augmented violin. IRCAM DEA ATIAM, 2003. [8] E. Schoonderwaldt, N. Rasamimanana, and F Bevilacqua. Combining accelerometer and video camera: Reconstruction of bow velocity profiles. Conference on New Interfaces for Musical Expression, 2006. [9] D. S. Young. The hyperbow controller: Real-time dynamics measurement of violin performance. Conference on New Interfaces for Musical Expression, 2002. [10] D. S. Young. Wireless sensor system for measurement of violin bowing parameters. Stockholm Music Acoustics Conference, 2003. 393
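As a supplementary illustration of the note change correction described in Section 6, the following Python sketch applies the windowing and the priority rule (bow direction change, then string change, then pitch transition) to uniformly sampled signals. It is a sketch under stated assumptions rather than the implementation used in this work: all function names are hypothetical, the aperiodicity is taken as a precomputed input (YIN itself is not implemented here), the single largest aperiodicity sample in the window stands in for the local maximum, and choosing the candidate closest to the nominal time when several are found is an assumption, since the text does not specify a tie-breaking rule.

```python
def note_change_windows(onsets, durations):
    """Window (start, end) around each change time between consecutive notes.

    Left and right widths are half the duration of the preceding and
    subsequent notes, respectively.
    """
    windows = []
    for i in range(1, len(onsets)):
        tc = onsets[i]
        windows.append((tc - 0.5 * durations[i - 1], tc + 0.5 * durations[i]))
    return windows


def zero_crossings(times, values, start, end):
    """Linearly interpolated times in [start, end] where consecutive samples
    have strictly opposite signs."""
    found = []
    for k in range(1, len(times)):
        v0, v1 = values[k - 1], values[k]
        if v0 * v1 < 0:
            t = times[k - 1] + (times[k] - times[k - 1]) * v0 / (v0 - v1)
            if start <= t <= end:
                found.append(t)
    return found


def step_changes(times, values, start, end):
    """Sample times in [start, end] at which a piecewise-constant signal changes."""
    return [times[k] for k in range(1, len(times))
            if values[k] != values[k - 1] and start <= times[k] <= end]


def peak_time(times, values, start, end):
    """Time of the largest sample in [start, end], or None if no sample falls inside."""
    inside = [(v, t) for t, v in zip(times, values) if start <= t <= end]
    return max(inside)[1] if inside else None


def correct_note_changes(onsets, durations, times, bow_velocity,
                         string_est, aperiodicity):
    """Return one corrected change time per pair of consecutive notes."""
    corrected = []
    windows = note_change_windows(onsets, durations)
    for i, (start, end) in enumerate(windows, start=1):
        nominal = onsets[i]
        bow = zero_crossings(times, bow_velocity, start, end)
        strings = step_changes(times, string_est, start, end)
        if bow:  # highest priority: bow direction change
            corrected.append(min(bow, key=lambda t: abs(t - nominal)))
        elif strings:  # next: string change
            corrected.append(min(strings, key=lambda t: abs(t - nominal)))
        else:  # last: pitch transition from the aperiodicity peak
            peak = peak_time(times, aperiodicity, start, end)
            corrected.append(nominal if peak is None else peak)
    return corrected
```

A caller would pass the nominal onsets and durations from the score transcription, expressed in seconds, together with the sampled bow velocity, string estimation, and aperiodicity on a common time axis.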