THE EFFECT OF TIMBRE IN CLARINET INTERPRETATION

Mathieu Barthet (1), Philippe Depalle (1,2), Richard Kronland-Martinet (1), Solvi Ystad (1)
(1) CNRS - Laboratoire de Mécanique et d'Acoustique, Marseille, France
(2) Music Technology Area, McGill University, Montreal (Qc), Canada

ABSTRACT

The perceptual importance of timbre variations is investigated in expressive clarinet performance. Three transformations acting on timbre, rhythm and dynamics, and four combinations of them, were applied to solo clarinet recordings in order to remove or flatten some of the expressive variations of the performer. Twenty skilled musicians were asked to choose the interpretations they preferred in a paired comparison task. The rankings of the performances are strongly consistent across listeners, and this holds for two different musical excerpts taken from pieces by Bach and Mozart. Multidimensional scaling shows that the most prominent factor used by listeners for the evaluation is linked to the timbre of the tones.

1. INTRODUCTION

Musical interpretation is an intricate concept which is hard to describe from a scientific point of view. Many experts with various backgrounds such as musicology, psychology and acoustics have worked on the description and understanding of the phenomena related to expressive music performance, focusing either on the control of the instrument or on the produced sound [1]. Differences between performers' interpretations can be characterized by small or large deviations of parameters related to musical expression, including timing, dynamics, timbre and pitch [2]. A performer can reproduce expressive deviations of timing and dynamics very faithfully when repeating performances of a given piece of music with the same musical intention [1] [2]. Expressive timbre patterns can also show a high within-individual consistency across repeated performances [3] [4].
These results suggest that expressive deviations are not random, but are due to the interpretive intentions of the performer. Among expression parameters, timing (e.g. tempo, note durations, asynchrony) and dynamics have been shown to be essential when evaluating the quality of a performance [5] [6]. Fewer studies have been conducted on the influence of timbre variations in music performance, likely because this parameter is difficult to define and analyze. In this paper, we investigate the relative importance of the expressive patterns of timbre, rhythm, and dynamics in performance evaluation. To address this issue, we adopted an analysis-synthesis approach, which makes it possible to alter a single factor or several factors simultaneously. Perceptual tests were carried out on 20 skilled musicians to analyze the effect of the transformations by means of preference judgments. We chose to focus on Western tonal music played on a specific instrument, the clarinet, whose acoustic properties allow a fine control of the timbre during sound production.

2. METHODOLOGY

2.1. Sound corpus

A professional clarinet performer was asked to play two excerpts of standard baroque and classical pieces several times. He was allowed to slightly modify his interpretation. The first excerpt comes from an adaptation for clarinet of Bach's Suite II for cello (BWV 1007/12), and the second one from Mozart's Clarinet Quintet in A major (K 581). Both excerpts were played at a rather slow tempo (48 and 44 bpm respectively) in order to extend the scope of expressiveness. The musical excerpts were recorded in an anechoic chamber.

2.2. Analysis-Synthesis model

A quasi-additive synthesis approach has been adopted to resynthesize the original recorded sequences with or without alterations. The tones of the original sequences are first segmented (see [4] for more details) and then analyzed. The resynthesized sequences are obtained by juxtaposing the resynthesized tones.
The chosen sound model decomposes an audio signal into a deterministic part, made up of a sum of quasi-sinusoidal components, plus a noise part. Thus, the original tone $s(t)$ can be written as:

$$s(t) = \sum_{h=1}^{H} A_h(t) \cos[\Phi_h(t)] + b(t), \qquad \Phi_h(t) = 2\pi \int_0^t f_h(\tau)\, d\tau + \Phi_h(0) \qquad (1)$$

where $A_h(t)$, $\Phi_h(t)$, and $f_h(t)$ are respectively the instantaneous amplitude, phase, and frequency of the $h$th among $H$ sinusoids, $\Phi_h(0)$ is the initial phase, and $b(t)$ is the noise component. This synthesis method is particularly suitable for changing aspects of a tone, since sounds are reconstituted as a superposition of partials whose frequency and amplitude can be individually controlled.
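As an illustration, the sound model of equation (1) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `synthesize_tone`, the sample-by-sample rectangle-rule approximation of the phase integral, and the white Gaussian noise standing in for $b(t)$ are all choices made here for illustration.

```python
import math
import random


def synthesize_tone(amps, freqs, sample_rate, init_phases=None,
                    noise_level=0.0, seed=0):
    """Additive resynthesis of one tone (hypothetical helper).

    amps[h][n]  : instantaneous amplitude of partial h at sample n
    freqs[h][n] : instantaneous frequency (Hz) of partial h at sample n
    init_phases : initial phase of each partial (defaults to 0)
    noise_level : standard deviation of the white noise used here as a
                  stand-in for the noise part b(t) (an assumption)
    """
    num_partials = len(amps)
    num_samples = len(amps[0])
    if init_phases is None:
        phases = [0.0] * num_partials
    else:
        phases = list(init_phases)
    rng = random.Random(seed)

    signal = []
    for n in range(num_samples):
        sample = 0.0
        for h in range(num_partials):
            sample += amps[h][n] * math.cos(phases[h])
            # Rectangle-rule approximation of 2*pi * integral of f_h(t) dt.
            phases[h] += 2.0 * math.pi * freqs[h][n] / sample_rate
        if noise_level > 0.0:
            sample += noise_level * rng.gauss(0.0, 1.0)
        signal.append(sample)
    return signal


# Usage: three harmonics of an arbitrary 220 Hz tone with constant amplitudes.
num_samples = 441
f0 = 220.0
amps = [[a] * num_samples for a in (1.0, 0.1, 0.5)]
freqs = [[(h + 1) * f0] * num_samples for h in range(3)]
tone = synthesize_tone(amps, freqs, sample_rate=44100)
```

Because each partial has its own amplitude and frequency track, any of them can be replaced by a modified track before resynthesis, which is the property the transformations rely on.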

The analysis of the tones is performed with a bank of bandpass filters. This provides us with short-band analytic signals associated with the frequency components of the tone. The frequencies of the filters are adapted to match the frequencies of the components of the tones (which follow a harmonic series in the case of the clarinet). The instantaneous amplitude and phase of the components are then derived from the analytic signals.

2.3. Transformations

Three basic transformations have been defined to modify the expressive variations of timbre, rhythm, and dynamics independently. We are then able to cancel some timbre variations, and/or cancel the performer's rhythmic deviations from the score, and/or soften the original dynamics variations. The original attack and release of the tones were kept to maintain the timbral identity of the instrument. In order to prevent intonation from acting as an interpretation criterion, we decided to fix the fundamental frequency of each tone to its mean value $f_0$ during the sustained part. The instantaneous frequency of the $h$th component of each tone was hence fixed to $h f_0$. We verified that this frequency modification was only weakly perceptible for the selected musical excerpts.

As the Spectral Centroid is known to be one of the most important perceptual dimensions of timbre, we wanted to test the impact of its alteration on the perception of musical interpretation. To attenuate the Spectral Centroid variations during the course of a tone, we used the transformation proposed in [7] for isolated tones. It consists in eliminating the spectral flux while keeping the RMS envelope unchanged. Indeed, eliminating the variations of the spectral envelope shape over time forces the Spectral Centroid to a constant value.
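The last point can be checked numerically with a short Python sketch. It assumes the usual definition of the Spectral Centroid as the amplitude-weighted mean frequency of the harmonics, which the text does not spell out, and the helper names `spectral_centroid` and `freeze_spectral_shape` are hypothetical. It only illustrates the property and is not the transformation used in the study.

```python
def spectral_centroid(amps, f0):
    """Amplitude-weighted mean frequency of the harmonics h * f0, h = 1..H.

    This definition is an assumption made for the illustration.
    """
    total = sum(amps)
    if total == 0.0:
        return 0.0
    return sum((h + 1) * f0 * a for h, a in enumerate(amps)) / total


def freeze_spectral_shape(mean_amps, envelope):
    """Scale one fixed set of harmonic amplitudes by a time-varying gain.

    Every frame keeps the same ratios between harmonics, so the shape of
    the spectral envelope no longer changes over time.
    """
    return [[a * gain for a in mean_amps] for gain in envelope]


# Usage: the overall level changes from frame to frame, the centroid does not.
mean_amps = [1.0, 0.1, 0.5]        # arbitrary example values
envelope = [0.2, 0.6, 1.0, 0.4]    # arbitrary time-varying gain
frames = freeze_spectral_shape(mean_amps, envelope)
centroids = [spectral_centroid(frame, 100.0) for frame in frames]
```

Since the common gain appears in both the numerator and the denominator of the centroid, it cancels out, and every frame yields the same centroid value whatever the envelope.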
The Timbre Transformation (TT) can be written as follows:

$$T_T:\; A_h(t) \mapsto A'_h(t) = \frac{\bar{A}_h\, E_{rms}(t)}{\sum_{h=1}^{H} \bar{A}_h} \qquad (2)$$

where $A'_h(t)$ is the new instantaneous amplitude of the $h$th component of the tone, $\bar{A}_h$ is the time-average of the $h$th harmonic over the sustained part of the tone, and $E_{rms}(t)$ is the RMS envelope of $s(t)$.

The segmentation process mentioned above gives us the inter-onset intervals, namely the durations from the beginning of one tone to the beginning of the next. The rhythm transformation (TR) cancels the performer's rhythmic deviations from the score by fixing the durations of the tones to those corresponding to a direct mathematical rendering of the musical score. This transformation is obtained by time-scale modifications of the instantaneous frequencies and amplitudes of the components of the tones.

A compressor/limiter has been implemented in order to flatten the dynamics variations of the original musical sequences (TD). The compression was followed by a loudness equalization.

Stimuli   Transformation description
M0        No transformation
MR        "Mechanical" rhythm
MD        Dynamics flattening
MT        Spectral Centroid freezing
MRD       Combination of TR and TD
MTR       Combination of TT and TR
MTD       Combination of TT and TD
MTRD      Combination of TT, TR and TD

Table 1. Description of the stimuli

2.4. Perceptual test

For each of the Bach and Mozart recording sets, we retained the most expressive performances. We then applied the three basic transformations described in section 2.3 and their four combinations to produce the various performances listed in table 1.

A preference judgment test by paired comparisons [8] was carried out by $S = 20$ participants (judges). Rather than asking a group of judges to rank the set of musical sequences, we decided to present the sequences in pairs and asked the judges which musical interpretation they preferred. The different pairs are the $N(N-1)/2 = 28$ combinations of the $N = 8$ stimuli. In this way, each stimulus is necessarily compared to all the others, which could not be guaranteed in a ranking test. Due to the nature of the task and the sometimes subtle differences between the stimuli, the participants were chosen amongst skilled musicians and represent a large panel of performers (clarinetist, guitarist, pianist, violinist, etc.).

In order to assess the influence of the musical excerpt, each participant was asked to attend two sessions, one with the sound corpus derived from the Bach sequences, and the other with the one derived from the Mozart sequences. The order of the sessions was determined randomly. A training stage was first carried out so that participants got used to the task as well as to the computer interface. The stimuli used for the training were not the same as the ones used in the experiment. All pairs of stimuli were presented in a random order. The designation of the first and second stimuli within a pair was also random. Participants could listen to the sequences of each pair as many times as they wished. At the end of the test, participants had to answer a questionnaire aimed at specifying which strategies they used to make their choices.

3. RESULTS AND DISCUSSION

3.1. Perceptual data

The perceptual data can be represented by $S$ individual preference matrices $P_s$ defined for each judge $s$. The elements of $P_s$, noted $a_s(i, j)$, indicate whether the stimulus $i$

has been preferred to the stimulus $j$:

$$\forall (i,j) \in [1;N]^2,\ i \neq j, \quad a_s(i,j) = \begin{cases} 1 & \text{when the judge } s \text{ prefers the stimulus } i \\ 0 & \text{when the judge } s \text{ prefers the stimulus } j \end{cases} \qquad (3)$$

The diagonal of the matrix $P_s$ is set to 0. Considering a sample of $S$ judges, we define the sample preference matrix $M$ as the sum of the individual preference matrices $P_s$:

$$M = \sum_{s=1}^{S} P_s \qquad (4)$$

For each different pair, the matrix $M$ gives the number of judges in the sample who preferred the first stimulus when compared to the second one. Its elements are noted $m(i,j)$.

3.2. Sample homogeneity

In order to compute the degree of agreement among individuals in their preferences, we used a nonparametric measure of association, the Kendall coefficient of agreement $u$ for paired comparisons. Values of $u$ have been calculated for the Bach and Mozart sessions and can be found in table 2. The agreement coefficient remains very high in both sessions. The next step consists in checking its validity. We tested the null hypothesis that there is no agreement among the raters against the alternative that the degree of agreement is greater than what one would expect had the paired comparisons been done at random. As the total number of judges is large ($S > 6$), a large-sample approximation to the sampling distribution, $\chi^2$, is used. It is asymptotically distributed as a $\chi^2$ distribution with 28 degrees of freedom. Results can be found in table 2. In both cases (Bach and Mozart), we may reject the null hypothesis with a risk $\alpha < 0.001$. We concluded that there is a strong agreement among the participants in their preferences, both for the Bach and for the Mozart excerpts.

                Nonparametric statistics
Test session    u       χ²        dof
Bach            0.58    338.20    28
Mozart          0.52    303.80    28

Table 2. Coefficients of agreement among the raters for the Bach and Mozart sessions

3.3. Musical performance ranking

As the ratings of the judges were strongly correlated, we examined the global ranking of the musical performances. Individual rankings were obtained by adding up the number of times each of the eight sequences was chosen as the preferred one over the 28 trials. This number ranges from 0 to 7 (times preferred). The median and quartiles of these individual rankings are shown in figure 1, which hence gives a picture of the judges' mean ranking during the Bach session. Results for the Mozart session are quite similar.

[Figure 1 (image not reproduced). Panel title: Bach session. Horizontal axis: musical sequences (M0, MR, MD, MT, MRD, MTR, MTD, MTRD).]

Figure 1. Box and whisker plots of the judges' rankings for each sequence of the Bach session. The box has lines at the lower quartile, median, and upper quartile values. The whiskers show the extent of the rest of the data. Outliers are represented by crosses. Note that the number of times each sequence has been preferred is necessarily between 0 and 7.

The sequence which was rated best on average is M0, the one which preserves the original variations of rhythm, dynamics and timbre. Note that all the sequences which went through the timbre transformation process (MT, MTD, MTR, MTRD) were on average the least preferred, regardless of the subjective nature of the judgments. The cancellation of the Spectral Centroid variations during the sustained part of the tones induces a loss of tone quality which is perceived as a loss of musical expressivity. Hence, timbre variations appear to be one major feature of music performance.

3.4. Multidimensional scaling analysis (MDS)

We carried out multidimensional scaling in order to map the musical sequences in a space so that their relative positions reflect the degree of perceived proximity between them. The first step was therefore to transform the $N \times N$ sample preference matrix defined in section 3.1 into an $N \times N$ dissimilarity matrix. The notion of perceptual distance between the musical performances can be reflected by the preferences of the judges. Let us consider the number of judges who preferred a given interpretation A compared to another interpretation B. The higher (or the lower) the number of judges, the more dissimilar A and B are. A number of judges close to half of the sample size ($S$) indicates that the judges do not agree, either for subjective reasons, or because A and B are perceptually similar; this leads to

almost random choices. We retained the latter hypothesis thanks to the remarks of the raters, who found some stimuli difficult to discriminate due to their perceptual closeness. According to these considerations, we built a class of functions $f_n$ in order to transform the sample preference matrix $M$ into a dissimilarity matrix $D$. $f_n$ is defined as:

$$\forall x \in [0;S] \cap \mathbb{N}, \quad f_n(x) = (2x - S)^{2n} \qquad (5)$$

where $x$ designates the $m(i, j)$ and $n$ is a nonzero integer. After some experiments, $n = 1$ was chosen.

The MDSCAL algorithm developed by Kruskal was used to perform both metric (assumption of interval measurements) and nonmetric (assumption of ordinal measurements) MDS. The nonmetric procedure is retained as it produces a better fit (Kruskal's stress) to the data than the metric one. This procedure yields solutions such that the distances in the derived space are in the same rank order as the original data. The initial configuration of points is found using the classical multidimensional scaling solution. The rate of decline of the stress as dimensionality increases and the Shepard diagrams helped us to define a reliable number of dimensions. The method indicated a 3-dimensional space for both the Bach (stress $9.7 \times 10^{-5}$) and Mozart (stress $1.1 \times 10^{-3}$) sessions. Figure 2 shows the projection of the MDS solutions on the first two axes for both the Bach and Mozart sessions.

[Figure 2 (image not reproduced). Legend: circles, Bach session; crosses, Mozart session. Horizontal axis: Dimension 1.]

Figure 2. Two-dimensional projections of the MDS configurations for the Bach (circles) and Mozart (crosses) sessions

The locations of the eight different types of performances along the two principal dimensions are very similar for the Bach and Mozart musical excerpts, which suggests that the participants used the same factors to rank the performances in each session. The horizontal axis seems to refer to a tone factor, as it opposes the sequences with fixed Spectral Centroid (MTRD, MTD, MT) to the ones with the original Spectral Centroid variations (M0, MR, etc.). This opposition also appears in the comments of many participants, who mention that they preferred the performances with "lively" tones to the ones with "static" tones. We do not yet have a reliable interpretation for the second axis.

4. CONCLUSION

This paper investigates the importance of timbre variations compared to rhythmic and dynamics variations in clarinet music performance. For this purpose, an analysis-synthesis procedure was developed to generate, from Bach and Mozart solo clarinet performances, two sets of performances in which timbre, rhythmic and dynamics variations were removed or flattened. There is a strong consistency of the preferences between participants for both sets of performances. The average ranking of the participants shows that the least preferred sequences are those whose timbre variations (represented by the Spectral Centroid) were removed. Multidimensional scaling confirms that the predominant factor used to rate the performances was the timbre, which is directly linked to the tone quality.

5. ACKNOWLEDGMENT

This project has partly been supported by the French National Research Agency (ANR, JC05-41996, "senSons", http://www.sensons.cnrs-mrs.fr/).

6. REFERENCES

[1] Gabrielsson, A., The Performance of Music, Psychology of Music, Academic Press, 2nd ed., 1999.

[2] Palmer, C., "Music Performance", Annu. Rev. Psychol., 48: 115-138, 1997.

[3] Farner, S., Kronland-Martinet, R., Voinier, T., and Ystad, S., "Timbre variations as an attribute of naturalness in clarinet play", CMMR05, pp. 45-53, Springer, Pisa, Italy, 2005.

[4] Barthet, M., Kronland, R., and Ystad, S., "Consistency of timbre patterns in expressive music performance", DAFx06, pp. 19-24, McGill University, Montreal, Canada, 2006.

[5] Repp, B. H., "Diversity and commonality in music performance: An analysis of timing microstructure in Schumann's Träumerei", J. Acoust. Soc. Am., 92(5): 2546-2568, 1992.

[6] Sundberg, J., Friberg, A., and Frydén, L., "Rules for automated performance of ensemble music", Contemporary Music Review, 3: 89-109, 1989.

[7] McAdams, S., Beauchamp, J. W., and Meneguzzi, S., "Discrimination of musical instrument sounds resynthesized with simplified spectrotemporal parameters", J. Acoust. Soc. Am., 105(2): 882-897, 1999.

[8] Susini, P., McAdams, S., and Winsberg, S., "A Multidimensional Technique for Sound Quality Assessment", Acta Acustica, 85: 650-656, 1999.