Page  00000001 Stochastic Estimation of Bsf Tuukka Ilomiki* and Yki Kortesniemit *Sibelius Academy tuukka.ilomaki@siba.fi tHelsinki University of Technology yki.kortesniemi@hut.fi Abstract Similarity plays an important role in providing unity for music, so composers and music analysts alike can benefit from similarity measures. This paper discusses the properties of bsf a similarity measure for twelve-tone rows, estimates stochastically its behaviour, and devises an efficient computation for it and other comparable measures. 1 Introduction A piece of music needs to establish a balance between unity and diversity. Diversity is created by having passages with different identities, and unity is created by the listener being able to recognize relations between the passages. (Mead 1985) Repetition of the same or similar musical material is one means to achieve unity. Consequently, a composer needs means to control the similarity and diversity of his musical material. This need is emphasized in atonal music due to the lack of coherence afforded by tonal relationships. Indeed, Schoenberg's major aim in creating the twelve-tone method was to maintain unity in atonal pieces. Equally, a music analyst also needs to evaluate the similarity between musical objects. The analyst can rely on intuition, but similarity measures provide a formal tool for the task. There are several established similarity measures for sets of pitchclasses (henceforth: pcs), but for twelve-tone rows only a little research exists. Lewin (1976) has suggested a promising similarity measure for rows which he terms badness of serial fit (henceforth: bsf). Unfortunately, the properties of bsf have not been properly studied. In this paper we discuss the properties and overall behaviour of bsf, and present an efficient data representation for its computation.1 After finishing this paper we discovered an even faster algorithm (three orders of magnitude) for computing bsf (Pruesse and Ruskey, 1994). However, it cannot be used for any of the comparable measures described in the paper; for those our results still apply. 2 Similarity measures Traditionally, similarity has been described as a distance between the two objects (e.g. two rows) being compared, so similarity measures are, in fact, dissimilarity measures. In addition, a wellbehaving distance satisfies the requirements of a metric, which is a mathematical formalization of our intuitive notion of the concept of distance. Formally, a metric on a set S is a mapping from SxS to the real numbers satisfying the following three requirements for all x, y, z (Armstrong 1983): (i) d(x,y) > 0 with equality iff x - y, (ii) d(x,y)= d(y,x), and (iii) d(x,z) < d(x,y) + d(y,z). The third requirement is known as the triangle inequality. It is the core of a metric and formalizes the concept of nearness: if x is near to y and y is near to z, then x must be relatively near to z, too. Or to take another view, the shortest path from x to z cannot be longer than the path from x to z via y. In a metric, the triangle inequality holds in all cases. In contrast, for a measure that systematically violates the triangle inequality, it must always hold at least in 66.7% of the cases, because for distances a, b, c, if a + b < c then certainly a + c > b and b + c > a. Further, in a measure, where the distances are selected at random, the triangle inequality holds in 83.3% of the cases. From another point of view, 66.7% corresponds to 0% of triangles that fulfil the inequality in all three positions, 83.3% corresponds to 50%, and 100%, naturally, corresponds to 100%. Now, bsf and logarithmic bsf raise two questions: do they satisfy the requirements of a metric and is the distribution of values sensible? For instance, a measure that places all rows at an equal distance from other rows is a valid metric, but, obviously, has no value in evaluating similarity. 3 Bsf We can think of a twelve-tone row as defining the mutual order of every pair of pcs: the first pc precedes 11 pcs, the second pc precedes 10 pcs, etc. Therefore, the row can be described as 11 + 10 +... + 1 = 66 ordered pairs of pcs. Proceedings ICMC 2004

Page  00000002 In mathematical terms, a twelve-tone row defines a linear order on the set of pitch-classes {0, 1,..., 11}. An intersection of two linear orders is a partial order containing all the ordered pairs shared by the two linear orders, so the order of all pairs is not necessarily defined.2 For example, the two rows in Figure 1 are the same except pcs 6 and 9 have switched places. Thus, in their intersection, the order of pcs 6 and 9 is not defined. A linear extension of a partial order is a linear order that contains the partial order as a subset. o ", o t,,, bo o Figure 1. Two rows from Alban Berg's Lyric Suite. Babbitt (1962) was the first to suggest evaluating the similarity of rows based on the number of shared ordered pairs of pcs. Rothgeb (1976) then defined a similarity measure that counts the number of ordered pairs not shared by the two rows. Bsf, for its part, describes the degree of similarity of the rows in terms of linear extensions. Consequently, calculating bsf requires two steps: First, the protocol, i.e., the pairs of the partial order defined by the intersection of two rows, is created. Then, the number of rows satisfying this protocol is counted. In general, the smaller the protocol is, the more there are rows satisfying it. However, the relationship between an intersection and the number of its linear extensions is very complex.3 For example, if (0,1) is the only (non-reflexive) ordered pair of the intersection, half of the rows qualify; therefore it can be concluded that the two rows are very dissimilar. On the other hand, if the order of every other pair is the same except the order of 0 and 1, only two rows qualify, and they are very similar. So, in Figure 1, the order of the following 11 pairs of pcs has been changed: (9,7), (9,2), (9,8), (9,1), (9,3), (9,6), (7,6), (2,6), (8,6), (1,6) and (3,6), and the bsf of these two rows is 42. The values in bsf range from 1 to 479,001,600. The minimum value is one, as a row is the only linear extension of itself. Also, as a row and its retrograde have no common ordered pairs of pcs, their intersection has 12! = 479,001,600 linear extensions. Unfortunately, bsf does not define a metric. Obviously, the first requirement is not satisfied, as 2 The intersection always contains all reflexive pairs (i.e. pairs of the form (x,x)), but throughout this discussion they can be ignored. 3 Indeed, it has been proved to be a #P-complete problem (Brightwell and Winkler 1991). the minimum value is 1. The second requirement, however, holds trivially. Finally, the triangle inequality does not hold. Consider the linear orders on the set {0,1,2}. Table 1 shows three pairs of linear orders, their intersections and bsf. As bsf(012,021) + bsf(021,210) = 5 < 6 = bsf(012,210) shows, the triangle inequality does not hold for these three linear orders. Table 1: Example of bsf on the set {0, 1, 2} Linear orders 012 021 021 210 012 210 Intersection Bsf 01, 02 2 21 3 6 Lewin also suggested using logarithmic scale in bsf (without stating explicit reasons). We wanted to find out whether the logarithmic variant would define a metric. Nicely, In 1 = 0 so the first requirement of metric is satisfied, and the second, again, holds trivially. The scale is easily calculated to extend to In 479,001,600 - 20. The third requirement is not so easy to evaluate: bsf of the rows in Figure 1 can be calculated by hand, but for the general case, a computer application is needed. Further, from a compositional point of view, also the set of rows satisfying the protocol is of interest: they provide raw material similar to both rows. It should be noted that the idea behind bsf can be extended to any property of rows, e.g. ordered triplets of pcs or the harmonies formed by consecutive pcs. In this paper, we use bsf as our example. 4 Baseline implementation A twelve-tone row is usually represented as a string of pcs, which is naturally represented as an array of integers on a computer. With such a representation, the first step of bsf, creating a protocol, is performed by taking all pairs of one row, and, for each pair, checking whether it is found in the other row. The matching ones are then added to the list containing the pairs of the protocol. The second step is then generating all rows, and checking for each, if it contains the ordered pairs of the protocol. With this approach, calculating the bsf of two rows using straightforward C++ code took about 140 seconds on a PowerMacintosh G4 at 867MHz. Evaluating bsf of several pairs of rows simultaneously reduced the time to approximately 120 seconds per bsf of two rows, as the cost of generating all rows is shared amongst the protocols. Clearly, a brute force approach would be too slow to answer our questions from Section 2: Testing all (12!)3 - 1026 possible triangles would last longer than 240 billion years even if every person Proceedings ICMC 2004

Page  00000003 on earth had one of the aforementioned machines participating in the calculation. Even eliminating all symmetric cases would not bring us even close to the realm of possibility. 5 Optimised implementation We used three methods to speed up the process. First, to get an idea of the behaviour of bsf, we do not need exact results - a stochastic method can be used to get an approximation. Secondly, the calculation can be distributed to a number of machines, and thirdly, we can optimise the implementation. We can turn the calculation into a Monte Carlo survey by calculating bsf of randomly chosen rows. To get high quality random rows (necessary for a reliable survey), we used an algorithm that creates random permutations provided we have random numbers (Reingold, Nievergelt, and Deo, 1977). To get the required high quality random numbers we used the Mersenne Twister (Matsumoto and Nishimura 1983), which is particularly suited for stochastic surveys. A large number of random rows provides a suitable approximation for the metric question and the distribution. In this case, a set of some 20 million triangles gives us high enough confidence. Still, the calculation of these 20 million triangles would take our computer 228 years. Secondly, distributing the task provided a further boost - we used some 200 machines with an effective speed of 51 times our computer. Hence, the time required was dropped to 4 years. The one thing we had to make sure was that the machines were not generating the same random permutations thus skewing the results. Fortunately, the Mersenne Twister has a particularly long period, which means that if it is initiated with different values, it can be used on several computers without getting same sequence of random numbers. The third improvement we made was the optimised implementation. In the baseline implementation, most time is spent matching rows to protocols - traversing the array to find out if a pair in the protocol is in the row. So, the goal was to make this comparison faster, even at the cost of making some other operations more expensive. The key point was that we devised a much more efficient method to represent rows and protocols. There are 12*11 = 132 possible ordered pairs of distinct pcs. A linear order contains exactly 66 of them. However, linear orders are trichotomous, that is, for each pair of distinct pcs x and y, either x < y or y < x. Therefore 66 bits are enough to encode the pairs in a linear order as shown in Figure 2. I 0<1 0<21... 0<11 1<2 1<31...19<11 10<11 Figure 2. The 66 bits representing a row. Partial orders are not trichotomous, as the order of every pair of pcs is not defined. Therefore, an intersection of two rows requires all the 132 bits (= two 66-bit vectors). The bit-vector representation of an intersection is created as follows: if rl and r2 are two rows as bit-vectors, the first 66 bits of the intersection (i[0]) are computed using the bit operation rl && r2, and the last 66 bits (i[1]) are computed as rl || r2. In other words, a bit 1 in the first half of the intersection means that both rows share this property, and a bit 0 in the latter half means that both rows share this property. Given a row as a bit-vector r, it is straightforward to test whether it is a linear extension of a given intersection i[]. Bit-vector r must have all the bits set that are set in the first half of the intersection (i[0]), and all the bits unset that are unset in the latter half (i[1]). Therefore we are testing that ((r && i[0]) == i[0]) && ((r || i[l]) == i[1]). Creating a bit-vector representation of all rows is costly, but now the comparisons become very cheap. And, if there are multiple intersections to evaluate, the same bit-vectors can be used for all of them. For a single bsf, the baseline implementation performs better, but the optimised solution shows its strength as the number of cases rises (Figure 3). In our case, the time per triangle was reduced by factor of 50 to less than 3 seconds per intersection and it was now possible to complete the calculation in roughly one month. Baseline Optimised ------------ 1 2 3 4 5 6 7 8 Figure 3. Running times of the implementations as a function of number of intersections to evaluate. 6 Results and Analysis In total, we evaluated 26 million triplets and saved 40 million distances for the distributions. The triangle inequality holds in approximately 71.39 + 0.02% (confidence of 95%) of the cases, which corresponds to 14% of triangles fulfilling Proceedings ICMC 2004

Page  00000004 requirement 3. We can conclude that bsf violates the triangle inequality rather systematically. The survey further suggests that the triangle inequality does hold for logarithmic bsf, as it proved correct for all tested cases. In the absence of a formal proof, we cannot be sure that it always holds, but the likeliness is very high, indeed. From these results we can conclude that the logarithmic bsf is a well-behaving distance, and hence, a candidate for a useful similarity measure. bsf Logarithmic bsf Figure 4. Distribution of distances of bsf and logarithmic bsf. Bsf and logarithmic bsf have very different distributions of values. As Figure 4 shows, the distribution of logarithmic values creates a centred belllike curve: only a very small number of rows are considered close, which concurs with the intuition of there being only a limited number of similar rows. Also, the majority of the rows are not extremely distant, either, but more or less indifferent with only a few truly dissimilar rows. In contrast, in the regular bsf, 99.8% of the values are in the lowest percent of the scale. 0 50 100 150 200 250 300 350 400 450 Figure 5. Time per triangle for different set sizes (in seconds). The implementation also presented a few interesting findings. First, there is a maximum size for the set of triangles being evaluated simultaneously beyond which the time per triangle goes up (Figure 5). In our case, the maximum size was determined by the size of the LI cache of the processor, which could fit almost 300 triangles. Apparently, the limiting factor is how fast we can feed new triangles to the processor - and the L1 cache is naturally significantly faster than the L2 cache. The behaviour remained the same with other processors having different size L1 caches - then the optimal set size was correspondingly affected, and a bigger cache provided better performance. Currently, counting bsf for just one pair is not cheap, because the creation of bit-vectors of all the rows is time consuming. However, a computer with sufficient memory (about 4 GB) could store all rows in memory and significantly speed up the calculations. As to future work, a mathematical proof that the triangle inequality holds for the logarithmic variant would be a natural extension of this work. A completely different direction would be to study, the usefulness of the similarity values: how well can they be used to analyse compositions and how much assistance they can provide to composers. 7 Conclusions Logarithmic bsf seems to be a viable tool to evaluate similarity of twelve-tone rows. It defines a metric and it has a distribution, where most rows are neither similar nor dissimilar. Representation of musical data as a bit-vector improved the performance significantly. The same approach should prove to be successful in other computation of musical data. The source code (in C++) is available at http://www.iki.fi/tuukka.ilomaki/icmc2004. References Armstrong, M. (1983). Basic Topology. New York: Springer-Verlag. Babbitt, M. (1962). "Twelve-Tone Rhythmic Structure and the Electronic Medium." Perspectives ofNew Music 1(1), 49-79. Brightwell, G. and P. Winkler. (1991). "Counting Linear Extensions." Order 8(3), 225-242. Lewin, D. (1976). "On Partial Ordering." Perspectives of New Music 14(2)/15(1), 252-257. Mead, A. (1985). "Large-Scale Strategy in Arnold Schoenberg's Twelve-Tone Music." Perspectives of New Music 24(1), 120-157. Matsumoto, M. and T. Nishimura. (1983). "Mersenne Twister: A 623-Dimensionally Equidistributed Uniform Pseudo-Random Number Generator." ACM Transactions on Modeling and Computer Simulation 8(1), 3-30. Pruesse, G. and F. Ruskey. (1994). "Generating Linear Extensions Fast." SIAM Journal on Computing 23(2), 373-386. Reingold, E., J. Nievergelt, and N. Deo. (1977). Combinatorial Algorithms: Theory and Practice. Englewood Cliffs: Prentice-Hall. Rothgeb, J. (1976). "Some Ordering Relationships in the Twelve-Tone System." Journal of Music Theory 11(2), 176-197. Proceedings ICMC 2004