# Temporal-Gestalt Segmentation - Extensions for Compound Monophonic and Simple Polyphonic Musical Contexts: Applications to Works by Boulez, Cage, Xenakis, and Ligeti

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License. Please contact mpub-help@umich.edu to use this work in a way not covered by the license.

Yayoi Uno and Roland Hübscher, College of Music & Department of Computer Science, University of Colorado, Boulder CO 80309-0301. Email: uno@spot.colorado.edu; rolandh@cs.colorado.edu

Abstract: This paper presents extensions of James Tenney and Larry Polansky's temporal-gestalt segmentation algorithm and applications to selected works by Pierre Boulez, John Cage, Iannis Xenakis, and György Ligeti. The algorithm is based on the Gestalt principle of cohesion and segregation; temporal-gestalt (TG) boundaries are formed on the basis of the disjunction measure (DM) of musical events, the weighted sum of intervallic distances between a given musical event and the preceding one with respect to the musical parameters involved. TG boundaries are formed hierarchically from the lowest to successively higher levels of musical structure. In our extension, we postulate two categories of polyphony, compound monophony and simple polyphony, after Tenney.

## I. Introduction

Recent studies offer several different analytical approaches to the segmentation of post-tonal musical repertory: Jean-Jacques Nattiez's semiological approach, Lerdahl and Jackendoff's generative theory and its extension to atonal contexts, and most recently, Taavola and Lefkowitz's system for determining segmentation based on the ratios of discontinuities with respect to the musical parameters involved. Our approach expands upon a hierarchical model developed by James Tenney and Larry Polansky in the late '70s. Their segmentation algorithm explores the effect of the relative weights of musical parameters on segmentation at various hierarchical levels of musical structure.
While the application of the original algorithm was limited to monophonic music, our research provides preliminary extensions for application to two types of polyphony in post-tonal music.

Our objective in this research is twofold: 1) to examine the relative salience of all musical parameters in determining the segmentation of post-tonal music; 2) to define the "fit" between syntax and design, that is, the correspondence between the structural boundaries determined by the generative syntax and the formal boundaries articulated by the segmentation algorithm. First, salience refers to the relative strength of a given musical parameter in effecting a break or disjunction in the continuity of musical events. It is important to consider not just the roles of pitch and duration in isolation, but the roles of all other musical parameters in this regard. Second, the syntax that generates a piece, for instance, twelve-tone rows, represents the underlying grammatical organization, while the segmentation algorithm models the change in nuance or rhetorical emphasis within the musical continuity of a given piece. As in speech, knowledge of grammar is essential to comprehending the underlying structure, but rhetorical facets such as inflection, nuance, and vocal quality determine the emotive character; in music, the analogous facets also shape the internal grouping of musical events within a piece. The segmentation algorithm provides a tool that measures the extent to which the rhetorical or gestural design supports the syntactical structure of a given piece.

## II. Fundamental Tenets

Tenney and Polansky's approach to segmentation is based on the Gestalt principles of proximity and similarity. Their fundamental hypothesis is that cohesion and segregation are measured by the interval magnitude of the musical parameters involved.
Wider intervallic distances in musical parameters between adjacent time-spans cause a disjunction in the musical continuity, while narrower intervallic distances cause the adjacent time-spans to cohere as a temporal-gestalt (TG) unit. Tenney and Polansky define a TG unit as a musical event that is internally cohesive and externally segregated from comparable time-spans immediately preceding and following it.

First, the input values of the various musical parameters (pitch, duration, dynamics, timbre, and temporal density) are stored as a record in a linked list, as shown in Fig. 1a. The values of each parameter are encoded as real numbers, then scaled to fit the common range between 0 and 1. Each TG at the root level, the lowest level of the hierarchy, therefore contains a set of real numbers that represents the values of the participating musical parameters.

[Fig. 1a: record for an element (E) in the linked list, with fields duration, pitch, dynamics, td (temporal density), timbre, DM, leftChild, rightChild, and nextElement.]

Second, the disjunction measure (DM) of each note (index n) is computed by taking the weighted sum of intervallic distances to the previous note (index n-1) with respect to the number (k) of musical parameters, P, involved, as follows:

$$DM[n] = \sum_{i=1}^{k} \left(P_i[n] - P_i[n-1]\right) \cdot W_i$$

ICMC Proceedings 1994, Music Works
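The DM computation above can be illustrated with a minimal Python sketch. It is not the authors' program: the function names `scale_unit` and `disjunction_measures` are hypothetical, min-max scaling is assumed as the way values are brought into the range 0 to 1, each "intervallic distance" is read as an absolute difference, and the first note (which has no predecessor) is assigned a DM of 0.

```python
from typing import List, Sequence


def scale_unit(values: Sequence[float]) -> List[float]:
    """Min-max scale raw parameter values into [0, 1] (assumed scaling rule)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant parameter: no spread to scale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def disjunction_measures(events, weights) -> List[float]:
    """Return DM[n] for every note; events[n][i] is parameter i of note n."""
    if not events:
        return []
    dms = [0.0]  # assumption: the first note has no predecessor, so DM = 0
    for prev, cur in zip(events, events[1:]):
        # Weighted sum of distances; abs() is our reading of "distance".
        dms.append(sum(w * abs(c - p) for p, c, w in zip(prev, cur, weights)))
    return dms


# Hypothetical three-note example with two parameters (pitch, dynamics).
pitch = scale_unit([60, 72, 66])   # illustrative pitch numbers
dynamics = [0.5, 0.5, 1.0]         # already on the 0..1 scale
events = list(zip(pitch, dynamics))
dms = disjunction_measures(events, weights=[1.0, 0.5])
print(dms)
```

In this toy input the leap between the first two notes yields the largest DM, so with pitch weighted most heavily that is where a disjunction would be most likely to register.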

The weights for the musical parameters are external variables that the analyst can adjust every time the program is run. The analyst calibrates the relative weights for the musical parameters (scaled between 0 and 1) heuristically until the optimum TG segmentation is obtained, that is, the formal divisions that approximate those based on one's hearing of the piece.

Finally, the TG boundaries are determined at successively higher levels by the following procedure. Given $TG_e(x, y)$, where $e$ is the hierarchical level and the time-span of the TG ranges between $x$ and $y$, the TG element at the root level is defined as $TG_1[i] = n[i]$, where $n[i]$ is the note at index $i$. The higher levels are then defined recursively as $TG_{e+1}[i \ldots k] = TG_e[i]\, TG_e[i+1] \ldots TG_e[k]$. The procedure can also be illustrated graphically, as shown in Fig. 1b. Here, the first disjunction at the root level occurs at x. The hierarchical levels above root are called clang, sequence, segment, and section, after Tenney. The program terminates when fewer than three TGs are formed at the top of the hierarchy.

[Fig. 1b: hierarchical grouping of root-level elements (1 to 7) into higher-level TGs such as the sequence; the disjunction at x is marked as a peak relative to x-1 and x+1.]

## III. Extensions for Compound Monophony and Simple Polyphony

In designing the extensions of the algorithm for polyphonic musical contexts, we postulate two categories of polyphony, compound monophony and simple polyphony, after Tenney [Meta + Hodos, 102]. Compound monophony refers to music with multiple parts or voices that are, nonetheless, perceived as one composite musical gesture due to the lack of contrapuntal, registral, or timbral differentiations among them. For instance, Boulez's Structures Ia comprises two piano parts, but the parts cannot be distinguished polyphonically due to the pointillistic, athematic character of the music and the constant registral intersection between the parts. Cage's Music of Changes (Book I) and Xenakis's Herma also fall into this category.
Since musical gestures in compound monophony are heard as a composite unit, we introduce a procedure for averaging and compressing simultaneities into a single musical TG at the root level. As the "before" and "after" conditions in Fig. 2b illustrate, the parametric values of notes that fall on the same time-points (thereby forming a simultaneity) are averaged and compressed into a single strand of TGs. Wherever a simultaneity is followed by another, the DM is computed as the weighted sum of the DMs between all members of the two simultaneities, as shown in Fig. 2a.

Fig. 2a:

$$DM[sim_1, sim_2] = \sum_{i=1}^{l} \sum_{j=1}^{k} DM(m_i, n_j) \cdot w_i \cdot v_j$$

[Fig. 2b: the notes before and after compression, plotted against time-points; notes that share a time-point are merged into a single strand of TGs.]

In averaging the parametric values of a simultaneity, it is assumed that the outer pitches should be weighted more heavily than the inner pitches of the simultaneity; the perceptual salience of the outer notes, of course, varies according to the range, spacing, and timbre of the instrument. At this preliminary level, the outer notes are weighted more strongly based on the bandwidth of the simultaneity for pitch; that is, according to where the outer notes lie within the regions defined by the standard deviation and the mean for the pitch range of a given piece. As shown in Fig. 2c, the weighting of the outer notes, k, varies according to regions, R, delimited by the distance of the standard deviation around the mean: the further the range of the simultaneity lies from the mean, the more strongly the outer notes are weighted in relation to the inner notes, and vice versa.

[Fig. 2c: regions R around the mean m, delimited at m ± sd, m ± (sd*2), and m ± (sd*3), where sd = standard deviation and k = weight of the outer notes.]

The second category, called simple polyphony, refers to music with multiple voices or parts that can be perceived independently of one another due to contrapuntal, registral, and/or timbral differentiations among them.
Here the algorithmic extension is applied to the first piece from György Ligeti's Ten Pieces for Wind Quintet. While the piece is characterized by chromatic voice-leading within narrow, intersecting pitch ranges, the timbral differences among the instruments, e.g., flute, English horn, clarinet, and horn, lend polyphonic distinction to the five parts.
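For the compound-monophony case described earlier in this section, the compression of simultaneities and the pairwise DM of Fig. 2a can be sketched as follows. This is only an illustration under stated assumptions: the function names, the specific region weights `(1.0, 1.5, 2.0)`, the use of the population standard deviation, and the convention that pitch is parameter 0 are all hypothetical, since the paper does not give the actual values of k or the data layout.

```python
import statistics
from collections import defaultdict


def outer_weight(pitches, mean, sd, region_weights=(1.0, 1.5, 2.0)):
    """Weight k for the outer notes, chosen by region (assumed weight values)."""
    if sd == 0:
        return region_weights[0]
    # Distance of the farthest outer note from the mean, in sd units.
    dist = max(abs(min(pitches) - mean), abs(max(pitches) - mean)) / sd
    region = min(int(dist), len(region_weights) - 1)
    return region_weights[region]


def compress_simultaneities(notes, region_weights=(1.0, 1.5, 2.0)):
    """notes: (onset, params) pairs, params[0] = pitch. Returns one strand."""
    all_pitches = [params[0] for _, params in notes]
    mean = statistics.mean(all_pitches)
    sd = statistics.pstdev(all_pitches)  # population sd: an assumption
    by_onset = defaultdict(list)
    for onset, params in notes:
        by_onset[onset].append(params)
    strand = []
    for onset in sorted(by_onset):
        group = by_onset[onset]
        pitches = [g[0] for g in group]
        k = outer_weight(pitches, mean, sd, region_weights)
        lo, hi = min(pitches), max(pitches)
        # Outer (lowest and highest) notes get weight k, inner notes 1.
        ws = [k if g[0] in (lo, hi) else 1.0 for g in group]
        total = sum(ws)
        avg = [sum(w * g[i] for w, g in zip(ws, group)) / total
               for i in range(len(group[0]))]
        strand.append((onset, avg))
    return strand


def dm_between_simultaneities(sim1, sim2, param_weights, w, v):
    """Fig. 2a: sum over all pairs of DM(m_i, n_j) * w_i * v_j."""
    def dm(a, b):
        return sum(pw * abs(x - y) for x, y, pw in zip(a, b, param_weights))
    return sum(dm(m, n) * wi * vj
               for m, wi in zip(sim1, w) for n, vj in zip(sim2, v))


# Hypothetical input: a three-note chord at onset 0, a single note at onset 1.
notes = [(0, [0.0, 0.5]), (0, [0.5, 0.5]), (0, [1.0, 0.5]), (1, [0.5, 1.0])]
strand = compress_simultaneities(notes)
pair_dm = dm_between_simultaneities([[0.0], [1.0]], [[0.5]], [1.0],
                                    w=[0.5, 0.5], v=[1.0])
```

Treating the region weights as a tunable tuple mirrors the way the parameter weights are calibrated heuristically by the analyst; other mappings from region to k would fit the description equally well.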

In the algorithmic extension, the five parts are stored as independent strands of TGs at the root level. Here, in addition to the weights, the analyst explores the level at which the strands are merged. The analyst determines this level according to the optimum TG segmentation obtained. The strength of the polyphonic independence of the five parts is modeled by the hierarchical level at which the five strands eventually become merged: the higher the level at which the strands are merged, the greater the independence of the individual parts, and vice versa. In the case of Ligeti's work, the TG strands are merged at the lowest or root level. This corroborates our intuition that the polyphonic independence of the five parts, aside from their timbral differences, is relatively weak; one can frequently hear gestural connections that cut across different voices.

## IV. Application

The relative weights for the four works are shown in Fig. 3a; the parameter with the highest weight for each work is highlighted in bold. Sim refers to the external weight placed on simultaneities wherever they occur.

[Fig. 3a: relative weights for the four works; columns: dur, durat, pitch, dyn, attack, sim.]

The differences in the relative weights of the four works parallel the relative salience of musical parameters in regulating the TG boundaries. Furthermore, Fig. 3b shows the TG segmentation of each work that results from the given weights. Here, the vertical axis of each graph displays the fluctuations in DMs at the next-to-highest TG level, while the horizontal axis depicts elapsed time in seconds. The letters or Roman numerals above the plotted lines show the extent to which the TG boundaries correspond with the structural boundaries articulated by the underlying syntax.

In Boulez's Structures Ia, the TG boundaries are articulated primarily by the weights of simultaneities (0.8) and attack-point duration (0.6).
Simultaneities create audible junctures in the course of the piece since the rhythmic alignment of rows allows them to occur only at the beginning of new row boundaries. While this work is frequently cited as an example of integral serialism in the early '50s, note that the gestural design of the piece is shaped not by the serialized parameters, e.g., pitch, duration, dynamics, and attacks, but by the changes in polyphonic density, a factor that was determined on an ad hoc basis. In fact, Boulez's serial method proved to be largely ineffective since the syntactical relationships between pitch and duration become randomized and obscured at the musical foreground. The smaller weights associated with pitch and duration show the relative insignificance of the serial structure in articulating the global design of the piece.

Fig. 3a weights: Boulez: 0.2 Cage: 0.05 Xenakis: 0.1 Ligeti: 0.1 0.6 0.4 0.9 0.1 0.8 0.15 0.5 0.2 0.3 0.1 0.01 0.01 0.02 0.9 0.8 0.3 0.1 0.5

Legend: dur = sustained duration; durat = attack-point duration; dyn = dynamics; sim = simultaneity.

[Fig. 3b: DM (vertical axis) plotted against time in seconds (horizontal axis) in four panels: Boulez, Structures Ia; Cage, Music of Changes (Book I); Xenakis, Herma; Ligeti, Ten Pieces for Wind Quintet, I. Solid arrows mark exact correspondence and dotted arrows approximate correspondence between TG boundaries and structural boundaries.]

The assigned weights for Boulez, nonetheless, produce a good fit between row and TG boundaries. Notice how the structural mid-point of the piece (VI), where the rows proceed in retrograde, coincides with the point of maximum disjunction.

In comparison, the attack-point duration plays a prominent role in regulating the segmentation of Cage's and Xenakis's works. Most importantly, the salience of duration resonates with Cage's aesthetic view of the primacy of duration in establishing a foundation for musical structure. While Cage used the I Ching to derive the musical content of this piece, the musical realization attests to a careful interplay between chance and choice. For instance, it was his choice to set the probability of sound and silence at fifty-fifty; this factor allowed for abrupt temporal discontinuities, e.g., prolonged silences between units that contribute to the peaks in DM at unit boundaries. As the graph illustrates, Cage's work shows the most abrupt changes in the fluctuation of DMs, reaching a peak at the beginning of units 2 and 3 as the temporal density increases suddenly.

Xenakis's Herma is also dependent primarily on the salience of attack-point duration; it is the temporal discontinuities, the abrupt changes between activity and silence, that punctuate the internal junctures in the piece. The pitch organization of Herma is based on the sets A, B, and C, and on the transformation of these sets by the logical operations of union, intersection, and negation (or complementation). The letters inside Xenakis's graph indicate correspondences between set boundaries and TG boundaries; the statistically homogeneous dispersion of pitches (owing to the stochastic operations) makes it otherwise impossible to distinguish where transitions occur with respect to the underlying set transformation.
Finally, Ligeti's work is distinguished from the other three works by the primacy of dynamics in articulating the TG boundaries. This relatively short piece is characterized by two contrasting textures. The first part features micropolyphonic, meandering chromaticism in the five instruments that becomes rhythmically and registrally condensed. The transition to the second part (m. 16 1/2 to the end) is characterized by sudden shifts in the dynamic and registral levels; the texture becomes homophonic as the instruments enter in groups of two or three to sustain long held notes at the registral and dynamic extremes (fff). The TG segmentation mirrors the abrupt contrast between the two sections: the first section shows a gradual decline in DMs as the micropolyphonic texture is sustained, followed by a sudden peak (at elapsed time 93.38 sec.) that parallels the abrupt textural change in the second section.

## V. Future Considerations

The segmentation algorithm presents a "neutral" level of perception, where the disjunction is based on the composite changes in intervallic magnitude of musical parameters. The TG segmentation is, however, not influenced by memory of events, knowledge of syntax, or familiarity with the piece, all those factors that vary according to the listener's background (what psychologists call a set). Further refinements need to be introduced as follows: 1) include context-sensitive criteria, e.g., pattern recognition and motivic recurrence, in regulating the segmentation; 2) allow flexibility in the disjunction criterion, so that it is not based solely on the condition that the DM must be greater than those of the previous and following TGs; 3) develop a "dynamic" as opposed to the "static" system of weights presently used, so that the weights can be altered in the course of a piece as deemed necessary; and 4) refine the criteria and procedure for measuring disjunction in simple polyphonic contexts. The last category merits special attention.
According to Hartmann and Johnson's study of stream segregation, our ears tend to shift in and out of linear and vertical gestural connections in audition. Therefore, the disjunction for simple polyphony may be more accurately modeled by weighting the DMs formed linearly within each polyphonic strand against the DMs formed vertically between polyphonic strands. This can be expressed as follows. Let $n_1, n_2$ and $m_1, m_2$ be adjacent TGs for parts n and m; $dt$ = time[$n_1$] - time[$m_2$] and time[$m_1$] - time[$n_2$]; $dp$ = the pitch difference p[$n_2$] - p[$m_2$]; $k$ = a scaling factor that changes the influence of the vertical distance; exp = the exponential function; and $T$ a parameter that changes the shape of the sigmoid from a step to a very smooth function. Then the composite DM can be measured as follows:

$$DM[(n_1, n_2)(m_1, m_2)] = DM[n_1, n_2] + DM[m_1, m_2] + \frac{k \cdot \left(DM[n_1, m_2] + DM[m_1, n_2]\right)}{1 + \exp\left((h - dt \cdot dp)/T\right)}$$

The value of k is inversely proportional to the linear degree of polyphonic independence between voices: the smaller the value of k, the more strongly the DM is weighted toward the linear DMs within each strand, and the greater the value of k, the more strongly the DM is weighted toward the vertical relationships between voices. The segmentation algorithm can be expanded, in this way, to test different criteria involved in our perception of polyphony.

References:

Hartmann, W. M. and Johnson, Douglas. "Stream Segregation and Peripheral Channeling." Music Perception 9/2 (1992): 155-184.

Lerdahl, Fred. "Atonal Prolongational Structure." Contemporary Music Review, 1987.

Nattiez, Jean-Jacques. "Varèse's 'Density 21.5': A Study of Semiological Analysis." Music Analysis 1/3 (1982): 243-340.

Taavola, Kristen and Lefkowitz, David. "Generalizing Segmentation: A Multi-Dimensional Approach / Piece-Specific Weighting System." New England Music Theory Conference, 1993.

Tenney, James and Polansky, Larry. "Temporal Gestalt Perception in Music." Journal of Music Theory 24/2 (1980): 205-242.

Uno, Yayoi. The Roles of Compositional Aim, Syntax, and Design in the Assessment of Musical Styles: Analyses of Piano Music by Boulez, Cage, Babbitt, and Xenakis Circa 1950. Ph.D. dissertation, University of Rochester, 1994.
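As an illustration of the composite DM proposed in Section V, the following Python sketch evaluates the sigmoid-gated sum for one pair of adjacent TGs in two parts. It is a reading of the formula under assumptions, not the authors' code: the function name `composite_dm` is hypothetical, `dt` and `dp` are passed in as single precomputed scalars, and `h` (which the text does not define) is treated as a threshold on the product `dt * dp`.

```python
import math


def composite_dm(dm_n, dm_m, dm_n1_m2, dm_m1_n2, dt, dp, k, h, T):
    """Linear DMs plus the vertical DMs scaled by k and a sigmoid gate.

    dm_n, dm_m   : linear DMs within parts n and m
    dm_n1_m2 etc.: vertical DMs between the two parts
    h            : assumed threshold (undefined in the source text)
    T            : sigmoid shape; small T approaches a step function
    """
    x = (h - dt * dp) / T
    # Guard against overflow in exp() for very small T or large h.
    gate = 0.0 if x > 700 else 1.0 / (1.0 + math.exp(x))
    return dm_n + dm_m + k * (dm_n1_m2 + dm_m1_n2) * gate


# Hypothetical values: dt * dp equals h, so the gate sits at its midpoint 0.5.
mid = composite_dm(0.2, 0.3, 0.4, 0.6, dt=1.0, dp=2.0, k=1.0, h=2.0, T=1.0)
# With k = 0 only the linear DMs within each strand contribute.
linear_only = composite_dm(0.2, 0.3, 0.4, 0.6, dt=1.0, dp=2.0, k=0.0, h=2.0, T=1.0)
```

Setting k to 0 reduces the measure to the two linear DMs, which matches the statement that smaller values of k weight the result toward the relationships within each strand.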