The Mathematical Implications of a Pulse-Ribbon Perceptual Organization of Pitch

Gregory H. Wakefield
Department of Electrical Engineering and Computer Science
University of Michigan, Ann Arbor MI 48109
ghw@eecs.umich.edu

Abstract

We present an alternative many-to-one mapping of frequency space onto pitch space that is motivated by the Patterson pulse-ribbon model of pitch perception. It provides a logical accounting of the pitch salience of single sinusoids, the dominance region, and the perceptual decomposition of harmonic sources into their tone chroma and tone height, phenomena for which standard harmonic-based analysis methods must resort to ad hoc assumptions. We develop the mathematical structure of this chroma analysis and discuss its extension to problems of source parsing and pitch-strength scaling.

1 Introduction

Visual representations of acoustic signals are often used to draw inferences about the composition of the signal. Yet, just because one "sees" in the representation what one believes is heard in the signal does not mean that algorithms for automatic pattern recognition are necessarily easier to formulate in the visual domain than in the original one-dimensional acoustic domain. The present paper considers the mathematical attributes of one such visual representation that has been proposed for extracting pitch information from the acoustic signal and contrasts these with the attributes of more conventional methods. We will argue that, in this case, "seeing" does suggest a mathematically useful approach to pitch determination.

2 The Auditory Image Model and the Pulse-Ribbon Transformation

A common goal in computational audition is to provide an algorithmic account of the transduction, transmission, and processing of acoustic signals by the human auditory system. Insight into the effects of such mappings of acoustic signals is often gained by visualizing the intermediate stages of processing through computational auditory models.
Patterson's Auditory Image Model (AIM) is one such instantiation of a computational auditory model, which focuses primarily on the transformation of acoustic energy into a stabilized auditory image (Patterson et al., 1995). This stabilized auditory image (SAI) has arguments of frequency (or equivalent rectangular bandwidth center frequency), integration interval (a temporal delay term that functions something like an autocorrelation lag), and absolute time. A number of interesting features of auditory source separation can be found by viewing the SAI movie as features on the frequency-integration interval surface evolve over time.

Patterson has proposed a further transformation of the SAI to provide a primitive representation of pitch (Patterson, 1986). This transformation wraps the SAI along a $\log_2 t$ spiral so that a ray from the origin intersects integration times which are related by powers of 2, as suggested by the figure below:

[Figure: the $\log_2 t$ spiral]

Patterson has suggested that the perception of pitch is determined by features of this pulse-ribbon pattern as organized along the intersection of rays with the spiral. The mathematical consequences of this pulse-ribbon model of pitch perception are the focus of the present paper.

1. Work supported by the MusEn Project with funds provided by the Office of the President of the University of Michigan and by a grant from the Army Research Office, Ft. Meade, Md.

Wakefield, ICMC Proceedings 1996

3 Mathematical Properties of the Pulse-Ribbon Model

3.1 Chroma Equivalence Classes

It turns out to be easier to develop the mathematical properties of the pulse-ribbon model by considering the frequency dual of the $\log_2 t$ spiral representation. In this case, the spiral is indexed by frequency rather than by integration time, so that frequencies along the same ray are related by powers of 2. Let $C_\alpha$ denote the equivalence class of frequencies

$$C_\alpha = \{\, f \mid f = 2^{k} 2^{\alpha},\ k = 1, 2, \ldots,\ \alpha \in [0, 1) \,\} \qquad (1)$$

Then frequencies that share a common ray of the frequency dual of the pulse-ribbon representation belong to the same equivalence class.

The equivalence class $C_\alpha$ arises frequently in musical contexts. The index $\alpha$ (or some monotonic transformation thereof) is called the chroma of a sinusoid at frequency $f \in C_\alpha$, and the specific $k$ for that frequency (which determines the octave) is called its (tone) height.

Two important mathematical properties of $C$, the set of $C_\alpha$ as $\alpha$ is varied from 0 to 1, are that $C$ is a complete covering of frequency and that the sets of $C$ are mutually disjoint. The first property means that any frequency has a chroma value; the second means that this value is unique, i.e., no two chroma equivalence classes share common members.

3.2 Harmonic Equivalence Classes

The partials of many musical instruments can be modeled, to a first approximation, by a harmonic series. Thus, we could say that a musical instrument playing at a given fundamental $\lambda$ defines its own equivalence class of frequencies $H_\lambda$:

$$H_\lambda = \{\, f \mid f = k\lambda,\ k = 1, 2, \ldots,\ \lambda > 0 \,\} \qquad (2)$$

Two important mathematical properties of $H$, the set of $H_\lambda$ as $\lambda$ ranges over all frequencies, are that $H$ is a complete covering of frequency but that the sets of $H$ are not mutually disjoint. The first property means that any frequency has a fundamental; the second means that this fundamental is not unique, i.e., two different $H_\lambda$ can share common frequencies.

The second property of harmonic equivalence classes poses serious problems for standard pitch determination algorithms (PDAs). Since these algorithms are organized around harmonic equivalence classes that have frequencies in common, different pitch hypotheses are not conditionally independent. This not only complicates the statistical analysis of such algorithms but may also compromise their performance.
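
The contrast between Sections 3.1 and 3.2 can be illustrated with a minimal sketch. The function names `chroma_and_height` and `common_harmonics` are hypothetical and not part of the original work. The sketch also assumes that $k$ may be any integer, so that every positive frequency receives a chroma value; Equation (1) as printed restricts $k$ to positive integers.

```python
import math
from fractions import Fraction


def chroma_and_height(f):
    """Split f > 0 into (alpha, k) with f = 2**k * 2**alpha and alpha in [0, 1).

    Assumption: k ranges over all integers, not only k = 1, 2, ...
    """
    if f <= 0:
        raise ValueError("frequency must be positive")
    log2f = math.log2(f)
    k = math.floor(log2f)   # tone height (octave index)
    alpha = log2f - k       # chroma, the fractional part of log2(f)
    return alpha, k


def common_harmonics(lam1, lam2, n_max):
    """Frequencies shared by the first n_max members of H_lam1 and H_lam2.

    Fractions keep the products k * lambda exact, so the set intersection
    does not depend on floating-point rounding.
    """
    h1 = {k * Fraction(lam1) for k in range(1, n_max + 1)}
    h2 = {k * Fraction(lam2) for k in range(1, n_max + 1)}
    return sorted(h1 & h2)


if __name__ == "__main__":
    # 440 Hz and 880 Hz share one chroma and differ by one in height.
    print(chroma_and_height(440.0), chroma_and_height(880.0))
    # Harmonic classes with fundamentals 100 and 150 overlap.
    print(common_harmonics(100, 150, 6))
```

Each frequency yields exactly one `(alpha, k)` pair, which mirrors the disjointness of the chroma classes, whereas two distinct fundamentals can return a non-empty list of shared partials.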
3.3 Chroma Representations of Harmonic Series: Perceptual Correlates

The mapping of the frequencies of a harmonic series into its chroma classes for a fundamental $f_0$ results in the following octave sets being mapped into individual chroma classes:

$$\{f_0, 2f_0, 4f_0, 8f_0, 16f_0, \ldots\} \quad \{3f_0, 6f_0, 12f_0, \ldots\}$$

This mapping reveals that the chroma class associated with the fundamental of the harmonic series is heavily weighted by the low-order partials of that series, and that this relative weighting diminishes as one considers the chroma classes built around the 3rd, 5th, 7th, etc. partials. This set-membership property supports the observation that the dominance region for residue pitch lies between the 3rd and 8th partials, since these partials enter into the first four equivalence classes of a chroma representation of a harmonic series. It is also suggestive of the long-standing, but poorly understood, role of odd and even harmonics in timbre: the chroma class associated with the fundamental provides a dense sampling of the even harmonics below 10, while the next four classes in the expansion pick up the remaining odd harmonics. Finally, this representation explains the salience of the pitch of a sinusoid: since it maps into exactly one chroma class, there should be no ambiguity as to its pitch (except for octave errors, which clearly point to the need for a second process that determines tone height (e.g., Patterson, 1990)). In contrast, harmonic-based pitch representations must invoke an entirely different set of rules to handle this simplest and least ambiguous case.

3.4 The Chroma Spectrum and PDAs

The pulse-ribbon model suggests that PDAs operate on an auditory form of the short-time power spectrum of the signal after it has been collapsed into a measure of the strength of each chroma equivalence class.
We refer to this measure as the chroma spectrum

$$S_C(\alpha) = M[\, S(f),\ \forall f \in C_\alpha \,] \qquad (4)$$

where $S(\cdot)$ is intended to be some auditory transformation of the short-time power spectrum of the signal and $M[\cdot]$ is some measure function that aggregates the multiple values into a single number. The actual specification of $S(\cdot)$ and $M[\cdot]$ turns out to be less important than the mathematical result that multiple-hypothesis tests of chroma involve disjoint data and are, therefore, conditionally independent. In studies of PDAs, we have observed that chroma pre-processing reduces sensitivity to the presence of additional pitch sources and background noises, and that it degrades gracefully in ways that match human judgment in cases of ambiguous pitch. We attribute both observations to this conditional independence.

References

[Patterson, 1986] Patterson, R.D. (1986). "Spiral detection of periodicity and the spiral form of musical scales," Psychology of Music 14, 44-61.

[Patterson, 1990] Patterson, R.D. (1990). "The tone height of multi-harmonic sounds," Music Perception 8, 203-214.

[Patterson et al., 1995] Patterson, R.D., Allerhand, M., and Giguere, C. (1995). "Time-domain modelling of peripheral auditory processing: A modular architecture and a software platform," J. Acoust. Soc. Am. 98, 1890-1894.
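
As an illustration of Sections 3.3 and 3.4, the following sketch groups the partials of a harmonic series by chroma class and collapses a power spectrum into a chroma spectrum in the spirit of Equation (4). It rests on several assumptions that are not in the original: the function names are hypothetical, the continuous chroma index $\alpha$ is quantized into `n_bins` disjoint bins, $S(\cdot)$ is taken to be a plain list of power values rather than an auditory transformation, and $M[\cdot]$ defaults to a simple sum.

```python
import math
from collections import defaultdict


def chroma_class_of_partial(n):
    """Return the odd part of n; partials n and 2n share a chroma class."""
    while n % 2 == 0:
        n //= 2
    return n


def group_partials(n_partials):
    """Group partial numbers 1..n_partials by the odd partial heading their class."""
    groups = defaultdict(list)
    for n in range(1, n_partials + 1):
        groups[chroma_class_of_partial(n)].append(n)
    return dict(groups)


def chroma_spectrum(freqs, powers, n_bins=12, measure=sum):
    """Collapse (frequency, power) pairs into n_bins disjoint chroma bins.

    Assumptions: alpha is the fractional part of log2(f), the bins quantize
    alpha uniformly, and `measure` stands in for M[.] in Equation (4).
    """
    bins = [[] for _ in range(n_bins)]
    for f, p in zip(freqs, powers):
        if f <= 0:
            continue  # chroma is undefined for non-positive frequencies
        alpha = math.log2(f) % 1.0
        idx = int(alpha * n_bins) % n_bins  # guard against alpha rounding to 1.0
        bins[idx].append(p)
    return [measure(b) if b else 0.0 for b in bins]


if __name__ == "__main__":
    print(group_partials(16))
    print(chroma_spectrum([110.0, 220.0, 440.0, 330.0], [1.0, 1.0, 1.0, 2.0]))
```

Because every spectral value is assigned to exactly one bin, the bins never share data, which is the disjointness property that the conditional-independence argument relies on.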