Exploring instrument constancy through a new method for the design of synthetic instruments¹

Gregory H. Wakefield and Mary Simoni
Department of Electrical Engineering and Computer Science
Center for Performing Arts and Technology, The School of Music
University of Michigan, Ann Arbor MI 48109
ghw@eecs.umich.edu  msimoni@umich.edu

Abstract

A design interface for efficiently exploring and subsequently refining a synthesized musical instrument is proposed. Results are reported that compare the performance of this design interface with others, and it is argued that the proposed technique is superior in several areas, including sensitivity and time-to-design. Using this design interface, questions of instrument constancy are explored, i.e., what acoustic features inform the listener that the same instrument is playing but in a different register or performance style. Results for changes in fundamental frequency are presented.

1 Introduction

The present paper addresses a small portion of the much larger problem of musical instrument design in digital synthesis. Synthetic instrument design requires a synthesis method, a range of parameters within the chosen synthesis method, a design method that is appropriate for the chosen parametrization, and a method for extrapolating (or generalizing) the selected values of the parameters.

1.1 Highly Interactive Design Methodologies

In contrast with the technology of a decade ago, microprocessor and dedicated digital signal processing architectures now provide sufficient computational power to realize, in real time, a perceptually rich variety of instruments under several methods of digital synthesis. Advances in processing speed afford interactive design methodologies that allow for very short feedback loops between the musician as observer and the musician as controller,² an arrangement that can be modeled as a closed-loop control system.
Although highly interactive design methodologies have many advantages over other design methodologies, they suffer some drawbacks. If the range of the parameter space used to define the instrument exceeds the means of control of those parameters, then the potential benefits of the instrument design are lost. This is a common problem, particularly when the degrees of freedom far exceed those of typical controllers, e.g., a joystick control for exploring a design space of three or more dimensions. Even if the controller is properly matched to the dimensionality of the parameter space, the range of the parameter space may prevent the user from searching it efficiently. This problem is commonly found in audio equalizers where, despite the fact that 1/3-octave systems provide the degrees of freedom often required to properly balance a system, a user may fail to realize a reasonable solution simply because they fail to explore the range of the instrument.

1. Work supported by the MusEn Project from funds provided by the Office of the President of the University of Michigan.

2. To help understand this distinction, consider the process of painting as one in which the artist-as-controller applies paint to a canvas under the supervision of the artist-as-observer. The process is successful as long as the feedback loop is sufficiently short. Imagine how the process of painting would change if the artist could not see what they were painting until several minutes had elapsed.

Highly interactive systems must also take into account the mapping between the parameters of the instrument design space and the observer's perceptual space. A controller that maps small changes in the instrument design space into large changes in the perceptual response diminishes the effectiveness of the observer as controller.
Likewise, large changes in the instrument design space that result in little perceptual change degrade the design simply because the observer cannot learn the proper mapping of their perceptual response to the control structure. Often the worst forms of mismatch result when nominally uncoupled parameters map into highly coupled perceptual responses. For example, the perceived pitch of a fixed-frequency sinusoid varies with its intensity, which creates substantial perceptual problems for users of a "pitch/loudness controller" in which frequency and amplitude are manipulated by separate controllers.

Wakefield & Simoni 436 ICMC Proceedings 1996

1.2 Active Sensory Tuning Control Structures

We have coined the term Active Sensory Tuning (AST)¹ to refer to a body of techniques that combine psychophysical, optimization, and computer interface principles to create computer-aided search engines by which human observers can efficiently explore large parameter design spaces (Sterian et al., 1995). With respect to the closed-loop control system above, AST provides additional structures that direct the observer's control into perceptually consistent regions of the instrument design space. This is achieved at the cost of increased complexity of the controller and increased observer load. Despite these costs, our applications of AST have shown that observers are far more accurate in discerning optimal parameter values without direct control of the parameters than when they are allowed to manipulate such values directly.

Like the closed-loop control system in the previous section, AST is iterative in nature. At the k-th iterate, AST gathers information concerning the optimality of the current set of parameter values, and then generates a new set of parameter values according to a set of update rules. The (k+1)-st iterate proceeds with the new set of parameter values such that, if the update rules are properly chosen, the set of parameter values converges to a locally optimal solution as k grows large. The difference between AST and the closed-loop control systems above is that AST, rather than the human designer, governs the selection of new parameter values; in the closed-loop systems the designer must tweak and adjust each parameter directly. In AST, the human designer controls the parameters indirectly through their judgments of the quality of parameter values, and AST invokes update rules to adjust the parameters. We have developed update rules based on several optimization methods that assume little is known about the functional form of the design criteria.
Update rules have been drawn from line-search algorithms, the Hooke and Jeeves algorithm, and various modifications of genetic algorithms (Runkle and Wakefield, 1995). The update rules are consistent with the well-known limitations of psychophysical measurement and modeling: each requires no more than ranked judgments from the observer.

Among the many caveats and restrictions in developing AST systems for instrument design, perhaps the most severe is that there is a psychophysical upper bound on the number of ranked judgments a designer can perform, generally on the order of 10. A second restriction is that the number of iterations should remain in the hundreds, as opposed to the tens to hundreds of thousands that are typically expected of optimization methods. These two factors limit the range of the parameter space so that it is theoretically possible for the designer to reach any desired point in the course of search.

1. Tuning, in this context, is analogous to searching rather than the instantiation of musical temperament. We prefer the acronym to have the T rather than the S.

2 Synthetic Instrument Design

2.1 Synthesis Method and Parametrization

We have chosen to work within an additive synthesis method (sum-of-sinusoids) in which an instrument $s(t)$ is specified by the amplitudes $A = (A_1, \ldots, A_N)$ and time constants $T = (\tau_1, \ldots, \tau_N)$ of $N = 20$ harmonic partials for a given fundamental $f_0$, e.g.,

$$ s(t; f_0, \tau_0, A, T) = \left(1 - e^{-t/\tau_0}\right) \sum_{n=1}^{20} e^{-t/\tau_n} A_n \cos(2\pi n f_0 t) \tag{1} $$

We also assume one time constant, $\tau_0$, that characterizes a common rise time for each partial in the instrument's attack.

2.2 AST Design Method

The parametric representation is further refined by our choice of a genetic algorithm (GA) search engine for the AST design.
Each amplitude and time constant is represented by a 4-bit word, so that an instrument in the design space is completely specified by a 4(20 + 20) = 160-bit vector and by a function that relates each 4-bit word to the actual value of the amplitude or time constant. We have explored several quantization rules for mapping time constants into 4-bit words, e.g., linear or logarithmic quantization from $\tau_{\min}$ to $\tau_{\max}$. We have also explored several ranges for logarithmic quantization of amplitude, e.g., 10-50 dB.

Upon specification of the amplitude and time constant quantization, the fundamental frequency, and the attack time constant, the AST design is initialized by randomly selecting 10 samples from the set of $2^{160}$ possible samples. The designer's task is to rank order these 10 samples according to their own preference. These rankings are used by a genetic algorithm to determine the reproduction probabilities of each member of the sample, from which a second generation, i.e., a new sampling of the set of possible samples, is created by rules of reproduction and mutation. The designer ranks each new generation, which has been created from their rankings of the prior generation, until they judge that they have reached an acceptable instrument design.
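As a rough illustration of Sections 2.1 and 2.2, the following Python sketch decodes a 160-bit vector into amplitudes and time constants, evaluates Eq. (1), and breeds a new generation from a set of rankings. The paper does not specify these details, so the following are assumptions made only for illustration: the quantization endpoints (`tau_min`, `tau_max`), the attack constant `tau0`, the sample rate, the bit ordering (amplitudes first), the linear rank weighting, the one-point crossover, and the mutation rate. All function names are hypothetical.

```python
import math
import random

N_PARTIALS = 20   # harmonic partials per instrument
BITS = 4          # bits per parameter, so 4 * (20 + 20) = 160 bits in total


def decode(genome, amp_range_db=40.0, tau_min=0.05, tau_max=2.0):
    """Map a 160-bit list to 20 amplitudes and 20 decay time constants (s)."""
    codes = [int("".join(map(str, genome[i:i + BITS])), 2)
             for i in range(0, len(genome), BITS)]        # each code is 0..15
    fracs = [c / (2 ** BITS - 1) for c in codes]
    # Assumed logarithmic amplitude rule: code 15 -> 0 dB, code 0 -> -amp_range_db.
    amps = [10.0 ** (-(1.0 - f) * amp_range_db / 20.0) for f in fracs[:N_PARTIALS]]
    # Assumed logarithmic time-constant rule between tau_min and tau_max.
    taus = [tau_min * (tau_max / tau_min) ** f for f in fracs[N_PARTIALS:]]
    return amps, taus


def synthesize(amps, taus, f0=110.0, tau0=0.01, dur=0.5, sr=8000):
    """Evaluate Eq. (1): a common attack times a sum of decaying harmonics."""
    samples = []
    for k in range(int(dur * sr)):
        t = k / sr
        body = sum(a * math.exp(-t / tau) * math.cos(2 * math.pi * n * f0 * t)
                   for n, (a, tau) in enumerate(zip(amps, taus), start=1))
        samples.append((1.0 - math.exp(-t / tau0)) * body)
    return samples


def next_generation(population, ranks, rng, p_mut=0.01):
    """Breed a new population from rankings (rank 1 = most preferred)."""
    size = len(population)
    weights = [size + 1 - r for r in ranks]        # assumed linear rank weighting
    children = []
    for _ in range(size):
        mom, dad = rng.choices(population, weights=weights, k=2)
        cut = rng.randrange(1, len(mom))           # one-point crossover
        child = mom[:cut] + dad[cut:]
        # Bit-flip mutation with probability p_mut per bit.
        children.append([b ^ 1 if rng.random() < p_mut else b for b in child])
    return children


rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(BITS * 2 * N_PARTIALS)]
              for _ in range(10)]                  # 10 random initial samples
ranks = list(range(1, 11))    # stand-in for the designer's rank ordering
population = next_generation(population, ranks, rng)
```

In an actual AST session the `ranks` list would come from the designer listening to `synthesize(*decode(g))` for each member `g` of the population, and the loop would repeat until the designer accepts an instrument.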

2.3 General Results

Ten samples are too few to guarantee that the designer can cover the 160-dimensional binary space in the course of their search. Instead, the design method will tend to converge within the first ten generations or so to a subset of this space, as determined by both the initialization and the designer's preference. In practice, this means that although the instruments designed by this simple form of AST are found to be sonically pleasing by the designer and other listeners, they may not be what the designer had in mind when they first started the design process.¹ The outcome confirms that the AST method is weakly controllable in this application but that the design space is sufficiently rich to support locally optimum instrument designs.

The weak controllability provided by the AST method should be contrasted with a control structure in which the designer can directly manipulate each of the twenty amplitudes and time constants. We found that designers were almost always more successful in achieving a sonically pleasing design, and in a shorter span of time, using AST than using a direct control structure.

2.4 Extended instrument design

Depending on the application, the designer may choose to instantiate the instrument at more than one fundamental frequency and incorporate additional control structures, such as intensity or different performance styles, in order to increase the expressiveness of the synthetic instrument. While it is possible for the designer to use the AST methodology to define the instrument explicitly at each new fundamental frequency, intensity level, and performance style, the weak controllability of the present AST implementation poses a major impediment to the design process. This shortcoming can be handled in at least one of two ways. The first is to modify the AST implementation to provide the capability of "zooming" into a desired region of the design space and developing extensions within that region.
The second is to identify an alternative set of design rules whereby instruments can be extrapolated from a single exemplar at a given fundamental, intensity level, and performance style to a broad variety of such. We are currently working on zooming techniques. In the present report, we discuss the application of two extrapolation rules for extending the pitch range of a synthetic instrument.

1. This is analogous to a wood sculptor, who must work around knots and other variations in the texture and consistency of the wood as they are revealed in the process of carving, as opposed to a carpenter, who can frame a house from an assemblage of parts according to an architect's plan.

3 Extrapolation of an instrument design across fundamental frequency

In the following, we focus on two simple rules for extrapolation of an instrument design for variable fundamental frequencies, and report psychophysical results from tests of these rules. To further simplify the development, we assume that each partial shares the same attack and decay time constants. While it is clear from musical acoustics that no single rule fully characterizes the behavior of instruments over different fundamentals, we consider two limiting cases of "instrument constancy."

Body-Invariant Model. The first limiting case is to model the instrument as the convolution of an ideal periodic pulse sequence $p(t; T_0)$ with period $T_0$ and the impulse response $h(t)$ of a linear time-invariant system:

$$ s(t) = \int h(t - \tau)\, p(\tau; T_0)\, d\tau \tag{2} $$

A steady-state approximation to this convolutional model yields the sum-of-sinusoids synthesis form

$$ s(t) = \sum_n H(n\omega_0) \cos(n\omega_0 t) \tag{3} $$

where $\omega_0 = 2\pi/T_0$ and $H(n\omega_0)$ is the amplitude response of the linear system at the n-th harmonic. In this case, we see that the invariant property under a change in fundamental is the amplitude spectrum of the linear system.
An observer hearing instrument invariance must infer this amplitude spectrum from the "spectral envelope" of the signal $s(t)$.² This model is a reasonable first-order approximation to the invariance observed in string instruments.

2. This "hearing out" of a spectral envelope is not as out of line as it might first appear. Indeed, research on speech perception suggests that listeners do this as part of extracting information from speech, and many signal processing techniques have been developed to compute the spectral envelope.

Partial-Profile Invariant Model. A contrasting model is suggested by the pipes of an organ. In this case, changes in pitch are realized by changes in the length of the pipe in such a way that a new partial series is generated according to the same acoustic rules. Thus, the steady-state approximation to this system-variant model yields the sum-of-sinusoids synthesis form

$$ s(t) = \sum_n H_n \cos(n\omega_0 t) \tag{4} $$

where $H_n$ denotes the relative strengths of the partial series. This model is a reasonable first-order approximation to the invariance observed in woodwinds, for example. A listener who hears instrument invariance in this case must be sensitive to the profile of partial amplitudes, regardless of the absolute frequencies of this profile.

To evaluate these two rules of invariance, a twenty-partial instrument was synthesized by drawing amplitudes at random over a linear (0-1) or 40-dB dynamic range. Observers listened to the original and extrapolated forms in a randomized A-B comparison task and were forced to choose which form best preserved the instrument characteristics of the original. On any given trial, observers were free to listen alternately to A, B, or the original as many times as needed before reaching a decision. The assignment of each rule to A or B was randomized from trial to trial. Twenty different instruments were synthesized and studied in this manner.

In implementing the body-invariant model, a cubic spline fit using the spectrum of the original instrument as knot points was used to determine the amplitudes at the octave and at the fifth. Since we did not extrapolate the spline beyond its knots, the body-invariant model was always bandlimited to the highest frequency of the original instrument (i.e., 2200 or 4400 Hz). To preserve this bandlimiting feature, the upper cutoff of the partial-profile invariant model was similarly limited. The figure below shows the partial amplitudes for one such instrument (#16) for a fundamental of 110 Hz (original, top panel) and the extrapolated spectra for a fundamental of 220 Hz (body-invariant, middle panel, and partial-profile invariant, bottom panel).
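The two extrapolation rules compared in this experiment can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the spline is fit to linear amplitudes with natural end conditions (the paper says only "cubic spline"), the new fundamental is assumed to be no lower than the original so that no partial falls below the lowest knot, and the function names and the example spectrum are hypothetical.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return f(x) interpolating the knots (xs, ys) with a natural cubic spline."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    # Tridiagonal system for the interior second derivatives m[1..n-2];
    # the natural end conditions fix m[0] = m[n-1] = 0.
    diag = [2.0 * (h[j] + h[j + 1]) for j in range(n - 2)]
    rhs = [6.0 * ((ys[j + 2] - ys[j + 1]) / h[j + 1] - (ys[j + 1] - ys[j]) / h[j])
           for j in range(n - 2)]
    for j in range(1, n - 2):                 # forward elimination
        w = h[j] / diag[j - 1]
        diag[j] -= w * h[j]
        rhs[j] -= w * rhs[j - 1]
    m = [0.0] * n
    for j in range(n - 3, -1, -1):            # back substitution
        m[j + 1] = (rhs[j] - h[j + 1] * m[j + 2]) / diag[j]

    def evaluate(x):
        # Locate the knot interval [xs[i], xs[i+1]] that contains x.
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 2)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate


def body_invariant(amps, f0, new_f0):
    """Resample the spectral envelope at the harmonics of new_f0."""
    freqs = [n * f0 for n in range(1, len(amps) + 1)]
    envelope = natural_cubic_spline(freqs, amps)
    out, n = [], 1
    while n * new_f0 <= freqs[-1]:            # bandlimit: never extrapolate
        out.append(envelope(n * new_f0))
        n += 1
    return out


def partial_profile_invariant(amps, f0, new_f0):
    """Keep the amplitude of partial n unchanged, with the same upper cutoff."""
    cutoff = len(amps) * f0
    return list(amps[:int(cutoff // new_f0)])


amps = [1.0 / n for n in range(1, 21)]        # hypothetical 20-partial spectrum
octave_body = body_invariant(amps, 110.0, 220.0)       # 10 partials up to 2200 Hz
fifth_body = body_invariant(amps, 110.0, 165.0)        # 13 partials up to 2145 Hz
octave_profile = partial_profile_invariant(amps, 110.0, 220.0)
```

For the octave, every new partial coincides with a knot, so the body-invariant rule simply keeps the even-numbered partials of the original; for the fifth, about half of the new partials fall between knots and take interpolated values.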
[Figure: three stacked panels of partial amplitude (logarithmic axis) against frequency (0 to 2500 Hz).]

Some preliminary results for one subject are presented in the table below as a function of the fundamental frequency (110 or 220 Hz), the dynamic range of the partials (40 dB or linear (0-1)), and whether the octave or the fifth was used to generate the shifted fundamental. The leftmost column indicates the instrument identification number. The data are organized by ranking the average of the responses for the first three conditions, in which the signals all shared the same dynamic range. Each condition presents the average of 3 judgments, where the observer's response is scored 1 for body-invariant or 2 for partial-profile invariant.

The majority of the conditions support a body-invariant model for extrapolating a design to another fundamental frequency. However, there are numerous cases in which the partial-profile invariant model is preferred. In these cases, there is very little ambiguity about the appropriateness of the partial-profile invariant model over the body-invariant model.

The data also raise some concern regarding how to properly extrapolate to multiple notes. While the first four conditions all prefer the body-invariant model, a different design conclusion would result for Instrument #2, for example, had the fifth been used rather than the octave. It is also clear that the absolute spectral variation is important: when extrapolating spectra distributed randomly over linear amplitude, we see very different results even though the spectra are monotonically related.

Table: average response by instrument and condition (1 = body-invariant, 2 = partial-profile invariant). Each column is listed top to bottom; [?] marks an illegible entry.

Instrument: 4, 8, 13, 20, 7, 17, 2, 5, [?], 14, 15, 19, 9, 12, 3, 6, 10, 16, 18
F0 110 Hz, DR 40 dB, octave: 1.0, 1.0, 1.0, 1.0, 1.3, 1.0, 1.0, 1.0, 1.0, 1.3, 1.3, 1.7, 1.3, [?], 2.0, [?], 1.7, 2.0, 2.0
F0 110 Hz, DR 40 dB, fifth, and F0 220 Hz, DR 40 dB, octave (two columns, read across rows): 1.0, 1.0, [?], [?], 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.7, 1.0, 1.0, 1.7, 1.0, 2.0, 1.3, 1.3, 1.7, 1.0, 1.3, 1.3, 2.0, 1.3, 1.0, 1.7, [?], [?], 1.3, 1.7, 1.0, 2.0, [?], 2.0, 1.0, 2.0, 1.3
AVG: 1.0, 1.0, 1.0, 1.1, 1.1, 1.1, 1.2, 1.2, 1.3, 1.3, 1.3, 1.4, 1.6, 1.6, [?], 1.7, 1.7, 1.7
F0 110 Hz, DR linear, octave: 1.7, 2.0, 1.3, 1.3, 2.0, 1.3, 2.0, 1.3, 2.0, 2.0, 1.0, 1.7, 2.0, [?], 1.3, [?], 1.3, 1.0, 2.0

References

[Sterian et al., 1995] Sterian A, Runkle P, Wakefield GH. Active sensory tuning of wind noise using a genetic algorithm. 1995 International Conf on Acoustics, Speech, and Signal Processing.

[Wakefield and Chiang, 1990] Wakefield GH, Chiang C-M. An adaptive tuning procedure for the psychophysical study of complex sounds. J Acoust Soc Am 88, S144, 1990.

[Runkle and Wakefield, 1995] Runkle PR, Wakefield GH. Methods for Incorporating Human Preferences in Adaptive Systems. University of Michigan: Communications and Signal Processing Laboratory Technical Report, 1995.