DISSCO: A Unified Approach to Sound Synthesis and Composition

Hans G. Kaper
Mathematics and Computer Science Division
Argonne Nat'l Laboratory
e-mail: kaper@mcs.anl.gov

Sever Tipei
Computer Music Project
School of Music
University of Illinois
e-mail: s-tipei@uiuc.edu

ABSTRACT

DISSCO (Digital Instrument for Sound Synthesis and Composition) represents a unified and comprehensive approach to sound synthesis and composition. Its components share a common formal approach and use similar tools, and it delivers a final product (a musical "event") that does not require further processing. DISSCO consists of a Library for Additive Sound Synthesis (LASS) and a Composition Module (CMOD). Release 1.0 of DISSCO is available as open-source software.

1. INTRODUCTION

DISSCO (Digital Instrument for Sound Synthesis and Composition) represents a unified and comprehensive approach to sound synthesis and composition: unified in the sense that its components share a common formal approach and use similar tools, and comprehensive in the sense that it delivers a final product (a musical "event") that does not require further processing. DISSCO is a black box that takes data provided by the user and produces a finished object.

DISSCO consists of two parts: LASS, a C++ Library for Additive Sound Synthesis, and CMOD, a C++ Composition Module. Release 1.0 of DISSCO is available as open-source software [7]. Details of LASS and CMOD are discussed in Sections 2 and 3, respectively; some recent results obtained with DISSCO are described in Section 4.

2. LASS

LASS is built around the idea that a score is a collection of sounds and a sound a collection of partials [1]. It takes advantage of the Standard Template Library (STL) containers and iterators. Many of its features and most of its functionality reflect previous work done with DIASS [4] and DISCO [3]; however, LASS is no longer a MusicN-type program. Oscillators and wavetables have been replaced by function evaluations, sounds are no longer produced by "instruments," and there is no score. LASS can generate an XML file as a record of the score, but this file is not needed as input.

The paradigm underlying LASS can be summarized as follows [1]. A piece is a complex wave, the result of superimposing a collection of sounds. Each sound in turn is a superposition of its constituent partials. Partials are the elementary building blocks of a piece; they can be simple sine waves, other wave types, or even white noise. LASS computes the samples that describe the resulting complex wave by adding the contributions from all the partials active at the particular instant of time.

2.1. Features and Functionality

The central elements of LASS are sounds and partials, whose static and dynamic attributes (parameters) are associated with enums. Static parameters are the start time, duration, wave type, and relative (maximum) amplitude of each partial; examples of dynamic parameters are the frequency, phase, amplitude envelope, and amplitudes and rates of vibrato (FM), tremolo (AM), amplitude transients, and frequency transients.

Parameter values assigned to a sound, such as start time and duration, are shared automatically by all the sound's partials. In addition, the user has the option of specifying individual values for each partial. For instance, a frequency value (440 Hz) can be specified for the entire sound s,

    s.setParam(FREQUENCY, 440);

and all its partials will acquire integer multiples of the 440 Hz frequency; or an individual partial can be assigned a specific frequency, as in

    s.get(2).setPartialParam(FREQUENCY, 1234);

The tuning of partials can be distorted through the FREQUENCY-DEVIATION feature, which modifies each partial's frequency by one-half the distance between it and its nearest neighbor. The result is a nonharmonic tuning of the sound's components through a somewhat random process.

Like many other parameters, frequency is a dynamic variable.
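The paradigm described above (a sound as a superposition of partials, with a frequency assigned to the sound propagating to its partials as integer multiples) can be illustrated with a minimal C++ sketch. The types Partial and Sound and the functions setFundamental, sampleAt, and makeHarmonicSound are hypothetical names chosen for this illustration and are not the LASS classes or API; the 1/k amplitude rolloff is likewise an arbitrary assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only; these are not the LASS classes.
struct Partial {
    double frequencyHz;  // frequency of this partial
    double amplitude;    // relative (maximum) amplitude
};

struct Sound {
    std::vector<Partial> partials;

    // Give partial k (counting from 1) the frequency k * fundamental,
    // mirroring the described effect of setting FREQUENCY on a whole sound.
    void setFundamental(double fundamentalHz) {
        for (std::size_t k = 0; k < partials.size(); ++k) {
            partials[k].frequencyHz = fundamentalHz * static_cast<double>(k + 1);
        }
    }

    // Value of the composite wave at time tSec: the sum of all sine partials.
    double sampleAt(double tSec) const {
        const double twoPi = 2.0 * std::acos(-1.0);
        double sum = 0.0;
        for (const Partial& p : partials) {
            sum += p.amplitude * std::sin(twoPi * p.frequencyHz * tSec);
        }
        return sum;
    }
};

// Build a sound with n harmonic partials; the 1/k amplitudes are an assumption.
Sound makeHarmonicSound(std::size_t n, double fundamentalHz) {
    assert(n > 0);
    Sound s;
    for (std::size_t k = 1; k <= n; ++k) {
        s.partials.push_back({0.0, 1.0 / static_cast<double>(k)});
    }
    s.setFundamental(fundamentalHz);
    return s;
}
```

Calling makeHarmonicSound(3, 440.0) yields partials at 440, 880, and 1320 Hz; overwriting the frequencyHz field of a single partial afterwards corresponds to the per-partial assignment described above and leaves the other partials untouched.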
For glissandi or sound bends, the frequencies of all partials in a sound vary in the same way; on the other hand, one can "detune" a sound whose partials are in a harmonic relationship by applying a different function to each of the constituent partials, thus distorting the ratios of their frequencies. The function values act as multiplicative factors: a factor of 1 will leave a frequency unchanged, a factor of 2 will produce a pitch one octave higher, and a factor of 0.5 will lower the pitch by one octave.

Dynamic variables, which are specified by functions of time, are known to musicians as envelopes. The Envelope class in LASS handles basic operations such as getting the value of an envelope at a specified time or multiplying two envelopes. An envelope can consist of any number of segments. The ith segment is defined by the coordinates (xi, yi) of its starting point in a time-amplitude plane, its type (LINEAR or EXPONENTIAL), and an attribute (FIXED or FLEXIBLE) indicating whether its length is fixed or can be stretched or compressed. The latter attribute is useful, for instance, if a sound is repeated with the same attack and decay but different durations. Here is the specification of the ADSR envelope shown in Fig. 1.

    ADSR Envelope
    x:  0.00  0.10  0.20  0.70  1.00
    y:  0.00  1.00  0.80  0.80  0.00
    EXPONENTIAL FIXED
    LINEAR FLEXIBLE
    LINEAR FLEXIBLE
    EXPONENTIAL FIXED

Figure 1. Typical ADSR envelope.

Envelopes are stored with an identifier (id number) in an EnvelopeLibrary. They are usually (but not necessarily) normalized, so scaling may be needed. Envelopes can be created on the fly and applied to any dynamic variable at any time level (see Section 2.2).

LASS pays special attention to the perceived loudness of a sound. The perceived loudness depends on the composition of the sound and is a nonlinear function of the amplitudes of the constituent partials [2]. LASS uses the ISO equal-loudness level contours and a partition of the frequency range into critical bands to adjust the amplitude of each partial so that it contributes correctly to the target loudness of the composite sound.

The evolution of a partial's amplitude is controlled by means of an envelope whose peak is scaled according to its relative strength in the spectrum. Since LASS allows for detailed control of each partial through envelopes and for any number of partials, a practically infinite variety of spectra can be created. The user can also obtain crescendo or diminuendo effects over the duration of a sound by taking advantage of the envelope multiplication feature and combining the envelope of each partial with other envelopes that incorporate the desired effect.

2.2. Modifiers

Frequency modulation (vibrato) and amplitude modulation (tremolo) change the main ingredients of the wave at subaudio rates. In LASS, envelopes control how these modulations evolve over time. The user can specify the magnitude (amplitude) of the modulation as a fraction of the basic frequency or amplitude and its rate in hertz. To obtain a constant effect, one uses a one-segment linear envelope with y values 1 at both endpoints.

A similar treatment is given to transients, narrow spikes in the frequency and/or amplitude. In sounds produced by acoustic instruments, transients typically occur at the onset of the vibration, in a fraction of the first second. LASS enables the user to apply frequency and/or amplitude transients over any portion of the sound or over its entire duration, depending on the shape of the envelope used. As in the case of FM and AM, the magnitude of the spike and its rate of occurrence can be specified; in addition, the width of the spike can be controlled, the default value being 1,103 samples at a 44.1 kHz sampling rate, or about 0.025 seconds.

Two more features deal with modification of the sound in an acoustic environment. Reverberator is an implementation of the reverberator described in Moore's text [5, Section 4.4]. A simple way to use this feature is to specify only the size of the room by a number between 0 (no reverb) and 1 (max reverb). For more detailed control, one can invoke a second constructor and specify the percentage of reverberated vs. direct sound (a dynamic variable), the ratio between the response of high and low frequencies, the gain of the all-pass filter, and the delay (in seconds) of the first echo response. A third constructor replaces the high/low ratio with low-pass gains and the gains of the comb filter. Reverberator can be applied to the entire score, to selected sounds, or even to individual partials within a sound.

Spatializer distributes sound waves over a number of tracks, where each track corresponds to a particular output channel. Pan, a simple spatializer, distributes the sound across two channels in a ratio specified by the user. MultiPan implements precisely controlled spatialization over an arbitrary number of speakers in either of two ways. In the first method, one specifies the fraction of the total amplitude that is assigned to each individual speaker and the precise moment of the assignment during the sound's duration. Thus, the user can create "unrealistic" effects such as having all speakers at maximum strength at the same time. The second method assumes that the speakers are arranged in a circular pattern around the listener, and one maps the sound to speakers using polar coordinates.

2.3. Rendering

Once the samples have been computed, the score is rendered in .au format with Score::render. Before writing the score to a file with AuWriter::write, the user has the option of applying one of several clipping management techniques: CLIP (sample values outside a specified range are clipped to the threshold value); SCALE (a maximum amplitude a_m is computed for the entire score, and all sample values are scaled by a factor 1/a_m); CHANNEL_SCALE (SCALE applied per channel); ANTICLIP (a maximum amplitude a_m is computed for the entire score, and all sample values exceeding a specified threshold value a_t are scaled by a factor a_t/a_m); or CHANNEL_ANTICLIP (ANTICLIP applied per channel). If no technique is specified, the default option is NONE.

An XML file can be created as a record of the score. This file contains all the sounds and all the partials with their attributes, as well as all the envelopes employed. Unlike the score of MusicN programs, the XML file is not needed to generate the sounds.

2.4. User Access

LASS has a graphical user interface (GUI), written in Java, which generates C++ code. It is intended not so much for composers, who are generally interested in producing a piece with hundreds or thousands of sounds, as for use in introductory computer music courses, where it enables students to experiment with various changes in the data and experience their effects on the aural qualities of a sound.

More than 25 sample programs offer a way to become familiar with the capabilities of LASS. They are short programs that concentrate on generating one or two sounds to which a particular feature has been applied. One can hear the impact of the feature on the sound and, at the same time, see clearly which lines of code were involved.

Release 1.0 of LASS is available as open-source software on the Web [7], along with documentation produced with Doxygen and a tutorial.

3. CMOD

LASS is a library that does not include a "main" program. To take full advantage of it as a compositional tool, one needs another piece of software to drive it. The informed user can either write the code that best suits a specific goal or use CMOD, the module that accompanies LASS.
CMOD, the Composition MODule, also written in C++, represents a "floating hierarchy" that creates objects in any order. Similar to LASS, it involves collections whose members are, in turn, themselves collections of objects. CMOD consists of classes dealing with various aspects of a composition and a number of utilities. It has inherited its basic structure from DISCO [3] and takes advantage of LASS features such as the Envelope class.

3.1. The Event Class

The central element of CMOD is the abstract Event class from which other classes are derived. Each event contains an arbitrary number of layers, and each layer contains an arbitrary number of objects of various types. An object could itself be an event containing other objects; an entire piece is viewed as an event, as are each of its sections and collections of simultaneous or sequential sounds (chords or melodies). In addition, one can imagine other events representing various traditional or not-so-traditional elements of form. New objects may be added to the list, and not all events need to be active at all times.

Since the number of events and their relationships may differ from one project to another, they are simply called Top, High, Medium, Low, and Bottom. At the end of this chain is the sound, which is no longer an event: beyond the Bottom object, LASS takes over.

All events have at least four attributes: name, start time, duration, and type; all except Bottom also have a density attribute. The name of an event is that of a text file containing the information needed for the realization of the event.

Two methods of the abstract Event class are shared by all derived classes: Build and CreateNewObjects. Build establishes the main characteristics of the object: the number of layers and the number of types in each layer. Since each object (except the ones created by Bottom) is itself a collection, the number of objects contained in this new collection is also determined here.
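The containment idea described so far (events holding layers, layers holding objects that may themselves be events) can be sketched as a small recursive structure. This is a hypothetical illustration under simplifying assumptions, not CMOD's Event class: the names Event, addChild, countLeaves, and depth are invented here, the type and density attributes are omitted, and no file input is modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of the containment idea only; not CMOD's Event class.
struct Event {
    std::string name;   // in CMOD, the name of the text file describing the event
    double startTime;
    double duration;
    // layers[i] holds the objects of layer i; each object is itself an Event.
    std::vector<std::vector<std::unique_ptr<Event>>> layers;

    Event(std::string n, double start, double dur)
        : name(std::move(n)), startTime(start), duration(dur) {
        assert(dur >= 0.0);
    }

    // Create a child event in the given layer, growing the layer list if needed.
    Event& addChild(std::size_t layer, std::string n, double start, double dur) {
        if (layers.size() <= layer) {
            layers.resize(layer + 1);
        }
        layers[layer].push_back(
            std::unique_ptr<Event>(new Event(std::move(n), start, dur)));
        return *layers[layer].back();
    }
};

// Number of events without children (the level at which sounds would be made).
std::size_t countLeaves(const Event& e) {
    std::size_t children = 0;
    std::size_t leaves = 0;
    for (const auto& layer : e.layers) {
        for (const auto& child : layer) {
            ++children;
            leaves += countLeaves(*child);
        }
    }
    return children == 0 ? 1 : leaves;
}

// Number of nesting levels below and including this event.
std::size_t depth(const Event& e) {
    std::size_t deepest = 0;
    for (const auto& layer : e.layers) {
        for (const auto& child : layer) {
            deepest = std::max(deepest, depth(*child));
        }
    }
    return deepest + 1;
}
```

Because every node has the same type, a top event can hold a bottom-level object directly next to a deeper subtree, which is one way to picture levels being skipped.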
CreateNewObjects is a loop whose upper limit is the number of objects contained in the event. A pass through the loop corresponds to the creation of a new object, whose start time, duration, and type are selected from a file of admissible values or according to a prescribed probability distribution, following a procedure borrowed from DISCO [3]. At the bottom of the loop, SelectNextEvent checks the name and type of the new object and calls its constructor.

3.2. Floating Hierarchies

The way CMOD is set up (classes derived from a templatelike Event, objects that are in fact collections of lower-level events) suggests a hierarchical order. This suggestion is reinforced by the fact that, unlike LASS, CMOD does have a "main" program that triggers the top event (the piece).

This hierarchical structure is, however, not stationary. Levels can be skipped, a top event might be a collection of bottom objects, or the highest level of the structure might be only a low-level event; in fact, these arrangements may coexist in the same piece. Moreover, the objects of a collection do not have to be ordered in time, so similar or related objects can be placed at different instants, and connections can be made between events that belong to different hierarchical levels.

The "main" program acts like a benign administrator, a facilitator who presides over an orderly process without imposing its own will. It provides only the means (I/O files, envelope library, score, reverberator objects, etc.), starts the process, and makes sure that it ends with a product (the sound file).

3.3. Implementation

The Bottom object provides instructions for LASS. Instead of SelectNextEvent, it calls Implement, which creates a sound in LASS and sets parameters such as start time and duration,

    Sound s; //create sound
    s.setParam(START_TIME, stimeSec);
    s.setParam(DURATION, durSec);

followed by frequency, number of partials, and loudness. Various options are available. For example, the frequency can be selected randomly from a continuum, part of a well-tempered scale, an overtone of a low fundamental, or part of a sequence (tone row), or it could be filtered in by a sieve [6, Chapter 9]. Similarly, deciding the number of partials is part of a more involved process, where a spectrum is created by choosing an envelope for each partial and scaling it according to some preexisting rule.

The modifiers described for LASS are present in CMOD as well. They are identified by enums; each has an associated probability of occurrence and, depending on whether it is static or dynamic, a value or a scaled envelope. A simple mechanism ensures that related features such as vibrato magnitude and rate are treated as a group. Reverberation and spatialization are treated separately but similarly to the way the modifiers are handled.

3.4. Utilities

Selection procedures that apply at all levels have been implemented as "utilities" (not a class). Utilities deal with three main areas: random choices and/or choices made on a continuum, choices involving discrete elements, and envelope making. Other specialized routines are dedicated to various tasks, from partitioning a segment into golden mean ratios to translating density percentages into numbers of sounds per second or traditional note values. A separate class, DataIn, concentrates all read and write operations and the handling of files.

3.5. Input/Output

The user must create the input files following a general format. Almost every input line starts with a tag identifying the operation, and enums are used abundantly. The records are interspersed with lines marking the functions that need the subsequent data.
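As a small illustration of the kind of specialized routine mentioned in Section 3.4, the following sketch partitions a segment in the golden mean ratio. It is a hypothetical helper written for this illustration, not code taken from CMOD; the name goldenSectionPoint and the choice of placing the longer part first are assumptions.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not taken from CMOD.
// Returns the point that splits [a, b] so that the longer part comes first and
// (b - a) / longer == longer / shorter == (1 + sqrt(5)) / 2.
double goldenSectionPoint(double a, double b) {
    assert(a <= b);
    const double phi = (1.0 + std::sqrt(5.0)) / 2.0;  // golden ratio
    return a + (b - a) / phi;
}
```

Applying the routine again to either of the two resulting parts yields a finer partition in the same ratio.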
The following short example illustrates the assignment of frequency and loudness,

    Bottom:AssignFreq
    method1 WELLTEMPERED
    method2 SEQUENCE
    offset OBJNUM
    step 4 58 64 68 32
    loud-ReadComputeFloat
    method SEQUENCE
    sones 4 120.0 160.0 100.0 110.5

and an example of a log file,

    TOP LEVEL: daria.dat:
    HIGH 1: H/C5: StartTime:: Duration: 241 units 19 units 241 sec 19 sec
    BOTTOM 1: B/sChord05 | StartTime: 0 units 0 sec | Duration: 17 units 17 sec
    Sound 1: start time 6 duration 7.4 type 10 13 partials frequency=55 sones=207

4. RESULTS AND FUTURE WORK

LASS has been used for four semesters in the teaching of Computer Music and Computer-assisted Composition courses at UIUC. It has also been used hands-on by students enrolled in courses dealing with late 20th century music and by nonmusic freshmen attending modules on music and technology in Discovery courses.

A tape piece, dARIA, was realized with LASS in 2004 and performed at UIUC and at the University of Wisconsin-Milwaukee.

Both CMOD and LASS are works in progress. They are being used while new developments take place, a situation that we expect will continue for the foreseeable future.

5. REFERENCES

[1] Hans G. Kaper and Sever Tipei. Formalizing the concept of sound. In Proc. 1999 Int'l Computer Music Conference, Beijing, China, pages 387-390, 1999.

[2] Hans G. Kaper and Sever Tipei. Loudness scaling in a digital synthesis library. In Proc. 2004 Int'l Computer Music Conference, Miami, Florida, pages 398-401, 2004.

[3] Hans G. Kaper, Sever Tipei, and Jeff M. Wright. DISCO: An object-oriented system for music composition and sound design. In Proc. 2000 Int'l Computer Music Conference, Berlin, Germany, pages 340-343, 2000.

[4] Kristopher Kriese and Sever Tipei. A compositional approach to additive synthesis on supercomputers. In Proc. 1992 Int'l Computer Music Conference, San Jose, California, pages 394-395, 1992.

[5] F. Richard Moore. Computer Music. Prentice Hall, Englewood Cliffs, New Jersey, 1990.

[6] Iannis Xenakis. Formalized Music. Pendragon Press, Stuyvesant, New York, 1992.

[7] http://dissco.sourceforge.net.

Acknowledgments. The authors acknowledge the contributions from Braden Kowitz, who designed and implemented the basic features of LASS, and the students in the "Advanced Computer Music" seminar taught at the University of Illinois between 2002 and the present, who contributed to the further development of LASS. The work of H.K. was supported by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under Contract W-31-109-Eng-38.