The SmOKe music representation, description language, and interchange format

Stephen Travis Pope
The Nomad Group, Computer Music Journal, CCRMA
P. O. Box 60632, Palo Alto, California 94306 USA
Electronic Mail: stp@CCRMA.Stanford.edu

ABSTRACT

The Smallmusic Object Kernel (SmOKe) is an object-oriented representation, description language, and interchange format for musical parameters, events, and structures. The author believes this representation, and its proposed linear ASCII description, to be well-suited as a basis for: (1) concrete description interfaces in other languages; (2) specially designed binary storage and interchange formats; and (3) use within and between interactive multimedia and hypermedia applications in several application domains. The textual versions of SmOKe share the terseness of note-list-oriented music input languages, the flexibility and extensibility of "real" music programming languages, and the non-sequential description and annotation features of hypermedia description formats. This description defines SmOKe's basic concepts and constructs, and presents examples of the music magnitudes and event structures. The intended audience for this discussion is programmers and musicians working with digital-technology-based multimedia tools who are interested in the design issues related to music representations and are familiar with the basic concepts of software engineering. Two other documents ([Smallmusic 1992] and [Pope 1992]) describe the SmOKe language, and the MODE environment within which it has been implemented, in more detail.

1: INTRODUCTION

The desire has been voiced (ANSI 1992; Dannenberg et al. 1989; MusRep 1987; MusRep 1990; Smallmusic 1992) for an expressive, flexible, abstract, and portable structured music description and composition language. The goal is to develop a kernel description that can be used for structured composition, real-time performance, processing of performance data, and analysis. It should support the text input or programmatic generation and manipulation of complex musical surfaces and structures, and their capture and performance in real time via diverse media. The required language must have a simple, consistent syntax that provides for readable complex nested expressions with a minimum number of different constructs. The test of the language and its underlying representation will be the facility with which applications can be ported to it.

The Smallmusic Object Kernel consists of primitives for describing the basic scalar magnitudes of musical objects, abstractions for musical events and event lists, and standard messages for building event list hierarchies and networks. Structures in the underlying music representation can be described in a text-based linear format, and in terms of the in-memory data structures that might be used to hold them. It is intended that independent parties be able to implement compatible abstract data structures, concrete interchange formats, and description languages based on the formal definition of SmOKe (Smallmusic 1992). The naming conventions and the code description examples use the Smalltalk-80 programming language (Goldberg and Robson 1989), but the representation should be easily manipulable in any object-oriented programming language.
For readers unfamiliar with Smalltalk-80, another document (available via Internet ftp from the file named "reading.smalltalk.t" in the directory "anonymous@ccrma-ftp.stanford.edu:/pub/st80") introduces the language's concepts and syntax to facilitate the reading of the code examples in the text. In this document, SmOKe examples are written between square brackets in sans-serif italic typeface.

2: EXECUTIVE SUMMARY

The SmOKe representation can be summarized as follows. Music (i.e., a musical surface or structure) can be represented as a series of "events" (which generally last from tens of msec to tens of sec). Events are simply property lists or dictionaries; they can have named properties whose values are arbitrary. These properties may be music-specific objects (such as pitches or spatial positions), and models of many common musical magnitudes are provided.

Events are grouped into event lists as records consisting of relative start times and events. Event lists are events themselves and can therefore be nested into trees (i.e., an event list can have another event list as one of its events, etc.); they can also map their properties onto their component events. This means that an event can be "shared" by being in more than one event list at different relative start times and with different properties mapped onto it.

Events and event lists are "performed" by the action of a scheduler passing them to an interpretation object or voice. Voice objects and applications determine the interpretation of events' properties, and may use "standard" property names such as pitch, loudness, voice, duration, or position. Voices map event properties onto parameters of I/O devices; there can be a rich hierarchy of them. A scheduler expands and/or maps event lists and sends their events to their voices.

Stored data functions can be defined and manipulated as breakpoint or summation parameters, "raw" data elements, or functions of the above. Sampled sounds are also describable, by means of synthesis "patches," or signal processing scripts involving a vocabulary of sound manipulation messages.
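A minimal sketch of this model, using only the class messages introduced in the examples of Sections 5 and 6 below (Event duration:pitch:loudness:, EventList newNamed:, add:at:, and playOn:); the assumption that an event list responds to playOn: just as a single event does, and the particular property values chosen, are illustrative only.

	"Build a named two-event list and perform it on the default voice."
	[ (EventList newNamed: #summary)
	     add: (Event duration: 1/4 pitch: #c3 loudness: #mf) at: 0;
	     add: (Event duration: 1/4 pitch: #d3 loudness: #mp) at: 1/4;
	     playOn: Voice default ]	"assumes event lists respond to playOn: like events"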

3: THE SmOKe LANGUAGE

3.1 Linear Description Language

The SmOKe music representation can be linearized easily in the form of immediate object descriptions and message expressions. These descriptions can be thought of as being declarative (in the sense of static data definitions) or procedural (in the sense of messages sent to class "factory" objects). A text file can be freely edited as a data structure, but one can compile it with the Smalltalk-80 compiler to "instantiate" the objects (rather than needing a special formatted reading function). The post-fix expression format taken from Smalltalk-80 (receiverObject keyword: optionalArgument) is easily parseable in C++, Lisp, Forth, and other languages.

3.2 Language Requirements

The basic representation itself is language-independent, but assumes that several immediate types (such as numbers, strings, and symbols) are representable as ASCII/ISO character strings in the host language. The support of block context objects (in Smalltalk), or closures (in Lisp), is defined as being optional, though it is considered important for complex scores, which will often need to be stored with interesting behavioral information. (It is beyond the scope of the present design to propose a metalanguage for the interchange of algorithms.) Dictionaries or property association lists must also either be available in the host language or be implemented in a support library (as must unique symbols and even associations in some cases [e.g., standard C]).

3.3 Naming and Persistency

The names of abstract classes are known and are treated as special globals. The names of abstract classes are used wherever possible, and instances of concrete subclasses are returned, as in [Pitch value: 'c3'] or ['c3' pitch], both of which return a SymbolicPitch instance (strings are written between single quotes in SmOKe [and Smalltalk]). The central classes support "persistency through naming," whereby any object that is explicitly named gets stored in a global dictionary under that name until released.

3.4 Score Format

A SmOKe score consists of one or more parallel or sequential event lists whose events may have interesting properties and links. Magnitudes, events, and event lists are described using class messages that create instances, or using immediate objects and the terse post-fix operators. These can be named, used in one or more event lists, and their properties can change over time. There is no pre-defined "level" or "grain-size" of events; they can be used at the level of notes, envelope components, patterns, grains, etc. The same applies to event lists, which can be used in parallel or sequentially to manipulate the sub-sounds of a complex "note," or as "motives," "tracks," or "parts."

Viewed as a document, a score consists of declarations of, or messages to, events, event lists, and other SmOKe structures. It can resemble a note list file or a DSP program. It is structured as executable Smalltalk-80 expressions, and can define one or more "root-level" event lists. There is no "section" or "wait" primitive; sections that are supposed to be sequential must be included in some higher-level event list to declare that sequence. A typical score will define and name a top-level event list, and then add sections and parts to it in different segments of the document.
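The skeleton below sketches a score of that form: it names a top-level event list and then, in later segments of the document, declares a named section and adds it to the root. The accessor [EventList named: aSymbol] for retrieving an object stored under the "persistency through naming" scheme of Section 3.3, and the particular names, times, and properties, are assumptions made only for the sake of the example.

	"Define and name the root-level event list."
	[ EventList newNamed: #exampleScore ]

	"A later segment declares and fills a named section..."
	[ (EventList newNamed: #sectionA)
	     add: (Event duration: 1/2 pitch: #c3 loudness: #mf) at: 0;
	     add: (Event duration: 1/2 pitch: #g3 loudness: #mf) at: 1/2 ]

	"...and another segment adds that section to the root list, starting at 0 sec."
	[ (EventList named: #exampleScore) add: (EventList named: #sectionA) at: 0 ]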
4: SmOKe MUSIC MAGNITUDES

Abstract descriptive models for the basic music-specific magnitude types, such as pitch, loudness, or duration, are the foundation of SmOKe. These are similar to Smalltalk-80 magnitude objects in that they represent partially- or fully-ordered scalar or vector quantities with (e.g.) numerical or symbolic values. Some of their behavior depends on what they stand for, and some of it on how they are stored. These two aspects are the objects' "species" and their "class." The pitch-species objects 440.0 Hz and 'c#3', for example, share some behavior, and can be mixed in arithmetic with (ideally) no loss of "precision." The expression (261.26 Hz + MIDI key number 8) should be handled differently from (1/4 beat + 80 msec). The class of a music magnitude depends on the "type" of its value (e.g., floating-point number or string), while its species denotes what it represents. Only the species is visible to the user.

Music magnitudes can be described using prefix class names or post-fix type operators, e.g., [Pitch value: 440.0] or [440.0 Hz] or ['a3' pitch], [Amplitude value: 0.7071] or [-3 dB]. The representation and interchange formats should support all manner of weird mixed-mode music magnitude expressions (e.g., [('c4' pitch) + (78 cents) + (12 Hz)]), with "reasonable" assumptions as to the semantics of the operation (coerce to Hz or cents?). Applications for this representation should support interactive editors for music magnitude objects that support the manipulation of the basic hierarchy described below, as well as its extension via "light-weight" programming.

The examples below demonstrate the verbose (class message) and terse (value + post-operator) forms of music magnitude description. Note that comments are delimited by double quotes (or curly braces) in SmOKe.

Music Magnitude Examples

	(Duration value: 1/16) asMS	"Same as [1/16 beat]; answers 62."
	(Pitch value: 36) asHertz	"Same as [36 pitch]; answers 261.623."
	(Amplitude value: 'ff') asMidi	"Or [#ff ampl]; answers 106."
	('f#4' pitch), ('pp' dynamic)	"Terse examples; value + post-op."
	(261 Hz), (1/4 beat), (-38 dB), (56 velocity), (2381 msec), (0@0 position)
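Mixed-mode arithmetic of this kind might look like the expressions below; the coercion behavior and the numerical answer given in the comments are assumptions (the left operand's species wins, and 1 beat equals 1 second, as implied by the duration example above), not defined results.

	((1/4 beat) + (80 msec)) asMS	"answers 330, assuming 1 beat = 1 sec as above"
	(('c4' pitch) + (78 cents)) asHertz	"coerced to Hertz; the value depends on the tuning assumed"
	((0.5 ampl) + (-6 dB)) asMidi	"amplitude-species arithmetic; the answer depends on the coercion rule"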

5: SmOKe EVENT OBJECTS

The simplest view of events is as Lisp-esque property lists: dictionaries of property names and values, the relevance and interpretation of which are left up to others (voices and applications). Events need not be thought of as mapping one-to-one to "notes," though they should be able to faithfully represent note-level objects. There may be one-to-many or many-to-one relationships between events and "notes." Events may have arbitrary properties, some of which will be common to most musical note-level events (such as duration, pitch, or loudness), while others may be used more rarely or only for non-musical events.

Events' properties can be accessed as keyed dictionary items (i.e., events can be treated as record data structures), or as direct behaviors (i.e., events can be thought of as programmatic). One can set an event to be "blue" (for example) by saying [anEvent at: #color put: #blue] "dictionary-style accessing", or more simply [anEvent color: #blue] "behavioral accessing" (whereby #string means the unique symbol whose name is 'string'). Events can be linked together by having properties that are associations to other events or event lists (as in [anEvent soundsLike: otherEvent]), enabling the creation of annotated hypermedia networks of events. Event properties can also be active blocks or procedures (in cases where the system supports compilation at run-time, as in Smalltalk-80 or Lisp), blurring the differentiation between events and "active agents."

Events are created either by messages sent to the class Event (which may be a macro or binding to another class), or, more tersely, simply by the concatenation of music magnitudes using the message "," (comma for concatenation), as shown in the examples below. Common properties such as pitch, duration, position, amplitude, and voice are manipulated according to "standard" semantics by many applications.

Event Examples

	"Event creation examples--the verbose way (class messages)."
	[ event := (Event newNamed: #flash) color: #white; place: #there ]
	[ (Event duration: (1/2) pitch: #c2 loudness: #mf) playOn: aVoice ]

	"Create three events with mixed properties--the terse way."
	[ (440 Hz), (1/4 beat), (-12 dB), Voice default ]	"abstract props."
	[ 38 key, 280 ticks, 56 vel, (#voice -> 4) ]	"MIDI-style props."
	[ #c4 pitch, 0.21 sec, 0.37 ampl, (Voice named: #oboe) ]	"note-list style"

6: EVENT LISTS

Events are grouped into collections (event lists), where a list is composed of associations between start times (durations relative to the start time of the event list) and events or sub-lists (nested to any depth). Schematically, this looks like:

	(EventList = (dur1 => event1), (dur2 => event2), ...)

where (x => y) means an association with key x and value y. Event lists can also have their own properties, and can map these onto their events eagerly (at definition time) or lazily (at "performance" time); they have all the property and link behavior of events, and special behaviors for mapping with voices and event modifiers.

The messages [anEventList add: anAssociation] and [anEventList add: anEventOrEventList at: aDuration], along with the corresponding event removal messages, can be used for manipulating event lists in the static representation or in applications. Event lists also respond to Smalltalk-80 collection-style control structure messages such as [anEventList collect: aSelectionBlock] or [anEventList select: aSelectionBlock], though this requires the representation of contexts/closures.
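The collection-style messages can be combined with the property accessing described in Section 5, as in the expressions below; the block syntax is standard Smalltalk-80, the comparison of dynamic-species magnitudes is assumed to follow their ordering, and transposeBy: is a hypothetical helper standing in for whatever pitch arithmetic an application actually provides.

	"Select the events louder than mezzo-forte (dictionary-style property reads)."
	[ anEventList select: [ :anEvent | (anEvent at: #loudness) > ('mf' dynamic) ] ]

	"Collect transposed copies of each event; transposeBy: is hypothetical."
	[ anEventList collect: [ :anEvent | anEvent transposeBy: (100 cents) ] ]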
The behaviors for applying functions (see below) to the components of event lists can look applicative (e.g., [anEventList apply: aFunction to: aPropertyName]), or one can use event modifier objects to have a stateful representation of the mapping. Applications will use event list hierarchies for browsing and annotation as well as for score following and performance control. The use of standard link types for such applications as version control (with such link types as #usedToBe or #viaScriptllbSi4) is defined.

A named event list is created (and stored) in the first example below, and two event associations are added to it, one starting at 0 (seconds by default) and the second at 1 sec. Note that the two events can have different types of properties, and the handy instance creation messages such as [dur: durationValue pitch: pitchValue ampl: amplValue]. The second example is the terse format of event list declaration using the behavior of (duration -> event) associations, such that [(aMagnitude) => (anImmediateDictionary)] returns the association [(duration with value aMagnitude) => (Event with the given property dictionary)]. One can use dictionary-style shorthand with event associations to create event lists, as in the very terse way of creating an anonymous (non-persistent) list with two events in the second example. The third example shows the first few notes of the C-minor fugue from Bach's Well-Tempered Clavier, in which the first note begins after a rest (which could also be represented explicitly as an event with a duration and no other properties). Note that there is one extra level of parentheses for readability.

Event List Examples

	"Event lists the verbose way."
	[ (EventList newNamed: #test1)
	     add: (0 -> (Event dur: 1/4 pitch: 'c3' ampl: 'mf'));
	     add: (1 -> ((Event new) dur: 6.0 ampl: 0.3772 sound: #s73bw)) ]

	"Terse lists--concatenation of events or (dur -> event) associations."
	[ (440 Hz, (1/1 beat), 44.7 dB), (1 -> ((1.396 sec, 0.714 ampl) sound: #s73bw; phoneme: #xu)) ]

	"C-minor fugue theme."
	[ ( (0.5 beat -> ((1/4 beat), ('c3' pitch), (#voice -> #harpsichord))),
	    ((1/4 beat), ('b2' pitch)),
	    ((1/2 beat), ('c3' pitch)),
	    ((1/2 beat), ('g2' pitch)),
	    ((1/2 beat), ('a-flat2' pitch)) ) ]
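The applicative style of function mapping mentioned at the start of this section might be written as below; only apply:to: is taken from the text, while RampFunction from:to: stands in, hypothetically, for one of the function constructors deferred to (Smallmusic 1992).

	"Apply a crescendo-like function to each event's loudness, applicatively."
	"RampFunction from:to: is a hypothetical constructor."
	[ anEventList apply: (RampFunction from: ('pp' dynamic) to: ('ff' dynamic)) to: #loudness ]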

7: EVENT GENERATORS AND MODIFIERS

The EventGenerator and EventModifier packages provide for music description and performance using generic or composition-specific middle-level objects. Event generators are used to represent the common structures of the musical vocabulary such as chords, ostinati, or compositional algorithms. Each event generator subclass knows how it is described (e.g., a chord with a root and an inversion, or an ostinato with an event list and repeat rate) and can perform itself once or repeatedly, acting like a function, a control structure, or a process, as appropriate. EventModifier objects hold onto a function and a property name; they can be told to apply their functions to any property of an event list lazily or eagerly.
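Concrete descriptions of these middle-level objects might read as follows; the selectors Chord root:inversion:duration:, eventList, Ostinato on:repeatRate:, and EventModifier function:property:applyTo: are all hypothetical, chosen only to mirror the descriptions given above, since this paper does not define the event generator and modifier protocols.

	"A chord described by its root, inversion, and duration, expanded on demand."
	[ (Chord root: ('c3' pitch) inversion: 1 duration: (1/2 beat)) eventList ]

	"An ostinato that repeats an event list at a fixed rate."
	[ Ostinato on: anEventList repeatRate: (2 sec) ]

	"An event modifier holding a function and a property name."
	[ EventModifier function: aDecrescendo property: #loudness applyTo: anEventList ]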
8: FUNCTIONS, PROBABILITY DISTRIBUTIONS, AND SOUNDS

SmOKe also defines functions of one or more variables, several types of discrete or continuous probability distributions, and granular and sampled sounds. The description of these facilities is, however, outside the scope of this paper, and the reader is referred to (Smallmusic 1992).

9: VOICES AND STRUCTURE ACCESSORS

The "performance" of events takes place via Voice objects. Event properties are assumed to be independent of the parameters of any synthesis instrument or algorithm. A voice object is a "property-to-parameter mapper" that knows about one or more output or input formats for SmOKe data (e.g., MIDI, note list files, or DSP commands). A StructureAccessor is an object that acts as a translator or protocol converter. An example accessor might respond to the typical messages of a tree node or member of a hierarchy and know how to apply that language to navigate through a hierarchical event list (by querying the event list's hierarchy). SmOKe supports the description of voices and structure accessors in scores so that performance information or alternative interfaces can be embedded. The goal is to be able to annotate a score with possibly complex real-time control objects that manipulate its structure or interpretation. Voices and event interpretation are described in (Pope 1992).

10: CONCLUSIONS

The Smallmusic Object Kernel (SmOKe) is a representation, description language, and interchange format for musical data that eases the creation of concrete description interfaces and the definition of storage and interchange formats, and is suitable for use in multimedia and hypermedia applications. The SmOKe description format has several versions, ranging from very readable to very terse, and covering a wide range of signal, event, and structure types from sampled sounds to compositional algorithms. SmOKe can be viewed as a procedural or a declarative description; it has been designed and implemented using an object-oriented methodology and is being tested in several applications. More detailed documents describing SmOKe, and its Smalltalk-80 implementation within the MODE system, are freely available via Internet file transfer.

11: ACKNOWLEDGEMENTS

SmOKe, and the MODE of which it is a part, is the work of many people. Craig Latta and Daniel Oppenheim came up with the names Smallmusic and SmOKe. These two, and Guy Garnett and Jeff Gomsi, were part of the team that discussed the design of SmOKe and commented on its design documents (Smallmusic 1992).

REFERENCES

ANSI 1992. Journal of Technical Development, ANSI Working Group X3V1.8M SD-7 (now ISO/IEC DIS 10744).

R. B. Dannenberg, L. Dyer, G. E. Garnett, S. T. Pope, and C. Roads, "Position Papers for a Panel on Music Representation," Proceedings of the ICMC. San Francisco: ICMA, 1989.

A. Goldberg and D. Robson, Smalltalk-80: The Language (revised and updated from the 1983 edition). Menlo Park: Addison-Wesley, 1989.

MusRep 1986. R. Dannenberg, J. Maloney, et al., "Network discussion on music representation," USENET.

MusRep 1990. G. Diener, L. Dyer, G. E. Garnett, D. Oppenheim, S. T. Pope, et al., "Notes of meetings on music representation at CCRMA, Fall, 1990," USENET.

S. R. Newcomb, N. A. Kipp, and V. T. Newcomb, "The HyTime Hypermedia/Time-based Document Structuring Language," Communications of the ACM, vol. 34, no. 11, pp. 67-83, 1991.

S. T. Pope, "The Interim DynaPiano: An Integrated Computer Tool and Instrument for Composers," Computer Music Journal 16(3), Fall 1992.

Smallmusic 1992. G. E. Garnett, J. Gomsi, C. Latta, D. Oppenheim, S. T. Pope, et al., Smallmusic discussion group notes, Credo documents (from which this document was derived), and MODE User Primitive specification; available from Smallmusic@XCF.Berkeley.edu as email or via anonymous Internet ftp from the server ccrma-ftp.stanford.edu in the directory pub/st80 (see the README file there).