SPED - A SOUND FILE EDITOR

Jaska Uimonen
Department of Musicology
P.O. Box 35 (Vironkatu 1)
00014 University of Helsinki
jaska.uimonen@helsinki.fi

ABSTRACT

Sound Processing Kit Editor (abbr. SPED) is a non-linear audio editor implemented using open source software components. For programmers, SPED provides a C++ class library that encapsulates basic editing functionality. For end users, SPED provides a command line interface, which can be used as it is or scripted in the shell environment. In this paper I present a design that fulfils many constraints also found in professional-level software.

1. INTRODUCTION

Computer-based sound editing has been used commercially since the late 1970s [1]. In the 1980s there were many proprietary editing systems built around specialized computer hardware [2], [3]. At the same time, researchers also introduced software-only systems [4], [5], [6], [7], [8]. Today commercial software is widely used in music production and usually runs on standard computer hardware. Lately we have also seen the birth of open source sound editors [9], [10], [11], [12].

Why build another sound file editor, then? I believe the approach taken when designing SPED is special in many ways. First, it is largely based on connections of signal processing objects, which makes the software easily extendable. Second, the edit decisions are saved into platform- and network-independent XML documents. Third, the structure of the software is simple, making it easy to learn the basics of editing.

2. DESIGN GUIDELINES

Some basic objectives were set for the software development: non-destructive editing, text-based editing and operability without a graphical user interface.

2.1. Non-linearity and non-destructive editing

Editing software should be non-linear, meaning that the recorded sound can be accessed randomly. Non-linearity leads to the concept of non-destructive editing, which means that the editing operations do not touch the underlying sound files in any way. Non-destructive editing is usually achieved by keeping a list of consecutive sections of sound files and playing these sections seamlessly one after another. Such a list is often called an edit decision list.

Non-destructive editing is required because it is one of the key properties distinguishing computer-based systems from splicing analogue tape. Non-destructive behaviour makes it possible to undo any editing operation without keeping copies of the sound files in their pre-editing state. Non-linearity and non-destructive editing have their roots in disk drive technology, but disk optimization is not addressed in this paper; it is assumed that the disk or any other mass storage device is fast enough.

2.2. Text-based editing

The editing operations should be saved in text-based edit decision lists, which can be inspected easily and modified with a simple text editor. The editing software might work faster using a binary format, but the transparency of the software would suffer. Text-based lists have both benefits and downsides. The user can easily view and edit simple lists, and a new project can be started simply by inserting the information of a single sound file into a text-based list. However, after only a few editing operations the list becomes too complex to read, and it is almost mandatory to use some kind of automation to ease list manipulation [13].
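As an illustration of sections 2.1 and 2.2, the following minimal C++ sketch shows one way to represent such a non-destructive edit decision list: a list of consecutive regions of unmodified sound files that are played back to back. The type and member names are hypothetical and are not SPED's actual classes.

// Minimal sketch (hypothetical names, not SPED's actual classes):
// a non-destructive edit decision list kept as consecutive regions
// of sound files that are never modified themselves.
#include <string>
#include <vector>

struct Region {
    std::string fileName;   // sound file on disk, left untouched
    long offsetInFrames;    // where the region starts in the file
    long lengthInFrames;    // how many frames to play from the offset
};

// Playing the edit means playing the regions seamlessly one after
// another; undoing means returning to an earlier version of the list.
using EditDecisionList = std::vector<Region>;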
2.3. Operation without graphical user interface

The editor should be usable without a graphical user interface and should not make any assumptions about one. This feature may seem strange considering that modern computer-based editing systems practically always have some kind of graphical user interface. On the other hand, when comparing to analogue tape splicing, one might ask how mandatory the graphical user interface really is. A more practical reason for this requirement is that UI design for sound editors is an art in itself and outside the scope of this paper.

3. NEW COMPONENTS TO SOUND PROCESSING KIT

SPED is largely based on the Sound Processing Kit (abbr. SPKit), a portable object-oriented class library for audio signal processing [14]. In this context SPKit is used mainly because of its good representation of connections between signal processing objects and the ease of implementing new objects. For SPED it was necessary to implement two new SPKit components.

3.1. SPKitList

The purpose of the SPKitList class is to play separate consecutive processing chains. In its simplest form SPKitList can be used to play a section from a sound file. Before processing starts, the user adds processing chains to the SPKitList instance. When samples are requested, SPKitList returns samples from the first processing chain and, when there are no more samples, moves on to the next processing chain. SPKitList is derived from SPKit's base class SPKitProcessor. It contains references to other SPKitProcessor-based objects and can therefore form recursive trees with other SPKitProcessor-derived classes.

3.2. SPKitLinearFader

SPKitLinearFader is a class that creates a simple linear fade by calculating a new fader value for every sample. The length and direction of the fade are given when the object is initialized, and the corresponding real-time parameters are calculated at the same time. The object stops returning samples after the fade duration's worth of samples has been returned.
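The following C++ sketch illustrates the SPKitList behaviour described in section 3.1: samples are taken from the current processing chain and, when it is exhausted, from the next one. The class and method names are simplified stand-ins; the real SPKit interface is not reproduced here.

// Hypothetical stand-ins for SPKitProcessor and SPKitList.
#include <cstddef>
#include <vector>

class Processor {
public:
    virtual ~Processor() {}
    // Returns false when the chain has no more samples to give.
    virtual bool getSample(float &out) = 0;
};

class ProcessorList : public Processor {
public:
    void add(Processor *chain) { chains.push_back(chain); }
    bool getSample(float &out) override {
        while (current < chains.size()) {
            if (chains[current]->getSample(out))
                return true;        // sample from the current chain
            ++current;              // chain exhausted, move to the next one
        }
        return false;               // the whole list has been played
    }
private:
    std::vector<Processor*> chains;
    std::size_t current = 0;
};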

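Similarly, a per-sample linear fade in the spirit of SPKitLinearFader (section 3.2) could look like the sketch below; again the name and the exact interface are assumptions made for illustration.

// Illustrative sketch of a linear fade computed sample by sample.
class LinearFader {
public:
    // direction: +1 for a fade-in, -1 for a fade-out; length in samples.
    LinearFader(int direction, long lengthInFrames)
        : dir(direction), length(lengthInFrames) {}

    // Applies the fade to one sample; returns false once the fade
    // duration's worth of samples has been returned.
    bool process(float in, float &out) {
        if (position >= length) return false;
        float gain = static_cast<float>(position) / static_cast<float>(length);
        if (dir < 0) gain = 1.0f - gain;   // fade-out runs downwards
        out = in * gain;
        ++position;
        return true;
    }
private:
    int dir;
    long length;
    long position = 0;
};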
4. GENERAL STRUCTURE OF THE EDITOR

SPED has a simple high-level structure, which makes it possible to extend its functionality in the desired direction. This high-level structure consists of small command line programs and C++ classes.

4.1. High level helper files

SPED has one global info file, which contains the path to the current project folder (see figure 1). Each project folder has its own private info file, which contains at least the index of the current edit decision list. In the current implementation the info file also contains the index and region of the edit decision list to be copied. The info file could also hold all kinds of general information, such as the project's sample rate and channel count. The project folder has separate subfolders for audio and for the edit decision lists. Audio can be copied to the project's audio folder, but it can also reside anywhere in the computer's file system.

Figure 1. Files used for the editing operations: the editor info file (path to the project folder), the project info file (current list index, region start and end, copy list index, copy region start and end) and the current EDL file.

4.2. Handling of the edit decision lists

Whenever an operation on the edit decision list is made, an object structure is created. Manipulations are made to this structure in memory, and the modified structure is written back to a new file. In object-oriented programming this kind of object creation and disposal is referred to as object serialization and de-serialization. Edit decision list files are numbered in consecutive order, and the number of each file is appended to the file name. The index of the current list is held in another file, which means that undoing is simply a matter of going back to the file with the previous number in its name.
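A rough sketch of this numbering scheme is shown below. The file name pattern and helper functions are hypothetical; the point is only that saving creates a new numbered file and that undo merely steps the current index back.

// Illustrative sketch only: edit decision lists stored as consecutively
// numbered files, so that undo never touches the files themselves.
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical naming scheme: edl_0.xml, edl_1.xml, ...
std::string edlFileName(int index) {
    std::ostringstream name;
    name << "edl_" << index << ".xml";
    return name.str();
}

// Writing a modified list creates a new file with the next number.
int saveNewList(int currentIndex, const std::string &xml) {
    int next = currentIndex + 1;
    std::ofstream out(edlFileName(next));
    out << xml;
    return next;                   // becomes the new current index
}

// Undo simply moves the current index back to the previous file.
int undo(int currentIndex) {
    return currentIndex > 0 ? currentIndex - 1 : 0;
}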
5. STRUCTURE OF THE EDIT DECISION LIST

There have been several attempts to standardise the edit decision list, or some other kind of temporal behaviour, in a structural format. Any of these could be used as the format of the edit decision list, but for some reason they have not gained popularity as editing software's native formats. Yonge proposed an SGML-based format for the edit decision list [15]. This idea was later formalized in the AES31-3 simple project interchange format, which includes the definition of the Edit Decision Markup Language (abbr. EDML). Temporal structures can also be found in the Synchronized Multimedia Integration Language (abbr. SMIL) [16]. In SPED the idea is to maximize the amount of program state information in the edit decision list. This can be done quite easily without complicating the parsing too much.

5.1. SPED list document format

SPED uses the Extensible Markup Language (XML) [17] as the format of its edit decision lists. The use of XML can be justified for several reasons. XML was developed to represent hierarchical structures, which is also the case with signal processing chains. Parsing of XML documents can be done with several public and properly tested parsers. The structure can also be formally described and validated with XML tools. Finally, XML files are in text format, which was one of the design constraints for the software.

5.2. SPED list document structure

XML tags are defined to reflect the SPKit objects to be serialized. The XML document structure can be defined formally in a document type definition (abbr. DTD) or in a schema file. Figure 2 shows a simplified DTD of the SPED edit decision list.

<!ELEMENT SPKitProcessor (SPKitList | SPKitReader | SPKitSum | SPKitLinearFader)*>
<!ELEMENT SPKitList (SPKitProcessor)+>
<!ELEMENT SPKitReader (fileName, offsetInFrames, lengthInFrames)>
<!ELEMENT fileName (#PCDATA)>
<!ELEMENT offsetInFrames (#PCDATA)>
<!ELEMENT lengthInFrames (#PCDATA)>
<!ELEMENT SPKitSum (SPKitProcessor)+>
<!ELEMENT SPKitLinearFader (direction, lengthInFrames)>
<!ELEMENT direction (#PCDATA)>

Figure 2. Basic document type definition for the SPED edit decision list.

All complete processing chains are surrounded with <SPKitProcessor> tags. This means that the edit decision list should start and end with these tags. Other SPKit objects have their own tags, which include subtags for the class's private members. The most important of these is the SPKitReader class, which reads samples from a sound file. If a class contains other processing chains, these chains should be separated with <SPKitProcessor> tags. Figure 3 shows a valid SPED edit decision list together with the corresponding object structure. The time ordering of the processing chains in the list is from left to right.

a)
<xml>
<SPKitProcessor>
 <SPKitList>
  <SPKitProcessor>
   <SPKitAmp>
    <gain>0.8</gain>
   </SPKitAmp>
   <SPKitReader>
    <fileName>violin.wav</fileName>
    <offsetInFrames>0</offsetInFrames>
    <length>20000</length>
   </SPKitReader>
  </SPKitProcessor>
  <SPKitProcessor>
   <SPKitAmp>
    <gain>0.8</gain>
   </SPKitAmp>
   <SPKitReader>
    <fileName>violin.wav</fileName>
    <offsetInFrames>50000</offsetInFrames>
    <length>10000</length>
   </SPKitReader>
  </SPKitProcessor>
 </SPKitList>
</SPKitProcessor>
</xml>

b) [object structure diagram: an SPKitList containing two processing chains, each consisting of an Amp object and a Reader object reading violin.wav]

Figure 3. a) Example of an edit decision list and b) the corresponding object structure.

6. MANIPULATION OF THE EDIT DECISION LIST

As stated before, most editor operations have to do with manipulating edit decision lists. Two basic cases exist: playing the SPKit object tree and making a cut or paste to an existing list. The files can be manipulated manually with a text editor, and in simple cases this is quite easy and handy. However, when editing continues further and the list structure becomes more complex, manual editing becomes awkward and some kind of automation must be available.

6.1. Parsing the object tree

The software objects in SPED are divided into two domains: objects created during parsing and the actual SPKit objects created from the parse-domain objects. The parse objects are used in tree manipulation, and the actual SPKit objects are used when playing the tree. The reason for the division is that SPKit objects have no standard way to serialize themselves to a file, and there is no easy way to walk through the tree with SPKit objects either.

The parse objects are created with a simple lexer and parser configuration. Whenever a start tag is detected, a matching parse object is created and its data members are filled accordingly. The object tree is created using a list of queues. Whenever a <SPKitProcessor> tag is detected, a new queue is added to the list. Whenever the end tag is detected, the current queue is pulled empty and the objects are connected together to form a complete SPKit processing chain. When the tree is played, the SPKit chain can be created on the fly, which means that the parse objects do not have to be kept in memory.
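The parse pass described above can be pictured with the following C++ sketch. The types are simplified assumptions, not SPED's real parser: every <SPKitProcessor> start tag opens a new queue of parse objects, and the matching end tag turns the queued objects into one complete chain.

#include <deque>
#include <string>
#include <utility>
#include <vector>

struct ParseObject {                 // filled from one XML element
    std::string tag;                 // e.g. "SPKitReader"
    std::vector<std::pair<std::string, std::string>> members;
};

struct Chain { std::vector<ParseObject> objects; };

class TreeBuilder {
public:
    void startTag(const std::string &tag) {
        if (tag == "SPKitProcessor")
            queues.emplace_back();                    // open a new queue
        else if (!queues.empty())
            queues.back().push_back(ParseObject{tag, {}});
    }
    void endTag(const std::string &tag, std::vector<Chain> &chains) {
        if (tag != "SPKitProcessor" || queues.empty()) return;
        Chain chain;
        chain.objects = std::move(queues.back());     // pull the queue empty
        queues.pop_back();
        chains.push_back(chain);                      // one complete chain
    }
private:
    std::deque<std::vector<ParseObject>> queues;      // list of queues
};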
6.2. Tree manipulations for cut and paste

All tree operations boil down to one procedure: finding and separating the part of the tree that falls within a given timeline. The only place where time is present in the list structure is the SPKitReader object, which holds the start and end point of the playable region. The tree traversal rules for finding a section within a given timeline are as follows: 1) the duration of an SPKitProcessor chain is defined by its child SPKitReader, SPKitList or SPKitSum nodes; 2) the duration of an SPKitList node is the sum of its child nodes' durations; 3) the duration of an SPKitSum node is the duration of its longest child node.

If we want to make a cut, we clone the sections before and after the cut and connect them together. Similarly, paste is an operation in which three trees are connected together, i.e. the beginning, the paste section and the end.
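The duration rules can be expressed compactly in code. The sketch below uses simplified node types instead of the actual SPKit classes, but follows the three rules directly.

#include <algorithm>
#include <vector>

struct Node {
    enum Kind { Reader, List, Sum } kind;
    long lengthInFrames = 0;        // used by Reader nodes (playable region)
    std::vector<Node> children;     // used by List and Sum nodes
};

long duration(const Node &node) {
    switch (node.kind) {
    case Node::Reader:              // a reader carries its region length
        return node.lengthInFrames;
    case Node::List: {              // rule 2: children played one after another
        long total = 0;
        for (const Node &child : node.children) total += duration(child);
        return total;
    }
    case Node::Sum: {               // rule 3: as long as the longest child
        long longest = 0;
        for (const Node &child : node.children)
            longest = std::max(longest, duration(child));
        return longest;
    }
    }
    return 0;
}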

6.3. Crossfades

When connecting parts of a sound together, some kind of crossfade is usually desired. A crossfade is needed to avoid abrupt changes at the connection point, which might cause audible artefacts. Figure 4 shows how crossfades are implemented in SPED.

Figure 4. Composition of software components forming a) a simple cut and b) a crossfaded cut inside one sound file.

If a crossfaded cut is needed, the file or underlying structure is divided into three sections. The first section is the part before the cut. The second section is the crossfade, with parts of the sound file faded and summed together. The last section is the part after the cut.
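To make the three-section layout concrete, the sketch below builds such a structure for a cut inside one file. The types, the function and the exact placement of the fade regions around the cut are assumptions for illustration, not SPED's actual implementation; in SPED the middle section would presumably correspond to an SPKitSum of two faded SPKitReader chains.

// Hypothetical helper types: a crossfaded cut as three consecutive
// sections, the middle one summing a fade-out of the outgoing material
// with a fade-in of the incoming material.
#include <string>
#include <vector>

struct Section {
    std::string file;
    long offset;       // first frame of the section in the file
    long length;       // section length in frames
    int fade;          // +1 fade-in, -1 fade-out, 0 no fade
};

// One playback slot; the sections inside a slot are summed together.
using Slot = std::vector<Section>;

// Removes [cutStart, cutEnd) from the file; one possible placement of
// the fade regions on either side of the removed material.
std::vector<Slot> crossfadedCut(const std::string &file, long totalLength,
                                long cutStart, long cutEnd, long fadeLength) {
    std::vector<Slot> timeline;
    // 1) untouched material before the cut
    timeline.push_back({{file, 0, cutStart - fadeLength, 0}});
    // 2) the crossfade: outgoing part faded out, incoming part faded in
    timeline.push_back({{file, cutStart - fadeLength, fadeLength, -1},
                        {file, cutEnd, fadeLength, +1}});
    // 3) untouched material after the cut and after the fade-in
    timeline.push_back({{file, cutEnd + fadeLength,
                         totalLength - (cutEnd + fadeLength), 0}});
    return timeline;
}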
7. CONCLUSIONS

In this paper I have presented a simple sound file editor called SPED, which uses open source components as building blocks. The editor works in a non-destructive way using edit decision lists. Text-based editing is achieved by using XML as the format of the edit decision lists. SPED is a sound file editor that can be used for learning the basics of computer-based sound editing or as a more advanced sound editing utility. It is extendable, and its functionality can be changed with minor effort.

8. REFERENCES

[1] Ingebretsen, R. B. & Stockham, T. G. "Random Access Editing of Digital Audio", The Journal of the Audio Engineering Society, Vol. 32, No. 3, pp. 114-122, March 1984.

[2] Kirby, D. G. "The exploitation and realisation of a random access digital audio editor", AES 85th Convention, November 3-6, Los Angeles, 1988.

[3] Moorer, J. A., Abbott, C., Nye, P., Borish, J. & Snell, J. "The Digital Audio Processing Station: A New Concept in Audio Postproduction", The Journal of the Audio Engineering Society, Vol. 34, No. 6, p. 454, 1986.

[4] Griffiths, M. & Bloom, P. J. "A Flexible Digital Sound Editing Program for Minicomputer Systems", AES 68th Convention, March, Hamburg, 1981.

[5] Lentczner, M. "Sound Kit - A Sound Manipulator", Proceedings of the International Computer Music Conference, Burnaby, Canada, 1985.

[6] Weisser, A. & Komly, A. "Description of an Audio Editing System Using Computer Magnetic Hard Disk", AES 80th Convention, March 4-7, Montreux, Switzerland, 1986.

[7] Jaffe, D. & Boynton, L. "An Overview of the Sound and Music Kits for the NeXT Computer", in Pope, S. T. (ed.), The Well-Tempered Object: Musical Applications of Object-Oriented Software Technology, MIT Press, Cambridge, 1989.

[8] Eckel, G. "A Signal Editor for the IRCAM Musical Workstation", Proceedings of the International Computer Music Conference, Glasgow, 1990.

[9] Garton, B. & Topper, D. "RTCmix - Using CMIX in Real Time", Proceedings of the International Computer Music Conference, San Francisco, 1997.

[10] Mazzoni, D. & Dannenberg, R. B. "A Fast Data Structure for Disk Based Editing", Computer Music Journal, Vol. 26, No. 2, July 2002.

[11] http://ardour.org.

[12] http://www.eca.cx/ecasound.

[13] Kim, T., Kim, K. & Lee, K. "Simple and Consistent SMIL Authoring: No More Structure Editing and No More Errors", IEEE Symposium on Visual Languages - Multimedia Computing on the World Wide Web, Seattle, Washington, 2000.

[14] Lassfolk, K. "Sound Processing Kit - An Object-Oriented Signal Processing Framework", Proceedings of the International Computer Music Conference, Beijing, China, 1999.

[15] Yonge, M. "Professional Audio File Interchange", Audio: The Second Century - AES UK Conference, June 1999.

[16] Bulterman, D., Grassel, G., Jansen, J., Koivisto, A., Layaida, N., Michel, T., Mullender, S. & Zucker, D. (eds.) "Synchronized Multimedia Integration Language (SMIL 2.1), W3C Recommendation, 13 December 2005", http://www.w3.org/TR/SMIL/.

[17] Bray, T., Paoli, J., Sperberg-McQueen, C. M., Maler, E., Yergeau, F. & Cowan, J. (eds.) "Extensible Markup Language (XML) 1.1 (Second Edition), W3C Recommendation, 16 August 2006", http://www.w3.org/TR/2006/REC-xml11-20060816/.