munger1~: TOWARDS A CROSS-PLATFORM SWISS-ARMY KNIFE OF REAL-TIME GRANULAR SYNTHESIS

Ivica Ico Bukvic, D.M.A., Virginia Tech, Department of Music, DISIS, CCTAD, CHCI
Ji-Sun Kim, Virginia Tech, Computer Science, CHCI
Dan Trueman, Ph.D., Princeton University, Department of Music, dan@music.princeton.edu
Thomas Grill, University of Music and Performing Arts, Vienna

ABSTRACT

munger1~ is a new and enhanced version of a powerful Max/MSP real-time granular synthesis external found in the PeRColate library. Apart from added features and optimizations, including the ability to generate a theoretically unlimited number of grains per second from real-time input, munger1~ also offers GPL-licensed, platform-transparent code built on the flext library. Thus, munger1~ is compatible with both Max/MSP and Pure-Data without any alterations to the source. Due to the object's complexity, as well as the significantly less common porting path from Max/MSP C to platform-agnostic C++/flext, this project has resulted in a number of improvements to the flext build scripts and supporting documentation. It has also generated an invaluable list of caveats, which are presented in this paper in the hope of fostering more platform-agnostic object design and porting efforts. Since its release, munger1~ has been featured in two interactive multimedia creations whose technical and artistic impact is also addressed in this paper.

1. INTRODUCTION

Originally introduced in 1947 as a theory by Dennis Gabor [14], sonified by Iannis Xenakis in 1971 [28], and made real-time by Barry Truax in 1988 [25], granular synthesis is by no means a new technology. Yet, in part due to its inherent versatility, granular synthesis remains a prominent technique in the contemporary digital audio vocabulary. Its recent adoption into mainstream audio software, such as Propellerhead's introduction of the Maelstrom synthesizer in Reason 2.0 [20], certainly attests to this ongoing trend.
The late 1990s brought a proliferation of portable computing, accessible DSP-oriented programming languages, and the ensuing popularity of audio-visual environments geared towards interactivity, most notably PD/Gem [21, 12] and Max/MSP/Jitter [10]. As a result, there was a growing need for an integrated real-time granular synthesis object that would balance versatility and ease of use. Although only a peripheral object of the PeRColate library originally authored by Dan Trueman and R. Luke DuBois [4, 26, 27], the munger~ external remains to this day arguably one of the most powerful granular synthesis objects for Max/MSP. The PD community quickly recognized the importance of the PeRColate collection and made a genuine effort towards a native port. As a result, in 2002 Olaf Matthes produced a near-complete Pure-Data conversion. Unfortunately, in part due to ongoing changes in gcc, munger~'s ported code soon failed to operate properly, maxing out CPU usage in a seemingly random fashion and ultimately producing garbled and therefore unusable audio output. Another shortcoming of Matthes's formidable effort was that the original authors of the PeRColate library had no resources and/or interest in maintaining two concurrent code bases, since their work revolved predominantly around the Max/MSP platform. Moreover, multiple #ifdefs in Matthes's code made the source difficult to maintain. For these reasons, Matthes's contributions fell by the wayside, leaving the PD community unable to tap into the vast potential of the munger~ object. In 2007, after an opportunity to attain a deeper understanding of the flext library [16], an effort was made at Virginia Tech's newly established Digital Interactive Sound and Intermedia Studios (DISIS) [13] to port munger~ to the flext framework and thus make the object platform-transparent with practically no added overhead to its code maintenance.
Several steps had to be taken before the port could commence, one of them being permission to redistribute the source under a flext-compatible license, in this case the GNU General Public License [15]. After several days of dealing mostly with platform-specific idiosyncrasies, which were ironically tied to environment variables otherwise considered to be platform-agnostic, munger1~ 1.0.0 was released, offering new features and full backwards compatibility. Since its initial release in March 2007, the object has seen several updates. The latest version, 1.3.1, provides a fully modular memory allocation model, code optimizations, and a virtually unlimited number of real-time grains per second, restricted only by raw CPU power.

2. PORTING CODE OUTSIDE THE BEATEN PATH

Although the flext library is a very stable and mature cross-platform framework, its adoption has been associated predominantly with the PD community. As a result, apart from a few exceptions (e.g. the FFTease [18] port initiated by the flext author Thomas Grill), most of the porting efforts have been either from PD to flext or new objects designed entirely within the flext environment (such as Thomas Grill's xsample, pool, py, etc., Tim Blechmann's chaos and tbext, Frank Barknecht's fluid~, syncgrain~, etc., all of which are part of the PD/flext CVS repository [22]). Consequently, documentation on porting objects from Max/MSP C-based source to the flext C++ environment has been sparse.

munger~ is a relatively complex external with its own concurrently moving and rotating buffer as well as the ability to interface with external buffers. Its latest version also relies upon the Synthesis Toolkit (STK) to provide an ADSR [24] grain envelope. munger~ itself provides up to sixteen outputs and the ability to specify grain content by manually traversing the current buffer or by accessing random points in it, as well as to read the buffer in different directions. The object allows for up to fifty simultaneous grains per sample. The resulting absolute grain density per second depends upon grain separation and rate variation (1st and 2nd inlets respectively), individual grain size and size variation (3rd and 4th inlets respectively), the absolute minimum allowable grain size ("minsize" message), and the "delaylength" message (whose value reflects the size of the moving sub-buffer and consequently the largest possible grain). Grains can be transposed using random as well as pitch-based transpositions. Each grain is also spatialized using a customizable random deviation. Unlike stereo spatialization, which has its own dedicated inlet, multichannel diffusion requires a "spatialize" message sent to the leftmost inlet to specify the random amplitude deviation and the default channel amplitude gain.

As its features suggest, munger~ has proven to be a formidable test of flext's robustness. Although the initial port was relatively quick, its cleanup and testing were encumbered by slowdowns stemming from platform-specific compiler and engine peculiarities which for the most part fell outside flext's domain.

2.1. Unexpected Caveats

Apart from the expected and relatively well documented API alterations necessary for the code to conform to the flext framework and thus become transparent to both Max/MSP and PD, there were several unexpected considerations that are likely to affect other similar development and porting efforts. They are listed below.

2.1.1. From C to C++

Setting aside the endless and in most cases dubious argument over which programming language offers better performance, C++ as an object-oriented environment provides superior scalability and better memory management [17]. Although the porting effort was relatively straightforward, compiler idiosyncrasies proved to be the most time-consuming source of bugs. One such idiosyncrasy stems from the fact that Microsoft Visual C++ (a.k.a. MSVC) [19], rather than the Cygwin [11] environment, was used on the Windows platform. The rand() function, which is often a critical component of algorithmic systems, produces a different value range in MSVC (namely 0-32767) from that of gcc (used on Linux and OSX). As a result, an additional #ifdef was necessary to standardize the value range of the original munger~ object across platforms:

    #ifndef __GNUC__
    /* MSVC: rand() already yields 0-32767 */
    #define RANDOM (rand())
    #else
    /* gcc: fold random() into the same 0-32767 range */
    #define RANDOM (random() % 32768)
    #endif

2.1.2. flext, SndObj, and STK

flext has an inherent ability to interface with the SndObj and STK libraries. As a result, compiling these libraries may require some additional effort from developers despite the common build environment. One very useful feature that would clearly benefit from additional documentation is the ability to build a Universal Binary (UB) [3] external. STK is not UB-friendly by default and therefore needs to be built with UB explicitly enabled. This step, however, requires only a minor alteration to STK's makefile:

    CFLAGS += -isysroot /Developer/SDKs/MacOSX10.4u.sdk -arch ppc -arch i386

Once flext is compiled, the process of building an external is relatively straightforward.
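As a complement to the RANDOM macro of Section 2.1.1, the same range normalization can be expressed without compiler detection. The following is a minimal sketch, assuming only the standard C++ library; the helper names random15 and random01 are hypothetical and do not appear in the munger1~ source.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper, not taken from the munger1~ source.
// Returns a pseudo-random integer in [0, 32767] on any platform.
inline int random15() {
    // The C standard guarantees RAND_MAX >= 32767, so the low 15 bits of
    // std::rand() are always populated; masking discards any higher bits.
    const int value = std::rand() & 0x7FFF;
    assert(value >= 0 && value <= 32767);
    return value;
}

// Scales the 15-bit value onto [0.0, 1.0), e.g. for a random grain offset.
inline double random01() {
    return random15() / 32768.0;
}
```

For non-negative values, masking with 0x7FFF is equivalent to the modulo by 32768 used in the macro, so both approaches produce the same range. The sketch makes no claim about the statistical quality of the underlying generator, which still differs between C runtimes.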
The flext build script relies upon the package.txt file, which is for the most part self-explanatory. A flext external uses a build process identical to that of flext itself. The newly generated external is by default statically linked against supporting libraries, making its distribution and/or integration seamless. It is, however, worth noting that dynamic linking is also available and in this case preferred, as it minimizes resource duplication by multiple concurrent instances of flext objects.

2.1.3. Optimizing Code

Due to the message translation inherent to flext, we expected the platform-agnostic external to introduce a larger CPU footprint than its native counterpart. Our experiments have shown, however, that munger1~ compiled with optimization flags provides practically identical performance to that of the native munger~ object. For this reason, we treated the anticipated overhead as negligible.

3. WHAT'S NEW?

The new cross-platform port munger1~ introduces several enhancements over its precursor: some provide new ways of shaping sound, others enhance internal object data and feedback monitoring, and the rest are strictly internal overhauls for the purpose of streamlining the code. munger1~, with its ever-growing complexity, also includes comprehensive documentation for both platforms.

3.1. New Features

munger1~ introduces two new user-controllable features which have a direct impact on the audio output. One of them is the "discretepan" option (passed into the first inlet), which can be toggled on or off (default). Unlike munger1~'s default behaviour, which projects the same grain onto every channel with varied amplitude (provided the random amplitude deviation set via the "spatialize" command is other than zero), "discretepan" sends each grain to one particular channel, generating a granular swarm whose full spectral composition can be discerned only from a location where all channels are equally perceivable. In this respect, munger1~ provides spectral diffusion of grains (albeit with limited control over grain location).

The second feature is a configurable number of output channels and concurrent voices (with arbitrary ceilings imposed at 64 and 1000 respectively to limit human error and potential memory overflow). The two options are set via the optional 2nd and 3rd arguments (the 1st, backwards-compatible argument is reserved for the internal buffer size). munger1~ is thus capable of delivering densities of 100,000+ grains per second. This number, however, drops off rapidly with increased grain lengths and the proportionally increased CPU overhead. For this reason it is not uncommon for munger1~ to overload even the most contemporary hardware. Another architectural peculiarity retained from the original munger~ prevents the external from generating more grains than there are samples per second. This is due to the MINSIZE variable, which determines the absolute minimum grain size possible. Therefore, the absolute grain density per second is currently limited only by the sampling rate and raw CPU power.

3.2. Object Monitoring and Feedback

munger1~ is designed to provide easy monitoring of multiple instances within the same session.
For this purpose, munger1~ offers the ability to add a unique name to every instance. This name is consequently reflected in all of the output generated in the console. The instance name can be set via the optional 4th argument, which also ignores "_" (underscore) entries, interpreting them as a means of extending the object's visual width in PD. Because this improved monitoring feedback can be high-volume, munger1~ also offers a "verbose" option with the following four verbosity levels:

0 = all messages (including errors) off
1 = only errors and warnings (default)
2 = all messages
3 = all messages plus number of grains per second

3.3. Under the Hood

With its configurable voice and channel output, munger1~ offers dynamic memory allocation of its resources. This is especially important due to its potentially large memory footprint. The dynamic model is in part achieved through a vector-based implementation of the larger and more memory-demanding data arrays. munger1~ also has a flext-based method which, when coupled with an external buffer, continually checks the external buffer's validity. While an invalid buffer is generally unlikely and mostly limited to human error (e.g. explicitly erasing the external buffer while the audio engine is turned on), without this check such an occurrence would bring down the entire application with irreversible data loss.

4. NOTES TO END-USERS

munger1~ has several idiosyncrasies which require end-user attention. Although it is our aim to address many of these in the object's future iterations, currently the best way of dealing with them is through awareness. When generating a multichannel instance of munger1~, no audio will be output until the object is given a "spatialize" message followed by an array of numbers reflecting channel-specific amplitude gain and random amplitude deviation, a.k.a. spread.
For instance, a "spatialize 0.1 0.5 0.1 0.5 0.1 0.5" message to the object would map an amplitude gain of 0.5 and a gain spread of 0.1 to the first three output channels. Internal and external buffers can be used interchangeably via the "buffer <buffer-name>" message. A "buffer" message by itself reverts to the internal buffer. It is important to note that in Max/MSP, when an external buffer mysteriously disappears or is explicitly deleted, munger1~ will cease all output until its buffer is manually set to another buffer (internal or external). PD in such a situation automatically reverts to the internal buffer.

The "maxvoices" message, which was used originally to remap the number of used voices out of a hardwired maximum of 50, is retained only for legacy purposes. Although it can to some extent optimize the main loop (i.e. in situations when not all allocated voices are needed, which is controllable via the "voices" command), its impact is negligible. Due to its backwards-compatible implementation, however, "maxvoices" can affect the number of available voices. For instance, if an object is instantiated with 70 voices (using the object's optional 3rd argument) and is given a "maxvoices 50" command, the "voices" message will only allow for up to 50 voices. For this reason, "maxvoices" is considered deprecated in munger1~ and its use is discouraged.
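The interaction between the 3rd creation argument, "maxvoices", and "voices" described above can be modelled as a pair of clamps. This is a minimal sketch rather than the munger1~ implementation: the VoiceLimits type and its member names are hypothetical, and the assumption that the legacy ceiling initially equals the allocated voice count is ours.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model of the voice limits; not taken from the munger1~ source.
struct VoiceLimits {
    int allocated;  // voices requested via the optional 3rd creation argument
    int maxVoices;  // ceiling set by the legacy "maxvoices" message
    int active;     // voices currently in use

    // Assumption: until "maxvoices" is received, the ceiling is the allocation.
    explicit VoiceLimits(int allocatedVoices)
        : allocated(allocatedVoices),
          maxVoices(allocatedVoices),
          active(allocatedVoices) {
        assert(allocatedVoices >= 0);
    }

    // "maxvoices N": the ceiling can never exceed what was allocated,
    // and lowering it also lowers the number of active voices.
    void setMaxVoices(int n) {
        maxVoices = std::max(0, std::min(n, allocated));
        active = std::min(active, maxVoices);
    }

    // "voices N": limited by the legacy ceiling.
    void setVoices(int n) {
        active = std::max(0, std::min(n, maxVoices));
    }
};
```

With this model, an instance created with 70 voices that receives "maxvoices 50" followed by "voices 70" ends up with 50 active voices, which matches the behaviour described in the text.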

5. REAL-WORLD PERFORMANCE

5.1. Benchmarks

In order to assess the object's performance, two benchmarks were conducted on Linux, OSX, and Win32 platforms. For this purpose we used the munger1~ help file with two different settings. Since munger1~'s internal sample buffer is hardwired to 64 bytes, it remained constant in all tests. Max/MSP's I/O Vector Size was set to 512 while the Signal Vector Size was 8. PD's internal buffer was calculated to the closest millisecond (namely 512x8/48,000 which yielded approx. 9 milliseconds). All tests were performed using internal laptop soundcards. The two scenarios were as follows:

Test 1: 100ms grain size, 50ms grain size variation, 0 grain separation and rate variation, 8-channel output with default spatialization values, minsize 1, delaylength 300, and a sampling rate of 48 kHz.

Test 2: 0ms (or minimum internally hardwired) grain size, 0ms grain size variation, 0 grain separation and rate variation, 8-channel output with default spatialization values, minsize 1, delaylength 300, and a sampling rate of 96 kHz.

An AMD64 3000+ (1.8GHz) notebook running 32-bit Windows XP and Wuschel's ASIO4ALL driver [5] with 512x4 internal buffering (lower internal latencies were not reliable even at very low CPU utilization, likely in part due to the lack of native ASIO support by the embedded sound chip) was able to generate 135 simultaneous voices per sample in the first test, bringing the total number of grains per second to 6,750. The second test's reliable ceiling was 82,500 grains per second.

A MacBook Pro 1.83GHz with a Core Duo processor running OSX 10.4.9 generated 150 simultaneous voices in the first test, with a total of 7,500 grains per second. In the second scenario, the upper limit of 96,000 grains per second imposed by the sampling rate was reached with Max/MSP's DSP Status panel showing a peak CPU utilization of 79%, suggesting that output was limited by the sampling rate rather than by the CPU.
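The Windows and OSX figures above can be cross-checked with simple arithmetic. The sketch below only restates the reported numbers; the helper names are hypothetical, and it makes no claim about how the per-voice grain rate arises from the test settings.

```cpp
// Hypothetical helpers that only restate figures reported in this section.

// Grains per second contributed by each simultaneous voice.
constexpr int grainsPerVoice(int grainsPerSecond, int voices) {
    return grainsPerSecond / voices;
}

// True when the measured density has reached the one-grain-per-sample ceiling.
constexpr bool sampleRateBound(int grainsPerSecond, int sampleRate) {
    return grainsPerSecond >= sampleRate;
}

// Test 1: the Windows XP notebook (135 voices) and the MacBook Pro (150 voices).
static_assert(grainsPerVoice(6750, 135) == grainsPerVoice(7500, 150),
              "both machines report the same per-voice grain rate");

// Test 2 at 96 kHz: the notebook stays below the ceiling,
// while the MacBook Pro reaches it.
static_assert(!sampleRateBound(82500, 96000), "notebook is below the ceiling");
static_assert(sampleRateBound(96000, 96000), "MacBook Pro is at the ceiling");
```

Both machines therefore report the same grain rate per voice in the first test, and only the MacBook Pro reaches the one-grain-per-sample limit in the second.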
Linux tests were conducted on the same hardware as the Windows tests, namely the dual-boot AMD64 3000+ laptop. However, because Linux's default audio driver (ALSA) [1] performs vastly better at native hardware sampling rates, instead of dealing with the added buffering of asoundrc-based virtual devices [2] we opted to conduct the tests at 48 kHz, the native sampling rate of the onboard Realtek sound chip. We expected this choice to generate near-identical CPU overhead induced by the low-latency operation: since ALSA's direct access to hardware introduces virtually no additional buffering overhead, its performance was deemed equivalent (or as equivalent as possible) to that of the ASIO and Core Audio drivers on the Windows and Apple platforms respectively. For this test PD was run in realtime mode (-rt flag) with pdwatchdog reniced to ensure elevated priority in the case of CPU overload. The grain density measured on Linux in the first test was identical to that of its Windows counterpart, suggesting that flext, STK, and consequently munger1~ enjoy equivalent optimizations across platforms. The data produced by the second test was generated at a 48 kHz sampling rate and is provided here purely for reference, as it lacks uniform (or near-uniform) testing conditions. Nonetheless, the data showed PD plateauing at 48,000 grains per second with approx. 50% CPU utilization, suggesting equivalency to its Windows counterpart and once again reaffirming the uniformity of the external's CPU footprint across platforms.

As can be observed from these benchmarks, munger1~ exhibits near-identical performance across platforms. The supporting data also suggests that munger1~ could benefit from multithreaded optimizations, which could yield substantial performance gains on the dual-core CPUs available in the MacBook Pro and other modern portables.
In the absence of such a multithreaded design, its performance appears to be strikingly proportional to CPU clock speed. Preliminary tests have also shown that the CPU footprint increases with a larger number of output channels. It is therefore not unreasonable to expect greater densities with stereo output than those presented above.

5.2. Art

Since its milestone 1.0.0 release, munger1~ has been utilized in at least two performance-based multimedia creations authored by Ivica Ico Bukvic, both of which are covered here in order to provide a hint of its real-world performance. One of them is Pandora, an interactive multimedia work for color-based gesture-tracking hyperinstrument, interactive visuals, voice, laptop, and quad output. Pandora relies heavily upon munger1~ to generate sustained reverb-like textures whose amplitudes are in part controlled via grain density, as well as to produce dense spectral layers built from the captured vocal material and its concurrent pitch, length, and amplitude permutations. Given that the majority of Pandora's CPU overhead stems from real-time video processing, 3D rendering, and motion tracking, only one instance of munger1~ was used. Perhaps more importantly, this one instance has proven more than adequate in generating the dense aural textures described above. Pandora was premiered in April 2007 as part of the debut DISIS event in Virginia, US. An audio-visual studio rendering of Pandora's performance is available at http://ico.bukvic.net/Video/.

The second work, titled Soul, for baritone, audience, laptop, and quad output, was also premiered in April 2007. It relies almost exclusively upon munger1~'s diverse signal processing potential as well as its newfound ability to generate dense real-time granular textures. Inspired by Emily Dickinson's poetry, the work calls for three concurrent instances of munger1~ with discrete buffers. Each instance is connected to a simplified on-screen version of the gesture interface utilized in Pandora and is used at specific points throughout the piece. The work's closure reengages all three instances with their respective buffers intact in order to produce a dramatic cumulative effect, climaxing in a timbrally rich wall of sound. This particular gesture utilizes 120 concurrent grains per sample (40 per instance). In addition to DSP techniques inherited from Pandora's interface, Soul also employs a more subtle pitch detuning associated with ideas and motives inherent to the poetry. A recording of Soul's premiere featuring baritone Dr. Theodore Sipes is available at http://ico.bukvic.net/Audio/.

6. FUTURE DEVELOPMENT

While munger1~ is already a stable, production-ready external, its real-world use has identified several potentially useful additions. As a result, we have generated a roadmap with the aim of implementing these in the near future. The following items are listed according to their priority.

6.1. ADSR Overhaul

Although the current ADSR envelope implementation generates default values for all channels and its grain-specific settings are alterable via "oneshot" events, there is currently no facility to alter the ADSR shape globally or to revert an individual voice's "oneshot" ADSR alterations back to the global setting. We are also considering enhancing the envelope model to include more than the four points inherent to the classical ADSR model. We aim to address these deficiencies with the ADSR overhaul, including an "adsr" message which will provide the global envelope alteration.

6.2. Spectral and Duration-Based Diffusion

One of the planned features is an expansion of the "discretepan" paradigm.
In addition to the existing two methods of spatialization, we will also provide the following two modes:

Mode 2 will generate spectral diffusion based on pitch.
Mode 3 will focus on spatializing all content according to the data received from a likely new inlet which will take two values: the current grain swarm center channel (provided as a floating-point virtual source), and the grain swarm spread surrounding this center.

Similar to mode 2, a "durationpan" and an "amplitudepan" will be implemented to spatialize grains according to their duration and amplitude respectively.

6.3. Built-in Initialization of Multichannel Diffusion

As reported in Section 4, munger1~ when instantiated as a multichannel object will not output any audio until it receives a "spatialize" command. Its future iterations will provide safe default spatialization values in order to enable immediate audio output.

6.4. Absolute Minimum Grain Size

In order to attain greater resolution in the main DSP loop, we will assess decreasing the absolute minimum grain size and its impact on the aural output as well as the CPU load.

6.5. Multithreaded Model and Vectorization of Code

A series of tests will be conducted to assess the feasibility of a multithreaded design capable of exploiting multi-core CPUs. This, coupled with code vectorization via SIMD [23] instructions, should provide a considerable boost in performance. Currently there are no plans to pursue Altivec optimizations.

7. OBTAINING munger1~

munger1~ is a free, open-source, GPL-licensed external which is currently downloadable from latest.tar.gz. Its latest version, 1.3.1, released in May 2007, comes with source, build packages for gcc and MSVC environments, help files, and prebuilt binaries for Max-Win32-i386, Pd-Linux-i386, and Max-Mac-UB. As always, contributions to the code as well as submissions of pre-packaged binaries for other platforms are most welcome.

8. CONCLUSION

munger1~ is an open-source, versatile, scalable, and platform-agnostic real-time granular synthesis external for Max/MSP and PD. Currently limited only by raw CPU power and, barring any fundamental API changes, future-proof, it requires only a minimal increase in code maintenance overhead over its Max/MSP-native precursor.

9. ACKNOWLEDGMENTS

Special thanks go to Dan Trueman and R. Luke DuBois for making and open-sourcing this great external, Thomas Grill for the incredibly useful and vastly underused flext layer, Perry R. Cook and Gary P. Scavone for STK, and the entire PD and Max/MSP communities for making these tools arguably the most modular and versatile creative multimedia environments available today.

10. REFERENCES

[1] ALSA, "Advanced Linux Sound Architecture", Cited 2007; Available from http://www.alsa-project.org.
[2] ALSA, "asoundrc file", Cited 2007; Available from http://www.alsa-project.org/alsa-doc/doc-php/asoundrc.php.
[3] Apple, "Universal Binary Programming Guidelines, Second Edition", Jan. 2007, Cited 2007; Available from /Conceptual/universal_binary/universal_binary.pdf.
[4] Arslan, B., Brouse, A., Castet, J., Filatriau, J. J., Lehembre, R., Noirhomme, Q., and Simon, C. "Biologically-driven musical instrument," Proceedings of eNTERFACE'05 Summer Workshop on Multimodal Interfaces, Mons, Belgium, 2005.
[5] ASIO4ALL, "Universal ASIO Driver For WDM Audio", Cited 2007; Available from http://www.asio4all.com.
[6] Bulka, D. and Mayhew, D. Efficient C++: Performance Programming Techniques, Addison-Wesley Professional, 1st edition, 1999.
[7] Cadiz, R. and Kendall, G. "Fuzzy logic control toolkit: real-time fuzzy control for Max/MSP and Pd", Proceedings of the International Computer Music Conference, New Orleans, Louisiana, USA, 2006.
[8] Cook, P. R. and Scavone, G. "The Synthesis Toolkit (STK)." Proceedings of the International Computer Music Conference, Beijing, China, 1999.
[9] Cycling '74. "Max/MSP: A graphical programming environment for music, audio, and multimedia", Cited 2007; Available from http://www.cycling74.com/products/maxmsp.
[10] Cycling '74. "Jitter: A Brilliant Collection of Video, Matrix, and 3D Graphics Objects for Max", Cited 2007; Available from http://ww
[11] Cygwin, "Cygwin", Cited 2007; Available from http://www.cygwin.com.
[12] Danks, M. "Real-time image and video processing in GEM." Proceedings of the International Computer Music Conference, Thessaloniki, 1997.
[13] DISIS, "Digital Interactive Sound and Intermedia Studio", Cited 2007; Available from http://disis.music.vt.edu.
[14] Gabor, D. "Acoustical Quanta and the Theory of Hearing." Nature 159 (4044): 591-594, 1947.
[15] GNU, "General Public License (GPL)", Cited 2007; Available from http://www.gnu.org/copyleft/gpl.html.
[16] Grill, T. "flext - C++ layer for cross-platform development of Max/MSP and pd externals", Cited 2007; Available from http://grrrr.org/ext/flext/.
[17] Lippman, S. B. and Lajoie, J. "C++ Primer (3rd Edition)", Addison-Wesley Professional, 1998.
[18] Lyon, E. "FFTease: A collection of Max/MSP objects implementing various forms of spectral sound processing", Cited 2007; Available from: htlp://\ware/M axMSP/FFTease.
[19] Microsoft, "Visual C++ Developer Center", Cited 2007; Available from http://msdn.microsoft.com/visualc/.
[20] Propellerhead Software, "Reason", 2003, Cited 2007; Available from http://xwww.1 hatsn ew rsn25.pdf.
[21] Puckette, M. "Pure Data." Proceedings of the International Computer Music Conference, San Francisco, 1996.
[22] Pure Data. "Pure Data Computer Music System." 2007, Cited 2007; Available from
[23] Stewart, J. "An Investigation of SIMD instruction sets", 2005, Cited 2007; Available from http://noisymime.org/blogimages/SIMD.pdf.
[24] The Synthesis ToolKit in C++ (STK), "STK ADSR envelope class", Cited 2007; Available from hltp://ccmna. ml.
[25] Truax, B. "Real-time granular synthesis with a digital signal processor," Computer Music Journal, vol. 12, no. 2, pp. 14-26, 1988.
[26] Trueman, D. and DuBois, R. L. "PeRColate", Cited 2007; Available from http://music.columbia.edu/PeRColate.
[27] Trueman, D. and DuBois, R. L., PeRColate manual, 2001, Cited 2007; Available from hfttp:// ate manual1pdf.
[28] Xenakis, I. Formalized Music: Thought and Mathematics in Composition, Indiana University Press, 1971.