Proceedings of the International Computer Music Conference (ICMC 2009), Montreal, Canada, August 16-21, 2009

JACKTRIP: UNDER THE HOOD OF AN ENGINE FOR NETWORK AUDIO

Juan-Pablo Cáceres & Chris Chafe
Center for Computer Research in Music and Acoustics (CCRMA)
Stanford University
{jcaceres, cc}@ccrma.stanford.edu

ABSTRACT

The design of a platform for bi-directional musical performance using modern WAN networks poses several challenges that are different from related applications, e.g., synchronous LAN studio systems or uni-directional WAN streaming. The need to minimize audio latency as much as possible while maximizing audio quality requires specific strategies which are informed, in part, by musical decisions. We present some of the key design elements of the JackTrip application, which has evolved through several years of deployment in musical work over wide-area networks.

1. INTRODUCTION

The SoundWIRE group at CCRMA¹ focuses on experiments with bi-directional musical performance. Concerts and rehearsals between Stanford and places like New York, Belfast, Banff, Beijing, or Santiago are now routine. JackTrip is the application that powers these online collaborations. Presently, it is a Linux and Mac OS X-based system which supports multi-machine network performance over the best-effort Internet. The technology being used builds on early work by research groups at McGill University [11] and Stanford University [7]. The basic approach is to send uncompressed audio (avoiding the latency introduced by compression encode/decode algorithms) through high-speed links like Internet2. JackTrip supports any number of channels (as many as the computers or network paths can handle). Since best-effort network protocols are used, adequate network provisioning is a must.

The subject of this article is JackTrip's design relating to several issues that come up in implementing such a system.
It is hoped that these solutions can serve as a point of departure for further applications in this same area. The design achieves (i) the highest audio quality possible, by using uncompressed linear sampling and redundancy to recover from packet loss; (ii) throughput maximization, which gets audio packets onto and off of the network as soon as the sound card can deliver them; (iii) support for any number of channels (depending on available computer processing power and bandwidth); (iv) flexibility in routing and mixing audio channels from and to the different hosts.

¹ http://ccrma.stanford.edu/groups/soundwire/

1.1. Peer-to-peer Network Audio Latency

WAN connections inevitably introduce transmission delays between two or more hosts. For non-interactive and "soft" real-time applications, this delay is less of a problem than for high-quality collaborative music performance. The latter places extremely stringent bounds on latency and jitter. The longer the audio latency between musicians, the harder it is for them to play synchronously [5]. Time delays as short as 25 milliseconds are already problematic for professional ensembles like string quartets.²

It is the total delay between sound capture and sound projection which counts. This splits out into (i) acoustic (air path) delays, e.g., the distance between an instrument and the capture microphone and between the speakers and ears; (ii) analog-to-digital and digital-to-analog conversion (ADC/DAC) delay, i.e., the time it takes for an analog source to be transformed into digital and back; (iii) settings chosen for audio quality and packetization, including audio sampling rate and bit depth resolution, buffer and packet sizes, and others; (iv) network transmission delays, including physical (geographical) distance and transmission delays induced by switches, routers, firewalls, and network congestion, among others.
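The four contributions to total delay listed above can be added up in a rough latency budget. The following Python sketch is illustrative only: the function names, the number of queued buffers, the converter delay, and the assumed signal speed in fiber are all assumptions made for the example, not values taken from JackTrip. Delays added by switches, routers, firewalls, and congestion are not modeled.

```python
SPEED_OF_SOUND_M_S = 343.0   # dry air at about 20 degrees Celsius
FIBER_SPEED_M_S = 2.0e8      # assumption: roughly two thirds of c in optical fiber


def acoustic_delay_ms(distance_m: float) -> float:
    """(i) Air-path delay for a given distance in meters."""
    return 1000.0 * distance_m / SPEED_OF_SOUND_M_S


def buffer_delay_ms(frames: int, sample_rate_hz: float) -> float:
    """(iii) Time needed to fill one audio buffer of `frames` samples."""
    return 1000.0 * frames / sample_rate_hz


def propagation_delay_ms(path_km: float) -> float:
    """(iv) Lower bound on network delay: propagation time only."""
    return 1000.0 * (path_km * 1000.0) / FIBER_SPEED_M_S


def one_way_latency_ms(mic_m, speaker_m, converter_ms, frames,
                       sample_rate_hz, queued_buffers, path_km) -> dict:
    """Sum the four components for one direction of the connection."""
    budget = {
        "acoustic": acoustic_delay_ms(mic_m) + acoustic_delay_ms(speaker_m),
        "conversion": converter_ms,  # (ii) ADC plus DAC, supplied by the caller
        "buffering": queued_buffers * buffer_delay_ms(frames, sample_rate_hz),
        "network": propagation_delay_ms(path_km),
    }
    budget["total"] = sum(budget.values())
    return budget


# Hypothetical scenario: microphone at 0.5 m, speakers at 2 m, 1 ms of
# converter delay, two queued buffers of 128 frames at 48 kHz, 4000 km path.
budget = one_way_latency_ms(0.5, 2.0, 1.0, 128, 48000, 2, 4000.0)
for name, value in budget.items():
    print(f"{name:>10}: {value:6.2f} ms")
```

Under these assumed values the one-way total comes to roughly 34 ms, already above the 25 ms figure quoted for string quartets, with propagation over distance as the largest single term.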
² Recordings of experiments with the St. Lawrence String Quartet are available at http://ccrma.stanford.edu/groups/soundwire/research/slsq/.

The default transport protocol in JackTrip is UDP, a low-overhead, fast mechanism for transmitting packets (see [9] for a good description). The application's own header data accompanies each audio packet to describe local properties like audio buffer size, sampling rate, bit depth, number of channels, a sequence number and a time stamp.

Currently, JackTrip uses Jack [3] as its host audio server. Jack has several advantages: it runs on Linux and Mac OS X, it has the ability to make audio connections between many different audio clients on the same host, and its current implementation takes advantage of multi-processor machines [10].
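To make the idea of a per-packet header concrete, the following Python sketch packs the properties named in the text (time stamp, sequence number, buffer size, sampling rate, bit depth, number of channels) in front of a block of uncompressed samples. The field order, field widths, and all names here are hypothetical and chosen for illustration; this is not JackTrip's actual wire format. The sketch only handles interleaved 16-bit samples.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout in network byte order (NOT JackTrip's real format):
# uint64 time stamp (microseconds), uint16 sequence number,
# uint16 buffer size (frames), uint32 sampling rate (Hz),
# uint8 bit depth, uint8 number of channels.
HEADER_FORMAT = "!QHHIBB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


@dataclass(frozen=True)
class PacketHeader:
    timestamp_us: int
    sequence: int
    buffer_frames: int
    sample_rate_hz: int
    bit_depth: int
    channels: int

    def pack(self) -> bytes:
        # The sequence number wraps around at 16 bits.
        return struct.pack(HEADER_FORMAT, self.timestamp_us,
                           self.sequence & 0xFFFF, self.buffer_frames,
                           self.sample_rate_hz, self.bit_depth, self.channels)

    @classmethod
    def unpack(cls, data: bytes) -> "PacketHeader":
        return cls(*struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE]))


def build_packet(header: PacketHeader, samples: list) -> bytes:
    """Prepend the header to interleaved signed 16-bit samples."""
    if header.bit_depth != 16:
        raise ValueError("this sketch only handles 16-bit samples")
    expected = header.buffer_frames * header.channels
    if len(samples) != expected:
        raise ValueError(f"expected {expected} samples, got {len(samples)}")
    return header.pack() + struct.pack(f"!{expected}h", *samples)


def parse_packet(datagram: bytes):
    """Recover the header and the interleaved samples from one datagram."""
    header = PacketHeader.unpack(datagram)
    count = header.buffer_frames * header.channels
    body = datagram[HEADER_SIZE:HEADER_SIZE + 2 * count]
    return header, list(struct.unpack(f"!{count}h", body))


def audio_bitrate_bps(header: PacketHeader) -> int:
    """Payload bit rate of the uncompressed stream, excluding headers."""
    return header.sample_rate_hz * header.bit_depth * header.channels
```

Because the receiver reads the buffer size and channel count from the header, it can decode each datagram without any out-of-band negotiation, and the sequence number gives it a way to notice lost or reordered packets. A datagram built this way could be handed to a standard UDP socket, for example with `socket.sendto`.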