Proceedings of the International Computer Music Conference (ICMC 2009), Montreal, Canada
August 16-21, 2009
JACKTRIP: UNDER THE HOOD OF AN ENGINE FOR NETWORK AUDIO
Juan-Pablo Cáceres & Chris Chafe
Center for Computer Research in Music and Acoustics (CCRMA)
Stanford University
{jcaceres, cc}@ccrma.stanford.edu
ABSTRACT
The design of a platform for bi-directional musical performance using modern WAN networks poses several challenges that are different from related applications, e.g.,
synchronous LAN studio systems or uni-directional WAN
streaming. The need to minimize audio latency as far as
possible while also maximizing audio quality requires specific
strategies which are informed, in part, by musical decisions.
We present some of the key design elements of the JackTrip application, which has evolved through several years of
deployment in musical work over wide-area networks.
1. INTRODUCTION
The SoundWIRE group at CCRMA¹ focuses on experiments with bi-directional musical performance. Concerts
and rehearsals between Stanford and places like New York,
Belfast, Banff, Beijing, or Santiago are now routine.
JackTrip is the application that powers these online collaborations. At present, it is a Linux and Mac OS X-based system that supports multi-machine network performance over the best-effort Internet. The technology being
used builds on early work by research groups at McGill University [11] and Stanford University [7]. The basic approach
is to send uncompressed audio (avoiding the latency introduced by compression encode/decode algorithms) through
high-speed links like Internet2. It supports any number of
channels (as many as the computers or network paths can
handle). Since best-effort network protocols are used, adequate network provisioning is a must.
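The provisioning requirement follows directly from the choice of uncompressed audio: the payload bit rate is the product of sampling rate, bit depth, and channel count. The sketch below works this out for two illustrative configurations. The function name and the example values (48 kHz, 16 bit, 2 and 8 channels) are assumptions chosen for illustration, not figures from this paper, and packet header overhead is ignored.

```python
def uncompressed_bitrate_bps(sample_rate_hz, bit_depth, channels):
    """Payload bit rate of uncompressed linear PCM, excluding packet headers."""
    return sample_rate_hz * bit_depth * channels


# Illustrative configurations (assumed values, not taken from the paper).
stereo_bps = uncompressed_bitrate_bps(48000, 16, 2)
eight_channel_bps = uncompressed_bitrate_bps(48000, 16, 8)

print(f"stereo:     {stereo_bps / 1e6:.3f} Mbit/s in each direction")
print(f"8 channels: {eight_channel_bps / 1e6:.3f} Mbit/s in each direction")
```

Because the performance is bi-directional, each host needs this capacity both upstream and downstream, and the rate grows linearly with the number of channels.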
The subject of this article is JackTrip's design relating
to several issues that come up in implementing such a system. It is hoped that these solutions can serve as a point of departure for further applications in this same area.
The design achieves (i) the highest possible audio quality, by using uncompressed linear sampling and redundancy
to recover from packet loss; (ii) maximum throughput,
getting audio packets onto and off the network as soon
as the sound card can deliver them; (iii) support for any
number of channels (depending on available computer processing power and bandwidth); and (iv) flexibility in routing and
mixing audio channels to and from the different hosts.
¹ http://ccrma.stanford.edu/groups/soundwire/.
1.1. Peer-to-peer Network Audio Latency
WAN connections inevitably introduce transmission delays
between two or more hosts. For non-interactive and "soft"
real-time applications, this delay is less of a problem than
for high-quality collaborative music performance. The latter
places extremely stringent bounds on latency and jitter. The
longer the audio latency between musicians, the harder it is
for them to play synchronously [5]. Time delays as short
as 25 milliseconds are already problematic for professional
ensembles like string quartets.²
What counts is the total delay between sound capture and sound
projection. It breaks down into (i) acoustic (air-path) delays, e.g., the distance between an instrument and the capture microphone and between the speakers
and the listener's ears; (ii) analog-to-digital and digital-to-analog conversion (ADC/DAC) delay, i.e., the time it takes for an analog
source to be converted to digital and back; (iii) settings
chosen for audio quality and packetization, including audio sampling rate and bit depth, buffer and packet
sizes, and others; (iv) network transmission delays, including physical (geographical) distance and delays induced by switches, routers, firewalls, and network congestion,
among others.
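The four contributions can be added up into a rough one-way latency budget. The sketch below is a back-of-the-envelope estimate under stated assumptions, not a measurement or a model used by JackTrip: the speed of sound is taken as about 343 m/s, signal speed in optical fiber as roughly 200,000 km/s (about two thirds of the speed of light), and the converter delay, buffer count, path length, and queuing delay are hypothetical example values.

```python
SPEED_OF_SOUND_M_S = 343.0     # air at about 20 degrees C
FIBER_SPEED_KM_S = 200_000.0   # approximate signal speed in optical fiber


def acoustic_delay_ms(distance_m):
    """(i) Air-path delay over a given distance."""
    return distance_m / SPEED_OF_SOUND_M_S * 1000.0


def buffer_delay_ms(frames, sample_rate_hz):
    """(iii) Time needed to fill one audio buffer of `frames` samples."""
    return frames / sample_rate_hz * 1000.0


def propagation_delay_ms(path_km):
    """(iv) Propagation delay over a fiber path, ignoring routing detours."""
    return path_km / FIBER_SPEED_KM_S * 1000.0


def one_way_latency_ms(air_path_m, converter_ms, frames, sample_rate_hz,
                       buffers_in_path, path_km, queuing_ms):
    """Sum of the four delay components for one direction."""
    return (acoustic_delay_ms(air_path_m)
            + converter_ms
            + buffers_in_path * buffer_delay_ms(frames, sample_rate_hz)
            + propagation_delay_ms(path_km)
            + queuing_ms)


# Hypothetical scenario: 0.5 m to the microphone plus 2 m from the speakers,
# 2 ms total ADC/DAC delay, two 128-frame buffers at 48 kHz, a 4000 km path,
# and 2 ms of switch/router queuing.
total_ms = one_way_latency_ms(air_path_m=2.5, converter_ms=2.0, frames=128,
                              sample_rate_hz=48000, buffers_in_path=2,
                              path_km=4000.0, queuing_ms=2.0)
print(f"estimated one-way latency: {total_ms:.1f} ms")
```

In this example the propagation term alone is 20 ms, which shows how little of the budget remains for buffering and conversion once the geographical distance is fixed.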
The default transport protocol in JackTrip is UDP, a low-overhead, fast mechanism for transmitting packets (see [9]
for a good description). The application's own header data
accompanies each audio packet to describe local properties
like audio buffer size, sampling rate, bit depth, number of
channels, a sequence number and a time stamp.
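A per-packet header of this kind can be sketched with Python's `struct` module. The field order, field widths, and byte order below are hypothetical and chosen only for illustration; they are not JackTrip's actual wire format, which this text does not specify. The names `HEADER_FORMAT`, `pack_packet`, and `unpack_packet` are likewise assumptions.

```python
import struct

# Hypothetical layout, network byte order, no padding:
#   buffer size in frames (uint16), sampling rate in Hz (uint32),
#   bit depth (uint8), number of channels (uint8),
#   sequence number (uint16), time stamp in microseconds (uint64)
HEADER_FORMAT = "!HIBBHQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def pack_packet(buffer_size, sample_rate, bit_depth, channels,
                seq_number, timestamp_us, audio_bytes):
    """Prepend the header to one buffer of raw audio bytes."""
    header = struct.pack(HEADER_FORMAT, buffer_size, sample_rate, bit_depth,
                         channels, seq_number & 0xFFFF, timestamp_us)
    return header + audio_bytes


def unpack_packet(packet):
    """Split a received packet into header fields and the audio payload."""
    fields = struct.unpack(HEADER_FORMAT, packet[:HEADER_SIZE])
    return fields, packet[HEADER_SIZE:]


# One buffer of silence: 128 frames x 2 channels x 2 bytes per sample.
payload = bytes(128 * 2 * 2)
packet = pack_packet(128, 48000, 16, 2, seq_number=7,
                     timestamp_us=1_000_000, audio_bytes=payload)
fields, audio = unpack_packet(packet)
print(fields, len(audio))
```

With a header like this, a receiver can check that the sender's audio settings match its own, use the sequence number to detect lost or reordered UDP packets, and use the time stamp to estimate delay.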
Currently, JackTrip uses Jack [3] as its host audio server.
Jack has several advantages: it runs on Linux and Mac OS
X, it has the ability to make audio connections between
many different audio clients on the same host, and its current implementation takes advantage of multi-processor machines [10].
² Recordings of experiments with the St. Lawrence String Quartet are
available at http://ccrma.stanford.edu/groups/soundwire/research/slsq/.