In other words, the garbage collector can introduce an arbitrary amount of jitter, which makes Java inherently unsuitable for real-time applications. In practice though, we found that the duration of garbage collections with the latest Sun JVM was never above one millisecond. As a general rule of thumb, it is always good to avoid generating garbage in the first place, but our experience showed that even regular memory allocations (a few objects per cycle) during processing do not affect the overall performance. The IBM JVM [9], on the other hand, typically took about 8 milliseconds per garbage collection with our application.
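As an illustration of the garbage-avoidance guideline, the following sketch shows one possible way to keep the per-cycle processing path allocation-free by preallocating all state up front; the class and its methods are purely illustrative and are not part of Decklight 4.

```java
// Illustrative sketch (not Decklight 4 code): a simple echo effect that
// preallocates its delay line once, so the per-cycle process() call
// creates no garbage at all.
public final class EchoEffect {
    private final float[] delayLine;  // allocated once in the constructor
    private int writePos;

    public EchoEffect(int delaySamples) {
        this.delayLine = new float[delaySamples];
    }

    // Called once per audio buffer from the real-time thread;
    // no object is allocated inside this method.
    public void process(float[] buffer, float feedback) {
        for (int i = 0; i < buffer.length; i++) {
            float delayed = delayLine[writePos];
            delayLine[writePos] = buffer[i] + delayed * feedback;
            buffer[i] += delayed;
            writePos = (writePos + 1) % delayLine.length;
        }
    }
}
```

Because the delay line is allocated once in the constructor, the real-time thread never triggers the garbage collector on its own.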
There is an additional point that is worth noting: we
suggested in the previous section to use real-time priority threads. Unfortunately, the garbage collector runs in
a separate thread over which we have no control. This thread does not run with real-time priority, but as a low-priority thread, because garbage collection is supposed to be a background task. Thus, when the garbage collector has
to block the other running threads, the entire application
is blocked, waiting on a low-priority thread that can be
preempted at any time by other running applications. This
phenomenon is known as priority inversion and can seriously affect the jitter. Fortunately, this problem is completely resolved by operating systems that implement priority inheritance: on such a system, if a low-priority thread blocks a higher-priority thread, the priority of the low-priority thread is temporarily raised to that of the blocked thread. The latest Linux kernels implement priority inheritance, making jitter problems related to garbage collection nearly unnoticeable.
2.5. Inherent Audio Processing Latencies
Finally, some audio effects have an inherent latency that adds to the total latency of the system. Examples are all effects working in the frequency domain: they must work with buffers large enough to capture the lowest frequencies we want to process. A pitch shifter, for instance, typically requires buffers of at least 10 milliseconds to produce acceptable results, and even longer to produce good results. We refer to the buffers discussed in section 2.1 as the audio buffers, and to the buffers required by a particular audio effect as the effect buffers.
Setting the audio buffer size to the size of the effect buffers is a possible, but not optimal, solution: both the record delay and the synchronization delay discussed in section 2.1 are proportional to this size. A better solution is to use a fraction of the effect buffer size for the audio buffer size: multiple captured audio buffers are joined together to form a single effect buffer, and each effect buffer is then split back into multiple audio buffers after the transformation to feed the audio playback device.
The advantage of this scheme is that the effect buffer size affects only the record delay. The synchronization delay still depends on the audio buffer size, which is only a fraction of the effect buffer size. There is a small issue though: the processing of an effect buffer must be fast enough to fit within the duration of an audio buffer. We have to ensure that the first part of the result, corresponding to one audio buffer, can be played in time. This can be a problem with CPU-intensive effects if the audio buffer size is much smaller than the effect buffer size.
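To make the joining and splitting concrete, the sketch below (illustrative only; the class, method, and interface names are not taken from Decklight 4) accumulates captured audio buffers into one effect buffer and returns previously processed samples one audio buffer at a time, assuming the effect buffer size is an exact multiple of the audio buffer size.

```java
// Illustrative sketch, not the actual Decklight 4 implementation.
// Joins small audio buffers into one effect buffer, processes it,
// and splits the result back into audio-buffer-sized chunks.
public final class BufferAdapter {

    public interface EffectCallback {
        void apply(float[] in, float[] out);
    }

    private final float[] effectIn;   // accumulates captured audio buffers
    private final float[] effectOut;  // holds the last processed effect buffer
    private final int audioSize;      // audio buffer size in samples
    private int pos;                  // samples accumulated so far

    public BufferAdapter(int audioSize, int effectSize) {
        if (effectSize % audioSize != 0)
            throw new IllegalArgumentException("effectSize must be a multiple of audioSize");
        this.audioSize = audioSize;
        this.effectIn = new float[effectSize];
        this.effectOut = new float[effectSize];
    }

    // Called once per audio cycle with the captured buffer 'in';
    // fills 'out' with the samples to play back during this cycle.
    public void process(float[] in, float[] out, EffectCallback effect) {
        System.arraycopy(in, 0, effectIn, pos, audioSize);    // join
        System.arraycopy(effectOut, pos, out, 0, audioSize);  // split
        pos += audioSize;
        if (pos == effectIn.length) {
            // A complete effect buffer has been captured: process it now.
            // This call must finish within the duration of one audio buffer
            // so that the first chunk of the result is ready for the next cycle.
            effect.apply(effectIn, effectOut);
            pos = 0;
        }
    }
}
```

In this sketch, each captured sample reappears at the output exactly one effect buffer later; the record and synchronization delays of section 2.1 come on top of that.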
3. CURRENT STATE
In this section, we discuss the current state of our framework and give preliminary results on practical tests.
3.1. Decklight 4
Decklight 4 is our audio processing framework written in
Java on Linux. It implements all the ideas presented in
this paper. It uses RtAudio for audio I/O and runs the audio processing in real-time priority threads. Various
audio effects are implemented. They range from simple
time-domain effects such as filters and echoes, to complex Fourier-domain effects such as pitch shifting. The
only part that is written in C is the access to RtAudio and
to the Linux threading API. In both cases the C code is a
straightforward mapping between the actual API and Java
methods and is thus portable to all platforms where RtAudio is available (notably Windows and Mac OS X). The
rest of the application, including all audio effects and the
fast Fourier transform, is written entirely in Java.
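To give an idea of how thin this native layer is, the sketch below shows a hypothetical binding in the same spirit; the class, method, and library names are invented for illustration and do not appear in Decklight 4. Its C counterpart would simply forward the call to the Linux threading API.

```java
// Hypothetical example of a thin native binding; not the Decklight 4 source.
// The C implementation of the native method would simply forward to the
// Linux threading API (e.g. pthread_setschedparam with SCHED_FIFO).
public final class NativeThreads {

    static {
        System.loadLibrary("decklightnative");  // library name is illustrative
    }

    // Requests real-time scheduling for the calling thread with the given
    // priority; returns false if the operating system denies the request.
    public static native boolean setRealtimePriority(int priority);

    private NativeThreads() { }
}
```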
The actual architecture of the framework is based on
our previous projects Decklight 2 and 3 [2] and is thus
not described here again. The only notable difference in
Decklight 4 is the introduction of the Java language, and
the focus on audio processing only.
3.2. Results
In order to validate our theory, we implemented two audio
effects on top of Decklight 4, and we compared the theoretical latencies against experimental measurements. To
make the measurements, we sent the left output of a mono
audio signal through our application, and we mixed the
result with the right channel, which was wired directly. The result was recorded in an audio editor on another machine, and the latency was computed from the delay between the left and right channels.
The first audio effect was a ten-band equalizer using a
cascade of Butterworth band-pass filters. The latency inherent to this effect itself is negligible. The audio buffers
were set to a size of 16 samples. The second effect was
a pitch shifter working in the frequency domain. It uses
overlapped windows of 512 samples at a sample rate of
48000 Hz. Its inherent latency is thus about 10.67 milliseconds. The audio buffer size was set to 32 samples.
The application was running with the Sun JVM version
6 on a Linux machine. The sound hardware was a Realtek SoundBlaster card. The audio driver was accessed by
RtAudio through JNI.
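As a check on the figure quoted above, the pitch shifter's inherent latency follows directly from its window size and the sample rate:
\[
\frac{512\ \text{samples}}{48000\ \text{samples/s}} \approx 10.67\ \text{ms}.
\]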
Table 1 shows the theoretical and measured latencies
with these two effects. This table reveals that there is only
a small difference between the theoretical and measured
latencies. These differences correspond to the duration