In other words, the garbage collector can introduce an arbitrary amount of jitter, which makes Java inherently unsuitable for real-time applications. In practice though, we found that the duration of garbage collections with the latest Sun JVM was never above one millisecond. As a general rule of thumb, it is always good to avoid generating garbage in the first place, but our experience showed that even regular memory allocations (a few objects per cycle) during processing do not affect the overall performance. The IBM JVM [9], on the other hand, typically took about 8 milliseconds per garbage collection with our application.
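As an illustration of the garbage-avoidance guideline, the following sketch shows one possible way to keep the per-cycle processing path allocation-free by preallocating all state up front; the class and its methods are purely illustrative and are not part of Decklight 4.

```java
// Illustrative sketch (not Decklight 4 code): a simple echo effect that
// preallocates its delay line once, so the per-cycle process() call
// creates no garbage at all.
public final class EchoEffect {
    private final float[] delayLine;  // allocated once in the constructor
    private int writePos;

    public EchoEffect(int delaySamples) {
        this.delayLine = new float[delaySamples];
    }

    // Called once per audio buffer from the real-time thread;
    // no object is allocated inside this method.
    public void process(float[] buffer, float feedback) {
        for (int i = 0; i < buffer.length; i++) {
            float delayed = delayLine[writePos];
            delayLine[writePos] = buffer[i] + delayed * feedback;
            buffer[i] += delayed;
            writePos = (writePos + 1) % delayLine.length;
        }
    }
}
```

Because the delay line is allocated once in the constructor, the real-time thread never triggers the garbage collector on its own.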
There is an additional point that is worth noting: we
suggested in the previous section to use real-time priority threads. Unfortunately, the garbage collector runs in
a separate thread over which we have no control. This thread does not run with real-time priority, but as a low-priority thread, because garbage collection is supposed to be a background task. Thus, when the garbage collector has
to block the other running threads, the entire application
is blocked, waiting on a low-priority thread that can be
preempted at any time by other running applications. This
phenomenon is known as priority inversion and can seriously affect the jitter. Fortunately, this problem is completely resolved by operating systems that implement priority inheritance: on such a system, if a low-priority thread blocks a higher-priority thread, the priority of the low-priority thread is temporarily raised to that of the blocked thread. The latest Linux kernels implement priority inheritance, making jitter problems related to garbage collection nearly unnoticeable.
2.5. Inherent Audio Processing Latencies
Finally, some audio effects have an inherent latency that adds to the total latency of the system. Examples are all effects working in the frequency domain: they must work with buffers large enough to capture the lowest frequencies we want to process. A pitch shifter, for instance, typically requires buffers of at least 10 milliseconds to produce acceptable results, and even longer to produce good results. We refer to the buffers discussed in section 2.1 as the audio buffers, and to the buffers required by a particular audio effect as the effect buffers.
Setting the audio buffer size to the size of the effect buffers is a possible, but not optimal, solution: both the record delay and the synchronization delay discussed in section 2.1 are proportional to this size. A better solution is to use a fraction of the effect buffer size for the audio buffer size: multiple captured audio buffers are joined together to form a single effect buffer, and each effect buffer is then split back into multiple audio buffers after the transformation to feed the audio playback device.
The advantage of this scheme is that the effect buffer size affects only the record delay. The synchronization delay still depends on the audio buffer size, which is only a fraction of the effect buffer size. There is a small issue though: the processing of an effect buffer must be fast enough to fit within the duration of an audio buffer. We have to ensure that the first part of the result, corresponding to one audio buffer, can be played in time. This can be a problem with CPU-intensive effects if the audio buffer size is much smaller than the effect buffer size.
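To make the joining and splitting concrete, the sketch below (illustrative only; the class, method, and interface names are not taken from Decklight 4) accumulates captured audio buffers into one effect buffer and returns previously processed samples one audio buffer at a time, assuming the effect buffer size is an exact multiple of the audio buffer size.

```java
// Illustrative sketch, not the actual Decklight 4 implementation.
// Joins small audio buffers into one effect buffer, processes it,
// and splits the result back into audio-buffer-sized chunks.
public final class BufferAdapter {

    public interface EffectCallback {
        void apply(float[] in, float[] out);
    }

    private final float[] effectIn;   // accumulates captured audio buffers
    private final float[] effectOut;  // holds the last processed effect buffer
    private final int audioSize;      // audio buffer size in samples
    private int pos;                  // samples accumulated so far

    public BufferAdapter(int audioSize, int effectSize) {
        if (effectSize % audioSize != 0)
            throw new IllegalArgumentException("effectSize must be a multiple of audioSize");
        this.audioSize = audioSize;
        this.effectIn = new float[effectSize];
        this.effectOut = new float[effectSize];
    }

    // Called once per audio cycle with the captured buffer 'in';
    // fills 'out' with the samples to play back during this cycle.
    public void process(float[] in, float[] out, EffectCallback effect) {
        System.arraycopy(in, 0, effectIn, pos, audioSize);    // join
        System.arraycopy(effectOut, pos, out, 0, audioSize);  // split
        pos += audioSize;
        if (pos == effectIn.length) {
            // A complete effect buffer has been captured: process it now.
            // This call must finish within the duration of one audio buffer
            // so that the first chunk of the result is ready for the next cycle.
            effect.apply(effectIn, effectOut);
            pos = 0;
        }
    }
}
```

In this sketch, each captured sample reappears at the output exactly one effect buffer later; the record and synchronization delays of section 2.1 come on top of that.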
3. CURRENT STATE
In this section, we discuss the current state of our framework and give preliminary results on practical tests.
3.1. Decklight 4
Decklight 4 is our audio processing framework written in
Java on Linux. It implements all the ideas presented in
this paper. It uses RtAudio for audio I/O and runs the audio processing in real-time priority threads. Various
audio effects are implemented. They range from simple
time-domain effects such as filters and echoes, to complex Fourier-domain effects such as pitch shifting. The
only part that is written in C is the access to RtAudio and
to the Linux threading API. In both cases the C code is a
straightforward mapping between the actual API and Java
methods and is thus portable to all platforms where RtAudio is available (notably Windows and Mac OS X). The
rest of the application, including all audio effects and the
fast Fourier transform, is written entirely in Java.
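To give an idea of how thin this native layer is, the sketch below shows a hypothetical binding in the same spirit; the class, method, and library names are invented for illustration and do not appear in Decklight 4. Its C counterpart would simply forward the call to the Linux threading API.

```java
// Hypothetical example of a thin native binding; not the Decklight 4 source.
// The C implementation of the native method would simply forward to the
// Linux threading API (e.g. pthread_setschedparam with SCHED_FIFO).
public final class NativeThreads {

    static {
        System.loadLibrary("decklightnative");  // library name is illustrative
    }

    // Requests real-time scheduling for the calling thread with the given
    // priority; returns false if the operating system denies the request.
    public static native boolean setRealtimePriority(int priority);

    private NativeThreads() { }
}
```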
The actual architecture of the framework is based on
our previous projects Decklight 2 and 3 [2] and is thus
not described here again. The only notable difference in
Decklight 4 is the introduction of the Java language, and
the focus on audio processing only.
3.2. Results
In order to validate our theory, we implemented two audio
effects on top of Decklight 4, and we compared the theoretical latencies against experimental measurements. To
make the measurements, we sent the left output of a mono
audio signal through our application, and we mixed the
result with the right channel, which was wired directly. The result was recorded in an audio editor on another machine, and the latency was computed from the delay between the left and right channels.
The first audio effect was a ten-band equalizer using a
cascade of Butterworth band-pass filters. The latency inherent to this effect itself is negligible. The audio buffers
were set to a size of 16 samples. The second effect was
a pitch shifter working in the frequency domain. It uses
overlapped windows of 512 samples at a sample rate of
48000 Hz. Its inherent latency is thus about 10.67 milliseconds. The audio buffer size was set to 32 samples.
The application was running with the Sun JVM version
6 on a Linux machine. The sound hardware was a Realtek SoundBlaster card. The audio driver was accessed by
RtAudio through JNI.
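As a check on the figure quoted above, the pitch shifter's inherent latency follows directly from its window size and the sample rate:
\[
\frac{512\ \text{samples}}{48000\ \text{samples/s}} \approx 10.67\ \text{ms}.
\]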
Table 1 shows the theoretical and measured latencies
with these two effects. This table reveals that there is only
a small difference between the theoretical and measured
latencies. These differences correspond to the duration