Page 153

A Platform for Real-Time Perceptually-Based Audio Data Reduction

Tom Maglione and Barry Vercoe
MIT Media Lab Music & Cognition Group
20 Ames Street, E15-484F
Cambridge, Massachusetts 02139
maglione@media.mit.edu
bv@media.mit.edu
(617) 253-2143

Abstract

The MIT Real-Time Audio Processor (RTAP) board for the Mac-II has recently been enhanced to include:
- digital stereo AES/EBU-compatible inputs and outputs;
- larger amounts of high-speed Static and Dynamic RAM;
- two DSPs (27 MIPS combined) coupled by dual-ported Static RAM (SRAM).

This real-time architecture is facilitating interactive research in bandwidth conservation for Digital Audio Broadcasting (DAB). The motivations behind the design are reviewed, and the perceptually-based techniques it supports are illustrated.

1.0 Background

The RTAP project began four years ago to provide high-speed digital signal processing power for music synthesis and analysis research. It has since evolved to support numerous research projects at the Media Lab.

Recently, the RTAP project has begun to support research by the Television of Tomorrow (TVOT) group in conjunction with the Cheops project [Cheops 90]. Cheops aims to provide a variable-resolution transmission channel for digital video and accompanying audio for future television transmissions. The audio portion of this project aims to encode speech, music and ambience for rate-varying packet transmission over a multiplexed audio/video channel.

Other supported research projects include synthetic performers [Vercoe 84, 89], synthetic listeners, synthetic spaces (artificial acoustic ambience), and cognitive audio processing.

2.0 History

The first RTAP board [Boynton & Cummings 88] used one DSP56001 running at 20.0 MHz with eight kilowords of memory. It interfaced directly with 16-bit serial A/D and D/A converters.
Limitations in the converter subsystems led to an auxiliary circuit board that converted raw serial binary audio data to the AES/EBU standard format. This made AES/EBU A/D and D/A converters available simultaneously.

The next version of the RTAP board (version 1.2) [Peterson 90] made four changes:
- it added a second DSP56001 processor to handle the AES/EBU format conversion processing;
- it moved the AES/EBU format conversion on-board;
- it ran both processors at 25.0 MHz;
- it increased the memory space to 28 kilowords of Static RAM and one megaword of Dynamic RAM.

ICMC 153

The present RTAP architecture (version 1.3) was designed to reduce power dissipation and to overcome the trace-density difficulties of the previous layout. It consists of two Motorola DSP56001 Digital Signal Processors (DSPs) running at 27.0 MHz, with the same memory capacity as version 1.2. All of it is accessible from the Mac-II NuBus and mapped so that multiple RTAP boards can operate on the same NuBus.

Most recently, new integrated circuits from Crystal Semiconductor Corporation [Crystal 90, 91] allow simultaneous transmission and reception of two-channel stereo AES/EBU-formatted signals with a maximum word length of 24 bits per channel. Most audio conversion equipment operates at a resolution of 16 bits per channel. Even so, the full 24-bit word length is needed to maximize the bandwidth available for transmitting and receiving encoded audio data.

3.0 Architecture

The RTAP architecture (Figure 1) provides a scalable parallel processing environment for developing real-time digital audio signal processing techniques. It supports one to four RTAP boards per NuBus. One RTAP board provides 27.0 MIPS of integer digital signal processing power as a slave device on the Mac-II NuBus.

The two processors divide the work as follows:
- The Interface Controller (IC) processor handles the AES/EBU transmitter and receiver devices. It can also perform some audio signal processing operations (windowing, filtering, etc.) before handing the audio data to the other processor.
- The Audio Processor (AP) handles most of the signal processing algorithms and serves as the primary interface with the Mac-II NuBus.

The IC and AP processors communicate via the dual-ported Static RAM.

Each RTAP board provides read/write access to:
- the host ports of both on-board DSPs;
- all of the on-board memory segments;
- the host ports and memory segments of up to three other RTAP boards.
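The 27 MIPS figure can be checked with simple arithmetic. The check assumes the DSP56001's two clock cycles per instruction cycle and splits the board total evenly between the IC and the AP. The Python sketch below also computes the instruction budget per stereo sample frame at the CD rate, which is what bounds a real-time algorithm on one processor. The function names are illustrative only and are not part of any RTAP software.

```python
# Assumption: a DSP56001 instruction cycle takes two clock cycles.
CLOCKS_PER_INSTRUCTION = 2
DSPS_PER_BOARD = 2  # Interface Controller (IC) + Audio Processor (AP)


def dsp_mips(clock_mhz: float) -> float:
    """Peak instruction rate of one DSP56001, in MIPS."""
    return clock_mhz / CLOCKS_PER_INSTRUCTION


def board_mips(clock_mhz: float) -> float:
    """Peak instruction rate of one RTAP board (both DSPs), in MIPS."""
    return DSPS_PER_BOARD * dsp_mips(clock_mhz)


def instructions_per_frame(clock_mhz: float, sample_rate_hz: float = 44_100.0) -> float:
    """Instruction cycles one DSP can spend per sample frame (one L/R pair)."""
    return dsp_mips(clock_mhz) * 1e6 / sample_rate_hz


if __name__ == "__main__":
    # 25 MHz = version 1.2, 27 MHz = version 1.3, 33 MHz = planned upgrade
    for mhz in (25.0, 27.0, 33.0):
        print(f"{mhz:4.1f} MHz: {board_mips(mhz):4.1f} MIPS/board, "
              f"{instructions_per_frame(mhz):5.1f} instructions/frame per DSP")
    print(f"4 boards at 27 MHz: {4 * board_mips(27.0):.1f} MIPS")
```

At 27 MHz this gives 13.5 MIPS per DSP and 27 MIPS per board, matching the figure above. That leaves roughly 306 instruction cycles per stereo sample frame for each processor at 44.1 kHz. A full four-board NuBus system would peak at 108 MIPS under the same assumption.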
The RTAP memory system consists of:
- eight kilowords each of X-data, Y-data and P-program zero-wait-state Static RAM for the Audio Processor;
- four kilowords of dual-ported Static RAM for interprocessor communication;
- one megaword of Dynamic RAM, accessible to the AP, for long delay lines and reverberation algorithms.

All RTAP memory and host port interfaces are mapped on the NuBus according to the slot the RTAP board occupies.

The AP can also act as a master of the Mac-II NuBus backplane. This bus mastership can be used to implement synchronization techniques useful in a multi-board configuration. It supports various parallel processing protocols in a MIMD (Multiple Instruction, Multiple Data) programming environment:
- simple semaphore synchronization primitives;
- message-passing paradigms [Vercoe 89];
- shared-memory organizations;
- a dataflow paradigm [Peterson 90].

The architecture is flexible enough that multiple RTAP boards can even operate in a simulated SIMD (Single Instruction, Multiple Data) environment.
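The one-megaword Dynamic RAM is what makes long delay lines practical. As a rough sizing, assume that a megaword means 2^20 words and that each mono sample takes one word. The DRAM then holds 1,048,576 / 44,100, or about 23.8 seconds of mono delay at the CD rate.

The Python sketch below models such a delay line as a circular buffer. The class, its methods, and the sizing helper are hypothetical. They illustrate the data structure, not the AP's actual DRAM code, which the paper does not describe.

```python
# Assumption: one megaword = 2**20 words, one mono sample per word.
MEGAWORD = 1 << 20


class DelayLine:
    """Fixed-length circular delay line (hypothetical model of an AP DRAM buffer)."""

    def __init__(self, delay_samples: int, capacity: int = MEGAWORD):
        if not 0 < delay_samples <= capacity:
            raise ValueError("delay must be between 1 sample and the buffer capacity")
        self.buf = [0] * delay_samples  # storage actually needed for this delay
        self.pos = 0                    # slot read next, then overwritten

    def process(self, x: int) -> int:
        """Return the sample stored delay_samples calls ago, then store x."""
        y = self.buf[self.pos]
        self.buf[self.pos] = x
        self.pos = (self.pos + 1) % len(self.buf)  # wrap around the circular buffer
        return y


def max_delay_seconds(sample_rate_hz: float = 44_100.0, capacity: int = MEGAWORD) -> float:
    """Longest mono delay that fits in the assumed DRAM capacity."""
    return capacity / sample_rate_hz


if __name__ == "__main__":
    d = DelayLine(3)
    print([d.process(x) for x in [1, 2, 3, 4, 5]])
    print(f"max delay: {max_delay_seconds():.1f} s")
```

With a three-sample delay, the inputs 1 to 5 come out as `[0, 0, 0, 1, 2]`: zeros until the buffer fills, then the input shifted by three samples. A reverberation algorithm would typically read several taps from buffers like this one rather than a single output.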

4.0 Audio Data Rate Reduction Techniques

The RTAP architecture is being used to study methods for reducing digital audio data rates. The goal is to retain the perceptual quality of the Compact Disc, which samples stereo signals at 44,100 samples per second with 16-bit resolution (44,100 × 16 ≈ 706 kilobits per second per channel). At the same time, the transmission bandwidth of a monophonic audio channel must fall from about 700 kilobits per second to 128 kilobits per second.

The psychoacoustic phenomenon of critical-band masking [Zwicker 91] is to be the primary mechanism for reducing the data rate (compressing the audio data). It works by reducing or removing perceptually irrelevant data in the audio signal. Current compression methods include MASCAM [MASCAM 88] and MUSICAM [MUSICAM 90]. They rely on block FFT transform operations and on filtering with polyphase quadrature filters, both of which the RTAP architecture supports. The RTAP will be used to investigate the potential for real-time implementation of both the encoding and the decoding sides of the compression operation.

Recent articles have shown a lack of consensus [Spectrum 91] regarding competing methods for digital data rate reduction. Proponents of various methods have also admitted some shortcomings of critical-band masking methods when handling musical audio with much transient information. These facts, combined with the desirability of scalable audio resolution, indicate the need for further research into compression for data rate reduction.

5.0 Planned Future Enhancements

The RTAP architecture has a modular design, so it can easily be upgraded as enhancements become available. Two upgrades are in view:
- Motorola has recently announced DSP56001 processors running at a 33.0 MHz clock rate, which would raise RTAP processing power to 33.0 MIPS. This speedup can be accommodated by changing clock rates and providing faster decoding logic and SRAMs.
- Recent press announcements indicate that the NuBus backplane will eventually speed up from 10.0 MHz to 20.0 MHz via the NuBus 90 architecture. This would raise RTAP-to-RTAP and host-to-RTAP communication speeds. The NuBus speedup should be automatic, since the current design operates synchronously with the NuBus clock.

Both enhancements should improve performance and should be quickly available to the RTAP architecture.

References

Boynton & Cummings 88: Lee Boynton and Dave Cumming, "A Real-Time Acoustic Processing Card for the Mac II", MIT Media Lab paper, March 1988.
Cheops 90: John Watlington and V. Michael Bove, Jr., "The Cheops Imaging System", MIT Media Laboratory Internal Memo, October 1990.
Crystal 90: Crystal Semiconductor Corp., CS8401/CS8402 Digital Audio Interface Transmitter data sheet, November 1990.

Crystal 91: Crystal Semiconductor Corp., CS8411/CS8412 Digital Audio Interface Receiver data sheet, April 1991.
MASCAM 88: Gerhard Stoll, Gunther Theile and Martin Link, "MASCAM: Using Psychoacoustic Masking Effects for Low-Bit-Rate Coding of High Quality Complex Sounds", Proceedings of the Marcus Wallenberg Symposium on Structure and Perception of Electroacoustic Sound and Music, Lund, Sweden, August 21-28, 1988.
MUSICAM 90: Gerhard Stoll and Detlef Wiese, "High Quality Audio Bitrate Reduction Considering the Psychoacoustic Phenomena of Human Sound Perception", Proceedings of the International Symposium on Subjective and Objective Evaluation of Sound, Poznan, Poland, September 25-27, 1990, pages 281-291.
Peterson 90: K. Peterson, "Pseudo-Static Scheduling of Music Processing Algorithms for a Small-Scale Multiprocessor", M.S. Thesis, MIT Media Lab & Dept. of Electrical Engineering, August 1990.
Spectrum 91: IEEE Spectrum, Volume 28, Number 7, July 1991, corrections on page 16, column 3.
Vercoe 84: B. Vercoe, "The Synthetic Performer in the Context of Live Performance", ICMC Proceedings 1984, pages 199-200.
Vercoe 89: B. Vercoe, "The Mac II Real-Time Audio Processor: Reference Manual", MIT Media Lab, 1989.
Zwicker 91: Eberhard Zwicker and U. Tilmann Zwicker, "Audio Engineering and Psychoacoustics: Matching Signals to the Final Receiver, the Human Auditory System", Journal of the Audio Engineering Society, Volume 39, Number 3, March 1991, pages 115-126.

Figure 1: Real-Time Audio Processor Version 1.3 (diagram not reproduced)