Sound Database Retrieved by Sound

Shuji Hashimoto, Hai Qi, Dingding Chang
Dept. of Applied Physics, Waseda University / ARCSE, Waseda University
Okubo 3-4-1, Shinjuku, Tokyo 169, Japan
Email: {shuji, qihai, dchang}@shalab.phys.waseda.ac.jp

ABSTRACT

This paper proposes a new type of sound database system that uses sound itself as the key for data retrieval and that has a sound processing module to create new sounds not included in the database.

1 Introduction

A sound database can be useful for composers, sound designers, and sound directors, who often search for sounds suitable for their works. It is difficult, however, to reach the desired sound in a short time using keyword-based retrieval, because sound cannot be described precisely in words, which makes the labeling and indexing of sound data troublesome. Recently, some sound retrieval systems have been reported. Sound editing and mixing can be integrated with a database application [Blum et al] [Keislar et al]. Other researchers have described an analogous database for sound synthesis [Vertegaal & Bonis]. Neural nets provide a different way to find a mapping between acoustical attributes and perceptual properties, which can be used for the automatic indexing of a sound database [Feiten & Gunzel]. This paper proposes a new type of sound database system that uses sound as the key for data retrieval, so that people can find the sound they want easily.

2 System Overview

The architecture of the proposed sound database system is shown in Figure 1. The system extracts temporal features and spectrum features from the sound data, stores these features in a reference table, and builds the database by using them as index keys. After a user gives a reference sound under the system's direction, the system characterizes this sound in the same manner as the stored sound data. By calculating the distance between the reference sound and each stored sound, five similar sounds are presented to the user at different apparent locations using stereo output, and the user can select the most suitable one among them. Because the system also has a data modification ability, users can reach the desired sound easily.

3 Sound Characterization and Data Storage

3.1 Recording module and sound database file

This module records an input of real sound and converts it into digital data to create a sound data unit with the following parameters: 44.1 kHz sampling rate, 16 bits per sample, and 65536 samples per unit. As shown in Figure 2, the sound data units created by the recording module are stored in the sound database file one by one. Because every sound data unit contains the same number of samples, we can easily compute the start relative address of a unit from its unit number. For example, if a unit is No. i in the sound database file, the unit starts at relative address (i - 1) x 65536 + 1.
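Since every unit has the same fixed length, locating a unit reduces to simple address arithmetic. The following minimal Python sketch illustrates the idea; the file name sound.db, the little-endian 16-bit sample layout, and the byte-level offset computation are assumptions for illustration.

```python
import numpy as np

SAMPLES_PER_UNIT = 65536   # fixed unit length given in the paper
BYTES_PER_SAMPLE = 2       # 16-bit samples

def read_unit(path, i):
    """Read sound data unit No. i (1-based) from the sound database file.

    The start offset follows from the fixed unit size: unit i begins
    at sample address (i - 1) * 65536, i.e. at the byte offset below.
    """
    offset = (i - 1) * SAMPLES_PER_UNIT * BYTES_PER_SAMPLE
    with open(path, "rb") as f:
        f.seek(offset)
        raw = f.read(SAMPLES_PER_UNIT * BYTES_PER_SAMPLE)
    return np.frombuffer(raw, dtype="<i2")  # assumed little-endian int16

# Example: unit = read_unit("sound.db", 3)  # 'sound.db' is a hypothetical name
```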
3.2 Feature extracting module and feature parameter table file

Characterization is performed in both the time domain and the frequency domain on normalized wave data. To describe a sound effectively with a limited number of parameters, the envelope of the digital sound (Figure 3-a) is approximated by the piecewise-linear shape shown in Figure 3-b. By this method, six parameters are extracted as temporal features: attack time t1, decay time t2, sustain time t3, release time t4, maximum output level L1, and sustain level L2.
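A minimal sketch of this temporal characterization is given below. The envelope estimator and the thresholds used to segment it into attack, decay, sustain, and release are assumptions, since the paper does not specify them.

```python
import numpy as np

def temporal_features(x, sr=44100, frame=512):
    """Extract t1-t4 and L1, L2 from a normalized waveform, following
    the piecewise-linear envelope model of Figure 3-b.
    Estimator and threshold choices are assumptions."""
    # Short-time amplitude envelope (frame-wise peak of |x|).
    n = len(x) // frame
    env = np.abs(x[:n * frame]).reshape(n, frame).max(axis=1)

    L1 = float(env.max())              # maximum output level
    i_peak = int(env.argmax())
    hop = frame / sr                   # seconds per envelope frame
    t1 = i_peak * hop                  # attack: onset to peak

    post = env[i_peak:]
    L2 = float(np.median(post))        # sustain level: assumed estimator

    # Decay: from the peak until the envelope first falls to L2.
    i_sus = i_peak + int(np.nonzero(post <= L2)[0][0])
    t2 = (i_sus - i_peak) * hop

    # Release: after the envelope last stays above 10% of L2 (assumed).
    tail = np.nonzero(env[i_sus:] >= 0.1 * L2)[0]
    i_rel = i_sus + (int(tail[-1]) if tail.size else 0)
    t3 = (i_rel - i_sus) * hop         # sustain duration
    t4 = (n - i_rel) * hop             # release duration
    return t1, t2, t3, t4, L1, L2
```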

[Figure 1: System overview]
[Figure 3: Temporal feature extraction (a: amplitude envelope of the digital sound; b: piecewise-linear approximation with levels L1, L2 and attack, decay, sustain, release segments)]

On the other hand, the spectrum features, consisting of a fundamental frequency F1, envelope factors F2, F3, F4, L3, L4, L5, and a tonality factor F5, are extracted by FFT. The spectrum envelope of the sound is shown in Figure 4. We extract the spectrum features by an approach similar to that used for the temporal features. The most important parameter among the spectrum features is the fundamental frequency; it is extracted first and named F1. The next group of parameters are the envelope factors. There may be many peaks in the spectrum envelope; we select the three largest among them and extract their peak levels (L3, L4, L5) and frequencies (F2, F3, F4). Apart from these, the tonality factor F5 is defined as the ratio of the power Pt, the sum of the power at the fundamental frequency and its harmonics F1, 2F1, 3F1, 4F1, ..., as shown in Figure 4, to the power Pa, the sum over all frequency components. These parameters are stored as a record in the feature parameter table file and used for data retrieval. The feature parameter table file is thus made up of the records of all sound data units.

[Figure 4: Spectrum feature extraction]
[Figure 2: Sound database file (data units 1, 2, ..., i stored consecutively)]
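A minimal sketch of the spectral characterization follows. The peak picking, the fundamental-frequency estimate (the strongest peak is used as a stand-in), and the harmonic bandwidth are assumptions; the paper does not detail these steps.

```python
import numpy as np

def spectrum_features(x, sr=44100, n_harm=10):
    """Extract F1 (fundamental), the three largest spectral peaks
    (frequencies F2-F4 with levels L3-L5), and the tonality factor
    F5 = Pt / Pa. Assumes a tonal input with at least three peaks."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    power = spec ** 2

    # Local maxima of the magnitude spectrum.
    peaks = np.nonzero((spec[1:-1] > spec[:-2]) & (spec[1:-1] > spec[2:]))[0] + 1

    # Assumed F1 estimate: the strongest peak (the paper does not
    # specify its fundamental-frequency method).
    F1 = float(freqs[peaks[np.argmax(spec[peaks])]])

    # Envelope factors: the three largest peaks overall.
    top3 = peaks[np.argsort(spec[peaks])[-3:][::-1]]
    F2, F3, F4 = freqs[top3]
    L3, L4, L5 = spec[top3]

    # Tonality factor: harmonic power Pt over total power Pa.
    bw = 2  # bins summed around each harmonic (an assumption)
    Pt = 0.0
    for k in range(1, n_harm + 1):
        b = int(round(k * F1 * len(x) / sr))  # bin of the k-th harmonic
        if b >= len(power):
            break
        Pt += power[max(b - bw, 0) : b + bw + 1].sum()
    Pa = power.sum()
    F5 = Pt / Pa
    return F1, F2, F3, F4, L3, L4, L5, F5
```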

The module extracts the feature parameters from each sound data unit, stores the unit in the sound database file, then makes a record from the unit's number in the database file together with the extracted parameters, and stores the record in the feature parameter table file.

4 Key Sound and Data Retrieval

4.1 Input process module

This module accepts a user's input sound and converts it into reference feature parameters of the same form as the sound feature parameters stored in the feature parameter table file; we call them t1r, t2r, t3r, t4r, L1r, L2r, L3r, L4r, L5r, F1r, F2r, F3r, F4r, F5r.

4.2 Compare module and sound data reading module

The compare module computes the distance between the reference data and each stored record with

E_i = \sum_{k=1}^{4} (t_{kr} - t_{ki})^2 + \sum_{k=1}^{5} (L_{kr} - L_{ki})^2 + \sum_{k=1}^{5} (F_{kr} - F_{ki})^2    (1)

where t1i, t2i, t3i, t4i, L1i, L2i, L3i, L4i, L5i, F1i, F2i, F3i, F4i, F5i are the entries of record i in the feature parameter table file. When E_i < E_j, the sound pointed to by record i is more similar to the reference than the sound pointed to by record j. We can therefore select the five most similar data units from the database file by comparing the distances E_i and using the sound data reading module; a sketch of this computation is given after the references.

[Figure 5: Stereo module]

4.3 Stereo module and effect module

The stereo module converts the five selected mono sounds into stereo and places them at different apparent locations, so that the user can easily compare the output sounds and identify the most suitable one by its spatial location. This function can be realized by the method shown in Figure 5. Because the expected sound may exist not in the database but only in the user's mind, even the most similar data in the database may not match the user's expectation. To solve this problem, the system provides the following function: the user can change the feature parameters of a selected sound under the system's direction, and the effect module modifies the selected data with the changed parameters. The modification can be repeated until the user is satisfied with the resulting sound.

4.4 User's private file

The format of this file is AIFF-C, and its name is chosen by the user. The system stores the final sound in this file so that the user can use it in other systems.

5 Conclusion

This paper presented a new type of sound database that uses sound as the key for data retrieval. The proposed system is not only a sound database but also an interactive sound creation system that searches for new sounds associatively, combining the search function with the sound processing function. The authors consider that a database system must have a data modification ability, because the desired sound may exist not in the database but only in the user's mind.

References

[Blum et al] T. Blum, D. Keislar, J. Wheaton, and E. Wold, "Audio Database with Content-Based Retrieval", Proc. IJCAI'95 Workshop on Intelligent Multimedia Information Retrieval, 1995.
[Keislar et al] D. Keislar, T. Blum, J. Wheaton, and E. Wold, "Audio Analysis for Content-Based Retrieval", Proc. ICMC'95, pp. 199-202, 1995.
[Vertegaal & Bonis] R. Vertegaal and E. Bonis, "ISEE: An Intuitive Sound Editing Environment", Computer Music Journal, Vol. 18, No. 2, pp. 21-29, 1994.
[Feiten & Gunzel] B. Feiten and S. Gunzel, "Automatic Indexing of a Sound Database Using Self-organizing Neural Nets", Computer Music Journal, Vol. 18, No. 3, pp. 53-65, 1994.
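As referenced in Section 4.2, the following minimal sketch computes the distance of Equation (1) against every record and returns the five nearest unit numbers. The record layout as a 14-element tuple and the absence of any feature normalization are assumptions taken directly from the equation as printed.

```python
import numpy as np

# A record is the 14-vector (t1..t4, L1..L5, F1..F5) of Section 3.2.
def distance(ref, rec):
    """Equation (1): the three group sums over temporal (4), level (5),
    and frequency (5) features together equal the sum of squared
    differences over all 14 parameters."""
    ref, rec = np.asarray(ref, float), np.asarray(rec, float)
    return float(((ref - rec) ** 2).sum())

def five_nearest(ref, table):
    """Return the unit numbers of the five records closest to the
    reference, where table maps unit number -> feature record."""
    dists = {i: distance(ref, rec) for i, rec in table.items()}
    return sorted(dists, key=dists.get)[:5]
```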