Subjective Preference Oriented Global Sound Database

Pitoyo Hartono (1), Kenji Suzuki (2), Hai Qi (2), Shuji Hashimoto (2)
(1) Advanced Research Institute for Science and Engineering, Waseda University
(2) Department of Applied Physics, Waseda University

1. Introduction

Most database systems adopt indexing methods based on objective, logical features of the data [1]-[3], chosen by the system designers. To extract a particular item from the database, users must know the logical features that are used to index the data. While this works well for database systems with logical contents, it is a great disadvantage for databases with subjective contents, such as sounds. Searching becomes very hard if users have to specify the logical (in this case, physical) characteristics of the sound they are looking for. Conversely, because sounds have subjective psychoacoustic effects on humans, it would be very helpful to build a database system that permits the user to make queries with a search key oriented to subjective preference.

For example, consider a person creating sound effects for background music. The person may have an ideal sound in mind, but it is not always easy to describe the sound by its acoustical or physical features. It is more intuitive and less stressful to mimic the sound and search the database for similar sounds. In this paper, we propose a sound database system that accepts a sound as the query key. The proposed system extracts a number of parameters that characterize the timbre of a sound, timbre being one of the factors that give rise to subjective impressions that differ from person to person. To deal with the different subjective preferences of each user, the system is equipped with an adaptive query mechanism that gradually absorbs the user's preference while searching the database, which shortens the search time.
We have proposed a content-based sound database system in the past [4]. In the previous system, the data are indexed by their spectral characteristics, which do not necessarily correlate strongly with the users' subjective impressions of sound. The proposed system instead uses timbre parameters that better accommodate the user's subjective preference, and employs an adaptive searching mechanism. Other significant improvements are:

1. In the previous system, the sound data are located locally, while in the proposed system the search range includes the Internet. The seamless integration of sound data in local storage with sound data on Internet web sites greatly enriches the database.
2. The system automatically modifies the available sounds to generate new sounds during its idle time.

These two newly introduced functions enable the system to deal with various demands for sounds and contribute greatly to the usability of the database. Throughout this paper, "global database" refers to a database system that actively searches the Internet for new sounds and modifies them to enrich the sound collection.

2. System Outline

The outline of the system is shown in Figure 1. For a locally registered sound, the system stores the sound in the local database and the related indexes in the index database, while for a sound found on the Internet, the system stores the URL of the sound in the URL database. The URL search engine continuously searches the Internet for new sounds whose durations are less than 5 seconds. The seamless integration of local sound data with data located on the Internet, together with active sound generation, greatly enriches the database, so that it can satisfy the demands of many users with different preferences. The system also uses the sound modifier to create new sounds from the locally stored sounds by pitch shifting, sound filtering and sound mixing.
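The three operations attributed to the sound modifier can be sketched as follows. The paper does not say which algorithms the modifier uses, so everything here is an assumption made for illustration: pitch shifting is done by naive resampling (which also changes the duration), filtering is a moving-average low-pass, and mixing is a gain-weighted sum. The function names `pitch_shift`, `low_pass` and `mix` are hypothetical.

```python
from typing import List


def pitch_shift(samples: List[float], ratio: float) -> List[float]:
    """Naive pitch shift by resampling with linear interpolation.

    Assumption: ratio > 1 raises the pitch and shortens the sound;
    ratio < 1 lowers the pitch and lengthens it.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if not samples:
        return []
    last = len(samples) - 1
    out_len = max(1, int(len(samples) / ratio))
    out = []
    for n in range(out_len):
        pos = n * ratio                  # read position in the input
        i = min(int(pos), last)
        frac = pos - i
        nxt = samples[min(i + 1, last)]  # clamp at the final sample
        out.append((1.0 - frac) * samples[i] + frac * nxt)
    return out


def low_pass(samples: List[float], window: int = 3) -> List[float]:
    """Moving-average low-pass filter over the last `window` samples."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    acc = 0.0
    for n, s in enumerate(samples):
        acc += s
        if n >= window:
            acc -= samples[n - window]   # drop the sample leaving the window
        out.append(acc / min(n + 1, window))
    return out


def mix(a: List[float], b: List[float],
        gain_a: float = 0.5, gain_b: float = 0.5) -> List[float]:
    """Gain-weighted sum of two sounds; the shorter one is zero-padded."""
    out = []
    for i in range(max(len(a), len(b))):
        sa = a[i] if i < len(a) else 0.0
        sb = b[i] if i < len(b) else 0.0
        out.append(gain_a * sa + gain_b * sb)
    return out
```

A new sound could then be derived from two stored sounds with, for example, `mix(pitch_shift(a, 1.5), low_pass(b, 5))`. Each derived sound would still have to go through the registration process to obtain its own index.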
To implement the proposed system efficiently, a host-client relation is built. The local sound database and the URL database are attached to a host computer. The host computer can be accessed by an arbitrary number of client computers, which keep only the parameter database locally. The task of the client system is to extract the parameters of the input sound and execute the index matching, while the task of the host system is to supply the sound demanded by the client, search for new sounds on the Internet and create new sounds. The host-client system is built using Java and the interface using JavaScript, which makes the whole system compatible with most existing operating systems.

Figure 1. System Outline

Sound indexing

In the registration process, the system extracts 64 parameters that characterize the timbre of the input sound and uses them to index the sound. The timbre parameters are extracted from the envelope of the sound, which is known to have a perceptual effect on humans [5]. Figure 2 shows the original sound envelope and the simplified envelope that is used for parameter extraction. The parameters are 5 envelope parameters (the lengths of attack, decay, sustain and release, and the ratio of the sustain level to the maximum amplitude), 15 spectral parameters, which are physical parameters of the attack and sustain periods, and 44 harmonic features of the sound envelope [6].

Adaptive searching mechanism

In the searching process, the system extracts the timbre parameters from the key sound, executes the index matching and retrieves the sound locally or from the Internet.

Figure 3. Searching Procedure (start; query with key sound; timbre parameter extraction; database searching; presentation of sound candidates; sound selection; if the selected sound is ideal, end; otherwise, preference weights renewal and a new search)

The sound searching procedure is shown in Figure 3. The system executes a repetitive search in which each query yields the five sounds that have the smallest distances to the key sound.
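The repetitive search of Figure 3 can be sketched as a loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `query`, `interactive_search` and `choose` are hypothetical, the `max_rounds` cap is an added safeguard, and `squared_distance` is only a stand-in for the paper's weighted distance. The preference weight renewal step is marked by a comment rather than implemented.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(key: Vector, candidate: Vector) -> float:
    """Placeholder distance (plain squared difference), an assumption."""
    return sum((x - y) ** 2 for x, y in zip(key, candidate))


def query(key: Vector, database: List[Vector],
          distance: Callable[[Vector, Vector], float] = squared_distance,
          n_candidates: int = 5) -> List[int]:
    """Return the indices of the n_candidates sounds nearest to the key."""
    ranked = sorted(range(len(database)),
                    key=lambda k: distance(key, database[k]))
    return ranked[:n_candidates]


def interactive_search(key: Vector, database: List[Vector],
                       choose: Callable[[List[int]], Tuple[int, bool]],
                       max_rounds: int = 10) -> Tuple[int, int]:
    """Repeat query and selection until the user is satisfied.

    choose(candidates) stands for the user: it returns the index of the
    preferred candidate and whether that candidate is satisfactory.
    Returns (index of the last selected sound, number of rounds used).
    """
    selected = -1
    for round_no in range(1, max_rounds + 1):
        candidates = query(key, database)
        selected, satisfied = choose(candidates)
        if satisfied:
            return selected, round_no
        # The preference weights would be renewed here before searching again.
        key = database[selected]  # the selected sound becomes the next key
    return selected, max_rounds
```

In this sketch the user is modelled as a callback so that the loop can be exercised without audio playback; in the described system the five candidates are played back and the user picks one.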
The distance is defined as follows:

$$E_k = \sum_{j=1}^{64} a_j \frac{(X_j - X_j^k)^2}{K_j} \tag{1}$$

$$K_j = \begin{cases} X_j & (X_j > 0.1) \\ 0.1 & (X_j \le 0.1) \end{cases}$$

where $E_k$ denotes the distance between the key sound and the k-th sound in the database. $X_j$ and $X_j^k$ denote the j-th parameters of the key sound and of the k-th sound in the database, respectively. The weight of the j-th parameter is denoted by $a_j$.

Figure 2. Sound Envelope: (a) sound envelope (original sound envelope, amplitude versus time); (b) envelope model (attack, decay, sustain and release segments, with the maximum amplitude and the sustain level marked, amplitude versus time)

The five retrieved sounds are presented in order, at different spatial positions, using a stereo sound diffusion system. The user selects the most preferable one. If the selected sound is satisfactory, the searching process terminates; otherwise, the next search starts after the preference adaptation, using the selected sound as the key sound.

To adapt to the subjective preference of the user, the system renews the weights at every search repetition. The weight renewal is executed as follows:

$$D_j^i(t) = \left| X_j(t) - X_j^i(t) \right| \tag{2}$$

in which $D_j^i(t)$ is the difference between the j-th index of the key sound and that of the i-th candidate sound at the t-th search. The differences are then ranked in ascending order. When the rank of the chosen sound is $R_j^s(t)$, the weight correction is done as follows:

$$a_j(t+1) = \frac{a_j(t)}{R_j^s(t)} \tag{3}$$

where $a_j(t)$ is the weight for the j-th index at the t-th search iteration. This procedure is executed for all indexes until the distance between the chosen sound and the key sound becomes the smallest. Changing the preference weights is equivalent to changing the scale of each parameter, so that eventually the query is executed inside the user's preference space.

Equation 3 implies that when the chosen sound is ranked low for a particular parameter, that is, when its difference from the key sound is large compared with the other candidates, the parameter is insignificant for the user, so the related weight should be scaled down. Conversely, when the chosen sound is ranked high for a parameter, the related weight should be kept at a large value. Figure 4 illustrates the execution of the weight adaptation.
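The distance and the weight renewal can be sketched in a few lines. This is an illustration under assumptions, not the authors' code: Equation 1 is read as a weighted sum of squared differences, each divided by $K_j$; the rank in Equation 3 is taken per parameter, with rank 1 for the smallest difference; and ties share the better rank, a choice the paper does not specify. The function names are hypothetical.

```python
from typing import List, Sequence


def weighted_distance(key: Sequence[float], candidate: Sequence[float],
                      weights: Sequence[float], floor: float = 0.1) -> float:
    """Equation 1 as read above: sum_j a_j * (X_j - X_j^k)^2 / K_j."""
    total = 0.0
    for x, xk, a in zip(key, candidate, weights):
        k_j = x if x > floor else floor  # K_j: key parameter, floored at 0.1
        total += a * (x - xk) ** 2 / k_j
    return total


def renew_weights(key: Sequence[float],
                  candidates: Sequence[Sequence[float]],
                  chosen: int,
                  weights: Sequence[float]) -> List[float]:
    """One pass of Equations 2 and 3 over all parameters.

    chosen is the position of the user's pick within candidates.
    """
    renewed = []
    for j, a in enumerate(weights):
        # Equation 2: per-parameter difference of every candidate from the key.
        diffs = [abs(key[j] - c[j]) for c in candidates]
        # Rank of the chosen sound in ascending order of difference (1 = best).
        # Assumption: candidates with equal differences share the better rank.
        rank = 1 + sum(1 for d in diffs if d < diffs[chosen])
        # Equation 3: divide the weight by the rank.
        renewed.append(a / rank)
    return renewed


# Two-parameter example in the spirit of Figure 4 (values are made up).
key = (2.0, 2.0)
candidates = [(2.5, 2.2), (2.1, 3.0), (2.3, 1.7), (1.8, 2.1), (2.4, 1.6)]
weights = renew_weights(key, candidates, chosen=1, weights=[1.0, 1.0])
```

In this example the second candidate is nearest to the key in the first parameter and farthest in the second, so the first weight is divided by 1 and the second by 5. The chosen sound's distance under the renewed weights is therefore smaller than before, which is the rescaling effect the text describes.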
For simplicity, Figure 4 shows each sound indexed by only 2 timbre parameters, x and y. In this example, the system returns 5 sounds after the key sound is given. Suppose that the user chooses Sound 2 as the sound most similar to the key sound. From Figure 4(a) it is clear that Sound 2 has the largest difference from the key sound in terms of parameter y, but because the user chose Sound 2, it should be closer to the key sound in the perceptual space of the user. The system interprets the user's choice to mean that, for the given sound, the importance of parameter y is low, so the related weight $a_y$ should be decreased. The weight renewal is executed until the distance from the chosen sound to the key sound (Equation 1) becomes the smallest.

Figure 4. Adaptive Searching Mechanism (panels (a) and (b): the key sound and Sounds 1 to 5 plotted over parameters x and y)

3. Experiments

The proposed system was implemented on a Windows 2000 machine (CPU: Pentium III, 600 MHz). Preliminary experiments were done to test the efficiency of the proposed system in retrieving desirable sound data, with regard to the retrieval speed and the quality of the retrieved data. Because, from the users' point of view, there is no distinction between data located locally and data located on the Internet, we consider it sufficient for this preliminary test to use the local database with about 1800 sounds (about 200 MB in size). The times needed for extracting the timbre parameters and for displaying 5 candidate sounds are 3 sec and 1 sec, respectively. Experiments were done with 10 users with no specific musical expertise or technical knowledge about the proposed system. Each user was asked to input a sound and retrieve a sound similar to the input sound according to the user's own preference.
Each user conducted 5 searches using the following 5 sounds: Sound 1: whistle; Sound 2: noise (breathing out into the microphone); Sound 3: human speech; Sound 4: metallic bell; Sound 5: crumpling a piece of paper. For comparison, we also conducted experiments on a database system with no preference weights. In Figures 5 and 6, "adaptive" refers to the proposed system and "fixed" refers to the system with non-adaptive weights.

Figure 5. Evaluation on search repetitions (adaptive versus fixed)

Figure 6. Satisfaction index (adaptive versus fixed)

Figure 5 shows the average number of search iterations for each system. From this figure, it can be seen that the proposed database system requires fewer search iterations than the fixed-weight system, except for Sound 3. Although we cannot conclude that the performance of the proposed system is superior to other systems for all sounds, this experiment shows that the proposed system searches the database efficiently.

To evaluate the quality of the proposed system, we asked the users to rank the quality of the retrieved sound on a 5-step scale, in which 1 indicates very poor and 5 indicates very good. The average rank is shown as the satisfaction index in Figure 6.

Because the system continuously searches for new sounds posted on the Internet and at the same time modifies the local sounds, the number of registered sounds is approaching 20000, which is a great enrichment compared with the original 1800 sounds.

5. Conclusions and Future Works

In this paper we have proposed a new sound database system that is able to accommodate the subjective preferences of the users in extracting sound data. Unlike conventional database systems, which usually require the users to have prerequisite knowledge about the logical features of the data, our proposed system is free from such a requirement. The proposed system is also equipped with an adaptive searching mechanism that automatically accommodates the user's preference in its searching strategy, leading to shorter searching time. The automatic sound generation function and the seamless integration of the rich collection of data on the Internet contribute to the limitless enrichment of the database without the cost of data storage, because only textual information, in the form of the URL and the index of the data, is required for sound data located on the Internet. To realize rapid data search and efficient management, an effective data structure has to be considered. We also plan to open the system to Internet users in the near future.

References

[1] Blum, T., Keislar, D., Wheaton, J., and Wold, E.: "Audio Database with Content-Based Retrieval", Proc. IJCAI 1995, Workshop on Intelligent Multimedia Information Retrieval, 1995.
[2] Keislar, D., Blum, T., Wheaton, J., and Wold, E.: "Audio Analysis for Content-Based Retrieval", Proc. ICMC 1995, pp. 199-202, 1995.
[3] Feiten, B., and Gunzel, S.: "Automatic Indexing of a Sound Database using Self-organizing Neural Nets", Computer Music Journal, Vol. 18, No. 3, pp. 53-65, 1994.
[4] Qi, H., Muramatsu, T., and Hashimoto, S.: "Multimedia Environment for Sound Database System", Proc. ICMC 1997, pp. 105-108, 1997.
[5] Grey, J.M.: "Multidimensional perceptual scaling of musical timbres", J. Acoustical Society of America, 61(5), pp. 1270-1277, 1977.
[6] Qi, H., and Hashimoto, S.: "Global Sound Database on Internet", Proc. MMM 2000, pp. 195-208, 2000.