Organizing the parameter space of physical models with sound feature maps

Bernhard Feiten, Gerhard Behles
Technische Universität Berlin
Inst. f. Kommunikationswissenschaft, Elektronisches Studio
Tel +49 30 314 25699, FAX: +49 30 314 21143
E-Mail: Bernhard.Feiten,

Abstract
Timbre space can be described with self-organizing Sound Feature Maps. These maps can be used to control the parameters in a perception-based way. The method is examined by applying it to a physical model instrument.

1. Introduction
The majority of synthesis models suffer from poor correspondence between the synthesis parameters and the perceived quality of the sound. One solution to this problem is to attach a perception-based description to each sound. This requires automatic feature extraction as well as a kind of "artificial perception". The proposed method produces Sound Feature Maps that organize the multidimensional timbre space on a two-dimensional map according to the similarity of the sounds under consideration.

The problem can be divided into two parts. First, one must find a distance measure for sounds which correlates well with the perception of the desired feature, e.g. similarity. This can be obtained by modelling properties of the peripheral auditory system combined with an adequate metric. The specific loudness, in sones, of the partial tones of steady-state sounds in combination with the Minkowski measure with p=5 (p=2 is the Euclidean distance) showed the best correlation with perceived similarity [Feiten and Günzel, 1993b]. For arbitrary sounds, a Bark-scale filterbank is used. This filterbank reduces the incoming signal to the loudness critical-band rate time pattern (sones/Bark/sec) [Feiten and Günzel, 1993a].

Second, one must find an algorithm to organize and classify the sounds with reference to the revealed features.
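The Minkowski measure named for the first part of the problem can be illustrated with a short sketch. The function name and the example vectors are hypothetical; only the exponent p=5 and the remark that p=2 gives the Euclidean distance come from the text.

```python
def minkowski_distance(a, b, p=5.0):
    """Minkowski distance between two equally long feature vectors.

    p=5 is the exponent reported in the text; p=2 gives the Euclidean distance.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Hypothetical specific-loudness vectors (sones), for illustration only.
sound_a = [1.2, 0.8, 0.3]
sound_b = [1.0, 0.9, 0.7]
d5 = minkowski_distance(sound_a, sound_b)        # measure used for similarity
d2 = minkowski_distance(sound_a, sound_b, p=2)   # Euclidean distance
```

With a large exponent such as p=5, the largest component difference dominates the result more strongly than it does in the Euclidean case.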
The Kohonen Feature Map (KFM) algorithm realizes a mapping of an input space onto a topology-preserving feature map in an unsupervised, self-organizing learning process [Kohonen, 1984]. Applied to the features detected by the preprocessing stage, it transforms the topology of the input sound space to the dimensions of the KFM. Similar sounds are mapped to neighboring elements in the KFM. The result is a sound feature map (SFM), which can be used as a retrieval index for sound archives or to control a synthesizer. The user can retrieve sounds by relating the desired sound to the structure of the SFM. For dynamic sounds, the KFM has been extended to include hierarchical maps [Feiten and Günzel, 1993a, forthcoming].

The aim of the present research is to examine the usefulness of an SFM for controlling the timbre of a synthesizer interactively. Physical Modeling (PM) promises rich control of the timbre, but because of nonlinearity and chaotic behaviour, varying the parameters often leads to an unexpected reaction. Mapping the parameter space of the physical model onto a Sound Feature Map allows both intuitive and rich sound control and improved handling of the nonlinear system.

Recently Xavier Rodet [1993] examined the time-delayed Chua's circuit for sound synthesis with a nonlinear dynamics approach, using Perry Cook's [1992] brass wind instrument model as an example. We used this simple synthesis model to study the behaviour of the SFM algorithm in connection with nonlinear synthesis models. As the aim is to use the SFM to influence the sound extensively during playing, only a steady-state SFM is needed. In the training phase, the sounds are generated with randomly generated parameter sets, preprocessed, and mapped to an SFM with a relatively large number of cells. After training, the map can be used as a controller for the PM instrument by moving the cursor on the screen in a rectangle corresponding to the SFM.
The PM parameters associated with the coordinates are communicated via shared memory to a realtime synthesis process running on an SGI Indy.

2. PM instrument and random source
Physical models provide an adequate way to control the synthesis process. Modeling the nonlinear properties of real instruments leads to sensitive and rich behaviour of the synthesis. Nonlinear systems with feedback tend toward instability, and the conditions for stability, oscillation and dynamic behaviour are difficult to describe analytically. Rodet [1993] proposed different approaches to find stable solutions for simple physical models. One instrument he examined is the simple physical brass wind instrument model proposed by Cook [1992], shown in Fig. 1.

Sound Synthesis Techniques 398 ICMC Proceedings 1994

Fig. 1: Simple physical brass wind instrument model [Cook, 1992]

The method proposed in this paper can be used as a general algorithm to examine the parameter space of physical models. The algorithm evaluates a random selection of parameter sets for applicability. A widely spread random search detects useful regions in the parameter space; narrowly spread random values are then used to explore these regions. Here, a parameter set for the brass model mentioned above was judged according to the following two conditions: the number of mean-value crossings has to exceed a pitch-dependent threshold, and the amplitude must lie within a specified range.

3. Preprocessing
Since the SFM described in this experiment only characterizes stationary aspects of sounds, temporal psychoacoustic phenomena such as temporal masking need not be considered in the preprocessing stage. The Bark-scale filterbank is realized with an FFT for reasons of efficiency. Based on a sampling frequency of 48 kHz, a transform size of 1024 points is chosen. The resulting bandwidth corresponds approximately to the critical bandwidth at low frequencies. A Kaiser window is chosen to minimize spectral leakage. The DFT power density spectrum is multiplied by the transmission factor of the outer ear for the diffuse-field condition. Critical-band intensities are estimated by summing the energy in the corresponding filter channels over a range of one Bark. The spreading of the critical-band level on the basilar membrane is calculated by linear approximation of the masking threshold on the Bark scale. Finally, the specific loudness for each critical band is calculated [Feiten and Günzel, 1993a].
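The band-summation step of this chain can be sketched as follows. This is only an illustration under stated assumptions: the Zwicker-Terhardt approximation of the critical-band rate is an assumed choice (the text does not name a formula), the function names are hypothetical, and the Kaiser window, outer-ear weighting, spreading and loudness stages are omitted.

```python
import math

SAMPLE_RATE = 48_000   # Hz, as stated in the text
FFT_SIZE = 1024        # transform size, as stated in the text
NUM_BANDS = 25         # length of the Bark spectrum, as stated in the text


def hz_to_bark(f_hz):
    """Critical-band rate in Bark (Zwicker-Terhardt approximation, assumed here)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_band_energies(power_spectrum):
    """Sum DFT power bins into bands that are one Bark wide.

    power_spectrum holds FFT_SIZE // 2 + 1 non-negative values (bin 0 to Nyquist).
    """
    bands = [0.0] * NUM_BANDS
    bin_width = SAMPLE_RATE / FFT_SIZE  # 46.875 Hz per bin
    for k, power in enumerate(power_spectrum):
        # Band index is the integer part of the Bark value, clamped to the last band.
        band = min(int(hz_to_bark(k * bin_width)), NUM_BANDS - 1)
        bands[band] += power
    return bands


# Usage: a flat spectrum only redistributes its energy over the bands.
flat = [1.0] * (FFT_SIZE // 2 + 1)
energies = bark_band_energies(flat)
```

Under this approximation the Nyquist frequency of 24 kHz lies just below 25 Bark, which is consistent with a 25-point Bark spectrum.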
In this manner, the preprocessing, or feature extraction, takes 1024 samples from the PM source and calculates a Bark spectrum with 25 points of specific loudness. This Bark spectrum is taken as the input vector for the KFM algorithm.

4. Kohonen Feature Map algorithm
The KFM algorithm is a neural net that both accomplishes a vector quantization of an n-dimensional input vector space U and preserves the topology of the input space over the net [Kohonen, 1984]. The map is a matrix consisting of l*m elements of U, directly identifying any input signal pattern with the most similar map element. The map elements might therefore be regarded as "codebook vectors" for the classes of U.

The algorithm starts with a random initialisation of the map: each map element represents a randomly chosen n-dimensional input space vector. In the following training process, the map has to learn the topology of the input space. The training process consists of c >> l*m cycles. In every cycle t = 1, ..., c the map is adapted to a randomly generated (as described above) training vector v of U according to the following rules. First, the map element most similar to v is determined. This element is called the "bestmatch". Second, the bestmatch vector and the map vectors w in its vicinity are aligned toward the training vector:

w_i^{t+1} = w_i^t + \varepsilon h (v_i - w_i^t), \quad i = 1, \dots, n \qquad (1)

Third, the PM parameters which produced the training vector are stored in the bestmatch cell.

The extent of the alignment depends on ε and h. ε is a monotonically decreasing function of t (e.g. with some exponential decrease), while h is a function which decreases with rising distance from the best-matching element. Consequently, the alignment is strong for map elements near the bestmatch and at the beginning of the training process, while it is small for map elements in regions far from the bestmatch and at the end of the process. As a result, the map represents the similarity relations of the input space.
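One training cycle with its three rules can be sketched as follows. The exponential schedule, the Gaussian shape of h, the Euclidean bestmatch search, and all names and default values (train_step, eps0, sigma0) are assumptions made for illustration; the text only requires that ε decreases with t and that h decreases with distance from the bestmatch.

```python
import math


def best_match(som, v):
    """Return (row, col) of the map vector closest to v (squared Euclidean distance)."""
    best, best_d = None, float("inf")
    for r, row in enumerate(som):
        for c, w in enumerate(row):
            d = sum((wi - vi) ** 2 for wi, vi in zip(w, v))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_step(som, params, v, pm_params, t, cycles, eps0=0.5, sigma0=3.0):
    """One cycle: find the bestmatch, align its neighbourhood, store the PM parameters."""
    decay = math.exp(-3.0 * t / cycles)      # assumed exponential decrease over training
    eps = eps0 * decay                       # learning rate, decreasing with t
    sigma = max(sigma0 * decay, 0.5)         # neighbourhood radius, assumed to shrink too
    br, bc = best_match(som, v)              # rule 1: most similar map element
    for r, row in enumerate(som):
        for c, w in enumerate(row):
            # h decreases with the map distance to the bestmatch (Gaussian, assumed).
            h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
            for i in range(len(w)):
                w[i] += eps * h * (v[i] - w[i])   # rule 2: alignment as in equation (1)
    params[br][bc] = pm_params               # rule 3: remember the generating parameters
    return br, bc
```

A full training run would call train_step c times, each time with a freshly generated sound from the random source, its preprocessed feature vector v, and the PM parameter set that produced it.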
As another result, the KFM represents the statistical distribution of the training vectors. Regions of U that involved many training vectors are represented larger and in better resolution than regions of rare signals [Feiten and Günzel, 1993b].

5. Results
In the experiment described here, two different kinds of maps were used. In both cases the map size was l = m = 100. The number of learning steps was 250,000. The pitch was kept constant during training. The first map was trained with normalised synthesis parameter values, the second with the specific loudness vectors. Each of the 10,000 map elements stores either the normalised PM parameters or the specific loudness representation together with the PM parameters of the sound.

In the first case, in addition to the synthesis parameters, each element in the KFM holds a boolean value indicating stability. An analysis of the vector distances plotted over the distance within the two-dimensional map shows that the average vector distance grows with the distance in the map (Fig. 2). This indicates that the map is well trained.

Figure 2: Course of the maximum, average and minimum vector distances within the PM parameter map

Figure 3 shows synthesis parameter values and validity for the mapped PM parameters. The random parameter settings used for the SFM were derived from synthesis settings marked stable in the KFM. The same analysis of the resulting SFM proves that this map, too, is well trained with regard to the loudness vector distances between cells (Fig. 4).

Fig. 4: Course of the maximum, average and minimum vector distances within the SFM

Figure 5 displays the PM parameters as organised by the SFM. The usefulness of the SFM as a timbre controller requires, on the one hand, regions on the SFM where the change in timbre is perceived as continuous. On the other hand, the sounds produced by the map should give a rich canvas of different timbres that is evenly distributed in the perceptual sense. The SFM reflects the statistical properties of the random sound source. The brass wind instrument SFM created here does not satisfy the demand for equal distribution, as large regions contain very similar sounds.

Examining the feature map reveals the time-variant behaviour of the physical model. Although certain areas in the map can easily be associated with specific sound qualities and playing techniques, the character of an individual cell may vary considerably depending on the previous state of the system, i.e. on the path through the feature map that was taken to get there. Nonetheless, the map supplies a large number of stable states and stable transitions between them, supporting the search for useful sounds and gestures and providing a fair amount of surprise.

References
[Cook, 1992] Perry Cook. A Meta-Wind-Instrument Physical Model. Proc. ICMC 92, San Jose, pp. 273-276.
[Feiten and Günzel, 1993a] Bernhard Feiten, Stefan Günzel. Sound Feature Maps based on Self-organizing Neural Nets. 94th Conv. of the AES, Berlin, Fl-10, April 93.
[Feiten and Günzel, 1993b] Bernhard Feiten, Stefan Günzel. Distance Measure for the Organisation of Sounds. Acustica, Vol. 78, No. 3, pp. 181-184, April 1993.
[Feiten and Günzel, forthcoming] Bernhard Feiten, Stefan Günzel. Automatic Indexing of a Sound Database using Self-Organizing Neural Nets. Computer Music Journal, forthcoming.
[Kohonen, 1984] Teuvo Kohonen. Self-Organization and Associative Memory. Springer Verlag, Berlin.
[Rodet, 1993] Xavier Rodet. Flexible Yet Controllable Physical Models: A Nonlinear Dynamics Approach. Proc. ICMC 93, Tokyo, pp. 48-55.
