Machine Learning of Sound Attributes: Computer-Assistance in Concept Formation and Musical Invention
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License. Please contact email@example.com to use this work in a way not covered by the license.
Eduardo Reck Miranda
Centre for Music Technology - Dept. of Music - University of Glasgow
14, University Gardens - Glasgow, G12 8QH - Scotland, UK
Email: firstname.lastname@example.org
ICMC Proceedings 1996, pp. 479-482

Abstract

In this paper I discuss the role of Inductive Machine Learning (IML) in systems intended to aid musical invention. I focus on the modelling of a particular aspect of human intelligence which is believed to play an important role in musical creativity: the Generalisation of Perceptual Attributes (GPA). I introduce the basic concepts and functioning of a particular IML algorithm in the context of a case study sound synthesis system: ARTIST. I conclude with suggestions for further work.

Keywords: Music and Artificial Intelligence (M&AI), Inductive Machine Learning (IML), Intelligent Sound Synthesis Systems.

1 Introduction

In this paper I discuss the role of Inductive Machine Learning (IML) in systems intended to aid musical invention. I focus on the modelling of a particular aspect of human intelligence which is believed to play an important role in musical creativity: the Generalisation of Perceptual Attributes (GPA). By GPA I mean the process by which a listener tries to find common sound attributes when confronted with a series of sounds. In this paper I introduce the basics of GPA and IML in the context of ARTIST, a case study sound synthesis system [Miranda, 1994a], [Miranda, 1994b], [Miranda, 1994c], [Miranda, 1995], [Miranda et al., 1993] and [Smaill et al., 1994]. ARTIST is a system that works in co-operation with the user. It provides useful levels of automated reasoning to render the synthesis tasks less laborious (tasks such as calculating an appropriate stream of synthesis parameters for each single sound) and to enable the user to explore alternatives when designing a certain sound.
The system synthesises sounds according to user specifications of values for sound attributes in a relatively high-level language, for instance normal vibrato, high openness and sharp attack. Each of these attribute-value expressions is implemented in the lower-level terms of a specific sound synthesis algorithm; the system currently works with formant synthesis. ARTIST stores information about "known" sounds as clusters of attribute-value expressions (e.g., in vibrato=normal, "vibrato" is a sound attribute and "normal" is a possible value for this attribute). The user may, however, be interested in producing a sound which is "unknown" to the system. The system will then attempt to compute the attribute values for this unknown sound by making analogies with known sounds which have similar constituents. To do this, ARTIST must infer which sound attributes should be considered when making the analogies; IML is used here to provide this capability.

2 A Psychoacoustic Speculation

It is believed that a listener confronted with several distinct sound events tends to characterise them by selecting the sound attributes that he or she considers important. When listening to several distinct sound
events, it seems that the human mind prioritises the selection of the attributes that are most important for making distinctions among them [Miranda, 1994c]. If one listens carefully to a series of sound events, there will probably be a large number of possible intuitive generalizations. It is therefore essential to select those generalizations we believe to be appropriate. These depend upon several factors such as context, sound complexity, duration of events, sequence of exposure and repetition, which make a great variety of combinations possible. Humans, however, are able to make generalizations very quickly, perhaps because we never evaluate all the possibilities. We tend to limit our field of exploration and resort to some heuristic. I believe that this plays an important role in imagination and memory when creating sounds and composing with them.

ARTIST has the ability to make generalizations in order to infer which attributes are "more distinctive" in a sound. The term "more distinctive" in this case does not necessarily refer to what humans would perceive to be the most distinctive attribute of a sound. Current IML techniques are not yet able to mimic all the types of heuristics used by humans. Nevertheless, I propose that one kind of heuristic might use information theory to make generalizations. The IML algorithms used in ARTIST thus use information theory to pursue this task. Once the generalizations have been learned, the user may use the descriptive rules to specify new sounds, different from those that were originally picked out as typical of the sounds that the system already "knows".

3 ARTIST's IML Engine

ARTIST currently uses two IML algorithms: the ISCD (Induction of the Shortest Concept Description) and the IDT (Induction of Decision Trees) [Bratko, 1990], [Miranda, 1994b]. This paper focuses entirely on the former.
The ISCD algorithm aims at the induction of the shortest description(s), that is, the smallest set(s) of attribute values of a sound, or class of sounds, which can differentiate it from the others in a training set. In ARTIST, the training set is automatically inferred from the system's knowledge base. An example ISCD rule, when looking for a description for a sound labelled "vowel" on the basis of some training examples, is as follows (consider that "normal vibrato" and "high openness" are terms of the sound description vocabulary):

A sound event is vowel if it has normal vibrato and high openness.

No matter how many attributes the vowel sound has in the training set, according to the above rule the "most relevant" attributes for this sound class are vibrato=normal and openness=high. The result of the learning is the description of sounds (or classes of sounds) in the form of ISCD rules. The format of a rule is as follows:

SndClass = [Descrip(1), Descrip(2), ..., Descrip(n)].

where each Descrip(i) is a list of attribute values in the form:

[AttribName(1)=AttribValue(1), AttribName(2)=AttribValue(2), ..., AttribName(n)=AttribValue(n)].

A sound description SndClass is interpreted as follows:

(a) an individual sound matches the description if it satisfies at least one of the Descrip(i) lists of the sound class
(b) an individual sound satisfies a Descrip(i) list of attribute values if all the attribute-value pairs in Descrip(i) hold for the sound in question.

For instance, a rule for the sound class labelled "open vowel" could be:

open vowel = [[vibrato=normal, resonators(formant)=vowel(a), sex=male], [vibrato=below normal, resonators(formant)=vowel(e), sex=female]].

The interpretation of the above rule is:

A sound event is an open vowel if it has normal vibrato rate, its spectral envelope corresponds to the resonance of a vowel /a/ (as in the word "bat") and it is a male sound, or it has a vibrato rate lower than normal, its spectral envelope corresponds to the resonance of a vowel /e/ (as in the word "blend") and it is a female sound.

The algorithm uses the single-trial induction technique to process the training examples; that is, all the input examples are processed at once. The main requirement here is that the constructed description of a sound exactly matches the examples belonging to the sound class. When a sound event matches a description, it is said that the description enfolds the sound event. Thus, the algorithm must construct a description for the given sound class which enfolds all the examples of this sound class and no other examples. The algorithm works as follows (lists are represented in Prolog-like syntax):

To enfold all the examples of SndClass in TrainingSet:
1. If no example in TrainingSet belongs to SndClass
2. Then
   2.1. ClassDescrip = [] (i.e., an empty list)
3. Else
   3.1. ClassDescrip = [DescripList | DescripLists], where DescripList and DescripLists are obtained as follows:
      3.1.1. Construct a list DescripList of attribute values that enfolds at least one example of the desired sound class and no example of any other sound class
      3.1.2. Remove from TrainingSet all the examples enfolded by DescripList and enfold the remaining, not yet enfolded sound events with DescripLists

Each DescripList is constructed incrementally.
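The rule interpretation and the enfolding procedure above can be sketched in a few lines of Python. This is only an illustrative sketch, not ARTIST's actual implementation (which the text describes in Prolog-like terms). The function names, the tuple-based representation of attribute-value pairs, the alphabetical tie-breaking and the small training set are all assumptions made for the example. Candidate conditions are ranked by the heuristic described in the paragraphs that follow: positive examples matched minus other examples matched.

```python
from typing import Dict, List, Tuple

Sound = Dict[str, str]             # attribute name -> attribute value
Example = Tuple[str, Sound]        # (sound class label, sound)
Condition = Tuple[str, str]        # one attribute=value pair
Descrip = List[Condition]          # conjunction of conditions (a DescripList)


def satisfies(sound: Sound, descrip: Descrip) -> bool:
    """Interpretation (b): every attribute-value pair holds for the sound."""
    return all(sound.get(attr) == value for attr, value in descrip)


def enfolds(sound: Sound, class_descrip: List[Descrip]) -> bool:
    """Interpretation (a): the sound satisfies at least one Descrip list."""
    return any(satisfies(sound, descrip) for descrip in class_descrip)


def score(cond: Condition, examples: List[Example], snd_class: str) -> int:
    """Positive examples matching cond minus other examples matching cond."""
    posit = sum(1 for label, s in examples
                if label == snd_class and satisfies(s, [cond]))
    others = sum(1 for label, s in examples
                 if label != snd_class and satisfies(s, [cond]))
    return posit - others


def build_descrip(examples: List[Example], snd_class: str) -> Descrip:
    """Step 3.1.1: add conditions greedily until no other class is enfolded."""
    descrip: Descrip = []
    remaining = list(examples)
    while any(label != snd_class for label, _ in remaining):
        # Candidates come from the positive examples still enfolded. Only those
        # that exclude at least one negative example are kept, so each step
        # makes progress. Sorting makes tie-breaking deterministic (assumption).
        candidates = sorted({
            (attr, value)
            for label, s in remaining if label == snd_class
            for attr, value in s.items()
            if any(l != snd_class and not satisfies(o, [(attr, value)])
                   for l, o in remaining)
        })
        if not candidates:
            raise ValueError("examples of different classes cannot be told apart")
        best = max(candidates, key=lambda c: score(c, remaining, snd_class))
        descrip.append(best)  # no backtracking: other candidates are dropped
        remaining = [(l, s) for l, s in remaining if satisfies(s, [best])]
    return descrip


def induce(examples: List[Example], snd_class: str) -> List[Descrip]:
    """Steps 1 to 3: enfold all examples of snd_class and no others."""
    class_descrip: List[Descrip] = []
    remaining = list(examples)
    while any(label == snd_class for label, _ in remaining):
        descrip = build_descrip(remaining, snd_class)
        class_descrip.append(descrip)
        # Step 3.1.2: drop the examples already enfolded by this DescripList.
        remaining = [(l, s) for l, s in remaining if not satisfies(s, descrip)]
    return class_descrip


# Hypothetical training set, using attribute names from the text above.
TRAINING_SET: List[Example] = [
    ("vowel", {"vibrato": "normal", "openness": "high", "attack": "sharp"}),
    ("vowel", {"vibrato": "normal", "openness": "high", "attack": "soft"}),
    ("noise", {"vibrato": "none", "openness": "high", "attack": "sharp"}),
    ("hum", {"vibrato": "normal", "openness": "low", "attack": "soft"}),
]
```

With this hypothetical training set, `induce(TRAINING_SET, "vowel")` returns a single description containing openness=high and vibrato=normal, which corresponds to the example rule "a sound event is vowel if it has normal vibrato and high openness". The attack attribute is left out because it does not help to separate the vowel examples from the others.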
The construction of each DescripList is highly combinatorial. Each time a new attribute-value condition is added, there are almost as many alternative candidates to be added as there are attribute-value pairs. It is not immediately clear which of them is preferable. Ideally, the resulting description should enfold all the examples of the sound class being learned with as few DescripList lists as possible. Learning is viewed here as a search among possible descriptions with the objective of minimizing the length of the concept description. Because of the high combinatorial complexity of this search, the algorithm resorts to a heuristic scoring function. At each point, only the best-estimated attribute value is added to the list, and all other candidates are immediately disregarded. The search is thus reduced to a deterministic procedure, without any backtracking. The heuristic estimate is based upon the assumption that a useful element of a DescripList should discriminate well between
examples of the class being processed and the other examples. Thus, it should enfold as many positive examples of the sound class as possible and as few negative examples as possible. The heuristic score of an attribute value is the number of elements of POSIT in MATCH minus the number of elements of OTHERS in MATCH:

$$\mathrm{SCORE} = |\mathrm{POSIT} \cap \mathrm{MATCH}| - |\mathrm{OTHERS} \cap \mathrm{MATCH}|$$

POSIT is the set of positive examples of the sound class being learned, whereas OTHERS is the set of "negative" examples. MATCH represents the set of sound events which satisfy the attribute-value condition.

4 Conclusion and Further Work

At the moment, the attribute-value pairs for sound description are specified manually. I plan to automate this task by adding the support of a sub-symbolic level to the symbolic IML level of ARTIST. Neural network technology is suitable for this task [Forrest et al., 1987]. I propose that a neural network based upon auditory modelling techniques has great potential for raising new paradigms for sound representation. In addition, this would enable the creation of a more perceptually-oriented tool for sound analysis and therefore facilitate the definition of sound descriptors for a sound. The sub-symbolic level would then be aimed at the identification of prominent classificatory features in input samples of sounds, and would provide ways of referring to them using symbols to be processed at the symbolic IML level.

5 References

[Bratko, 1990] Bratko, I. (1990), Prolog Programming for Artificial Intelligence, Addison Wesley (ISBN 0-201-41606-9).
[Forrest et al., 1987] Forrest, B.M., Roweth, D., Stroud, N., Wallace, D.J. and Wilson, G.V. (1987), Neural Networks Models, Physics Department pre-print 87/419 (ECSP-TR-11), University of Edinburgh.
[Miranda, 1994a] Miranda, E.R. (1994a), "From Symbols to Sound: AI-based Investigation of Sound Synthesis", Contemporary Music Review, 10:2, pp. 211-232, Harwood Academic Publishers (ISBN 3-7186-5572-1).
[Miranda, 1994b] Miranda, E.R. (1994b), "The Role of Artificial Intelligence in Computer-Aided Sound Composition", Journal of Electroacoustic Music, Vol. 8, Sonic Arts Network (ISSN 1355-7726).
[Miranda, 1994c] Miranda, E.R. (1994c), "Sound Design: An Artificial Intelligence Approach", PhD Thesis, University of Edinburgh.
[Miranda, 1995] Miranda, E.R. (1995), "An Artificial Intelligence Approach to Sound Design", Computer Music Journal, 19:2, pp. 59-75, MIT Press (ISSN 0148-9267).
[Miranda et al., 1993] Miranda, E.R., Smaill, A. and Nelson, P. (1993), "A Symbolic Approach for the Design of Intelligent Musical Synthesizers", Proceedings of the X Reunion Nacional de Inteligencia Artificial, Mexico City, SMIA.
[Smaill et al., 1994] Smaill, A., Wiggins, G. and Miranda, E.R. (1994), "Music Representation - between the Musician and the Computer", Music Education: An Artificial Intelligence Approach, Smith et al. (Eds), Springer-Verlag (ISBN 3-540-19873-3, ISBN 0-387-19873-3).