Computer-Assisted Training Of Timbre Perception Skills

René Quesnel
Faculty of Music
McGill University
quesnel@music.mcgill.ca

Abstract

The paper describes the ongoing research and development of a computer-assisted learning environment for the training of listening skills related to the perception, evaluation, and control of timbre. Timbral ear training is a specialized form of ear training that aims at developing a listener's sensitivity to timbre differences, long-term memory for timbre references, and listening strategies used in the subjective evaluation of timbre in terms of objective parameters of sound. An initial version has been used at McGill University as the main training tool for a course in technical ear training. The current research is investigating ways of making the program adaptive to individual listeners, improving its diagnosis capabilities, and improving the quality of the interactions with the listener.

1 Introduction

Timbre is generally considered to be one of the main perceptual attributes of sound together with pitch, loudness, and perceived duration. It is commonly accepted that the spectrum of a sound and its evolution over time are important factors determining its timbre. The control and evaluation of timbre are commonly performed by sound engineers and composers of electro-acoustic and computer music. In the recording studio, all stages in the audio chain (the path between the sound source and the listener that includes microphones, amplifier stages, loudspeakers) can modify and shape the spectral content of the source by adding resonances and distortions [Toole and Olive 1988, Olive 1990]. An important task of the sound engineer is to control this timbral shaping process. Electro-acoustic and computer music composition involves direct manipulation of timbre and the building of new sound qualities.
In both cases, listening skills related to the detection, discrimination, identification, and evaluation of subjective timbre variations and their relationships with technical parameters of sound are involved. There is evidence that basic auditory perceptual skills (detection, discrimination, and identification) can be trained [Watson 1980]. Much less is known, however, about how to develop timbre perception skills and more generally, about the cognitive aspects of timbre perception.

An extensive course in technical listening¹ (Timbre Solfege) has been described by Miskiewicz [1992]. The course is mainly aimed at sound engineers to develop aural skills related to timbre identification and memorization, perception of relationships between timbre, pitch, and loudness, and auditory spaciousness. Timbre is categorized in terms of octave and 1/3-octave resonances, each category being identified by its center frequency, along with associations with vowel sounds and descriptive terms.

2 Computer-Assisted Timbral Ear Training

A partial computer-assisted implementation of Timbre Solfege has been developed at McGill University [Quesnel and Woszczyk 1994] on the Macintosh platform. This initial version (Timbral Ear Trainer I) implements exercises aimed at developing memory for a basic set of timbre categories, sensitivity to small timbre differences, and listening strategies for the evaluation of timbre. In active matching tasks (Comparative Listening), specific timbres resulting from the superposition of resonances and antiresonances on sound sources must be duplicated by the listener in a process of comparatively listening to the timbral differences between two sounds and determining the parameters (center frequency, Q, and gain) underlying these differences. Alternatively, the applied resonances must be removed to restore the original timbre.

¹ Defined by Letowski [1985] as "the process of timbre evaluation in terms of both aesthetic impressions and identification of their physical correlates".

Psychoacoustics, Perception 38 ICMC Proceedings 1994

In identification
tasks (Absolute Identification), the listener must identify a particular timbre with a label (e.g., the center frequency value of a resonance or the vowel that can be associated with the resonance).

The computer manages the presentation of exercises, the evaluation of the answers, the collection of data, and the control of two Roland E660 digital parametric equalizers and a Yamaha DMP7D mixer through MIDI. Listening exercises are grouped into units of increasing complexity. The software system has been used in the Graduate Sound Recording Program at McGill University as the main training tool for a one-year course in timbral ear training.

Timbral Ear Trainer I has limited knowledge about the nature of the listening tasks students have to perform. The program knows the final answer and compares its internal representation with the equalizer settings chosen by the student. It knows that a change in timbre can occur only if the gain in at least one frequency band is different from zero. But it knows little about useful listening strategies. When an answer is wrong, the program lacks the knowledge needed to determine how the answer was arrived at. As a result, Timbral Ear Trainer I is not adaptive and its feedback is simple: differences between the computer's and the student's solutions are presented both aurally and visually (through the display of the equalizer's control settings).

3 Timbral Ear Trainer II

A second version of the program is being developed with the following objectives: 1) making the program adaptive to individual levels of performance, 2) improving the accuracy of the diagnosis process, and 3) improving the usefulness and sophistication of the interactions between the apprentice listener and the program. The remainder of the paper discusses some of the issues that must be addressed in order to meet these objectives.
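The answer check described for Timbral Ear Trainer I (compare the stored solution with the student's equalizer settings, and treat a band with zero gain as inaudible) could be sketched roughly as follows. This is only an illustration: the `Band` type, the pairing of bands by position, and all three tolerance values are assumptions made here, since the paper gives no numeric criteria.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    """One parametric equalizer band (hypothetical representation)."""
    center_hz: float
    q: float
    gain_db: float


# Hypothetical tolerances; the paper does not specify any.
FREQ_TOL_OCTAVES = 1 / 6
Q_TOL_RATIO = 1.5
GAIN_TOL_DB = 1.0


def bands_match(target, answer):
    """Return True if the answer band is close enough to the target band."""
    gain_ok = abs(target.gain_db - answer.gain_db) <= GAIN_TOL_DB
    if target.gain_db == 0.0 or answer.gain_db == 0.0:
        # A flat band does not change the timbre, so its center
        # frequency and Q are irrelevant; only the gain is compared.
        return gain_ok
    freq_ok = abs(math.log2(answer.center_hz / target.center_hz)) <= FREQ_TOL_OCTAVES
    q_ratio = max(target.q, answer.q) / min(target.q, answer.q)
    return gain_ok and freq_ok and q_ratio <= Q_TOL_RATIO


def mismatched_bands(targets, answers):
    """Indices of bands whose settings differ, for visual feedback."""
    return [i for i, (t, a) in enumerate(zip(targets, answers))
            if not bands_match(t, a)]


if __name__ == "__main__":
    solution = [Band(1000.0, 2.0, 6.0), Band(4000.0, 2.0, 0.0)]
    student = [Band(2000.0, 2.0, 6.0), Band(250.0, 1.0, 0.0)]
    print("Bands to review:", mismatched_bands(solution, student))
```

A check of this kind only looks at the final settings, which is exactly the limitation noted above: it says nothing about how the student arrived at them.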

3.1 Representations of Timbre

The issue of representing timbre differences is central to the questions of diagnosis and feedback. With timbre, the question is not only how to represent it but what exactly to represent [Dannenberg 1993]. Timbre has been represented in a geometrical space as a set of perceptual dimensions (most often 2 or 3) usually including brightness (related to the spectral envelope) and attack quality [Grey 1977, Krumhansl 1989, Donnadieu et al. 1994]. It has also been represented (somewhat ambiguously) with descriptive terms [e.g., Salmon 1950, von Bismarck 1974, Ethington and Punch 1994]. Timbre can also be represented in terms of control dimensions using, for example, parameter values used in sound synthesis [Ashley 1986].

In learning environments, the chosen dimensions or categories and the tools that are used to produce controlled timbre variations should bear a relationship with the domain in which the skills are to be used. In addition, multiple representations should be used to provide multiple views and perspectives, notably in remedial activities. In computer-assisted learning environments, novel representations can be devised in order to make tacit knowledge explicit and to make visible cognitive processes that are normally not directly observable.

In timbral ear training, timbre variations can be produced by modifying the spectral envelope using parametric equalizers and filters. These tools are used by sound engineers and composers. They provide a means of linking subjective sound qualities and objective parameters of sound. They can also be used to categorize timbre into a reduced set of references that can be memorized and used as anchor points against which other timbres can be evaluated. Timbre differences can also be represented graphically as either differences in the spectral envelopes of the modifications applied to the sound source or as differences in the spectral envelopes of the modified sound sources.
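The first graphical representation mentioned above, the spectral envelope of the modification itself, can be approximated from the equalizer parameters alone. The sketch below is a rough illustration under stated assumptions: it uses the widely used analog-prototype peaking filter (as in the Audio EQ Cookbook), not the actual response of the equalizers used in the system, and the function names and the 1/3-octave evaluation grid are choices made here.

```python
import math


def peaking_gain_db(freq_hz, center_hz, q, gain_db):
    """Gain in dB at freq_hz of one analog-prototype peaking band (assumed model)."""
    a = 10 ** (gain_db / 40)        # amplitude factor; a == 1 is a flat band
    s = 1j * freq_hz / center_hz    # normalized complex frequency
    h = (s * s + s * (a / q) + 1) / (s * s + s / (a * q) + 1)
    return 20 * math.log10(abs(h))


def third_octave_centers(low_step=-15, high_step=12):
    """Exact 1/3-octave grid around 1 kHz (about 31 Hz to 16 kHz)."""
    return [1000.0 * 2 ** (k / 3) for k in range(low_step, high_step + 1)]


def envelope_db(bands, freqs):
    """Summed dB response of cascaded bands given as (center_hz, q, gain_db) tuples."""
    return [sum(peaking_gain_db(f, c, q, g) for c, q, g in bands) for f in freqs]


def max_envelope_difference_db(question, answer, freqs=None):
    """Largest dB difference between two equalizer envelopes over the grid."""
    freqs = freqs or third_octave_centers()
    q_env = envelope_db(question, freqs)
    a_env = envelope_db(answer, freqs)
    return max(abs(x - y) for x, y in zip(q_env, a_env))


if __name__ == "__main__":
    question = [(1000.0, 2.0, 6.0)]
    answer = [(1250.0, 2.0, 6.0)]
    diff = max_envelope_difference_db(question, answer)
    print(f"Largest envelope difference: {diff:.2f} dB")
```

Comparing envelopes rather than raw parameter values has one practical appeal: two different sets of settings that produce nearly the same curve yield a small difference, which matches the observation that more than one answer can sound like the question.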

3.2 Diagnosis and Evaluation

An accurate diagnosis is an essential requirement of adaptive systems. The training of auditory skills presents some unique challenges for the evaluation of answers. For example, in comparative listening exercises with multiple frequency, Q, and gain values, more than one answer can result in a timbre similar to the timbre of the question. Therefore, the student's answer could differ from the computer's and still be considered acceptable.

Computing average scores based on the final answers in comparative listening exercises does not provide an accurate assessment of a student's performance level. For example, a student who solves a complex comparative listening exercise in 45 seconds demonstrates greater listening ability than a student who must struggle for 10 minutes to arrive at the same answer. Yet the final scores will be identical in the two cases.

It is critical to avoid situations in which the program tells the student that the answer is right in a comparative listening exercise when there is in fact an audible difference. The perceptibility of resonances and the just-noticeable differences (jnd's) in resonance frequency depend on the signal used, the frequency range, listening conditions, etc. In
this respect, neither the critical bandwidth formula proposed by Zwicker and Terhardt [1980] nor the equivalent rectangular bandwidth (ERB) equation proposed by Moore and Glasberg [1989] can account for the variety of contexts that can be encountered in timbral ear training.

Diagnosis requires at minimum a representation of the question and a corresponding representation of the student's answer. The evaluation can be made more precise by the knowledge of the solution path taken by the student. It can be further enhanced by the knowledge of the student's performance history. Many intelligent tutoring systems (ITS) attempt to maintain a dynamic, moment-by-moment model of the cognitive state of the learner. We know too little about the cognitive aspects of timbre perception at the moment to build such a model, but there are other solutions. Adaptive training is possible using a clear representation of the problem's structure and an accurate picture of what the student is doing while solving the problem [Reusser 1993].

An additional source of diagnostic information lies not only in what the student does but also in how and when it is done. The timing of interactions can help detect confusion and hesitation [Fox 1988]. The diagnosis process can also benefit from the detection of noise in the learner's actions (fatigue, lack of attention).

3.3 Timbre Perception Skills

One of the most complex target skills in Timbral Ear Trainer II is the ability to evaluate timbre differences by characterizing spectral envelopes (e.g., specifying spectral bandwidth and identifying resonances and antiresonances in terms of center frequency, width, and magnitude). This skill involves a rich set of listening strategies (problem-solving techniques), lower-level subskills (detection, discrimination, and identification of variations in single parameters, memorization of timbre references), and declarative knowledge about timbre, masking, loudness, etc. This diversity calls for a variety of teaching methods.
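For reference, the two bandwidth formulas cited in Section 3.2 can be written as short functions. The expressions below are the commonly quoted forms from the two papers listed in the references (frequency in kHz, bandwidth in Hz); the function names are chosen here. As the text stresses, these values describe auditory filter widths in general and should not be read as an acceptance threshold for answers in timbral ear training.

```python
def critical_bandwidth_hz(freq_hz):
    """Critical bandwidth in Hz, analytical expression of Zwicker and Terhardt."""
    f_khz = freq_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


def erb_hz(freq_hz):
    """Equivalent rectangular bandwidth in Hz, polynomial of Moore and Glasberg."""
    f_khz = freq_hz / 1000.0
    return 6.23 * f_khz ** 2 + 93.39 * f_khz + 28.52


if __name__ == "__main__":
    # Print both bandwidths at a few octave-spaced center frequencies.
    for f in (125.0, 250.0, 500.0, 1000.0, 2000.0, 4000.0):
        print(f"{f:7.0f} Hz  CB = {critical_bandwidth_hz(f):7.1f} Hz"
              f"  ERB = {erb_hz(f):7.1f} Hz")
```

Both functions depend on frequency only, which makes the limitation concrete: neither has any input for the signal used or the listening conditions.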

3.4 Teaching Methods

Timbral Ear Trainer II is a hybrid system implementing different teaching strategies depending on the skill being trained. For example, the memorization of an initial limited set of timbre references requires extensive practice in which the manipulations performed by the learner are limited to selecting a center frequency value or a corresponding vowel sound. "Smart" drill-and-practice methods using multiple adaptive pools of questions are appropriate for this type of task.

On the other hand, experience gained from using Timbral Ear Trainer I has revealed that students can easily become frustrated and confused when using wrong solution paths in complex comparative listening tasks. Students tend to pursue unsuccessful paths until they have to retreat and start over again without understanding what was wrong with their strategy. Students often lack alternative solution strategies. Guided practice is therefore more appropriate for these tasks.

Some aspects of the cognitive apprenticeship approach [Collins 1988] offer an appealing teaching model to implement in timbral ear training. Cognitive apprenticeship applies teaching methods used in traditional apprenticeship, such as demonstrating/observing, coaching, and fading, to the learning of cognitive skills. Scaffolding and fading refer to the process by which help and assistance provided to the learner are progressively removed as the learner acquires expertise. This allows the learner to work on interesting, meaningful, and situated problems at the early stages of the training.

3.5 Research Module

Very little is known about timbre memory and learning [Crowder 1989, 1993]. A research module is therefore being built into the system to investigate the strategies used by listeners for the memorization of timbre references and the evaluation of complex timbre differences. As the listener works towards the solution to a problem, a trace of his or her problem-solving steps is recorded.
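A trace of this kind, together with a first step toward grouping it into coarser units and using interaction timing as a diagnostic cue (Section 3.2), might be sketched as follows. The event fields, the control names, and the 5-second pause threshold are all hypothetical; the paper does not describe the actual trace format.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Event:
    """One timestamped user interaction (hypothetical trace record)."""
    time_s: float   # seconds since the exercise started
    control: str    # e.g. "band1.freq", "band1.gain", "compare"
    value: float


def group_actions(events):
    """Collapse runs of consecutive events on the same control into one action."""
    actions = []
    for control, run in groupby(events, key=lambda e: e.control):
        run = list(run)
        actions.append({
            "control": control,
            "start_s": run[0].time_s,
            "end_s": run[-1].time_s,
            "final_value": run[-1].value,
            "n_events": len(run),
        })
    return actions


def long_pauses(events, threshold_s=5.0):
    """Gaps between consecutive events longer than threshold_s (possible hesitation)."""
    return [(a.time_s, b.time_s)
            for a, b in zip(events, events[1:])
            if b.time_s - a.time_s > threshold_s]


if __name__ == "__main__":
    trace = [
        Event(0.0, "band1.freq", 500.0),
        Event(0.8, "band1.freq", 630.0),
        Event(9.0, "band1.gain", 6.0),
        Event(9.5, "compare", 1.0),
    ]
    for action in group_actions(trace):
        print(action)
    print("Pauses:", long_pauses(trace))
```

Grouping by consecutive control is only the simplest possible parse; recognizing patterns of actions as strategies would need a richer model than this sketch provides.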
At the fine-grain level, the trace contains the sequence of events produced by the listener as he or she interacts with the program. Work is being done to parse the trace in order to group these events into higher-level actions, which can then be grouped into patterns of actions representing strategies.

The initial implementation of Timbral Ear Trainer II is based on a set of assumptions about the skills and subskills involved and the learning and teaching strategies to use. The research module will be used to articulate and test various alternative implementation choices [Breuleux and Quesnel 1994].

4 Summary

This paper has outlined work being done to develop a computer-assisted environment for the training of aural skills involved in timbre perception and evaluation. Some of the critical issues involved have been discussed. It is hoped that the combination of a research tool and a learning environment will allow both a better understanding of the skills and cognitive processes involved, and the development of better learning tools and teaching methods.

Acknowledgments

This research was made possible in part by a grant from the Social Sciences and Humanities Council of Canada.

References

[Ashley 1986] Richard D. Ashley. A Knowledge-Based Approach To Assistance In Timbral Design. Proceedings of the International Computer Music Conference. San Francisco: Computer Music Association, pp. 11-16, 1986.

[Breuleux and Quesnel 1994] A. Breuleux and R. Quesnel. A Computer-Based Learning Environment For Investigating Skills, Learning, And Teaching In Technical Listening. Proceedings of ED-MEDIA '94, Vancouver, June 1994.

[Collins 1988] A. Collins. Cognitive Apprenticeship And Instructional Technology. BBN Systems and Technologies Corporation Report No. 6899, 1988.

[Crowder 1989] R.G. Crowder. Imagery for musical timbre. Journal of Experimental Psychology: Human Perception and Performance, 15: pp. 472-478, 1989.

[Crowder 1993] R.G. Crowder. Auditory memory. In S. McAdams and E. Bigand (Eds.), Thinking in Sound: The Cognitive Psychology Of Human Audition. Oxford: Oxford University Press, pp. 113-145, 1993.

[Dannenberg 1993] Roger B. Dannenberg. Music representation issues, techniques, and systems. Computer Music Journal, 17(3): pp. 20-30, 1993.

[Donnadieu et al. 1994] S. Donnadieu, S. McAdams and S. Winsberg. Caractérisation Du Timbre Des Sons Complexes. I: Analyse Multidimensionnelle. Actes du 3e Congrès Français d'Acoustique, Toulouse, France, May 1994.

[Fox 1988] Barbara A. Fox. Interaction As A Diagnostic Resource In Tutoring. Tech. Report No. 88-3. Boulder, Colorado: University of Colorado, 1988.

[Grey 1977] John M. Grey. Multidimensional perceptual scaling of musical timbres. J. Acoust. Soc. Am. 61: pp. 1270-1277, 1977.

[Krumhansl 1989] C.L. Krumhansl. Why is musical timbre so hard to understand? In S. Nielzen and O. Olsson (Eds.), Structure And Perception Of Electroacoustic Sound And Music. Amsterdam: Elsevier, pp. 43-53, 1989.

[Letowski 1985] T. Letowski. Development of technical skills: Timbre solfeggio. J. Audio Eng. Soc. 33: pp. 240-243, 1985.

[Miskiewicz 1992] Andrzej Miskiewicz. Timbre solfege: A course in technical listening for sound engineers. J. Audio Eng. Soc. 40: pp. 621-625, 1992.

[Moore and Glasberg 1989] Brian C. J. Moore and Brian R. Glasberg. Suggested formulae for calculating auditory-filter bandwidths and excitation patterns. J. Acoust. Soc. Am. 74(3): pp. 750-753, 1983.

[Olive 1990] Sean E. Olive. The Preservation of Timbre: Microphones, Loudspeakers, Sound Sources and Acoustical Spaces. Paper presented at the 8th International AES Conference, Washington D.C., 1990.

[Quesnel and Woszczyk 1994] René Quesnel and Wieslaw Woszczyk. A Computer-Aided System For Timbral Ear Training. Paper presented at the 96th AES Convention, Amsterdam, February 26 - March 1, 1994. Preprint 3856.

[Reusser 1993] Kurt Reusser. Tutoring systems and pedagogical theory: Representational tools for understanding, planning, and reflection in problem solving. In S.P. Lajoie and S.J. Derry (Eds.), Computers as cognitive tools. Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 143-177, 1993.

[Toole and Olive 1988] Floyd E. Toole and Sean E. Olive. The modification of timbre by resonances: Perception and measurements. J. Audio Eng. Soc. 36: pp. 122-142, 1988.

[Watson 1980] Charles S. Watson. Time course of auditory perceptual learning. Ann. Otol. Rhinol. Laryngol. 89 (Suppl. 74): pp. 96-102, 1980.

[Zwicker and Terhardt 1980] E. Zwicker and E. Terhardt. Analytical expressions for critical-band rate and critical bandwidth as a function of frequency. J. Acoust. Soc. Am. 68(5): pp. 1523-1525, 1980.