FUZZY RULES IN COMPUTER-ASSISTED MUSIC INTERPRETATION

Tatiana Kiseliova
University of Dortmund, Department of Computer Science, D-44221 Dortmund
Tatiana.Kiseliova

Harro Kiendl
University of Dortmund, Department of Electrical Engineering, D-44221 Dortmund
kiendl

Yves Rambinintsoa
University of Dortmund, Department of Electrical Engineering, D-44221 Dortmund
Yves.Rambinintsoa

ABSTRACT

In this paper we describe the fuzzy rules used in our prototype of a "fuzzy music interpretation system" [4]. The core of this system consists of two essential units, the rule base and the inference machine. The rule base contains general IF-THEN interpretation rules formulated by an experienced pianist. The inference machine contains both conventional and advanced fuzzy information processing strategies. Once the system is fed with the information contained in the score of Beethoven's "Für Elise" (the notes and special signs such as "ppp" and "legato", coded in accordance with the MIDI format), it generates an interpretation of this piece of music and renders it in the form of a MIDI file. Certain refinement parameters allow us to modify the character of the interpretation.

1. INTRODUCTION

Many kinds of music have great commercial importance, so it is understandable that much effort has been expended from the beginning to exploit the potential of current technologies for producing music. To date, we have been able to save, copy, and render music with control of loudness and tone. Moreover, electronic musical instruments equipped with MIDI (Musical Instrument Digital Interface) allow us to modify the tempo and the timbre of the tones and to mix different sound tracks.

Our expertise lies in the field of fuzzy control, especially in the development and industrial application of fuzzy methods to model the behavior of human process operators by IF-THEN rules [1-3].
Clearly, a human process operator who controls a chemical reactor based on experience and skill is not the same as a pianist who interprets a piece of music based on experience and skill. However, the two tasks may share some similarities, so the fuzzy strategies that have been applied successfully in industrial engineering might also be useful for music interpretation. In particular, we are interested in the potential of the advanced fuzzy strategies that we have developed and published, but which have not yet been exploited for music interpretation purposes. This is the motivation for our work. To limit the complexity of the problem, we focus on piano music. In this case, we can manipulate only the volume and the timing of the beginning and ending of each note, not its timbre.

2. MECHANICAL INTERPRETATION

We start with the score of a piece of music, here Beethoven's "Für Elise". The information contained in the score is encoded and divided into two sections. The first refers to the plain notes and specifies the pitch (e.g., E, D, A), the start time and the duration of each note. The second section, derived from additional notations in the score, supplies information concerning tempo, note ties, legato, nonlegato, staccato, volume and pedal. To generate the mechanical interpretation, the two sections are combined and superimposed with default values for tempo, volume and pedal. The resulting interpretation sounds very monotonous.

3. FUZZY INTERPRETATION

Our idea is to manipulate the above mechanical interpretation by applying fuzzy rules, so that the outcome corresponds more closely to a human pianist's interpretation. The following analogy may make our underlying motivation clearer. Two basics are essential for speaking a language. The first is linguistic instinct, which develops unconsciously by "hearing".
The second is the application of rules, strict ones such as those of grammar and loose ones such as recommendations concerning style. These basics complement one another and can partly substitute for one another: once we have developed a certain linguistic instinct, based on numerous heard examples, we may suddenly become aware that these examples follow certain rules that can be put into words. Conversely, after learning and applying a rule, our linguistic instinct is usually able to supply the message of the rule much more quickly and still mostly reliably. We postulate that, in the same way, musical instinct and the application of interpretation rules (which depend on the desired style of interpretation) are essential basics for the human art of interpreting a piece at the piano and can partly substitute for one another. We also note that fuzzy technology, invented by L. A. Zadeh, has often proved useful in technical applications for the automatic processing of knowledge that can be laid down in the form of rules [1]. This reasoning motivates us to improve the mechanical interpretation further by modifying it through fuzzy processing of a set of interpretation rules, supplied by musicians to characterize their styles of playing. As this approach considers only one of the above two basics, some refinement parameters are provided that allow a subsequent online refinement to the listener's taste.

4. RULE BASE

Following this idea, we consulted music experts and the music literature and established a rule base consisting of 150 qualitative rules. They describe recommendations concerning the interpretation of classical piano music in forms such as:

IF < the position of a short right-hand note is close before a change from "major" to "minor" > THEN < reduce tempo considerably >   (1)

These rules have the general form

IF < condition > THEN < recommendation >   (2)

and describe recommended modifications of the improved mechanical interpretation. The input part (premise) of each rule is a condition that refers to certain input variables (Figure 1). Some of these are explicitly visible in the score; others are context-dependent features or variables that are not. The output part (conclusion) of each rule recommends a modification of the value of one of the eight output variables shown in Figure 2.

Figure 1. Input variables of fuzzy interpreter:
- time value of note / grace note
- absolute / relative tempo
- note ties
- note position relative to: repeated part / bar, end of music piece, beginning and ending of theme, time of harmony change
- number of notes per second / bar
- change of time value of adjacent notes
- ascending / descending note sequence
- absolute / relative volume
- status / change of pedal

Figure 2. Output variables of fuzzy interpreter:
- tempo right hand
- duration of note right hand
- tempo left hand
- duration of note left hand
- volume right hand
- volume left hand
- beginning of pedal
- ending of pedal

As each premise considers no more than three input variables, and as the rule base is grouped into eight subsets, each consisting of only about 20 rules referring to the same output variable, the rule base is transparent and therefore understandable. Notice that we do not provide "special rules", such as "note 87 must be played piano", that refer to an individual note or bar. This would result in a nontransparent rule set consisting of many highly specialized rules, each applicable only once, in one piece. Instead, we use general rules. Each of these influences the interpretation globally, i.e., with respect to many notes or bars (of the considered composition, and also of other compositions in related styles). That way we keep the rule base small and transparent. Only such rules can reflect an underlying idea and deserve to be called "rules" in the usual meaning of the word.

5. MEMBERSHIP FUNCTION

To process the rules, we must specify what is meant quantitatively by qualitative linguistic values such as short, long, middle or a little. For this, we use the concept of the fuzzy membership function [1-3]. This allows us to express to what degree $\mu_k$, with $0 \le \mu_k \le 1$, the premise of rule $R_k$ is met. The value $\mu_k$ (called the degree of activation of rule $k$) determines to what degree the recommendation of rule $k$ is taken into account in the superposition of the recommendations of all activated rules. To model each linguistic input value, we use up to 10 membership functions in the form of equally distributed overlapping triangles or trapezoids of the same shape.
To model the linguistic output values, we use equally distributed singletons. In principle, the shapes and distributions of the membership functions could be modified interactively for refinement purposes. However, as this refinement option requires the adjustment of a large number of parameters and does not work online, we prefer to use the refinement parameters of the postprocessing and superposition units, which are not discussed in this paper.
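The membership functions described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of the system: the function names, the variable "note duration in beats", its range and the number of terms are hypothetical, and combining the parts of a premise with the minimum operator is a common choice that the text above does not confirm.

```python
from typing import Callable, List

MembershipFn = Callable[[float], float]


def triangle(center: float, half_width: float) -> MembershipFn:
    """Triangular membership function with peak value 1 at `center`."""
    def mu(x: float) -> float:
        return max(0.0, 1.0 - abs(x - center) / half_width)
    return mu


def equally_distributed_triangles(lo: float, hi: float, n: int) -> List[MembershipFn]:
    """n >= 2 triangles of the same shape with peaks spread evenly over [lo, hi].

    Neighbouring triangles overlap so that their degrees sum to 1 inside [lo, hi].
    """
    step = (hi - lo) / (n - 1)
    return [triangle(lo + i * step, step) for i in range(n)]


def equally_distributed_singletons(lo: float, hi: float, n: int) -> List[float]:
    """Positions u_i of n >= 2 output singletons spread evenly over [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def activation(*premise_degrees: float) -> float:
    """Degree of activation mu_k of a rule (assumption: AND is the minimum)."""
    return min(premise_degrees)


# Hypothetical input variable: note duration in beats, five linguistic terms.
duration_terms = equally_distributed_triangles(0.0, 4.0, 5)
degrees = [mu(1.25) for mu in duration_terms]

# Hypothetical output variable: volume change, seven singleton positions.
volume_change_positions = equally_distributed_singletons(-3.0, 3.0, 7)

# A premise with two parts, met to the degrees 0.75 and 0.4.
mu_k = activation(degrees[1], 0.4)
```

In this sketch a duration of 1.25 beats belongs to the second term to the degree 0.75 and to the third term to the degree 0.25, which shows how overlapping triangles turn one crisp value into graded memberships of neighbouring linguistic values.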

6. RULE PROCESSING BY A MAMDANI FUZZY SYSTEM

Our fuzzy interpreter is built in the form of eight parallel systems, each processing the rules for one of the eight output variables. Mamdani fuzzy systems are most frequently used to process rules for a given output variable [1-3]. The underlying principle is that each "activated rule" (a rule whose premise is met to some degree in the given situation) generates an output membership function. This function defines the degree to which each possible value of the output variable is "recommended" in the given situation. By superimposing the recommendations of all activated rules, the overall output membership function is generated. The task of the subsequent defuzzification unit is to select the "best-supported" recommendation. Most frequently, one of the following strategies is applied for this. Center of gravity (COG) defuzzification selects the compromise that is "best-supported" by the recommendations of the overall output membership function. Maximum (MAX) defuzzification selects the output value for which the output membership function takes its maximum. Which choice is more appropriate depends on the application.

Such Mamdani fuzzy systems are well established. However, for this application, we found that these conventional defuzzification strategies do not allow the processing of qualitative knowledge as transparently as desired. This observation led to the development of the torque (TOR) defuzzification strategy [2] described below. Additionally, we found that a recently developed extension of the Mamdani fuzzy system, the two-way fuzzy system, which allows us to process not only the usual "positive" (recommending) rules but also "negative" (prohibiting or vetoing) rules, is also useful in this application [1, 2].

7. TOR DEFUZZIFICATION

Let the output membership function referring to an output value $u$, produced by all activated rules, be given by $r$ singletons $\{\mu_i, u_i\}$, $i = 1, 2, \ldots, r$, where $u_i$ is the position and $\mu_i$ the activation of the $i$th singleton. The music interpretation problem shows that both the MAX and the COG defuzzification, which are commonly used in fuzzy control, have an essential limitation. Let the fully activated rule $R_1$ produce the output singleton $\{\mu_1 = 1, u_1 = 2\}$, which recommends, to the degree $\mu_1 = 1$, a medium ($u_1 = 2$) increase of the volume for all bars of the theme. Let the partly activated rule $R_2$ produce the output singleton $\{\mu_2 = 0.8, u_2 = 1\}$, which recommends, to the degree 0.8, a small ($u_2 = 1$) increase of the volume at the beginning of all bars (Figure 3). If both rules are activated simultaneously, i.e., for all notes at the beginning of the bars of the theme, the resulting increase of volume should obviously be greater than medium. More generally, we require a defuzzification strategy in which equidirectional recommendations of the individual rules amplify each other. Neither the MAX nor the COG defuzzification has this property. These considerations suggested the introduction of the TOR (torque) defuzzification:

$$u_{TOR} = p \sum_{i=1}^{r} \mu_i u_i$$

where $p$ is a scaling factor [1, 2]. Figure 3 illustrates that this defuzzification has the desired property. (For $p = 1$, the resulting output value $u_{TOR}$ corresponds to the torque induced by the masses $\mu_i$, considering their positive or negative distances $u_i$ from the neutral point $u = 0$.) In the example above, with $p = 1$, we obtain $u_{TOR} = 1 \cdot 2 + 0.8 \cdot 1 = 2.8$, whereas MAX yields $u_{MAX} = 2$ and COG yields $u_{COG} = 2.8 / 1.8 \approx 1.56$. This is the reason we use the TOR defuzzification method predominantly in this application.

Figure 3. Defuzzification of an output membership function representing equidirectional recommendations of two rules (axis $u$ from -3 to 3, with $u_{COG}$, $u_{MAX}$ and $u_{TOR}$ marked).

8. USE OF NEGATIVE RULES

According to Equation (2), the rules discussed so far are positive rules that produce recommendations. We process these rules as usual by a conventional Mamdani fuzzy system. More transparency in the processing of qualitative knowledge is obtained if negative rules of the form

IF < condition > THEN < warning / veto >   (3)

are also provided. These can be processed together with the positive rules by a two-way fuzzy system with hyperinference [2].

To see why, suppose that in the interactive process of refining the interpretation we have designed a set of 20 positive rules that recommend certain situation-dependent modifications of the volume. Let us further assume that we want the volume variation not to be too large for notes that belong to the theme. If we provide positive rules only, we can realize this in principle by adding the condition "AND note does not belong to the theme" to the premises of all 20 positive rules and setting up additional positive rules for notes belonging to the theme. However, instead of this costly and nontransparent procedure, we can leave the original 20 positive rules untouched and add a single negative rule:

IF < note belongs to the theme > THEN < u = large is FORBIDDEN >.   (4)

If this rule is fully activated, its output membership function is given by $\mu_u^-(u) = \mu_{large}(u)$, where $\mu_{large}(u)$ is the membership function that models the linguistic value large (Figure 4). Let $\mu_u^+(u)$ be the output membership function (in the form of singletons) produced by all activated positive rules. Then fuzzy veto hyperinference produces from $\mu_u^+(u)$ and $\mu_u^-(u)$ a membership function $\mu_u(u)$ given by $\mu_u(u) = \mu_u^+(u) \wedge \neg\mu_u^-(u)$ [1, 2]. Realizing this operation by the bounded difference, $\mu_u(u) = \max(0, \mu_u^+(u) - \mu_u^-(u))$, we obtain a resulting output membership function $\mu_u(u)$ in which the singletons $\{\mu_i, u_i\}$ of $\mu_u^+(u)$ with large values of $|u_i|$ are suppressed or diminished, as shown in Figure 5. Applying the TOR defuzzification, the volume variations for notes in the theme are reduced compared with a situation where we have the same output $\mu_u^+(u)$ of the positive rules and no activation of the negative rule. What we wish to stress here is that the use of negative rules supplies a much more transparent means of modifying the interpretation than if we are restricted to the use of positive rules only.

Figure 4. Output membership functions $\mu_u^+(u)$ and $\mu_u^-(u)$ produced by positive rules and one negative rule, respectively (left).

Figure 5. Membership function $\mu_u(u)$ resulting from processing $\mu_u^+(u)$ and $\mu_u^-(u)$ by a fuzzy veto hyperinference (right). The application of the TOR defuzzification to $\mu_u(u)$ produces a smaller output value compared with applying it to $\mu_u^+(u)$.

9. CONCLUSIONS

In this paper we present some results of our investigation into the application of fuzzy strategies to modeling the manner in which a musician interprets a piece of music. For Beethoven's "Für Elise", we set up a fuzzy system comprising 150 interpretation rules. Its input is the information contained in the score of a piece of piano music; its output is a MIDI file that represents an interpretation of this music. The level of interpretation reached so far is still well below the standard of an experienced pianist, but well above a merely mechanical rendering of the score. We consider the level reached to date to be sufficient for generating "trivial" music. Moreover, we see potential for further improvements by refining the rule base. We designed the rules considering mainly "Für Elise". However, the application of these rules to Mozart's "Rondo Alla Turca" also produces an acceptable interpretation. This suggests that it should be possible to set up labeled rules, where the label indicates whether the rule is very general and can be applied to many pieces of music, or whether it is a more special rule that may be applied, depending on the piece of music and on the listener's taste, to realize special musical intentions.

10. REFERENCES

[1] Kiendl, H. (1997): Fuzzy Control methodenorientiert. Oldenbourg-Verlag, München.
[2] Kiendl, H. (1999): Decision Analysis by Advanced Fuzzy Systems. In: J. Kacprzyk, L. Zadeh, editors, Computing with Words in Information/Intelligent Systems, Physica-Verlag, Heidelberg, pp. 223-242.
[3] Kiendl, H., Krause, P., Schauten, D., Slawinski, T. (2003): Data-Based Fuzzy Modeling for Complex Applications. In: Schwefel, H., Wegener, I., Weinert, K., editors, Advances in Computational Intelligence, Springer-Verlag, Berlin Heidelberg.
[4] Kiendl, H., Kiseliova, T., Rambinintsoa, Y., Kreuzer, A. (2004): MP3 files of interpretations of "Für Elise"
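As a numerical illustration of Sections 6 to 8, the following Python sketch compares the MAX, COG and TOR defuzzifications on the two singletons of the example in Section 7 and then applies a veto by bounded difference as in Section 8. It is a minimal sketch under stated assumptions rather than the code of the system: all function names are hypothetical, the ramp used for `mu_large` and the second set of singletons are invented for illustration, and the negative rule is taken to be fully activated.

```python
from typing import Callable, List, Tuple

Singleton = Tuple[float, float]  # (activation mu_i, position u_i)


def defuzz_max(singletons: List[Singleton]) -> float:
    """MAX: position of the singleton with the highest activation."""
    return max(singletons, key=lambda s: s[0])[1]


def defuzz_cog(singletons: List[Singleton]) -> float:
    """COG: activation-weighted mean of the singleton positions."""
    total = sum(mu for mu, _ in singletons)
    return sum(mu * u for mu, u in singletons) / total if total > 0 else 0.0


def defuzz_tor(singletons: List[Singleton], p: float = 1.0) -> float:
    """TOR: p times the sum of mu_i * u_i (torque about u = 0)."""
    return p * sum(mu * u for mu, u in singletons)


def veto(singletons: List[Singleton],
         mu_neg: Callable[[float], float]) -> List[Singleton]:
    """Veto by bounded difference: mu_i becomes max(0, mu_i - mu_neg(u_i))."""
    return [(max(0.0, mu - mu_neg(u)), u) for mu, u in singletons]


def mu_large(u: float) -> float:
    """Hypothetical model of 'large': 0 for |u| <= 1, rising to 1 at |u| = 3."""
    return min(1.0, max(0.0, (abs(u) - 1.0) / 2.0))


# Example of Section 7: R1 gives {mu = 1, u = 2}, R2 gives {mu = 0.8, u = 1}.
rules = [(1.0, 2.0), (0.8, 1.0)]
u_max = defuzz_max(rules)   # 2
u_cog = defuzz_cog(rules)   # 2.8 / 1.8, below the medium increase
u_tor = defuzz_tor(rules)   # about 2.8, above the medium increase

# Hypothetical positive output with one small and one large recommendation.
positive = [(0.6, 1.0), (0.9, 3.0)]
tor_plain = defuzz_tor(positive)                   # about 3.3
tor_vetoed = defuzz_tor(veto(positive, mu_large))  # large singleton suppressed
```

Only TOR returns a value above the medium increase of 2 for the two equidirectional recommendations, and the vetoed output keeps the small recommendation while removing the large one, so the TOR value drops without any change to the positive rules.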