Page  00000013

RHYTHM INTERACTION SYSTEM BASED ON RHYTHM ASSOCIATION

Shunichi Kasahara, Waseda University, Major in Pure and Applied Physics
Tomoyuki Yamaguchi, Waseda University, Department of Applied Physics
Ryo Saegusa, The Italian Institute of Technology, Robotics, Brain and Cognitive Sciences Department
Shuji Hashimoto, Waseda University, Department of Applied Physics

ABSTRACT

Musical sessions are often co-created through the interaction among multiple players. We developed a music interaction system between human and machine rhythm performance, which is composed of rhythm maps with an association mechanism. To achieve musical interaction, a many-to-many association should be considered, because the preferable rhythm performance for a partner's rhythm is not unique. The proposed association mechanism, tuned by actual human performances, allows the system to generate various rhythms that fit a partner's rhythm. We adopted a 3D acceleration sensor as an interface for human performance. We observed interactions between a human and a virtual player to evaluate the effectiveness of the proposed system.

1. INTRODUCTION

A music session is often co-created through the interaction among multiple players. Recently, many systems for musical sessions between humans and machines have been proposed in order to achieve a musical performance, accompaniment, or improvisation session. Raphael [1],[2] proposed a real-time accompaniment system for a human performance. Goto et al. [3] achieved a jazz improvisation session between a human and multiple virtual players, and Hamanaka et al. [4] proposed a jam session system with virtual players that can imitate the personalities of various human players. We proposed a piano session system that can exchange the initiative between a human and computers [5]. However, these systems require the human to possess the skills to play specific instruments. Playing rhythms is the simplest and most essential form of musical expression.
In rhythm interaction, players often generate their rhythm by association with the rhythms of other players. Nishijima et al. [6] proposed the NeuroDrummer, which generates rhythms in reaction to the human player's performance. This system uses neural networks to make a one-to-one mapping between rhythms. In real sessions, however, interaction requires not only a one-to-one rhythm association but also a many-to-many association.

We propose a rhythm interaction system based on a rhythm association that can realize a many-to-many relationship among players. The proposed system is composed of associated rhythm maps and a user interface. The rhythm association is realized by the connection between rhythm maps. Each rhythm map and the corresponding connection are trained according to actual human performances [7]. The many-to-many relationship represented by this association system resulted in more suitable combinations of the rhythms of multiple players.

Figure 1. Overview of interaction system. [Panel labels: Human Player (acceleration data, user performance, sound output); Virtual Player (listening, rhythm association system, machine performance, sound output).]

On the other hand, in actual rhythm performance, there are limitations of action and space due to the physical structure of the instruments. Hence, we have proposed a 3D acceleration sensor as an unrestricted interface device for music control [8]. We also employ a commercially available 3D acceleration sensor for rhythm manipulation in the proposed system.

2. PROPOSED RHYTHM INTERACTION SYSTEM

2.1. System Overview

The overview of our interaction system is shown in Fig. 1. A human can generate rhythms (Human Performance) by shaking the 3D acceleration sensor. The system recognizes the rhythm played by the human and then generates rhythms (Machine Performance) based on the rhythm association.
The rhythm association system is realized with two rhythm maps and a map associator, which are trained in advance on performances by experienced players.

2.2. Rhythm Similarity

In this study, the rhythm of each bar is represented as a rhythm vector $V = (v_1, v_2, \ldots, v_{16})$, $(0 \le v_i \le 127)$, which is composed of sixteen sampled velocities (intensities) quantized in 128 steps. The temporal resolution is therefore a sixteenth note. The learning of the rhythm map is based on the human acoustic similarity of the rhythm vectors, considering the hierarchical structure of the rhythm shown in Fig. 2 [7]. If the notes of a rhythm in a higher layer are stronger than the notes in a lower layer, the rhythm is sensed as more stable. The similarity $S(V, V')$, $(0 \le S \le 1)$, between two rhythms $V$ and $V'$ is defined as a function of the algebraic vector similarity and the human acoustic similarity with respect to the stability of the rhythms. By considering human acoustic sensitivity, the rhythm similarity is expected to be more intuitive and natural than the ordinary algebraic vector similarity.

Figure 2. Rhythm hierarchical structure; notes are located in tree nodes (Layer 0 to Layer 4).

2.3. Rhythm Map

A Self-Organizing Map [9] is employed as the rhythm map (Fig. 3); it has the important feature that similar rhythms are located in neighboring units. Vectors are stored in units located at Cartesian coordinates $X = (x, y)$, $(1 \le x, y \le n)$, on the square rhythm map $R(X)$, where $n$ is the size of the map and $x$ and $y$ are integers. When a rhythm vector $V_{in}$ is given as an input, the best matching unit (BMU) $\hat{X}$ is detected according to the rhythm similarity, and then the rhythm vector $R_t(X)$ in each unit of the map is updated to $R_{t+1}(X)$ by the following equations:

$R_{t+1}(X) = R_t(X) + \exp\left(-\frac{|X - \hat{X}|^2}{2\sigma^2(t)}\right)\left(V_{in} - R_t(X)\right)$   (1)

$\sigma(t+1) = 0.9999\,\sigma(t)$   (2)

where $\sigma(t)$ is a parameter that decides the update region.

Figure 3. Rhythm map (velocity; rhythm vector representation).

2.4. Rhythm Association

For the association of rhythm maps, we propose a map link that can be referred to from either side and expresses a many-to-many relationship [7]. The map link is defined as the four-dimensional matrix of connection weights between units on two rhythm maps, $R_1$ and $R_2$. In the following explanations, positions on rhythm maps $R_1$ and $R_2$ are denoted as $X_1$ and $X_2$, respectively. The rhythm vectors stored in rhythm map $R_i$ are denoted as $R_i(X_i)$, and the connection weight between two units is expressed as $w(X_1, X_2)$, $(0 \le w \le 1)$. The map link is trained to reinforce the connection between two rhythms played at the same time. Moreover, connections between similar rhythms are also reinforced according to their distance on the maps. When the rhythm vectors $V_1$ and $V_2$ are played at the same time, the map links are updated by the following equations, taking $\hat{X}_1$ as the BMU of $V_1$ and $\hat{X}_2$ as the BMU of $V_2$:

$w_{t+1}(X_1, X_2) = 1 - \{(1 - w_t(X_1, X_2))(1 - K(X_1, X_2))\}$   (3)

$K(X_1, X_2) = \exp\left(-\frac{|X_1 - \hat{X}_1|^2}{2\gamma_1^2(\hat{X}_1)}\right)\exp\left(-\frac{|X_2 - \hat{X}_2|^2}{2\gamma_2^2(\hat{X}_2)}\right)$   (4)

$\gamma_i(\hat{X}_i) = \frac{m}{8}\sum_{X_j \in \text{8-neighbors of } \hat{X}_i} S(R_i(\hat{X}_i), R_i(X_j))$   (5)

where $i = 1, 2$, and $\gamma_i$ on each BMU is the parameter that expresses the area of update for the neighboring units. $\gamma_i$ is determined by the average similarity among the 8-neighbors, and $m$ is a constant. Fig. 4 illustrates the rhythm association system and its training procedure.

Figure 4. Rhythm map association. Left: illustration of map links. Right: training procedure of the map link (input paired vector $\{V_1, V_2\}$; find the BMU on rhythm map $R_1$; find the BMU on rhythm map $R_2$).

2.5. Human Performance

We adopted the Wii Remote [10] (produced by Nintendo) as the 3D acceleration sensor, shown in Fig. 5. The acceleration sensor of the Wii Remote measures the acceleration and is used to detect ONSET (i.e. the time to sound) and VELOCITY (i.e. the strength of the beat). In order to use the Wii Remote in Max/MSP/Jitter, we utilized the "aka.wiiremote" object developed by Akamatsu [11]. The acceleration signal is sent from the Wii Remote to the laptop computer by Bluetooth. When a human gestures with the Remote, for example as if beating a drum, the acceleration signal takes the form shown in Fig. 5. Then, the onset and its velocity in the beating gesture are detected by the following method.
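The training steps of Sections 2.3 and 2.4 can be illustrated with a minimal sketch in plain Python. It follows equations (1) and (3) to (5) as read here, under several assumptions that are not from the paper: `similarity` is a stand-in for $S$ based only on Euclidean distance (the stability term of Section 2.2 is omitted), border units average over the neighbors that exist, and all function names (`find_bmu`, `update_map`, `gamma`, `update_link`, `random_map`) are hypothetical.

```python
import math
import random

MAX_VELOCITY = 127  # velocities are quantized in 128 steps (Section 2.2)


def similarity(v, w):
    """Stand-in for S(V, V') in [0, 1].

    ASSUMPTION: plain normalized Euclidean closeness; the stability term
    based on the rhythm hierarchy is not modeled here.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))
    return 1.0 - dist / (MAX_VELOCITY * math.sqrt(len(v)))


def find_bmu(rhythm_map, v):
    """Return (x, y) of the unit whose stored rhythm is most similar to v."""
    n = len(rhythm_map)
    cells = [(x, y) for x in range(n) for y in range(n)]
    return max(cells, key=lambda c: similarity(rhythm_map[c[0]][c[1]], v))


def update_map(rhythm_map, v_in, sigma):
    """Equation (1): move each unit toward v_in with a Gaussian weight."""
    bx, by = find_bmu(rhythm_map, v_in)
    n = len(rhythm_map)
    for x in range(n):
        for y in range(n):
            h = math.exp(-((x - bx) ** 2 + (y - by) ** 2) / (2.0 * sigma ** 2))
            rhythm_map[x][y] = [r + h * (v - r)
                                for r, v in zip(rhythm_map[x][y], v_in)]
    return bx, by


def gamma(rhythm_map, bmu, m):
    """Equation (5): m times the mean similarity of the BMU to its neighbors.

    ASSUMPTION: units on the border average over the neighbors that exist.
    """
    n = len(rhythm_map)
    bx, by = bmu
    sims = [similarity(rhythm_map[bx][by], rhythm_map[x][y])
            for x in range(bx - 1, bx + 2) for y in range(by - 1, by + 2)
            if (x, y) != (bx, by) and 0 <= x < n and 0 <= y < n]
    return max(m * sum(sims) / len(sims), 1e-9)  # guard against a zero width


def update_link(weights, map1, map2, bmu1, bmu2, m):
    """Equations (3) and (4): reinforce links near the simultaneous BMU pair."""
    g1, g2 = gamma(map1, bmu1, m), gamma(map2, bmu2, m)
    n = len(map1)
    for x1 in range(n):
        for y1 in range(n):
            d1 = (x1 - bmu1[0]) ** 2 + (y1 - bmu1[1]) ** 2
            k1 = math.exp(-d1 / (2.0 * g1 ** 2))
            for x2 in range(n):
                for y2 in range(n):
                    d2 = (x2 - bmu2[0]) ** 2 + (y2 - bmu2[1]) ** 2
                    k = k1 * math.exp(-d2 / (2.0 * g2 ** 2))
                    key = (x1, y1, x2, y2)
                    w = weights.get(key, 0.0)  # untrained links start at 0
                    weights[key] = 1.0 - (1.0 - w) * (1.0 - k)


def random_map(n, seed=0):
    """Hypothetical initializer: an n x n map of random 16-step rhythms."""
    rng = random.Random(seed)
    return [[[rng.randint(0, MAX_VELOCITY) for _ in range(16)]
             for _ in range(n)] for _ in range(n)]
```

In a training loop, `update_map` would be called once per sampled rhythm while the update region shrinks as in equation (2) (`sigma *= 0.9999`), and `update_link` once per pair of rhythms played at the same time. A dictionary keyed by `(x1, y1, x2, y2)` is used here only to keep the four-dimensional link table readable; a dense array would serve equally well.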

Figure 5. Interface for interaction. Right: illustration of onset detection (acceleration [G] over time, with the threshold $Ac'$ and the onset). Left: Wii Remote as an interface.

Figure 6. Ensemble of interaction system.

Figure 7. Behavior of agent.

From the acceleration waveform $f(t)$, the times $t_s$ and $t_f$ are determined by the threshold $Ac'$ (Fig. 5), where $t_s$ and $t_f$ are defined as the time the swing starts and the time it finishes, respectively. The onset moment of the action is set to $t_f$, and its velocity $v(t_f)$ is set by the following equation:

$v(t_f) = M \int_{t_s}^{t_f} \{Ac' - f(t)\}\,dt$   (6)

where $t_f$ is mapped to the sixteenth note $i$ in the bar, and $v(t_f)$ is regarded as $v_i$. The velocity $v_i$ is 0 when the sixteenth note $i$ does not correspond to an onset time $t_f$. The sound for each onset is generated by a MIDI sequencer, and the user can select the timbre of the sound (Fig. 6).

2.6. Machine Performance

When a human performs a rhythm $V$, its BMU $\hat{X}_1$ on rhythm map 1 is determined. Then, the relevance ratio (fitness potential) between $\hat{X}_1$ and the units on map 2 is determined by the map link, as shown in Fig. 7. Using the relevance ratio on map 2, we introduce an agent that moves on map 2 as a virtual player. The agent generates the corresponding rhythm from the unit at its position. We describe $\hat{X}_1(t)$ as the BMU on rhythm map 1 for a rhythm performed by the human at time $t$, and $\hat{X}_2(t) = (\hat{x}_2(t), \hat{y}_2(t))$ as the position of the agent. At the next time $t + 1$, the position of the agent $\hat{X}_2(t+1)$ is decided by equations (7) and (8), where $\alpha$ and $\beta$ are constants.

$\hat{X}_2(t+1) = d\hat{X}_2(t) + \hat{X}_2(t)$   (7)

$d\hat{X}_2(t) = \frac{1}{\alpha}\frac{\partial w}{\partial X_2}(\hat{X}_1(t), \hat{X}_2(t)) + \beta\, d\hat{X}_2(t-1)$   (8)

The virtual player also performs a bass part. The pitch (MIDI note number) is determined by the velocity (Fig. 6). By introducing the dynamic behavior of the agent, we realized a wider variety of machine performances that depend on the initial position and the human player's reaction.

3. EXPERIMENTS

We evaluated the proposed rhythm interaction system. We observed the behavior of the agent and the human performance while examinees tried a performance, and we received some comments about the system. There were five examinees: three had experience with playing instruments and two had none.

To collect the training data for system learning, 500 pairs of rhythm vectors were sampled from actual performances by a skilled player. The size of each rhythm map is 20 x 20; the size of the table of map links is therefore 20 x 20 x 20 x 20. The rhythm map was trained 200 x 500 times, and the map link was trained 500 times. The initial parameter $\sigma(0)$ for the update region was set to 10, and the parameter $m$ was set to 2.0. $Ac'$, $\alpha$, $\beta$ and $M$ were set to -0.2 [G], $3.0 \times 10^{-3}$, 0.5 and 250, respectively.

The trajectory of performances during the interaction is shown in Fig. 8. $I_1(X)$ and $I_2(X)$ represent the average velocity of the rhythm vectors on rhythm maps $R_1(X)$ and $R_2(X)$. The larger the average velocity of a rhythm, the darker the luminance. The indexed positions on $I_1(X)$ and $I_2(X)$ represent the BMU position on $R_1$ and the position of the agent on $R_2$, respectively, and each number corresponds to the bar at which the position on the map changed.

The reports from the examinees and our observations are as follows. The human players who had experience with music soon became accustomed to performing with the system. They could understand the dynamics of the system and recognized their role in the performance, such as "the user plays the solo part, and the computer follows", or "the computer plays actively as the main player and the user follows with simple rhythms." In Fig. 8, from index 15 to 33, the human played low-velocity rhythms and the computer played a louder performance.
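The agent motion behind the trajectories in Fig. 8 can be sketched as follows, using equations (7) and (8) with the values of $\alpha$ and $\beta$ given in this section. This is an illustration under assumptions that are not from the paper: the derivative of $w$ on the discrete map is approximated by central differences at the unit nearest to the agent, the agent position is kept continuous and clamped to the map, and `w_row`, `link_gradient`, `step_agent` and `nearest_unit` are hypothetical names. `w_row[x][y]` stands for the fitness potential $w(\hat{X}_1(t), X_2)$ over map 2 for the current human BMU.

```python
ALPHA = 3.0e-3  # alpha from Section 3
BETA = 0.5      # beta from Section 3


def clamp(value, low, high):
    """Keep value inside [low, high]."""
    return max(low, min(high, value))


def nearest_unit(pos, n):
    """Grid unit closest to a continuous agent position on an n x n map."""
    return (int(clamp(round(pos[0]), 0, n - 1)),
            int(clamp(round(pos[1]), 0, n - 1)))


def link_gradient(w_row, pos):
    """ASSUMPTION: central-difference estimate of dw/dX2 at the nearest unit."""
    n = len(w_row)
    x, y = nearest_unit(pos, n)

    def at(i, j):
        # Repeat the edge value outside the map so borders stay defined.
        return w_row[int(clamp(i, 0, n - 1))][int(clamp(j, 0, n - 1))]

    gx = (at(x + 1, y) - at(x - 1, y)) / 2.0
    gy = (at(x, y + 1) - at(x, y - 1)) / 2.0
    return gx, gy


def step_agent(w_row, pos, prev_delta, alpha=ALPHA, beta=BETA):
    """One update of equations (7) and (8); returns (new_pos, delta)."""
    gx, gy = link_gradient(w_row, pos)
    # Equation (8): gradient ascent on the potential plus a momentum term.
    dx = gx / alpha + beta * prev_delta[0]
    dy = gy / alpha + beta * prev_delta[1]
    n = len(w_row)
    # Equation (7), with the position clamped to the map (an assumption).
    new_pos = (clamp(pos[0] + dx, 0, n - 1), clamp(pos[1] + dy, 0, n - 1))
    return new_pos, (dx, dy)
```

Calling `step_agent` once per bar and reading the rhythm stored at `nearest_unit(new_pos, n)` on map 2 would give the machine performance for that bar. Because the momentum term carries over the previous displacement, a sketch like this can overshoot a peak of the potential, which is one way to picture an agent that keeps moving near the top of the fitness potential.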

Figure 8. Trajectory of performances during the interaction. [Left panel: human player, $I_1(X)$; right panel: virtual player, $I_2(X)$; both axes run from 0 to 19, and numbered indices mark the BMU and agent positions.]

On the other hand, it was initially difficult for the inexperienced users to play various rhythms. But soon, they learned to play after being inspired by the machine performance. Although the required time depended on the individual user, they could eventually perform interaction sessions like a skilled player. From index 28 to 36, when the agent reached the top of the fitness potential, the position of the agent became unstable, which made the machine performance more exciting and inspiring for the human performance. At indices 9 to 13 and 36 to 37, even though the human played similar rhythms, the virtual player performed different rhythms; this is achieved by the many-to-many association.

The system can gradually generate rhythms that fit the human performance. However, it cannot respond to sudden changes in the human rhythm, because the agent does not jump beyond its neighborhood. In order to catch up with the human performance when a sudden change occurs, an external force should be employed in the agent behavior model to allow a jump on the map. This corresponds to a change of motif or a restart of the virtual player.

4. CONCLUSION

We proposed a music interaction system between human and machine rhythm performance, which is composed of rhythm maps with an association mechanism. We also adopted a 3D acceleration sensor as an interface for human performance. For system evaluation, we conducted some experiments with examinees including both musically experienced and inexperienced people.
We received comments about our system such as "It is easy to play music even for inexperienced people" and "Users can feel creative interaction with computer after some practice". From these results, we gathered that users can feel a creative rhythm interaction with our system. Moreover, a non-unique reaction to a human performance was achieved with the many-to-many association and the dynamic behavior of the agent. Users could also perform with unrestricted actions by using the 3D acceleration sensor.

Based on feedback from the examinees, the following future work is being considered: the introduction of other gestures for sound control to express sustain or effects, the extension to rhythm and melody association or music and image association, and an implementation for non-MIDI-based ensembles.

Acknowledgment

This research is supported by the CREST project "Foundation of technology supporting the creation of digital media contents" of JST and the 21st Century Center of Excellence Program, "The innovative research on symbiosis technologies for human and robots in the elderly dominated society," Waseda University.

References

[1] Raphael, C. "A bayesian network for real-time musical accompaniment," Advances in Neural Information Processing Systems, 2002.
[2] Raphael, C. "Synthesizing Musical Accompaniments with Bayesian Belief Networks," Journal of New Music Research, 2001.
[3] Goto, M., Hidaka, I., Matsumoto, H., Kuroda, Y., Muraoka, Y. "A Jazz Session System for Interplay among All Players - VirJa Session (Virtual Jazz Session System) -," Proceedings of the International Computer Music Conference, 1996.
[4] Hamanaka, M., Goto, M., Asoh, H., Otsu, N. "A Learning-Based Jam Session System that Imitates a Player's Personality Model," Proceedings of the International Joint Conference on Artificial Intelligence, 2003.
[5] Taki, Y., Suzuki, K. and Hashimoto, S. "Realtime initiative exchange algorithm for interactive music system," Proceedings International Computer Music Conference, 2000.
[6] Nishijima, M., Kijima, Y. "Learning on Sense of Rhythm with a Neural Network - The NEURO DRUMMER," Proceedings of the International Conference on Music Perception and Cognition, 1989.
[7] Kasahara, S., Saegusa, R., Hashimoto, S. "Rhythm Generation with Associative Self Organizing Maps," IPSJ Journal (submitted).
[8] Hashimoto, S., Sawada, H. "Musical Performance Control Using Gesture: Toward Kansei Technology for Art," Controlling Creative Processes in Music, 1998.
[9] Kohonen, T. Self-Organization and Associative Memory. Springer-Verlag, 1984.
[10] Wii remote and nunchuk
[11] Max object handles Nintendo Wii Remote. aka/max/