EFFECTS OF TEMPO ON THE PERCEIVED SPEED OF HUMAN MOVEMENT

Kathleya Afanador, Todd Ingalls, Ellen Campana
Arts, Media and Engineering Program, Arizona State University, Tempe, AZ 85287, United States

ABSTRACT

Studies in crossmodal perception often use very simplified auditory and visual contexts. While these studies have been theoretically valuable, it is sometimes difficult to see how their findings can be ecologically valid or practically useful. This study hypothesizes that a musical parameter (tempo) may affect the perception of a human movement quality (speed) and finds that, although there are clear limitations, this may be a promising first step towards widening both the contexts in which crossmodal effects are studied and the application areas in which the findings can be used.

1. INTRODUCTION

Three intriguing occurrences (a misunderstood word, a talking puppet, and an elusive collision) have propelled the psychological research on crossmodal perception in which audition and vision are inextricably intertwined. These occurrences are now known as the McGurk effect [11], the ventriloquist effect [1], and the bounce-inducing effect [20], respectively, and all three have proven remarkably robust. Yet the study of crossmodal perception currently relies heavily on behavioral experiments using simple sounds and simple animations. While these studies have been theoretically informative, their contexts are so simplified that it is often difficult to see how the findings can be ecologically valid or practically valuable. The present study examines possible crossmodal effects of a musical parameter (tempo) on the perception of a human movement quality (speed) in the hope that this may be a first step towards widening both the contexts in which crossmodal effects are studied and the application areas in which the findings can be used.

1.1. Crossmodal Perception

Discussions of crossmodal perception often center on the McGurk effect or the ventriloquist effect and their variants, both situations in which vision dominates the effect. This paper is primarily interested in the opposite situation, in which sound affects vision, and thus draws on examples in which sound directly affects perception of spatial/temporal organization and of visual movement.

The way in which sound is combined, or not combined, with a visual display can influence the perceived organization and movement of objects within the scene. O'Leary and Rhodes [15], for example, showed that the perceived organization of a sequence of high and low tones could influence the perceived organization of moving dots on a visual display. Auditory information alone may be perceived differently depending on tempo: at slow tempi, alternating high and low tones are perceived as a single stream of sound, while at high tempi the high and low tones segregate into separate streams [2]. The perception of visual information varies similarly: when dots are displayed moving from left to right in alternating high and low positions at slow rates, a viewer perceives a single dot moving up and down, while at faster rates a viewer is more likely to perceive two dots moving horizontally. O'Leary and Rhodes showed that when the high and low tones were heard as two streams, viewers were more likely to see two dots even at rates which would, in a unimodal display, normally result in the perception of one dot. The perceptual organization of objects in the scene therefore also produced a change in the objects' perceived movement pathways.
Examining this more directly, Sekuler, Sekuler, and Lau [20] showed that movement pathways can be interpreted differently in the absence or presence of sound through the bounce-inducing effect, in which two moving targets are seen to stream through each other in silence but are seen to bounce off each other when a sound is introduced at the moment of visual coincidence. This effect occurs because the visual stimulus is inherently ambiguous. Sound resolves the ambiguity by biasing a viewer to favor integrating sound and movement into a single event that makes sense [1]. Yet Shams, Kamitani, and Shimojo [21] demonstrated that a single flash of light accompanied by multiple beeps is perceived as multiple flashes. Thus even when no ambiguity is present, sound can qualitatively alter perception of a visual stimulus. These findings support Vroomen and de Gelder's contention that "cross modal combinations of features not only enhance stimulus processing but can also change the percept" [22].

Perceptual judgment tasks have also indicated that audition dominates vision in temporal processing. This is sometimes called auditory capture, and it stems from claims that vision and audition are more sensitive to spatial and temporal processing, respectively, and from evidence that one modality dominates the other when conflicting spatial and temporal information is presented. One such study by Repp and Penel [19] asked participants to tap a finger in synchrony with auditory and visual sequences containing an event onset shift, with the expectation that this would cause involuntary phase correction responses. Their auditory sequences consisted of identical high-pitched piano tones and their visual sequences consisted of black X's on a screen and flashing lights. Within the unimodal conditions, audition produced the smallest variability in taps, larger phase correction responses, and better event onset shift detection. Interestingly, results from the bimodal condition were very similar to those of the unimodal auditory condition, indicating that although viewers' attention was aimed at the visual sequences, they depended more upon auditory information to perform the task. If this holds true for more complex stimuli, it suggests the possibility that auditory information also dominates temporal perception when watching human movement.

The bounce-inducing effect is additionally an example of congruence: the combination of two media that produces the perception of a relationship between them even when such relationships are coincidental. When two media are presented simultaneously, a viewer assumes that relationships between them exist and thus looks for them [3][7][13]. Bolivar et al. termed this finding "visual capture" [3]; in other words, visual stimuli influence people to interpret simultaneously presented auditory stimuli as somehow related. Likewise, auditory capture may occur through congruence, as when music influences people to perceive simultaneously presented visual material as somehow related. Lipscomb and Kendall [10], for example, paired an abstract film excerpt with a variety of different musical accompaniments and found that viewers perceived several musical choices as a "good fit." Similarly, Mitchell and Gallaher [13] paired three different dance sequences with three different musical sequences and found that congruence was perceived among several different combinations of dance and music, not only between each dance and its intended musical selection. Although Bolivar et al. used visual images with a strong narrative context in their experiment, the findings of Lipscomb and Kendall, as well as those of Mitchell and Gallaher, suggest that the simultaneous presentation of abstract sound and movement may be well suited to produce perceptions of similarity, which may in turn facilitate crossmodal effects.

1.2. Music Perception and Human Motion

Perception of sound with human movement has been studied to some degree in the area of music perception as it relates to dance. Much of this work focuses on establishing congruence between music and dance, concentrating specifically on dynamic qualities [6], general emotion or style [13], or section beginnings and endings [8] of both sound and movement. One recent study, however, examined the effects of various sound parameters on imagined motion. Eitan and Granot [5] asked participants in their experiments to visualize an animated human character (cartoon) of their choice. They were presented with brief musical selections and, for each selection, were asked to visualize their character moving in an imaginary animated film shot with the given melody as its soundtrack.
Their purpose was to analyze the relationship between music and motion in imagined space, based upon Clarke's [4] contention that "since sounds in the everyday world specify (among other things) the motional characteristics of their sources, it is inevitable that musical sounds will also specify...the fictional movements and gestures of the virtual environment which they conjure up" [5]. The experiment produced an asymmetrical model of imagined musical space: the fact that a musical stimulus seemed to suggest a particular kinetic quality did not imply that the opposite musical stimulus suggested the opposite kinetic quality. Central to the results of this experiment, however, is the finding that changing sound parameters changed participants' imagined motion in predictable ways. This suggests that there may be certain natural affinities between sound parameters and movement parameters, yet the asymmetries discovered suggest that the way these affinities are structured may be somewhat nuanced.

2. EXPERIMENT

Among the various sound parameters in Eitan and Granot's study, inter-onset intervals (IOI, the interval of time between the onsets of successive sounds) were found to affect imagined motion most strongly and symmetrically. Decreasing and increasing intervals strongly influenced participants to imagine motion speeding up and slowing down, respectively. In short, what we hear affects the movement we imagine. Historically and theoretically, this finding is not surprising. The association between tempo and human movement speed is arguably the most apparent sound-motion relationship. This association begins in early infancy, evident in high sensitivity towards "regular synchronization of vocal and kinesthetic patterns" [16], and this sensitivity continues to develop through childhood [14]. Humans seem to have an ingrained penchant for rhythmic synchronicity in their own movements [12], whether it is to synchronize with an auditory pulse or to synchronize with the movements of others around them. Phillips-Silver and Trainor have further established that for both infants and adults, auditory encoding of rhythmic patterns can be directly influenced by the movement of their own bodies [17][18]. In short, the movement we feel affects what we hear.

Within the context of music and dance, it is also fairly common for different pieces of music to "bring out" different dynamic qualities in the same dance. Although this particular point has not been studied empirically, it may be supported somewhat by the congruence studies mentioned above, and it hints at the possibility that a sonic change could cause a real change in the perception of a dynamic movement quality. In short, it may be that what we hear affects the movement we see. For these reasons, the present study hypothesized that a decrease or increase in inter-onset intervals would cause a change in the perception of visual movement speed. Would viewers be influenced to perceive a pairing of movement with a fast tempo as faster overall than a pairing of the same movement with a slow tempo? And if so, could this be conceptualized as a variation on auditory capture?

2.1 Method

Fourteen undergraduate students participated in this study on a voluntary basis and received one class credit for their time. They were instructed that the experiment was about the perception of human motion but, beyond that, all were naive to the purpose motivating this study. Stimuli consisted of videos showing six movements chosen from Laban Movement Analysis [9]: rising, sinking, advancing, retreating, spreading, and enclosing (Figure 1). A single dancer was recorded performing all six movements at three speeds (fast, medium, and slow) with a camcorder synchronized to a motion capture system, resulting in 18 video clips and 18 corresponding motion capture data files.

Figure 1. Rising, sinking and advancing (top); retreating, spreading and enclosing (bottom).

The motion capture data was fed into a pattern recognition model, which analyzed the movement (100 frames/sec) and estimated the probability that one of the six movements was occurring. These probabilities were then passed to a Max/MSP program, which generated sound from the data. The sounds produced were series of clicks, varying in IOI according to the speed of the movement. Three base rates (550 ms, 500 ms, and 450 ms) and three maximum rates (150 ms, 100 ms, and 50 ms) were used to control the IOI, and the probability ratings from the motion analysis determined the transition from base rate to maximum rate. Thus when no movement occurred, the IOI was simply the base rate; when fast movement occurred, the recognition model rated the probability of one of the movements very highly and consequently the IOI decreased quickly, but when slow movement occurred, the probability ratings increased more slowly, causing the IOI also to decrease slowly. Each of the 18 data files was put through the Max/MSP patch 9 times (once for each base rate/maximum rate pairing) and the resulting sound files were synchronized with their corresponding video clips, producing a total of 162 video clips.
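The exact probability-to-IOI mapping used in the Max/MSP patch is not specified above; the following Python sketch illustrates one plausible reading of it, in which the click IOI is interpolated linearly from a base rate toward a maximum rate as the recognition probability rises. The helper names (probability_to_ioi, click_times) and the linear interpolation are illustrative assumptions, not the patch actually used.

```python
# Illustrative sketch (not the original Max/MSP patch): map a movement-
# recognition probability to a click inter-onset interval (IOI).
# The linear interpolation between base and maximum rate is an assumption.

def probability_to_ioi(probability, base_ms=550.0, max_ms=150.0):
    """Interpolate between the base IOI (no movement) and the maximum
    (fastest) IOI as the recognition probability rises from 0 to 1."""
    p = min(max(probability, 0.0), 1.0)
    return base_ms + p * (max_ms - base_ms)

def click_times(probabilities, frame_rate=100.0, base_ms=550.0, max_ms=150.0):
    """Given per-frame probabilities (100 frames/sec in the study),
    emit click onset times in ms: a click fires whenever the time since
    the previous click exceeds the current IOI."""
    times, elapsed, last_click = [], 0.0, 0.0
    frame_ms = 1000.0 / frame_rate
    for p in probabilities:
        ioi = probability_to_ioi(p, base_ms, max_ms)
        elapsed += frame_ms
        if elapsed - last_click >= ioi:
            times.append(elapsed)
            last_click = elapsed
    return times

# Example: 3 s of no movement followed by 3 s of confidently recognized
# movement; the clicks accelerate from the base rate toward the maximum rate.
probs = [0.0] * 300 + [0.9] * 300
print(click_times(probs, base_ms=550.0, max_ms=150.0))
```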
Participants watched the videos on a 20-inch widescreen computer display and heard the sound through external speakers. They were presented with each video individually, followed by two statements with which they rated their agreement on a scale of 1 (strongly disagree) to 7 (strongly agree), indicated by numbered buttons. The first statement was always either "the movement was fast" or "the movement was slow." The second statement functioned primarily as a distracter. Participants saw each video exactly twice and responded to both the fast and slow statements for each video. The order in which the videos were presented was randomized and, for each video, half of the participants responded to the fast statement first while half responded to the slow statement first.

2.2 Results

Changes in IOI were found to influence viewers' perception of human movement speed for one set of videos in our experiment. For the medium-speed retreating movement, participants indicated significantly different levels of agreement with the statement "the movement was fast" as the minimum IOI length decreased, even though the actual movement they saw was identical across the different tempos. This difference was statistically significant (F(2, 96) = 3.17, p < .05). There were no statistically significant differences across inter-onset intervals for the other videos that we presented, although there was a main effect of actual movement speed (Figure 2), indicating that participants were attentive to movement speed and could accurately distinguish between slow, medium, and fast movements (F(2, 1779) = 400.16, p < .01).

Figure 2. Average ratings of speed vs. actual movement speed. MRATE is the actual movement rate, coded as 1 (slow), 2 (medium), and 3 (fast).
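As a rough illustration of this kind of analysis, the sketch below runs a one-way repeated-measures ANOVA on agreement ratings for a single video across the three minimum-IOI conditions, using statsmodels' AnovaRM. The ratings are fabricated placeholders, and the design details (error terms, how the two statement framings were combined) are assumptions; the original analysis may have used a different model.

```python
# Illustrative analysis sketch with made-up ratings, not the study's data.
# It tests whether agreement with "the movement was fast" differs across
# the three minimum-IOI (maximum click rate) conditions for one video.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical 1-7 agreement ratings: 14 participants x 3 IOI conditions.
ratings = {
    "participant": [p for p in range(14) for _ in range(3)],
    "min_ioi_ms":  [ioi for _ in range(14) for ioi in (150, 100, 50)],
    "rating":      [4, 4, 5, 3, 4, 5, 4, 5, 5, 3, 4, 4, 4, 4, 6,
                    3, 5, 5, 4, 4, 5, 3, 3, 4, 4, 5, 6, 3, 4, 5,
                    4, 5, 5, 3, 4, 5, 4, 4, 6, 3, 4, 5],
}
df = pd.DataFrame(ratings)

# F statistic and p value for the within-subject factor (min_ioi_ms).
result = AnovaRM(df, depvar="rating", subject="participant",
                 within=["min_ioi_ms"]).fit()
print(result)
```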

2.3 Discussion

The results provide some preliminary support for the hypothesis that differences in tempo, specifically inter-onset intervals, may be able to affect the perception of observed movement; however, the limitations of this study are clear. It is possible that the rating task used to measure perceived movement speed was either too cognitive or too coarse-grained. More precise measures obtained through a staircasing method may prove fruitful in attaining significant results. Additionally, it may be necessary to define objectively what is meant by slow, medium, and fast movement speeds. If many instances of crossmodal effects occur when there is ambiguity within one modality, then movements of ambiguous speed (within the range where a movement can perceptually be interpreted as either fast or slow) should be the most likely to show the effect. We are currently exploring both of these possibilities with additional tests.

3. CONCLUSIONS

Further studies will solidify the speculations prompted by this preliminary study, and it remains to be seen how and where the findings of such studies can be applied. The various ways in which audition and vision have already been shown to interact make this area worth investigating further, and it may prove particularly interesting to artists and developers of computer-interactive environments that use human movement as input. If sonic feedback is to be used as a response to movement, it will be informative to know what, if any, effects on human motion perception are caused by dynamic sound changes, how these effects may function differently within a given context, and how attention may facilitate or inhibit them.

REFERENCES

[1] Alais, D. & Burr, D. The ventriloquist effect results from near-optimal bimodal integration. Current Biology, 14(3), 257-262, 2004.
[2] Bregman, A. S. Auditory scene analysis. Cambridge, MA: MIT Press, 1990.
[3] Bolivar, V. J., Cohen, A. J., & Fentress, J. C. Semantic and formal congruency in music and motion pictures: Effects on the interpretation of visual action. Psychomusicology, 2, 38-43, 1994.
[4] Clarke, E. F. Meaning and the specification of motion in music. Musicae Scientiae, 5, 213-234, 2001.
[5] Eitan, Z., & Granot, R. How music moves. Music Perception, 23(3), 221-248, 2006.
[6] Hodgins, P. Relationships between score and choreography in 20th century music and dance: Music and metaphor. London: Mellen, 1992.
[7] Iwamiya, S. Interactions between auditory and visual processing when listening to music in an audiovisual context. Psychomusicology, 13, 133-154, 1994.
[8] Krumhansl, C. L. & Schenck, D. L. Can dance reflect the structural and expressive qualities of music? A perceptual experiment on Balanchine's choreography of Mozart's Divertimento No. 15. Musicae Scientiae, 1(1), 63-85, 1997.
[9] Laban, R. Principles of dance and movement notation: With 114 basic movement graphs and their explanation. London: Macdonald & Evans, 1956.
[10] Lipscomb, S. D., & Kendall, R. A. Perceptual judgment of the relationship between musical and visual components in film. Psychomusicology, 13, 60-98, 1994.
[11] McGurk, H. and MacDonald, J. W. Hearing lips and seeing voices. Nature, 264, 746-748, 1976.
[12] McNeill, W. H. Keeping together in time: Dance and drill in human history. London: Harvard University Press, 1995.
[13] Mitchell, R. W. & Gallaher, M. C. Embodying music: Matching music and dance in memory. Music Perception, 19(1), 65-85, 2001.
[14] Moog, H. The development of musical experience of children in the pre-school age. Psychology of Music, 4, 38-45, 1976.
[15] O'Leary, A. and Rhodes, G. Cross-modal effects on visual and auditory object perception. Perception & Psychophysics, 35, 565-569, 1984.
[16] Papousek, M. Intuitive parenting: A hidden source of musical stimulation in infancy. In I. Deliege & J. Sloboda (Eds.), Musical Beginnings: Origins and Development of Musical Competence (pp. 88-112). Oxford, New York, and Tokyo: Oxford University Press, 1996.
[17] Phillips-Silver, J. and Trainor, L. J. Hearing what the body feels: Auditory encoding of rhythmic movement. Cognition (in press), 2007.
[18] Phillips-Silver, J. and Trainor, L. J. Feeling the beat: Movement influences infants' rhythm perception. Science, 308, 1430, 2005.
[19] Repp, B. H. and Penel, A. Auditory dominance in temporal processing: New evidence from synchronization with simultaneous visual and auditory sequences. Journal of Experimental Psychology: Human Perception and Performance, 28(5), 1085-1099, 2002.
[20] Sekuler, R., Sekuler, A. B., & Lau, R. Sound alters visual motion perception. Nature, 385, 308, 1997.
[21] Shams, L., Kamitani, Y., and Shimojo, S. Visual illusion induced by sound. Cognitive Brain Research, 14, 147-152, 2002.
[22] Vroomen, J. and de Gelder, B. Sound enhances visual perception: Cross-modal effects of auditory organization on vision. Journal of Experimental Psychology: Human Perception and Performance, 26(5), 1583-1590, 2000.