Listening Simulations: Concepts, design and an application to Virtual Reality

Eliot Handelman
Princeton University
eliot@phoenix.princeton.edu

Abstract

Automated Composition can be regarded as a search problem in want of a predicate. A listening simulation is such a predicate. Its object is to compute response to music in some way that is evocative of human response; an automated composition system would then be able to query the listener for information pertaining to the way some compositional choice served overall objectives. An exploratory architecture is presented based on a "sensation-seeking" personality construct, characterized by rapid habituation and augmenting perception. The architecture attempts to predict as much as is necessary to maintain optimal levels of arousal. Methods for evaluating such architectures are discussed. Thinking through listening architectures can yield, at best, a powerful methodology for the study of cognition and, ultimately, the simulation of consciousness itself.

1 Concepts

Automated Composition can be regarded as a search problem in want of a predicate. A listening simulation (henceforth "listener") is such a predicate. Its object is to compute response to music in some way that is evocative of human response; an automated composition system would then be able to query the listener for information pertaining to the way some compositional choice served overall objectives.

But what computational listening criteria are "evocative of human response?" Recognizing that any high-level listening model necessarily reflects only one of many available personality constructs, a sensation-seeking approach was selected. This approach contrasts with the popular information-processing model of cognition, which limits the activity of the nervous system to transmitting and receiving information.
This deemphasizes the active role of perception ([Sokolov 1990]) which, according to [Held 1989], should be considered "reflective activity" rather than "passive reception."

Sensation-seeking behavior has two major biological, and hence computational, implications. The first is that redundancy leads to rapid habituation; the second is a tendency to augment perceived intensity.

1.1 Rapid habituation

The habituation of the orienting response (OR), a phenomenon first noted by Pavlov in 1910, received extensive attention from E. N. Sokolov [Sokolov 1963]. The orienting response consists of physiological changes in the organism, due to arousal, on encountering novel stimuli; in the course of repeated encounters, these changes gradually abate and then disappear altogether. Habituation suggests that the brain constructs a predictive model of encountered stimuli, called by Sokolov a "stimulus neuronal model." The computational-listening application of this idea is obvious.

1.2 Augmented perception

"Augmenters" are subjects whose judgements of intensity are magnified, rather than reduced; light flashes at equal intensity produce a positive averaged evoked response slope in sensation-seekers as measured by the EEG, which evinces the increasing impact of the stimulus on cortical cells prior to further processing ([Zuckerman 1979]). The habituating and hence reducing OR, which is among the end results of a "receptor reaction that is shunted through a complex neuronal tract," is therefore opposed by a tendency in early processing to magnify the stimulus. The higher-level "neuronal model" strives to undo the augmenting effects of the lower-level "sensory" module. This interplay provides a basic operational definition of the listening simulation. [1]

ICMC Proceedings 1993, p. 435

1.3 Compositional induction of neuronal model

This definition also has the merit of defining possible terms of composition. The conception is of a composition planner with full access to the brain states of its listener. These states can be read as the interplay between the neuronal model and sensory augmentation. Composition then consists in the specification and induction of this interplay; it produces stimuli which promote, then violate and transform, specified neuronal models. A representative composition scenario in these terms might be: "arrange for a brief silence to be more arousing than any preceding music."

2 Implementation

L2 is a large cognitive architecture, described in [Handelman 1991], that was devised to explore these ideas. In the interests of simplicity, auditory stimuli were idealized as one-dimensional units of variable duration differing only in intensity. Intensity of zero corresponds to silence. Intensity is correlated to arousal potential, providing "emotional" incentive. The high-level directive of the architecture is to predict as much as is necessary to keep this potential at an optimal level. Because L2 aspires to auditory generality, an effort was made to restrict the number of explicitly coded "rules" to a minimum; instead an inductive mechanism was provided as part of the predictive machinery.

2.1 Operation

The basic operation is as follows. Events are presented to an event listener (El), which augments the intensity of the event in accordance with levels indicated by a very short-term sensory store. If intensity (and hence arousal) surpasses an optimal threshold, a global listener (Gl) instructs the event listener to schematize; the result is a schematic listener (Sl) that acts as an intercessor between the event and global listeners, running automatically generated code that projects the expected event based on a context-test.
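The first steps of this operation, augmentation by the event listener and the threshold check by the global listener, can be sketched roughly as follows. This is an illustration only and not L2's code: the class names, the mean-based augmentation rule, the `gain` and `threshold` values and the size of the sensory store are all assumptions introduced for the example.

```python
from collections import deque


class EventListener:
    """Hypothetical event listener (El): augments each incoming intensity.

    Assumption: augmentation grows with the mean of a very short-term
    sensory store; the paper does not specify the actual rule.
    """

    def __init__(self, store_size=3, gain=0.5):
        self.store = deque(maxlen=store_size)  # very short-term sensory store
        self.gain = gain

    def hear(self, intensity):
        # Mean of recently heard raw intensities (0.0 if nothing heard yet).
        recent = sum(self.store) / len(self.store) if self.store else 0.0
        augmented = intensity * (1.0 + self.gain * recent)
        self.store.append(intensity)
        return augmented


class GlobalListener:
    """Hypothetical global listener (Gl): asks for schematization when the
    augmented intensity surpasses an assumed optimal threshold."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold

    def should_schematize(self, augmented):
        return augmented > self.threshold


def run(events, threshold=1.0):
    """Return (augmented intensity, schematize?) for each event."""
    el, gl = EventListener(), GlobalListener(threshold)
    results = []
    for event in events:
        augmented = el.hear(event)
        results.append((augmented, gl.should_schematize(augmented)))
    return results
```

Under these assumed values, a repeated event of intensity 0.8 stays below the threshold on first hearing and exceeds it on the second, which is the point at which a schematic listener would be requested; silence (intensity zero) is left at zero.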
The event is imbued with a feeler, which triggers associated Sl's. One Sl refines another when it expands its context-test, the most basic type of which is the feeler. In operation hundreds of parallel and hierarchical Sl's may be generated, refining on each other and on El.

An effect of projecting into the environment expectancies which are violated is to increase the actual intensity of events; the intensity of correctly predicted events is lowered. The listener regards these altered events as events it has not heard before. The end result is that the listener tries to predict its own effect on the environment in a process of perceptual individuation. This process suggests the terrifying scale that the project of implementing a listener must eventually assume: nothing less than a computational model of auditory consciousness seems indicated.

The violation of a schematic listener's expectancies is detected by the global listener in a way analogous to its detection of supraliminal intensities. The response is to construct yet a fourth class of listener, a monitor (Ml), triggered by an expectancy violation in the schematic cousin. The same principles that govern the treatment of the environment are applied to the treatment of Sl's; the effect is that the architecture listens to the environment as mutations in its network of expectancies.

Expectancies are organized point-wise, in contrast to a later but unimplemented listener that was to schematize globally, inferring gross generalities which it would attempt to refine. But the one approach does not rule out the other; an ideal listener would interact from both perspectives.

[1] I must take credit or blame for linking augmenting preprocessing and rapid habituation; the literature is, to my knowledge, silent on this possibility, which was suggested by music-theoretical considerations.

2.2 Further topics

Very many ideas did not reach fruition in L2, although their seeds were planted.
The encoding of "memory traces," in particular, was an area that received much attention. As work progressed the idea took hold that the past should be encoded as a quality of the present, rather than as a literal store of events, however these are mutated or deranged. Another idea was that active "remembering" was to have the effect of irrevocably altering the contents of memory, in a process evoking dream-like activity. Or again, all actions of the listener were to be coded in such a way that they were homologous to the lowest descriptive level of listening, such that "sensory" activity (listening) and "cognitive" activity (planning, categorizing, etc.) would be structurally indistinguishable. The possibilities, obviously, are vast.
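The expectancy mechanism of Section 2.1, in which violated expectancies raise the intensity of an event and correctly predicted events are lowered, can be illustrated with a toy schematic listener. Everything specific here is an assumption made for the sketch: the context-test is reduced to equality with the preceding event, and the `boost` and `damp` rules are invented, whereas L2 generated its tests inductively.

```python
class SchematicListener:
    """Toy schematic listener (Sl): when its context-test matches the
    previous event, it projects an expected next event.

    Assumption: the context-test is plain equality on the preceding
    intensity.
    """

    def __init__(self, context, expected):
        self.context = context    # intensity that triggers this Sl
        self.expected = expected  # intensity it then projects

    def project(self, previous):
        return self.expected if previous == self.context else None


def perceive(events, listeners, boost=1.0, damp=0.5):
    """Return perceived intensities after expectancies are applied.

    Assumed rules: a confirmed expectancy scales intensity by `damp`;
    a violated one adds `boost` times the prediction error; an event
    with no active expectancy is left unchanged.
    """
    perceived, previous = [], None
    for event in events:
        projections = [sl.project(previous) for sl in listeners]
        projections = [p for p in projections if p is not None]
        if not projections:
            perceived.append(event)
        else:
            error = min(abs(event - p) for p in projections)
            if error == 0:
                perceived.append(event * damp)           # predicted: lowered
            else:
                perceived.append(event + boost * error)  # violated: raised
        previous = event
    return perceived
```

With a single listener expecting intensity 4 after intensity 2, the sequence 2, 4, 2, 0 is perceived as 2, 2.0, 2, 4.0: the predicted event is lowered, and the final silence, which violates the expectancy, comes out as more intense than anything before it. This is only in the spirit of the composition scenario of Section 1.3, since the numbers follow entirely from the assumed rules.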

3 Evaluation

The evaluative problem, judging whether the listener "hears" in some way evocative of human response, is considerable. Earlier listening models suggested that design defects could be detected in music produced through the listener; the composition planner, in this case, consisted of a simple search procedure that tried to maintain maximal arousal. Search, unfortunately, quickly becomes intractable when not assisted by heuristics; some efforts were made in the earlier model to construct a planner that induced such heuristics. In the final analysis, listening and composition are separate, though obviously related, fields.

No planner was constructed for L2, in part because auditory events had been idealized. Rather than being treated as a deficit, this suggested a further level of abstraction: a third-generation listener was planned for an auditory environment in an entirely artificial acoustic domain. The functional problem, refining L2's concept of interplay, was to differentiate between sensation coming from within (the effect of projectively altering the environment) and that coming from without, corresponding to veridical perception. Note that music could be constructed that would facilitate or impede this process; but this awaits investigation.

Inspection of the listener is difficult, not to say tedious. Since the system mutates itself and its environment, its action borders on opacity. A solution suggested itself which had the effect of mutating the original problem. This I now describe.

3.1 An application to Virtual Reality

The most expedient way of detecting mutations in the system is to produce these changes as something which can be heard. Thus one would "listen to the system listening."
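As a rough illustration of what making such mutations audible might involve, the sketch below compares two snapshots of a listener's state and emits one tone description per change. It is not drawn from any implementation described here: the snapshot format (a mapping from a listener's name to its current expected intensity), the function name and the pitch mapping are all hypothetical choices.

```python
def sonify_mutations(before, after, base_hz=220.0):
    """Turn changes between two snapshots of listener state into tones.

    `before` and `after` map a (hypothetical) listener name to its
    current expected intensity. Each added, removed or altered entry
    yields one (name, frequency_hz, amplitude) tuple. The mapping is an
    arbitrary assumption chosen for the example.
    """
    tones = []
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old == new:
            continue  # unchanged state stays silent
        if old is None:
            freq = base_hz * 2   # newly created listener: octave above
        elif new is None:
            freq = base_hz / 2   # removed listener: octave below
        else:
            freq = base_hz       # altered expectancy: base pitch
        # Amplitude follows the size of the change in expected intensity.
        amplitude = abs((new or 0.0) - (old or 0.0))
        tones.append((name, freq, amplitude))
    return tones
```

For instance, if one schematic listener changes its expectancy and a monitor is created between two snapshots, the function returns two tone descriptions, which a synthesis back end (not shown) could then render.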
This solution has the additional advantage of deemphasizing the cognitive "portability" of the simulation: the requirement that listeners evoke plausibly "human" response gives way to the idea that response and the activity of cognition should be imaginative, more or less in a "compositional" manner. This suggests that listening simulations be devised as auditory intelligences: as it were, a new compositional possibility suggesting an affinity with Virtual Reality. Here the listener would project into the virtual environment ways of hearing, with the effect of actively extending perception into this environment. I have discussed these possibilities in [Handelman 1993].

4 Summary

While the design of a listener should reflect relevant facts of biology and psychophysics, the "high-level" characterization of individualistic approaches to listening is, fundamentally, a compositional topic. The final test of a listener involves auditory output, whether as music induced by a composition planner or, perhaps more interestingly, audio output reflecting the projective perceptual acts and even computational operations of the listener itself. Thinking through a full explication of proposed ways of hearing can yield, at worst, idiosyncratic (or fictional, or surreal) architectures. At best, it may yield a powerful methodology for the study of cognition and, ultimately, the simulation of consciousness itself.

References

[Handelman 1993] Eliot Handelman. "Permeable Space: a language of virtual perception." Proceedings of the Third International Conference on Artificial Reality and Tele-Existence, July 7-9, Tokyo (ICAT '93).

[Handelman 1991] Eliot Handelman. Music as secondary consciousness: an implementation. Unpublished Ph.D. thesis, Princeton University.

[Held 1989] Richard Held. "Perception and its neuronal mechanisms." Cognition, 33: 139-154.

[Sokolov 1963] E. N. Sokolov. Perception and the conditioned reflex. Waydenfeld, S. W. trans. NY: Macmillan.

[Sokolov 1990] E. N. Sokolov. "The orienting response, and future directions of its development." Pavlovian Journal of Biological Science. V. 2, No. 3. July-September 1990.

[Zuckerman 1979] Marvin Zuckerman. Sensation Seeking: beyond the optimal level of arousal. Lawrence Erlbaum Associates, Hillsdale, New Jersey.