sonogram of a sound and is the lowest-level timbre
representation. Boundaries are necessary to delimit the
symbols. The lighter part within the frame represents noise.
Symbols here are not a one-dimensional series but are
expressed as a disposition in a time-spatial domain. This type
of structure is defined at the signal level, the perceptive
level, and the cognitive level in a self-similar manner. At the
lowest level of timbre, shown in Figure 1, we call the symbols
micro timbres.
In the sound synthesis system O'kinshi [5], which runs on
Windows, hierarchical and self-similar structures are
implemented in a sound object, and those structures are
also reflected in the GUI (graphical user interface).
Figure 2 shows an example of a sound object defined
hierarchically. At the top level, music is expressed as an
icon. By double-clicking the icon, a user can reach the lower
level. An upper-level sound (music) is defined as a disposition
of lower-level sounds (music) in a time-spatial domain.
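
The hierarchical definition above can be illustrated with a small data structure. The following is a minimal sketch, not the O'kinshi implementation: the names `SoundObject` and `Placement`, their fields, and all numeric values are assumptions introduced only for illustration. It shows an upper-level sound defined purely as a placement of lower-level sounds along a time axis and one spatial (timbre) axis, with the same structure repeated at every level.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Placement:
    """A child sound placed in the parent's time-spatial frame (hypothetical)."""
    child: "SoundObject"
    onset: float      # start time relative to the parent, in seconds
    position: float   # coordinate on the parent's vertical (timbre) axis


@dataclass
class SoundObject:
    """A sound that is either a leaf or a disposition of lower-level sounds."""
    name: str
    duration: float = 0.0  # used only when the object has no placements
    placements: List[Placement] = field(default_factory=list)

    def total_duration(self) -> float:
        # A leaf reports its own duration; a composite ends with its
        # latest-ending child.
        if not self.placements:
            return self.duration
        return max(p.onset + p.child.total_duration() for p in self.placements)

    def depth(self) -> int:
        # Number of hierarchy levels, counting this object.
        if not self.placements:
            return 1
        return 1 + max(p.child.depth() for p in self.placements)


# Leaves play the role of micro timbres; names and values are made up.
grain_a = SoundObject("grain_a", duration=0.05)
grain_b = SoundObject("grain_b", duration=0.10)
note = SoundObject("note", placements=[
    Placement(grain_a, onset=0.0, position=0.2),
    Placement(grain_b, onset=0.05, position=0.6),
])
phrase = SoundObject("phrase", placements=[
    Placement(note, onset=0.0, position=0.0),
    Placement(note, onset=0.5, position=1.0),
])

print(phrase.depth(), round(phrase.total_duration(), 3))
```

Because a composite refers only to other `SoundObject` instances, the same object can be reused at several positions, and descending one level (as with double-clicking an icon in the GUI) corresponds to following `placements`.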
4.4 Extension of Previous Timbre (Music) Representation
Item 3 means that a new description should be an
extension of traditional common music notation. This does
not mean that common music notation must be able to carry
the new expressions. Rather, a particular setting of
parameters in the new representation should be equivalent to
common music notation.
Figure 3 shows two types of music notation in O'kinshi.
The features common to both representations are that the
horizontal axis represents time and the vertical axis
represents a dimension of timbre (part, channel).
In timbre notation as well as in music notation, the
horizontal axis represents time and the vertical axis
represents timbre, which ranges from micro timbre to macro
timbre according to the level under consideration.
4.5 Discrimination of Embedded and Exposed Structure
Item 4 is a requirement needed particularly when human
physical action is involved. The embedded structure represents
a symbol sequence at an upper level, and the exposed structure
represents the observed physical movement. Take speech
utterance and its recognition as an example. The text
sequence is an upper-level embedded structure. The observed
data, however, is not a text sequence but a physical signal
sequence in which all elements are concatenated smoothly and
cannot be observed as distinct units. So there are two
structures: a continuous representation of timbre at the lower
level, caused by a physical movement, and a discrete
representation of timbre at the upper level, which drives the
physical movements. These two types of structure occur in many
other examples: a dance pattern and its physical movement, or
a music score and the actually performed sound. The upper level
is always an intention, and the lower level is its practice.
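
The relation between the two structures can be sketched in a few lines. The example below is an illustrative assumption, not a model taken from this paper: the function `expose`, the symbol names, and the target values are all hypothetical. Each discrete upper-level symbol is given one target value, and the "exposed" output moves linearly between successive targets, so the symbol boundaries no longer appear as distinct events in the observed sequence.

```python
from typing import Dict, List


def expose(symbols: List[str],
           targets: Dict[str, float],
           steps_per_symbol: int = 4) -> List[float]:
    """Render a discrete symbol sequence as a smoothly connected trajectory.

    Linear interpolation is used here only as the simplest stand-in for
    a physical movement between targets.
    """
    values = [targets[s] for s in symbols]
    if len(values) < 2:
        return values[:]
    trajectory: List[float] = []
    for start, end in zip(values, values[1:]):
        for k in range(steps_per_symbol):
            t = k / steps_per_symbol
            trajectory.append(start + (end - start) * t)
    trajectory.append(values[-1])  # land exactly on the final target
    return trajectory


# Embedded structure: a discrete sequence (values are arbitrary placeholders).
embedded = ["i", "e", "a"]
targets = {"i": 1.0, "e": 0.5, "a": 0.0}

# Exposed structure: a continuous-looking sequence with no visible boundaries.
exposed = expose(embedded, targets, steps_per_symbol=4)
print(exposed)
```

Recognition corresponds to the inverse problem: recovering `embedded` from `exposed`, which is harder precisely because the boundaries are not marked in the observed data.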
Figure 1. Symbolization of timbre (signal level)
Figure 2. Hierarchical structure of a sound object in the
sound synthesis system O'kinshi [5]
4.6 Definition of Operation and Grammatical Structure
Some kinds of operations are defined for symbols. Such
operations do not necessarily occur equivalently at all
levels of the hierarchy. Figure 4 depicts a sonogram of a
female voice uttering /ieaou/. The lighter portions of the
figure represent the harmonics of the five vowels. Our
perception groups those harmonics into a single stream. This
well-known psychoacoustic phenomenon of grouping should
be reflected in symbol-level operations. As an extension of a
one-dimensional symbol sequence such as a grammar, rules
for both temporal sequence and spatial order are necessary, in
which the boundaries of symbols are defined in a time-spatial
domain. This enables a theoretical description of musical
sequences as well as of chord progressions.
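
A grouping operation of this kind can be sketched as follows. This is a deliberately simple harmonicity heuristic under stated assumptions, not the operation defined in this paper and not a model of auditory streaming: the function `group_harmonics`, the tolerance value, and the example frequencies are all hypothetical. Partials observed at one moment (the spatial axis) are assigned to the same stream when they lie close to an integer multiple of a given fundamental.

```python
from typing import Dict, List


def group_harmonics(partials: List[float],
                    fundamentals: List[float],
                    tolerance: float = 0.01) -> Dict[float, List[float]]:
    """Assign each partial to the first fundamental it nearly divides by.

    A partial matches when its ratio to the fundamental is within
    `tolerance * n` of an integer n >= 1. Unmatched partials are dropped.
    """
    streams: Dict[float, List[float]] = {f0: [] for f0 in fundamentals}
    for freq in partials:
        for f0 in fundamentals:
            ratio = freq / f0
            harmonic = round(ratio)
            if harmonic >= 1 and abs(ratio - harmonic) <= tolerance * harmonic:
                streams[f0].append(freq)
                break  # each partial belongs to at most one stream
    return streams


# Made-up partial frequencies (Hz) from two overlapping sources.
partials = [220.0, 441.0, 659.0, 300.0, 905.0]
streams = group_harmonics(partials, fundamentals=[220.0, 300.0])
print(streams)
```

Applying such a rule frame by frame, and then linking the resulting groups across time, would give one simple reading of a rule that covers both spatial order and temporal sequence.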
Proceedings ICMC 2004