An Artificial Perception Model and Its Application to Music Recognition

Andranick Tanguiane
ACROE-LIFIA, 46, av. Felix Viallet, 38000 Grenoble, France

Abstract

The data obtained by signal processing are self-organized in order to separate patterns before they are recognized. The model works much as a viewer perceives objects in an abstract painting without explicitly recognizing them. The input information is described in terms of generative patterns and their transformations, a representation that is less complex than the data themselves. The complexity of data is understood in the sense of Kolmogorov, i.e. as the amount of memory required to store them. Representing the data in the most efficient way yields a description that reveals certain causal relationships. The approach is applied to voice separation. Chord spectra are described in terms of generative subspectra, which decomposes polyphony into parts and chords into notes. It is shown that the representation of a chord spectrum that corresponds to the causality of the data generation is the least complex one. The demonstration is based on factoring chord spectra into convolution products of indecomposable spectra, similarly to the factorization of integers into primes. The model explains logarithmic scaling in pitch perception and the insensitivity of the ear to the phase of the signal as conditions necessary for voice separation. It also explains some rules of music theory as simplifying the adequate perception of polyphony.

Keywords: artificial perception, pattern recognition, automatic notation of music, voice separation, rhythm/tempo tracking, signal processing.

1 Introduction

We distinguish two stages in pattern recognition: (a) object segregation, i.e. grouping data into messages; (b) object identification, i.e. matching the segregated messages to known patterns. For example, distinguishing lines, spots, etc. in an abstract painting belongs to the first stage, whereas their explicit recognition is the task of the second stage. We deal with the first stage, that of "non-intelligent" perception. The related model of so-called correlative perception is based on general principles and properties of human perception rather than on any particular knowledge about the patterns. The model is applied to voice separation, which is required for the automatic notation of performed music. The current state of the research is reviewed in (Tanguiane 1993), where further references can be found.

In Section 2, "Principle of Correlativity of Perception", we introduce some basic assumptions about data representation. In Section 3, "Applications to Voice Separation", we formulate the problem of chord recognition as the problem of recognizing acoustical contours. In Section 4, "Problems of Justification", we enumerate the questions to be answered in order to substantiate our model. In Section 5, "Generation of Chord Spectra", we represent a chord spectrum as generated by multiple translation of a tone spectrum. In Section 6, "Factorization of Chord Spectra", we formulate the problem of chord decomposition as deconvolution of the chord spectrum and show that a musical chord has only one non-trivial deconvolution, namely the one that corresponds to the chord generation. In Section 7, "Causality and Optimal Data Representation", we show that this non-trivial deconvolution of the chord spectrum is less complex than the spectral data themselves. In Section 8, "Applications to Psychoacoustics and Music Theory", we discuss the assumptions of our model. It is shown that logarithmic scaling and insensitivity to the phase of the signal are essential conditions for chord recognition. Besides, we show that some statements of music theory can be interpreted, from the standpoint of our model, as providing the conditions for adequate perception of polyphony. In Section 9, "Conclusions", we enumerate the main statements of the paper.

6B.4 284 ICMC Proceedings 1993

2 Principle of Correlativity of Perception

By correlativity of perception we mean the capability of perception to group stimuli and to form patterns of relationships between these groups. We call these groups of stimuli low-level patterns, and the relationships between them high-level patterns.

For example, the pixels (stimuli) in Fig. 1 form small characters A (low-level patterns), which in turn form the contour of a large character B (high-level pattern). With such representations, high-level patterns can be recognized without recognizing the low-level patterns: if some unknown (unrecognizable) symbol were used instead of A, the pattern B would still be recognizable from the relationships between these unknown symbols. As we show later, recognizing chords does not require recognizing tones (pitch), which agrees with human perception.

[Figure 1: Pattern of B composed of patterns of A]

Thus we represent data in terms of repetitive (correlating) messages, i.e. generative elements and their transformations. Such a representation is less complex than storing the data themselves. The complexity of data is understood in the sense of Kolmogorov, i.e. as the amount of memory needed to store the data representation.

To illustrate the idea of complexity, consider the rhythmic pattern shown in Fig. 2a. It could be perceived as a repetition of its first three durations performed two times faster, which corresponds to the representation in Fig. 2b, where R012 denotes a call of the repeat algorithm R with the parameters: begin from time 0, repeat 1 time, perform 2 times faster. However, the given sequence of events is perceived as a long rhythmic pattern rather than as a short one being repeated. This means that the representation in Fig. 2b is inadequate. If the same rhythmic sequence is presented in the melodic context shown in Fig. 2c, the sensation of repetition becomes much clearer, and the representation in Fig. 2d now seems adequate. (A similar phenomenon can be observed when recognizing a fugue theme in augmentation or diminution.)

[Figure 2: Four representations of the same rhythm, panels a) to d); panels b) and d) end with the repeat call R012]

To explain this ambiguity in rhythm perception, let us estimate the complexity of the four representations. Suppose that one byte is needed to code a duration, two bytes are needed to code a duration with pitch, and four bytes are needed to call the repeat algorithm. Under these conventions the complexity of the representations (in bytes) is given in Table 1.

Table 1: Complexity of representation of events in Fig. 2 (in bytes)
                                     a)   b)   c)   d)
Complexity of rhythmic pattern        6    3   12    6
Complexity of its transformation      0    4    0    4
Total complexity                      6    7   12   10

For pure rhythm, the representation as a single pattern is less complex (6 versus 7 bytes), whereas in the melodic context the representation as a repeat is more efficient (10 versus 12 bytes). Thus the hierarchical principle of data representation is provided with a feedback that guides the data representation toward the simplest form.

3 Applications to Voice Separation

A voice is associated with a group of partials that move in parallel along a log2-scaled frequency axis. The dynamic trajectory of such a group of partials is called a melodic line, or a part in polyphony. From our standpoint, a melodic line is a high-level pattern generated by low-level patterns associated with notes. A static contour generated by repeating a group of partials is the high-level pattern of a chord. To separate voices (recognize chords), one has to represent spectral data in terms of generative elements and their trajectories (contours).
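The statement that a voice is a group of partials moving in parallel on a log2-scaled axis can be checked numerically. The Python sketch below is our own illustration rather than part of the model: it assumes 5 harmonic partials, C = 12 bands per octave and an arbitrary 440 Hz reference, and the function name `log2_positions` is hypothetical. Transposing a tone multiplies every partial frequency by the same ratio, so on the log2 axis every partial moves by the same number of bands, whereas on a linear axis the shifts grow with the partial number.

```python
import math

def log2_positions(f0, n_partials=5, C=12, ref=440.0):
    """Positions of the partials k*f0 (k = 1..n_partials) in bands of 1/C octave,
    measured from an arbitrary reference frequency ref (an assumption of this sketch)."""
    return [C * math.log2(k * f0 / ref) for k in range(1, n_partials + 1)]

f0 = 440.0
f0_up = f0 * 2 ** (2 / 12)  # the same tone transposed up by two semitones

# Shift of each partial on the log2 axis (in bands) and on a linear axis (in Hz)
log_shifts = [b - a for a, b in zip(log2_positions(f0), log2_positions(f0_up))]
linear_shifts = [k * (f0_up - f0) for k in range(1, 6)]

print([round(s, 6) for s in log_shifts])     # the same shift of 2 bands for every partial
print([round(s, 1) for s in linear_shifts])  # shifts in Hz grow with the partial number
```

Only on the logarithmic axis does a transposed tone keep its spectral shape, which is what lets the correlation functions below detect it as a translated copy.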
Note that our approach to voice separation is based on recognizing the similarity of voices, in contrast to the approaches of (Chafe & Jaffe 1986; Mont-Reynaud & Mellinger 1989), which are based on recognizing their dissimilarity.

To illustrate the procedure of chord recognition, consider the two simple chords shown in Fig. 3a, which have the spectra shown in Fig. 3b. Represent the spectra of the chords as shown in Fig. 3c.

[Figure 3: Spectral representations of chords: a) chords (e1; a1) and (f1; a1); b) their log2-scaled spectra for tones with 5 partials (marked black or white, respectively); c) the discrete clipped audio spectra of the chords to within a semitone, tabulated by frequency band k, frequency (Hz), pitch, s1(k) and s2(k) (the arrows show the parallel motion of partials)]

Thus a chord is identified with a binary string $\{s_n(k)\}$, where $n$ is the number of the chord (in our example $n = 1, 2$), $k$ is the number of the frequency band (in our example $k = 1, \dots, 34$), and

$$ s_n(k) = \begin{cases} 1 & \text{if the signal has partials in the } k\text{th frequency band,} \\ 0 & \text{otherwise.} \end{cases} $$

We recognize melodic intervals between successive chords by the peaks of the correlation function

$$ R_{n,n+1}(i) = \sum_k s_n(k)\, s_{n+1}(k+i). $$

If this function has a peak at point $i$, we suppose that the correlating subspectra correspond to similar tones forming an interval of $i$ semitones. In this case the correlating subspectrum is said to be the generative group of partials of the given melodic interval. Harmonic intervals in the $n$th chord are recognized by the peaks of the autocorrelation function

$$ R_{n,n}(i) = \sum_k s_n(k)\, s_n(k+i). $$

[Figure 4: The 130th J.S. Bach chorale used for computer experiments on chord recognition]

Note that the correlation functions may have peaks at points that do not correspond to real intervals. Such peaks are caused by coincidences of partials belonging to different tones. Fortunately, the groups of partials that correspond to real tones recur regularly, forming a stable spectral pattern.
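The two correlation functions above can be sketched in a few lines of Python. The sketch is ours and rests on stated assumptions: one band per semitone, tones with 5 harmonic partials placed by the rounding rule given later in Section 5, and e1, f1, a1 assumed to lie in bands 1, 2 and 6, which keeps all partials within the band range k = 1, ..., 34 of the example. The helper names are hypothetical.

```python
import math

C = 12  # bands per octave: one band per semitone, as in Fig. 3c

def tone_bands(lowest_band, n_partials=5):
    """Bands of a harmonic tone: lowest_band + [C*log2(k) + 0.5], k = 1..n_partials."""
    return {lowest_band + math.floor(C * math.log2(k) + 0.5) for k in range(1, n_partials + 1)}

def binary_spectrum(tones, n_bands=34):
    """s(k) = 1 if band k (k = 1..n_bands) holds a partial of any tone, else 0; index 0 is unused."""
    bands = set().union(*tones)
    return [1 if k in bands else 0 for k in range(n_bands + 1)]

def corr(s, t, i):
    """R(i) = sum_k s(k) t(k + i); bands outside the arrays count as zero."""
    return sum(s[k] * t[k + i] for k in range(len(s)) if 0 <= k + i < len(t))

def generative_group(s, t, i):
    """Bands k of s whose partials correlate with t at shift i."""
    return [k for k in range(len(s)) if 0 <= k + i < len(t) and s[k] and t[k + i]]

E1, F1, A1 = 1, 2, 6  # assumed band numbers of e1, f1 and a1
s1 = binary_spectrum([tone_bands(E1), tone_bands(A1)])  # chord (e1; a1)
s2 = binary_spectrum([tone_bands(F1), tone_bands(A1)])  # chord (f1; a1)

melodic = {i: corr(s1, s2, i) for i in range(-12, 13)}
harmonic_1 = max(range(1, 13), key=lambda i: corr(s1, s1, i))
harmonic_2 = max(range(1, 13), key=lambda i: corr(s2, s2, i))

print(melodic[0], melodic[1], melodic[5], melodic[-4])  # 5 5 5 6
print(generative_group(s1, s2, 1))  # [1, 13, 20, 25, 29]: the partials of e1 moving up a semitone
print(harmonic_1, harmonic_2)       # 5 4: a fourth in the first chord, a major third in the second
```

The peaks at i = 0 (a1 held) and i = 1 (e1 moving to f1) correspond to the arrows of Fig. 3c; the peaks at i = 5 and i = -4 pair tones across the two voices. The group at i = -4 contains six partials although a tone here has only five: besides the five partials of a1 it picks up band 29 (the fifth partial of e1), an incidental coincidence of exactly the kind just mentioned.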
By contrast, a correlating group of partials that does not correspond to a real tone has a random spectral structure. Therefore, in order to recognize true intervals, we must examine the generative groups of partials revealed by correlation analysis and reject incidental groups with no stable structure (those that occur only once). To recognize stable spectral patterns, we apply correlation analysis to the different generative groups of partials.

Our approach has been tested in computer experiments on recognizing harmonic and melodic intervals in a J.S. Bach chorale (Fig. 4). The chorale was treated as a sequence of 24 chords whose spectra were computed and analyzed under various assumptions. The experiments differed in voice type (harmonic or inharmonic), number of partials per voice (5, 10, or 16), frequency resolution (within 1, 1/2, 1/3, or 1/6 semitone), and an optional restriction of the considered intervals to those up to 12 semitones (in order to avoid octave autocorrelation). The recognition reliability in these 48 experiments was always better than 98%.

4 Problems of Justification

Thus a chord spectrum is regarded as generated by a tone pattern translated along the log2-scaled frequency axis according to some interval structure. In this case, the low-level configurations are similar tone spectra resulting from translations of a generative tone pattern. The high-level configuration is the contour circumscribed by the generative tone pattern, which is associated with the interval structure of the chord.

However, reasonable arguments and good recognition results do not yet strictly substantiate the model. The method may be well fitted to the data considered but fail in other experiments.
For example, some chord spectra may be representable in terms of generative tone spectra in several ways, which would make their recognition ambiguous, or the optimal (least complex) representation may differ from the decomposition of the chord into notes. Therefore, in order to substantiate our model, we must make sure that: (a) representing a chord spectrum as translations of a generative tone spectrum yields a description that corresponds to its perception; (b) such a description provides the optimal representation of the spectral data; (c) this optimal representation is unique.

We now confirm these items for two-tone intervals, major triads, and minor triads, using a special mathematical machinery. First, we show that a chord spectrum is the convolution of a generative tone spectrum with an interval distribution. Then we establish an isomorphism between polynomials over integers and discrete audio spectra with respect to addition and convolution. Unlike for polynomials over integers, unique factorization does not hold for polynomials over non-negative integers (see Example 2 below), which correspond to discrete power spectra. However, unique factorization holds for the class of polynomials that correspond to chord spectra built from harmonic tones. Next we show that the power spectra of musical tones are irreducible, and that the interval distributions of two-tone intervals, major triads, and minor triads are irreducible as well. Hence, the only factorization of a chord spectrum is the one that corresponds to its generation.

This approach to decomposing spectra into irreducible factors can be applied in signal processing, speech recognition, and the theory of generalized functions, wherever the deconvolution problem arises. The isomorphism between polynomials and discrete spectra makes algebraic methods available for spectral analysis.

5 Generation of Chord Spectra

We shall consider audio spectra, i.e. we assume the frequency axis to be log2-scaled, so that equal distances correspond to equal musical intervals.
We restrict ourselves to discrete power spectra bounded in low and high frequencies, assuming that both frequency bands and signal levels are expressed by non-negative integers, and that the number of bands with a positive level is always finite. Thus by spectra we understand expressions of the form

$$ S = S(x) = \sum_{n=0}^{N} a_n \delta(x - n) = \sum_{n=0}^{N} a_n \delta_n, \qquad (1) $$

where $\delta_n$ are Dirac delta functions and $a_n$ are non-negative integers. The value $a_n$ can be interpreted as the signal power in the $n$th frequency band. The phase of the partials is ignored, which corresponds to human perception. The support of spectrum (1) is defined to be the set of its partial frequencies, $A_S = \{n : a_n \neq 0\}$.

Recall that a musical, or harmonic, tone (one with pitch salience) is characterized by the harmonic ratio $1 : 2 : \dots : K$ of its partial frequencies. Then the support $A_S$ of its discrete audio spectrum $S$ has the form

$$ A_S = \{ n_k : n_k = [C \log_2 k + 0.5],\ k = 1, \dots, K \}, \qquad (2) $$

where $n_1$ is associated with the fundamental frequency, or pitch, the constant $C$ characterizes the accuracy of the spectral representation, being equal to the number of frequency bands per octave, and $[\,\cdot + 0.5\,]$ is the rounding function, since $[\,\cdot\,]$ retains the integer part of its argument. A spectrum $S$ whose support $A_S$ has the form (2) is said to be harmonic.

Since we consider a log2-scaled frequency axis, pitch shifts correspond to parallel translations of the tone spectrum along the frequency axis. A translation of spectrum (1) by $m$ bands to the right corresponds to the convolution

$$ \delta_m * S = \sum_n a_n \delta_{n+m}. \qquad (3) $$

As shown by Tanguiane (1993), a chord spectrum can be regarded as generated by multiple translation of a tone spectrum. By virtue of (3), a chord spectrum $S$ can be represented as follows:

$$ S = \sum_m (b_m \delta_m * T) = \Big( \sum_m b_m \delta_m \Big) * T = I * T, \qquad (4) $$

where $T$ is a tone spectrum and $I = \sum_m b_m \delta_m$ is the interval distribution.
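Rule (2) and translation (3) are straightforward to state in code. In the Python sketch below (our illustration; the function names are hypothetical), a spectrum is a list of integer coefficients $a_0, \dots, a_N$ indexed by band, and unit power per partial is assumed for the tone.

```python
import math

def harmonic_support(K, C=12):
    """Support of a harmonic spectrum by rule (2): n_k = [C*log2(k) + 0.5], k = 1..K."""
    return [math.floor(C * math.log2(k) + 0.5) for k in range(1, K + 1)]

def spectrum_from_support(support):
    """Coefficient list a_0..a_N with a_n = 1 on the support (unit power assumed)."""
    coeffs = [0] * (max(support) + 1)
    for n in support:
        coeffs[n] += 1
    return coeffs

def translate(coeffs, m):
    """delta_m * S by (3): every impulse moves m >= 0 bands to the right."""
    return [0] * m + list(coeffs)

def support(coeffs):
    """Bands with non-zero coefficients."""
    return [n for n, a in enumerate(coeffs) if a]

print(harmonic_support(5))        # [0, 12, 19, 24, 28] at one band per semitone
print(harmonic_support(5, C=24))  # [0, 24, 38, 48, 56] at quarter-tone resolution
T = spectrum_from_support(harmonic_support(5))
print(support(translate(T, 7)))   # [7, 19, 26, 31, 35]: the same pattern a fifth higher
```

In this sketch the fundamental sits in band 0, so a tone with pitch $p$ is obtained by translating it by $p$ bands.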
For example, if the frequency resolution is one semitone ($C = 12$), then the major triad formed by a major third and a fifth (4 and 7 semitones, respectively) has the interval distribution $I_{0,4,7} = \delta_0 + \delta_4 + \delta_7$.

A spectrum $S$ is said to be simple if its coefficients are relatively prime and the first coefficient $a_0 \neq 0$. Obviously, (4) can be written as

$$ S = a\delta_p * T * I, \qquad (5) $$

where the interval distribution $I$ and the tone spectrum $T$ are simple. A simple interval distribution $I$ corresponds to the intervals between the lowest note and the other notes of the chord, and its coefficients $b_m$ determine their relative loudness. The term $a\delta_p * T$ can be understood as the spectrum of the lowest tone of the chord, with spectral pattern $T$, loudness $a$, and pitch $p$.
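The generation model (4) and the normalization (5) can be sketched as follows. The Python below is an illustration under our own assumptions ($C = 12$, five unit-power harmonic partials, hypothetical helper names): it convolves the major-triad distribution $I_{0,4,7}$ with a tone spectrum $T$, places the result at pitch $p = 3$ with loudness $a = 2$, and then recovers $a$, $p$ and the simple part by skipping leading empty bands and dividing by the greatest common divisor of the coefficients. It also counts impulses, anticipating the complexity measure of Section 7.

```python
import math
from functools import reduce

def convolve(x, y):
    """Convolution of two coefficient lists, i.e. the operation * in (4)."""
    out = [0] * (len(x) + len(y) - 1)
    for i, a in enumerate(x):
        for j, b in enumerate(y):
            out[i + j] += a * b
    return out

def from_bands(bands, weights=None):
    """Coefficient list of sum_m b_m delta_m over the given bands (unit weights by default)."""
    weights = weights or [1] * len(bands)
    coeffs = [0] * (max(bands) + 1)
    for m, b in zip(bands, weights):
        coeffs[m] += b
    return coeffs

def split_simple(coeffs):
    """Write S = a delta_p * S' as in (5): p = first non-zero band, a = gcd of the coefficients."""
    p = next(n for n, c in enumerate(coeffs) if c)
    a = reduce(math.gcd, coeffs[p:])
    return a, p, [c // a for c in coeffs[p:]]

def support(coeffs):
    """Bands with non-zero coefficients."""
    return [n for n, c in enumerate(coeffs) if c]

T = from_bands([0, 12, 19, 24, 28])  # harmonic tone with 5 partials, C = 12
I = from_bands([0, 4, 7])            # major triad I_{0,4,7}
S = convolve(I, T)                   # simple chord spectrum I * T
chord = [0] * 3 + [2 * c for c in S] # loudness a = 2, lowest tone in band p = 3

a, p, simple = split_simple(chord)
print(a, p, simple == S)                             # 2 3 True
print([(n, S[n]) for n in support(S) if S[n] > 1])   # [(19, 2), (28, 2)]: coinciding partials
print(len(support(T)) + len(support(I)), len(support(S)))  # 8 13
```

The factored form needs 8 impulses against 13 for the chord spectrum itself. This is the kind of saving that Theorem 3 formalizes, although a 5-partial tone falls below that theorem's seven-partial hypothesis, so the example is only illustrative.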

6 Factorization of Chord Spectra

In order to investigate the items enumerated in Section 4, we pose the question: given a chord spectrum $S$ generated according to (5), does there exist a convolution factorization of $S$ into simple factors other than (5)?

Lemma 1 (Isomorphism Between Discrete Spectra and Polynomials over Integers). Define the correspondence between discrete spectra with integral coefficients and polynomials over integers by the equality of their coefficients:

$$ S = \sum_{n=0}^{N} a_n \delta_n \;\longleftrightarrow\; p(x) = \sum_{n=0}^{N} a_n x^n. $$

Then this correspondence is one-to-one, the sum of two spectra corresponds to the sum of the associated polynomials, and the convolution of two spectra corresponds to the product of the associated polynomials.

The proof of this proposition, as well as the proofs of the subsequent ones, can be found in (Tanguiane 1993). By analogy with polynomials, a spectrum $S$ is said to be irreducible if it cannot be factored into a convolution product of two spectra, neither of which is a constant.

Lemma 2 (Sufficient Condition for Irreducible Spectra). Consider a simple spectrum $S$ whose support $A_S$ contains at least two points. Suppose that the distance between the last two (or the first two) partials of $S$ is less than the distance between any other pair of partials. Then $S$ is irreducible.

Lemma 3 (Irreducibility of Harmonic Spectra). Let $S$ be a simple harmonic spectrum or a simple segment of a harmonic spectrum. If the spectral resolution is sufficiently accurate, then $S$ is irreducible.

Lemma 4 (Irreducibility of Intervals and Triads). Simple interval distributions corresponding to two-tone intervals and to major or minor triads are irreducible.

Note that the assumption that spectral coefficients are non-negative is essential. Otherwise, even the interval distribution of a major third is not irreducible.

Example 1 (Reducibility of the Major Third). Let the spectral resolution be one semitone ($C = 12$), so that the major third corresponds to 4 frequency bands. By virtue of Lemma 1, the reducibility of the major third over integers follows from the factorization

$$ 4 + x^4 = (2 + 2x + x^2)(2 - 2x + x^2). $$

Further, we shall divide a spectrum

$$ S = \sum_{n=1}^{N} a_n \delta_{\omega_n}, $$

whose partials (impulses) have frequencies $\omega_1 < \omega_2 < \dots < \omega_N$, into lower and higher parts, called the head and the tail, as follows:

$$ S = \sum_{n=1}^{N} a_n \delta_{\omega_n} = \sum_{n=1}^{N-Q} a_n \delta_{\omega_n} + \sum_{n=N-Q+1}^{N} a_n \delta_{\omega_n}. $$

Under these conventions, the higher part of the spectrum with $Q$ partials,

$$ S_Q = \sum_{n=N-Q+1}^{N} a_n \delta_{\omega_n}, $$

is said to be the $Q$-tail of $S$. We say that two spectra $S$ and $T$ have congruent $Q$-tails if $S_Q = \alpha T_Q$ for some $\alpha$; this is denoted $S_Q \sim T_Q$.

Lemma 5 (Unique Factorization for Spectral Tails). Let $T$, $I$, $U$, and $J$ be four spectra, each having more than one impulse, such that $T * I = U * J$. Let $d$ be the distance between the last two impulses of $T$, and $f$ the distance between the last two impulses of $I$. Suppose that $d < f$. Define the tail $T_Q$ by the given bandwidth $f$, i.e. determine $Q$ so that

$$ T_Q = \sum_{n=N-Q+1}^{N} a_n \delta_{\omega_n} = \sum_{\omega_N - \omega_n < f} a_n \delta_{\omega_n}, \qquad (6) $$

and denote the distance between the last and the $Q$th-to-last partial of $T$ by $g = \omega_N - \omega_{N-Q+1} < f$. Then either $U_Q \sim T_Q$ and the distance between the last two partials of $J$ is greater than or equal to $g$, or $J_Q \sim T_Q$ and the distance between the last two partials of $U$ is greater than or equal to $g$.

Lemma 6 (Uniqueness of Interval Decomposition). Consider an interval distribution $I$ with two impulses, and let the distance $f$ between its impulses be smaller than or equal to 12 semitones, i.e.

$$ I = a_1 \delta_{\omega_1} + a_2 \delta_{\omega_2} \quad (\omega_1 < \omega_2), \qquad f = \omega_2 - \omega_1 \le 12 \text{ semitones}. $$

Let $T$ be a harmonic spectrum with $N$ successive partials, where $N$ is odd and sufficiently large so that the distance between the last three harmonics of $T$ is less than $f$, i.e.

$$ T = \sum_{n=1}^{N} b_n \delta_{\omega_n} \quad (\omega_1 < \dots < \omega_N), \qquad f > \omega_N - \omega_{N-2}. $$

Consider the spectrum $S = T * I$. Then its deconvolution into non-trivial factors (i.e. having at least two impulses each) is unique to within order and units.

Note that this lemma is valid not only for harmonic tones with all successive partials but, by virtue of Lemma 3, also for segments of harmonic spectra that contain three higher harmonics.

Theorem 1 (Uniqueness of Interval Decomposition). Consider a two-impulse interval distribution $I$, and let the distance $f$ between its impulses be smaller than or equal to 12 semitones, i.e.

$$ I = a_1 \delta_{\omega_1} + a_2 \delta_{\omega_2} \quad (\omega_1 < \omega_2), \qquad f = \omega_2 - \omega_1 \le 12 \text{ semitones}. $$

Let $T$ be a harmonic spectrum with $N$ successive partials, where $N$ is sufficiently large so that the distance between the last four harmonics of $T$ is less than $f$, i.e.

$$ T = \sum_{n=1}^{N} b_n \delta_{\omega_n} \quad (\omega_1 < \dots < \omega_N), \qquad f > \omega_N - \omega_{N-3}. $$

Consider the spectrum $S = T * I$. Then its deconvolution into non-trivial factors (i.e. having at least two impulses each) is unique to within order and units.

Lemma 7 (Uniqueness of Chord Decomposition). Consider an interval distribution $I$ with three impulses. Let the distance between its extreme impulses be smaller than or equal to the octave, and let the distances between its adjacent impulses be different, i.e.

$$ I = \sum_{i=1}^{3} a_i \delta_{\omega_i} \quad (\omega_1 < \omega_2 < \omega_3), \qquad f = \omega_3 - \omega_1 \le 12 \text{ semitones}, \qquad \omega_2 - \omega_1 \neq \omega_3 - \omega_2. $$

Let $T$ be a harmonic spectrum with $N$ successive partials, where $N$ is divisible by neither 2 nor 3 and is sufficiently large so that the distance between the last four harmonics of $T$ is less than $f$, i.e.

$$ T = \sum_{n=1}^{N} b_n \delta_{\omega_n} \quad (\omega_1 < \dots < \omega_N), \qquad f > \omega_N - \omega_{N-3}. $$

Consider the chord spectrum $S = T * I$. Then its deconvolution into non-trivial factors (i.e. having at least two impulses each) is unique to within order and units.

Theorem 2 (Uniqueness of Chord Decomposition). Consider a three-impulse interval distribution $I$. Let the distance between its extreme impulses be smaller than or equal to the octave, and let the distances between its adjacent impulses be different, i.e.

$$ I = \sum_{i=1}^{3} a_i \delta_{\omega_i} \quad (\omega_1 < \omega_2 < \omega_3), \qquad f = \omega_3 - \omega_1 \le 12 \text{ semitones}, \qquad \omega_2 - \omega_1 \neq \omega_3 - \omega_2. $$

Let $T$ be a harmonic spectrum with $N$ successive partials, where $N$ is sufficiently large so that the distance between the last seven harmonics of $T$ is less than $f$, i.e.

$$ T = \sum_{n=1}^{N} b_n \delta_{\omega_n} \quad (\omega_1 < \dots < \omega_N), \qquad f > \omega_N - \omega_{N-6}. $$

Consider the chord spectrum $S = T * I$. Then its deconvolution into non-trivial factors (i.e. having at least two impulses each) is unique to within order and units.

Note that Theorem 2 is also valid for segments of harmonic spectra with seven successive harmonics. Thus the spectra of two-tone intervals and of major or minor triads decompose in a unique way. Consequently, the only deconvolution of a chord spectrum built from harmonic tones with a sufficient number of harmonics reveals its generation. Theorem 2 does not establish the unique decomposition of chords when the generative tones have few partials; however, this can be done by directly testing each particular case, since there are only finitely many of them.

To finish this section, we show that the harmonicity of tones is an important condition of the unique deconvolution theorems (Theorems 1 and 2): for arbitrary power spectra, unique deconvolution does not hold. By virtue of Lemma 1, this is seen from Example 2, proposed by Alain Chateauneuf in a personal communication.
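By Lemma 1, checking a factorization of spectra amounts to multiplying polynomials, i.e. convolving coefficient lists. The Python sketch below (our illustration; the helper names are hypothetical) confirms the integer factorization of Example 1 and tests the sufficient condition of Lemma 2 on a few supports. The Lemma 2 test inspects supports only, so it presumes a simple spectrum with non-negative coefficients; a `False` result proves nothing, because the condition is sufficient but not necessary.

```python
from itertools import combinations

def poly_mul(p, q):
    """Product of polynomials given as coefficient lists (constant term first).
    By Lemma 1 this is the convolution of the corresponding spectra."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def lemma2_condition(support):
    """Sufficient condition of Lemma 2: the gap between the last two (or the first two)
    partials is smaller than the distance between any other pair of partials."""
    pts = sorted(set(support))
    if len(pts) < 2:
        return False
    pairs = list(combinations(pts, 2))  # all pairs (u, v) with u < v

    def holds(edge):
        others = [v - u for (u, v) in pairs if (u, v) != edge]
        return all(edge[1] - edge[0] < d for d in others)

    return holds((pts[-2], pts[-1])) or holds((pts[0], pts[1]))

# Example 1: the major third 4*delta_0 + delta_4 <-> 4 + x^4 splits over the integers,
# but one factor has a negative coefficient
print(poly_mul([2, 2, 1], [2, -2, 1]) == [4, 0, 0, 0, 1])  # True

print(lemma2_condition([0, 12, 19, 24, 28]))  # True: harmonic tone, 5 partials, C = 12
print(lemma2_condition([0, 4, 7]))            # True: major triad
print(lemma2_condition([0, 3, 7]))            # True: minor triad (via its first two partials)
print(lemma2_condition([0, 4]))               # True: any two-impulse interval
print(lemma2_condition([0, 4, 8]))            # False: augmented triad, the test is inconclusive
```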

Example 2 (No Unique Factorization of Polynomials over Non-Negative Integers). Consider the polynomials

$$ p(x) = x^2 + 2x + 2, \qquad q(x) = x^2 - 2x + 2, \qquad r(x) = x(x^2 + 2x + 2) + 1 = (x^2 + x + 1)(x + 1), $$

where the polynomials $p(x)$, $q(x)$, $x^2 + x + 1$, and $x + 1$ are irreducible over integers and, consequently, over non-negative integers as well. By virtue of Lemmas 1 and 4, the polynomial $p(x)q(x) = x^4 + 4$ is irreducible over non-negative integers. Therefore, the polynomial $p(x)q(x)r(x)$, which has non-negative integer coefficients, can be factored into irreducible polynomials over non-negative integers as follows:

$$ p(x)q(x)r(x) = [p(x)q(x)]\, r(x) = (x^4 + 4)(x^2 + x + 1)(x + 1). \qquad (7) $$

At the same time, $p(x)q(x)r(x)$ can be factored into polynomials over non-negative integers in a different way:

$$ p(x)q(x)r(x) = p(x)[q(x)r(x)] = (x^2 + 2x + 2)[x(x^4 + 4) + x^2 - 2x + 2] = (x^2 + 2x + 2)(x^5 + x^2 + 2x + 2), \qquad (8) $$

where the first factor, $x^2 + 2x + 2$, is irreducible and differs from all irreducible factors of factorization (7). Hence there exist two different factorizations of $p(x)q(x)r(x)$ into irreducible polynomials over non-negative integers: one given by (7), and another given by (8) together with a further factorization of its second factor, $x^5 + x^2 + 2x + 2$, if such a further factorization exists.

7 Causality and Optimal Data Representation

One may ask: why do we perceive chords as chords and not as single sounds? From the standpoint of our consideration, this question can be reformulated as follows: what are the reasons for decomposing spectra instead of considering them as they are? In order to compare different representations, we refer to the criterion of the least complex data representation. We shall show that representing a chord spectrum as a deconvolution is optimal with regard to the amount of memory needed to store the spectral data.
This way we justify such a representation of spectral data and adduce reasons for perceiving chords as chords rather than as indivisible sounds.

Recall that, according to Kolmogorov, the complexity of data is defined to be the amount of memory required for their storage. Since the spectra considered can be stored as a sequence of impulses, the complexity of a spectral representation can be identified with the number of impulses to be stored. By the complexity of a spectrum $S$ we understand the number of points in its support $A_S$, denoted $|A_S|$. By the complexity of a deconvolution $S = T * I$ we understand the total complexity of the factors, $|A_T| + |A_I|$.

Theorem 3 (Revealing Causality by Optimal Data Representation). Suppose that a spectrum $S$ is generated by a spectrum $T$ translated according to an interval distribution $I$, where $T$ is a harmonic spectrum or a segment of one with seven or more partials, and $I$ corresponds to a two-tone interval, a major triad, or a minor triad. If the frequency resolution is sufficiently accurate, then the representation of the spectrum that corresponds to its generation (5) is the least complex representation of $S$.

8 Applications to Psychoacoustics and Music Theory

Some statements of music theory can be explained as prescriptions that make the musical structure comprehensible to the ear. In our model this corresponds to simplifying music recognition. For example, we can explain the prohibition of parallel fifths in counterpoint. Parallel fifths imply parallel motion of the partials associated with the voices, which makes the separation of voices more difficult. The resulting psychoacoustic effect is "timbral" rather than harmonic. This can break the homogeneity of the musical texture, and therefore parallel fifths are avoided (or used intentionally). The timbral effect of parallel voice leading is used in pipe organs, where several pipes tuned to a chord are sounded by a single key.
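The parallel-fifths argument can be illustrated with the melodic correlation of Section 3. The toy Python example below is our own construction (C = 12, five harmonic partials per tone, voices in bands 0 and 7, hypothetical helper names), not an experiment from the paper. When both voices move up two bands, the group of partials correlating at that shift is the whole chord, so the two voices are not told apart; when only the lower voice moves, the group at the same shift is exactly the lower tone.

```python
import math

C = 12  # bands per octave

def tone(lowest, n_partials=5):
    """Bands of a harmonic tone whose fundamental lies in band `lowest`."""
    return {lowest + math.floor(C * math.log2(k) + 0.5) for k in range(1, n_partials + 1)}

def generative_group(chord1, chord2, i):
    """Bands of chord1 whose partials reappear in chord2 shifted by i bands; R(i) is its size."""
    return sorted(k for k in chord1 if k + i in chord2)

fifth = tone(0) | tone(7)     # two voices a fifth apart (they share band 19)
parallel = tone(2) | tone(9)  # both voices up two bands: parallel fifths
oblique = tone(2) | tone(7)   # lower voice up two bands, upper voice held

print(len(fifth), len(generative_group(fifth, parallel, 2)))  # 9 9: the whole chord moves as one group
print(generative_group(fifth, oblique, 2) == sorted(tone(0)))  # True: only the moving voice correlates
```

The shared band 19 (the third partial of the lower voice coinciding with the second partial of the upper one) also shows how strongly the partials of a fifth overlap even before any motion.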
The model explains the nature of interval hearing. Interval hearing can be understood as the capability to recognize the distance between tones that are similar in spectral structure. In other words, interval hearing is nothing but correlative perception in the frequency domain. The condition of similar spectral structure is a generalization rather than a restriction. Indeed, all musical tones, which have the same ratios of harmonics, meet this condition. On the other hand, the idea of distance applies to similar sounds with no pitch salience, such as bell-like sounds or even band-pass noises. Therefore, we eliminate the idea of absolute pitch from the definition of interval hearing.

According to our model, the function of interval hearing is to decompose acoustical streams and to track parallel acoustical processes. This is extremely important for orientation in the acoustical environment.

Note the role of the logarithmic scale in our consideration. Owing to the logarithm, patterns with a linear structure (such as tone spectra, whose partial frequencies are integer multiples of the fundamental) are compressed non-linearly and become irreducible. Therefore, the role of logarithmic scales in perception can be explained as ensuring the indecomposability of patterns. In particular, as follows from Lemma 3, harmonic spectra are irreducible, which agrees with the perception of a musical tone as an entity. On the other hand, Theorem 3 explains the perception of chords as composite sounds. It is noteworthy that the optimal representation corresponds to the causality of data generation. This gives a way of obtaining semantic knowledge from general principles of data processing.

Note that if we assumed the model to be sensitive to the phase of the signal, we could not prove the irreducibility of harmonic spectra and of the interval distributions of chords. Indeed, if we considered spectra with complex coefficients, then by the fundamental theorem of algebra the associated polynomials would always factor into linear terms, so every spectrum whose polynomial has degree two or more would be reducible. Similar difficulties arise even for spectra with negative integral coefficients (see Example 1, where negative coefficients, corresponding to a phase of 180°, are considered). This may explain the insensitivity of auditory perception to the phase of the signal: otherwise, signal decomposition would not correspond to physical causality in signal generation.

Finally, we would like to mention that the factorization method is efficient for justifying the principle of correlativity of perception theoretically, but may fail in applications.
The first reason is the difference between spectral and polynomial approximations. In a sense, polynomial factorizations are stable with respect to deviations of the coefficients, but not with respect to changes in the degree of the polynomial. In spectra, partials can deviate from their theoretical frequencies, which changes the degrees of the associated polynomials. Such spectra are still regarded as approximating the theoretical ones, whereas the associated polynomials may have quite different factorizations. Another practical disadvantage of the factorization approach is that factoring real spectra requires much computation when the associated polynomials are of high degree. Therefore, our method of spectrum decomposition by correlation analysis can be more stable and reliable in applications. On the other hand, by virtue of the isomorphism between spectra and polynomials, the correlation method may be used to factor polynomials.

9 Conclusions

We have suggested an "artificial perception" approach to data processing. It proposes:

1. Representing data in terms of generative elements and their transformations.
2. Finding generative elements by discovering the messages correlated under deformations of the data.
3. Performing a directional search for the appropriate transformations of the data by the method of variable resolution.
4. Overcoming the ambiguity in the data representation by applying the criterion of the least complex data representation.
5. Justifying the approach by a series of mathematical statements showing the efficiency of the chord spectrum representations used.
6. The approach has been tested in chord recognition and applied to explaining some perception phenomena. In particular, we have obtained 98% reliability in chord recognition. Besides, we justify the logarithmic scaling in pitch perception, the insensitivity of the ear to the phase of the signal, and the prohibition of parallel voice leading in strict counterpoint.

References

[Chafe and Jaffe, 1986] Chafe C. and Jaffe D. Source Separation and Note Identification in Polyphonic Music. Proc. of the IEEE-IECEJ-ASJ Int. Conf. on Acoustics, Speech, and Signal Processing, Tokyo, 1986, vol. 2, 1289-1292.

[Mont-Reynaud and Mellinger, 1989] Mont-Reynaud B. and Mellinger D. Source Separation by Frequency Co-Modulation. Proc. of the First Int. Conf. on Music Perception and Cognition, Kyoto, Japan, 17-19 October 1989, 99-102.

[Tanguiane, 1993] Tanguiane A.S. Artificial Perception and Music Recognition. Berlin: Springer-Verlag, 1993.