EXTRACTING THE MELODY OF AN IMAGE USING MATE-SIMILARITY GENETIC ALGORITHMSkip other details (including permanent urls, DOI, citation information)
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License. Please contact firstname.lastname@example.org to use this work in a way not covered by the license. :
For more information, read Michigan Publishing's access and usage policy.
Page 00000001 EXTRACTING THE MELODY OF AN IMAGE USING MATE-SIMILARITY GENETIC ALGORITHM Abbas Pirnia Shahid Beheshti University Electrical and Computer Engineering Department ABSTRACT This paper is an application of genetic algorithms to the problem of composing music. We introduce a new type of GA and name that Mate-Similarity GA in which pairing of individuals is based on a Probabilistic function. Then we describe a system named IMUGEN which uses this type of GA to create melody. The IMUGEN system takes an image as its input and tries to produce, extend and concatenate musical pitches extracted from the image. Finally, we report some results of the IMUGEN outputs and discuss the influence of modifying of adjustable parameters on outputs. 1. INTRODUCTION Algorithmic composition could be described as "a sequence (set) of rules (instructions, operations) for solving (accomplishing) a [particular] problem (task) [in a finite number of steps] of combining musical parts (things, elements) into a whole (composition)", (Cope, 1993)1. Heuristic principles for automated music composition is well combined with computer music [5, 4] but the use of Genetic Algorithms in this task is less well established . Genetic algorithms (GAs) have proven to be very efficient search methods [3, 2], especially when dealing with problems with very large search spaces. This, coupled with their ability to provide multiple solutions, which is often what is needed in creative domains, makes them a good candidate for a search engine in a musical application . Papadopoulos and Wiggins divided the GA-based approaches into two categories based on the implementation of the fitness function : (1) Use of an objective fitness function; In this case the chromosomes are evaluated based on formally stated, computable functions. (2) Use of a human as a fitness function; In this case a human replaces the fitness function in order to evaluate the chromosomes. Concisely, one can say GAs are generally used for evolving population by finding the best solutions from the whole solution population and selecting them as parents. In this research, however, we use GAs for Panel Discussion in the ICMC '93, concatenation of different definitions of the two words. Ramak Ghavamizadeh Shahid Beheshti University Electrical and Computer Engineering Department extension of some primary individuals to reach new individuals. To do so, the crossover operation is used traditionally, but fitness function gets two individuals as its input and returns the second one's fitness value from the first one's point of view, instead of getting one individual and returning its deterministic fitness value. Furthermore the function is designed as a Probabilistic function and individual's pairing is done in a nondeterministic manner. We call such a GA (which uses Probabilistic fitness function) Mate-Similarity Genetic Algorithm and describe that in section2. After introducing Mate-Similarity genetic algorithm, the IMUGEN system (short for IMage-MUsicGENetic-GENarator), an evolutionary system which tries to create melodic pieces, using this approach, will be described in Section 3. The IMUGEN system uses a mapping method for initializing the population. It maps a visual parameter of an image (intensity) to a musical parameter (pitch). Then, duration values will be added to the extracted pitches using a transition table. The extracted motifs (note sequences) are used for initializing the population of the IMUGEN's GA. The evolution of the individuals results in the extension of the motifs. In the last step, concatenation of the motifs will lead to a melody, which is the output of the IMUGEN system. Experimental results with the IMUGEN system are described in Section 4. We draw some conclusions in Section 5. 2. MATE-SIMILARITY GA Generally the final target of GAs is to find solutions which maximize/minimize the fitness function. It is understood that: a) the problem has one or more good solutions and the other solutions are useless. b) the fitness function is designed in a way that returns an extremum value for the best solution(s). The first item implies that the individuals of population have different values and the second implies that these values are determined by fitness function. One may say that this is an absolutist point of view. In order to be more relativist about the value, we may
Page 00000002 not assign absolute values to individuals. Instead, we may prefer to assign a value to every particular feature of each individual. 2.1. Mate-Similarity GA presentation There are two significant aims in order to which, the Mate-Similarity GA has been designed: 1- considering some musical motif, we want to extend them and reach new motifs 2- concatenating these motifs in a suitable order to reach a melody Therefore, the algorithm workflow is different from classic non-Mate-Similarity GA in some basics. One of the most significant differences between Mate-Similarity and classic GAs is the remaining of the parents after their combination. Therefore, the number of population increases in each generation. If all of the individuals participate in breeding and no one remains single, while each pair of them breed two new individuals then the number of new population would be twice of the number of the first population. This means that the number of generation would increase exponentially by progress of generations. There is another difference between MateSimilarity and classic GAs regarding genetic operators. In this work, probabilistic functions are used for both crossover and concatenation. In one step the functions are used to find pairs of individuals to crossover and breed new individuals, while in another step they are used to find pairs of individuals to concatenate and make a longer chromosome. The genes are seen as notes (pitch-duration couples) and the chromosomes as motifs (sequences of notes). Thus, it is important to note that IMUGEN is not trying to evolve a perfect solo on a specific tune; it tries to evolve a workable collection of note sequences which can be applied to any tune. Regarding this aspect, IMUGEN is like GenJam that is a GA based system to evolve collections of melodic ideas . 2.2. Probabilities in Crossover As mentioned before, we do not use fitness function to assess a chromosome, but to determine the pairs of chromosomes to be recombinated. To do so, the fitness function is defined as a probabilistic function. This function gets two chromosomes as its arguments and returns the tendency of the first chromosome to pair with the second one as a probabilistic value. The tendency of ci to c. is notated by: P(Ci, c). In a multidimensional space that each axis stands for one of the chromosome's features, we need to define a concept to control combination of the chromosomes which we refer to as neighbourhood degree. Because our problem has discrete parameters, the neighbourhood degree is defined in a discrete space, although it can be applied in almost similar way to continuous domains as well. Consider two chromosomes c(d, d2,..., d) and c'(d',d,...,d',) in a K dimensional space. We call c' an n th neighbour of c if: K ND(c, c')= ND(c', c)=-d- - dJi = n i=1 (1) Each chromosome in each generation may do crossover or do nothing. This means that every chromosome may remain single and not change in the next generation. We express the tendency of chromosome ci to remain single as: p(c,,s). Then s is the identity operand forp. The implementation of this probabilistic function must be in a way that the tendency (of the first chromosome to pair with the second one) reduces by increase in the degree of neighbourhood (and vice versa). Furthermore the sum of probabilities of all possible actions (pairing or remaining single) of a particular chromosome must be equal to one. Finally, we define the probability of self-pairing (i.e. a chromosome combines with itself) to be equal to zero. p(c, c,) = 0, i 1 (2) p(ci,s) + p(c,,ci ) =, i j>l (3) We consider the set of chromosomes with neighbourhood degree of / from chromosome c as: N,(c)= ci ND(c,c,)= 1) (4) and a, (c) = number of members in N, (c). Then the sum total probability of combination of chromosome c with neighbours of /Ith degree will be: PN,(c)= - p(c, c), N(c) 0 (5) VciENI(c) PN, (c)= 2', 1 1 (6) Having (5) and (6), we may come to the important equation: 1 p(c, ci) = * (c) 2 a, (c) In the same manner:,ND(c, c,) =1 (7) p(cs)= + 1 2mnd(c i(c 2 ViES(c) (8) where: mnd(c) = Maximum neighbourhood degree of C and S(c)= i N,(c)=0}
Page 00000003 Having all tendency values (p(ci,c ) for each i and j) the process of selecting pairs (parents) can be started. This task is done in an arbitrary order. All of the individuals are sorted in a list based on their maximum tendency (pairing list). The process will be started with the first individual in the list (that has the maximum tendency). A random generator selects a pair (or nothing) for the individual based on its probabilistic tendencies. Then the individual and its pair (if exists) will be deleted from the list. These steps are repeated until the list becomes empty. For the uninitiated, a Mate-Similarity GA looks like: Initialize the individuals in the population While not finished evolving the population Calculate all tendency values Select pairs of individuals (parents) Breed new individuals Build next generation with all individuals elihW 3. THE APPLICATION IN MELODY CREATION Although the GA with multidimensional workspace and probabilistic fitness function is a general approach and has no limitation in use, our primary goal is to achieve a promising approach for creating musical melodies of acceptable quality. 3.1. Design of IMUGEN The IMUGEN system is developed based on a three-tier architecture. The output of each layer would be the input of the next one. The first layer's input is an image and provided by the user. -| a bitmap image as the input - _- The first layer maps images to, raw motifs using a simple mapping. SThe raw motifs are the first S....................................................................r's ou tp u t an d th e seco n d - [layer's input. [The second layer extends motifs using probabilistic GA. l The extended motifs are the 4e3 second layer's output and the ^ third layer's input. SThe third layer concatenates " motifs using multidimensional Sworkspace. \ The melody (concatenated Smotifs) is the third layer output. MIDI interface is designed to convert every layer's output to a MIDI file. The third layer's output is a melody and given to the user as the final output. Figure 1 shows IMUGEN system architecture and provides a visual overview of its operations. 3.2. The first layer: Mapper The first step in GAs is to initialize first population. In this work, initialization of the population is performed by extracting primary motifs from an image using a simple mapping. Our proposed mapping transfers the average intensity of pixels to the pitches of a diatonic scale. This task is performed in two steps, each by a function. The first step is to calculate the average intensity of a piece of the image. This is done by the function cony that takes a piece of the image (as a matrix of pixels) as its argument and returns the average intensity of its pixels (as a value between 0 and 255). Let P be the matrix of pixels with width of a and height of b; every pixel in P has three parameters R, G, B for Red, Green and Blue colors; then the average intensity of P will be: SR (ij + Gi,j + Bij ) Conv(P) = ia j<b 3*a*b The next step is to map the average intensity to a pitch. This is done by the function map that takes the average intensity (as a value between 0 and 255) and returns a pitch (as a value in the four octave diatonic scale range). Map(x) = trunc[x / 9] +1 The layer has a rectangular frame of an arbitrary size. The frame is located on the image and the portion of the image that is inside the frame is the input of Conv. To extract a sequence of the pitches from the image, the frame is moved on the image and after each movement mapping is repeated and a new pitch is extracted. The mapping step leads to a sequence of pitches. 3.3. The second layer: Extender Extender is implemented with Mate-Similarity GA. Therefore we need to introduce some musical features for motifs based on which individuals can be placed in a multidimensional space. The number of dimensions of space is equal to the number of features. We propose 7 melodic features regarding merely pitch specifications. These features are illuminated by an example. Let's consider a simple sequence pitch such as [C3-E3-C3 -G2] in C Major. Its corresponding features are as follows: (1) The first pitch of sequence: C3 (2) The last pitch of sequence: G2 Figure 1. IMUGEN architecture
Page 00000004 (3) The center pitch of the sequence (i.e. the nearest pitch to the pitch at the middle of the range between the first and the last note): A2 (4) The most frequent pitch of sequence: C3 (5) The length of sequence: 4 (6) The height of sequence (i.e. number of semitones between the highest and the lowest pitch of sequence): semitones between E3 and G2 = 7 (7) Movement of sequence (i.e. a signed number that counts the semitones between the first pitch and the last one, if the last pitch is higher than the first then the sign is positive otherwise the sign is negative): -5 The layer takes the Mapper's pitch sequence as its input. Then the pitches are placed on chromosomes sequentially. 3.4. The third layer: Linker Linker is implemented with Mate-Similarity GA. The layer takes the motifs produced by Extender as its input. The length of chromosomes in the layer is variable. The only genetic operator is used in this step is concatenation that takes two individuals as input and returns one new individual as output. The average length of chromosomes will increase by generation progression. Continuing the generation loop will result in longer pitch sequences which are made from motifs with similar features. The length of the longest chromosome can be used as the criterion for exiting generation loop. 4. RESULTS AND DISCUSSION Our experiments showed that IMUGEN's output is better for images with no obvious high contrast borders. In fact, such borders cause undesirable jumps in output pitches. We profit two parameters to solve the problem: 1- The size of the Mapper's frame; if the size of the frame is so large to cover some regions from both sides of border then the average will decrease pitch jumps. 2- The amount of overlap of two sequential states of the frame; increasing in the amount of overlap of two sequential states of the frame will bring near pitches acquired from these two states. Thus, the pitch jumps will decrease. We studied the final pitch sequence of IMUGEN (the third layer output) throughout comparing the sequence with three other sequences: 1- Sequence 1: pitches of a melodic piece from the beginning of Mozart's symphony-40 2- Sequence 2: a random generated pitch sequence 3- Sequence 3: a sequence of pitches was extracted from an image (the first layer output) As we understood by comparing the results, there were some regularity in Sequence 1 and irregularity in Sequence 2. The sequential pitches are nearby in Sequence 1, while Sequence 2 oscillates significantly. In Sequence 3, we can see regularity similar to Sequence 1.. The design of Mapper in the first layer is so simple, while this is not the only way to map visual parameters to musical parameters. Probably such mappings could produce motifs with factors such as musical tension, intension, expectation and melodic closure , if the image has any aesthetic such as symmetry. The major weak point of IMUGEN is that it doesn't gain pitches and durations correspondingly. Probably if duration values also extracted from the image somehow, there will be some relations between pitch and duration. Another way may be adding pitch dimension to duration transition table. However, this approach may also be not a promising one, because the transition table is a statistical description of the training set and, in most cases, it will lose information about the training set . 5. REFERENCES  Biles, J. A. "GenJam: A genetic algorithm for generating jazz solos", Proceedings of the International Computer Music Conference, San Francisco, 1994.  Goldberg, D. Genetic Algorithms in Search, Optimization and Machine Learning. AddisonWesley, 1989.  Holland, J. H. Adaptation in Natural and Artificial Systems. University of Mitchigan Press, 1975.  Laske, 0. E. AI and Music. A Cornerstone of Cognitive Musicology. In (M. Balaban, K. Ebcioglu, 0. Laske eds.) Understanding Music with Al Menlo Park, CA: The AAAI Press, pp. 3-29, 1992.  Moorer, J. A. Music and Computer Composition, Reprinted in Schwanauer, 1972.  Mozer, M. C. "Neural Network Music Composition by Prediction: Exploring the Benefits of Psychoacoustic Constraints and Multi-scale Processing", Connection Science, 6 (2-3): pp. 247-280, 1994.  Papadopoulos, G. & Wiggins, G. "Al Methods for Algorithmic Composition: A Survey, a Critical View and Future Prospects", Proceedings of the AISB '99 Symposium on Musical Creativity, 1999.  Towsey, M., A. R. Brown, S. Wright and J. Diederich. "Towards Melodic Extension Using Genetic Algorithms", proceedings of The Australasian Computer Music Conference, A. R. Brown and R. Wilding, eds. Brisbane: Australasian Computer Music Association, pp. 85-91, 2000.