Optical Music Recognition - progress reportSkip other details (including permanent urls, DOI, citation information)
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License. Please contact email@example.com to use this work in a way not covered by the license. :
For more information, read Michigan Publishing's access and usage policy.
Page 293 ï~~Optical Music Recognition - progress report. Amnon Wolman and Tad Yeager Center for Music Technology, School of Music, Northwestern University firstname.lastname@example.org or email@example.com, firstname.lastname@example.org Abstract The project's goal is to recognize a scanned sheet of handwritten music in an interactive fashion, such that the computer could "learn" a person's handwriting and, in time reduce the amount of necessary humancomputer interaction and correction. In the last two years we moved the program to a new platform (C++) and switched to working in gray-scale. After successfully recognizing most standard music elements in their standard printed form we are ready to negotiate the more quirky water of handwriting and adjusting (i.e. learning). 1. General Background In this project our main interest was in creating a program which would help a user print her own music. The basic notion is that the user would scan the manuscript and the program would attempt deciphering the file, with the help of the user. We have made several assumptions in the development of the project, most importantly that the program will not be able to work without a user helping it along. We hope as more work is done that more will be automated but we assume that at no point will it be totally mechanical. In this presentation we will discuss one issue we have been tackling this year. Attempting to connect incomplete objects. 1.1 Currently in place At present the program is still a group of distinct programs, one which reads the scanned file, present it on the screen, and straightens the page. The last element--the "straightening of the page"--means that the program can identify the staff lines, and use them as an indication of what is a straight line. This is also one of our other assumptions, we insist that the user will write on printed manuscript paper i.e. that the staves are NOT handwritten. Another program takes the output of the first program, removes the staves and matches some of the standard music symbols (clefs, notes) and identifies them. We use histograms to identify the objects by comparing them to pre-existing histograms within the program. The program does two histograms: a horizontal one and a vertical one. Matching information from both histograms gives us enough information about a manuscripted object. It is important to note here that, at present, we indicate to the program what is a single object. This is done by dragging a box around it, and the histograms are created within this box. We are in the process of automatically finding objects using a tracing technique. This is done by following along the object and changing the color of the outlining bits But dealing with handwriting forced us to solve the problem of incomplete objects before allowing for tracing. 1.2 Gray scale We started working on the project three years ago writing the code in NeXTStep on the NeXT machine, and looking at scanned black and white images. At the beginning of this year we decided, for obvious reasons, to port the existing software to a more general UNIX platform, and rewrite the code in C++. During that time we decided to start using gray scale images rather than black and white. This was done based on the assumption that we will learn more about the personalized habits of users. Studies indicate that looking at a gray scale image could show the general direction of the hand motion. i.e. this becomes extremely useful when looking at incomplete objects. It also, we assume, will make the learning process of the computer more significant. For example a user may have a tendency to start elements from top down, making the upper portion of the line heavier then the lower portion, i.e. the upper pixels will be darker ICMC Proceedings 1994 293 Music Notation
Page 294 ï~~then the lower ones, indicating a clear direction of motion. If she tends to use her pen at an angle and assuming for the sake of the example it is tilted to the right, the line would be heavier on the right, i.e. the right pixels will be darker then the left ones. But when drawing a full circle this would be different and one horizontal portion would be lighter then the other, giving us additional information about making a distinction between the two objects. 2. The user interface. At present the user interface allows the user to look at the file from different perspectives and zoom in etc. By pushing buttons it could request the program to remove the staff etc. The user may select an object by dragging around it. 3. Incomplete objects Handwriting is uneven in darkness and items which should be considered as single items may be separated into two and more. For example most writers are not careful about making sure that stems are connected to noteheads, thus a quarter note will show up as two distinct elements, and a four sixteenth note group may show up in as many as nine different elements. The use of a tracing algorithm makes this one object appear to be several separate objects. The challenge now is filling in the spaces between elements of broken objects without connecting distinctly different objects. Here is where interaction and learning play an important role. Proximity, although a helpful indicator, is not necessarily the best criteria; some objects that are extremely close to each other should not be connected, and conversely, with some handwriting, some objects can be disconnected by a great number of pixels. Our solution is to design a series of filters which "look" around a found object to see if there are any objects around it which may be possibly connected to it. These filters are preset to a specific relation to the size of the staff. The assumption is that users write smaller if the staff is smaller and the distances between elements of one item are Smaller. Once the program "thinks" there is a possible connection to be made, the user interface checks with the user. Mter several positive responses the program adjusts the filter to that amount, (min. and max.), and stops checking with the user. The interface will allow the user to correct the program if it made an error, thus revising (if necessary) the filters yet once again, or possibly reopening the dialogue with the user. 3.1. Identifying parts of an incomplete object. At present we have not decided how to deal with this issue, so we have the user draw a box around an object. There are several alternatives; the most logical one is for the program to assume that an object is complete try to match it, and if it can not match it will assume it is incomplete and start looking around it for possible missing elements. With handwriting this is not at all obvious, since no match may mean just a slightly personalized was of writing a specific object. It seems that the program will have to ask the user if the object is complete or not, after not successfully matching it. 3 2. Connecting the parts an incomplete object. of It can not be assumed that the shortest distance between parts of an object means that they should be connected Figure one shows an incomplete object a dotted quarter note. Looking just at proximity, the program will connect the dot to the notehead but not the stem to the notehead. (figure two). FIGURE 1. A dotted quarter note. FIGURE 2a A dotted quarter note mistakenly connected Music Notation 294 ICMC Proceedings 1994
Page 295 ï~~G(j,k) = 'Gr(j,k)]2 + [Gc(j,k)]2} with Gr(j,k) = 1/4[(A2 + 2*A3 + A4) - (AO + 2*A4 + A6)] Gc(j,k) = 1/4[(AO + 2*AI + A2) - (A6 + 2*A5 + A4)]. FIGURE 2b A dotted quarter note mistakenly connected. To solve this problem we implemented a gradient based contour encoding for gray scale recognition as used in OCR. (Srlkantan, Lam, Srihari). We are trying to take advantage of the shading in a gray scale format. Using this shading, broken or noisy objects are connected using an edge detection algorithm. This method of edge detection is a more graceful way of connecting objects than a brute force tracing method in which an arbitrary pixel value is used to determine whether or not an object is incomplete. Our method allows us to concentrate on local changes in shading. With some handwriting, the pixel value is not that much greater than the background,especially with off white paper or an imperfect scan of the image, i.e. with uneven lumination, the paper at the top of the page could be darker than the handwriting at the bottom or vice versa. Using an edge detection algorithm that takes advantage of the grey-scale format allows us to put together incomplete objects as if putting together a jigsaw puzzle without all of the pieces. Taking the gradient--magnitude and direction--of the image can be used to find the edges of a broken object. By knowing the darkness and the direction of a handwritten line, circle, slash etc., we can drake an accurate guess as to whether or not a proximally located fragment is part of another object or a unique object. Also, this also opens the door for artifical intelligence techniques for "learning" about handwriting, if shading tendencies are recorded. We use the Sobel operator for finding the gradient of the object at each pixel (Pratt) given by The following 3 X 3 matrix represents the local values surrounding the pixel: AO Al A2 A7 F(j,k) A3 A6 A5 A4 The average gradient magnitude is then used to threshold out noise. Once the gradient of the entire image has been calculated at each pixel and the noise is thresholded out, a feature extraction algorithm is applied to find the the edges around the object the user selected. The. first step in the feature extraction algorthim is partitioning the gradient map. This involves identifying the object space, then subdividing the object itself. After dividing the object's components, quantization of the parts by angle allows us to see the object as a group of shapes formed from the shading. This quantization of the gradient directions into a twelve ranges of thirty degrees each, yields a twelve type class of features. Finally, using feature sets, the object can be identified, depending on the quantity of each type of feature. in figure three we see how the computer connected the incomplete object. ICMC Proceedings 1994 295 Music Notation
Page 296 ï~~6. Bibliography. FIGURE 3. Object connected correctly. 4. Plans for the future Our immediate plan is to start coding the learning part of the program. This is the part which keeps multiple copies of information regarding symbols specific to a user so that it can compare them. In this part of the program, an object after it is recognized and labeled is stored in a data base specific to the user. Multiple copies of all objects are kept to create a better comparison. Every time a user runs the program she increases its knowledge about her handwriting. William K. Pratt. _DigitalImage_ Processing.., 2nd ED. Wiley-Interscience publication. (pp 490-505). G. Srikantan, S.W. Lam, S.N. Srihari. "Gradient-based contour encoding for grayscale character recognition," in _CharacterRecognition_ _Technologies., Donald P. D'Amato, Editor, Proc. SPIE 1906, (pp. 2-9). A. Wolman, James Choi, Shahab Asgharzadeh, and Jason Kahana. "Recognition of Handwritten Music Notation." in the Proceedings of the International Computer Music Conference 1992 San Jose State University, San Jose CA. (pp 125-127). 5. Acknowledgments. This research is supported in part with grants from the Ameritech Foundation, Northwestern University URGC Grants, and the School of Music at Northwestern University. We would like to acknowledge James Choi for his continuing support of the project. Music Notation 296 ICMC Proceedings 1994