Handwritten Music-Manuscript Recognition

Kia Ng, David Cooper* and Roger Boyle
School of Computer Studies, University of Leeds, Leeds LS2 9JT, UK.
* Department of Music, University of Leeds, Leeds LS2 9JT, UK.
Email: kia@scs.leeds.ac.uk, d.g.cooper@leeds.ac.uk, roger@scs.leeds.ac.uk
URL: http://www.scs.leeds.ac.uk/kia/, http://www.leeds.ac.uk/music/, http://www.scs.leeds.ac.uk/

Abstract

Over the last few years, we have been developing a prototype for printed-music score recognition. In this paper, we discuss some of the main obstacles in optical handwritten manuscript recognition, and present our initial attempts at music-manuscript recognition based on the existing prototype. Problems and interesting partial results obtained by applying the existing printed-music prototype to handwritten manuscripts are illustrated and discussed. Plausible extensions and enhancements that our printed optical music recognition prototype requires in order to handle handwritten manuscripts are suggested and demonstrated. Ambiguities and omissions are common in handwritten document analysis: to enhance the recognition results, we are working on a number of higher-level processes that apply musical domain knowledge and conventions.

1 Introduction

The relationship between handwritten music-manuscript recognition and printed Optical Music Recognition (OMR) is analogous to that between handwritten script recognition and printed Optical Character Recognition (OCR). Music manuscripts present a greater degree of inconsistency, ambiguity and uncertainty than do their printed counterparts. Our existing prototype system, which is designed for printed music scores, can, to a certain extent, resolve both neatly written and actual manuscripts.

The pre-processing stages of automatic thresholding, de-skewing and stave-line detection (see [Ng and Boyle, 1992] and [Ng and Boyle, 1996a]) are independent of the musical symbols and apply, without modification, to handwritten music scores, assuming that the music symbols are written on pre-printed manuscript paper with five-line staves [Clarke et al., 1993]. However, early manuscripts deserve special attention. Besides common problems such as uneven thickness and discontinuity of stave lines (as in printed OMR), the stave lines of early handwritten manuscripts, which are hand-drawn with a special pen called a rastrum, are often bowed and may also be irregularly curved. With hand-drawn stave lines, care must be taken when determining the pitch of the note-heads. Stave pixels above, below or crossing a note-head are scrutinised to check whether the note-head lies on a stave line or between two stave lines. Vertical location alone can give inaccurate pitch information.

2 Obstacles

Some of the main obstacles in optical handwritten manuscript recognition include the following:

* The relative sizes of similar features (e.g. two quavers) are often inconsistent. For example, notice the varying sizes of the flat signs in Figure 1(a), and the different sizes of quavers in the same bar in Figure 3. Sometimes global features, especially time-signatures, are written in huge sizes across staves.

* The difference in relative size of primitives (e.g. the width of a note-head and its stem) is not sufficiently large to differentiate one from the other. For example, a quaver note-head may appear only as a slight thickening at one end of a 'vertical' line, while in the printed symbol the difference in width between a stem and a note-head is obvious (see Figure 1(b)).

Figure 1: Inconsistent relative sizes. (a) Inconsistent sizes of similar features.
(b) Unclear relative sizes of different primitives, contrasted with a printed version (right). Ng et al. 354 ICMC Proceedings 1996

* Sometimes an almost complete inter-connection occurs between nearby primitives, for example two vertically-stacked beams, or a group of horizontally-connected ledger lines and accidentals (see Figure 2). Inter-connections between features that belong to different staves are also common.

Figure 2: Severe inter-connection. (a) Vertically-stacked beams. (b) Horizontally-connected ledger lines and accidentals.

* The reverse of inter-connected features, unconnected features, contributes further ambiguities. Frequent examples include isolated note-heads that are detached from their stems, and stems that are disconnected from their beams. The problems increase with the width of the separation, which raises the possibility of primitives being incorrectly reassembled with other nearby separated features.

* Different types of pen and different writing pressures produce inconsistencies in line thickness. Noise caused by ink blots, stains and correction marks is sometimes inevitable. Complications may also be caused by the use of a pen with a wide, flat or oblique nib, notably in Bach's manuscripts.

* In contrast to the font problems of printed OMR and OCR, the variations in writing style (in shape and size) of symbols are endless. Figure 3 illustrates this observation using a one-bar extract which contains a variety of quaver-flag styles. Furthermore, there is considerable disparity between the writing styles of different composers. Figure 4 presents a printed treble clef in comparison with five handwritten treble clefs from different composers.

Figure 3: Variable styles of writing.

Figure 4: Treble clefs. (a) A printed reference model. (b) Handwritten samples.

* Slanted and curved lines are inherent in handwritten scripts: the vertical orientation of lines such as stems and bar-lines cannot be assumed. Other straight-line features such as beams are likely to be bent, and curved features such as slurs and ties may consist of multiple curves.

* Old manuscripts introduce other interesting problems: discolouration of the paper may require intelligent local, rather than global, thresholding, and the possible use of obsolete symbols can be confusing.

* Syntactical rules, which are useful in enhancing the recognition for printed OMR systems, may often be neglected [Clarke et al., 1993]. For example, in printed music there are specific locations for note-heads with respect to their stems (generally top right and bottom left), but note-heads positioned at the top left of their stems, or horizontally centred on their stems, are common in manuscripts.

Inconsistencies and ambiguities in handwritten manuscripts are severe. OCR attempts to correct them using various knowledge-based inputs, notably grammatical syntax and semantics. Ambiguities within words can be resolved using a dictionary. A sentence may be analogous to a phrase in music; however, there are many inherent difficulties in this approach, and there is no clear solution to the detection of a phrase. Ambiguities within a word may be analogous to those within a chord or cadence; however, there is a wide range of possibilities, and generally a sequence of chords, rather than an isolated one, must be parsed to assist clarification.

3 Application of the Existing Prototype to Handwritten Manuscripts

For printed scores, we use a sub-segmentation process to decompose musical symbols into low-level graphical primitives, such as note-heads, stems and dots, before recognition [Ng and Boyle, 1992]. This works well on printed scores and provides a useful intermediate stage where many neighbouring-primitive classifications can be checked against each other and confirmed. However, it relies on the vertical orientation of the symbols and is not robust when dealing with handwritten manuscripts, because of their typically slanted line segments.

Using a number of sample manuscripts as input, the existing prototype system designed for printed music scores has yielded interesting and promising partial recognition. Figure 5 illustrates several interesting problems that arise when the existing printed OMR prototype is applied to handwritten data. The reconstructed results are shown in their original relative locations. Unclassified features are represented by unfilled rectangles.

Figure 5: Recognition attempts using the prototype for printed OMR.

Vertical lines appear relatively wide since they are effectively the bounding-box areas of the 'vertical' primitives (which are mostly slanted). Many note-heads are missing because of their narrow width and their placement at the centre of their stems; they are vertically sub-segmented into smaller pieces. Some note-heads are misclassified as dots. Certain isolated symbols in the test samples, notably flat and rest signs, have been correctly recognised, mainly due to the printed training data set used. Beam and curve detection appear robust for features of relatively even thickness. Since the prototype assumes beams to be straight, isolated curved beams are classified as curves.

4 Extensions to the Existing System

The sub-segmentation process relies on the vertical orientation of features. For example, for the vertical sub-segmentation of a crotchet to work correctly (i.e. the disassembly of a crotchet into a stem and a note-head), the stem of the crotchet has to be a vertical line situated to the right or to the left of the note-head(s). Figure 6 illustrates a handwritten sample crotchet with its note-head located above its stem, and the undesirable result of the vertical sub-segmentation attempt.
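
The dependence on vertical orientation can be illustrated with a toy example. The sketch below is not the prototype's sub-segmentation, which is described in [Ng and Boyle, 1992]; it is a simple column-projection test, assumed here purely for illustration, that treats a column as a stem when ink fills most of the symbol height. The names `find_vertical_stem`, `make_crotchet` and the `min_fill` threshold are hypothetical.

```python
import numpy as np

def find_vertical_stem(symbol, min_fill=0.8):
    """Return the columns whose ink covers at least min_fill of the symbol height.

    Hypothetical helper: a column-projection test, not the prototype's method.
    """
    symbol = np.asarray(symbol, dtype=bool)
    height = symbol.shape[0]
    profile = symbol.sum(axis=0)  # number of ink pixels in each column
    return np.flatnonzero(profile >= min_fill * height).tolist()

def make_crotchet(height=12, slant=0):
    """Draw a toy crotchet: a one-pixel-wide stem with a 3x3 note-head at its foot.

    With slant > 0 the stem moves one column to the right every `slant` rows
    going upwards, imitating a handwritten slant.
    """
    image = np.zeros((height, height + 3), dtype=bool)
    image[height - 3:, 0:3] = True  # note-head to the bottom left of the stem
    for row in range(height):
        shift = (height - 1 - row) // slant if slant else 0
        image[row, 3 + shift] = True
    return image

upright = find_vertical_stem(make_crotchet())
slanted = find_vertical_stem(make_crotchet(slant=2))
print(upright, slanted)
```

For the upright crotchet the test reports column 3 as the stem. For the slanted one no column is reported, because the stem's ink is spread thinly over several columns, which is the kind of failure that motivates a different decomposition for handwriting.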

Figure 6: The problem of sub-segmenting a skewed crotchet. (a) A skewed crotchet. (b) The sub-segmentation result.

The initial experimental results show clearly that a different sub-segmentation approach is needed to handle the characteristic slants of handwriting. Following the primitive sub-segmentation approach used for printed music scores, and general methods used in handwritten script processing, we propose a possible approach to handwritten manuscript sub-segmentation that uses skeletonisation and junction points to guide the decomposition of composite symbols.

Musical features of handwritten manuscripts may be viewed as a set of inter-connected curved lines. A robust method is needed to separate segments that carry different semantics. A general mathematical morphology approach, which has been widely employed in the domain of cursive script recognition, is adapted, and a thinning process (see [Zhang and Suen, 1984] and [Sonka et al., 1993]) is implemented for initial experimentation. The thinning routine successfully transforms input manuscripts into one-pixel-wide line segments which are free from the ambiguities of uneven line thickness. To minimise the bias of skeletonisation associated with a specific raster scan (i.e. from left to right, top to bottom), further scans are performed in alternate directions. Figure 7 shows a sample handwritten manuscript and a thinned version of the input.

Figure 7: Thinning. (a) Sample input. (b) A thinned version.

A simple eight-neighbour count and crossing index can locate many junction points, which can be used to guide the sub-segmentation of the features. The crossing index is half the number of changes (foreground to background or the reverse) that occur when the eight neighbours are traversed. The index indicates the number of segments that will result if the pixel under examination is removed. Skeleton pixels with an eight-neighbour count of 1 are clearly termination points, and a crossing index of three or more suggests a junction point.

Other junction points, and certain classifications, may be found using the difference in local line-segment thickness. Chain-code and polygonal-approximation methods [Gonzalez and Wintz, 1987] may be applied to detect turning points. The level of thinning at which each pixel was removed may be kept as its pixel value, so that a final skeleton pixel can find its surrounding thickness simply by visiting its neighbouring pixels. The changes in neighbouring-pixel values can provide valuable clues to guide a sub-segmentation or to locate turning points. Unclear note-heads may be discovered by this approach.

When the junction points are found, it may be possible to estimate a vertically-orientated rectangular box to simulate the bounding box of a sub-segmented printed music feature, or to 'straighten' the slanted line segment. After that, many of the rule-based classification techniques may be reused, perhaps with a relaxation of threshold parameters and rules. For example, note-heads are allowed to appear at the top left or bottom right of a stem, or even at the centre.

The current prototype is equipped only with a basic thinning algorithm. General problems of such techniques, for example clusters of junction points and unwanted extra thinned segments, may result from 'cracks', 'holes' and elongated intersections in the features. These artefacts may not be critical in this application, and more sophisticated thinning algorithms [Suen and Wang, 1994] could be adopted.

5 High-level Knowledge Enhancement

It is widely agreed that domain knowledge is indispensable in most complex document analysis and recognition. To enhance recognition, most printed OMR systems have exploited and applied music notational syntax. Ambiguities and omissions in the handwritten document domain are severe.
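
For reference, the eight-neighbour count and crossing index defined in Section 4 can be sketched as follows. This is a minimal illustration written from the definitions in the text, not the prototype's code; the function names, the clockwise traversal order and the zero-padded border are assumptions made for the example.

```python
import numpy as np

# Eight neighbours in clockwise order starting from north, as (row, col) offsets.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def neighbour_count_and_crossing_index(skel, r, c):
    """Return (eight-neighbour count, crossing index) for pixel (r, c).

    The crossing index is half the number of foreground/background changes
    met while walking once around the eight neighbours.
    """
    ring = [int(skel[r + dr, c + dc]) for dr, dc in RING]
    changes = sum(ring[i] != ring[(i + 1) % 8] for i in range(8))
    return sum(ring), changes // 2

def classify_skeleton_pixels(skeleton):
    """Return (termination points, junction points) of a one-pixel-wide skeleton."""
    skel = np.pad(np.asarray(skeleton, dtype=bool), 1)  # background border
    ends, junctions = [], []
    for r, c in zip(*np.nonzero(skel)):
        count, crossing = neighbour_count_and_crossing_index(skel, r, c)
        point = (int(r) - 1, int(c) - 1)  # undo the padding offset
        if count == 1:
            ends.append(point)
        elif crossing >= 3:
            junctions.append(point)
    return ends, junctions

# A plus-shaped skeleton: four arms meeting at the centre pixel.
plus = np.zeros((5, 5), dtype=bool)
plus[2, :] = True
plus[:, 2] = True
ends, junctions = classify_skeleton_pixels(plus)
print(ends, junctions)
```

On the plus-shaped test skeleton, the four arm tips are reported as termination points and the centre pixel, whose crossing index is 4, as the only junction point. Pixels along the arms have a crossing index of 2 and are left unclassified.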

Without specific expert knowledge and experience, even an engraver may sometimes have problems reading the many different styles of handwriting and deciphering the correct features, and only with higher-level knowledge can a reader deduce the actual musical information. A transcriber requires skills and knowledge, not only of syntax, but perhaps sometimes even of the stylistic conventions of a specific composer. We believe that higher-level domain-knowledge enhancements are probably the most likely, if not the only, way forward in assisting the clarification of ambiguities and omissions. We are working on a number of higher-level processes that apply higher-level domain knowledge and conventions, including a rule-based algorithm for tonal-centre detection [Ng et al., 1996b].

6 Summary

In this paper, some of the main obstacles to handwritten music-manuscript recognition are discussed, and the feasibility of extending the existing prototype to handle handwritten manuscripts is investigated.

Acknowledgements

We wish to acknowledge the following composers for samples of their original handwritten manuscripts, and for permission to use their compositions for experiments and illustrations: Mr P Wilby, Dr H-L Yao and Dr D Cooper.

References

[Clarke et al., 1993] A Clarke, B Brown and M Thorne. Recognising musical text. In Proceedings of the SPIE, Vol. 2064: pp. 222-233, 1993.

[Gonzalez and Wintz, 1987] R Gonzalez and P Wintz. Digital Image Processing. Addison-Wesley Publishing Company, 1987.

[Ng and Boyle, 1992] K Ng and R Boyle. Segmentation of Music Primitives. In D C Hogg and R D Boyle (Eds.): Proceedings of the British Machine Vision Conference, Springer-Verlag, pp. 472-480, 1992.

[Ng and Boyle, 1996a] K Ng and R Boyle. Recognition and reconstruction of primitives in music scores. Image and Vision Computing. To appear 1996.

[Ng et al., 1996b] K Ng, R Boyle and D Cooper. Automatic Detection of Tonality using Note Distribution. Journal of New Music Research. To appear 1996.

[Sonka et al., 1993] M Sonka, V Hlavac and R Boyle. Image Processing, Analysis, and Computer Vision. Chapman & Hall Computing, London, 1993.

[Suen and Wang, 1994] C Y Suen and P S P Wang. Thinning Methodologies for Pattern Recognition, Vol 8 of Series in Machine Perception and AI. World Scientific, 1994.

[Zhang and Suen, 1984] T Y Zhang and C Y Suen. A fast parallel algorithm for thinning digital patterns. Communications of the ACM, 27(3): pp. 236-239, 1984.