Gamera: Optical music recognition in a new shell

Karl MacMillan, Michael Droettboom, and Ichiro Fujinaga
Peabody Conservatory of Music
Johns Hopkins University
1 East Mount Vernon Place, Baltimore MD 21202
email: {karlmac,mdboom,ich}@peabody.jhu.edu

Abstract

An optical music recognition system has been completely overhauled and reformatted into a new framework called Gamera. The new open-source software is designed not only to recognize various music notations, including handwritten scores, but also to support the development of systems that recognize many other kinds of structured documents. Gamera is intended to be used by domain experts who have particular knowledge of the documents to be recognized but lack strong programming skills. Gamera contains image processing and recognition tools in an easy-to-use, interactive, graphical scripting environment. Additionally, the system can be extended through C++ and Python plugins.

Keywords: optical music recognition

1. INTRODUCTION

An optical music recognition (OMR) system (Fujinaga 1997) has been completely overhauled and reformatted into a new framework called Gamera. This system combines image processing and recognition tools in an easy-to-use, interactive, graphical scripting environment for the creation of domain-specific document recognition systems by document experts. The goal of the system is to leverage the user's knowledge of the target documents to create custom applications rather than attempting to meet the needs of diverse users with a monolithic application. The applications created by the user are suitable for large-scale digitization projects: they can be run in a batch-processing mode and easily integrated into a digitization framework. Additionally, a module (plugin) system allows experienced programmers to extend the system. This paper gives an overview of Gamera, describes the user environment, and briefly discusses the plugin system.

2. MOTIVATION AND GOALS

Gamera is being created as part of the Lester S. Levy Sheet Music Project (Choudhury et al. 2001). The Levy collection is one of the largest collections of sheet music available online <http://levysheetmusic.mse.jhu.edu>. The collection, part of the Special Collections of the Milton S. Eisenhower Library at the Johns Hopkins University, comprises nearly 30,000 pieces of music, which correspond to nearly 130,000 sheets of music and associated cover art. The goal of the Levy Project (Phase Two) is to create an efficient workflow management system to reduce the cost and complexity of converting large, existing collections to digital form. From the beginning of the project, optical music recognition software was a key component of the workflow system.

The creation of a flexible OMR tool is necessary because of the historical nature of the Levy collection. Existing OMR systems are not designed to handle the wide range of music notation found in the collection or to deal with its potentially degraded documents. OMR alone is not sufficient for the complete recognition of the scores in the Levy collection, because they do not consist solely of musical symbols. Text is also present as lyrics, score markings, and metadata. It was hoped that an existing optical character recognition (OCR) system would be able to process such text. Early trials of existing systems, however, revealed many problems with the current generation of OCR software in this context.

To address the need for OCR in the Levy project, the Gamera system was created. Gamera extends the existing OMR system into a general symbol recognition system. The recognition of text uses the same technology that allows the OMR system to perform well on the musical portions of the Levy collection.
In addition to serving the needs of the Levy project and music recognition in general, we hope that the system may be used in the future for the recognition of historical documents and any other structured documents that current recognition systems do not adequately address. Beyond generalizing the system, we have added a graphical programming environment to ease its adaptation by users with expert knowledge of the documents to be recognized. This environment provides an easy-to-learn scripting