Gamera: Optical music recognition in a new shell
Karl MacMillan, Michael Droettboom, and Ichiro Fujinaga
Peabody Conservatory of Music, Johns Hopkins University
1 East Mount Vernon Place, Baltimore MD 21202
email: {karlmac,mdboom,ich}@peabody.jhu.edu
Abstract
An optical music recognition system has been
completely overhauled and reformatted into a new
framework called Gamera. The new open-source
software is not only designed to recognize various
music notations, including handwritten scores, but
can be used to develop systems that can recognize
many other structured documents. Gamera is intended
to be used by domain experts with particular
knowledge of the documents to be recognized but
without strong programming skills. Gamera contains
image processing and recognition tools in an
easy-to-use, interactive, graphical scripting environment.
Additionally, the system can be extended through
C++ and Python plugins.
Keywords: optical music recognition
1. INTRODUCTION
An optical music recognition (OMR) system
(Fujinaga 1997) has been completely overhauled and
reformatted into a new framework called Gamera.
This system combines image processing and
recognition tools in an easy-to-use, interactive,
graphical scripting environment for the creation of
domain-specific document recognition systems by
document experts.
The goal of the system is to leverage the user's
knowledge of the target documents to create custom
applications rather than attempting to meet the needs
of diverse users with a monolithic application. The
applications created by the user are suitable for
large-scale digitization projects: they can be run in
a batch processing mode and easily integrated into a
larger digitization framework. Additionally, a
module (plugin) system allows experienced
programmers to extend the system. This paper will
give an overview of Gamera, describe the user
environment, and briefly discuss the plugin system.
2. MOTIVATION AND GOALS
Gamera is being created as part of the Lester S.
Levy Sheet Music Project (Choudhury et al. 2001).
The Levy collection represents one of the largest
collections of sheet music available online
<http://levysheetmusic.mse.jhu.edu>. The Collection,
part of the Special Collections of the Milton S.
Eisenhower Library at the Johns Hopkins University,
comprises nearly 30,000 pieces of music which
correspond to nearly 130,000 sheets of music and
associated cover art.
The goal of the Levy Project (Phase Two) is to
create an efficient workflow management system to
reduce the cost and complexity of converting large,
existing collections to digital form. From the
beginning of the project, optical music recognition
software was a key component of the workflow
system. The creation of a flexible OMR tool is
necessary because of the historical nature of the Levy
collection. Existing OMR systems are not designed to
handle the wide range of music notation found in the
collection or deal with the potentially degraded
documents. OMR alone is not sufficient for the
complete recognition of the scores in the Levy
collection because they do not consist solely of musical
symbols. Text is also present as lyrics, score
markings, and metadata. It was initially hoped that
an existing optical character recognition (OCR)
system would be able to process such text. Early
trials of existing systems, however, revealed many
problems with the current generation of OCR
software within this context.
To address the need for OCR in the Levy project,
the Gamera system was created. Gamera is an
extension of the existing OMR system to a general
symbol recognition system. The recognition of text
uses the same technology that allows the OMR
system to perform well on the musical portions of the
Levy collection. In addition to serving the needs of
the Levy project and music recognition in general, we
hope that the system may be used in the future for the
recognition of historical documents and any other
structured documents that current recognition systems
do not adequately address.
In addition to generalizing the system, a graphical
programming environment has been added to ease the
adaptation of the system by users with expert
knowledge of the documents to be recognized. This
environment provides an easy-to-learn scripting