Towards a Standard Testbed for Optical Music Recognition: Definitions, Metrics, and Page Images

Pages 169-195 | Received 05 Jul 2014, Accepted 07 Apr 2015, Published online: 22 Sep 2015

Abstract

We posit that progress in Optical Music Recognition (OMR) has been held up for years by the absence of anything resembling the standard testbeds in use in other fields that face difficult evaluation problems. One example of such a field is text information retrieval (IR), where the Text Retrieval Conference (TREC) has annually renewed IR tasks with accompanying data sets. In music informatics, the Music Information Retrieval Evaluation eXchange (MIREX), with its annual tests and meetings held during the ISMIR conference, is a close analogue of TREC; but MIREX has never had an OMR track, nor a collection of music that such a track could employ. We describe why the absence of an OMR testbed is a problem and how this problem may be mitigated. To aid in the establishment of a standard testbed, we provide (1) a set of definitions for the complexity of music notation; (2) a set of performance metrics for OMR tools that gauge score complexity and graphical quality; and (3) a small corpus of music for use as a baseline for a proper OMR testbed.

Acknowledgements

The definitions of quality levels and images illustrating them are reprinted with permission of our collaborators Brian Søborg Mathiasen, Esben Paul Bugge, and Kim Lundsteen Juncher. We use test pages from the corpus of Bellini et al. (Citation2010) with permission of the authors. Ichiro Fujinaga’s comments on both early and late versions of this paper were extremely helpful, as were numerous conversations with Chris Raphael over a period of years. Dorothea Blostein offered her expertise on document-recognition systems in general as well as on music-recognition systems. Bill Clemmons and Craig Sapp offered comments based on many years of experience using OMR systems. Finally (in a literal as well as a figurative sense), extensive feedback from the referees greatly improved the paper. Most of Donald Byrd’s work on OMR was part of the MeTAMuSE project, made possible by the generous support of the Andrew W. Mellon Foundation. Bill Guerin and Megan Schindele made substantial contributions to MeTAMuSE.

Notes

This work was supported in part by the Andrew W. Mellon Foundation.

1 Indeed, we have encountered forceful statements from OMR researchers to this effect, e.g. ‘I think plenty of progress can still be made on OMR without evaluation’ (Chris Raphael, personal communication, January 2013).

2 The OMR program was a version of PhotoScore from about 2006.

3 The Schubert example and a number of other examples of the subtlety of real-world CWMN are shown and discussed in the online ‘Gallery of Interesting Music Notation’ (Byrd, Citation2013).

4 The overall graphical context in Figure – with intersecting beams and so on – is exceptionally complex. On the other hand, while the slurs in both Figures and are more complex than the average, they’re nowhere near the most complex we know of. The record holder appears in the online ‘Gallery of Interesting Music Notation’ (Byrd, Citation2013); it has no fewer than 10 inflection points, spans three staves in each of three systems, and goes backwards several times.

5 Factors like this are the basic reason that one of us has argued (Byrd, Citation1984, pp. 5–6) that really high-quality formatting of complex music is a serious artificial-intelligence problem. See also Byrd (Citation1994).

6 Specifically, each of their pages 1, 2, 5, and 6 has multiple instances of C and F clefs appearing in positions that have been obsolete since the late 18th century. In addition, on these pages, stems on beamed 16th notes and shorter — i.e. notes with two or more beams — that fall in the middle of a beamed group stop at the closest beam, while in the vast majority of published music, stems extend to the furthest beam. The latter convention, however, is much less likely to lead to OMR errors.
