Abstract
A decade has passed since the first review of research on a ‘flagship application’ of music information retrieval (MIR): the problem of music genre recognition (MGR). During this time, about 500 works addressing MGR have been published, and at least 10 campaigns have been run to evaluate MGR systems. This makes MGR one of the most researched areas of MIR. So, where does MGR now lie? We show that in spite of this massive amount of work, MGR does not lie far from where it began, and the paramount reason for this is that most evaluation in MGR lacks validity. We perform a case study of all published research using the most-used benchmark dataset in MGR during the past decade: GTZAN. We show that none of the evaluations in these many works is valid to produce conclusions with respect to recognizing genre, i.e. that a system is using criteria relevant for recognizing genre. In fact, the problems of validity in evaluation also affect research in music emotion recognition and autotagging. We conclude by discussing the implications of our work for MGR and MIR in the next ten years.
Acknowledgements
Thanks to: Fabien Gouyon, Nick Collins, Arthur Flexer, Mark Plumbley, Geraint Wiggins, Mark Levy, Roger Dean, Julián Urbano, Alan Marsden, Lars Kai Hansen, Jan Larsen, Mads G. Christensen, Sergios Theodoridis, Aggelos Pikrakis, Dan Stowell, Rémi Gribonval, Geoffrey Peeters, Diemo Schwarz, Roger Dannenberg, Bernard Mont-Reynaud, Gaël Richard, Rolf Bardeli, Jort Gemmeke, Curtis Roads, Stephen Pope, Yi-Hsuan Yang, George Tzanetakis, Constantine Kotropoulos, Yannis Panagakis, Ulaş Bağci, Engin Erzin, and João Paulo Papa for illuminating discussions about these topics (which does not mean that any of them endorse the ideas herein). Mads G. Christensen, Nick Collins, Cynthia Liem, and Clemens Hage helped identify several excerpts in GTZAN, and my wife Carla Sturm endured my repeated listening to all of its excerpts. Thanks to the many, many associate editors and anonymous reviewers for the comments that helped move this work closer and closer to being publishable.
Notes
4 The bibliography and spreadsheet that we use to generate this figure are available as supplementary material, as well as online here: http://imi.aau.dk/~bst/software.
5 The dataset can be downloaded from here: http://marsyas.info/download/data_sets
6 Fabbri (1999) also notes that genre helps ‘to speed up communication’.
8 All relevant references are available in the supplementary material, as well as online: http://imi.aau.dk/~bst/software.
9 Available here: http://visal.cs.cityu.edu.hk/downloads/#gtzankeys
10 Personal communication.
12 This is the file ‘country.00015.wav’ in GTZAN.
13 ‘Power Nap’ by J.S. Epperson (Binaural Beats Entrainment), 2010.
14 Our machine-readable index of this metadata is available at: http://imi.aau.dk/~bst/software.
15 These can be heard online here: http://imi.aau.dk/~bst/research/GTZANtable2
16 Which is now over 100 considering publications in 2013 not included in our survey (Sturm, 2012b).
17 A modern-day equivalent is ‘Maggie’, a dog that has performed arithmetic feats on nationally syndicated television programs: http://www.oprah.com/oprahshow/Maggie-the-Dog-Does-Math
18 We use here the multivariate dataset, http://archive.ics.uci.edu/ml/datasets/Adult
19 This feature takes a value in ‘Wife’, ‘Husband’, ‘Unmarried’, ‘Child’.
20 This feature takes a value in ‘Tech-support’, ‘Craft-repair’, ‘Other-service’, ‘Sales’, ‘Exec-managerial’, ‘Prof-specialty’, ‘Handlers-cleaners’, ‘Machine-op-inspct’, ‘Adm-clerical’, ‘Farming-fishing’, ‘Transport-moving’, ‘Priv-house-serv’, ‘Protective-serv’, ‘Armed-Forces’.
21 This feature takes a value in ‘Married’, ‘Divorced’, ‘Never-married’, ‘Widowed’, ‘Separated’.
22 At least, the scope of the first conclusion must be limited to 1996 USA, and that of the second to the dataset.
23 R. Hamming, ‘You get what you measure’, lecture at Naval Postgraduate School, June 1995. http://www.youtube.com/watch?v=LNhcaVi3zPA
24 What conclusion is valid in this case has yet to be determined.
25 Personal communication with J.P. Papa.
This work was supported by Independent Postdoc Grant 11-105218 from Det Frie Forskningsråd.