Abstract
In music genre classification, most approaches rely on statistics of low-level features computed over short audio frames. These methods implicitly assume that all frames carry equally relevant information and that either individual frames, or distributions thereof, somehow capture the specificities of each genre. In this paper we study the representation space defined by short-term audio features with respect to class boundaries, and compare different processing techniques for partitioning this space. The resulting partitions are evaluated in terms of accuracy on two genre classification tasks, with several types of classifiers. Experiments show that a randomized, unsupervised partition of the space, used in conjunction with a Markov model classifier, leads to accuracies comparable to the state of the art. We also show that unsupervised partitions of the space tend to create fewer hubs.
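The pipeline outlined in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes k-means as the randomized, unsupervised partition of the feature space and a first-order Markov model over partition-cell indices as the classifier; the synthetic "genres", the number of cells `k`, and all function names are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    # Randomized, unsupervised partition of the feature space:
    # plain k-means with random initial centroids (no class labels used).
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return C

def quantize(X, C):
    # Map each short-term feature frame to the index of its partition cell.
    return np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)

def transition_matrix(seq, k):
    # First-order Markov model over cell indices, with Laplace smoothing.
    T = np.ones((k, k))
    for a, b in zip(seq[:-1], seq[1:]):
        T[a, b] += 1
    return T / T.sum(axis=1, keepdims=True)

def log_likelihood(seq, T):
    return sum(np.log(T[a, b]) for a, b in zip(seq[:-1], seq[1:]))

# Toy stand-ins for frame features of two "genres" with different statistics.
k = 8
genre_a = rng.normal(0.0, 1.0, size=(500, 4))
genre_b = rng.normal(5.0, 1.0, size=(500, 4))

C = kmeans(np.vstack([genre_a, genre_b]), k)   # partition built without labels
models = {g: transition_matrix(quantize(X, C), k)
          for g, X in [("a", genre_a), ("b", genre_b)]}

# Classify an unseen frame sequence by maximum likelihood over genre models.
test_frames = rng.normal(5.0, 1.0, size=(100, 4))   # drawn like genre "b"
seq = quantize(test_frames, C)
pred = max(models, key=lambda g: log_likelihood(seq, models[g]))
print(pred)
```

The key property the abstract highlights is that the partition itself never sees class labels; supervision enters only when fitting one Markov model per genre on the resulting index sequences.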
Acknowledgements
This research was supported by Convénio FCT/CAPES 2009; Fundação para a Ciência e a Tecnologia (FCT) and QREN-AdI grant for the project Palco3.0/3121 in Portugal; Ministerio de Educación in Spain. This work was partially supported by FCT through LASIGE Multiannual Funding and VIRUS research project (PTDC/EIA-EIA/101012/2008). The first author is supported by PROTEC grant SFRH/PROTEC/50118/2009.
Notes
1The set of features used is detailed in Section 2.3.
3Some authors (Panagakis &amp; Arce, 2009) have reported results above 90% accuracy on this dataset, but these were obtained through a 10-fold cross-validation procedure, so the training set is larger than the one used in our set-up.