
Using general-purpose compression algorithms for music analysis

Pages 1-16 | Received 12 Dec 2014, Accepted 09 Dec 2015, Published online: 08 Feb 2016
 

Abstract

General-purpose compression algorithms encode files as dictionaries of substrings with the positions of these strings’ occurrences. We hypothesized that such algorithms could be used for pattern discovery in music. We compared LZ77, LZ78, Burrows–Wheeler and COSIATEC on classifying folk song melodies. The method combined multiple viewpoints, the k-nearest-neighbour algorithm and a novel distance metric, corpus compression distance. Using single viewpoints, COSIATEC outperformed the general-purpose compressors, with a classification success rate of 85% on this task. However, by combining eight of the ten best-performing viewpoints, including seven that used LZ77, the classification success rate rose to over 94%. In a second experiment, we compared LZ77 with COSIATEC on the task of discovering subject and countersubject entries in fugues by J.S. Bach. When voice information was absent in the input data, COSIATEC outperformed LZ77 with a mean score of 0.123, compared with 0.053 for LZ77. However, when the music was processed a voice at a time, the score for LZ77 more than doubled to 0.124. We also discovered a significant correlation between compression factor and score for all the algorithms, supporting the hypothesis that the best analyses are those represented by the shortest descriptions.
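The general idea of classifying melodies by compression-based distance can be illustrated with a minimal sketch. This is not the corpus compression distance defined in the paper, whose definition is not given in this abstract. It substitutes the well-known normalized compression distance (NCD) and uses Python's `zlib` (DEFLATE, an LZ77-family compressor) as a stand-in compressor. The function names `csize`, `compression_factor`, `ncd` and `knn_classify`, and the byte-string encoding of melodies, are assumptions made for illustration only.

```python
import zlib
from collections import Counter


def csize(data: bytes) -> int:
    """Compressed size in bytes, using zlib as a stand-in LZ77-family compressor."""
    return len(zlib.compress(data, 9))


def compression_factor(data: bytes) -> float:
    """Uncompressed length divided by compressed length (illustrative definition)."""
    return len(data) / csize(data)


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance (NOT the paper's corpus compression distance)."""
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_classify(query: bytes, labelled: list[tuple[bytes, str]], k: int = 1) -> str:
    """Return the majority label among the k labelled items nearest to the query."""
    neighbours = sorted(labelled, key=lambda item: ncd(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy "melodies" encoded as byte strings; real input would be a viewpoint
    # representation of each song (an assumption of this sketch).
    corpus = [(b"CDEFGABC" * 20, "A"), (b"zyxwvuts" * 20, "B")]
    print(knn_classify(b"CDEFGABC" * 15, corpus, k=1))
```

With several viewpoints, one such classifier could be run per viewpoint and the votes combined, which is the kind of combination the abstract describes only at a high level.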

Acknowledgements

The work reported in this paper was carried out as part of the EU FP7 project, ‘Learning to Create’ (Lrn2Cre8).

Notes

1 See, for example, chapter 25 of book 2 of Aristotle’s Posterior Analytics.

2 See, for example, the results reported at http://tukaani.org/lzma/benchmarks.html.

3 Of course, in practice, our alphabet would be a finite subset of , but this would still be very large and therefore significantly increase the size of the dictionary.

4 In practice, when , the algorithm returns (xc). This improves compression a little because it uses one character, whereas ‘’ uses two.

Additional information

Funding

The project Lrn2Cre8 acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET grant number 610859.
