Large vocabulary automatic chord estimation using bidirectional long short-term memory recurrent neural network with even chance training

Pages 53-67 | Received 29 Aug 2016, Accepted 08 Aug 2017, Published online: 30 Oct 2017

