Research Article

Hindustani raga and singer classification using 2D and 3D pose estimation from video recordings

Received 15 Oct 2021, Accepted 10 Mar 2024, Published online: 03 Apr 2024

References

  • Al Ghamdi, M., Zhang, L., & Gotoh, Y. (2012). Spatio-temporal SIFT and its application to human action classification. In A. Fusiello, V. Murino, & R. Cucchiara (Eds.), Computer vision – ECCV 2012: Workshops and demonstrations. Lecture Notes in Computer Science (Vol. 7583, pp. 301–310). Springer. https://doi.org/10.1007/978-3-642-33863-2_30
  • Cao, Z., Hidalgo, G., Simon, T., Wei, S.-E., & Sheikh, Y. (2021). OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(1), 172–186. https://doi.org/10.1109/TPAMI.2019.2929257
  • Clarke, A., Weinzierl, M., & Li, J. (2021). Pose estimation for raga (v1.0.1). Zenodo. https://doi.org/10.5281/zenodo.5526676
  • Clayton, M. (2007a). Observing entrainment in music performance: Video-based observational analysis of Indian musicians’ tanpura playing and beat marking. Musicae Scientiae, 11(1), 27–59. https://doi.org/10.1177/102986490701100102
  • Clayton, M. (2007b). Time, gesture and attention in a Khyāl performance. Asian Music, 38(2), 71–96. https://doi.org/10.1353/amu.2007.0032
  • Clayton, M., Jakubowski, K., & Eerola, T. (2019). Interpersonal entrainment in Indian instrumental music performance: Synchronization and movement coordination relate to tempo, dynamics, metrical and cadential structure. Musicae Scientiae, 23(3), 304–331. https://doi.org/10.1177/1029864919844809
  • Clayton, M., Jakubowski, K., Eerola, T., Keller, P. E., Camurri, A., Volpe, G., & Alborno, P. (2020). Interpersonal entrainment in music performance: Theory, method and model. Music Perception, 38(2), 136–194. https://doi.org/10.1525/mp.2020.38.2.136
  • Clayton, M., Leante, L., & Tarsitani, S. (2021a, May 14). North Indian raga performance. OSF. https://doi.org/10.17605/OSF.IO/NKJGZ
  • Clayton, M., Li, J., Clarke, A. R., Weinzierl, M., Leante, L., & Tarsitani, S. (2021b, October 14). Hindustani raga and singer classification using pose estimation. OSF. https://doi.org/10.17605/OSF.IO/T5BWA
  • Clayton, M., Rao, P., Shikarpur, N., Roychowdhury, S., & Li, J. (2022). Raga classification from vocal performances using multimodal analysis. In Proceedings of the 23rd International Society for Music Information Retrieval Conference, Bengaluru, India. https://dap-lab.github.io/multimodal-raga-supplementary/
  • Dahl, S., Bevilacqua, F., Bresin, R., Clayton, M., Leante, L., Poggi, I., & Rasamimanana, N. (2009). Gestures in performance. In R. I. Godøy & M. Leman (Eds.), Musical gestures: Sound, movement, and meaning (pp. 36–68). Routledge.
  • Fatone, G. A., Clayton, M., Leante, L., & Rahaim, M. (2011). Imagery, melody and gesture in cross-cultural perspective. In A. Gritten & E. King (Eds.), New perspectives on music and gesture (pp. 203–220). Ashgate.
  • Godøy, R. I., & Leman, M. (Eds.). (2009). Musical gestures: Sound, movement, and meaning. Routledge.
  • Goldin-Meadow, S. (2003). Hearing gesture: How our hands help us think. Harvard University Press.
  • Gritten, A., & King, E. (Eds.). (2011). New perspectives on music and gesture. Ashgate.
  • Jakubowski, K., Eerola, T., Alborno, P., Volpe, G., Camurri, A., & Clayton, M. (2017). Extracting coarse body movements from video in music performance: A comparison of automated computer vision techniques with motion capture data. Frontiers in Digital Humanities, 4, 9. https://doi.org/10.3389/fdigh.2017.00009
  • Ji, S., Xu, W., Yang, M., & Yu, K. (2013). 3D convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 221–231. https://doi.org/10.1109/TPAMI.2012.59
  • Kendon, A. (2004). Gesture: Visible action as utterance. Cambridge University Press.
  • Leante, L. (2009). The lotus and the king: Imagery, gesture and meaning in a Hindustani Rāg. Ethnomusicology Forum, 18(2), 185–206. https://doi.org/10.1080/17411910903141874
  • Leante, L. (2013a). Gesture and imagery in music performance: Perspectives from North Indian classical music. In T. Shephard & A. Leonard (Eds.), The Routledge companion to music and visual culture (pp. 145–152). Routledge.
  • Leante, L. (2013b). Imagery, movement and listeners’ construction of meaning in North Indian classical music. In M. Clayton, B. Dueck, & L. Leante (Eds.), Experience and meaning in music performance (pp. 161–187). Oxford University Press.
  • Leante, L. (2018). The cuckoo’s song: Imagery and movement in monsoon ragas. In I. Rajamani, M. Pernau, & K. R. Butler Schofield (Eds.), Monsoon feelings: A history of emotions in the rain (pp. 255–290). Niyogi Books.
  • Li, M., Zhang, T., Chen, Y., & Smola, A. J. (2014). Efficient mini-batch training for stochastic optimization. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 661–670). https://doi.org/10.1145/2623330.2623612
  • Liu, M., Liu, H., & Chen, C. (2017). Enhanced skeleton visualization for view invariant human action recognition. Pattern Recognition, 68, 346–362. https://doi.org/10.1016/j.patcog.2017.02.030
  • Liu, Z., Zhang, H., Chen, Z., Wang, Z., & Ouyang, W. (2020). Disentangling and unifying graph convolutions for skeleton-based action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 143–152). https://openaccess.thecvf.com/content_CVPR_2020/papers/Liu_Disentangling_and_Unifying_Graph_Convolutions_for_Skeleton-Based_Action_Recognition_CVPR_2020_paper.pdf
  • van der Maaten, L., & Hinton, G. E. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.
  • McNeill, D. (1992). Hand and mind: What gestures reveal about thought. University of Chicago Press.
  • McNeill, D. (2005). Gesture and thought. University of Chicago Press.
  • Mehta, D., Sotnychenko, O., Mueller, F., Xu, W., Sridhar, S., Pons-Moll, G., & Theobalt, C. (2018). Single-shot multi-person 3D pose estimation from monocular RGB. In 2018 International Conference on 3D Vision (pp. 120–130). https://arxiv.org/abs/1712.03453v3
  • Moran, N. (2013). Social co-regulation and communication in North Indian duo performances. In M. Clayton, B. Dueck, & L. Leante (Eds.), Experience and meaning in music performance (pp. 64–94). Oxford University Press.
  • Paschalidou, S., & Clayton, M. (2015). Towards a sound-gesture analysis in Hindustani Dhrupad vocal music: Effort and raga space. In International Conference on the Multimodal Experience of Music (ICMEM), Sheffield. https://www.researchgate.net/publication/312029966_Towards_a_sound-gesture_analysis_in_Hindustani_Dhrupad_vocal_music_effort_and_raga_space
  • Paschalidou, S., Eerola, T., & Clayton, M. (2016). Voice and movement as predictors of gesture types and physical effort in virtual object interactions of classical Indian singing. In Proceedings of the 3rd International Symposium on Movement and Computing (MOCO '16) (Article 45, pp. 1–2). Association for Computing Machinery. https://doi.org/10.1145/2948910.2948914
  • Pearson, L. (2013). Gesture and the sonic event in Karnatak music. Empirical Musicology Review, 8(1), 2–14. https://doi.org/10.18061/emr.v8i1.3918
  • Pearson, L., & Pouw, W. (2022). Gesture–vocal coupling in Karnatak music performance: A neuro-bodily distributed aesthetic entanglement. Annals of the New York Academy of Sciences, 1515(1), 219–236. https://doi.org/10.1111/nyas.14806
  • Potempski, F., Sabo, A., & Patterson, K. K. (2021). Technical note: Quantifying music-dance synchrony with the application of a deep learning-based 2D pose estimator. bioRxiv 2020.10.09.333617. https://doi.org/10.1101/2020.10.09.333617
  • Rahaim, M. (2012). Musicking bodies: Gesture and voice in Hindustani music. Wesleyan University Press.
  • Savitzky, A., & Golay, M. J. E. (1964). Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36(8), 1627–1639. https://doi.org/10.1021/ac60214a047
  • Simonyan, K., & Zisserman, A. (2014). Two-stream convolutional networks for action recognition in videos. arXiv preprint. https://arxiv.org/abs/1406.2199
  • Stehman, S. V. (1997). Selecting and interpreting measures of thematic classification accuracy. Remote Sensing of Environment, 62(1), 77–89. https://doi.org/10.1016/S0034-4257(97)00083-7
  • Tao, Y., & Papadias, D. (2006). Maintaining sliding window skylines on data streams. IEEE Transactions on Knowledge and Data Engineering, 18(3), 377–391. https://doi.org/10.1109/TKDE.2006.48
  • Tran, D., Bourdev, L., Fergus, R., Torresani, L., & Paluri, M. (2015). Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 4489–4497). https://doi.org/10.1109/ICCV.2015.510
  • Yan, S., Xiong, Y., & Lin, D. (2018). Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-18). https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17135
  • Zhang, P., Lan, C., Xing, J., Zeng, W., Xue, J., & Zheng, N. (2017). View adaptive recurrent neural networks for high performance human action recognition from skeleton data. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2117–2126). https://arxiv.org/abs/1703.08274