Sound Event Detection System Based on VGGSKCCT Model Architecture with Knowledge Distillation

Sung-Jen Huang, Department of Computer Science and Engineering, National Sun Yat-sen University, 70 Lian-Hai Road, Kaohsiung, Taiwan, Republic of China

Chia-Chuan Liu, Department of Computer Science and Engineering, National Sun Yat-sen University, 70 Lian-Hai Road, Kaohsiung, Taiwan, Republic of China

Chia-Ping Chen, Department of Computer Science and Engineering, National Sun Yat-sen University, 70 Lian-Hai Road, Kaohsiung, Taiwan, Republic of China. Correspondence: [email protected]

Article: 2152948 | Received 17 May 2022, Accepted 18 Nov 2022, Published online: 16 Dec 2022

https://doi.org/10.1080/08839514.2022.2152948


References

Bilen, C., G. Ferroni, F. Tuveri, J. Azcarreta, and S. Krstulovic. 2019. A framework for the robust evaluation of sound event detection. arXiv preprint arXiv:1910.08440. https://arxiv.org/abs/1910.08440.
Xu, C., W. Zhou, T. Ge, F. Wei, and M. Zhou. 2020. BERT-of-Theseus: Compressing BERT by progressive module replacing. CoRR abs/2002.02925. https://arxiv.org/abs/2002.02925.
Koh, C.Y., Y.S. Chen, Y.W. Liu, and M. R. Bai. 2021. Sound event detection by consistency training and pseudo-labeling with feature-pyramid convolutional recurrent neural networks. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, 376–19.
Ding, X., X. Zhang, N. Ma, J. Han, G. Ding, and J. Sun. 2021. RepVGG: Making VGG-style ConvNets great again. CoRR abs/2101.03697. https://arxiv.org/abs/2101.03697.
Endo, H., and H. Nishizaki. 2022. Peer collaborative learning for polyphonic sound event detection. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 826–30. IEEE.
Gulati, A., J. Qin, C.C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, et al. 2020. Conformer: Convolution-augmented transformer for speech recognition. arXiv preprint arXiv:2005.08100.
Hinton, G., O. Vinyals, and J. Dean. 2015. Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop, http://arxiv.org/abs/1503.02531.
Zhang, H., M. Cissé, Y. N. Dauphin, and D. Lopez-Paz. 2017. mixup: Beyond empirical risk minimization. CoRR abs/1710.09412. http://arxiv.org/abs/1710.09412.
He, K., X. Zhang, S. Ren, and J. Sun. 2015. Deep residual learning for image recognition. CoRR abs/1512.03385. http://arxiv.org/abs/1512.03385.
Kim, N. K., and H. K. Kim. 2021. Self-Training With Noisy Student Model And Semi-Supervised Loss Function For DCASE 2021 Challenge Task 4. Technical Report. DCASE2021 Challenge.
Mesaros, A., T. Heittola, and T. Virtanen. 2016. Metrics for polyphonic sound event detection. Applied Sciences 6 (6):162. doi: 10.3390/app6060162.
Miyazaki, K., T. Komatsu, T. Hayashi, S. Watanabe, T. Toda, and K. Takeda. 2020. Convolution-Augmented Transformer For Semi-Supervised Sound Event Detection. Technical Report. DCASE2020 Challenge.
Nam, H., B.Y. Ko, G.T. Lee, S.H. Kim, W.H. Jung, S.M. Choi, and Y.H. Park. 2021. Heavily Augmented Sound Event Detection utilizing Weak Predictions. Technical Report. DCASE2021 Challenge.
Park, D. S., W. Chan, Y. Zhang, C.C. Chiu, B. Zoph, E. D. Cubuk, and Q. V. Le. 2019. SpecAugment: A simple data augmentation method for automatic speech recognition. In Interspeech 2019, September. ISCA. http://doi.org/10.21437/interspeech.2019-2680.
Shleifer, S., and A. M. Rush. 2020. Pre-trained summarization distillation. CoRR abs/2010.13002. https://arxiv.org/abs/2010.13002.
Serizel, R., N. Turpault, A. Shah, and J. Salamon. 2020. Sound event detection in synthetic domestic environments. In ICASSP 2020 - 45th International Conference on Acoustics, Speech, and Signal Processing, Barcelona, Spain, May. https://hal.inria.fr/hal-02355573.
Simonyan, K., and A. Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, San Diego.
Tarvainen, A., and H. Valpola. 2017. Weight-averaged consistency targets improve semi-supervised deep learning results. CoRR abs/1703.01780. http://arxiv.org/abs/1703.01780.
Turc, I., M.W. Chang, K. Lee, and K. Toutanova. 2019. Well-read students learn better: On the importance of pre-training compact models. arXiv preprint arXiv:1908.08962.
Turpault, N., R. Serizel, A. P. Shah, and J. Salamon. 2019. Sound event detection in domestic environments with weakly labeled data and soundscape synthesis. In Workshop on Detection and Classification of Acoustic Scenes and Events, New York City, United States, October. https://hal.inria.fr/hal-02160855.
Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30, Vancouver.
Verma, V., A. Lamb, J. Kannala, Y. Bengio, and D. Lopez-Paz. 2019. Interpolation consistency training for semi-supervised learning. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI'19, Macao, 3635–41. AAAI Press.
Li, X., W. Wang, X. Hu, and J. Yang. 2019. Selective kernel networks. CoRR abs/1903.06586. http://arxiv.org/abs/1903.06586.
Yang, L., J. Hao, Z. Hou, and W. Peng. 2020. Two-Stage domain adaptation for sound event detection. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2020 Workshop (DCASE2020), Tokyo, Japan, November, 230–34.
Zheng, X., H. Chen, and Y. Song. 2021. Zheng USTC Team’s Submission For DCASE2021 Task4 – Semi-Supervised Sound Event Detection. Technical Report. DCASE2021 Challenge.
