Research Article

Sound Event Detection System Based on VGGSKCCT Model Architecture with Knowledge Distillation

Article: 2152948 | Received 17 May 2022, Accepted 18 Nov 2022, Published online: 16 Dec 2022

References

  • Bilen, C., G. Ferroni, F. Tuveri, J. Azcarreta, and S. Krstulovic. 2019. A framework for the robust evaluation of sound event detection. arXiv preprint arXiv:1910.08440. https://arxiv.org/abs/1910.08440.
  • Canwen, X., W. Zhou, G. Tao, F. Wei, and M. Zhou. 2020. BERT-of-Theseus: compressing BERT by progressive module replacing. CoRR abs/2002.02925. https://arxiv.org/abs/2002.02925.
  • Chih-Yuan, K., Y.S. Chen, Y.W. Liu, and M. R. Bai. 2021. Sound event detection by consistency training and pseudo-labeling with feature-pyramid convolutional recurrent neural networks. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, 376–19.
  • Ding, X., X. Zhang, M. Ningning, J. Han, G. Ding, and J. Sun. 2021. RepVGG: making VGG-style ConvNets great again. CoRR abs/2101.03697. https://arxiv.org/abs/2101.03697.
  • Endo, H., and H. Nishizaki. 2022. Peer collaborative learning for polyphonic sound event detection. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 826–30. IEEE.
  • Gulati, A., J. Qin, C.C. Chiu, N. Parmar, Y. Zhang, Y. Jiahui, W. Han, S. Wang, Z. Zhang, Y. Wu, et al. 2020. Conformer: Convolution-augmented transformer for speech recognition. arXiv preprint arXiv:2005.08100.
  • Hinton, G., O. Vinyals, and J. Dean. 2015. Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop. http://arxiv.org/abs/1503.02531.
  • Hongyi, Z., M. Cissé, Y. N. Dauphin, and D. Lopez-Paz. 2017. Mixup: beyond empirical risk minimization. CoRR abs/1710.09412. http://arxiv.org/abs/1710.09412.
  • Kaiming, H., X. Zhang, S. Ren, and J. Sun. 2015. Deep residual learning for image recognition. CoRR abs/1512.03385. http://arxiv.org/abs/1512.03385.
  • Kim, N. K., and H. Kook Kim. 2021. Self-Training With Noisy Student Model And Semi-Supervised Loss Function For DCASE 2021 Challenge Task 4. Technical Report. DCASE2021 Challenge.
  • Mesaros, A., T. Heittola, and T. Virtanen. 2016. Metrics for polyphonic sound event detection. Applied Sciences 6 (6):162. doi: 10.3390/app6060162.
  • Miyazaki, K., T. Komatsu, T. Hayashi, S. Watanabe, T. Toda, and K. Takeda. 2020. Convolution-Augmented Transformer For Semi-Supervised Sound Event Detection. Technical Report. DCASE2020 Challenge.
  • Nam, H., K. Byeong-Yun, G.T. Lee, S.H. Kim, W.H. Jung, S.M. Choi, and Y.H. Park. 2021. Heavily Augmented Sound Event Detection utilizing Weak Predictions. Technical Report. DCASE2021 Challenge.
  • Park, D. S., W. Chan, Y. Zhang, C.C. Chiu, B. Zoph, E. D. Cubuk, and Q. V. Le. 2019. SpecAugment: a simple data augmentation method for automatic speech recognition. In Interspeech 2019. ISCA. http://doi.org/10.21437/interspeech.2019-2680.
  • Sam, S., and A. M. Rush. 2020. Pre-trained summarization distillation. CoRR abs/2010.13002. https://arxiv.org/abs/2010.13002.
  • Serizel, R., N. Turpault, A. Shah, and J. Salamon. 2020. Sound event detection in synthetic domestic environments. In ICASSP 2020 - 45th International Conference on Acoustics, Speech, and Signal Processing, Barcelona, Spain, May. https://hal.inria.fr/hal-02355573.
  • Simonyan, K., and A. Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, San Diego.
  • Tarvainen, A., and H. Valpola. 2017. Weight-averaged consistency targets improve semi-supervised deep learning results. CoRR abs/1703.01780. http://arxiv.org/abs/1703.01780.
  • Turc, I., M.W. Chang, K. Lee, and K. Toutanova. 2019. Well-read students learn better: on the importance of pre-training compact models. arXiv preprint arXiv:1908.08962.
  • Turpault, N., R. Serizel, A. Parag Shah, and J. Salamon. 2019. Sound event detection in domestic environments with weakly labeled data and soundscape synthesis. In Workshop on Detection and Classification of Acoustic Scenes and Events, New York City, United States, October. https://hal.inria.fr/hal-02160855.
  • Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30, Long Beach.
  • Verma, V., A. Lamb, J. Kannala, Y. Bengio, and D. Lopez-Paz. 2019. Interpolation consistency training for semi-supervised learning. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI’19, Macao, 3635–41. AAAI Press.
  • Xiang, L., W. Wang, H. Xiaolin, and J. Yang. 2019. Selective kernel networks. CoRR abs/1903.06586. http://arxiv.org/abs/1903.06586.
  • Yang, L., J. Hao, Z. Hou, and W. Peng. 2020. Two-Stage domain adaptation for sound event detection. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2020 Workshop (DCASE2020), Tokyo, Japan, November, 230–34.
  • Zheng, X., H. Chen, and Y. Song. 2021. Zheng USTC Team’s Submission For DCASE2021 Task4 – Semi-Supervised Sound Event Detection. Technical Report. DCASE2021 Challenge.