Research Article

Modified layer deep convolution neural network for text-independent speaker recognition

Pages 273–285 | Received 23 Aug 2020, Accepted 16 Jun 2022, Published online: 09 Jul 2022

References

  • Ai, O. C., Hariharan, M., Yaacob, S., & Chee, L. S. (2012). Classification of speech dysfluencies with MFCC and LPCC features. Expert Systems with Applications, 39(2), 2157–2165. https://doi.org/10.1016/j.eswa.2011.07.065
  • Bhattacharya, G., Alam, J., & Kenny, P. (2017). Deep speaker embeddings for short-duration speaker verification. Proceedings of Interspeech 2017 (pp. 1517–1521). https://doi.org/10.21437/Interspeech.2017-1575
  • Billeb, S., Rathgeb, C., Reininger, H., Kasper, K., & Busch, C. (2015). Biometric template protection for speaker recognition based on universal background models. IET Biometrics, 4(2), 116–126. https://doi.org/10.1049/iet-bmt.2014.0031
  • Bunrit, S., Inkian, T., Kerdprasop, N., & Kerdprasop, K. (2019). Text-independent speaker identification using deep learning model of convolution neural network. International Journal of Machine Learning and Computing, 9(2), 143–148. https://doi.org/10.18178/IJMLC.2019.9.2.778
  • Cai, W., Chen, J., & Li, M. (2018). Exploring the encoding layer and loss function in end-to-end speaker and language recognition system. Proceedings of Odyssey 2018: The Speaker and Language Recognition Workshop. https://doi.org/10.21437/Odyssey.2018-11
  • Dehak, N., Kenny, P., Dehak, R., Dumouchel, P., & Ouellet, P. (2011). Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech, and Language Processing, 19(4), 788–798. https://doi.org/10.1109/TASL.2010.2064307
  • Dinkel, H., Chen, N., Qian, Y., & Yu, K. (2017). End-to-end spoofing detection with raw waveform CLDNNs. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4860–4864). https://doi.org/10.1109/ICASSP.2017.7953080
  • Garcia-Romero, D., & Espy-Wilson, C. (2011). Analysis of i-vector length normalization in speaker recognition systems. Proceedings of Interspeech 2011 (pp. 249–252). https://doi.org/10.21437/Interspeech.2011-53
  • Hajibabaei, M., & Dai, D. (2018). Unified hypersphere embedding for speaker recognition. arXiv preprint arXiv:1807.08312.
  • Heigold, G., Moreno, I., Bengio, S., & Shazeer, N. M. (2016). End-to-end text-dependent speaker verification. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5115–5119). https://doi.org/10.1109/ICASSP.2016.7472652
  • Hu, H., Tang, B., Gong, X., Wei, W., & Wang, H. (2017). Intelligent fault diagnosis of the high-speed train with big data based on deep neural networks. IEEE Transactions on Industrial Informatics, 13(4), 2106–2116. https://doi.org/10.1109/TII.2017.2683528
  • Jain, A. K., Ross, A., & Prabhakar, S. (2004). An introduction to biometric recognition. IEEE Transactions on Circuits and Systems for Video Technology, 14(1), 4–20. https://doi.org/10.1109/TCSVT.2003.818349
  • Kabal, P., & Ramachandran, R. P. (1986). The computation of line spectral frequencies using Chebyshev polynomials. IEEE Transactions on Acoustics, Speech, and Signal Processing, 34(6), 1419–1426. https://doi.org/10.1109/TASSP.1986.1164983
  • Karthikeyan, V., & Suja Priyadharsini, S. (2021). A strong hybrid AdaBoost classification algorithm for speaker recognition. Sādhanā, 46(3), 1–19. https://doi.org/10.1007/s12046-021-01649-6
  • Karthikeyan, V., & Suja Priyadharsini, S. (2022). Hybrid machine learning classification scheme for speaker identification. Journal of Forensic Sciences, 67(3), 1033–1048. https://doi.org/10.1111/1556-4029.15006
  • Kenny, P. (2010). Bayesian speaker verification with heavy-tailed priors. Proceedings of Odyssey 2010: The Speaker and Language Recognition Workshop.
  • Kenny, P., Stafylakis, T., Ouellet, P., Gupta, V., & Alam, M. J. (2014). Deep neural networks for extracting Baum-Welch statistics for speaker recognition. Proceedings of Odyssey 2014: The Speaker and Language Recognition Workshop (pp. 293–298).
  • Kinnunen, T., & Li, H. (2010). An overview of text-independent speaker recognition: From features to supervectors. Speech Communication, 52(1), 12–40. https://doi.org/10.1016/j.specom.2009.08.009
  • Li, C., Ma, X., Jiang, B., Li, X., Zhang, X., Liu, X., Cao, Y., Kannan, A., & Zhu, Z. (2017). Deep Speaker: An end-to-end neural speaker embedding system. arXiv preprint arXiv:1705.02304.
  • Martinez, J., Perez, H., Escamilla, E., & Suzuki, M. M. (2012). Speaker recognition using mel frequency cepstral coefficients (MFCC) and vector quantization (VQ) techniques. CONIELECOMP 2012, 22nd International Conference on Electrical Communications and Computers (pp. 248–251). https://doi.org/10.1109/CONIELECOMP.2012.6189918
  • Masum, M., & Shahriar, H. (2020). TL-NID: Deep neural network with transfer learning for network intrusion detection. 2020 15th International Conference for Internet Technology and Secured Transactions (ICITST) (pp. 1–7). https://doi.org/10.23919/ICITST51030.2020.9351317
  • Mehri, S., Kumar, K., Gulrajani, I., Kumar, R., Jain, S., Sotelo, J. M., Courville, A. C., & Bengio, Y. (2017). SampleRNN: An unconditional end-to-end neural audio generation model. arXiv preprint arXiv:1612.07837.
  • Muckenhirn, H., Magimai-Doss, M., & Marcel, S. (2018). Towards directly modeling raw speech signal for speaker verification using CNNs. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4884–4888). https://doi.org/10.1109/ICASSP.2018.8462165
  • Nagrani, A., Chung, J. S., & Zisserman, A. (2017). VoxCeleb: A large-scale speaker identification dataset. Proceedings of Interspeech 2017.
  • Palaz, D., Magimai-Doss, M., & Collobert, R. (2015). Analysis of CNN-based speech recognition system using raw speech as input. Proceedings of Interspeech 2015 (pp. 11–15). https://doi.org/10.21437/Interspeech.2015-3
  • Prabhakar, S., Pankanti, S., & Jain, A. K. (2003). Biometric recognition: Security and privacy concerns. IEEE Security & Privacy, 1(2), 33–42.
  • Ravanelli, M., Brakel, P., Omologo, M., & Bengio, Y. (2018). Light gated recurrent units for speech recognition. IEEE Transactions on Emerging Topics in Computational Intelligence, 2(2), 92–102. https://doi.org/10.1109/TETCI.2017.2762739
  • Richardson, F., Reynolds, D. A., & Dehak, N. (2015a). A unified deep neural network for speaker and language recognition. Proceedings of Interspeech 2015 (pp. 1146–1150). https://doi.org/10.21437/Interspeech.2015-299
  • Richardson, F., Reynolds, D., & Dehak, N. (2015b). Deep neural network approaches to speaker and language recognition. IEEE Signal Processing Letters, 22(10), 1671–1675. https://doi.org/10.1109/LSP.2015.2420092
  • Sadjadi, S. O., Ganapathy, S., & Pelecanos, J. W. (2016). The IBM 2016 speaker recognition system. Proceedings of Odyssey 2016. arXiv preprint arXiv:1602.07291.
  • Salehghaffari, H. (2018). Speaker verification using convolutional neural networks. arXiv preprint arXiv:1803.05427.
  • Singh, S., & Rajan, E. (2011). Vector quantization approach for speaker recognition using MFCC and inverted MFCC. International Journal of Computer Applications, 17(1), 1–7. https://doi.org/10.5120/2188-2774
  • Snyder, D., Ghahremani, P., Povey, D., Garcia-Romero, D., Carmiel, Y., & Khudanpur, S. (2016). Deep neural network-based speaker embeddings for end-to-end speaker verification. 2016 IEEE Spoken Language Technology Workshop (SLT) (pp. 165–170). https://doi.org/10.1109/SLT.2016.7846260
  • Snyder, D., Garcia-Romero, D., Povey, D., & Khudanpur, S. (2017). Deep neural network embeddings for text-independent speaker verification. Proceedings of Interspeech 2017 (pp. 999–1003). https://doi.org/10.21437/Interspeech.2017-620
  • Snyder, D., Garcia-Romero, D., Sell, G., Povey, D., & Khudanpur, S. (2018). X-vectors: Robust DNN embeddings for speaker recognition. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5329–5333). https://doi.org/10.1109/ICASSP.2018.8461375
  • Tiwari, V. (2010). MFCC and its applications in speaker recognition. International Journal on Emerging Technologies, 1(1), 19–22.
  • Trigeorgis, G., Ringeval, F., Brueckner, R., Marchi, E., Nicolaou, M. A., Schuller, B., & Zafeiriou, S. (2016). Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5200–5204). https://doi.org/10.1109/ICASSP.2016.7472669
  • Variani, E., Lei, X., McDermott, E., Lopez-Moreno, I., & Gonzalez-Dominguez, J. (2014). Deep neural networks for small footprint text-dependent speaker verification. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4052–4056). https://doi.org/10.1109/ICASSP.2014.6854363
  • Wan, L., Wang, Q., Papir, A., & Lopez-Moreno, I. (2018). Generalized end-to-end loss for speaker verification. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4879–4883). https://doi.org/10.1109/ICASSP.2018.8462665
  • Xie, W., Nagrani, A., Chung, J. S., & Zisserman, A. (2019). Utterance-level aggregation for speaker recognition in the wild. 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5791–5795). https://arxiv.org/abs/1902.10107
  • Yaman, S., Pelecanos, J., & Sarikaya, R. (2012). Bottleneck features for speaker recognition. Proceedings of Odyssey 2012: The Speaker and Language Recognition Workshop (pp. 105–108).
  • Ye, F., & Yang, J. (2021). A deep neural network model for speaker identification. Applied Sciences, 11(8), 3603. https://doi.org/10.3390/app11083603
  • Yujin, Y., Peihua, Z., & Qun, Z. (2010). Research of speaker recognition based on combination of LPCC and MFCC. 2010 IEEE International Conference on Intelligent Computing and Intelligent Systems (Vol. 3, pp. 765–767).
  • Zhang, C., Koishida, K., & Hansen, J. H. L. (2018). Text-independent speaker verification based on triplet convolutional neural network embeddings. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(9), 1633–1644. https://doi.org/10.1109/TASLP.2018.2831456
