Research Article

Modified layer deep convolution neural network for text-independent speaker recognition

Pages 273–285 | Received 23 Aug 2020, Accepted 16 Jun 2022, Published online: 09 Jul 2022

References

  • Ai, O. C., Hariharan, M., Yaacob, S., & Chee, L. S. (2012). Classification of speech dysfluencies with MFCC and LPCC features. Expert Systems with Applications, 39(2), 2157–2165. https://doi.org/10.1016/j.eswa.2011.07.065
  • Bhattacharya, G., Alam, J., & Kenny, P. (2017). Deep speaker embeddings for short-duration speaker verification. Proceedings of Interspeech 2017 (pp. 1517–1521). https://doi.org/10.21437/Interspeech.2017-1575
  • Billeb, S., Rathgeb, C., Reininger, H., Kasper, K., & Busch, C. (2015). Biometric template protection for speaker recognition based on universal background models. IET Biometrics, 4(2), 116–126. https://doi.org/10.1049/iet-bmt.2014.0031
  • Bunrit, S., Inkian, T., Kerdprasop, N., & Kerdprasop, K. (2019). Text-independent speaker identification using deep learning model of convolution neural network. International Journal of Machine Learning and Computing, 9(2), 143–148. https://doi.org/10.18178/IJMLC.2019.9.2.778
  • Cai, W., Chen, J., & Li, M. (2018). Exploring the encoding layer and loss function in end-to-end speaker and language recognition system. Proceedings of Odyssey 2018: The Speaker and Language Recognition Workshop. https://doi.org/10.21437/Odyssey.2018-11
  • Dehak, N., Kenny, P., Dehak, R., Dumouchel, P., & Ouellet, P. (2011). Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech, and Language Processing, 19(4), 788–798. https://doi.org/10.1109/TASL.2010.2064307
  • Dinkel, H., Chen, N., Qian, Y., & Yu, K. (2017). End-to-end spoofing detection with raw waveform CLDNNs. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4860–4864). https://doi.org/10.1109/ICASSP.2017.7953080
  • Garcia-Romero, D., & Espy-Wilson, C. (2011). Analysis of i-vector length normalization in speaker recognition systems. Proceedings of Interspeech 2011 (pp. 249–252). https://doi.org/10.21437/Interspeech.2011-53
  • Hajibabaei, M., & Dai, D. (2018). Unified hypersphere embedding for speaker recognition. arXiv preprint arXiv:1807.08312.
  • Heigold, G., Moreno, I., Bengio, S., & Shazeer, N. M. (2016). End-to-end text-dependent speaker verification. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5115–5119). https://doi.org/10.1109/ICASSP.2016.7472652
  • Hu, H., Tang, B., Gong, X., Wei, W., & Wang, H. (2017). Intelligent fault diagnosis of the high-speed train with big data based on deep neural networks. IEEE Transactions on Industrial Informatics, 13(4), 2106–2116. https://doi.org/10.1109/TII.2017.2683528
  • Jain, A. K., Ross, A., & Prabhakar, S. (2004). An introduction to biometric recognition. IEEE Transactions on Circuits and Systems for Video Technology, 14(1), 4–20. https://doi.org/10.1109/TCSVT.2003.818349
  • Kabal, P., & Ramachandran, R. P. (1986). The computation of line spectral frequencies using Chebyshev polynomials. IEEE Transactions on Acoustics, Speech, and Signal Processing, 34(6), 1419–1426. https://doi.org/10.1109/TASSP.1986.1164983
  • Karthikeyan, V., & Suja Priyadharsini, S. (2021). A strong hybrid AdaBoost classification algorithm for speaker recognition. Sādhanā, 46(3), 1–19. https://doi.org/10.1007/s12046-021-01649-6
  • Karthikeyan, V., & Suja Priyadharsini, S. (2022). Hybrid machine learning classification scheme for speaker identification. Journal of Forensic Sciences, 67(3), 1033–1048. https://doi.org/10.1111/1556-4029.15006
  • Kenny, P. (2010). Bayesian speaker verification with heavy-tailed priors. Proceedings of Odyssey 2010: The Speaker and Language Recognition Workshop.
  • Kenny, P., Stafylakis, T., Ouellet, P., Gupta, V., & Alam, M. J. (2014). Deep neural networks for extracting Baum-Welch statistics for speaker recognition. Proceedings of Odyssey 2014: The Speaker and Language Recognition Workshop (pp. 293–298).
  • Kinnunen, T., & Li, H. (2010). An overview of text-independent speaker recognition: From features to supervectors. Speech Communication, 52(1), 12–40. https://doi.org/10.1016/j.specom.2009.08.009
  • Li, C., Ma, X., Jiang, B., Li, X., Zhang, X., Liu, X., Cao, Y., Kannan, A., & Zhu, Z. (2017). Deep Speaker: An end-to-end neural speaker embedding system. arXiv preprint arXiv:1705.02304.
  • Martinez, J., Perez, H., Escamilla, E., & Suzuki, M. M. (2012). Speaker recognition using mel frequency cepstral coefficients (MFCC) and vector quantization (VQ) techniques. CONIELECOMP 2012, 22nd International Conference on Electrical Communications and Computers (pp. 248–251). https://doi.org/10.1109/CONIELECOMP.2012.6189918
  • Masum, M., & Shahriar, H. (2020). TL-NID: Deep neural network with transfer learning for network intrusion detection. 2020 15th International Conference for Internet Technology and Secured Transactions (ICITST) (pp. 1–7). https://doi.org/10.23919/ICITST51030.2020.9351317
  • Mehri, S., Kumar, K., Gulrajani, I., Kumar, R., Jain, S., Sotelo, J. M., Courville, A. C., & Bengio, Y. (2017). SampleRNN: An unconditional end-to-end neural audio generation model. arXiv preprint arXiv:1612.07837.
  • Muckenhirn, H., Magimai-Doss, M., & Marcel, S. (2018). Towards directly modeling raw speech signal for speaker verification using CNNs. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4884–4888). https://doi.org/10.1109/ICASSP.2018.8462165
  • Nagrani, A., Chung, J. S., & Zisserman, A. (2017). VoxCeleb: A large-scale speaker identification dataset. Proceedings of Interspeech 2017.
  • Palaz, D., Magimai-Doss, M., & Collobert, R. (2015). Analysis of CNN-based speech recognition system using raw speech as input. Proceedings of Interspeech 2015 (pp. 11–15). https://doi.org/10.21437/Interspeech.2015-3
  • Prabhakar, S., Pankanti, S., & Jain, A. K. (2003). Biometric recognition: Security and privacy concerns. IEEE Security & Privacy, 1(2), 33–42.
  • Ravanelli, M., Brakel, P., Omologo, M., & Bengio, Y. (2018). Light gated recurrent units for speech recognition. IEEE Transactions on Emerging Topics in Computational Intelligence, 2(2), 92–102. https://doi.org/10.1109/TETCI.2017.2762739
  • Richardson, F., Reynolds, D. A., & Dehak, N. (2015a). A unified deep neural network for speaker and language recognition. Proceedings of Interspeech 2015 (pp. 1146–1150). https://doi.org/10.21437/Interspeech.2015-299
  • Richardson, F., Reynolds, D., & Dehak, N. (2015b). Deep neural network approaches to speaker and language recognition. IEEE Signal Processing Letters, 22(10), 1671–1675. https://doi.org/10.1109/LSP.2015.2420092
  • Sadjadi, S. O., Ganapathy, S., & Pelecanos, J. W. (2016). The IBM 2016 speaker recognition system. Proceedings of Odyssey 2016. arXiv preprint arXiv:1602.07291.
  • Salehghaffari, H. (2018). Speaker verification using convolutional neural networks. arXiv preprint arXiv:1803.05427.
  • Singh, S., & Rajan, E. (2011). Vector quantization approach for speaker recognition using MFCC and inverted MFCC. International Journal of Computer Applications, 17(1), 1–7. https://doi.org/10.5120/2188-2774
  • Snyder, D., Ghahremani, P., Povey, D., Garcia-Romero, D., Carmiel, Y., & Khudanpur, S. (2016). Deep neural network-based speaker embeddings for end-to-end speaker verification. 2016 IEEE Spoken Language Technology Workshop (SLT) (pp. 165–170). https://doi.org/10.1109/SLT.2016.7846260
  • Snyder, D., Garcia-Romero, D., Povey, D., & Khudanpur, S. (2017). Deep neural network embeddings for text-independent speaker verification. Proceedings of Interspeech 2017 (pp. 999–1003). https://doi.org/10.21437/Interspeech.2017-620
  • Snyder, D., Garcia-Romero, D., Sell, G., Povey, D., & Khudanpur, S. (2018). X-vectors: Robust DNN embeddings for speaker recognition. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5329–5333). https://doi.org/10.1109/ICASSP.2018.8461375
  • Tiwari, V. (2010). MFCC and its applications in speaker recognition. International Journal on Emerging Technologies, 1(1), 19–22.
  • Trigeorgis, G., Ringeval, F., Brueckner, R., Marchi, E., Nicolaou, M. A., Schuller, B., & Zafeiriou, S. (2016). Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5200–5204). https://doi.org/10.1109/ICASSP.2016.7472669
  • Variani, E., Lei, X., McDermott, E., Lopez-Moreno, I., & Gonzalez-Dominguez, J. (2014). Deep neural networks for small footprint text-dependent speaker verification. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4052–4056). https://doi.org/10.1109/ICASSP.2014.6854363
  • Wan, L., Wang, Q., Papir, A., & Lopez-Moreno, I. (2018). Generalized end-to-end loss for speaker verification. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4879–4883). https://doi.org/10.1109/ICASSP.2018.8462665
  • Xie, W., Nagrani, A., Chung, J. S., & Zisserman, A. (2019). Utterance-level aggregation for speaker recognition in the wild. 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5791–5795). https://arxiv.org/abs/1902.10107
  • Yaman, S., Pelecanos, J., & Sarikaya, R. (2012). Bottleneck features for speaker recognition. Proceedings of Odyssey 2012: The Speaker and Language Recognition Workshop (pp. 105–108).
  • Ye, F., & Yang, J. (2021). A deep neural network model for speaker identification. Applied Sciences, 11(8), 3603. https://doi.org/10.3390/app11083603
  • Yujin, Y., Peihua, Z., & Qun, Z. (2010). Research of speaker recognition based on combination of LPCC and MFCC. 2010 IEEE International Conference on Intelligent Computing and Intelligent Systems (Vol. 3, pp. 765–767).
  • Zhang, C., Koishida, K., & Hansen, J. H. L. (2018). Text-independent speaker verification based on triplet convolutional neural network embeddings. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(9), 1633–1644. https://doi.org/10.1109/TASLP.2018.2831456
