Identification and authentication of user voice using DNN features and i-vector

Article: 1751557 | Received 09 Sep 2019, Accepted 12 Mar 2020, Published online: 21 Apr 2020

References

  • Chan, W., Jaitly, N., Le, Q., & Vinyals, O. 2016. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. In The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4960–4964).
  • Du, J., Liu, P., Soong, F. K., Zhou, J.-L., & Wang, R.-H. (2006). Noisy speech recognition performance of discriminative HMMs. In Chinese Spoken Language Processing (Lecture Notes in Computer Science, Vol. 4274, pp. 358–369).
  • Fine, S., Navratil, J., & Gopinath, R. A. 2001. A hybrid GMM/SVM approach to speaker identification. In The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'01) (pp. 417–420).
  • Ghahremani, P., BabaAli, B., Povey, D., Riedhammer, K., Trmal, J., & Khudanpur, S. 2014. A pitch extraction algorithm tuned for automatic speech recognition. In The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 2494–2498), Florence, Italy.
  • Gemmeke, J. F., Virtanen, T., & Hurmalainen, A. (2011). Exemplar-based sparse representations for noise robust automatic speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 19(7), 2067–2080. https://doi.org/10.1109/TASL.2011.2112350
  • Hand, D. J., & Till, R. J. (2001). A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning, 45(2), 171–186. https://doi.org/10.1023/A:1010920819831
  • Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507. https://doi.org/10.1126/science.1127647
  • Juang, B. H., & Rabiner, L. R. (2005). Automatic speech recognition: A brief history of the technology. In Elsevier Encyclopedia of Language and Linguistics (2nd ed., Vol. 11, pp. 806–819). Oxford: Elsevier.
  • Ajmera, J., & McCowan, I. (2003). Speech/music segmentation using entropy and dynamism features in a HMM classification framework. Speech Communication, 40(3), 351–363. https://doi.org/10.1016/S0167-6393(02)00087-0
  • Kalimoldayev, M. N., Mamyrbayev, O. Z., Kydyrbekova, A. S., & Mekebayev, N. O. (2019). Voice verification and identification using i-vector representation. International Journal of Mathematics and Physics, 10(1), 66. https://doi.org/10.26577/ijmph-2019-i1-9
  • Kundu, S., Mantena, G., Qian, Y., Tan, T., Delcroix, M., & Sim, K. C. 2016. Joint acoustic factor learning for robust deep neural network based automatic speech recognition. In The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5025–5029).
  • Lei, Y., Scheffer, N., Ferrer, L., & McLaren, M. 2014. A novel scheme for speaker recognition using a phonetically-aware deep neural network. In The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1695–1699).
  • Makovkin, K. A. (2006). Hybrid models: Hidden Markov models and neural networks, and their application in speech recognition. In Modeling, algorithms and architecture of speech recognition systems (pp. 96–118). Moscow: Computing Center of the Russian Academy of Sciences.
  • Mamyrbayev, O. Z., Kydyrbekova, A. S., Turdalyuly, M., & Mekebaev, N. O. 2019. Review of user identification and authentication methods by voice. In Materials of the scientific conference "Innovative IT and Smart Technologies" (pp. 315–321). Almaty: IITT MON RK.
  • Mamyrbayev, O. Z., Othman, M., Akhmediyarova, A. T., Kydyrbekova, A. S., & Mekebayev, N. O. (2019). Voice verification using i-vectors and neural networks with limited training data. Bulletin of the National Academy of Sciences of the RK, 3, 36–43. https://www.researchgate.net/publication/333891112
  • Mamyrbayev, O. Z., Turdalyuly, M., Mekebaev, N. O., Kydyrbekova, A. S., Turdalykyzy, T., & Keylan, A. 2019. Automatic recognition of speech using deep neural networks. In Proceedings of the 11th Asian Conference on Intelligent Information and Database Systems (ACIIDS), Indonesia, Part II.
  • Mangu, L., Brill, E., & Stolcke, A. (2000). Finding consensus in speech recognition: Word error minimization and other applications of confusion networks. Computer Speech & Language, 14(4), 373–400. https://doi.org/10.1006/csla.2000.0152
  • Meuwly, D., & Drygajlo, A. 2001. Forensic speaker recognition based on a Bayesian framework and Gaussian mixture modelling (GMM). In A Speaker Odyssey: The Speaker Recognition Workshop, Crete, Greece (pp. 52–55). http://www.isca-speech.org/archive
  • Richardson, F., Reynolds, D., & Dehak, N. (2015). A unified deep neural network for speaker and language recognition. arXiv preprint arXiv:1504.00923.
  • Savchenko, V. V. (2009). The phonetic decoding method of words in the task of automatic speech recognition based on the principle of minimum information mismatch. Proceedings of Russian Universities: Radio Electronics, 5, 41–49. https://elibrary.ru/item
  • Senior, A., & Lopez-Moreno, I. 2014. Improving DNN speaker independence with i-vector inputs. In The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 225–229).
  • Shahamiri, S. R., & Binti Salim, S. S. (2014). Artificial neural networks as speech recognizers for dysarthric speech: Identifying the best-performing set of MFCC parameters and studying a speaker-independent approach. Advanced Engineering Informatics, 28(1), 102–110. https://doi.org/10.1016/j.aei.2014.01.001
  • Valsan, Z., Gavat, I., & Sabach, B. (2002). Statistical and hybrid methods for speech recognition in Romanian. International Journal of Speech Technology, 5(3), 259–268. https://doi.org/10.1023/A:1020249008539
  • Varga, A., & Steeneken, H. J. (1993). Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech Communication, 12(3), 247–251. https://doi.org/10.1016/0167-6393(93)90095-3
  • Wu, S. L., Kingsbury, B., Morgan, N., & Greenberg, S. 1998. Incorporating information from syllable-length time scales into automatic speech recognition. In The IEEE International Conference on Acoustics, Speech and Signal Processing, Seattle (pp. 721–724).
  • Yao, K., Yu, D., Seide, F., Su, H., Deng, L., & Gong, Y. 2012. Adaptation of context-dependent deep neural networks for automatic speech recognition. In Proceedings of the IEEE Spoken Language Technology Workshop (SLT) (pp. 366–369).
  • Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., & Liu, X. (2006, December). The HTK Book (for HTK version 3.4). Cambridge University Engineering Department.
  • Yu, C., Liu, G., Hahm, S., & Hansen, J. 2014. Uncertainty propagation in front end factor analysis for noise robust speaker recognition. In The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4017–4021). https://doi.org/10.1109/ICASSP.2014.6854356
  • Yuan, J., & Liberman, M. (2008). Speaker identification on the SCOTUS corpus. Journal of the Acoustical Society of America, 123(5), 3878. https://doi.org/10.1121/1.2935783