Identification and authentication of user voice using DNN features and i-vector

Article: 1751557 | Received 09 Sep 2019, Accepted 12 Mar 2020, Published online: 21 Apr 2020

References

  • Chan, W., Jaitly, N., Le, Q., & Vinyals, O. 2016. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. In The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4960–4964).
  • Du, J., Liu, P., Soong, F. K., Zhou, J.-L., & Wang, R.-H. (2006). Noisy speech recognition performance of discriminative HMMs. In Chinese Spoken Language Processing (Lecture Notes in Computer Science, Vol. 4274, pp. 358–369).
  • Fine, S., Navratil, J., & Gopinath, R. A. 2001. A hybrid GMM/SVM approach to speaker identification. In The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'01) (pp. 417–420).
  • Ghahremani, P., BabaAli, B., Povey, D., Riedhammer, K., Trmal, J., & Khudanpur, S. 2014. A pitch extraction algorithm tuned for automatic speech recognition. In The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 2494–2498), Florence, Italy.
  • Gemmeke, J. F., Virtanen, T., & Hurmalainen, A. (2011). Exemplar-based sparse representations for noise robust automatic speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 19(7), 2067–2080. https://doi.org/10.1109/TASL.2011.2112350
  • Hand, D. J., & Till, R. J. (2001). A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning, 45(2), 171–186. https://doi.org/10.1023/A:1010920819831
  • Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507. https://doi.org/10.1126/science.1127647
  • Juang, B. H., & Rabiner, L. R. (2005). Automatic speech recognition: A brief history of the technology. In Elsevier Encyclopedia of Language and Linguistics (2nd ed., Vol. 11, pp. 806–819). Oxford: Elsevier.
  • Ajmera, J., & McCowan, I. (2003). Speech/music segmentation using entropy and dynamism features in a HMM classification framework. Speech Communication, 40(3), 351–363. https://doi.org/10.1016/S0167-6393(02)00087-0
  • Kalimoldayev, M. N., Mamyrbayev, O. Z., Kydyrbekova, A. S., & Mekebayev, N. O. (2019). Voice verification and identification using i-vector representation. International Journal of Mathematics and Physics, 10(1), 66. https://doi.org/10.26577/ijmph-2019-i1-9
  • Kundu, S., Mantena, G., Qian, Y., Tan, T., Delcroix, M., & Sim, K. C. 2016. Joint acoustic factor learning for robust deep neural network based automatic speech recognition. In The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5025–5029).
  • Lei, Y., Scheffer, N., Ferrer, L., & McLaren, M. 2014. A novel scheme for speaker recognition using a phonetically-aware deep neural network. In The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1695–1699).
  • Makovkin, K. A. (2006). Hybrid models: Hidden Markov models and neural networks, and their application in speech recognition. In Modeling, algorithms and architecture of speech recognition systems (pp. 96–118). Moscow: Computing Center of the Russian Academy of Sciences.
  • Mamyrbayev, O. Z., Kydyrbekova, A. S., Turdalyuly, M., & Mekebaev, N. O. 2019. Review of user identification and authentication methods by voice. In Materials of the scientific conference "Innovative IT and Smart Technologies" (pp. 315–321). Almaty: IITT MON RK.
  • Mamyrbayev, O. Z., Othman, M., Akhmediyarova, A. T., Kydyrbekova, A. S., & Mekebayev, N. O. (2019). Voice verification using i-vectors and neural networks with limited training data. Bulletin of the National Academy of Sciences of the RK, 3, 36–43. https://www.researchgate.net/publication/333891112
  • Mamyrbayev, O. Z., Turdalyuly, M., Mekebaev, N. O., Kydyrbekova, A. S., Turdalykyzy, T., & Keylan, A. 2019. Automatic recognition of speech using deep neural networks. In Proceedings of the 11th Asian Conference on Intelligent Information and Database Systems (ACIIDS), Indonesia, Part II.
  • Mangu, L., Brill, E., & Stolcke, A. (2000). Finding consensus in speech recognition: Word error minimization and other applications of confusion networks. Computer Speech & Language, 14(4), 373–400. https://doi.org/10.1006/csla.2000.0152
  • Meuwly, D., & Drygajlo, A. 2001. Forensic speaker recognition based on a Bayesian framework and Gaussian mixture modelling (GMM). In A Speaker Odyssey: The Speaker Recognition Workshop, Crete, Greece (pp. 52–55). http://www.isca-speech.org/archive
  • Richardson, F., Reynolds, D., & Dehak, N. (2015). A unified deep neural network for speaker and language recognition. arXiv preprint arXiv:1504.00923.
  • Savchenko, V. V. (2009). The phonetic decoding method of words in the task of automatic speech recognition based on the principle of minimum information mismatch. Proceedings of Russian Universities: Radio Electronics, 5, 41–49. https://elibrary.ru/item
  • Senior, A., & Lopez-Moreno, I. 2014. Improving DNN speaker independence with i-vector inputs. In The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 225–229).
  • Shahamiri, S. R., & Binti Salim, S. S. (2014). Artificial neural networks as speech recognizers for dysarthric speech: Identifying the best-performing set of MFCC parameters and studying a speaker-independent approach. Advanced Engineering Informatics, 28(1), 102–110. https://doi.org/10.1016/j.aei.2014.01.001
  • Valsan, Z., Gavat, I., & Sabach, B. (2002). Statistical and hybrid methods for speech recognition in Romanian. International Journal of Speech Technology, 5(3), 259–268. https://doi.org/10.1023/A:1020249008539
  • Varga, A., & Steeneken, H. J. (1993). Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech Communication, 12(3), 247–251. https://doi.org/10.1016/0167-6393(93)90095-3
  • Wu, S. L., Kingsbury, B., Morgan, N., & Greenberg, S. 1998. Incorporating information from syllable-length time scales into automatic speech recognition. In The IEEE International Conference on Acoustics, Speech and Signal Processing, Seattle (pp. 721–724).
  • Yao, K., Yu, D., Seide, F., Su, H., Deng, L., & Gong, Y. 2012. Adaptation of context-dependent deep neural networks for automatic speech recognition. In Proceedings of the IEEE Spoken Language Technology Workshop (SLT) (pp. 366–369).
  • Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., & Liu, X. (2006, December). The HTK Book (for HTK version 3.4). Cambridge University Engineering Department.
  • Yu, C., Liu, G., Hahm, S., & Hansen, J. 2014. Uncertainty propagation in front end factor analysis for noise robust speaker recognition. In The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4017–4021). https://doi.org/10.1109/ICASSP.2014.6854356
  • Yuan, J., & Liberman, M. (2008). Speaker identification on the SCOTUS corpus. Journal of the Acoustical Society of America, 123(5), 3878. https://doi.org/10.1121/1.2935783