COMPUTER SCIENCE

Neural architectures for gender detection and speaker identification

Article: 1727168 | Received 22 Oct 2019, Accepted 30 Jan 2020, Published online: 11 Feb 2020

References

  • Auer, P., Burgsteiner, H., & Maass, W. (2008). A learning rule for very simple universal approximators consisting of a single layer of perceptrons. Neural Networks, 21(5), 786–795. doi:10.1016/j.neunet.2007.12.036
  • Collobert, R., & Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning (ICML’08) (pp. 160–167). Helsinki, Finland. New York, NY: ACM.
  • Cunningham, P., & Delany, S. (2007). k-Nearest neighbour classifiers. Multiple Classifier System, 1–17.
  • Darwiche, A. (2010). Bayesian networks. Communications of the ACM, 53(12), 80–90. doi:10.1145/1859204
  • Dehak, N., Kenny, P. J., Dehak, R., Dumouchel, P., & Ouellet, P. (2010). Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech, and Language Processing, 19(4), 1–11.
  • Du, S. S., Lee, J. D., Li, H., Wang, L., & Zhai, X. (2018). Gradient descent finds global minima of deep neural networks. In Proceedings of the 36th International Conference on Machine Learning (pp. 1–45). Long Beach, California. CoRR abs/1811.03804.
  • Freund, Y., & Schapire, R. E. (1999). Large margin classification using the perceptron algorithm. Machine Learning, 37(3), 277–296. doi:10.1023/A:1007662407062
  • Girdhar, R., Carreira, J., Doersch, C., & Zisserman, A. (2019). Video action transformer network. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, California.
  • Ho, T. K. (1995). Random decision forests. In Proceedings of the Third International Conference on Document Analysis and Recognition (ICDAR’95) (Vol. 1, pp. 278–283). Montreal, Quebec, Canada. Washington, DC: IEEE Computer Society.
  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in neural information processing systems 25 (pp. 1097–1105). Curran Associates, Inc.
  • Lee, H. S., Tsao, Y., Wang, H. M., & Jeng, S. K. (2014). Clustering-based i-vector formulation for speaker recognition. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH) (pp. 1101–1105). Singapore.
  • Li, C., Ma, X., Jiang, B., Li, X., Zhang, X., Liu, X., … Zhu, Z. (2017). Deep speaker: An end-to-end neural speaker embedding system. arXiv preprint, CoRR abs/1705.02304.
  • Li, L., Chen, Y., Shi, Y., Tang, Z., & Wang, D. (2017). Deep speaker feature learning for text-independent speaker verification. arXiv preprint, CoRR abs/1705.03670.
  • Lian, H. C., & Lu, B. L. (2006). Multi-view gender classification using local binary patterns and support vector machines. In J. Wang, Z. Yi, J. M. Zurada, B. L. Lu, & H. Yin (Eds.), Advances in neural networks - ISNN 2006 (pp. 202–209). Berlin, Heidelberg: Springer.
  • Liew, S. S., Khalil-Hani, M., Radzi, F., & Bakhteri, R. (2016). Gender classification: A convolutional neural network approach. Turkish Journal of Electrical Engineering and Computer Sciences, 24, 1248–1264. doi:10.3906/elk-1311-58
  • Mamyrbayev, O., Turdalyuly, M., Mekebayev, N., Alimhan, K., Kydyrbekova, A., & Turdalykyzy, T. (2019). Automatic recognition of Kazakh speech using deep neural networks. In N. T. Nguyen, F. L. Gaol, T. P. Hong, & B. Trawinski (Eds.), Intelligent information and database systems (pp. 465–474). Cham: Springer International Publishing.
  • Mamyrbayev, O., Turdalyuly, M., Mekebayev, N., Mukhsina, K., Alimhan, K., BabaAli, B., … Akhmetov, B. (2019). Continuous speech recognition of Kazakh language. ITM Web of Conferences, 24, 01012. doi:10.1051/itmconf/20192401012
  • Naeem, M., Khan, A., Qureshi, S. A., Riaz, N., Zul Kar, S., & Bhutto, A. R. (2013). Gender classification with decision trees. International Journal of Signal Processing, Image Processing and Pattern Recognition, 6(1), 165–176.
  • Pang, B., Zha, K., Cao, H., Shi, C., & Lu, C. (2019). Deep RNN framework for visual sequential applications. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 423–432). Long Beach, California.
  • Qu, M., Bengio, Y., & Tang, J. (2019). GMNN: Graph Markov neural networks. In Proceedings of the 36th International Conference on Machine Learning (PMLR 97). Long Beach, California. CoRR abs/1905.06214.
  • Reynolds, D. A., Quatieri, T. F., & Dunn, R. B. (2000). Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, 10(1–3), 19–41. London, UK: Academic Press. doi:10.1006/dspr.1999.0361
  • Sahidullah, M., & Saha, G. (2012). Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition. Speech Communication, 54, 543–565. doi:10.1016/j.specom.2011.11.004
  • Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117. Published online 2014; based on TR arXiv:1404.7828 [cs.NE].
  • Toleu, A., Tolegen, G., & Makazhanov, A. (2017a). Character-aware neural morphological disambiguation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Vol. 2, pp. 666–671). Vancouver, Canada: Association for Computational Linguistics.
  • Toleu, A., Tolegen, G., & Makazhanov, A. (2017b). Character-based deep learning models for token and sentence segmentation. In 5th International Conference on Turkic Languages Processing (TurkLang 2017). Kazan, Tatarstan, Russian Federation.
  • Tur, G., Hakkani-Tür, D., & Oflazer, K. (2003). A statistical information extraction system for Turkish. Natural Language Engineering, 9(2), 181–210. doi:10.1017/S135132490200284X
  • Van den Oord, A., Dieleman, S., & Schrauwen, B. (2013). Deep content-based music recommendation. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in neural information processing systems 26 (pp. 2643–2651). Curran Associates, Inc.
  • Villalba, J., Brümmer, N., & Dehak, N. (2017). Tied variational autoencoder backends for i-vector speaker recognition. In Proceedings of INTERSPEECH 2017 (pp. 1004–1008). Stockholm, Sweden, August 20–24, 2017.
  • Yousefi, M., Yousefi, M., Fathi, M., & Fogliatto, F. (2019). Patient visit forecasting in an emergency department using a deep neural network approach. Kybernetes, 46, 643–651. Ahead-of-print.