
Speech feature extraction using linear Chirplet transform and its applications*

Pages 376-391 | Received 12 Jan 2023, Accepted 21 Apr 2023, Published online: 03 May 2023

References

  • Cowie, R., & Douglas-Cowie, E. (1996). Automatic statistical analysis of the signal and prosodic signs of emotion in speech. In Proceedings of the 4th international conference on spoken language processing (ICSLP '96) (Vol. 3, pp. 1989–1992). https://www.isca-speech.org/archive/icslp_1996/cowie96_icslp.html
  • Do, H. D., Chau, D. T., Nguyen, D. D., & Tran, S. T. (2021). Enhancing speech signal features with linear envelope subtraction. In K. Wojtkiewicz, J. Treur, E. Pimenidis, & M. Maleszka (Eds.), Advances in computational collective intelligence (pp. 313–323). Springer International Publishing.
  • Do, H. D., Chau, D. T., & Tran, S. T. (2022). Speech representation using linear chirplet transform and its application in speaker-related recognition. In N. T. Nguyen, Y. Manolopoulos, R. Chbeir, A. Kozierkiewicz, & B. Trawiński (Eds.), Computational collective intelligence (pp. 719–729). Springer International Publishing.
  • Do, H. D., Tran, S. T., & Chau, D. T. (2020a). Speech separation in the frequency domain with autoencoder. Journal of Communications, 15, 841–848. https://doi.org/10.12720/jcm.15.11.841-848
  • Do, H. D., Tran, S. T., & Chau, D. T. (2020b). Speech source separation using variational autoencoder and bandpass filter. IEEE Access, 8, 156219–156231. https://doi.org/10.1109/Access.6287639
  • Do, H. D., Tran, S. T., & Chau, D. T. (2020c). A variational autoencoder approach for speech signal separation. In N. T. Nguyen, B. H. Hoang, C. P. Huynh, D. Hwang, B. Trawiński, & G. Vossen (Eds.), Computational collective intelligence (pp. 558–567). Springer International Publishing.
  • Fisher, W. (1986). The DARPA speech recognition research database: specifications and status. In Proceedings of DARPA workshop on speech recognition (Vol. 1, pp. 93–99).
  • Gulati, A., Qin, J., Chiu, C. C., Parmar, N., Zhang, Y., Yu, J., Han, W., Wang, S., Zhang, Z., Wu, Y., & Pang, R. (2020). Conformer: convolution-augmented transformer for speech recognition. arXiv preprint arXiv:2005.08100.
  • Jaitly, N., Le, Q. V., Vinyals, O., Sutskever, I., Sussillo, D., & Bengio, S. (2016). An online sequence-to-sequence model using partial conditioning. In D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 29). Curran Associates, Inc.
  • Koolagudi, S. G., & Rao, K. S. (2012). Emotion recognition from speech: a review. International Journal of Speech Technology, 15(2), 99–117. https://doi.org/10.1007/s10772-011-9125-1
  • LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551. https://doi.org/10.1162/neco.1989.1.4.541
  • Liu, Y., An, H., & Bian, S. (2020). Hilbert-Huang transform and the application. In 2020 IEEE international conference on artificial intelligence and information systems (ICAIIS) (pp. 534–539). https://ieeexplore.ieee.org/document/9194944
  • Luong, H. T., & Vu, H. Q. (2016). A non-expert Kaldi recipe for Vietnamese speech recognition system. In Proceedings of the 3rd international workshop on worldwide language service infrastructure (pp. 51–55). https://aclanthology.org/W16-5207/
  • Mann, S., & Haykin, S. (1991). The chirplet transform: a generalization of Gabor's logon transform. In Vision interface (pp. 205–212). https://www.semanticscholar.org/paper/The-Chirplet-Transform-%3A-A-Generalization-of-Gabor-Mann-Haykin/a47d6f83be87c3874b188b3e6a2fd94ab8617189
  • Mann, S., & Haykin, S. (1995). The chirplet transform: physical considerations. IEEE Transactions on Signal Processing, 43(11), 2745–2761. https://doi.org/10.1109/78.482123
  • Mihovilovic, D., & Bracewell, R. N. (1991). Adaptive chirplet representation of signals on time-frequency plane. Electronics Letters, 27(13), 1159–1161. https://doi.org/10.1049/el:19910723
  • Nwe, T., Foo, S., & De Silva, L. (2003). Detection of stress and emotion in speech using traditional and FFT based log energy features. In 4th international conference on information, communications and signal processing, 2003 and the 4th Pacific Rim conference on multimedia, proceedings of the 2003 joint (Vol. 3, pp. 1619–1623). https://ieeexplore.ieee.org/document/1292741
  • Panayotov, V., Chen, G., Povey, D., & Khudanpur, S. (2015). Librispeech: an ASR corpus based on public domain audio books. In 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5206–5210). https://ieeexplore.ieee.org/document/7178964
  • Peng, Z. K., Meng, G., Chu, F. L., Lang, Z. Q., Zhang, W. M., & Yang, Y. (2011). Polynomial chirplet transform with application to instantaneous frequency estimation. IEEE Transactions on Instrumentation and Measurement, 60(9), 3222–3229. https://doi.org/10.1109/TIM.2011.2124770
  • Tzirakis, P., Zhang, J., & Schuller, B. W. (2018). End-to-end speech emotion recognition using deep neural networks. In 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5089–5093). https://ieeexplore.ieee.org/document/8462677
  • Yang, Y., Peng, Z. K., Dong, X. J., Zhang, W. M., & Meng, G. (2014). General parameterized time-frequency transform. IEEE Transactions on Signal Processing, 62(11), 2751–2764. https://doi.org/10.1109/TSP.78
  • Yu, G., & Zhou, Y. (2016). General linear chirplet transform. Mechanical Systems and Signal Processing, 70-71, 958–973. https://doi.org/10.1016/j.ymssp.2015.09.004