
Speech feature extraction using linear Chirplet transform and its applications*

Pages 376-391 | Received 12 Jan 2023, Accepted 21 Apr 2023, Published online: 03 May 2023

References

  • Cowie, R., & Douglas-Cowie, E. (1996). Automatic statistical analysis of the signal and prosodic signs of emotion in speech. In Proceedings of the 4th international conference on spoken language processing (ICSLP '96) (Vol. 3, pp. 1989–1992). https://www.isca-speech.org/archive/icslp_1996/cowie96_icslp.html
  • Do, H. D., Chau, D. T., Nguyen, D. D., & Tran, S. T. (2021). Enhancing speech signal features with linear envelope subtraction. In K. Wojtkiewicz, J. Treur, E. Pimenidis, & M. Maleszka (Eds.), Advances in computational collective intelligence (pp. 313–323). Springer International Publishing.
  • Do, H. D., Chau, D. T., & Tran, S. T. (2022). Speech representation using linear chirplet transform and its application in speaker-related recognition. In N. T. Nguyen, Y. Manolopoulos, R. Chbeir, A. Kozierkiewicz, & B. Trawiński (Eds.), Computational collective intelligence (pp. 719–729). Springer International Publishing.
  • Do, H. D., Tran, S. T., & Chau, D. T. (2020a). Speech separation in the frequency domain with autoencoder. Journal of Communications, 15, 841–848. https://doi.org/10.12720/jcm.15.11.841-848
  • Do, H. D., Tran, S. T., & Chau, D. T. (2020b). Speech source separation using variational autoencoder and bandpass filter. IEEE Access, 8, 156219–156231. https://doi.org/10.1109/Access.6287639
  • Do, H. D., Tran, S. T., & Chau, D. T. (2020c). A variational autoencoder approach for speech signal separation. In N. T. Nguyen, B. H. Hoang, C. P. Huynh, D. Hwang, B. Trawiński, & G. Vossen (Eds.), Computational collective intelligence (pp. 558–567). Springer International Publishing.
  • Fisher, W. (1986). The DARPA speech recognition research database: specifications and status. In Proceedings of DARPA workshop on speech recognition (Vol. 1, pp. 93–99).
  • Gulati, A., Qin, J., Chiu, C. C., Parmar, N., Zhang, Y., Yu, J., Han, W., Wang, S., Zhang, Z., Wu, Y., & Pang, R. (2020). Conformer: convolution-augmented transformer for speech recognition. arXiv preprint arXiv:2005.08100.
  • Jaitly, N., Le, Q. V., Vinyals, O., Sutskever, I., Sussillo, D., & Bengio, S. (2016). An online sequence-to-sequence model using partial conditioning. In D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 29). Curran Associates, Inc.
  • Koolagudi, S. G., & Rao, K. S. (2012). Emotion recognition from speech: a review. International Journal of Speech Technology, 15(2), 99–117. https://doi.org/10.1007/s10772-011-9125-1
  • LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551. https://doi.org/10.1162/neco.1989.1.4.541
  • Liu, Y., An, H., & Bian, S. (2020). Hilbert-Huang transform and the application. In 2020 IEEE international conference on artificial intelligence and information systems (ICAIIS) (pp. 534–539). https://ieeexplore.ieee.org/document/9194944
  • Luong, H. T., & Vu, H. Q. (2016). A non-expert Kaldi recipe for Vietnamese speech recognition system. In Proceedings of the 3rd international workshop on worldwide language service infrastructure (pp. 51–55). https://aclanthology.org/W16-5207/
  • Mann, S., & Haykin, S. (1991). The chirplet transform: a generalization of Gabor's logon transform. In Vision interface (pp. 205–212). https://www.semanticscholar.org/paper/The-Chirplet-Transform-%3A-A-Generalization-of-Gabor-Mann-Haykin/a47d6f83be87c3874b188b3e6a2fd94ab8617189
  • Mann, S., & Haykin, S. (1995). The chirplet transform: physical considerations. IEEE Transactions on Signal Processing, 43(11), 2745–2761. https://doi.org/10.1109/78.482123
  • Mihovilovic, D., & Bracewell, R. N. (1991). Adaptive chirplet representation of signals on time-frequency plane. Electronics Letters, 27(13), 1159–1161. https://doi.org/10.1049/el:19910723
  • Nwe, T., Foo, S., & De Silva, L. (2003). Detection of stress and emotion in speech using traditional and FFT based log energy features. In 4th international conference on information, communications and signal processing, 2003 and the 4th Pacific Rim conference on multimedia, proceedings of the 2003 joint (Vol. 3, pp. 1619–1623). https://ieeexplore.ieee.org/document/1292741
  • Panayotov, V., Chen, G., Povey, D., & Khudanpur, S. (2015). Librispeech: an ASR corpus based on public domain audio books. In 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5206–5210). https://ieeexplore.ieee.org/document/7178964
  • Peng, Z. K., Meng, G., Chu, F. L., Lang, Z. Q., Zhang, W. M., & Yang, Y. (2011). Polynomial chirplet transform with application to instantaneous frequency estimation. IEEE Transactions on Instrumentation and Measurement, 60(9), 3222–3229. https://doi.org/10.1109/TIM.2011.2124770
  • Tzirakis, P., Zhang, J., & Schuller, B. W. (2018). End-to-end speech emotion recognition using deep neural networks. In 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5089–5093). https://ieeexplore.ieee.org/document/8462677
  • Yang, Y., Peng, Z. K., Dong, X. J., Zhang, W. M., & Meng, G. (2014). General parameterized time-frequency transform. IEEE Transactions on Signal Processing, 62(11), 2751–2764. https://doi.org/10.1109/TSP.78
  • Yu, G., & Zhou, Y. (2016). General linear chirplet transform. Mechanical Systems and Signal Processing, 70-71, 958–973. https://doi.org/10.1016/j.ymssp.2015.09.004