Speech feature extraction using linear Chirplet transform and its applications*

Pages 376-391 | Received 12 Jan 2023, Accepted 21 Apr 2023, Published online: 03 May 2023

Figures & data

Figure 1. Illustration of the Linear Chirplet Transform in three main steps. In step 1, the blue line is rotated by an angle θ to become the green line. In step 2, the green line is shifted by αt0 to become the red line. In the final step, the STFT is applied to the red line.
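
The three steps in Figure 1 can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `linear_chirplet_transform`, the Hann window, the frame parameters `win_len` and `hop`, and the choice of expressing α in Hz per second are all hypothetical. It uses one common formulation in which the rotation and the shift are combined into a single quadratic-phase factor centred on each frame (the two forms differ only by a constant phase per frame, which leaves the magnitude unchanged), followed by a windowed FFT.

```python
import numpy as np

def linear_chirplet_transform(signal, fs, alpha=0.0, win_len=256, hop=128):
    """Sketch of a linear chirplet transform (assumed formulation).

    signal  : 1-D real or complex array
    fs      : sampling rate in Hz
    alpha   : chirp rate, assumed here to be in Hz per second
    Returns a complex array of shape (n_frames, win_len).
    """
    signal = np.asarray(signal)
    window = np.hanning(win_len)  # assumed analysis window
    # Time axis of one frame, centred on the frame midpoint t0 (seconds).
    tau = (np.arange(win_len) - win_len // 2) / fs
    # Steps 1 and 2 (rotation and shift) merged into one demodulation
    # factor: it removes a frequency drift of alpha * tau inside the frame.
    demod = np.exp(-1j * np.pi * alpha * tau ** 2)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + win_len] for i in range(n_frames)]
    )
    # Step 3: windowed Fourier transform of each demodulated frame.
    return np.fft.fft(frames * window * demod, axis=1)
```

With `alpha=0` the demodulation factor equals 1 and the function reduces to a plain windowed Fourier transform of each frame. For a signal whose frequency rises linearly at a rate close to `alpha`, the energy of each frame concentrates in fewer frequency bins than it does with `alpha=0`.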

Figure 2. Linear Chirplet Transform with chirp rate α=0. In this case, the LCT is equivalent to the Fourier transform.

Figure 3. Linear Chirplet Transform with a positive chirp rate α=5. In the TF plane, the signal is highlighted in red where the frequency with high energy increases over time.

Algorithm 1. Speech feature extraction using Linear Chirplet Transform

Figure 4. Illustration of the 3D time-frequency representation returned by the Linear Chirplet Transform for an input audio clip with the content ‘There was a change now’, said a woman.

Table 1. Summary statistics of TIMIT and VIVOS.

Table 2. Summary statistics of LibriSpeech.

Table 3. Speaker gender recognition in TIMIT and VIVOS.

Table 4. Speaker dialect recognition in TIMIT and VIVOS.

Table 5. Speech recognition with different features for English (E) in LibriSpeech and Vietnamese (V) in VIVOS.