Figures & data
Figure 1. Illustration of the Linear Chirplet Transform with three main steps. In step 1, the blue line is rotated by an angle θ and becomes the green line. In step 2, the green line is shifted by αt0 to become the red line. In the final step, the red line is transformed with the STFT.
![Figure 1. Illustration of the Linear Chirplet Transform with three main steps. In step 1, the blue line is rotated by an angle θ and becomes the green line. In step 2, the green line is shifted by αt0 to become the red line. In the final step, the red line is transformed with the STFT.](/cms/asset/a199fcb9-8b0e-45e0-9791-40d06cf2bbe9/tjit_a_2207267_f0001_oc.jpg)
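The three steps in Figure 1 (rotation, shift, STFT) can be sketched in NumPy as follows. This is only a minimal illustration under stated assumptions, not the article's Algorithm 1: the function name `linear_chirplet_transform`, the Hann window, the frame length, the hop size, and the choice of expressing the chirp rate `alpha` in Hz/s are all hypothetical. The rotation and shift are folded into a single chirp demodulation centred on each frame, which leaves the magnitude unchanged; with `alpha = 0` the result reduces to an ordinary STFT magnitude.

```python
import numpy as np

def linear_chirplet_transform(x, fs, alpha=0.0, win_len=1024, hop=512):
    """Magnitude TF map from an LCT-style transform (illustrative sketch).

    alpha is the chirp rate, assumed here to be in Hz per second.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(win_len)
    # Time axis relative to the frame centre t0, in seconds.
    tau = (np.arange(win_len) - win_len // 2) / fs
    # Steps 1 and 2 (rotation and shift) combined: remove a linear
    # frequency slope of alpha Hz/s around the frame centre.
    demod = np.exp(-1j * np.pi * alpha * tau ** 2)
    n_frames = 1 + (len(x) - win_len) // hop
    tf = np.empty((win_len // 2 + 1, n_frames))
    for m in range(n_frames):
        frame = x[m * hop : m * hop + win_len]
        # Step 3: windowed Fourier transform of the demodulated frame.
        spec = np.fft.fft(frame * window * demod)
        tf[:, m] = np.abs(spec[: win_len // 2 + 1])  # keep non-negative frequencies
    return tf

# Synthetic test signal (an assumption for the demo): a linear chirp
# whose frequency rises from 500 Hz at 4000 Hz/s.
fs = 16000
t = np.arange(fs) / fs
chirp_rate = 4000.0
x = np.cos(2 * np.pi * (500.0 * t + 0.5 * chirp_rate * t ** 2))

tf_stft = linear_chirplet_transform(x, fs, alpha=0.0)        # plain STFT magnitude
tf_lct = linear_chirplet_transform(x, fs, alpha=chirp_rate)  # matched chirp rate
```

When `alpha` matches the signal's own chirp rate, the energy in each frame collapses onto fewer frequency bins, so the ridge in `tf_lct` is sharper and taller than the one in `tf_stft`.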
Figure 2. Linear Chirplet Transform with chirp rate α=0. In this case, the LCT is equivalent to the Fourier transform.
![Figure 2. Linear Chirplet Transform with chirp rate α=0. In this case, the LCT is equivalent to the Fourier transform.](/cms/asset/b37fcc4b-00c9-4531-b87b-d0431eb3b0ce/tjit_a_2207267_f0002_oc.jpg)
Figure 3. Linear Chirplet Transform with positive chirp rate α=5. In the TF plane, the signal is highlighted in red where the frequency with high energy increases over time.
![Figure 3. Linear Chirplet Transform with positive chirp rate α=5. In the TF plane, the signal is highlighted in red where the frequency with high energy increases over time.](/cms/asset/1abb016e-8249-4f14-956e-a7e87d887e11/tjit_a_2207267_f0003_oc.jpg)
Algorithm 1. Speech feature extraction using Linear Chirplet Transform
Figure 4. Illustration of the 3D time-frequency representation returned by the Linear Chirplet Transform for input audio with the content ‘There was a change now’, said a woman.
![Figure 4. Illustration of the 3D time-frequency representation returned by the Linear Chirplet Transform for input audio with the content ‘There was a change now’, said a woman.](/cms/asset/4eac6295-a5d5-41b3-92b6-e7b937a3d817/tjit_a_2207267_f0004_oc.jpg)
Table 1. Some statistics in TIMIT and VIVOS.
Table 2. Some statistics in LibriSpeech.
Table 3. Speaker gender recognition in TIMIT and VIVOS.
Table 4. Speaker dialect recognition in TIMIT and VIVOS.
Table 5. Speech recognition with different features for English (E) in LibriSpeech and Vietnamese (V) in VIVOS.