Neural architectures for gender detection and speaker identification

Article: 1727168 | Received 22 Oct 2019, Accepted 30 Jan 2020, Published online: 11 Feb 2020

Figures & data

Figure 1. An original audio signal

Figure 2. MFCC feature of the audio signal
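
The MFCC matrix shown in Figure 2 can be computed with a standard audio library. Below is a minimal sketch using librosa; the file name and the number of coefficients (13) are placeholder assumptions, not values taken from the paper.

```python
# Minimal sketch: compute an MFCC matrix from an audio file with librosa.
# "example.wav" and n_mfcc=13 are illustrative placeholders.
import librosa

# Load the waveform at its native sampling rate (sr=None disables resampling).
signal, sr = librosa.load("example.wav", sr=None)

# MFCC matrix of shape (n_mfcc, n_frames).
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
print(mfcc.shape)
```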

Table 1. Data sets for gender detection

Table 2. Data sets for speaker identification. The number of audio samples for each speaker is shown under the corresponding Speaker ID

Table 3. Hyper-parameters of MLP used for both tasks

Table 4. Hyper-parameters of CNN used for both tasks
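
As a purely illustrative starting point, the sketch below assembles an MLP over a flattened MFCC vector and a small 2-D CNN over a square MFCC-derived matrix with tf.keras. All layer sizes, input shapes, and training settings here are hypothetical placeholders; the hyper-parameters actually used are the ones listed in Tables 3 and 4.

```python
# Hedged sketch of the two model families; sizes and shapes are placeholders.
import tensorflow as tf
from tensorflow.keras import layers

def build_mlp(input_dim, n_classes):
    # Fully connected network over a flattened MFCC feature vector.
    return tf.keras.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(128, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])

def build_cnn(input_shape, n_classes):
    # 2-D convolutional network over a matrix-shaped MFCC representation.
    return tf.keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.Flatten(),
        layers.Dense(n_classes, activation="softmax"),
    ])

mlp = build_mlp(input_dim=13 * 100, n_classes=2)        # e.g. gender detection
cnn = build_cnn(input_shape=(13, 13, 1), n_classes=10)  # e.g. speaker identification
for model in (mlp, cnn):
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
```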

Figure 3. Distribution of audio samples by gender and by speaker

Figure 4. Training curve of MLP and CNN for gender detection

Table 5. Results for gender recognition with different features: L denotes the normalized, flattened long MFCC vector; G denotes the z-score and Gramian matrix transformation. P is precision, R is recall, and F1 is the F1-score
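
A sketch of the two feature variants compared in Table 5 (L: normalized flattened long MFCC vector; G: z-score followed by a Gramian matrix transformation). The choice of min-max normalization and of forming the Gram matrix over the coefficient axis are assumptions made for illustration; the paper's own preprocessing is described in the text.

```python
# Hedged sketch of the "L" and "G" feature preparations.
import numpy as np

def long_vector_feature(mfcc):
    """Flatten the (n_mfcc, n_frames) MFCC matrix and scale it to [0, 1]."""
    v = mfcc.flatten()
    return (v - v.min()) / (v.max() - v.min() + 1e-8)

def gramian_feature(mfcc):
    """Z-score each coefficient row, then form the Gram matrix Z @ Z.T."""
    z = (mfcc - mfcc.mean(axis=1, keepdims=True)) / (mfcc.std(axis=1, keepdims=True) + 1e-8)
    return z @ z.T   # square (n_mfcc, n_mfcc) matrix, suitable as CNN input

mfcc = np.random.randn(13, 200)            # placeholder MFCC matrix
print(long_vector_feature(mfcc).shape)     # (2600,)
print(gramian_feature(mfcc).shape)         # (13, 13)
```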

Table 6. Results for speaker identification

Figure 5. Training curve of MLP and CNN for speaker identification

Figure 6. Results for gender detection after adding the noise signal to the test set

Figure 7. Results for speaker identification after adding the noise signal to the test set
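
The robustness experiments in Figures 6 and 7 perturb the test audio with a noise signal. A hedged sketch of one such perturbation, additive white Gaussian noise at a chosen signal-to-noise ratio, is given below; the actual noise signal and levels are those described in the paper.

```python
# Hedged sketch: corrupt a clean test waveform with additive Gaussian noise.
import numpy as np

def add_noise(signal, snr_db):
    """Add white Gaussian noise so the result has roughly the given SNR (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

clean = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000))  # placeholder 1-second tone
noisy = add_noise(clean, snr_db=10)
```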

Figure 8. t-SNE visualization of the intermediate outputs of the MLP and CNN models on the test set after training
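
Figure 8 projects intermediate-layer activations onto two dimensions with t-SNE. The sketch below shows one way to produce such a plot with scikit-learn and matplotlib; the model handle, the layer name, and the test arrays are placeholders.

```python
# Hedged sketch: t-SNE scatter plot of a trained model's intermediate activations.
import tensorflow as tf
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def plot_tsne(model, layer_name, x_test, y_test):
    # Sub-model that outputs the activations of the named intermediate layer.
    feature_extractor = tf.keras.Model(inputs=model.input,
                                       outputs=model.get_layer(layer_name).output)
    features = feature_extractor.predict(x_test)
    features = features.reshape(len(x_test), -1)

    # Embed the activations into two dimensions.
    embedded = TSNE(n_components=2, init="pca", random_state=0).fit_transform(features)

    plt.scatter(embedded[:, 0], embedded[:, 1], c=y_test, s=5, cmap="tab10")
    plt.title(f"t-SNE of '{layer_name}' activations")
    plt.show()
```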