The development of synthetic child speech in three South African languages

Camryn Terblanche, Division of Communication Sciences and Disorders, University of Cape Town, SA. Correspondence: [email protected]

https://orcid.org/0000-0002-2854-0346

Tyler T. Schnoor, Department of Linguistics, University of Alberta, Canada

https://orcid.org/0009-0006-5297-655X

Michal Harty, Division of Communication Sciences and Disorders, University of Cape Town, SA

https://orcid.org/0000-0002-6152-5551

Benjamin V. Tucker, Department of Linguistics, University of Alberta, Canada; Department of Communication Sciences and Disorders, Northern Arizona University, USA

https://orcid.org/0000-0001-8965-7890

Figures & data

Figure 1. Overview of the Process to Generate Synthetic Child Speech for Each South African Language Using Tacotron 2.

Figure 2. MOS Responses, with Reference to Speaker, Language, and Warm Start Type.

Note. This figure presents mean opinion score (MOS) responses grouped by speaker type (adult and child), language (Afrikaans/AFR, South African English/SAE, and isiXhosa/XHO), and warm start type (North American English/NAE and South African/SA). The vertical axis shows MOS responses from 0 (completely unnatural) to 4 (completely natural); the horizontal axis shows the languages. Each language is shown for each combination of speaker and warm start type, giving an overview of perceived speech synthesis quality across these dimensions.
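
As a rough illustration of how ratings on the 0 to 4 scale described in this note can be summarized, the sketch below averages scores per combination of speaker type, language, and warm start type. The function name `mean_opinion_scores` and the toy ratings are hypothetical and are not taken from the study, whose analysis (Tables 1 and 2) reports fixed effects coefficients rather than plain averages.

```python
from collections import defaultdict
from statistics import mean


def mean_opinion_scores(ratings):
    """Group (condition, score) pairs and return the mean score per condition."""
    by_condition = defaultdict(list)
    for condition, score in ratings:
        # The scale in the figure note runs from 0 to 4.
        if not 0 <= score <= 4:
            raise ValueError(f"score {score} is outside the 0-4 scale")
        by_condition[condition].append(score)
    return {cond: mean(scores) for cond, scores in by_condition.items()}


# Hypothetical ratings for illustration only; these are NOT data from the study.
# Each condition is (speaker type, language, warm start type).
toy_ratings = [
    (("child", "AFR", "SA"), 3),
    (("child", "AFR", "SA"), 2),
    (("child", "AFR", "NAE"), 1),
    (("child", "AFR", "NAE"), 2),
]
toy_mos = mean_opinion_scores(toy_ratings)
```

Keying each rating by the full condition tuple keeps the grouping aligned with the three dimensions named in the note.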

Table 1. Fixed effects coefficients of all the voices.


Table 2. Fixed effects coefficients of the child voices.


Figure 3. Tacotron 2 Mel-Spectrogram (a) and Alignment (b) Plots of Synthesized Speech: “The Quick Brown Fox Jumped Over the Lazy Dog”.

Note. The mel-spectrogram is a spectrogram with the mel scale as its y-axis. It is a good indicator of the signal strength at various frequencies in the waveform. The alignment plot is a quick way to visualize a model’s success. A straight diagonal line from the bottom left to the top right is a good indicator that the model is producing something similar to speech.
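
The two ideas in this note can be sketched in a few lines. The example below assumes the HTK-style mel formula, which is one common convention and not necessarily the one used in the authors' pipeline, and it uses a simple monotonicity check as a stand-in for "a straight diagonal line" in the alignment plot. The names `hz_to_mel`, `mel_to_hz`, and `is_monotonic_alignment` are illustrative, not part of Tacotron 2.

```python
import math


def hz_to_mel(freq_hz):
    """Convert frequency in Hz to mels (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def is_monotonic_alignment(attention):
    """Return True if the attention peak never moves backwards.

    `attention` is a list of rows, one per decoder step, each holding the
    attention weights over encoder steps. A roughly diagonal alignment plot
    corresponds to peaks that advance steadily through the input.
    """
    peaks = [max(range(len(row)), key=row.__getitem__) for row in attention]
    return all(later >= earlier for earlier, later in zip(peaks, peaks[1:]))


# Toy attention matrices (hypothetical, for illustration only).
diagonal = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]]
scrambled = [[0.0, 0.1, 0.9], [0.1, 0.8, 0.1], [0.9, 0.1, 0.0]]
```

A monotonicity check is deliberately coarse: it flags alignments that jump backwards but does not measure how straight the diagonal is.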

Supplemental material

2023_0083_Supplemental_Material.docx


