References
- D. Snyder, P. Ghahremani, D. Povey, et al., “Deep neural network-based speaker embeddings for end-to-end speaker verification,” in IEEE Spoken Language Technology Workshop (SLT), Dec 2016.
- S. O. Arik, M. Chrzanowski, A. Coates, et al., “Deep Voice: Real-time neural text-to-speech,” Feb 2017.
- S. Arik, G. Diamos, A. Gibiansky, et al., “Deep Voice 2: Multi-speaker neural text-to-speech,” in NIPS, 2017.
- T. Capes, P. Coles, A. Conkie, et al., “Siri on-device deep learning-guided unit selection text-to-speech system,” in Interspeech, 2017.
- J. Sotelo, S. Mehri, K. Kumar, et al., “Char2wav: End-to-end speech synthesis,” in ICLR workshop, 2017.
- W. Wang, S. Xu, and B. Xu, “First step towards end-to-end parametric TTS synthesis: Generating spectral parameters with neural attention,” in Proc. Interspeech, 2016, pp. 2243–2247.
- D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014.
- H. Valbret, E. Moulines, and J. P. Tubach, “Voice transformation using PSOLA technique,” in Proc. IEEE ICASSP, 1992, pp. 175–187.
- J. Simonin, L. Delphin-Poulat, and G. Damnati, “Gaussian density tree structure in a multi-Gaussian HMM-based speech recognition system,” in Proc. Int. Conf. Spoken Language Processing, 1998.
- M. Tamura, T. Masuko, K. Tokuda, et al., “Speaker adaptation for HMM-based speech synthesis system using MLLR,” in Proc. ESCA/COCOSDA Workshop on Speech Synthesis, Blue Mountains, Australia, 1998, pp. 273–276.
- A. van den Oord, S. Dieleman, H. Zen, et al., “WaveNet: A generative model for raw audio,” Sep 2016.
- Y. Wang, R. Skerry-Ryan, D. Stanton, et al., “Tacotron: Towards end-to-end speech synthesis,” in Interspeech, Aug 2017.
- J. Shen, R. Pang, R. J. Weiss, et al., “Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions,” Dec 2017.
- W. Ping, K. Peng, A. Gibiansky, et al., “Deep Voice 3: Scaling text-to-speech with convolutional sequence learning,” in ICLR, 2018.
- S. Mehri, K. Kumar, I. Gulrajani, et al., “SampleRNN: An unconditional end-to-end neural audio generation model,” in ICLR, 2017.
- Y. Taigman, L. Wolf, A. Polyak, et al., “Voice synthesis for in-the-wild speakers via a phonological loop,” arXiv preprint arXiv:1707.06588, 2017.
- X. Gonzalvo, S. Tazari, C. Chan, et al., “Recent advances in Google real-time HMM-driven unit selection synthesizer,” in Interspeech, 2016.
- Y. N. Dauphin, A. Fan, M. Auli, et al., “Language modeling with gated convolutional networks,” in ICML, 2017.
- A. Vaswani, N. Shazeer, N. Parmar, et al., “Attention is all you need,” arXiv preprint arXiv:1706.03762, 2017.
- J. Gehring, M. Auli, D. Grangier, et al., “Convolutional sequence to sequence learning,” in ICML, 2017.
- C. Raffel, M. T. Luong, P. J. Liu, et al., “Online and linear-time attention by enforcing monotonic alignments,” in ICML, 2017.
- J. Lorenzo-Trueba, F. Fang, X. Wang, et al., “Can we steal your vocal identity from the Internet: Initial investigation of cloning Obama’s voice using GAN, WaveNet and low-quality found data,” in Proc. Odyssey 2018: The Speaker and Language Recognition Workshop, 2018, pp. 240–247.
- G. K. Anumanchipalli, J. Chartier, and E. F. Chang, “Speech synthesis from neural decoding of spoken sentences,” Nature, vol. 568, no. 7753, pp. 493–498, 2019.