Research Article

Towards Personalized Speech Synthesis for Augmentative and Alternative Communication

Pages 226-236 | Received 10 Jul 2013, Accepted 01 Feb 2014, Published online: 15 Jul 2014

References

  • Alku, P. (1992). Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering. Speech Communication, 11, 109–118.
  • Arslan, L. (1999). Speaker transformation algorithm using segmental codebooks (STASC). Speech Communication, 28, 211–226.
  • Bachorowski, J., & Owren, M. (1999). Acoustic correlates of talker sex and individual talker identity are present in a short vowel segment produced in running speech. Journal of the Acoustical Society of America, 106, 1054–1063.
  • Benoît, C., Grice, M., & Hazan, V. (1996). The SUS test: A method for the assessment of text-to-speech synthesis intelligibility using semantically unpredictable sentences. Speech Communication, 18, 381–392.
  • Beutnagel, M., Conkie, A., & Syrdal, A. K. (1998, November). Diphone synthesis using unit selection. Paper presented at the 3rd ISCA Speech Synthesis Workshop (SSW3), Blue Mountains, Australia. Retrieved from http://www.isca-speech.org/archive_open/archive_papers/ssw3/ssw3_185.pdf
  • Bloomberg, K., & Johnson, H. (1990). A statewide demographic survey of people with severe communication impairments. Augmentative and Alternative Communication, 6, 50–60.
  • Bolinger, D. (1961). Contrastive accent and contrastive stress. Language, 37, 83–96.
  • Bolinger, D. (1989). Intonation and its uses. Stanford, CA: Stanford University Press.
  • Brophy-Arnott, M. B., Newell, A. F., Arnott, J. L., & Condie, D. (1992). A survey of the communication-impaired population of Tayside. European Journal of Disorders of Communication, 25, 159–173.
  • Bunnell, H. T. (2010, September). Crafting small databases for unit selection TTS: Effects on intelligibility. Paper presented at the 7th ISCA Speech Synthesis Workshop (SSW7), Kyoto, Japan. Retrieved from http://isw3.naist.jp/~tomoki/ssw7/www/doc/ssw7_proceedings_rev.pdf
  • Bunnell, H. T., Hoskins, S. R., & Yarrington, D. M. (1998, November). A biphone constrained concatenation method for diphone synthesis. Paper presented at the 3rd ISCA Speech Synthesis Workshop (SSW3), Blue Mountains, Australia. Retrieved from http://www.isca-speech.org/archive_open/archive_papers/ssw3/ssw3_171.pdf
  • Bunnell, H. T., & Lilley, J. (2007, August). Analysis methods for assessing TTS intelligibility. Paper presented at the 6th ISCA Speech Synthesis Workshop (SSW6), Bonn, Germany. Retrieved from http://www.isca-speech.org/archive_open/archive_papers/ssw6/ssw6_374.pdf
  • Bunnell, H. T., & Pennington, C. (2010). Advances in computer speech synthesis and implications for assistive technologies. In J. Mullenix & S. Stern (Eds.), Computer synthesized speech technologies: Tools for aiding impairment (pp. 71–91). Hershey, PA: IGI Global.
  • Bunnell, H. T., Pennington, C., Yarrington, D., & Gray, J. (2005, September). Automatic personal synthetic voice construction. Paper presented at Eurospeech 2005, Lisbon, Portugal, 89–92. Retrieved from http://www.isca-speech.org/archive/interspeech_2005/i05_0089.html
  • Carrell, T. D. (1984). Contributions of fundamental frequency, formant spacing, and glottal waveform to talker identification (Doctoral dissertation). Indiana University, Bloomington, IN, USA.
  • Carrell, T. D. (1985). Effects of glottal waveform on the perception of talker sex. Journal of the Acoustical Society of America, 70, S97.
  • Chen, Y., Chu, M., Chang, E., Liu, J., & Liu, R. (2003, September). Voice conversion with smoothed GMM and MAP adaptation. Paper presented at Eurospeech 2003, Geneva, Switzerland. Retrieved from: http://www.isca-speech.org/archive/eurospeech_2003/e03_2413.html
  • Chiba, T., & Kajiyama, J. (1941). The vowel: Its nature and structure. Tokyo, Japan: Tokyo-Kaiseikan.
  • Collins, S. (2000). Men's voices and women's choices. Animal Behaviour, 60, 773–780.
  • Cossette, L., & Duclos, É. (2002). A profile of disability in Canada, 2001 (89-577-XIE). Ottawa, Canada: Statistics Canada. Retrieved from http://www.statcan.gc.ca/pub/89-577-x/pdf/4228016-eng.pdf
  • Creer, S., Green, P., Cunningham, S., & Yamagishi, J. (2010). Building personalized synthetic voices for individuals with dysarthria using the HTS toolkit. In J. Mullenix & S. Stern (Eds.), Computer synthesized speech technologies: Tools for aiding impairment (pp. 92–115). Hershey, PA: IGI Global.
  • Cruttenden, A. (1981). Falls and rises: Meanings and universals. Journal of Linguistics, 17, 77–91.
  • Cruttenden, A. (1986). Intonation. Cambridge, UK: Cambridge University Press.
  • Desai, S., Black, A. W., Yegnanarayana, B., & Prahallad, K. (2010). Spectral mapping using artificial neural networks for voice conversion. IEEE Transactions on Audio, Speech, and Language Processing, 18, 954–964.
  • Eatock, J., & Mason, J. (1994, April). A quantitative assessment of the relative speaker discriminating properties of phonemes. Paper presented at the 1994 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Adelaide, Australia. doi:10.1109/ICASSP.1994.389337
  • Egan, J. P. (1948). Articulation testing methods. The Laryngoscope, 58, 955–991.
  • Erro, D., Moreno, A., & Bonafonte, A. (2010a). Voice conversion based on weighted frequency warping. IEEE Transactions on Audio, Speech, and Language Processing, 18, 922–931.
  • Erro, D., Moreno, A., & Bonafonte, A. (2010b). INCA algorithm for training voice conversion systems from nonparallel corpora. IEEE Transactions on Audio, Speech, and Language Processing, 18, 944–953.
  • Fant, G. (1960). Acoustic theory of speech production. The Hague, Netherlands: Mouton & Co.
  • Fant, G. (1966). A note on vocal tract size factors and non-uniform F-pattern scaling. Speech Transmission Laboratories Quarterly Progress Status Report, 7(4), 22–30. Stockholm, Sweden: KTH Royal Institute of Technology. Retrieved from http://www.speech.kth.se/prod/publications/files/qpsr/1966/1966_7_4_022-030.pdf
  • Fant, G., Liljencrants, J., & Lin, Q. (1985). A four-parameter model of glottal flow. Speech Transmission Laboratories Quarterly Progress Status Report, 26(4), 1–13. Stockholm, Sweden: KTH Royal Institute of Technology. Retrieved from http://www.speech.kth.se/prod/publications/files/qpsr/1985/1985_26_4_001-013.pdf
  • Feinberg, D. R., Jones, B. C., Little, A. C., Burt, D. M., & Perrett, D. I. (2005). Manipulations of fundamental and formant frequencies influence the attractiveness of human male voices. Animal Behaviour, 69, 561–568.
  • Fellowes, J. M., Remez, R. E., & Rubin, P. E. (1997). Perceiving the sex and identity of a talker without natural vocal timbre. Perception and Psychophysics, 59, 839–849.
  • Fitch, W. T., & Giedd, J. (1999). Morphology and development of the human vocal tract: A study using magnetic resonance imaging. Journal of the Acoustical Society of America, 106, 1511–1522.
  • Gorenflo, C., Gorenflo, D., & Santer, S. A. (1994). Effects of synthetic voice output on attitudes toward the augmented communicator. Journal of Speech and Hearing Research, 37, 64–68.
  • Hartman, D., & Danhauer, J. (1976). Perceptual features of speech for males in four perceived age decades. Journal of the Acoustical Society of America, 59, 713–715.
  • Helander, E., Nurminen, J., & Gabbouj, M. (2008, April). LSF mapping for voice conversion with very small training sets. Paper presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Las Vegas, NV.
  • Helander, E., Silén, H., Míguez, J., & Gabbouj, M. (2010, September). Maximum a posteriori voice conversion using sequential Monte Carlo methods. Paper presented at Interspeech 2010, Makuhari, Japan.
  • Helander, E., Silén, H., Virtanen, T., & Gabbouj, M. (2012). Voice conversion using dynamic kernel partial least squares regression. IEEE Transactions on Audio, Speech, and Language Processing, 20, 806–817.
  • Hertz, S. R. (2006, September). A model of the regularities underlying speaker variation: Evidence from hybrid synthesis. Paper presented at the Ninth International Conference on Spoken Language Processing (ICSLP), Pittsburgh, PA. Retrieved from http://www.novaspeech.com/Documents/interspeech2006.pdf
  • Hollien, H., & Klepper, B. (1984). The speaker identification problem. Advances in Forensic Psychology and Psychiatry, 1, 87–111.
  • Itoh, K. (1992). Perceptual analysis of speaker identity. In S. Saito (Ed.), Speech science and technology (pp. 133–145). Burke, VA: IOS Press.
  • Jassem, W. (1971). Pitch and compass of the speaking voice. Journal of the International Phonetic Association, 1, 59–68.
  • Jreige, C., Patel, R., & Bunnell, H. T. (2009). VocaliD: Personalizing text-to-speech synthesis for individuals with severe speech impairment. In Assets ’09: Proceedings of the 11th International ACM SIGACCESS Conference on Computers and Accessibility (pp. 259–260). New York, NY: ACM. doi:10.1145/1639642.1639704
  • Kain, A., & Macon, M. W. (1998, May). Spectral voice conversion for text-to-speech synthesis. Paper presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Seattle, WA, 285–288. doi:10.1109/ICASSP.1998.674423
  • Kain, A., & Macon, M. W. (2001, May). Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction. Paper presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Salt Lake City, UT. doi:10.1109/ICASSP.2001.941039
  • Kain, A., Niu, X., Hosom, J.-P., Miao, Q., & van Santen, J. P. H. (2004, June). Formant re-synthesis of dysarthric speech. Paper presented at the 5th ISCA Speech Synthesis Workshop (SSW5), Pittsburgh, PA. Retrieved from http://www.isca-speech.org/archive_open/ssw5/ssw5_025.html
  • Kain, A., & van Santen, J. (2009, April). Using speech transformation to increase speech intelligibility for the hearing- and speaking-impaired. Paper presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan. doi:10.1109/icassp.2009.4960406
  • Kawahara, H., Masuda-Katsuse, I., & de Cheveigné, A. (1999). Restructuring speech representations using a pitch-adaptive time–frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds. Speech Communication, 27, 187–207.
  • King, S., & Karaiskos, V. (2009, September). The Blizzard Challenge 2009. Paper presented at the Blizzard Challenge Workshop, Edinburgh, UK.
  • Kreiman, J., & Papcun, G. (1991). Comparing discrimination and recognition of unfamiliar voices. Speech Communication, 10, 265–275. doi:10.1016/0167-6393(91)90016-M
  • Kuhn, R., Junqua, J.-C., Nguyen, P., & Niedzielski, N. (2000). Rapid speaker adaptation in eigenvoice space. IEEE Transactions on Speech and Audio Processing, 8, 695–707.
  • Ladefoged, P., & Ladefoged, J. (1980). The ability of listeners to identify voices. UCLA Working Papers in Phonetics, 49, 43–51. Los Angeles, CA: UCLA Phonetics Lab.
  • Lass, N. J., Ruscello, D. M., & Lakawicz, J. A. (1988). Listeners’ perceptions of nonspeech characteristics of normal and dysarthric children. Journal of Communication Disorders, 21, 385–391.
  • Lavner, Y., Gath, I., & Rosenhouse, J. (2000). The effects of acoustic modifications on the identification of familiar voices speaking isolated vowels. Speech Communication, 30, 9–26.
  • Lehiste, I. (1970). Suprasegmentals. Cambridge, MA: MIT Press.
  • Lehiste, I. (1976). Suprasegmental features of speech. In N.J. Lass (Ed.), Contemporary issues in experimental phonetics (pp. 225–239). New York, NY: Academic Press.
  • Ling, Z.-H., Richmond, K., Yamagishi, J., & Wang, R.-H. (2009). Integrating articulatory features into HMM-based parametric speech synthesis. IEEE Transactions on Audio, Speech, and Language Processing, 17, 1171–1185. doi:10.1109/TASL.2009.2014796
  • Ling, Z.-H., Richmond, K., & Yamagishi. J. (2010a). An analysis of HMM-based prediction of articulatory movements. Speech Communication, 52, 834–846.
  • Ling, Z.-H., Richmond, K., & Yamagishi, J. (2010b, September). HMM-based text-to-articulation-movement prediction and analysis of critical articulators. Paper presented at Interspeech 2010, Makuhari, Japan. Retrieved from http://hdl.handle.net/1842/4563
  • Linville, S. (1998). Acoustic correlates of perceived versus actual sexual orientation in men's speech. Folia Phoniatrica et Logopaedica, 50, 35–48.
  • Masuko, T., Tokuda, K., Kobayashi, T., & Imai, S. (1997, April). Voice characteristics conversion for HMM-based speech synthesis system. Paper presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Munich, Germany.
  • Matas, J., Mathy-Laikko, P., Beukelman, D., & Legresley, K. (1985). Identifying the nonspeaking population: A demographic study. Augmentative and Alternative Communication, 1, 17–31.
  • Monsen, R. B., & Engebretson, A. M. (1977). Study of variations in the male and female glottal wave. Journal of the Acoustical Society of America, 62, 981–993.
  • Moulines, E., & Charpentier, F. (1990). Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Communication, 9, 453–467.
  • Munson, B., McDonald, E. C., DeBoe, N. L., & White, A. R. (2006). The acoustic and perceptual bases of judgments of women and men's sexual orientation from read speech. Journal of Phonetics, 34, 202–240.
  • Murray, I. R., & Arnott, J. L. (1993). Toward the simulation of emotion in synthetic speech: A review of the literature on human vocal emotion. Journal of the Acoustical Society of America, 93, 1097–1108. doi:10.1121/1.405558
  • Muthukumar, P. K., Black, A. W., & Bunnell, H. T. (2013, August). Optimizations and fitting procedures for the Liljencrants-Fant model for statistical parametric speech synthesis. Paper presented at Interspeech 2013, Lyon, France. Retrieved from http://www.isca-speech.org/archive/interspeech_2013/i13_0397.html
  • Narendranath, M., Murthy, H. A., Rajendran, S., & Yegnanarayana, B. (1995). Transformation of formants for voice conversion using artificial neural networks. Speech Communication, 16, 207–216.
  • Nass, C., & Lee, K. M. (2001). Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency attraction. Journal of Experimental Psychology: Applied, 7, 171–181.
  • Netsell, R. (1973). Speech Physiology. In F. Minifie, T. J. Hixon, & F. Williams (Eds.), Normal aspects of speech, hearing, and language (pp. 211–234). Englewood Cliffs, NJ: Prentice-Hall.
  • Nguyen, B. P., & Akagi, M. (2008). Phoneme-based spectral voice conversion using temporal decomposition and Gaussian mixture model. Proceedings of the Second International Conference on Communications and Electronics, 224–229. doi:10.1109/CCE.2008.4578962
  • Nolan, F. (1983). The phonetic bases of speaker recognition. Cambridge, UK: Cambridge University Press.
  • Nurminen, J., Popa, V., Tian, J., Tang, Y., & Kiss, I. (2006, June). A parametric approach for voice conversion. Paper presented at the TC-STAR Workshop on Speech-to-Speech Translation, Barcelona, Spain. Retrieved from http://www.tcstar.org/pubblicazioni/scientific_publications/Nokia/2006/S2STranslation06_nokia3.pdf
  • Nygaard, L. C., Sommers, M. S., & Pisoni, D. B. (1994). Speech perception as a talker-contingent process. Psychological Science, 5, 42–46.
  • Patel, R. (2002a). Phonatory control in adults with cerebral palsy and severe dysarthria. Augmentative and Alternative Communication, 18, 2–10.
  • Patel, R. (2002b). Prosodic control in severe dysarthria: Preserved ability to mark the question-statement contrast. Journal of Speech, Language, and Hearing Research, 45, 858–870.
  • Patel, R. (2003). Acoustic characteristics of the question-statement contrast in severe dysarthria due to cerebral palsy. Journal of Speech, Language, and Hearing Research, 46, 1401–1415.
  • Patel, R. (2004). The acoustics of contrastive prosody in adults with cerebral palsy. Journal of Medical Speech-Language Pathology, 12, 189–193.
  • Patel, R., & Roden, A. (2008). Intelligibility and attitudes toward a speech synthesizer using dysarthric vocalizations. Journal of Medical Speech-Language Pathology, 16, 243–249.
  • Patel, R., & Salata, A. (2006). Using computer games to mediate caregiver-child communication for children with severe dysarthria. Journal of Medical Speech-Language Pathology, 14, 279–284.
  • Patel, R., & Watkins, C. (2007). Stress identification in speakers with dysarthria due to cerebral palsy: An initial report. Journal of Medical Speech-Language Pathology, 15, 149–159.
  • Pierrehumbert, J., Bent, T., Munson, B., Bradlow, A. R., & Bailey, J. M. (2004). The influence of sexual orientation on vowel production. Journal of the Acoustical Society of America, 116, 1905–1908.
  • Popa, V., Silen, H., Nurminen, J., & Gabbouj, M. (2012, March). Local linear transformation for voice conversion. Paper presented at the 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Kyoto, Japan. doi:10.1109/ICASSP.2012.6288922
  • Prasanna, S. R. M., Gupta, C. S., & Yegnanarayana, B. (2006). Extraction of speaker-specific excitation information from linear prediction residual of speech. Speech Communication, 48, 1243–1261. doi:10.1016/j.specom.2006.06.002
  • Raitio, T., Suni, A., Yamagishi, J., Pulakka, H., Nurminen, J., Vainio, M., & Alku, P. (2011). HMM-based speech synthesis utilizing glottal inverse filtering. IEEE Transactions on Audio, Speech, and Language Processing, 19, 153–165. doi:10.1109/TASL.2010.2045239
  • Remez, R. E., Fellowes, J. M., & Rubin, P. E. (1997). Talker identification based on phonetic information. Journal of Experimental Psychology: Human Perception and Performance, 23, 651–666.
  • Remez, R. E., & Rubin, P. E. (1993). On the intonation of sinusoidal sentences: Contour and pitch height. Journal of the Acoustical Society of America, 94, 1983–1988.
  • Rentzos, D., Vaseghi, S., Yan, W., & Ho, C.-H. (2004, May). Voice conversion through transformation of spectral and intonation features. Paper presented at the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Montreal, Canada. doi:10.1109/ICASSP.2004.1325912
  • Sagisaka, Y. (1988, May). Speech synthesis by rule using an optimal selection of non-uniform synthesis units. Paper presented at the 1988 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), New York, NY. doi:10.1109/ICASSP.1988.196677
  • Shuang, Z-W., Bakis, R., Shectman, S., Chazan, D., & Qin, Y. (2006, September). Frequency warping based on mapping formant parameters. Paper presented at Interspeech 2006, Pittsburgh, PA. http://www.isca-speech.org/archive/interspeech_2006/i06_1768.html
  • Sigafoos, J., Schlosser, R. W., & Sutherland, D. (2013). Augmentative and alternative communication. In J. H. Stone & M. Blouin (Eds.), International encyclopedia of rehabilitation. Retrieved from http://cirrie.buffalo.edu/encyclopedia/en/article/50
  • Siu, E., Tam, E., Sin, D., Ng, C., Lam, E., Chui, M., & Lam, C. (2010). A survey of augmentative and alternative communication service provision in Hong Kong. Augmentative and Alternative Communication, 26, 289–298.
  • Smyth, R., Jacobs, G., & Rogers, H. (2003). Male voices and perceived sexual orientation: An experimental and theoretical approach. Language in Society, 32, 329–350.
  • Stevens, K. N. (1998). Acoustic phonetics. Cambridge, MA: MIT Press.
  • Stylianou, Y., Dutoit, T., & Schroeter, J. (1997, September). Diphone concatenation using a harmonic plus noise model of speech. Paper presented at Eurospeech 1997, Rhodes, Greece. Retrieved from: http://www.isca-speech.org/archive/eurospeech_1997/e97_0613.html
  • Sündermann, D., Ney, H., & Höge, H. (2003, December). VTLN-based cross-language voice conversion. Paper presented at the 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), St. Thomas, Virgin Islands. doi:10.1109/ASRU.2003.1318521
  • Syrdal, A. K., Bunnell, H. T., Hertz, S. R., Mishra, T., Spiegel, M., Bickley, C., . . . Makashay, M. J. (2012, September). Text-to-speech intelligibility across speech rates. Paper presented at Interspeech 2012, Portland, OR. Retrieved from http://www.isca-speech.org/archive/interspeech_2012/i12_0623.html
  • Takeda, K., Abe, K., & Sagisaka, Y. (1992). On the basic scheme and algorithms in non-uniform unit speech synthesis. In G. Bailly, C. Benoît, & T. R. Sawallis (Eds.), Talking machines: Theories, models, and designs (pp. 93–105). Amsterdam, The Netherlands: North-Holland Publishing Co.
  • Tamura, M., Masuko, T., Tokuda, K., & Kobayashi, T. (2001, May). Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR. Paper presented at the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Salt Lake City, UT. doi:10.1109/ICASSP.2001.941037
  • Toda, T., Lu, J., Saruwatari, H., & Shikano, K. (2000, October). STRAIGHT-based voice conversion algorithm based on Gaussian mixture model. Paper presented at the Sixth International Conference on Spoken Language Processing (ICSLP), Beijing, China. Retrieved from http://hdl.handle.net/10061/8187
  • Toda, T., Ohtani, Y., & Shikano, K. (2007a, April). One-to-many and many-to-one voice conversion based on eigenvoices. Paper presented at the 2007 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Honolulu, HI. doi:10.1109/ICASSP.2007.367303
  • Toda, T., Black, A. W., & Tokuda, K. (2007b). Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory. IEEE Transactions on Audio, Speech, and Language Processing, 15, 2222–2235. doi:10.1109/tasl.2007.907344
  • Tokuda, K., Yoshimura, T., Masuko, T., Kobayashi, T., & Kitamura, T. (2000, June). Speech parameter generation algorithms for HMM-based speech synthesis. Paper presented at the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Istanbul, Turkey. doi:10.1109/ICASSP.2000.861820
  • Walton, J., & Orlikoff, R. (1994). Speaker race identification from acoustic cues in the vocal signal. Journal of Speech and Hearing Research, 37, 738–745.
  • Watts, O., Yamagishi, J., Berkling, K., & King, S. (2008, October). HMM-based synthesis of child speech. Paper presented at the First Workshop on Child, Computer and Interaction (ICMI’08 post-conference workshop), Chania, Greece. Retrieved from: http://hdl.handle.net/1842/3817
  • Watts, O., Yamagishi, J., King, S., & Berkling, K. (2009, September). HMM adaptation and voice conversion for the synthesis of child speech: A comparison. Paper presented at Interspeech 2009, Brighton, United Kingdom. Retrieved from http://www.isca-speech.org/archive/interspeech_2009/i09_2627.html
  • Watts, O., Yamagishi, J., King, S., & Berkling, K. (2010). Synthesis of child speech with HMM adaptation and voice conversion. IEEE Transactions on Audio, Speech, and Language Processing, 18, 1005–1016. doi:10.1109/TASL.2009.2035029
  • Yamagishi, J., & Kobayashi, T. (2007). Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training. IEICE Transactions on Information and Systems, E90-D, 533–543.
  • Yamagishi, J., Nose, T., Zen, H., Ling, Z., Toda, T., Tokuda, K., . . . Renals, S. (2009). A robust speaker-adaptive HMM-based text-to-speech synthesis. IEEE Transactions on Audio, Speech, and Language Processing, 17, 1208–1230. doi:10.1109/TASL.2009.2016394
  • Zen, H., Tokuda, K., & Black, A. W. (2009). Statistical parametric speech synthesis. Speech Communication, 51, 1039–1064.
  • Zuckerman, M., & Miyake, K. (1993). The attractive voice: What makes it so? Journal of Nonverbal Behavior, 17, 119–135.
