Research Article

Towards Personalized Speech Synthesis for Augmentative and Alternative Communication

Pages 226-236 | Received 10 Jul 2013, Accepted 01 Feb 2014, Published online: 15 Jul 2014

References

  • Alku, P. (1992). Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering. Speech Communication, 11, 109–118.
  • Arslan, L. (1999). Speaker transformation algorithm using segmental codebooks (STASC). Speech Communication, 28, 211–226.
  • Bachorowski, J., & Owren, M. (1999). Acoustic correlates of talker sex and individual talker identity are present in a short vowel segment produced in running speech. Journal of the Acoustical Society of America, 106, 1054–1063.
  • Benoît, C., Grice, M., & Hazan, V. (1996). The SUS test: A method for the assessment of text-to-speech synthesis intelligibility using semantically unpredictable sentences. Speech Communication, 18, 381–392.
  • Beutnagel, M., Conkie, A., & Syrdal, A. K. (1998, November). Diphone synthesis using unit selection. Paper presented at the 3rd ISCA Speech Synthesis Workshop (SSW3), Blue Mountains, Australia. Retrieved from http://www.isca-speech.org/archive_open/archive_papers/ssw3/ssw3_185.pdf
  • Bloomberg, K., & Johnson, H. (1990). A statewide demographic survey of people with severe communication impairments. Augmentative and Alternative Communication, 6, 50–60.
  • Bolinger, D. (1961). Contrastive accent and contrastive stress. Language, 37, 83–96.
  • Bolinger, D. (1989). Intonation and its uses. Stanford, CA: Stanford University Press.
  • Brophy-Arnott, M. B., Newell, A. F., Arnott, J. L., & Condie, D. (1992). A survey of the communication-impaired population of Tayside. European Journal of Disorders of Communication, 25, 159–173.
  • Bunnell, H. T. (2010, September). Crafting small databases for unit selection TTS: Effects on intelligibility. Paper presented at the 7th ISCA Speech Synthesis Workshop (SSW7), Kyoto, Japan. Retrieved from http://isw3.naist.jp/~tomoki/ssw7/www/doc/ssw7_proceedings_rev.pdf
  • Bunnell, H. T., Hoskins, S. R., & Yarrington, D. M. (1998, November). A biphone constrained concatenation method for diphone synthesis. Paper presented at the 3rd ISCA Speech Synthesis Workshop (SSW3), Blue Mountains, Australia. Retrieved from http://www.isca-speech.org/archive_open/archive_papers/ssw3/ssw3_171.pdf
  • Bunnell, H. T., & Lilley, J. (2007, August). Analysis methods for assessing TTS intelligibility. Paper presented at the 6th ISCA Speech Synthesis Workshop (SSW6), Bonn, Germany. Retrieved from http://www.isca-speech.org/archive_open/archive_papers/ssw6/ssw6_374.pdf
  • Bunnell, H. T., & Pennington, C. (2010). Advances in computer speech synthesis and implications for assistive technologies. In J. Mullenix & S. Stern (Eds.), Computer synthesized speech technologies: Tools for aiding impairment (pp. 71–91). Hershey, PA: IGI Global.
  • Bunnell, H. T., Pennington, C., Yarrington, D., & Gray, J. (2005, September). Automatic personal synthetic voice construction. Paper presented at Eurospeech 2005, Lisbon, Portugal, 89–92. Retrieved from http://www.isca-speech.org/archive/interspeech_2005/i05_0089.html
  • Carrell, T. D. (1984). Contributions of fundamental frequency, formant spacing, and glottal waveform to talker identification (Doctoral dissertation). Indiana University, Bloomington, IN, USA.
  • Carrell, T. D. (1985). Effects of glottal waveform on the perception of talker sex. Journal of the Acoustical Society of America, 70, S97.
  • Chen, Y., Chu, M., Chang, E., Liu, J., & Liu, R. (2003, September). Voice conversion with smoothed GMM and MAP adaptation. Paper presented at Eurospeech 2003, Geneva, Switzerland. Retrieved from: http://www.isca-speech.org/archive/eurospeech_2003/e03_2413.html
  • Chiba, T., & Kajiyama, J. (1941). The vowel: Its nature and structure. Tokyo, Japan: Tokyo-Kaiseikan.
  • Collins, S. (2000). Men's voices and women's choices. Animal Behaviour, 60, 773–780.
  • Cossette, L., & Duclos, É. (2002). A profile of disability in Canada, 2001 (89-577-XIE). Ottawa, Canada: Statistics Canada. Retrieved from http://www.statcan.gc.ca/pub/89-577-x/pdf/4228016-eng.pdf
  • Creer, S., Green, P., Cunningham, S., & Yamagishi, J. (2010). Building personalized synthetic voices for individuals with dysarthria using the HTS toolkit. In J. Mullenix & S. Stern (Eds.), Computer synthesized speech technologies: Tools for aiding impairment (pp. 92–115). Hershey, PA: IGI Global.
  • Cruttenden, A. (1981). Falls and rises: Meanings and universals. Journal of Linguistics, 17, 77–91.
  • Cruttenden, A. (1986). Intonation. Cambridge, UK: Cambridge University Press.
  • Desai, S., Black, A. W., Yegnanarayana, B., & Prahallad, K. (2010). Spectral mapping using artificial neural networks for voice conversion. IEEE Transactions on Audio, Speech, and Language Processing, 18, 954–964.
  • Eatock, J., & Mason, J. (1994, April). A quantitative assessment of the relative speaker discriminating properties of phonemes. Paper presented at the 1994 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Adelaide, Australia. doi:10.1109/ICASSP.1994.389337
  • Egan, J. P. (1948). Articulation testing methods. The Laryngoscope, 58, 955–991.
  • Erro, D., Moreno, A., & Bonafonte, A. (2010a). Voice conversion based on weighted frequency warping. IEEE Transactions on Audio, Speech, and Language Processing, 18, 922–931.
  • Erro, D., Moreno, A., & Bonafonte, A. (2010b). INCA algorithm for training voice conversion systems from nonparallel corpora. IEEE Transactions on Audio, Speech, and Language Processing, 18, 944–953.
  • Fant, G. (1960). Acoustic theory of speech production. The Hague, Netherlands: Mouton & Co.
  • Fant, G. (1966). A note on vocal tract size factors and non-uniform F-pattern scaling. Speech Transmission Laboratories Quarterly Progress Status Report, 7(4), 22–30. Stockholm, Sweden: KTH Royal Institute of Technology. Retrieved from http://www.speech.kth.se/prod/publications/files/qpsr/1966/1966_7_4_022-030.pdf
  • Fant, G., Liljencrants, J., & Lin, Q. (1985). A four-parameter model of glottal flow. Speech Transmission Laboratories Quarterly Progress Status Report, 26(4), 1–13. Stockholm, Sweden: KTH Royal Institute of Technology. Retrieved from http://www.speech.kth.se/prod/publications/files/qpsr/1985/1985_26_4_001-013.pdf
  • Feinberg, D. R., Jones, B. C., Little, A. C., Burt, D. M., & Perrett, D. I. (2005). Manipulations of fundamental and formant frequencies influence the attractiveness of human male voices. Animal Behaviour, 69, 561–568.
  • Fellowes, J. M., Remez, R. E., & Rubin, P. E. (1997). Perceiving the sex and identity of a talker without natural vocal timbre. Perception and Psychophysics, 59, 839–849.
  • Fitch, W. T., & Giedd, J. (1999). Morphology and development of the human vocal tract: A study using magnetic resonance imaging. Journal of the Acoustical Society of America, 106, 1511–1522.
  • Gorenflo, C., Gorenflo, D., & Santer, S. A. (1994). Effects of synthetic voice output on attitudes toward the augmented communicator. Journal of Speech and Hearing Research, 37, 64–68.
  • Hartman, D., & Danhauer, J. (1976). Perceptual features of speech for males in four perceived age decades. Journal of the Acoustical Society of America, 59, 713–715.
  • Helander, E., Nurminen, J., & Gabbouj, M. (2008, April). LSF mapping for voice conversion with very small training sets. Paper presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Las Vegas, NV.
  • Helander, E., Silén, H., Míguez, J., & Gabbouj, M. (2010, September). Maximum a posteriori voice conversion using sequential Monte Carlo methods. Paper presented at Interspeech 2010, Makuhari, Japan.
  • Helander, E., Silén, H., Virtanen, T., & Gabbouj, M. (2012). Voice conversion using dynamic kernel partial least squares regression. IEEE Transactions on Audio, Speech, and Language Processing, 20, 806–817.
  • Hertz, S. R. (2006, September). A model of the regularities underlying speaker variation: Evidence from hybrid synthesis. Paper presented at the Ninth International Conference on Spoken Language Processing (ICSLP), Pittsburgh, PA. Retrieved from http://www.novaspeech.com/Documents/interspeech2006.pdf
  • Hollien, H., & Klepper, B. (1984). The speaker identification problem. Advances in Forensic Psychology and Psychiatry, 1, 87–111.
  • Itoh, K. (1992). Perceptual analysis of speaker identity. In S. Saito (Ed.), Speech science and technology (pp. 133–145). Burke, VA: IOS Press.
  • Jassem, W. (1971). Pitch and compass of the speaking voice. Journal of the International Phonetic Association, 1, 59–68.
  • Jreige, C., Patel, R., & Bunnell, H. T. (2009). VocaliD: Personalizing text-to-speech synthesis for individuals with severe speech impairment. In Assets ’09: Proceedings of the 11th International ACM SIGACCESS Conference on Computers and Accessibility (pp. 259–260). New York, NY: ACM. doi:10.1145/1639642.1639704
  • Kain, A., & Macon, M. W. (1998, May). Spectral voice conversion for text-to-speech synthesis. Paper presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Seattle, WA, 285–288. doi:10.1109/ICASSP.1998.674423
  • Kain, A., & Macon, M. W. (2001, May). Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction. Paper presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Salt Lake City, UT. doi:10.1109/ICASSP.2001.941039
  • Kain, A., Niu, X., Hosom, J.-P., Miao, Q., & van Santen, J. P. H. (2004, June). Formant re-synthesis of dysarthric speech. Paper presented at the 5th ISCA Speech Synthesis Workshop (SSW5), Pittsburgh, PA. Retrieved from http://www.isca-speech.org/archive_open/ssw5/ssw5_025.html
  • Kain, A., & van Santen, J. (2009, April). Using speech transformation to increase speech intelligibility for the hearing- and speaking-impaired. Paper presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan. doi:10.1109/icassp.2009.4960406
  • Kawahara, H., Masuda-Katsuse, I., & de Cheveigné, A. (1999). Restructuring speech representations using a pitch-adaptive time–frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds. Speech Communication, 27, 187–207.
  • King, S., & Karaiskos, V. (2009, September). The Blizzard Challenge 2009. Paper presented at the Blizzard Challenge Workshop, Edinburgh, UK.
  • Kreiman, J., & Papcun, G. (1991). Comparing discrimination and recognition of unfamiliar voices. Speech Communication, 10, 265–275. doi:10.1016/0167-6393(91)90016-M
  • Kuhn, R., Junqua, J.-C., Nguyen, P., & Niedzielski, N. (2000). Rapid speaker adaptation in eigenvoice space. IEEE Transactions on Speech and Audio Processing, 8, 695–707.
  • Ladefoged, P., & Ladefoged, J. (1980). The ability of listeners to identify voices. UCLA Working Papers in Phonetics, 49, 43–51. Los Angeles, CA: UCLA Phonetics Lab.
  • Lass, N. J., Ruscello, D. M., & Lakawicz, J. A. (1988). Listeners’ perceptions of nonspeech characteristics of normal and dysarthric children. Journal of Communication Disorders, 21, 385–391.
  • Lavner, Y., Gath, I., & Rosenhouse, J. (2000). The effects of acoustic modifications on the identification of familiar voices speaking isolated vowels. Speech Communication, 30, 9–26.
  • Lehiste, I. (1970). Suprasegmentals. Cambridge, MA: MIT Press.
  • Lehiste, I. (1976). Suprasegmental features of speech. In N.J. Lass (Ed.), Contemporary issues in experimental phonetics (pp. 225–239). New York, NY: Academic Press.
  • Ling, Z.-H., Richmond, K., Yamagishi, J., & Wang, R.-H. (2009). Integrating articulatory features into HMM-based parametric speech synthesis. IEEE Transactions on Audio, Speech, and Language Processing, 17, 1171–1185. doi:10.1109/TASL.2009.2014796
  • Ling, Z.-H., Richmond, K., & Yamagishi. J. (2010a). An analysis of HMM-based prediction of articulatory movements. Speech Communication, 52, 834–846.
  • Ling, Z.-H., Richmond, K., & Yamagishi, J. (2010b, September). HMM-based text-to-articulation-movement prediction and analysis of critical articulators. Paper presented at Interspeech 2010, Makuhari, Japan. Retrieved from http://hdl.handle.net/1842/4563
  • Linville, S. (1998). Acoustic correlates of perceived versus actual sexual orientation in men's speech. Folia Phoniatrica et Logopaedica, 50, 35–48.
  • Masuko, T., Tokuda, K., Kobayashi, T., & Imai, S. (1997, April). Voice characteristics conversion for HMM-based speech synthesis system. Paper presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Munich, Germany.
  • Matas, J., Mathy-Laikko, P., Beukelman, D., & Legresley, K. (1985). Identifying the nonspeaking population: A demographic study. Augmentative and Alternative Communication, 1, 17–31.
  • Monsen, R. B., & Engebretson, A. M. (1977). Study of variations in the male and female glottal wave. Journal of the Acoustical Society of America, 62, 981–993.
  • Moulines, E., & Charpentier, F. (1990). Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Communication, 9, 453–467.
  • Munson, B., McDonald, E. C., DeBoe, N. L., & White, A. R. (2006). The acoustic and perceptual bases of judgments of women and men's sexual orientation from read speech. Journal of Phonetics, 34, 202–240.
  • Murray, I. R., & Arnott, J. L. (1993). Toward the simulation of emotion in synthetic speech: A review of the literature on human vocal emotion. Journal of the Acoustical Society of America, 93, 1097–1108. doi:10.1121/1.405558
  • Muthukumar, P. K., Black, A. W., & Bunnell, H. T. (2013, August). Optimizations and fitting procedures for the Liljencrants-Fant model for statistical parametric speech synthesis. Paper presented at Interspeech 2013, Lyon, France. Retrieved from http://www.isca-speech.org/archive/interspeech_2013/i13_0397.html
  • Narendranath, M., Murthy, H. A., Rajendran, S., & Yegnanarayana, B. (1995). Transformation of formants for voice conversion using artificial neural networks. Speech Communication, 16, 207–216.
  • Nass, C., & Lee, K. M. (2001). Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency attraction. Journal of Experimental Psychology: Applied, 7, 171–181.
  • Netsell, R. (1973). Speech Physiology. In F. Minifie, T. J. Hixon, & F. Williams (Eds.), Normal aspects of speech, hearing, and language (pp. 211–234). Englewood Cliffs, NJ: Prentice-Hall.
  • Nguyen, B. P., & Akagi, M. (2008). Phoneme-based spectral voice conversion using temporal decomposition and Gaussian mixture model. Proceedings of the Second International Conference on Communications and Electronics, 224–229. doi:10.1109/CCE.2008.4578962
  • Nolan, F. (1983). The phonetic bases of speaker recognition. Cambridge, UK: Cambridge University Press.
  • Nurminen, J., Popa, V., Tian, J., Tang, Y., & Kiss, I. (2006, June). A parametric approach for voice conversion. Paper presented at the TC-STAR Workshop on Speech-to-Speech Translation, Barcelona, Spain. Retrieved from http://www.tcstar.org/pubblicazioni/scientific_publications/Nokia/2006/S2STranslation06_nokia3.pdf
  • Nygaard, L. C., Sommers, M. S., & Pisoni, D. B. (1994). Speech perception as a talker-contingent process. Psychological Science, 5, 42–46.
  • Patel, R. (2002a). Phonatory control in adults with cerebral palsy and severe dysarthria. Augmentative and Alternative Communication, 18, 2–10.
  • Patel, R. (2002b). Prosodic control in severe dysarthria: Preserved ability to mark the question-statement contrast. Journal of Speech, Language, and Hearing Research, 45, 858–870.
  • Patel, R. (2003). Acoustic characteristics of the question-statement contrast in severe dysarthria due to cerebral palsy. Journal of Speech, Language, and Hearing Research, 46, 1401–1415.
  • Patel, R. (2004). The acoustics of contrastive prosody in adults with cerebral palsy. Journal of Medical Speech-Language Pathology, 12, 189–193.
  • Patel, R., & Roden, A. (2008). Intelligibility and attitudes toward a speech synthesizer using dysarthric vocalizations. Journal of Medical Speech-Language Pathology, 16, 243–249.
  • Patel, R., & Salata, A. (2006). Using computer games to mediate caregiver-child communication for children with severe dysarthria. Journal of Medical Speech-Language Pathology, 14, 279–284.
  • Patel, R., & Watkins, C. (2007). Stress identification in speakers with dysarthria due to cerebral palsy: An initial report. Journal of Medical Speech-Language Pathology, 15, 149–159.
  • Pierrehumbert, J., Bent, T., Munson, B., Bradlow, A. R., & Bailey, J. M. (2004). The influence of sexual orientation on vowel production. Journal of the Acoustical Society of America, 116, 1905–1908.
  • Popa, V., Silen, H., Nurminen, J., & Gabbouj, M. (2012, March). Local linear transformation for voice conversion. Paper presented at the 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Kyoto, Japan. doi:10.1109/ICASSP.2012.6288922
  • Prasanna, S. R. M., Gupta, C. S., & Yegnanarayana, B. (2006). Extraction of speaker-specific excitation information from linear prediction residual of speech. Speech Communication, 48, 1243–1261. doi:10.1016/j.specom.2006.06.002
  • Raitio, T., Suni, A., Yamagishi, J., Pulakka, H., Nurminen, J., Vainio, M., & Alku, P. (2011). HMM-based speech synthesis utilizing glottal inverse filtering. IEEE Transactions on Audio, Speech, and Language Processing, 19, 153–165. doi:10.1109/TASL.2010.2045239
  • Remez, R. E., Fellowes, J. M., & Rubin, P. E. (1997). Talker identification based on phonetic information. Journal of Experimental Psychology: Human Perception and Performance, 23, 651–666.
  • Remez, R. E., & Rubin, P. E. (1993). On the intonation of sinusoidal sentences: Contour and pitch height. Journal of the Acoustical Society of America, 94, 1983–1988.
  • Rentzos, D., Vaseghi, S., Yan, W., & Ho, C.-H. (2004, May). Voice conversion through transformation of spectral and intonation features. Paper presented at the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Montreal, Canada. doi:10.1109/ICASSP.2004.1325912
  • Sagisaka, Y. (1988, May). Speech synthesis by rule using an optimal selection of non-uniform synthesis units. Paper presented at the 1988 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), New York, NY. doi:10.1109/ICASSP.1988.196677
  • Shuang, Z-W., Bakis, R., Shectman, S., Chazan, D., & Qin, Y. (2006, September). Frequency warping based on mapping formant parameters. Paper presented at Interspeech 2006, Pittsburgh, PA. http://www.isca-speech.org/archive/interspeech_2006/i06_1768.html
  • Sigafoos, J., Schlosser, R. W., & Sutherland, D. (2013). Augmentative and alternative communication. In J. H. Stone & M. Blouin (Eds.), International encyclopedia of rehabilitation. Retrieved from http://cirrie.buffalo.edu/encyclopedia/en/article/50
  • Siu, E., Tam, E., Sin, D., Ng, C., Lam, E., Chui, M., & Lam, C. (2010). A survey of augmentative and alternative communication service provision in Hong Kong. Augmentative and Alternative Communication, 26, 289–298.
  • Smyth, R., Jacobs, G., & Rogers, H. (2003). Male voices and perceived sexual orientation: An experimental and theoretical approach. Language in Society, 32, 329–350.
  • Stevens, K. N. (1998). Acoustic phonetics. Cambridge, MA: MIT Press.
  • Stylianou, Y., Dutoit, T., & Schroeter, J. (1997, September). Diphone concatenation using a harmonic plus noise model of speech. Paper presented at Eurospeech 1997, Rhodes, Greece. Retrieved from: http://www.isca-speech.org/archive/eurospeech_1997/e97_0613.html
  • Sündermann, D., Ney, H., & Höge, H. (2003, December). VTLN-based cross-language voice conversion. Paper presented at the 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), St. Thomas, Virgin Islands. doi:10.1109/ASRU.2003.1318521
  • Syrdal, A. K., Bunnell, H. T., Hertz, S. R., Mishra, T., Spiegel, M., Bickley, C., . . . Makashay, M. J. (2012, September). Text-to-speech intelligibility across speech rates. Paper presented at Interspeech 2012, Portland, OR. Retrieved from http://www.isca-speech.org/archive/interspeech_2012/i12_0623.html
  • Takeda, K., Abe, K., & Sagisaka, Y. (1992). On the basic scheme and algorithms in non-uniform unit speech synthesis. In G. Bailly, C. Benoît, & T. R. Sawallis (Eds.), Talking machines: Theories, models, and designs (pp. 93–105). Amsterdam, The Netherlands: North-Holland Publishing Co.
  • Tamura, M., Masuko, T., Tokuda, K., & Kobayashi, T. (2001, May). Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR. Paper presented at the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Salt Lake City, UT. doi:10.1109/ICASSP.2001.941037
  • Toda, T., Lu, J., Saruwatari, H., & Shikano, K. (2000, October). STRAIGHT-based voice conversion algorithm based on Gaussian mixture model. Paper presented at the Sixth International Conference on Spoken Language Processing (ICSLP), Beijing, China. Retrieved from http://hdl.handle.net/10061/8187
  • Toda, T., Ohtani, Y., & Shikano, K. (2007a, April). One-to-many and many-to-one voice conversion based on eigenvoices. Paper presented at the 2007 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Honolulu, HI. doi:10.1109/ICASSP.2007.367303
  • Toda, T., Black, A. W., & Tokuda, K. (2007b). Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory. IEEE Transactions on Audio, Speech, and Language Processing, 15, 2222–2235. doi:10.1109/tasl.2007.907344
  • Tokuda, K., Yoshimura, T., Masuko, T., Kobayashi, T., & Kitamura, T. (2000, June). Speech parameter generation algorithms for HMM-based speech synthesis. Paper presented at the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Istanbul, Turkey. doi:10.1109/ICASSP.2000.861820
  • Walton, J., & Orlikoff, R. (1994). Speaker race identification from acoustic cues in the vocal signal. Journal of Speech and Hearing Research, 37, 738–745.
  • Watts, O., Yamagishi, J., Berkling, K., & King, S. (2008, October). HMM-based synthesis of child speech. Paper presented at the First Workshop on Child, Computer and Interaction (ICMI’08 post-conference workshop), Chania, Greece. Retrieved from: http://hdl.handle.net/1842/3817
  • Watts, O., Yamagishi, J., King, S., & Berkling, K. (2009, September). HMM adaptation and voice conversion for the synthesis of child speech: A comparison. Paper presented at Interspeech 2009, Brighton, United Kingdom. Retrieved from http://www.isca-speech.org/archive/interspeech_2009/i09_2627.html
  • Watts, O., Yamagishi, J., King, S., & Berkling, K. (2010). Synthesis of child speech with HMM adaptation and voice conversion. IEEE Transactions on Audio, Speech, and Language Processing, 18, 1005–1016. doi:10.1109/TASL.2009.2035029
  • Yamagishi, J., & Kobayashi, T. (2007). Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training. IEICE Transactions on Information and Systems, E90-D, 533–543.
  • Yamagishi, J., Nose, T., Zen, H., Ling, Z., Toda, T., Tokuda, K., . . . Renals, S. (2009). A robust speaker-adaptive HMM-based text-to-speech synthesis. IEEE Transactions on Audio, Speech, and Language Processing, 17, 1208–1230. doi:10.1109/TASL.2009.2016394
  • Zen, H., Tokuda, K., & Black, A. W. (2009). Statistical parametric speech synthesis. Speech Communication, 51, 1039–1064.
  • Zuckerman, M., & Miyake, K. (1993). The attractive voice: What makes it so? Journal of Nonverbal Behavior, 17, 119–135.
