
Perceived naturalness of emotional voice morphs

Pages 731-747 | Received 22 Jun 2022, Accepted 05 Apr 2023, Published online: 27 Apr 2023

References

  • Alku, P., Tiitinen, H., & Näätänen, R. (1999). A method for generating natural-sounding speech stimuli for cognitive brain research. Clinical Neurophysiology, 110(8), 1329–1333. https://doi.org/10.1016/S1388-2457(99)00088-7
  • Anand, S., & Stepp, C. E. (2015). Listener perception of monopitch, naturalness, and intelligibility for speakers with Parkinson's disease. Journal of Speech, Language, and Hearing Research, 58(4), 1134–1144. https://doi.org/10.1044/2015_JSLHR-S-14-0243
  • ANSI. (1973). Psychoacoustical terminology (ANSI S3.20, pp. 61–67). American National Standards Institute.
  • Arias, P., Rachman, L., Liuni, M., & Aucouturier, J. J. (2021). Beyond correlation: Acoustic transformation methods for the experimental study of emotional voice and speech. Emotion Review, 13(1), 12–24. https://doi.org/10.1177/1754073920934544
  • Assmann, P. F., Dembling, S., & Nearey, T. M. (2006). Effects of frequency shifts on perceived naturalness and gender information in speech. In INTERSPEECH. Symposium conducted at the meeting of Citeseer.
  • Assmann, P. F., & Katz, W. F. (2000). Time-varying spectral change in the vowels of children and adults. The Journal of the Acoustical Society of America, 108(4), 1856–1866. https://doi.org/10.1121/1.1289363
  • Baird, A., Jørgensen, S. H., Parada-Cabaleiro, E., Cummings, N., Hantke, S., & Schüller, B. (2018a). The perception of vocal traits in synthesized voices: Age, gender, and human likeness. Journal of the Audio Engineering Society, 66(4), 277–285. https://doi.org/10.17743/jaes.2018.0023
  • Baird, A., Parada-Cabaleiro, E., Hantke, S., Burkhardt, F., Cummings, N., & Schüller, B. (2018b, September 2). The perception and analysis of the likeability and human likeness of synthesized speech. In Interspeech 2018 (pp. 2863–2867). ISCA. https://doi.org/10.21437/Interspeech.2018-1093
  • Banse, R., & Scherer, K. R. (1996). Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70(3), 614–636. https://doi.org/10.1037/0022-3514.70.3.614
  • Belin, P., Bestelmeyer, P. E. G., Latinus, M., & Watson, R. (2011). Understanding voice perception. British Journal of Psychology, 102(4), 711–725. https://doi.org/10.1111/j.2044-8295.2011.02041.x
  • Bestelmeyer, P. E. G., Rouger, J., DeBruine, L. M., & Belin, P. (2010). Auditory adaptation in vocal affect perception. Cognition, 117(2), 217–223. https://doi.org/10.1016/j.cognition.2010.08.008
  • Boersma, P. (2018). Praat: Doing phonetics by computer (Version 6.0.46) [Computer program]. Retrieved January 2020, from http://www.praat.org/
  • Bruckert, L., Bestelmeyer, P. E. G., Latinus, M., Rouger, J., Charest, I., Rousselet, G. A., Kawahara, H., & Belin, P. (2010). Vocal attractiveness increases by averaging. Current Biology, 20(2), 116–120. https://doi.org/10.1016/j.cub.2009.11.034
  • Burton, M. W., & Blumstein, S. E. (1995). Lexical effects on phonetic categorization: The role of stimulus naturalness and stimulus quality. Journal of Experimental Psychology: Human Perception and Performance, 21(5), 1230–1235. https://doi.org/10.1037/0096-1523.21.5.1230
  • Cabral, J. P., Cowan, B. R., Zibrek, K., & McDonnell, R. (2017). The influence of synthetic voice on the evaluation of a virtual character. In Interspeech 2017 (pp. 229–233). ISCA. https://doi.org/10.21437/Interspeech.2017-325
  • Calder, A. J., Rowland, D., Young, A. W., Nimmo-Smith, I., Keane, J., & Perrett, D. I. (2000). Caricaturing facial expressions. Cognition, 76(2), 105–146. https://doi.org/10.1016/S0010-0277(00)00074-3
  • Christensen, R. H. B. (2015). ordinal: Regression models for ordinal data [R package].
  • Coughlin-Woods, S., Lehman, M. E., & Cooke, P. A. (2005). Ratings of speech naturalness of children ages 8-16 years. Perceptual and Motor Skills, 100(2), 295–304. https://doi.org/10.2466/pms.100.2.295-304
  • Crookes, K., Ewing, L., Gildenhuys, J. D., Kloth, N., Hayward, W. G., Oxner, M., Pond, S., & Rhodes, G. (2015). How well do computer-generated faces tap face expertise? PLoS One, 10(11), e0141353. https://doi.org/10.1371/journal.pone.0141353
  • Crumpton, J., & Bethel, C. L. (2016). A survey of using vocal prosody to convey emotion in robot speech. International Journal of Social Robotics, 8(2), 271–285. https://doi.org/10.1007/s12369-015-0329-4
  • Cumming, G. (2014). The New statistics. Psychological Science, 25(1), 7–29. https://doi.org/10.1177/0956797613504966
  • Eadie, T. L., & Doyle, P. C. (2002). Direct magnitude estimation and interval scaling of naturalness and severity in tracheoesophageal (TE) speakers. Journal of Speech, Language, and Hearing Research, 45(6), 1088–1096. https://doi.org/10.1044/1092-4388(2002/087)
  • Ekman, P. (1992). Are there basic emotions? Psychological Review, 99(3), 550–553. https://doi.org/10.1037/0033-295X.99.3.550
  • Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141(1), 2–18. https://doi.org/10.1037/a0024338
  • Frühholz, S., Klaas, H. S., Patel, S., & Grandjean, D. (2015). Talking in fury: The cortico-subcortical network underlying angry vocalizations. Cerebral Cortex, 25(9), 2752–2762. https://doi.org/10.1093/cercor/bhu074
  • Giordano, B. L., Whiting, C., Kriegeskorte, N., Kotz, S. A., Gross, J., & Belin, P. (2021). The representational dynamics of perceived voice emotions evolve from categories to dimensions. Nature Human Behaviour, 5(9), 1203–1213. https://doi.org/10.1038/s41562-021-01073-0
  • Gong, L. (2008). How social is social responses to computers? The function of the degree of anthropomorphism in computer representations. Computers in Human Behavior, 24(4), 1494–1509. https://doi.org/10.1016/j.chb.2007.05.007
  • Grichkovtsova, I., Morel, M., & Lacheret, A. (2012). The role of voice quality and prosodic contour in affective speech perception. Speech Communication, 54(3), 414–429. https://doi.org/10.1016/j.specom.2011.10.005
  • Heider, F., & Simmel, M. (1944). An experimental study of apparent behavior. The American Journal of Psychology, 57(2), 243. https://doi.org/10.2307/1416950
  • Hortensius, R., Hekele, F., & Cross, E. S. (2018). The perception of emotion in artificial agents. IEEE Transactions on Cognitive and Developmental Systems, 10(4), 852–864. https://doi.org/10.1109/TCDS.2018.2826921
  • Hubbard, D. J., & Assmann, P. F. (2013). Perceptual adaptation to gender and expressive properties in speech: The role of fundamental frequency. The Journal of the Acoustical Society of America, 133(4), 2367–2376. https://doi.org/10.1121/1.4792145
  • Ilves, M., & Surakka, V. (2013). Subjective responses to synthesised speech with lexical emotional content: The effect of the naturalness of the synthetic voice. Behaviour & Information Technology, 32(2), 117–131. https://doi.org/10.1080/0144929X.2012.702285
  • Ilves, M., Surakka, V., & Vanhala, T. (2011). The effects of emotionally worded synthesized speech on the ratings of emotions and voice quality. In (pp. 588–598). Springer. https://doi.org/10.1007/978-3-642-24600-5_62
  • Juslin, P. N., & Laukka, P. (2003). Communication of emotions in vocal expression and music performance: Different channels, same code? Psychological Bulletin, 129(5), 770–814. https://doi.org/10.1037/0033-2909.129.5.770
  • Kätsyri, J., Förger, K., Mäkäräinen, M., & Takala, T. (2015). A review of empirical evidence on different uncanny valley hypotheses: Support for perceptual mismatch as one road to the valley of eeriness. Frontiers in Psychology, 6, 390. https://doi.org/10.3389/fpsyg.2015.00390
  • Kawahara, H., Morise, M., & Skuk, V. G. (2013). Temporally variable multi-aspect N-way morphing based on interference-free speech representations. In 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) (pp. 1–10). https://doi.org/10.1109/APSIPA.2013.6694355
  • Kawahara, H., Morise, M., Takahashi, T., Nisimura, R., Irino, T., & Banno, H. (2008). TANDEM-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation. IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 3933–3936). https://doi.org/10.1109/ICASSP.2008.4518514
  • Kawahara, H., & Skuk, V. G. (2019). Voice morphing. In S. Frühholz & P. Belin (Eds.), The Oxford handbook of voice perception (pp. 685–706). Oxford University Press.
  • Klopfenstein, M., Bernard, K., & Heyman, C. (2020). The study of speech naturalness in communication disorders: A systematic review of the literature. Clinical Linguistics & Phonetics, 34(4), 327–338. https://doi.org/10.1080/02699206.2019.1652692
  • Kloth, N., Rhodes, G., & Schweinberger, S. R. (2017). Watching the brain recalibrate: Neural correlates of renormalization during face adaptation. Neuroimage, 155, 1–9. https://doi.org/10.1016/j.neuroimage.2017.04.049
  • Lakens, D., & Caldwell, A. R. (2019). Simulation-based power-analysis for factorial ANOVA designs. https://doi.org/10.31234/osf.io/baxsf
  • Mackey, L. S., Finn, P., & Ingham, R. J. (1997). Effect of speech dialect on speech naturalness ratings: A systematic replication of Martin, Haroldson, and Triden (1984). Journal of Speech, Language, and Hearing Research, 40(2), 349–360. https://doi.org/10.1044/jslhr.4002.349
  • Martin, R. R., Haroldson, S. K., & Triden, K. A. (1984). Stuttering and speech naturalness. Journal of Speech and Hearing Disorders, 49(1), 53–58. https://doi.org/10.1044/jshd.4901.53
  • MATLAB. (2020). version 9.8.0 (R2020a). The MathWorks Inc.
  • Mayo, C., Clark, R. A. J., & King, S. (2011). Listeners’ weighting of acoustic cues to synthetic speech naturalness: A multidimensional scaling analysis. Speech Communication, 53(3), 311–326. https://doi.org/10.1016/j.specom.2010.10.003
  • McAleer, P., Todorov, A., & Belin, P. (2014). How do you say ‘Hello’? Personality impressions from brief novel voices. PLoS One, 9(3), e90779. https://doi.org/10.1371/journal.pone.0090779
  • McGinn, C., & Torre, I. (2019, March 11–14). Can you tell the Robot by the Voice? An exploratory study on the role of voice in the perception of robots. In 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI) (pp. 211–221). IEEE. https://doi.org/10.1109/HRI.2019.8673305
  • Meltzner, G. S., & Hillman, R. E. (2005). Impact of aberrant acoustic properties on the perception of sound quality in electrolarynx speech. Journal of Speech, Language, and Hearing Research, 48(4), 766–779. https://doi.org/10.1044/1092-4388(2005/053)
  • Mitchell, W. J., Szerszen, K. A., Lu, A. S., Schermerhorn, P. W., Scheutz, M., & Macdorman, K. F. (2011). A mismatch in the human realism of face and voice produces an uncanny valley. I-Perception, 2(1), 10–12. https://doi.org/10.1068/i0415
  • Mori, M., Macdorman, K. F., & Kageki, N. (2012). The uncanny valley [From the field]. IEEE Robotics & Automation Magazine, 19(2), 98–100. https://doi.org/10.1109/MRA.2012.2192811
  • Nadler, J. T., Weston, R., & Voyles, E. C. (2015). Stuck in the middle: The use and interpretation of mid-points in items on questionnaires. The Journal of General Psychology, 142(2), 71–89. https://doi.org/10.1080/00221309.2014.994590
  • Nass, C., Steuer, J., & Tauber, E. R. (1994). Computers are social actors. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Celebrating Interdependence (CHI '94). ACM Press.
  • Nusbaum, H. C., Francis, A. L., & Henly, A. S. (1997). Measuring the naturalness of synthetic speech. International Journal of Speech Technology, 2(1), 7–19. https://doi.org/10.1007/BF02215800
  • Nussbaum, C., Schirmer, A., & Schweinberger, S. R. (2022). Contributions of fundamental frequency and timbre to vocal emotion perception and their electrophysiological correlates. Social Cognitive and Affective Neuroscience, 17(12), 1145–1154. https://doi.org/10.1093/scan/nsac033
  • Nussbaum, C., von Eiff, C. I., Skuk, V. G., & Schweinberger, S. R. (2022). Vocal emotion adaptation aftereffects within and across speaker genders: Roles of timbre and fundamental frequency. Cognition, 219, 104967. https://doi.org/10.1016/j.cognition.2021.104967
  • Paulmann, S., & Kotz, S. A. (2018). The electrophysiology and time course of processing vocal emotion expressions. In S. Frühholz & P. Belin (Eds.), The Oxford handbook of voice perception (pp. 458–472). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780198743187.013.20
  • Pell, M. D., Rothermich, K., Liu, P., Paulmann, S., Sethi, S., & Rigoulot, S. (2015). Preferential decoding of emotion from human non-linguistic vocalizations versus speech prosody. Biological Psychology, 111, 14–25. https://doi.org/10.1016/j.biopsycho.2015.08.008
  • Péron, J., Cekic, S., Haegelen, C., Sauleau, P., Patel, S., Drapier, D., Vérin, M., & Grandjean, D. (2015). Sensory contribution to vocal emotion deficit in Parkinson's disease after subthalamic stimulation. Cortex, 63, 172–183. https://doi.org/10.1016/j.cortex.2014.08.023
  • Scherer, K. R. (1986). Vocal affect expression: A review and a model for future research. Psychological Bulletin, 99(2), 143–165. https://doi.org/10.1037/0033-2909.99.2.143
  • Schindler, S., Zell, E., Botsch, M., & Kissler, J. (2017). Differential effects of face-realism and emotion on event-related brain potentials and their implications for the uncanny valley theory. Scientific Reports, 7(1), 45003. https://doi.org/10.1038/srep45003
  • Schirmer, A., & Gunter, T. C. (2017). Temporal signatures of processing voiceness and emotion in sound. Social Cognitive and Affective Neuroscience, 12(6), 902–909. https://doi.org/10.1093/scan/nsx020
  • Schweinberger, S. R., Casper, C., Hauthal, N., Kaufmann, J. M., Kawahara, H., Kloth, N., Robertson, D. M., Simpson, A. P., & Zäske, R. (2008). Auditory adaptation in voice perception. Current Biology, 18(9), 684–688. https://doi.org/10.1016/j.cub.2008.04.015
  • Schweinberger, S. R., Pohl, M., & Winkler, P. (2020). Autistic traits, personality, and evaluations of humanoid robots by young and older adults. Computers in Human Behavior, 106, 106256. https://doi.org/10.1016/j.chb.2020.106256
  • Skuk, V. G., Dammann, L. M., & Schweinberger, S. R. (2015). Role of timbre and fundamental frequency in voice gender adaptation. The Journal of the Acoustical Society of America, 138(2), 1180–1193. https://doi.org/10.1121/1.4927696
  • Skuk, V. G., Kirchen, L., Oberhoffner, T., Guntinas-Lichius, O., Dobel, C., & Schweinberger, S. R. (2020). Parameter-specific morphing reveals contributions of timbre and fundamental frequency cues to the perception of voice gender and age in cochlear implant users. Journal of Speech, Language, and Hearing Research, 63(9), 3155–3175. https://doi.org/10.1044/2020_JSLHR-20-00026
  • Skuk, V. G., & Schweinberger, S. R. (2014). Influences of fundamental frequency, formant frequencies, aperiodicity, and spectrum level on the perception of voice gender. Journal of Speech, Language, and Hearing Research, 57(1), 285–296. https://doi.org/10.1044/1092-4388(2013/12-0314)
  • Spatola, N., & Wudarczyk, O. A. (2021). Ascribing emotions to robots: Explicit and implicit attribution of emotions and perceived robot anthropomorphism. Computers in Human Behavior, 124, 106934. https://doi.org/10.1016/j.chb.2021.106934
  • Stoet, G. (2010). Psytoolkit: A software package for programming psychological experiments using Linux. Behavior Research Methods, 42(4), 1096–1104. https://doi.org/10.3758/BRM.42.4.1096
  • Stoet, G. (2017). PsyToolkit: A novel web-based method for running online questionnaires and reaction-time experiments. Teaching of Psychology, 44(1), 24–31. https://doi.org/10.1177/0098628316677643
  • Vojtech, J. M., Noordzij, J. P., Cler, G. J., & Stepp, C. E. (2019). The effects of modulating fundamental frequency and speech rate on the intelligibility, communication efficiency, and perceived naturalness of synthetic speech. American Journal of Speech-Language Pathology, 28(2S), 875–886. https://doi.org/10.1044/2019_AJSLP-MSC18-18-0052
  • von Eiff, C. I., Skuk, V. G., Zäske, R., Nussbaum, C., Frühholz, S., Feuer, U., Guntinas-Lichius, O., & Schweinberger, S. R. (2022). Parameter-specific morphing reveals contributions of timbre to the perception of vocal emotions in cochlear implant users. Ear & Hearing, 43(4), 1178–1188. https://doi.org/10.1097/AUD.0000000000001181
  • Webster, M. A., & Maclin, O. H. (1999). Figural aftereffects in the perception of faces. Psychonomic Bulletin & Review, 6(4), 647–653. https://doi.org/10.3758/BF03212974
  • Whiting, C. M., Kotz, S. A., Gross, J., Giordano, B. L., & Belin, P. (2020). The perception of caricatured emotion in voice. Cognition, 200, 104249. https://doi.org/10.1016/j.cognition.2020.104249
  • Yamagishi, J., Veaux, C., King, S., & Renals, S. (2012). Speech synthesis technologies for individuals with vocal disabilities: Voice banking and reconstruction. Acoustical Science and Technology, 33(1), 1–5. https://doi.org/10.1250/ast.33.1
  • Yamasaki, R., Montagnoli, A., Murano, E. Z., Gebrim, E., Hachiya, A., Lopes da Silva, J. V., Behlau, M., & Tsuji, D. (2017). Perturbation measurements on the degree of naturalness of synthesized vowels. Journal of Voice, 31(3), 389.e1–389.e8. https://doi.org/10.1016/j.jvoice.2016.09.020
  • Yorkston, K. M., Beukelman, D. R., Strand, E. A., & Hakel, M. (1999). Management of motor speech disorders in children and adults. Austin, TX: Pro-ed.
  • Yorkston, K. M., Hammen, V. L., Beukelman, D. R., & Traynor, C. D. (1990). The effect of rate control on the intelligibility and naturalness of dysarthric speech. Journal of Speech and Hearing Disorders, 55(3), 550–560. https://doi.org/10.1044/jshd.5503.550
  • Young, A. W., & Bruce, V. (2011). Understanding person perception. British Journal of Psychology, 102(4), 959–974. https://doi.org/10.1111/j.2044-8295.2011.02045.x
