
Cross-modal processing of voices and faces in developmental prosopagnosia and developmental phonagnosia

Corrina Maguinness & Katharina von Kriegstein
Pages 644-657 | Received 24 Aug 2016, Accepted 22 Mar 2017, Published online: 02 Jun 2017
 

ABSTRACT

Conspecifics can be recognized from either the face or the voice alone. However, person-identity information is rarely encountered in purely unimodal situations, and there is increasing evidence that the face and voice interact in neurotypical identity processing. Conversely, developmental deficits have been observed that appear selective for face recognition and voice recognition: developmental prosopagnosia and developmental phonagnosia, respectively. To date, studies on developmental prosopagnosia and phonagnosia have largely centred on within-modality testing. Here, we review evidence from a small number of behavioural and neuroimaging studies which have examined the recognition of both faces and voices in these cohorts. A consensus from the findings is that, when tested in purely unimodal conditions, voice-identity processing appears normal in most cases of developmental prosopagnosia, as does face-identity processing in developmental phonagnosia. However, there is now initial evidence that the multisensory nature of person identity impacts on identity-recognition abilities in these cohorts. For example, unlike in neurotypicals, auditory-only voice recognition is not enhanced in developmental prosopagnosia for voices which have previously been learned together with a face. This might also explain why the recognition of personally familiar voices is poorer in developmental prosopagnosics than in controls. In contrast, there is evidence that multisensory interactions might also support compensatory mechanisms in these disorders. For example, in developmental phonagnosia, voice recognition may be enhanced if voices have been learned with a corresponding face. Taken together, the reviewed findings challenge traditional models of person recognition, which have assumed independence between face-identity and voice-identity processing, and instead support an audio-visual model of human communication that assumes direct interactions between voice- and face-processing streams. In addition, the reviewed findings open up novel empirical research questions and have important implications for potential training regimes for developmental prosopagnosia and phonagnosia.

Disclosure statement

No potential conflict of interest was reported by the authors.

ORCID

Corrina Maguinness http://orcid.org/0000-0002-6200-4109

Katharina von Kriegstein http://orcid.org/0000-0001-7989-5860

Notes

1 The findings are not homogeneous across all cases of prosopagnosia; for overviews, see Susilo and Duchaine (2013) and Behrmann and Avidan (2005).

2 A reviewer alerted us to the possibility that some undiagnosed cases of prosopagnosia may show superior voice-recognition abilities. Conceivably, such individuals may be less likely to come forward for a diagnosis of prosopagnosia because they already compensate well for their disorder via auditory recognition cues. However, such a cohort has not yet been identified.

3 It has also been reported that voice recognition can be impaired, rather than improved, by the presence of a visual face during learning, an effect referred to as "face overshadowing" (Cook & Wilding, 1997, 2001). Within this context, the saliency of the face interferes with the ability to attend to the voice identity. Zäske, Mühl, and Schweinberger (2015) recently demonstrated that the face-overshadowing effect is mitigated over time: while the presence of a face initially impaired voice learning, with repeated exposure voice recognition was more robust for face-learned than for auditory-only learned speakers. This is in line with the findings on face-learned in contrast to occupation-learned or name-learned voices (von Kriegstein & Giraud, 2006; von Kriegstein et al., 2008), as well as with familiar-voice processing (see von Kriegstein et al., 2005), where repeated day-to-day audio-visual interactions are likely typical.

Additional information

Funding

This work was funded by a Max Planck Research Group grant to KvK.