Prediction, Bayesian inference and feedback in speech recognition

Pages 4-18 | Received 18 Feb 2015, Accepted 05 Aug 2015, Published online: 04 Sep 2015

References

  • Abdel-Hamid, O., Deng, L., & Yu, D. (2013). Exploring convolutional neural network structures and optimization techniques for speech recognition. Proceedings of Interspeech 2013, Lyon (pp. 3366–3370).
  • Altmann, G. T. M., & Kamide, Y. (1999). Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition, 73, 247–264. doi:10.1016/S0010-0277(99)00059-1
  • Arai, M., & Keller, F. (2013). The use of verb-specific information for prediction in sentence processing. Language and Cognitive Processes, 28, 525–560. doi:10.1080/01690965.2012.658072
  • Bever, T. G., & Poeppel, D. (2010). Analysis by synthesis: A (re-)emerging program of research for language and vision. Biolinguistics, 4(2–3), 174–200. ISSN 1450–3417
  • Brouwer, S., Mitterer, H., & Huettig, F. (2013). Discourse context and the recognition of reduced and canonical spoken words. Applied Psycholinguistics, 34, 519–539. doi:10.1017/S0142716411000853
  • Brunellière, A., & Soto-Faraco, S. (2013). The speakers’ accent shapes the listeners’ phonological predictions during speech perception. Brain and Language, 125, 82–93. doi:10.1016/j.bandl.2013.01.007
  • Chambers, C. G., Tanenhaus, M. K., Eberhard, K. M., Filip, H., & Carlson, G. N. (2002). Circumscribing referential domains during real-time language comprehension. Journal of Memory and Language, 47, 30–49. doi:10.1006/jmla.2001.2832
  • Clos, M., Langner, R., Meyer, M., Oechslin, M. S., Zilles, K., & Eickhoff, S. B. (2014). Effects of prior information on decoding degraded speech: An fMRI study. Human Brain Mapping, 35, 61–74. doi:10.1002/hbm.22151
  • Connine, C. M. (1987). Constraints on interactive processes in auditory word recognition: The role of sentence context. Journal of Memory and Language, 26, 527–538. doi:10.1016/0749-596X(87)90138-0
  • Connine, C. M., Titone, D., & Wang, J. (1993). Auditory word recognition: Extrinsic and intrinsic effects of word frequency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 81–94. doi:10.1037/0278-7393.19.1.81
  • Cutler, A. (1976). Phoneme monitoring reaction time as a function of preceding intonation contour. Perception & Psychophysics, 20, 55–60. doi:10.3758/BF03198706
  • Dahan, D., & Tanenhaus, M. K. (2004). Continuous mapping from sound to meaning in spoken-language comprehension: Immediate effects of verb-based thematic constraints. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 498–513. doi:10.1037/0278-7393.30.2.498
  • Davis, M. H., & Johnsrude, I. S. (2007). Hearing speech sounds: Top-down influences on the interface between audition and speech perception. Hearing Research, 229, 132–147. doi:10.1016/j.heares.2007.01.014
  • DeWitt, I., & Rauschecker, J. P. (2012). Phoneme and word recognition in the auditory ventral stream. Proceedings of the National Academy of Sciences, 109, E505–E514. doi:10.1073/pnas.1113427109
  • Drew, P. J., & Abbott, L. F. (2003). Model of song selectivity and sequence generation in area HVc of the songbird. Journal of Neurophysiology, 89, 2697–2706. doi:10.1152/jn.00801.2002
  • Elman, J. L., & McClelland, J. L. (1988). Cognitive penetration of the mechanisms of perception: Compensation for coarticulation of lexically restored phonemes. Journal of Memory and Language, 27, 143–165. doi:10.1016/0749-596X(88)90071-X
  • Feldman, N. H., Griffiths, T. L., & Morgan, J. L. (2009). The influence of categories on perception: Explaining the perceptual magnet effect as optimal statistical inference. Psychological Review, 116, 752–782. doi:10.1037/a0017196
  • Friston, K. (2003). Learning and inference in the brain. Neural Networks: The Official Journal of the International Neural Network Society, 16, 1325–1352. doi:10.1016/j.neunet.2003.06.005
  • Gagnepain, P., Henson, R. N., & Davis, M. H. (2012). Temporal predictive codes for spoken words in auditory cortex. Current Biology, 22, 615–621. doi:10.1016/j.cub.2012.02.015
  • Galantucci, B., Fowler, C. A., & Turvey, M. T. (2006). The motor theory of speech perception reviewed. Psychonomic Bulletin & Review, 13, 361–377. doi:10.3758/BF03193857
  • Ganong, W. F. (1980). Phonetic categorization in auditory word perception. Journal of Experimental Psychology: Human Perception and Performance, 6, 110–125. doi:10.1037/0096-1523.6.1.110
  • Geisler, W. S. (2003). Ideal observer analysis. In L. Chalupa & J. Werner (Eds.), The visual neurosciences (pp. 825–837). Cambridge, MA: MIT Press.
  • Geisler, W. S., & Kersten, D. (2002). Illusions, perception and Bayes. Nature Neuroscience, 5, 508–510. doi:10.1038/nn0602-508
  • Gilbert, C. D., & Sigman, M. (2007). Brain states: Top-down influences in sensory processing. Neuron, 54, 677–696. doi:10.1016/j.neuron.2007.05.019
  • Gow, D. W. (2012). The cortical organization of lexical knowledge: A dual lexicon model of spoken language processing. Brain and Language, 121, 273–288. doi:10.1016/j.bandl.2012.03.005
  • Gow, D. W., Segawa, J. A., Ahlfors, S. P., & Lin, F.-H. (2008). Lexical influences on speech perception: A Granger causality analysis of MEG and EEG source estimates. Neuroimage, 43, 614–623. doi:10.1016/j.neuroimage.2008.07.027
  • Halle, M., & Stevens, K. N. (1959). Analysis by synthesis. In W. Wathen-Dunn & L. E. Woods (Eds.), Proceedings of the seminar on speech compression and processing (Vol. 2). AFCRC-TR-59-198. USAF Camb. Res. Ctr. 2: Paper D7.
  • Halle, M., & Stevens, K. N. (1962). Speech recognition: A model and a program for research. IRE Transactions on Information Theory, 8, 155–159. doi:10.1109/TIT.1962.1057686
  • Harrison, C. W. (1952). Experiments with linear prediction in television. Bell System Technical Journal, 31, 764–783. doi:10.1002/j.1538-7305.1952.tb01405.x
  • Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience, 8, 393–402. doi:10.1038/nrn2113
  • Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., … Kingsbury, B. (2012). Deep neural networks for acoustic modelling in speech recognition. IEEE Signal Processing Magazine, 29(November), 82–97. doi:10.1109/MSP.2012.2205597
  • Hinton, G. E., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554. doi:10.1162/neco.2006.18.7.1527
  • Hosoya, T., Baccus, S., & Meister, M. (2005). Dynamic predictive coding by the retina. Nature, 436(7047), 71–77. doi:10.1038/nature03689
  • Howes, D. H. (1957). On the relation between the intelligibility and frequency of occurrence of English words. Journal of the Acoustical Society of America, 29, 296–305. doi:10.1121/1.1908862
  • Huang, Y., & Rao, R. P. N. (2011). Predictive coding. Wiley Interdisciplinary Reviews: Cognitive Science, 2, 580–593. doi:10.1002/wcs.142
  • Johnsrude, I. S., Mackey, A., Hakyemez, H., Alexander, E., Trang, H. P., & Carlyon, R. P. (2013). Swinging at a cocktail party: Voice familiarity aids speech perception in the presence of a competing voice. Psychological Science, 24, 1995–2004. doi:10.1177/0956797613482467
  • Kamide, Y., Scheepers, C., & Altmann, G. T. M. (2003). Integration of syntactic and semantic information in predictive processing: Cross-linguistic evidence from German and English. Journal of Psycholinguistic Research, 32, 37–55. doi:10.1023/A:1021933015362
  • Kilner, J. M., Friston, K. J., & Frith, C. D. (2007). Predictive coding: An account of the mirror neuron system. Cognitive Processing, 8(3), 159–166. doi:10.1007/s10339-007-0170-2
  • Kleinschmidt, D. F., & Jaeger, T. F. (2015). Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel. Psychological Review, 122(2), 148–203. doi:10.1037/a0038695
  • Larson, E., Billimoria, C. P., & Sen, K. (2009). A biologically plausible computational model for auditory object recognition. Journal of Neurophysiology, 101, 323–331. doi:10.1152/jn.90664.2008
  • Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431–461. doi:10.1037/h0020279
  • Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words: The neighborhood activation model. Ear & Hearing, 19, 1–36. doi:10.1097/00003446-199802000-00001
  • Magnuson, J. S., McMurray, B., Tanenhaus, M. K., & Aslin, R. N. (2003). Lexical effects on compensation for coarticulation: The ghost of Christmash past. Cognitive Science, 27, 285–298. doi:10.1016/S0364-0213(03)00004-1
  • Magnuson, J. S., Tanenhaus, M. K., & Aslin, R. N. (2008). Immediate effects of form-class constraints on spoken word recognition. Cognition, 108, 866–873. doi:10.1016/j.cognition.2008.06.005
  • Mann, V. A., & Repp, B. H. (1981). Influence of preceding fricative on stop consonant perception. Journal of the Acoustical Society of America, 69, 548–558. doi:10.1121/1.385483
  • Marr, D. (1982). Vision. San Francisco, CA: Freeman. ISBN 0-7167-1284-9
  • Maye, J., Aslin, R., & Tanenhaus, M. (2008). The Weckud Wetch of the wast: Lexical adaptation to a novel accent. Cognitive Science: A Multidisciplinary Journal, 32, 543–562. doi:10.1080/03640210802035357
  • McClelland, J. L. (1991). Stochastic interactive processes and the effect of context on perception. Cognitive Psychology, 23, 1–44. doi:10.1016/0010-0285(91)90002-6
  • McClelland, J. L. (2013). Integrating probabilistic models of perception and interactive neural networks: A historical and tutorial review. Frontiers in Psychology, 4, 503. doi:10.3389/fpsyg.2013.00503
  • McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18, 1–86. doi:10.1016/0010-0285(86)90015-0
  • McClelland, J. L., Mirman, D., Bolger, D. J., & Khaitan, P. (2014). Interactive activation and mutual constraint satisfaction in perception and cognition. Cognitive Science, 38, 1139–1189. doi:10.1111/cogs.12146
  • McQueen, J. M. (1991a). The influence of the lexicon on phonetic categorization: Stimulus quality in word-final ambiguity. Journal of Experimental Psychology: Human Perception and Performance, 17, 433–443. doi:10.1037/0096-1523.17.2.433
  • McQueen, J. M. (1991b). Phonetic decisions and their relationship to the lexicon. Ph.D. dissertation, University of Cambridge.
  • McQueen, J. M. (2003). The ghost of Christmas future: Didn't Scrooge learn to be good? Commentary on Magnuson, McMurray, Tanenhaus and Aslin (2003). Cognitive Science, 27, 795–799. doi:10.1207/s15516709cog2705_6
  • McQueen, J. M., Cutler, A., Briscoe, T., & Norris, D. (1995). Models of continuous speech recognition and the contents of the vocabulary. Language and Cognitive Processes, 10, 309–331. doi:10.1080/01690969508407098
  • McQueen, J. M., Cutler, A., & Norris, D. (2006). Phonological abstraction in the mental lexicon. Cognitive Science, 30, 1113–1126. doi:10.1207/s15516709cog0000_79
  • McQueen, J. M., Jesse, A., & Norris, D. (2009). No lexical–prelexical feedback during speech perception or: Is it time to stop playing those Christmas tapes? Journal of Memory and Language, 61, 1–18. doi:10.1016/j.jml.2009.03.002
  • McQueen, J. M., Tyler, M. D., & Cutler, A. (2012). Lexical retuning of children's speech perception: Evidence for knowledge about words’ component sounds. Language Learning and Development, 8, 317–339. doi:10.1080/15475441.2011.641887
  • Miller, J. L., Green, K., & Schermer, T. (1984). On the distinction between the effects of sentential speaking rate and semantic congruity on word identification. Perception & Psychophysics, 36, 329–337. doi:10.3758/BF03202785
  • Mumford, D. (1992). On the computational architecture of the neocortex – II The role of cortico-cortical loops. Biological Cybernetics, 66, 241–251. doi:10.1007/BF00198477
  • Myers, E. B., & Blumstein, S. E. (2008). The neural bases of the lexical effect: An fMRI investigation. Cerebral Cortex, 18, 278–288. doi:10.1093/cercor/bhm053
  • Norris, D. (1994). Shortlist: A connectionist model of continuous speech recognition. Cognition, 52, 189–234. doi:10.1016/0010-0277(94)90043-4
  • Norris, D. (2006). The Bayesian reader: Explaining word recognition as an optimal Bayesian decision process. Psychological Review, 113, 327–357. doi:10.1037/0033-295X.113.2.327
  • Norris, D. (2009). Putting it all together: A unified account of word recognition and reaction-time distributions. Psychological Review, 116, 207–219. doi:10.1037/a0014259
  • Norris, D., & Kinoshita, S. (2012). Reading through a noisy channel: Why there's nothing special about the perception of orthography. Psychological Review, 119, 517–545. doi:10.1037/a0028450
  • Norris, D., & McQueen, J. M. (2008). Shortlist B: A Bayesian model of continuous speech recognition. Psychological Review, 115, 357–395. doi:10.1037/0033-295X.115.2.357
  • Norris, D., McQueen, J. M., & Cutler, A. (2000). Merging information in speech recognition: Feedback is never necessary. Behavioral and Brain Sciences, 23, 299–325. doi:10.1017/S0140525X00003241
  • Norris, D., McQueen, J. M., & Cutler, A. (2003). Perceptual learning in speech. Cognitive Psychology, 47, 204–238. doi:10.1016/S0010-0285(03)00006-9
  • Oliver, B. M. (1952). Efficient coding. Bell System Technical Journal, 31, 724–750. doi:10.1002/j.1538-7305.1952.tb01403.x
  • Pece, A. (2007). On the computational rationale for generative models. Computer Vision and Image Understanding, 106, 130–143. doi:10.1016/j.cviu.2006.10.002
  • Pelli, D. G., Burns, C. W., Farell, B., & Moore-Page, D. C. (2006). Feature detection and letter identification. Vision Research, 46, 4646–4674. doi:10.1016/j.visres.2006.04.023
  • Pitt, M. A., & McQueen, J. M. (1998). Is compensation for coarticulation mediated by the lexicon? Journal of Memory and Language, 39, 347–370. doi:10.1006/jmla.1998.2571
  • Poeppel, D., Idsardi, W. J., & van Wassenhove, V. (2008). Speech perception at the interface of neurobiology and linguistics. Philosophical Transactions of the Royal Society B: Biological Sciences, 363, 1071–1086. doi:10.1098/rstb.2007.2160
  • Poeppel, D., & Monahan, P. (2010). Feedforward and feedback in speech perception: Revisiting analysis by synthesis. Language and Cognitive Processes, 26, 935–951. doi:10.1080/01690965.2010.493301
  • Pollack, I., Rubenstein, H., & Decker, L. (1959). Intelligibility of known and unknown message sets. Journal of the Acoustical Society of America, 31, 273–279. doi:10.1121/1.1907712
  • Price, C. J. (2012). A review and synthesis of the first 20 years of PET and fMRI studies of heard speech, spoken language and reading. Neuroimage, 62, 816–847. doi:10.1016/j.neuroimage.2012.04.062
  • Rao, R. P., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2, 79–87. doi:10.1038/4580
  • Rao, R. P. N., & Ballard, D. H. (1997). Dynamic model of visual recognition predicts neural response properties in the visual cortex. Neural Computation, 9, 721–763. doi:10.1162/neco.1997.9.4.721
  • Rubin, P., Turvey, M. T., & van Gelder, P. (1976). Initial phonemes are detected faster in spoken words than in nonwords. Perception & Psychophysics, 19, 394–398. doi:10.3758/BF03199398
  • Rumelhart, D. E., & McClelland, J. L. (1982). An interactive activation model of context effects in letter perception: II. The contextual enhancement effect and some tests and extensions of the model. Psychological Review, 89(1), 60–94. doi:10.1037/0033-295X.89.1.60
  • Samuel, A. G. (1981). Phonemic restoration: Insights from a new methodology. Journal of Experimental Psychology: General, 110, 474–494. doi:10.1037/0096-3445.110.4.474
  • Samuel, A. G., & Kraljic, T. (2009). Perceptual learning in speech perception. Attention, Perception & Psychophysics, 71, 1207–1218. doi:10.3758/APP.71.6.1207
  • Samuel, A. G., & Pitt, M. A. (2003). Lexical activation (and other factors) can mediate compensation for coarticulation. Journal of Memory and Language, 48, 416–434. doi:10.1016/S0749-596X(02)00514-4
  • Serre, T., Oliva, A., & Poggio, T. (2007). A feedforward architecture accounts for rapid categorization. Proceedings of the National Academy of Sciences of the United States of America, 104(15), 6424–6429. doi:10.1073/pnas.0700622104
  • Sjerps, M. J., & McQueen, J. M. (2010). The bounds of flexibility in speech perception. Journal of Experimental Psychology: Human Perception and Performance, 36(1), 195–211. doi:10.1037/a0016803
  • Sohoglu, E., Peelle, J. E., Carlyon, R. P., & Davis, M. H. (2012). Predictive top-down integration of prior knowledge during speech perception. Journal of Neuroscience, 32(25), 8443–8453. doi:10.1523/JNEUROSCI.5069-11.2012
  • Sohoglu, E., Peelle, J. E., Carlyon, R. P., & Davis, M. H. (2014). Top-down influences of written text on perceived clarity of degraded speech. Journal of Experimental Psychology: Human Perception and Performance, 40(1), 186–199. doi:10.1037/a0033206
  • Srinivasan, M. V., Laughlin, S. B., & Dubs, A. (1982). Predictive coding: A fresh view of inhibition in the retina. Proceedings of the Royal Society B: Biological Sciences, 216, 427–459. doi:10.1098/rspb.1982.0085
  • Travis, K. E., Leonard, M. K., Chan, A. M., Torres, C., Sizemore, M. L., … Halgren, E. (2013). Independence of early speech processing from word meaning. Cerebral Cortex, 23, 2370–2379. doi:10.1093/cercor/bhs228
  • Ueno, T., Saito, S., Rogers, T. T., & Lambon Ralph, M. A. (2011). Lichtheim 2: Synthesizing aphasia and the neural basis of language in a neurocomputational model of the dual dorsal-ventral language pathways. Neuron, 72, 385–396. doi:10.1016/j.neuron.2011.09.013
  • Van Alphen, P., & McQueen, J. M. (2001). The time-limited influence of sentential context on function word identification. Journal of Experimental Psychology: Human Perception and Performance, 27, 1057–1071. doi:10.1037/0096-1523.27.5.1057
  • Van Berkum, J. J. A., Brown, C. M., Zwitserlood, P., Kooijman, V., & Hagoort, P. (2005). Anticipating upcoming words in discourse: Evidence from ERPs and reading times. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 443–467. doi:10.1037/0278-7393.31.3.443
  • Van Wassenhove, V., Grant, K. W., & Poeppel, D. (2005). Visual speech speeds up the neural processing of auditory speech. Proceedings of the National Academy of Sciences of the United States of America, 102, 1181–1186. doi:10.1073/pnas.0408949102
  • Warren, R. M., Obusek, C. J., & Ackroff, J. M. (1972). Auditory induction: Perceptual synthesis of absent sounds. Science, 176, 1149–1151. doi:10.1126/science.176.4039.1149
  • Yildiz, I. B., & Kiebel, S. J. (2011). A hierarchical neuronal model for generation and online recognition of birdsongs. PLoS Computational Biology, 7, e1002303. doi:10.1371/journal.pcbi.1002303
  • Yildiz, I. B., von Kriegstein, K., & Kiebel, S. J. (2013). From birdsong to human speech recognition: Bayesian inference on a hierarchy of nonlinear dynamical systems. PLoS Computational Biology, 9, e1003219. doi:10.1371/journal.pcbi.1003219
  • Yuille, A. L. (1991). Deformable templates for face recognition. Journal of Cognitive Neuroscience, 3, 59–70. doi:10.1162/jocn.1991.3.1.59
  • Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., & Oliva, A. (2014). Learning deep features for scene recognition using places database. Advances in Neural Information Processing Systems, 27, 487–495.