Forthcoming Special Issue on: Visual Search and Selective Attention

Modelling attention control using a convolutional neural network designed after the ventral visual pathway

Pages 416-434 | Received 04 Mar 2019, Accepted 12 Aug 2019, Published online: 05 Sep 2019

References

  • Adeli, H., Vitu, F., & Zelinsky, G. J. (2017). A model of the superior colliculus predicts fixation locations during scene viewing and visual search. The Journal of Neuroscience, 37(6), 1453–1467. doi: 10.1523/JNEUROSCI.0825-16.2016
  • Adeli, H., & Zelinsky, G. (2018). Deep-BCN: Deep networks meet biased competition to create a brain-inspired model of attention control. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops (CVPRw) (pp. 1932–1942).
  • Allport, D. A. (1980). Attention and performance. In G. Claxton (Ed.), Cognitive psychology (pp. 112–153). London: Routledge & Kegan Paul.
  • Ballard, D. H., & Hayhoe, M. M. (2009). Modelling the role of task in the control of gaze. Visual Cognition, 17, 1185–1204. doi: 10.1080/13506280902978477
  • Bau, D., Zhou, B., Khosla, A., Oliva, A., & Torralba, A. (2017). Network dissection: Quantifying interpretability of deep visual representations. In Proceedings of computer vision and pattern recognition (CVPR 2017).
  • Beck, D., Pinsk, M., & Kastner, S. (2005). Symmetry perception in humans and macaques. Trends in Cognitive Sciences, 9, 405–406. doi: 10.1016/j.tics.2005.07.002
  • Brewer, A., Liu, J., Wade, A., & Wandell, B. (2005). Visual field maps and stimulus selectivity in human ventral occipital cortex. Nature Neuroscience, 8, 1102–1109. doi: 10.1038/nn1507
  • Bylinskii, Z., Judd, T., Oliva, A., Torralba, A., & Durand, F. (2016). What do different evaluation metrics tell us about saliency models? arXiv:1604.03605.
  • Cadieu, C., Hong, H., Yamins, D., Pinto, N., Ardila, D., Solomon, E., … Bethge, M. (2014). Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLoS Computational Biology, 10, e1003963. doi: 10.1371/journal.pcbi.1003963
  • Cadieu, C., Kouh, M., Pasupathy, A., Connor, C., Riesenhuber, M., & Poggio, T. (2007). A model of V4 shape selectivity and invariance. Journal of Neurophysiology, 98, 1733–1750. doi: 10.1152/jn.01265.2006
  • Canziani, A., Culurciello, E., & Paszke, A. (2017). An analysis of deep neural network models for practical applications. arXiv:1605.07678v4.
  • Cohen, M. A., Alvarez, G. A., Nakayama, K., & Konkle, T. (2016). Visual search for object categories is predicted by the representational architecture of high-level visual cortex. Journal of Neurophysiology, 117, 388–402. doi: 10.1152/jn.00569.2016
  • Csurka, G., Dance, C. R., Fan, L., Willamowski, J., & Bray, C. (2004). Visual categorization with bags of keypoints. In Proceedings of the European conference on computer vision (pp. 1–22).
  • Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual Review of Neuroscience, 18, 193–222. doi: 10.1146/annurev.ne.18.030195.001205
  • Desimone, R., Schein, S., Moran, J., & Ungerleider, L. (1985). Contour, color and shape analysis beyond the striate cortex. Vision Research, 25, 441–452. doi: 10.1016/0042-6989(85)90069-0
  • Deubel, H., & Schneider, W. X. (1996). Saccade target selection and object recognition: Evidence for a common attentional mechanism. Vision Research, 36, 1827–1837. doi: 10.1016/0042-6989(95)00294-4
  • DiCarlo, J., & Cox, D. (2007). Untangling invariant object recognition. Trends in Cognitive Sciences, 11, 333–341. doi: 10.1016/j.tics.2007.06.010
  • Einhäuser, W., Spain, M., & Perona, P. (2008). Objects predict fixations better than early saliency. Journal of Vision, 8(14), 18. doi: 10.1167/8.14.18
  • Engel, S., Glover, G., & Wandell, B. (1997). Retinotopic organization in human visual cortex and the spatial precision of functional MRI. Cerebral Cortex, 7, 181–192. doi: 10.1093/cercor/7.2.181
  • Fize, D., Vanduffel, W., Nelissen, K., Denys, K., Chefd’Hotel, C., Faugeras, O., & Orban, G. (2003). The retinotopic organization of primate dorsal v4 and surrounding areas: A functional magnetic resonance imaging study in awake monkeys. The Journal of Neuroscience, 23, 7395–7406. doi: 10.1523/JNEUROSCI.23-19-07395.2003
  • Freeman, J., & Simoncelli, E. (2011). Metamers of the ventral stream. Nature Neuroscience, 14, 1195–1201. doi: 10.1038/nn.2889
  • Gattass, R., Sousa, A., Mishkin, M., & Ungerleider, L. (1997). Cortical projections of area v2 in the macaque. Cerebral Cortex, 7, 110–129. doi: 10.1093/cercor/7.2.110
  • Grill-Spector, K., Weiner, K. S., Gomez, J., Stigliani, A., & Natu, V. S. (2018). The functional neuroanatomy of face perception: From brain measurements to deep neural networks. Interface Focus, 8, 20180013. doi: 10.1098/rsfs.2018.0013
  • Harvey, B., & Dumoulin, S. (2011). The relationship between cortical magnification factor and population receptive field size in human visual cortex: Constancies in cortical architecture. Journal of Neuroscience, 31, 13604–13612. doi: 10.1523/JNEUROSCI.2572-11.2011
  • He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the international conference on computer vision (ICCV) (pp. 1026–1034).
  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In IEEE conference on computer vision and pattern recognition (CVPR 2016) (pp. 770–778).
  • Hong, H., Yamins, D., Majaj, N., & DiCarlo, J. (2016). Explicit information for category-orthogonal object properties increases along the ventral stream. Nature Neuroscience, 19, 613–622. doi: 10.1038/nn.4247
  • Hout, M. C., Robbins, A., Godwin, H. J., Fitzsimmons, G., & Scarince, C. (2017). Categorical templates are more useful when features are consistent: Evidence from eye-movements during search for societally important vehicles. Attention, Perception, & Psychophysics, 79, 1578–1592. doi: 10.3758/s13414-017-1354-1
  • Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4700–4708).
  • Huang, X., Shen, C., Boix, X., & Zhao, Q. (2015). SALICON: Reducing the semantic gap in saliency prediction by adapting deep neural networks. In Proceedings of the international conference on computer vision (ICCV).
  • Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning.
  • Kastner, S., De Weerd, P., Desimone, R., & Ungerleider, L. (1998). Mechanisms of directed attention in the human extrastriate cortex as revealed by functional MRI. Science, 282, 108–111. doi: 10.1126/science.282.5386.108
  • Kastner, S., De Weerd, P., Pinsk, M., Elizondo, M., Desimone, R., & Ungerleider, L. (2001). Modulation of sensory suppression: Implications for receptive field sizes in the human visual cortex. Journal of Neurophysiology, 86, 1398–1411. doi: 10.1152/jn.2001.86.3.1398
  • Khaligh-Razavi, S., & Kriegeskorte, N. (2014). Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Computational Biology, 10, e1003915. doi: 10.1371/journal.pcbi.1003915
  • Kietzmann, T. C., McClure, P., & Kriegeskorte, N. (2019). Deep neural networks in computational neuroscience. Oxford Research Encyclopaedia of Neuroscience. doi: 10.1093/acrefore/9780190264086.013.46
  • Kobatake, E., & Tanaka, K. (1994). Neuronal selectivities to complex object features in the ventral visual pathway of the macaque cerebral cortex. Journal of Neurophysiology, 71, 856–867. doi: 10.1152/jn.1994.71.3.856
  • Kravitz, D., Saleem, K. S., Baker, C., Ungerleider, L., & Mishkin, M. (2013). The ventral visual pathway: An expanded neural framework for the processing of object quality. Trends in Cognitive Sciences, 17, 26–49. doi: 10.1016/j.tics.2012.10.011
  • Kriegeskorte, N. (2015). Deep neural networks: A new framework for modelling biological vision and brain information processing. Annual Review of Vision Science, 1, 417–446. doi: 10.1146/annurev-vision-082114-035447
  • Krizhevsky, A., Sutskever, I., & Hinton, G. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105). Red Hook, NY: Curran Associates Inc.
  • Larsson, J., & Heeger, D. (2006). Two retinotopic visual areas in human lateral occipital cortex. Journal of Neuroscience, 26, 13128–13142. doi: 10.1523/JNEUROSCI.1657-06.2006
  • Li, M., & Tsien, J. Z. (2017). Neural code—neural self-information theory on how cell-assembly code rises from spike time and neuronal variability. Frontiers in Cellular Neuroscience, 11, 236. doi: 10.3389/fncel.2017.00236
  • Li, M., Xie, K., Kuang, H., Liu, J., Wang, D., & Fox, G. (2017). Spike-timing patterns conform to a gamma distribution with regional and cell type-specific characteristics. BioRxiv:145813.
  • Li, G., & Yu, Y. (2015). Visual saliency based on multiscale deep features. In IEEE conference on computer vision and pattern recognition (CVPR 2015).
  • Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91–110. doi: 10.1023/B:VISI.0000029664.99615.94
  • Maxfield, J. T., Stalder, W. D., & Zelinsky, G. J. (2014). Effects of target typicality on categorical search. Journal of Vision, 14(12), 1–11. doi: 10.1167/14.12.1
  • Maxfield, J. T., & Zelinsky, G. J. (2012). Searching through the hierarchy: How level of target categorization affects visual search. Visual Cognition, 20(10), 1153–1163. doi: 10.1080/13506285.2012.735718
  • McKeefry, D., & Zeki, S. (1997). The position and topography of the human colour centre as revealed by functional magnetic resonance imaging. Brain, 120, 2229–2242. doi: 10.1093/brain/120.12.2229
  • Mishkin, M., Ungerleider, L., & Macko, K. (1983). Object vision and spatial vision: Two cortical pathways. Trends in Neurosciences, 6, 414–417. doi: 10.1016/0166-2236(83)90190-X
  • Nakamura, H., Gattass, R., Desimone, R., & Ungerleider, L. (1993). The modular organization of projections from areas v1 and v2 to areas v4 and teo in macaques. The Journal of Neuroscience, 13, 3681–3691. doi: 10.1523/JNEUROSCI.13-09-03681.1993
  • Nako, R., Wu, R., & Eimer, M. (2014). Rapid guidance of visual search by object categories. Journal of Experimental Psychology: Human Perception and Performance, 40(1), 50–60.
  • Nassi, J. J., & Callaway, E. (2009). Parallel processing strategies of the primate visual system. Nature Reviews Neuroscience, 10, 360–372. doi: 10.1038/nrn2619
  • Neider, M. B., & Zelinsky, G. J. (2006). Scene context guides eye movements during visual search. Vision Research, 46, 614–621. doi: 10.1016/j.visres.2005.08.025
  • Nelson, W. W., & Loftus, G. R. (1980). The functional visual field during picture viewing. Journal of Experimental Psychology: Human Learning and Memory, 6(4), 391–399.
  • Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–175. doi: 10.1023/A:1011139631724
  • Orban, G., Zhu, Q., & Vanduffel, W. (2014). The transition in the ventral stream from feature to real-world entity representations. Frontiers in Psychology, 5, 695.
  • Pasupathy, A., & Connor, C. (2002). Population coding of shape in area V4. Nature Neuroscience, 5(12), 1332–1338. doi: 10.1038/nn972
  • Peelen, M. V., & Kastner, S. (2011). A neural basis for real-world visual search in human occipitotemporal cortex. Proceedings of the National Academy of Sciences, 108(29), 12125–12130. doi: 10.1073/pnas.1101042108
  • Rajimehr, R., Young, J., & Tootell, R. (2009). An anterior temporal face patch in human cortex, predicted by macaque maps. Proceedings of the National Academy of Sciences, 106, 1995–2000. doi: 10.1073/pnas.0807304106
  • Razavian, A. S., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: An astounding baseline for recognition. In Computer vision and pattern recognition workshops.
  • Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2, 1019–1025. doi: 10.1038/14819
  • Rousselet, G., Thorpe, S., & Fabre-Thorpe, M. (2004). How parallel is visual processing in the ventral pathway? Trends in Cognitive Sciences, 8, 363–370. doi: 10.1016/j.tics.2004.06.003
  • Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., … Li, F. F. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115, 211–252. doi: 10.1007/s11263-015-0816-y
  • Sanchez, J., Perronnin, F., Mensink, T., & Verbeek, J. (2013). Image classification with the fisher vector: Theory and practice. International Journal of Computer Vision, 105, 222–245. doi: 10.1007/s11263-013-0636-x
  • Schmidt, J., & Zelinsky, G. J. (2009). Search guidance is proportional to the categorical specificity of a target cue. Quarterly Journal of Experimental Psychology, 62(10), 1904–1914. doi: 10.1080/17470210902853530
  • Serre, T., Kreiman, G., Kouh, M., Cadieu, C., Knoblich, U., & Poggio, T. (2007). A quantitative theory of immediate visual recognition. Progress in Brain Research, 165, 33–56. doi: 10.1016/S0079-6123(06)65004-8
  • Serre, T., Oliva, A., & Poggio, T. (2007). A feedforward architecture accounts for rapid categorization. Proceedings of the National Academy of Sciences, 104, 6424–6429. doi: 10.1073/pnas.0700622104
  • Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International conference on learning representations (ICLR 2015).
  • Smith, A., Williams, A., & Greenlee, M. (2001). Estimating receptive field size from fMRI data in human striate and extrastriate visual cortex. Cerebral Cortex, 11, 1182–1190. doi: 10.1093/cercor/11.12.1182
  • Szegedy, C., Liu, W., Jia, Y., Sermanet, P., & Reed, S. (2015). Going deeper with convolutions. In IEEE conference on computer vision and pattern recognition (CVPR 2015).
  • Tanaka, K. (1997). Mechanisms of visual object recognition: Monkey and human studies. Current Opinion in Neurobiology, 7, 523–529. doi: 10.1016/S0959-4388(97)80032-3
  • Tarr, M. (1999). News on views: Pandemonium revisited. Nature Neuroscience, 2, 932–935. doi: 10.1038/14714
  • Thorpe, S. J., Gegenfurtner, K. R., Fabre-Thorpe, M., & Bülthoff, H. H. (2001). Detection of animals in natural images using far peripheral vision. European Journal of Neuroscience, 14, 869–876. doi: 10.1046/j.0953-816x.2001.01717.x
  • Tsotsos, J. K., Culhane, S. M., Wai, W. Y. K., Lai, Y., Davis, N., & Nuflo, F. (1995). Modeling visual attention via selective tuning. Artificial Intelligence, 78, 507–545. doi: 10.1016/0004-3702(95)00025-9
  • Ungerleider, L., Galkin, T., Desimone, R., & Gattass, R. (2007). Cortical connections of area v4 in the macaque. Cerebral Cortex, 18, 477–499. doi: 10.1093/cercor/bhm061
  • Van Essen, D. C., Lewis, J., Drury, H., Hadjikhani, N., Tootell, R., Bakircioglu, M., & Miller, M. (2001). Mapping visual cortex in monkeys and humans using surface-based atlases. Vision Research, 41, 1359–1378. doi: 10.1016/S0042-6989(01)00045-1
  • Vicente, T., Hoai, M., & Samaras, D. (2015). Leave-one-out kernel optimization for shadow detection. In Proceedings of the international conference on computer vision (ICCV) (pp. 3388–3396).
  • Wade, A., Brewer, A., Rieger, J., & Wandell, B. (2002). Functional measurements of human ventral occipital cortex: Retinotopy and colour. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 357, 963–973. doi: 10.1098/rstb.2002.1108
  • Wang, W., & Shen, J. (2017). Deep visual attention prediction. arXiv:1705.02544.
  • Wolfe, J. M. (1994). Guided search 2.0: A revised model of visual search. Psychonomic Bulletin and Review, 1, 202–238. doi: 10.3758/BF03200774
  • Yamins, D., Hong, H., Cadieu, C., Solomon, E., Seibert, D., & DiCarlo, J. (2014). Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111, 8619–8624. doi: 10.1073/pnas.1403112111
  • Yang, H., & Zelinsky, G. J. (2009). Visual search is guided to categorically-defined targets. Vision Research, 49, 2095–2103. doi: 10.1016/j.visres.2009.05.017
  • Yu, C.-P., Hua, W.-Y., Samaras, D., & Zelinsky, G. J. (2013). Modeling clutter perception using parametric proto-object partitioning. In Advances in neural information processing systems (NIPS 2013).
  • Yu, C.-P., Le, H., Zelinsky, G. J., & Samaras, D. (2015). Efficient video segmentation using parametric graph partitioning. In International conference on computer vision (ICCV).
  • Yu, C.-P., Maxfield, J. T., & Zelinsky, G. J. (2016). Searching for category-consistent features: A computational approach to understanding visual category representation. Psychological Science, 27(6), 870–884. doi: 10.1177/0956797616640237
  • Zagoruyko, S., & Komodakis, N. (2016). Wide residual networks. arXiv preprint arXiv:1605.07146.
  • Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Proceedings of the European conference on computer vision (ECCV 2014).
  • Zelinsky, G. J. (2008). A theory of eye movements during target acquisition. Psychological Review, 115(4), 787–835. doi: 10.1037/a0013118
  • Zelinsky, G. J., Adeli, H., Peng, Y., & Samaras, D. (2013). Modelling eye movements in a categorical search task. Philosophical Transactions of the Royal Society B: Biological Sciences, 368(1628), 1–12. doi: 10.1098/rstb.2013.0058
  • Zelinsky, G. J., Peng, Y., Berg, A. C., & Samaras, D. (2013). Modeling guidance and recognition in categorical search: Bridging human and computer object detection. Journal of Vision, 13(3), 30, 1–20. doi: 10.1167/13.3.30
  • Zelinsky, G. J., Peng, Y., & Samaras, D. (2013). Eye can read your mind: Using eye fixations to classify search targets. Journal of Vision, 13(14), 10, 1–13. doi: 10.1167/13.14.10
  • Zhang, M., Feng, J., Ma, K. T., Lim, J. H., Zhao, Q., & Kreiman, G. (2018). Finding any Waldo: Zero-shot invariant and efficient visual search. Nature Communications, 9, 3730. doi: 10.1038/s41467-018-06217-x
  • Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2014). Object detectors emerge in deep scene CNNs. arXiv:1412.6856.
