A predictive coding model of gaze shifts and the underlying neurophysiology

Pages 770-801 | Received 13 Dec 2016, Accepted 16 May 2017, Published online: 18 Jul 2017

ABSTRACT

A comprehensive model of gaze control must account for a number of empirical observations at both the behavioural and neurophysiological levels. The computational model presented in this article can simulate the coordinated movements of the eye, head, and body required to perform horizontal gaze shifts, and in doing so reproduces the predictable relationships between the movements of these different degrees of freedom (DOFs) in the primate. The model also accounts for the saccadic undershoot that accompanies large gaze shifts in the biological visual system, for our perception of a stable external world despite frequent gaze shifts, and for the ability to perform accurate memory-guided and double-step saccades. In addition, it simulates peri-saccadic compression: the mis-localization of a briefly presented visual stimulus towards the location that is the target of a saccade. At the neurophysiological level, the model is consistent with the existence of cortical neurons tuned to the retinal, head-centred, body-centred, and world-centred locations of visual stimuli, and of cortical neurons whose responses to visual stimuli are gain-modulated. Finally, the model accounts for peri-saccadic receptive field (RF) remapping, which produces reduced responses to stimuli at the current RF location and increased sensitivity to stimuli appearing at the location the RF will occupy after the saccade. The model thus offers a unified explanation for this seemingly diverse range of phenomena. Furthermore, because it is an implementation of the predictive coding theory, it provides a single computational account of these phenomena and relates gaze shifts to a wider framework for understanding cortical function.

Disclosure statement

No potential conflict of interest was reported by the author.

Notes

1 The proposed model is concerned purely with kinematics and does not consider dynamics. It is assumed that the movements planned by the proposed model are carried out by brain circuitry that has not been explicitly modelled here, for example by the cerebellum (Houk, Buckingham, & Barto, 1996; Kawato, 1995), which appears to implement a closed-loop motor control system containing both forward and inverse models (Wolpert & Kawato, 1998; Wolpert, Miall, & Kawato, 1998). This split between action planning and execution is consistent with previous work modelling the biological basis of motor control (Flash & Sejnowski, 2001).
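
The note above only names the forward/inverse-model scheme; as an illustration, the following is a minimal sketch of how such a closed loop might operate. The one-dimensional linear plant, the gains, and all function names are illustrative assumptions, not the article's (or the cited authors') implementation.

```python
# A toy closed loop pairing a forward and an inverse model, in the spirit
# of Wolpert & Kawato (1998). The inverse model is deliberately imperfect,
# so the forward model's prediction is needed to refine each command.

TRUE_GAIN = 0.9   # assumed plant: degrees of gaze shift per unit of command
EST_GAIN = 0.8    # the inverse model's imperfect estimate of that gain


def inverse_model(error: float) -> float:
    """Propose a motor command for the remaining gaze error."""
    return error / EST_GAIN


def forward_model(command: float) -> float:
    """Predict the gaze shift a command will produce (assumed accurate)."""
    return TRUE_GAIN * command


def plan_command(error: float) -> float:
    """Refine the inverse model's command using the forward model's
    prediction of its outcome, before any movement is made."""
    command = inverse_model(error)
    predicted = forward_model(command)            # internal simulation
    command += inverse_model(error - predicted)   # correct the predicted miss
    return command


def simulate(target: float, steps: int = 4) -> list:
    """Iterate plan -> execute -> observe, closing the loop visually."""
    gaze, trace = 0.0, []
    for _ in range(steps):
        gaze += TRUE_GAIN * plan_command(target - gaze)  # plant executes
        trace.append(round(gaze, 3))
    return trace


print(simulate(20.0))  # approaches the 20-degree target over a few cycles
```

Under these assumptions the first corrected command already lands within about 1.5% of the target, and the visual feedback loop removes the remaining error over subsequent cycles.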

2 The synaptic weights of the PC/BC-DIM network have been hard-coded rather than learnt. While this article does not consider learning, it is possible to speculate about how the weights could be learnt. Consider the first processing stage. The prediction neurons need to learn RFs that represent combinations of retinal input and eye position so as to tile this joint input space. This can be achieved using an unsupervised learning algorithm trained with randomly positioned visual targets and random eye movements, as has been shown in De Meyer and Spratling (2011). The reconstruction neurons in the third partition of the first processing stage need to learn strong connections to all the prediction neurons that represent the same head-centred location. This could potentially be achieved using an unsupervised learning rule that forms associations across time (Földiák, 1991; O'Reilly & McClelland, 1992; Spratling, 2005; Templeman & Loew, 1989; Wallis, 1996). Specifically, one reconstruction neuron could be connected to all the prediction neurons that represent a single head-centred location by training with a stationary visual target and random eye movements, and this process could be repeated with different target positions in order to train all the reconstruction neurons. This has been demonstrated by Spratling (2009) for a PC/BC-DIM architecture in which the head-centred representation is learnt by pooling neurons, as shown in panel (a), rather than by reconstruction neurons. Once the weights for the first processing stage have been learnt, an analogous method could be used to train the second and subsequent processing stages in a greedy layer-wise process, as is typically used to train deep neural networks (Hinton, Osindero, & Teh, 2006; Larochelle, Bengio, Louradour, & Lamblin, 2009). Such a process might be facilitated in the human visual system by the sequential development of motor control in young infants, in which eye movements are mastered before head and body movements, and head movements before body movements.
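
As a concrete illustration of the association-over-time idea described above, the following toy sketch pairs a stationary target with random eye movements and uses a simple Hebbian trace rule (in the spirit of Földiák, 1991) to wire one reconstruction unit to every prediction unit that fires for the same head-centred location. The network sizes, the one-hot stand-in for the first processing stage, and all learning parameters are illustrative assumptions, not the article's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_EYE = 10                     # assumed number of discrete eye positions
N_LOC = 10                     # assumed number of head-centred locations
N_PRED = N_EYE * N_LOC         # one prediction neuron per (location, eye) pair
ETA, DECAY = 0.1, 0.8          # assumed learning rate and trace decay

W = np.zeros((N_LOC, N_PRED))  # weights from prediction to reconstruction neurons


def prediction_activity(head_loc: int, eye_pos: int) -> np.ndarray:
    """One-hot stand-in for the first processing stage: a single prediction
    neuron fires for each combination of head-centred location and eye
    position (the retinal input is implied by the two)."""
    y = np.zeros(N_PRED)
    y[head_loc * N_EYE + eye_pos] = 1.0
    return y


# One stationary visual target per head-centred location; the target stays
# put while the eyes move at random, so all prediction neurons active within
# a block represent the same head-centred location.
for head_loc in range(N_LOC):
    trace = np.zeros(N_PRED)
    for _ in range(200):
        eye_pos = int(rng.integers(N_EYE))
        y = prediction_activity(head_loc, eye_pos)
        trace = DECAY * trace + (1 - DECAY) * y   # temporal trace of activity
        W[head_loc] += ETA * trace                # Hebbian update for the
                                                  # currently trained unit

# Each reconstruction unit should now connect to all N_EYE prediction
# neurons for its head-centred location, and to no others.
print((W > 0).sum(axis=1))    # -> [10 10 10 ... 10]
```

Under these assumptions, repeating the same procedure on each stage's output with the earlier stages' weights frozen would correspond to the greedy layer-wise scheme mentioned above.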
