Abstract
Early vision is best understood in terms of two key information bottlenecks along the visual pathway: the optic nerve and, more severely, attention. Two effective strategies for sampling and representing visual inputs in light of these bottlenecks are (1) data compression with minimum information loss and (2) data deletion. This paper reviews two lines of theoretical work that interpret processes in the retina and primary visual cortex (V1) within this framework. The first is the efficient coding principle, which argues that early visual processes compress input into a more efficient form so as to transmit as much information as possible through channels of limited capacity. It can explain the properties of visual sampling and the nature of the receptive fields of the retina and V1. It has also been argued to reveal the independent causes of the inputs. The second is the hypothesis that neural activities in V1 represent the bottom-up saliencies of visual inputs, such that information can be selected for, or discarded from, detailed or attentive processing. This theory links V1 physiology with pre-attentive visual selection behavior. By making experimentally testable predictions, the potentials and limitations of both sets of theories can be explored.
Notes
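[1] For correlation matrix $R^O$ of the output $O$ and correlation matrix $R^N$ of the output noise, the transform $U$ changes the correlation matrices as $R^O \to U R^O U^\dagger$ and $R^N \to U R^N U^\dagger$. However, note from Equations (7) and (6) that cost $= \sum_i \langle O_i^2 \rangle = \mathrm{Tr}(R^O)$, where $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix, and $I(O;S) = \frac{1}{2} \log \frac{\det R^O}{\det R^N}$, where $\det(\cdot)$ denotes the determinant of a matrix. Since, for any matrix $M$, $\mathrm{Tr}(M) = \mathrm{Tr}(U M U^\dagger)$ and $\det(M) = \det(U M U^\dagger)$ for any rotation or unitary matrix $U$ (with $U U^\dagger = 1$), $E = \mathrm{cost} - \lambda I(O;S)$ is invariant to $U$.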
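The invariance argument in note [1] can be checked numerically. The following is a minimal sketch, not the paper's own code: it assumes the Gaussian-approximation form $I(O;S) = \frac{1}{2}\log_2(\det R^O / \det R^N)$ (the base of the logarithm does not affect the invariance), restricts $U$ to a real 2x2 rotation, and uses made-up correlation matrices, trade-off weight, and rotation angle. All function and variable names are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    """Transpose a 2x2 matrix."""
    return [[a[j][i] for j in range(2)] for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def rotate(m, theta):
    """Return U M U^T for the 2x2 rotation matrix U of angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    u = [[c, -s], [s, c]]
    return matmul(matmul(u, m), transpose(u))

def objective(r_o, r_n, lam):
    """E = cost - lam * I, with cost = Tr(R^O) and the assumed
    Gaussian form I = 0.5 * log2(det R^O / det R^N)."""
    cost = trace(r_o)
    info = 0.5 * math.log2(det(r_o) / det(r_n))
    return cost - lam * info

# Illustrative (made-up) symmetric positive-definite correlation matrices.
R_O = [[2.0, 0.6], [0.6, 1.5]]
R_N = [[0.3, 0.05], [0.05, 0.2]]
LAM = 1.7      # arbitrary trade-off weight (lambda)
THETA = 0.83   # arbitrary rotation angle in radians

E_before = objective(R_O, R_N, LAM)
E_after = objective(rotate(R_O, THETA), rotate(R_N, THETA), LAM)

# The two values should agree up to floating-point rounding.
print(abs(E_before - E_after) < 1e-9)
```

The rotation mixes the output channels, so the individual entries of the rotated matrices differ from the originals. Only the trace and the determinant, and hence $E$, are preserved.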
[2] The symmetry holds when the cost is $\sum_i \langle O_i^2 \rangle$ or $H(O)$, but not $\sum_i H(O_i)$ except in the noiseless case. Given finite noise, the cost $\sum_i H(O_i)$ would break the symmetry in favor of a preferred $U$ equal to the identity matrix, giving zero second-order correlation between output channels. The fact that early vision does not usually have $U$ equal to the identity suggests that the cost is more likely the output power $\sum_i \langle O_i^2 \rangle$ than $\sum_i H(O_i)$. For instance, retinal coding maximizes second-order output correlation given $\sum_i \langle O_i^2 \rangle$ and $I(O;S)$ in the Gaussian approximation, perhaps aiding signal recovery.
[3] As discussed in Li (1996), V1 could have many different copies $O^1, O^2, \ldots, O^p, \ldots$ (where the superscript $p$ identifies the particular copy) of a complete representation of $S$, such that each copy $O^p$ has as many cells (or dimensions) as the input $S$ and is associated with a particular choice of unitary matrix $U^p$. Each choice of $U^p$ specifies a particular set of preferred orientations, colors, motion directions, etc. of the resulting RFs whose responses constitute $O^p$, such that the whole representation ($O^1, O^2, \ldots, O^p, \ldots$) covers a whole spectrum of feature selectivities to span these feature dimensions (although the gain matrix assigns different sensitivities, some very small, to different feature values and their combinations). In reality, the V1 representation is more like a tight frame with a high redundancy ratio (Daubechies 1992; Lee 1996; Salinas and Abbott 2000) than a collection of complete representations (from the degenerate class), which would require (Li and Atick 1994a), in addition to the oriented RFs, checker-shaped RFs not typically observed physiologically.