Abstract
Reward-based learning in neural systems is challenging because a large number of parameters that affect network function must be optimized solely on the basis of a reward signal that indicates improved performance. Searching the parameter space for an optimal solution is particularly difficult if the network is large. We show that Hebbian forms of synaptic plasticity applied to synapses between a supervisor circuit and the network it is controlling can effectively reduce the dimension of the space of parameters being searched to support efficient reinforcement-based learning in large networks. The critical element is that the connections between the supervisor units and the network must be reciprocal. Once the appropriate connections have been set up by Hebbian plasticity, a reinforcement-based learning procedure leads to rapid learning in a function approximation task. Hebbian plasticity within the network being supervised ultimately allows the network to perform the task without input from the supervisor.