In the course of trial-and-error learning, the results of actions, manifested as rewards or punishments, occur often seconds after the actions that caused them. How can a reward be associated with an earlier action when the neural activity that caused that action is no longer present in the network? This problem is referred to as the distal reward problem. A recent computational study proposes a solution using modulated plasticity with spiking neurons and argues that precise firing patterns in the millisecond range are essential for such a solution. In contrast, the study reported in this letter shows that it is the rarity of correlating neural activity, and not the spike timing, that allows the network to solve the distal reward problem. In this study, rare correlations are detected in a standard rate-based computational model by means of a threshold-augmented Hebbian rule. The novel modulated plasticity rule allows a randomly connected network to learn in classical and instrumental conditioning scenarios with delayed rewards. The rarity of correlations is shown to be a pivotal factor in the learning and in handling various delays of the reward. This study additionally suggests the hypothesis that short-term synaptic plasticity may implement eligibility traces and thereby serve as a selection mechanism in promoting candidate synapses for long-term storage.

You do not currently have access to this content.