Rafal Bogacz
1-8 of 8 journal articles
Neural Computation (2022) 34 (2): 307–337.
Published: 14 January 2022
Abstract
Reinforcement learning involves updating estimates of the value of states and actions on the basis of experience. Previous work has shown that in humans, reinforcement learning exhibits a confirmatory bias: when the value of a chosen option is being updated, estimates are revised more radically following positive than negative reward prediction errors, but the converse is observed when updating the unchosen option value estimate. Here, we simulate performance on a multi-arm bandit task to examine the consequences of a confirmatory bias for reward harvesting. We report a paradoxical finding: that confirmatory biases allow the agent to maximize reward relative to an unbiased updating rule. This principle holds over a wide range of experimental settings and is most influential when decisions are corrupted by noise. We show that this occurs because on average, confirmatory biases lead to overestimating the value of more valuable bandits and underestimating the value of less valuable bandits, rendering decisions overall more robust in the face of noise. Our results show how apparently suboptimal learning rules can in fact be reward maximizing if decisions are made with finite computational precision.
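The biased update can be sketched on a two-armed Bernoulli bandit; the reward rates, learning rates, and softmax temperature below are hypothetical placeholders (the paper sweeps such settings over wide ranges):

```python
import numpy as np

rng = np.random.default_rng(0)

true_means = np.array([0.6, 0.4])     # hypothetical Bernoulli reward rates
alpha_plus, alpha_minus = 0.15, 0.05  # confirmatory bias: learn faster from
                                      # positive errors on the chosen option
beta = 5.0                            # softmax inverse temperature (decision noise)
n_trials = 1000

Q = np.full(2, 0.5)                   # value estimates
harvested = 0.0
for _ in range(n_trials):
    p = np.exp(beta * Q)
    p /= p.sum()                      # noisy softmax choice
    a = rng.choice(2, p=p)
    r = float(rng.random() < true_means[a])
    delta = r - Q[a]                  # reward prediction error
    Q[a] += (alpha_plus if delta > 0 else alpha_minus) * delta
    # in a full-feedback task, the two rates would swap roles when
    # updating the unchosen option's value
    harvested += r

print(f"mean reward: {harvested / n_trials:.3f}")
```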
Neural Computation (2017) 29 (5): 1229–1262.
Published: 01 May 2017
Abstract
To efficiently learn from feedback, cortical networks need to update synaptic weights on multiple levels of cortical hierarchy. An effective and well-known algorithm for computing such changes in synaptic weights is the error backpropagation algorithm. However, in this algorithm, the change in synaptic weights is a complex function of weights and activities of neurons not directly connected with the synapse being modified, whereas the changes in biological synapses are determined only by the activity of presynaptic and postsynaptic neurons. Several models have been proposed that approximate the backpropagation algorithm with local synaptic plasticity, but these models require complex external control over the network or relatively complex plasticity rules. Here we show that a network developed in the predictive coding framework can efficiently perform supervised learning fully autonomously, employing only simple local Hebbian plasticity. Furthermore, for certain parameters, the weight change in the predictive coding model converges to that of the backpropagation algorithm. This suggests that it is possible for cortical networks with simple Hebbian synaptic plasticity to implement efficient learning algorithms in which synapses in areas on multiple levels of hierarchy are modified to minimize the error on the output.
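A minimal sketch of the inference-then-learning loop in such a predictive coding network, assuming linear predictions and hypothetical layer sizes and rates (the full model adds nonlinearities and variance parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hid, n_out = 4, 8, 2
W1 = rng.normal(0, 0.1, (n_hid, n_in))   # predicts the hidden layer from the input
W2 = rng.normal(0, 0.1, (n_out, n_hid))  # predicts the output layer from the hidden
lr, infer_rate, n_infer = 0.05, 0.2, 50  # hypothetical rates

def train_step(x_in, target):
    global W1, W2
    x_hid = W1 @ x_in                 # start from the feedforward prediction
    x_out = target                    # output nodes clamped to the label
    for _ in range(n_infer):          # relax hidden activity to reduce both errors
        eps_hid = x_hid - W1 @ x_in   # prediction error at the hidden layer
        eps_out = x_out - W2 @ x_hid  # prediction error at the output layer
        x_hid += infer_rate * (W2.T @ eps_out - eps_hid)
    # local Hebbian updates: postsynaptic error times presynaptic activity
    W1 += lr * np.outer(x_hid - W1 @ x_in, x_in)
    W2 += lr * np.outer(x_out - W2 @ x_hid, x_hid)
```

With the output clamped, the relaxed error nodes play the role of backpropagated errors, which is why, for small errors, the Hebbian weight change approaches the backpropagation gradient.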
Neural Computation (2017) 29 (2): 368–393.
Published: 01 February 2017
Abstract
Much experimental evidence suggests that during decision making, neural circuits accumulate evidence supporting alternative options. A computational model well describing this accumulation for choices between two options assumes that the brain integrates the log ratios of the likelihoods of the sensory inputs given the two options. Several models have been proposed for how neural circuits can learn these log-likelihood ratios from experience, but all of these models introduced novel and specially dedicated synaptic plasticity rules. Here we show that for a certain wide class of tasks, the log-likelihood ratios are approximately linearly proportional to the expected rewards for selecting actions. Therefore, a simple model based on standard reinforcement learning rules is able to estimate the log-likelihood ratios from experience and on each trial accumulate the log-likelihood ratios associated with presented stimuli while selecting an action. The simulations of the model replicate experimental data on both behavior and neural activity in tasks requiring accumulation of probabilistic cues. Our results suggest that there is no need for the brain to support dedicated plasticity rules, as the standard mechanisms proposed to describe reinforcement learning can enable the neural circuits to perform efficient probabilistic inference.
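The idea can be sketched on a hypothetical probabilistic-cues task: a standard delta rule learns an action value for each cue, the decision sums the values of the cues shown, and after learning the value differences are approximately proportional to the cues' log-likelihood ratios:

```python
import numpy as np

rng = np.random.default_rng(1)

p_cue = np.array([[0.8, 0.6, 0.4, 0.2],    # P(cue present | state A), hypothetical
                  [0.2, 0.4, 0.6, 0.8]])   # P(cue present | state B)
V = np.zeros((4, 2))                       # value of each action given each cue
alpha, beta = 0.05, 3.0                    # learning rate, softmax temperature

for _ in range(20000):
    s = rng.integers(2)                    # hidden state of the trial
    shown = rng.random(4) < p_cue[s]       # cues presented this trial
    if not shown.any():
        continue
    q = V[shown].sum(axis=0)               # accumulate values of presented cues
    p = np.exp(beta * q)
    p /= p.sum()
    a = rng.choice(2, p=p)                 # softmax action selection
    r = float(a == s)                      # reward for matching the state
    V[shown, a] += alpha * (r - V[shown, a])   # standard delta-rule update

# the learned value differences should be roughly proportional to
# log(P(cue | A) / P(cue | B)) for each cue
print(V[:, 0] - V[:, 1])
```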
Neural Computation (2011) 23 (4): 909–926.
Published: 01 April 2011
Abstract
Psychological experiments have shown that the capacity of the brain for discriminating visual stimuli as novel or familiar is almost limitless. Neurobiological studies have established that the perirhinal cortex is critically involved in both familiarity discrimination and feature extraction. However, opinion is divided as to whether these two processes are performed by the same neurons. Previously proposed models have been unable to simultaneously extract features and discriminate familiarity for large numbers of stimuli. We show that a well-known model of visual feature extraction, Infomax, can simultaneously perform familiarity discrimination and feature extraction efficiently. This model has a significantly larger capacity than previously proposed models combining these two processes, particularly when correlation exists between inputs, as is the case in the perirhinal cortex. Furthermore, we show that once the model fully extracts features, its ability to perform familiarity discrimination increases markedly.
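A sketch of the natural-gradient form of the Infomax update (the Bell and Sejnowski rule), paired with a purely hypothetical familiarity readout based on summed output drive; the paper's actual discrimination measure may differ:

```python
import numpy as np

rng = np.random.default_rng(2)

n = 16
W = np.eye(n) + 0.01 * rng.normal(size=(n, n))
lr = 0.01

def infomax_step(x):
    """Natural-gradient Infomax update with logistic output units."""
    global W
    u = W @ x
    y = 1.0 / (1.0 + np.exp(-u))
    W += lr * (np.eye(n) + np.outer(1.0 - 2.0 * y, u)) @ W

def familiarity(x):
    # hypothetical readout: total drive of the output units, on the
    # intuition that trained inputs engage the learned features strongly
    return np.abs(W @ x).sum()

stored = rng.normal(size=(100, n))         # stimuli presented during learning
for _ in range(20):
    for x in stored:
        infomax_step(x)

novel = rng.normal(size=(100, n))
print(np.mean([familiarity(x) for x in stored]),
      np.mean([familiarity(x) for x in novel]))
```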
Neural Computation (2011) 23 (4): 817–851.
Published: 01 April 2011
Abstract
This article seeks to integrate two sets of theories describing action selection in the basal ganglia: reinforcement learning theories describing learning which actions to select to maximize reward and decision-making theories proposing that the basal ganglia selects actions on the basis of sensory evidence accumulated in the cortex. In particular, we present a model that integrates the actor-critic model of reinforcement learning and a model assuming that the cortico-basal-ganglia circuit implements a statistically optimal decision-making procedure. The values of cortico-striatal weights required for optimal decision making in our model differ from those provided by standard reinforcement learning models. Nevertheless, we show that an actor-critic model converges to the weights required for optimal decision making when biologically realistic limits on synaptic weights are introduced. We also describe the model's predictions concerning reaction times and neural responses during learning, and we discuss directions required for further integration of reinforcement learning and optimal decision-making theories.
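A minimal actor-critic sketch with the kind of limit the abstract refers to, here simply clipping the striatal weights to be non-negative; the stimuli, reward contingencies, and rates are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

p_reward = np.array([[0.8, 0.2],          # P(reward | stimulus, action), hypothetical
                     [0.2, 0.8]])
actor_w = np.zeros((2, 2))                # "cortico-striatal" action preferences
critic_v = np.zeros(2)                    # state value estimates
alpha_actor, alpha_critic, beta = 0.1, 0.1, 3.0

for _ in range(5000):
    s = rng.integers(2)
    p = np.exp(beta * actor_w[s])
    p /= p.sum()
    a = rng.choice(2, p=p)
    r = float(rng.random() < p_reward[s, a])
    delta = r - critic_v[s]               # prediction error (immediate-reward task)
    critic_v[s] += alpha_critic * delta
    actor_w[s, a] += alpha_actor * delta
    actor_w = np.maximum(actor_w, 0.0)    # biologically motivated weight limit
```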
Neural Computation (2010) 22 (5): 1113–1148.
Published: 01 May 2010
Abstract
Experimental data indicate that perceptual decision making involves integration of sensory evidence in certain cortical areas. Theoretical studies have proposed that the computation in neural decision circuits approximates statistically optimal decision procedures (e.g., sequential probability ratio test) that maximize the reward rate in sequential choice tasks. However, these previous studies assumed that the sensory evidence was represented by continuous values from gaussian distributions with the same variance across alternatives. In this article, we make a more realistic assumption that sensory evidence is represented in spike trains described by Poisson processes, which naturally satisfy the mean-variance relationship observed in sensory neurons. We show that for such a representation, neural circuits involving cortical integrators and the basal ganglia can approximate the optimal decision procedures for two and multiple alternative choice tasks.
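A sketch of the MSPRT driven by Poisson spike counts, with hypothetical rates and threshold. Because likelihood terms shared by all hypotheses cancel, each channel's accumulated evidence reduces to its spike count scaled by log(rate_hi / rate_lo):

```python
import numpy as np

rng = np.random.default_rng(4)

n_alt, dt = 4, 0.01                       # alternatives, time step (s)
rate_hi, rate_lo = 40.0, 20.0             # hypothetical firing rates (Hz)
true_alt = 1                              # channel with the elevated rate
gain = np.log(rate_hi / rate_lo)          # evidence contributed per spike

counts = np.zeros(n_alt)
t = 0.0
while True:
    t += dt
    rates = np.where(np.arange(n_alt) == true_alt, rate_hi, rate_lo)
    counts += rng.poisson(rates * dt)     # spikes in this time step
    x = gain * counts                     # per-channel accumulated evidence
    log_post = x - np.logaddexp.reduce(x) # log posterior under a uniform prior
    if log_post.max() > np.log(0.95):     # stop when one posterior exceeds 0.95
        print(f"choose {log_post.argmax()} after {t:.2f} s")
        break
```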
Neural Computation (2010) 22 (5): 1149–1179.
Published: 01 May 2010
Abstract
Reinforcement learning models generally assume that a stimulus is presented that allows a learner to unambiguously identify the state of nature, and the reward received is drawn from a distribution that depends on that state. However, in any natural environment, the stimulus is noisy. When there is state uncertainty, it is no longer immediately obvious how to perform reinforcement learning, since the observed reward cannot be unambiguously allocated to a state of the environment. This letter addresses the problem of incorporating state uncertainty in reinforcement learning models. We show that simply ignoring the uncertainty and allocating the reward to the most likely state of the environment results in incorrect value estimates. Furthermore, using only the information that is available before observing the reward also results in incorrect estimates. We therefore introduce a new technique, posterior weighted reinforcement learning, in which the estimates of state probabilities are updated according to the observed rewards (e.g., if a learner observes a reward usually associated with a particular state, this state becomes more likely). We show analytically that this modified algorithm can converge to correct reward estimates and confirm this with numerical experiments. The algorithm is shown to be a variant of the expectation-maximization algorithm, allowing rigorous convergence analyses to be carried out. A possible neural implementation of the algorithm in the cortico-basal-ganglia-thalamic network is presented, and experimental predictions of our model are discussed.
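The update can be sketched with hypothetical task numbers: the stimulus supplies only a prior over two states, the observed reward reweights that prior through a Gaussian reward likelihood (the E-step-like stage), and the prediction error is then allocated across states in proportion to the posterior:

```python
import numpy as np

rng = np.random.default_rng(5)

true_reward = np.array([1.0, 3.0])        # hypothetical mean reward per state
R_hat = np.zeros(2)                       # learned reward estimates
alpha, sigma = 0.05, 0.5                  # learning rate, reward noise s.d.

for _ in range(5000):
    s = rng.integers(2)                   # hidden state of the environment
    # noisy stimulus: only a prior over states is available before the reward
    prior = np.array([0.7, 0.3]) if s == 0 else np.array([0.3, 0.7])
    r = true_reward[s] + sigma * rng.normal()
    # reweight state probabilities by how well each state explains the reward
    like = np.exp(-(r - R_hat) ** 2 / (2 * sigma ** 2))
    post = prior * like
    post /= post.sum()
    # allocate the prediction error to states by their posterior weight
    R_hat += alpha * post * (r - R_hat)

print(R_hat)   # should approach the true per-state means
```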
Neural Computation (2007) 19 (2): 442–477.
Published: 01 February 2007
Abstract
Neurophysiological studies have identified a number of brain regions critically involved in solving the problem of action selection or decision making. In the case of highly practiced tasks, these regions include cortical areas hypothesized to integrate evidence supporting alternative actions and the basal ganglia, hypothesized to act as a central switch in gating behavioral requests. However, despite our relatively detailed knowledge of basal ganglia biology and its connectivity with the cortex and numerical simulation studies demonstrating selective function, no formal theoretical framework exists that supplies an algorithmic description of these circuits. This article shows how many aspects of the anatomy and physiology of the circuit involving the cortex and basal ganglia are exactly those required to implement the computation defined by an asymptotically optimal statistical test for decision making: the multihypothesis sequential probability ratio test (MSPRT). The resulting model of basal ganglia provides a new framework for understanding the computation in the basal ganglia during decision making in highly practiced tasks. The predictions of the theory concerning the properties of particular neuronal populations are validated in existing experimental data. Further, we show that this neurobiologically grounded implementation of MSPRT outperforms other candidates for neural decision making, that it is structurally and parametrically robust, and that it can accommodate cortical mechanisms for decision making in a way that complements those in basal ganglia.
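The core computation the article ascribes to the circuit can be sketched directly: cortical integrators supply accumulated evidence x_i, and the basal ganglia output approximates x_i minus the log of the summed exponentials, which is the log posterior of alternative i under a uniform prior; the threshold below is hypothetical:

```python
import numpy as np

def msprt_output(x):
    # the subtractive normalization term, computed once for all channels
    # (attributed in the model to the subthalamic nucleus and globus pallidus)
    return x - np.logaddexp.reduce(x)

def decide(x, log_threshold=np.log(0.95)):
    y = msprt_output(np.asarray(x, dtype=float))
    winner = int(np.argmax(y))
    # release the winning action only once its posterior clears the threshold
    return winner if y[winner] > log_threshold else None

print(decide([2.0, 0.5, 0.1]))   # None: keep integrating evidence
print(decide([5.0, 0.5, 0.1]))   # 0: posterior for channel 0 exceeds 0.95
```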