## Abstract

The basal ganglia are a subcortical group of interconnected nuclei involved in mediating action selection within cortex. A recent proposal is that this selection leads to optimal decision making over multiple alternatives because the basal ganglia anatomy maps onto a network implementation of an optimal statistical method for hypothesis testing, assuming that cortical activity encodes evidence for constrained gaussian-distributed alternatives. This letter demonstrates that this model of the basal ganglia extends naturally to encompass general Bayesian sequential analysis over arbitrary probability distributions, which raises the proposal to a practically realizable theory over generic perceptual hypotheses. We also show that the evidence in this model can represent either log likelihoods, log-likelihood ratios, or log odds, all leading proposals for the cortical processing of sensory data. For these reasons, we claim that the basal ganglia optimize decision making over general perceptual hypotheses represented in cortex. The relation of this theory to cortical encoding, cortico-basal ganglia anatomy, and reinforcement learning is discussed.

## 1. Introduction

Two lines of evidence are converging on an understanding of animal perception as statistically optimal sensory processing by the brain. First, perception is considered Bayesian inference from noisy and ambiguous sensations (Knill & Pouget, 2004; Kersten, Mamassian, & Yuille, 2004). Second, the decisions resulting from this inference are considered optimal in terms of minimizing the costs of making mistakes plus the costs of waiting to gather more sensory data to improve the accuracy of the decision (Gold & Shadlen, 2007; Bogacz, Brown, Moehlis, Holmes, & Cohen, 2006). For instance, a notable series of experiments considers neuronal activity in lateral intraparietal cortex as monkeys make perceptual judgments of the direction of motion for a group of random dots and finds individual neurons that noisily ramp up their firing rates until reaching a threshold when a decision is made (Platt & Glimcher, 1999; Huk & Shadlen, 2005). For two alternatives, these processes appear well described by a statistically optimal procedure known as the sequential probability ratio test (SPRT) (Gold & Shadlen, 2001, 2007; Bogacz et al., 2006), which accumulates evidence representing a log-likelihood ratio for a competing pair of perceptual hypotheses over the time series of sensory data until reaching a preset decision threshold (Wald & Wolfowitz, 1948).
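The accumulation-to-threshold procedure described above can be sketched for two gaussian hypotheses as follows. This is a minimal illustration rather than the procedure used in the cited experiments: the function names, the gaussian sensory model, and the symmetric threshold are assumptions chosen for the example.

```python
import math
import random

def gaussian_log_lr(s, mu1, mu2, sigma):
    """Log-likelihood ratio log[N(s; mu1, sigma) / N(s; mu2, sigma)]."""
    return ((s - mu2) ** 2 - (s - mu1) ** 2) / (2.0 * sigma ** 2)

def sprt(samples, mu1, mu2, sigma, threshold):
    """Accumulate evidence until it reaches +threshold or -threshold.

    Returns (decision, n_samples); decision is 1, 2, or None if the
    samples run out before either bound is reached.
    """
    evidence = 0.0
    n = 0
    for n, s in enumerate(samples, start=1):
        evidence += gaussian_log_lr(s, mu1, mu2, sigma)
        if evidence >= threshold:
            return 1, n
        if evidence <= -threshold:
            return 2, n
    return None, n

# Hypothetical noisy data generated under hypothesis 1 (mean 1.0).
rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(1000)]
decision, n_used = sprt(data, mu1=1.0, mu2=0.0, sigma=1.0, threshold=3.0)
```

Raising `threshold` trades longer decision times for fewer errors, which is the speed-accuracy balance that the test is designed to manage.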

Bogacz and Gurney (2007) showed that a key subset of the cortical and basal ganglia anatomy (see Figure 1) has the appropriate connectivity to implement an optimal statistical technique for hypothesis testing over multiple perceptual alternatives. They considered a simplified version of the multi-hypothesis sequential probability ratio test (MSPRT) (Dragalin, Tartakovsky, & Veeravalli, 1999) in which the samples of sensory data were gaussian distributed and the alternatives were such that one was supported if a sequence of sensory data had the highest sample mean and all others had equal lower means. This statistical test has been proved asymptotically optimal for fixed decision thresholds, in that as the error rates tend to zero (with increasing sample numbers), it minimizes the overall cost of making mistakes and delaying the decisions (Dragalin et al., 1999). In consequence, Bogacz and Gurney (2007) claimed that the basal ganglia and cortex implement optimal decision making.

The aim of this letter is to demonstrate that this proposal applies to general distributions of sensory data, thereby raising it to a practically realizable theory of cortico-basal ganglia function over general perceptual hypotheses. Our arguments rest on two new mathematical results. The first is that the simplified version of MSPRT derived by Bogacz and Gurney has an identical algebraic form to generic Bayesian sequential analysis over evidence representing arbitrary log-likelihood functions of the sensory data. The second is that this decision-making algorithm is agnostic to whether sensory evidence represents log likelihoods or log-likelihood ratios. In consequence, the original relation between the basal ganglia and an optimal decision-making algorithm extends to decision making over evidence encoded by general log likelihoods, log-likelihood ratios, or log odds, which includes both generic Bayesian sequential analysis and the general form of MSPRT. All of these types of evidence underlie leading proposals for the cortical encoding of sensory data: Gold and Shadlen (2001) and Yang and Shadlen (2007) for log-likelihood ratios; Jazayeri and Movshon (2006) and Graf, Kohn, Jazayeri, and Movshon (2011) for log likelihoods; and Deneve (2008) and Zhang and Maloney (2012) for log odds. For these reasons, we propose that the function of the basal ganglia is to optimize decision making over general perceptual hypotheses represented in cortex.

## 2. Optimal Decision Making by the Basal Ganglia for Gaussian-Distributed Alternatives

### 2.1. A Simplified MSPRT for Gaussian Distributions.

Bogacz and Gurney (2007) considered $K$ channels of sensory data $\mathbf{s} = (s_1, \ldots, s_K)$ sampled simultaneously over time, with the decision to find which channel was activated by a sensory stimulus. The evidence for each alternative was given by a log-likelihood ratio between two equal-width gaussian distributions, $N(\mu_1, \sigma)$ and $N(\mu_2, \sigma)$, for whether a channel was activated or not,

$$x_k(t) = \log \frac{N(s_k(t); \mu_1, \sigma)}{N(s_k(t); \mu_2, \sigma)} = g\, s_k(t) + c, \qquad g = \frac{\mu_1 - \mu_2}{\sigma^2}, \quad c = \frac{\mu_2^2 - \mu_1^2}{2\sigma^2}, \tag{2.1}$$

which reduced to a gain $g$ and offset $c$ of the channel signal, as used originally by Gold and Shadlen (2001) for two-choice optimal decision making with SPRT. For application to the basal ganglia, these parameters were assumed identical over all alternatives, so that all channels represented a choice between the same two gaussian distributions. Then the overall decision was to infer which of the $K$ channels of data $s_k(1), \ldots, s_k(T)$ had sample mean $\mu_1$, assuming that all other $K - 1$ channels had mean $\mu_2$ (and the same standard deviation $\sigma$). Considering an example of populations of cortical neurons tuned to distinct perceptual features, such as directions of aggregate motion in a visual drifting dots task (Britten, Shadlen, Newsome, & Movshon, 1993; Kim & Shadlen, 1999), perception then becomes the inference of which population fires with the highest mean (Gold & Shadlen, 2001). An appropriate unit of evidence for each alternative percept is the log-likelihood ratio (see equation 2.1) that a cortical population is responding versus not responding to the particular perceptual feature to which it is tuned (Bogacz & Gurney, 2007).
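The reduction of the gaussian log-likelihood ratio to a gain and an offset can be checked numerically. The sketch below assumes the standard closed form for two equal-width gaussians, with gain (μ1 − μ2)/σ² and offset (μ2² − μ1²)/(2σ²); the function names and parameter values are hypothetical.

```python
import math

def gaussian_log_pdf(s, mu, sigma):
    """Log density of N(mu, sigma) evaluated at s."""
    return -0.5 * ((s - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def gain_and_offset(mu1, mu2, sigma):
    """Gain g and offset c such that the log-likelihood ratio equals g*s + c."""
    g = (mu1 - mu2) / sigma ** 2
    c = (mu2 ** 2 - mu1 ** 2) / (2.0 * sigma ** 2)
    return g, c

# Hypothetical means for an activated (mu1) and a non-activated (mu2) channel.
mu1, mu2, sigma = 2.0, 0.5, 1.5
g, c = gain_and_offset(mu1, mu2, sigma)

# The direct log ratio of the two densities matches the linear form.
for s in (-1.0, 0.0, 0.8, 3.0):
    direct = gaussian_log_pdf(s, mu1, sigma) - gaussian_log_pdf(s, mu2, sigma)
    assert abs(direct - (g * s + c)) < 1e-9
```

Because the offset is the same for every channel when all channels share the same pair of gaussians, only the gain matters for the competition between channels.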

In the MSPRT, hypothesis $H_k$ is supported to a given reliability determined by whether its associated posterior is the first to rise above a threshold,

$$O_k(T) = -\log P(H_k \mid \mathbf{s}(1), \ldots, \mathbf{s}(T)) < \Theta_k, \tag{2.3}$$

where we follow a convention in which the minus log posterior (which is positive) falls below a minus log probability threshold $\Theta_k$ (which is also positive) that is specified for each channel. (Note that Bogacz and Gurney considered a single, common threshold, but their model and optimality arguments extend straightforwardly to multiple, distinct thresholds.) The sign convention, equation 2.3, is consistent with the physiology of the basal ganglia outputs, which influence cortex by a process of focused disinhibition, corresponding to the channel with $O_k < \Theta_k$ at the level of the output nuclei.

Using Bayes’ rule and assuming independent samples distributed according to the assumptions in the preceding paragraph, Bogacz and Gurney showed that these log posteriors are given by a simple formula based on accumulating the log-likelihood ratio, equation 2.1:

$$O_k(T) = -y_k(T) + \log \sum_{j=1}^{K} \exp y_j(T), \tag{2.4}$$

$$y_k(T) = \sum_{t=1}^{T} x_k(t). \tag{2.5}$$

This simple equation has two key components: an evidence accumulation term $y_k$ for each channel that implements a noisy race to threshold (Stone, 1960; Vickers, 1970) and a log-sum-exponential competition term that decreases the activity of all channels when there is no clear winner (Bogacz & Gurney, 2007). Both contributions are necessary for multiple alternatives, as then the decision making is known to be asymptotically optimal for fixed thresholds (Dragalin et al., 1999). Then the cost of making errors plus the cost per sample of delaying the decision is minimized as the sample number approaches infinity, to appropriately balance reaction speed against accuracy (Bogacz & Gurney, 2007).
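The two components described above, per-channel accumulation and the log-sum-exponential competition term, can be sketched as follows. This is an illustrative implementation under the simplified gaussian assumptions, with flat priors and a single common threshold; the function names and the noiseless input sequence are hypothetical.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def msprt_outputs(y):
    """Minus log posteriors O_k = -y_k + log sum_j exp(y_j)."""
    lse = log_sum_exp(y)
    return [-y_k + lse for y_k in y]

def simplified_msprt(samples, gain, offset, theta):
    """Race K accumulators to a common minus-log-probability threshold.

    samples is a sequence of K-vectors s(t). Returns (winner, t), or
    (None, t) if no output fell below theta.
    """
    y = None
    t = 0
    for t, s in enumerate(samples, start=1):
        if y is None:
            y = [0.0] * len(s)
        # Accumulate the log-likelihood ratio g*s_k + c in each channel.
        y = [y_k + gain * s_k + offset for y_k, s_k in zip(y, s)]
        outputs = msprt_outputs(y)
        best = min(range(len(y)), key=lambda k: outputs[k])
        if outputs[best] < theta:
            return best, t
    return None, t

# mu1 = 1, mu2 = 0, sigma = 1 gives gain 1 and offset -0.5.
theta = -math.log(0.95)               # decide at posterior probability 0.95
samples = [(1.0, 0.0, 0.0)] * 10      # noiseless input, channel 0 active
winner, t = simplified_msprt(samples, 1.0, -0.5, theta)
# winner == 0 and t == 4 for this noiseless input
```

The outputs are minus log probabilities that sum to one after exponentiation, so a lower output always means stronger support for that channel.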

### 2.2. Map Between the Basal Ganglia and MSPRT.

The functional architecture of the basal ganglia has long been understood to influence cortex by a process of focused disinhibition (Albin, Young, & Penney, 1989; Chevalier & Deniau, 1990; Mink, 1996; Redgrave, Prescott, & Gurney, 1999). In this process, the basal ganglia output nuclei fire tonically to inhibit cortical activity, but strong, focused activation of the striatum can locally reduce this tonic firing to release target thalamic areas and their corresponding cortical areas from inhibitory control. Within the basal ganglia, the direct pathway projects from striatum to the output nuclei via one population of inhibitory medium spiny neurons, while the indirect striatal pathway involves a physiologically distinct population of inhibitory medium spiny neurons and passes through an inhibitory relay in the external globus pallidus (GPe). More recently, this architecture has been revealed as an oversimplification, with significant reciprocal connectivity between the various internal nuclei, including a pathway from the subthalamic nucleus (STN) onto GPe (see Redgrave et al., 2010). Furthermore, the STN is now considered a major input station of the basal ganglia that receives projections from across cortex and routes through to the basal ganglia output nuclei via the so-called hyperdirect pathway (Nambu, Tokuno, & Takada, 2002), which we emphasize in this letter by considering the diffuse pathways from the STN to the internal (GPi) and external globus pallidus.

The main point Bogacz and Gurney (2007) make is that their instantiation of the MSPRT in equations 2.3 to 2.5 matches various aspects of cortico-basal ganglia anatomy and function (see Figure 1). First, the cortex could compute the accumulated evidence representing a log-likelihood ratio, as proposed by Gold and Shadlen (2001). Second, the log-sum-exponential competition term could be computed via the STN-GPe complex in accordance with the STN's taking inputs from across cortex and projecting diffusely to the basal ganglia output nuclei (see Bogacz & Gurney, 2007, for further derivation and discussion of this correspondence). Third, the striatum could then route the accumulated evidence to the output nuclei, GPi and the substantia nigra pars reticulata (SNr), for combination with the competition term from the STN-GPe complex, giving a centralized selection mechanism similar to that proposed previously (Redgrave et al., 1999; Prescott, Redgrave, & Gurney, 1999; Gurney, Prescott, & Redgrave, 2001). Finally, when these basal ganglia outputs fall below a fixed threshold, they represent disinhibition of thalamo-cortical motor area targets responsible for initiating action, as emphasized below equation 2.3. Treating the thalamus and cortex as a single unit does oversimplify the anatomy, for example, with respect to subcortical inputs (McHaffie, Stanford, Stein, Coizet, & Redgrave, 2005), but allows a range of evidence accumulation mechanisms to be treated similarly, such as via thalamo-cortical loops (Humphries & Gurney, 2002) or recurrent cortical networks (Lo & Wang, 2006; Wang, 2008). In addition, those pathways not included in Figure 1, such as dopaminergic innervation from the substantia nigra pars compacta and ventral tegmental area, have been proposed to give added functionality, such as reinforcement learning of the appropriate action to be elicited by a stimulus (Bogacz & Larsen, 2011). We describe this in section 4.

## 3. Optimal Decision Making for Arbitrary Alternatives

### 3.1. The Basal Ganglia and Generic Bayesian Sequential Analysis.

A general framework for decision making over multiple alternatives is to use generic sequential analysis in which Bayes’ rule is applied sequentially to input data distributed according to arbitrary probability distributions (Siegmund, 1985). Mathematically, this approach differs from Bogacz and Gurney's in several important aspects. First, the decision making uses evidence encoded as log likelihoods rather than the log-likelihood ratios in MSPRT. Second, the sensory inputs are no longer restricted to gaussian variables but can have arbitrary statistics. Third, nonflat priors can be included if desired. Fourth, there are no constraints between the alternatives, so the losing alternatives are not assumed identically distributed. This last generalization allows more general perceptual hypotheses to be represented in the model, such as partial activation of multiple channels of evidence leading to hypothesis selection, as may happen in multimodal perception.

Sequential analysis formalizes the decision making between alternatives as a competition between $K$ hypotheses based on sequentially sampling multiple sensory inputs $\mathbf{s} = (s_1, \ldots, s_M)$. We emphasize that the number of sensory inputs $M$ does not necessarily equal the number $K$ of evidence channels associated with the hypotheses. When there are more evidence channels than inputs ($K > M$), the formalism allows a choice between an arbitrary number of perceptual hypotheses. For example, tactile sensing could be considered as encoding a time series of contact pressures, but there are multiple types of texture that could be perceived (Lepora et al., 2012). Conversely, when there are more inputs than evidence channels ($M > K$), the sensory evidence is aggregated over the inputs to make a decision. For example, in two-choice (e.g., left versus right) motion discrimination experiments, the representation of visual motion is over ensembles of MT neurons, with each neuron having a distinct direction tuning (Mazurek, Roitman, Ditterich, & Shadlen, 2003).

Each hypothesis $H_k$ is assessed through its posterior probability $P(H_k \mid \mathbf{s}(1), \ldots, \mathbf{s}(T))$ given $T$ samples. Following the arguments from section 2.1, we consider that the minus log posterior must fall below a minus log probability threshold to trigger a decision:

$$O_k(T) = -\log P(H_k \mid \mathbf{s}(1), \ldots, \mathbf{s}(T)) < \Theta_k.$$

A distinct threshold is associated with each evidence channel, so that different costs for each hypothesis can be considered. For example, the hypotheses could correspond to actions with differing potential rewards for the agent. Furthermore, these decision thresholds could also vary over time to give optimal decisions that minimize the cost of making errors plus the cost of delays. For example, in deadline-dependent tasks, the optimal thresholds decrease over time to ensure the decision must be made by the deadline (Frazier & Yu, 2008). For general Bayesian sequential analysis, it is unlikely that optimal decisions can be obtained with fixed thresholds, and so we employ the threshold-crossing rule with the proviso that optimality may depend on a history-dependent threshold. Some biological considerations of distinct, time-dependent thresholds are considered in section 4.

By Bayes’ rule, the posterior for a single sample $\mathbf{s}$ is $P(H_k \mid \mathbf{s}) = p(\mathbf{s} \mid H_k)\, p(H_k) / \sum_j p(\mathbf{s} \mid H_j)\, p(H_j)$, whose minus logarithm can be rewritten using the identity $p_1 p_2 = \exp(\log p_1 + \log p_2)$:

$$-\log P(H_k \mid \mathbf{s}) = -\left[\log p(\mathbf{s} \mid H_k) + \log p(H_k)\right] + \log \sum_{j=1}^{K} \exp\left[\log p(\mathbf{s} \mid H_j) + \log p(H_j)\right]. \tag{3.3}$$

Now we replace the samples with an independent and identically distributed (i.i.d.) sequence over time, giving an accumulation of the log likelihood over these times,

$$\log p(\mathbf{s}(1), \ldots, \mathbf{s}(T) \mid H_k) = \sum_{t=1}^{T} \log p(\mathbf{s}(t) \mid H_k). \tag{3.4}$$

Replacing the single samples $\mathbf{s}$ in expression 3.3 with the multiple samples $\mathbf{s}(1), \ldots, \mathbf{s}(T)$ and substituting the relation 3.4 for accumulating the log likelihood over these samples results in the following algorithm for general Bayesian sequential analysis:

$$O_k(T) = -y_k(T) + \log \sum_{j=1}^{K} \exp y_j(T), \tag{3.5}$$

$$y_k(T) = \log p(H_k) + \sum_{t=1}^{T} \log p(\mathbf{s}(t) \mid H_k). \tag{3.6}$$

The key point is that this general algorithm for Bayesian sequential analysis, equation 3.5, is identical to the original Bogacz-Gurney result, equation 2.4, even though the inputs $y_k$ now represent evidence encoded as general log likelihoods, equation 3.6, of the sensory data (rather than log-likelihood ratios of gaussians). Therefore, this sequential analysis model of decision making maps onto the basal ganglia architecture in an identical way to the original by Bogacz and Gurney (see Figure 1).

As a consequence of these arguments, we obtain a version of the original Bogacz-Gurney result, now applied to arbitrary probability distributions. Furthermore, the temporal accumulation for each channel $y_k(T)$ now starts from a log prior, extending the results to nonflat priors. The principal difference from the original model of Bogacz and Gurney is that the accumulated evidence sent to the striatum and STN now represents the log likelihoods of sensory data and a log prior, which we assume are together estimated by sensory cortex. Indeed, that cortex can function as a log-likelihood estimator has been proposed already for optimal decoding of neural populations (Jazayeri & Movshon, 2006).
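To make the general algorithm concrete, the sketch below accumulates arbitrary log likelihoods starting from log priors and applies channel-specific thresholds. The Poisson spike-count likelihoods, the rates, the priors, and all function names are assumptions for illustration; any set of log-likelihood functions could be substituted.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def bayes_sequential(samples, log_likelihoods, priors, thetas):
    """General Bayesian sequential analysis by threshold crossing.

    log_likelihoods[k](s) returns log p(s | H_k); priors[k] is p(H_k);
    thetas[k] is the minus-log-probability threshold for channel k.
    Returns (hypothesis, t), or (None, t) if no threshold was crossed.
    """
    y = [math.log(p) for p in priors]            # start from the log priors
    t = 0
    for t, s in enumerate(samples, start=1):
        y = [y_k + ll(s) for y_k, ll in zip(y, log_likelihoods)]
        lse = log_sum_exp(y)
        outputs = [-y_k + lse for y_k in y]      # minus log posteriors
        crossed = [k for k in range(len(y)) if outputs[k] < thetas[k]]
        if crossed:
            # If several cross together, take the one furthest below threshold.
            return min(crossed, key=lambda k: outputs[k] - thetas[k]), t
    return None, t

def poisson_log_pmf(rate):
    """Log likelihood of a spike count n under a Poisson rate (illustrative)."""
    return lambda n: n * math.log(rate) - rate - math.lgamma(n + 1)

# Three hypothetical hypotheses about the firing rate, with a nonflat prior.
hypotheses = [poisson_log_pmf(2.0), poisson_log_pmf(5.0), poisson_log_pmf(10.0)]
priors = [0.5, 0.3, 0.2]
thetas = [-math.log(0.9)] * 3
counts = [4, 6, 5, 5, 7, 4]
choice, t = bayes_sequential(counts, hypotheses, priors, thetas)
```

Note that the hypotheses here are not constrained relative to one another, and the losing alternatives are not identically distributed, which is the generalization described in this section.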

### 3.2. The Basal Ganglia and Generic MSPRT.

A key difference between many models of animal perception and general Bayesian sequential analysis is that the former uses evidence representing log-likelihood ratios while the latter uses evidence representing log likelihoods. Given this significant departure in how the evidence is represented, it is surprising that the decision-making algorithms for calculating the log posterior decision variables from the accumulated evidence are identical (see equations 2.4 and 3.5). The aim of this section is to show that there is a simple correspondence between evidence encoded as log likelihoods and log-likelihood ratios. As a consequence, we derive a general version of Bogacz and Gurney's original result for evidence representing arbitrary log-likelihood ratios in a generic MSPRT.

Consider $K$ channels $\mathbf{s} = (s_1, \ldots, s_K)$ of sensory data, with the decision to infer the winning channel that has samples drawn from a probability distribution $p_{1,k}(s_k)$, while the losing channels are drawn from distributions $p_{2,j}(s_j)$ with $j \neq k$. Originally, these were the gaussian distributions reviewed in section 2.1. Assuming conditional independence, the likelihood that channel $k$ is the winner (denoted as hypothesis $H_k$) with samples $s_k$ drawn from distribution $p_{1,k}$ is

$$p(\mathbf{s} \mid H_k) = p_{1,k}(s_k) \prod_{j \neq k} p_{2,j}(s_j). \tag{3.7}$$

With some straightforward algebra, the log likelihood for each hypothesis can be rewritten as a log-likelihood ratio and a hypothesis-independent term,

$$\log p(\mathbf{s} \mid H_k) = \log \frac{p_{1,k}(s_k)}{p_{2,k}(s_k)} + C(\mathbf{s}), \tag{3.8}$$

where $C(\mathbf{s}) = \sum_{j=1}^{K} \log p_{2,j}(s_j)$. This relation between log likelihoods and their ratios is sufficient to ensure the identity of the expressions 2.4 and 3.5 in the original Bogacz-Gurney version of MSPRT and general Bayesian sequential analysis. This is because the term $C(\mathbf{s})$ is ignorable due to an invariance of the expressions,

$$-\left[y_k + C\right] + \log \sum_{j=1}^{K} \exp\left[y_j + C\right] = -y_k + \log \sum_{j=1}^{K} \exp y_j, \tag{3.10}$$

under shifting the baselines of all channels. Since the term $C(\mathbf{s})$ can be ignored in equation 3.10, evidence can be represented interchangeably as a log likelihood or log-likelihood ratio for hypotheses pertaining to probabilities of the form 3.7.

The result is a decision-making algorithm,

$$O_k(T) = -y_k(T) + \log \sum_{j=1}^{K} \exp y_j(T), \tag{3.11}$$

$$y_k(T) = \sum_{t=0}^{T} x_k(t), \qquad x_k(0) = \log p(H_k), \qquad x_k(t) = \log \frac{p_{1,k}(s_k(t))}{p_{2,k}(s_k(t))} \ \text{for } t > 0, \tag{3.12}$$

with inputs $y_k$ representing log-likelihood ratios and outputs $O_k$ that are minus log posteriors. Decision making via threshold crossing then gives a general MSPRT over arbitrary log-likelihood ratios. These decisions are also asymptotically optimal for the same reasons as the original (gaussian) Bogacz-Gurney model (Dragalin et al., 1999).

For example, suppose that the winning distributions $p_{1,k}$ and losing distributions $p_{2,k}$ for channel $k$ are gaussian but have channel-dependent means $\mu_{1,k}$ and $\mu_{2,k}$, respectively (for simplicity, we maintain the assumption of equal standard deviations $\sigma$), and we assume that only one channel is in a winning state. By the log-likelihood relation for two gaussians, equation 2.1, considered by Gold and Shadlen (2001), the appropriate input evidence for the decision-making algorithm, equations 3.11 and 3.12, is then

$$x_j(t) = g_j\, s_j(t) + c_j, \tag{3.13}$$

where $g_j = (\mu_{1,j} - \mu_{2,j})/\sigma^2$, $c_j = (\mu_{2,j}^2 - \mu_{1,j}^2)/(2\sigma^2)$, and $t > 0$. Note that there is also a channel-dependent prior $x_j(0) = \log p(H_j)$, which when flat (equal) can be dropped from $x_j(t)$ by the identity 3.10. The decision-making algorithm has thus reduced to that considered in equations 2.4 and 2.5 but with a channel-dependent gain $g_j$ and offset $c_j$. Further restricting the winning $\mu_1$ and losing $\mu_2$ means to be channel-independent causes the gain and offset to become equal for all channels (so that the common offset can be dropped by the identity 3.10), and we reobtain the original model of Bogacz and Gurney.

As a second example, setting $p_{2,j}(s_j) = 1 - p_{1,j}(s_j)$ in equation 3.7 leads to the input evidence for the decision-making algorithm, equations 3.11 and 3.12, becoming

$$x_j(t) = \log \frac{p_{1,j}(s_j(t))}{1 - p_{1,j}(s_j(t))}, \tag{3.14}$$

which is the log odds for the samples $s_j$ being drawn from distribution $p_{1,j}$. This represents an equivalent formulation of the general Bayesian sequential analysis considered in section 3.1, but with the evidence accumulated over the log odds rather than the log likelihood.
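A short sketch of evidence accumulation over log odds follows. It assumes that each channel supplies, at every time step, the probability that it is in the winning state; the function names and the input values are hypothetical.

```python
import math

def log_odds(p):
    """log[p / (1 - p)] for a probability 0 < p < 1."""
    return math.log(p) - math.log1p(-p)

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def accumulate_log_odds(prob_sequence, log_priors=None):
    """Sum log odds over time per channel.

    prob_sequence[t][j] is the probability that channel j is winning at
    time step t; log_priors optionally sets the starting evidence.
    """
    n_channels = len(prob_sequence[0])
    y = list(log_priors) if log_priors is not None else [0.0] * n_channels
    for probs in prob_sequence:
        y = [y_j + log_odds(p) for y_j, p in zip(y, probs)]
    return y

# Channel 0 reports probability 0.8 on each of three steps; the others 0.5.
y = accumulate_log_odds([(0.8, 0.5, 0.5)] * 3)
lse = log_sum_exp(y)
posterior = [math.exp(y_j - lse) for y_j in y]
# posterior[0] equals 64 / 66: odds of 4 per step, cubed, against two
# channels that remain at even odds.
```

A channel at even odds contributes zero evidence, so only channels whose probability departs from one-half move the competition.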

Therefore, Bogacz and Gurney's original mapping between the cortico-basal ganglia anatomy and a network implementation of decision making by a simplified MSPRT is a special case of a more general functional mapping with the generic MSPRT in equations 3.11 and 3.12. In particular, the original assumption that the evidence in all channels represents a common log-likelihood ratio of the same two gaussians can be relaxed to encompass distinct ratios over arbitrary probability distributions, including the special case of the log odds. For the specific example of where these two distributions are both gaussian, the input evidence for the decision-making algorithm reduces to a gain function of the sensory data, equation 3.13, as in the original Bogacz-Gurney model. Another aspect of the general case is that the evidence may also begin accumulating from initial log priors. This generality over both the probability distributions and the priors implies that the original proposal that the basal ganglia and cortex implement optimal decision making over multiple alternatives (Bogacz & Gurney, 2007) applies not just to artificial sensory data represented by restricted gaussians, but in principle to arbitrary distributions of statistical stimuli found in practically realizable situations with arbitrary priors representing a preceding history of sensory experience.

## 4. Discussion

This letter argues that the basal ganglia architecture appears configured for optimal decision making over multiple channels of sensory evidence under a wide range of assumptions about what the evidence represents. Our arguments rely on extending a previous model of action selection by Bogacz and Gurney (2007), who mapped the basal ganglia anatomy onto a network implementation of an optimal statistical procedure for testing multiple hypotheses restricted to constrained gaussian variables compared via a likelihood ratio. We show that their anatomical mapping also holds for this statistical test with generic probability distributions and unconstrained alternatives. In addition, we show that the same mapping holds for general Bayesian sequential analysis with evidence encoding log likelihoods or the log odds rather than log-likelihood ratios. All of these sequential tests are optimal in the sense that applying a threshold-crossing decision rule can minimize the total cost of making errors and delaying decisions (Wald & Wolfowitz, 1948; Dragalin et al., 1999).

In considering generic representations of evidence in cortex, the interpretations of basal ganglia function considered here should hold under practically realizable situations, such as with realistic neuronal firing, tuning curves, and population coding. Moreover, this functionality applies not just to evidence from sensory cortex but also to any evidence aiding perceptual decision making, including information from the motor and limbic cortices about the state of the animal. Our brains make perceptual judgments over a broad variety of stimuli across a range of task demands, and therefore it seems reasonable that a generic model is necessary to capture the computational processes underlying decision making.

In this section, we describe the relation of these findings to other models of basal ganglia function and discuss some aspects of their biological interpretation and implications.

### 4.1. Relation to Other Models of the Basal Ganglia and Cortex.

Models of the basal ganglia performing action selection conventionally assume there are multiple parallel channels representing evidence for distinct alternatives (Gurney et al., 2001; Frank, 2005, 2006; Humphries, Stewart, & Gurney, 2006; Bogacz & Gurney, 2007; Bogacz & Larsen, 2011; Ratcliff & Frank, 2012). The evidence is processed by the cortico-basal ganglia network to reach a decision about the appropriate action, which is enacted by focused disinhibition of the basal ganglia output nuclei onto the appropriate motor regions of thalamus and cortex (Albin et al., 1989; Chevalier & Deniau, 1990; Mink, 1996; Redgrave et al., 1999). Some of these models place emphasis on two particular pathways from cortex through the basal ganglia to its output nuclei (Gurney et al., 2001; Bogacz & Gurney, 2007): the direct striatal pathway, where activity in each channel gives a local off-center inhibition of the tonically firing output nuclei, and the STN hyperdirect pathway (Mink, 1996; Nambu et al., 2002), where conflict between channels gives a global on-surround excitation of the output nuclei. The convergence of these two pathways at the basal ganglia output nuclei gives a competitive off-center/on-surround mechanism for selection. This mechanism fits parsimoniously with the model of probabilistic selection considered in this letter because then the STN-GPe complex normalizes the probabilities represented at the basal ganglia output nuclei to sum to unity (Bogacz & Gurney, 2007; Bogacz & Larsen, 2011). Thus, hard selection with a single winner-take-all is obtained for probability thresholds greater than one-half, and soft selection with multiple possible winners is obtained for thresholds less than one-half.
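The distinction between hard and soft selection follows from the outputs being normalized probabilities: at most one channel can exceed a probability of one-half, whereas several can exceed a lower threshold. The sketch below illustrates this with hypothetical accumulated evidence values and function names.

```python
import math

def posteriors(y):
    """Normalized posterior probabilities from accumulated evidence y."""
    m = max(y)
    weights = [math.exp(v - m) for v in y]
    total = sum(weights)
    return [w / total for w in weights]

def selected_channels(y, prob_threshold):
    """Channels whose posterior exceeds prob_threshold.

    Equivalent to the minus log posterior falling below -log(prob_threshold).
    """
    return [k for k, p in enumerate(posteriors(y)) if p > prob_threshold]

close_race = [2.0, 1.9, 0.0]       # two channels with similar evidence
clear_winner = [4.0, 1.9, 0.0]     # one channel well ahead

soft = selected_channels(close_race, 0.4)       # [0, 1]: two winners
hard = selected_channels(clear_winner, 0.6)     # [0]: a single winner
undecided = selected_channels(close_race, 0.6)  # []: conflict blocks selection
```

In the close race, a threshold above one-half selects nothing until further evidence separates the channels, which is the conflict-dependent delay attributed to the STN pathway.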

The basal ganglia model in this letter follows an approach of modeling neural decision making that emphasizes optimal sequential tests for deciding between competing perceptual hypotheses with sensory evidence (Gold & Shadlen, 2001, 2007; McMillen & Holmes, 2006; Bogacz et al., 2006; Yang & Shadlen, 2007; Churchland, Kiani, & Shadlen, 2008; Beck et al., 2008). An underlying assumption in these models is that the samples of data are i.i.d. over time. Although this assumption limits the methods for inference, it also simplifies the decision process to a straightforward evidence accumulation over time. Similar models of evidence accumulation to threshold have been used widely in the neuroscience and psychology of perceptual decision making, for example the race model (Smith & Vickers, 1988) and diffusion model (Ratcliff, 1978; Smith & Ratcliff, 2004) of two-choice discrimination tasks. However, if neural decision making does use time dependencies in the sensory data, then these i.i.d. model assumptions should be reassessed. For example, one generalization would be to prefilter the sensory inputs to identify relevant statistical dependencies before applying the optimal sequential test to the filtered data.

Because this letter extends the optimal decision-making model of the basal ganglia by Bogacz and Gurney (2007) to general representations of sensory evidence, the formalism inherits many of the advantages of their model while having the potential to address further questions. For instance, these decision-making models relate to a previous neural network account of basal ganglia function that exhibited appropriate action selection and switching properties (Gurney et al., 2001), and was later extended to include physiological details of thalamo-cortical loops (Humphries & Gurney, 2002), populations of spiking neurons (Humphries et al., 2006), and the role of tonic dopamine (Humphries, Khamassi, & Gurney, 2012). In addition, the diffuse projection of the STN onto the GPi basal ganglia output nucleus relates to proposals that it dynamically modulates the propensity to elicit actions by the degree of conflict (Mink, 1996; Nambu et al., 2002; Frank, 2006; Forstmann et al., 2010). In our model of optimal decision making, the evidence in every channel is appropriately decreased to delay reaching the decision thresholds so that sufficient evidence can be collected to disambiguate conflicting perceptual hypotheses. Only for the particular log-sum-exponential functional form derived for the STN-GPe network can the decisions be optimal (Bogacz & Gurney, 2007), in that the evidence decrement due to ambiguity slows the decision sufficiently to minimize the overall cost of errors and delays.

### 4.2. Implications for Neural Encoding.

A key aspect of the optimal decision-making algorithm considered here is that it is agnostic to whether the input evidence is represented as a log likelihood, log-likelihood ratio, or log odds, provided the compared inputs are in the same representation. All of these types of evidence underlie leading proposals for the cortical encoding of sensory data: Gold and Shadlen (2001) and Yang and Shadlen (2007) for log-likelihood ratios; Jazayeri and Movshon (2006) and Graf et al. (2011) for log likelihoods; and Deneve (2008) and Zhang and Maloney (2012) for log odds. Thus, if it transpires that cortex uses all three methods to encode evidence, we have shown that the basal ganglia can lead to optimal decisions using all three representations. Fundamentally, if the basal ganglia and cortex make optimal decisions with a procedure based on sequential analysis, then it may be somewhat meaningless to draw a distinction between these probabilistic representations anyway. By the identity 3.8, a log-likelihood ratio (for a channel winning versus it losing) is equivalent to a log likelihood (for a channel winning and all others losing), plus a hypothesis-independent term that can be ignored due to an invariance of the decision-making algorithms (see below). Similarly, by equation 3.14, the log odds can be considered a special case of the log-likelihood ratio, which by the above identity is equivalent to a log likelihood. Hence, these probabilistic representations can be used interchangeably, and it is just a matter of convention which one is adopted as an interpretation.

If cortex represents evidence as log likelihoods, log-likelihood ratios, or log odds, then an apparent issue is that these quantities can be negative, whereas neural activity is a positive quantity. However, as observed originally by Bogacz and Gurney (2007), the specific form of the decision-making algorithm solves this problem: provided all input evidence is shifted up or down (renormalized) by the same amount, then by equation 3.10, the output decision variables remain unchanged. As a consequence, the input neural activity can be considered a shifted version of the evidence, such that this renormalized quantity always remains positive. This mechanism also gives added flexibility in how signals are represented in cortex, because the shift in neural activity can change over time. Thus, if the neural activity in some channels is becoming small, then all channels can be shifted to renormalize the activity within an appropriate dynamic range.
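The renormalization argument can be sketched as follows: at each time step, every channel is shifted by the same amount so that no input is negative, and the outputs are unaffected. The choice of shifting the smallest channel to zero, the evidence values, and the function names are assumptions for illustration only.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def minus_log_posteriors(y):
    """Outputs O_k = -y_k + log sum_j exp(y_j)."""
    lse = log_sum_exp(y)
    return [-y_k + lse for y_k in y]

def renormalize(y, floor=0.0):
    """Shift all channels equally so the smallest input equals floor."""
    shift = floor - min(y)
    return [v + shift for v in y]

# Hypothetical accumulated log likelihoods that drift negative over time.
evidence_over_time = [
    [-1.2, -0.4, -2.0],
    [-2.9, -0.5, -4.1],
    [-5.0, -0.7, -6.3],
]

for y in evidence_over_time:
    activity = renormalize(y)            # the shift differs at each time step
    assert min(activity) >= 0.0          # usable as a nonnegative firing rate
    original = minus_log_posteriors(y)
    shifted = minus_log_posteriors(activity)
    assert all(abs(a - b) < 1e-9 for a, b in zip(original, shifted))
```

Any other common shift would serve equally well, for example one that keeps the largest channel within a fixed dynamic range.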

### 4.3. Other Anatomical Mappings.

In this article, we concentrated on mapping sequential analysis onto the feedforward architecture for the basal ganglia considered by Bogacz and Gurney (2007). Sensory inputs are integrated in the cortex to enter the basal ganglia via the striatum and STN and leave via the basal ganglia output nuclei for action disinhibition (see Figure 1). This architecture is here intended as a basic sketch of the actual connectivity to give a simple foundation for constructing anatomically more realistic models. To this end, we now describe two alternative anatomical mappings that are consistent with the arguments presented in this article.

A recurrent basal ganglia architecture has been considered by Bogacz and Larsen (2011) based on the cortico-basal ganglia-thalamo-cortical loops found in anatomical tracing studies (Parent & Hazrati, 1995). Rather than integrating the evidence within cortex, as considered originally (Gold & Shadlen, 2001; Bogacz & Gurney, 2007), they employed the overall loop from the cortex through the basal ganglia back to cortex to implement an iterative update of the basal ganglia outputs up to a decision threshold (see Figure 3 in Bogacz & Larsen, 2011). They then showed that this recurrent circuit can implement optimal decision making (for gaussian inputs) and extended their model to include reinforcement learning. Note that their decision-making model uses the same gaussian-restricted MSPRT as the feedforward model by Bogacz and Gurney (2007), for which the general arguments presented here apply. Hence, general Bayesian sequential analysis also maps consistently onto their recurrent basal ganglia architecture, with similar considerations about the types of evidence representation and role of the STN-GPe complex.

A second alternative architecture proposed here is a feedforward network (see Figure 2) that differs from the original Bogacz-Gurney model in the following way. Rather than the evidence being sent via the striatum to combine with the STN-GPe competition term in the basal ganglia output nuclei, the evidence is routed to another area of cortex where this combination takes place. This frees the striatum up for another function, which we interpret as selectively biasing the decisions by shifting the thresholds in response to contextual information about the task demands. Our motivation for proposing this architecture is that the original model by Bogacz and Gurney (2007) lacks a structure for implementing decision thresholds, the primary task-dependent parameters for setting system behavior. A striatal pathway that sets decision thresholds would extend the scope of the basal ganglia model to take advantage of the full power of the sequential analysis formalism. A second benefit is that the accumulation mechanisms are now explicitly cortical, similar to evidence accumulation models of intracortical decision making (Gold & Shadlen, 2001, 2007); meanwhile, the biasing of the decision thresholds is striatal, as is consistent with cortico-striatal connections controlling speed and accuracy in perceptual decision making (Forstmann et al., 2008, 2010) and reward-related signals modulating decision processes by their effect on the striatum (Reynolds, Hyland, & Wickens, 2001; Schultz, 2002).

The connectivity of the basal ganglia and cortex in Figure 2 is consistent with a range of anatomical studies. For instance, the intrinsic basal ganglia connectivity is similar to that used throughout this letter and is founded on leading studies of the basal ganglia (see section 2.2). Meanwhile, excitatory forward projections from sensory to motor cortex are a standard feature of intracortical connectivity (Jones, Coulter, & Hendry, 1978), and inhibitory projections from the basal ganglia output nuclei to motor thalamus and then cortex are also standard (Albin et al., 1989; Chevalier & Deniau, 1990). In addition, the striatum is known to receive excitatory input from across cortex (Gerfen & Wilson, 1996), allowing it to process a range of sensory, motor, and contextual information.

Overall, Figure 2 models the basal ganglia as functioning together with cortex to ensure that decisions are optimal with respect to the costs for the performed task as represented by the decision thresholds. These thresholds have both cortical and striatal components, with the striatal part recalled from contextual sensory and motor information about the state of the animal and its environment (see section 4.5). Note that the role of the STN-GPe complex is key to ensuring consistent action selection with distributed decision making in cortex (Redgrave et al., 1999; Prescott et al., 1999). Continuing the reasoning from section 4.1, the pathway from cortex through STN-GPe to GPi and back to thalamus and cortex ensures that cortical representations of probability are self-consistently normalized to have unit sum. In consequence, this pathway can enforce hard winner-take-all selection even when the decision making is distributed across functionally separated regions of cortex.

The basal ganglia model in Figure 2 also contains the direct and indirect striatal pathways (Albin et al., 1989). In our model, the indirect pathway via the GPe is considered distinct from the STN-GPe network. Although there is debate about the anatomy of the globus pallidus, there is scant evidence for this separated connectivity, with a more integrative role for the nucleus considered likely. In this work, we follow previous interpretations of striatal modulation via the direct and indirect pathways as relating to a focused go and no-go mechanism for biasing the selection of individual actions (Frank, 2005). However, we do not rule out more elaborate mechanisms for setting decision thresholds, such as a striatal-GPe-STN global no-go pathway for striatal suppression of all actions.

### 4.4. Integration with Reinforcement Learning.

Decision making and reinforcement learning can be viewed as complementary processes underlying action selection (Bogacz & Larsen, 2011). Reinforcement learning theories describe how to learn which action to select for a given stimulus to maximize reward (Montague, Dayan, & Sejnowski, 1996; Schultz et al., 1997), whereas decision-making theories describe how to select the action corresponding to the stimulus best supported by the incoming sensory evidence (Redgrave et al., 1999; Gurney et al., 2001). We now discuss some implications of integrating reinforcement learning and decision-making theories of the basal ganglia.

The recurrent model of optimal decision making (for gaussian inputs) considered by Bogacz and Larsen (2011) has been integrated with reinforcement learning. To summarize, their model uses an actor-critic formalism (Sutton & Barto, 1998) mapped onto the matrix/patch physiology of the striatum (Doya, 2000). The actor implements an iterative update of the basal ganglia outputs via recurrent cortico-basal ganglia-thalamo-cortical loops, while the critic reinforces cortico-striatal weights to connect a sensory channel with a decision to act. The decision making is then optimal when all weights reach zero or one. For more details, we refer to Bogacz and Larsen (2011), noting that their recurrent architecture is compatible with the general formalism presented here on optimal decision making (see the previous section).

The feedforward architecture proposed in Figure 2 can also be integrated with reinforcement learning, but with a new interpretation of the cortico-striatal weights as connecting the task context with the decision thresholds. Reinforcement learning of these weights then biases the selection of one action over another, while the decision making remains optimal throughout learning. In sequential analysis, these decision thresholds are the primary parameters for setting system behavior, and thus it is appropriate that they be learned by reinforcement from the agent's environment. The direct striatal pathway is then interpreted as transmitting only negative changes to the individual decision thresholds for the perceptual hypotheses, so that biases toward appropriate actions are learned; meanwhile, the indirect striatal pathway is interpreted as transmitting only positive changes, so that biases against inappropriate actions are learned.
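The sign convention in this interpretation can be illustrated with a toy example. The sketch below is not a learning rule proposed in this letter: the prediction-error update, the learning rate, and the function names `split_threshold_update` and `update_thresholds` are hypothetical, chosen only to show a signed threshold change being separated into a negative (direct pathway) part and a positive (indirect pathway) part.

```python
def split_threshold_update(delta_theta):
    """Separate signed threshold changes into the two striatal pathways."""
    direct = [min(d, 0.0) for d in delta_theta]    # negative changes only
    indirect = [max(d, 0.0) for d in delta_theta]  # positive changes only
    return direct, indirect

def update_thresholds(thresholds, chosen, prediction_error, learning_rate=0.1):
    """Hypothetical rule: a better-than-expected outcome lowers the threshold
    of the chosen channel; a worse-than-expected outcome raises it."""
    delta = [0.0] * len(thresholds)
    delta[chosen] = -learning_rate * prediction_error
    direct, indirect = split_threshold_update(delta)
    new = [th + d + i for th, d, i in zip(thresholds, direct, indirect)]
    return new, direct, indirect

thresholds = [2.0, 2.0, 2.0]  # one threshold per channel, arbitrary units

# Rewarded choice of channel 0: only the direct pathway carries a change.
rewarded, direct_r, indirect_r = update_thresholds(thresholds, 0, 1.0)

# Unrewarded choice of channel 1: only the indirect pathway carries a change.
punished, direct_p, indirect_p = update_thresholds(thresholds, 1, -1.0)

print(rewarded, direct_r, indirect_r)
print(punished, direct_p, indirect_p)
```

In this toy setting, a lowered threshold makes the corresponding action easier to select and a raised threshold makes it harder, matching the biases toward and against actions described above.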

This model of biasing the decision thresholds has much in common with a go/no-go interpretation of the direct and indirect pathways and its relation to reward (Frank, 2005, 2006; Simen, Cohen, & Holmes, 2006; Ratcliff & Frank, 2012). The go population of striatal neurons facilitates the selection of a particular response by inhibiting the associated channels in the basal ganglia output nuclei, thereby releasing the inhibition on the thalamus; meanwhile, the no-go striatal population suppresses the selection of particular responses by a double inhibitory pathway via the GPe onto the basal ganglia output nuclei, thereby preventing the release of inhibition on the thalamus. Hence, we expect that reinforcement learning in our basal ganglia model of optimal decision making proceeds in a qualitatively similar way to existing studies of phasic dopamine in cortico-striatal plasticity (Frank, 2005, 2006; Frank et al., 2009), although it remains an open question as to how these mechanisms should be applied to achieve decision optimality under general task demands.

### 4.5. Decision Thresholds and Optimality.

The decision thresholds are the primary task-dependent parameters for setting the system behavior in terms of specifying the balance between decision speed and accuracy in sequential analysis models of decision making (Gold & Shadlen, 2007; Bogacz et al., 2006; Lepora et al., 2012). Setting low thresholds gives quick but inaccurate decisions, for example, in urgent situations such as evading predators; conversely, high thresholds give slow decisions with fewer mistakes, for example, in dangerous situations such as climbing.
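The speed-accuracy balance can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the model analyzed in this letter: three alternatives with a uniform prior, unit-variance gaussian channels in which only the true channel has nonzero mean `mu`, and a single threshold applied to the largest posterior probability. All names and parameter values are illustrative.

```python
import math
import random

def run_trial(rng, threshold, n_alt=3, mu=0.5, max_steps=10000):
    """Accumulate evidence until the largest posterior reaches the threshold.
    Alternative 0 is the true one. Returns (correct, decision_time)."""
    y = [0.0] * n_alt
    for t in range(1, max_steps + 1):
        for k in range(n_alt):
            mean = mu if k == 0 else 0.0
            # Log-likelihood increment, up to a constant shared by all channels.
            y[k] += mu * rng.gauss(mean, 1.0)
        m = max(y)
        norm = sum(math.exp(v - m) for v in y)
        # Posterior of the leading alternative is exp(m - m) / norm = 1 / norm.
        if 1.0 / norm >= threshold:
            return y.index(m) == 0, t
    return False, max_steps

def summarize(threshold, n_trials=2000, seed=0):
    rng = random.Random(seed)
    results = [run_trial(rng, threshold) for _ in range(n_trials)]
    accuracy = sum(correct for correct, _ in results) / n_trials
    mean_time = sum(t for _, t in results) / n_trials
    return accuracy, mean_time

low_acc, low_time = summarize(0.6)     # low threshold
high_acc, high_time = summarize(0.99)  # high threshold

print(f"low threshold:  accuracy={low_acc:.3f}, mean time={low_time:.1f}")
print(f"high threshold: accuracy={high_acc:.3f}, mean time={high_time:.1f}")
```

With these settings, the low threshold should terminate after few samples with more errors, while the high threshold should take longer and err less often.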

Optimal decision-making models minimize the overall cost of the outcome, such as the total cost of delaying decisions and making mistakes. However, there are some caveats to this optimality. For two alternatives, both SPRT and Bayesian sequential analysis (which are then equivalent) give optimal decisions for constant thresholds and no deadline (Wald & Wolfowitz, 1948), while with a deadline, the optimal decision threshold decreases over time to ensure a decision is made sufficiently early (Siegmund, 1985; Frazier & Yu, 2008). For multiple alternatives, MSPRT with fixed thresholds is optimal only asymptotically, as the error rate tends to zero and decision times tend to infinity (Dragalin et al., 1999). However, humans and animals make decisions from limited durations of sensory data and are error prone, and in this regime it is unknown how close MSPRT with fixed thresholds is to optimality (and what form the thresholds that give optimality would then take). Also, to the best of our knowledge, there is no analogous result for the thresholds leading to optimal Bayesian sequential analysis. If animals do make near-optimal decisions over general perceptual hypotheses, this raises the question of how optimal thresholds that depend on the history of the decision process could be implemented in the brain.

One possibility is that history-dependent thresholds occur via feedback mechanisms in the underlying biology. For example, reafferent cortico-basal ganglia-thalamo-cortical loops could naturally give feedback processes that lead to varying thresholds, while the sensory evidence itself could give a time-varying context for setting the threshold throughout the decision process. It is an open theoretical question how the thresholds should vary to achieve optimal decision making, and we conjecture that the diverse inputs to striatum convey the necessary sensory, motor, and other contextual information (shown by the three cortico-striatal connections in Figure 2). That the striatum uses many distinct sources of information is consistent with recordings from the caudate nucleus of multiple signals for influencing and assessing perceptual decisions (Ding & Gold, 2010) and the existence of diverse striatal inputs from across cortex (Jones, Coulter, Burton, & Porter, 1977). Understanding how these sources of information influence perceptual decision making and their relation to achieving decision optimality could shed light on the normal and pathological function of the basal ganglia.

## Acknowledgments

We thank the anonymous referees and are grateful to Tony Prescott, Javier Caballero, Mark Humphries, and other members of the Adaptive Behavior Research Group and Active Touch Laboratory at Sheffield for discussions. This work was supported by FP7 grants BIOTACT (ICT-215910), IM-CLEVER (ICT-231722), and EFAA (ICT-248986).

## References

## Note

^{1} In general, our notation follows that of Bogacz and Gurney (2007), with minor differences: their *x* is our *s*, we use *O* for their *OUT*, and we consider a distinct threshold Θ_{k} for each channel.