## Abstract

Experimental data indicate that perceptual decision making involves integration of sensory evidence in certain cortical areas. Theoretical studies have proposed that the computation in neural decision circuits approximates statistically optimal decision procedures (e.g., sequential probability ratio test) that maximize the reward rate in sequential choice tasks. However, these previous studies assumed that the sensory evidence was represented by continuous values from gaussian distributions with the same variance across alternatives. In this article, we make a more realistic assumption that sensory evidence is represented in spike trains described by the Poisson processes, which naturally satisfy the mean-variance relationship observed in sensory neurons. We show that for such a representation, the neural circuits involving cortical integrators and basal ganglia can approximate the optimal decision procedures for two and multiple alternative choice tasks.

## 1. Introduction

Making choices on the basis of sensory information is one of the critical abilities for animals and humans. Neurophysiological studies suggest that during perceptual decision making, the neurons in sensory areas provide noisy evidence represented in their firing rates (Britten, Shadlen, Newsome, & Movshon, 1993; Newsome & Paré 1988; Shadlen & Newsome, 1998). Experimental studies further indicate that during the decision process, this sensory evidence is accumulated over time by neural integrators in certain cortical areas (Kim & Shadlen, 1999; Schall, 2001; Shadlen & Newsome, 2001), and the decision is made as soon as the activity of the integrator neurons exceeds a certain threshold (Mazurek, Roitman, Ditterich, & Shadlen, 2003; Roitman & Shadlen, 2002; Schall, 2001, 2002).

On the basis of these and other data, theoretical studies have proposed that during the decision process, the brain approximates optimal statistical tests: the sequential probability ratio test (SPRT) (Barnard, 1946; Wald, 1947) during the choice between two alternatives (Bogacz et al., 2006; Gold & Shadlen, 2001, 2002, 2007; Laming, 1968) and the multihypothesis SPRT (MSPRT) (Baum & Veeravalli, 1994; Dragalin, Tartakovsky, & Veeravalli, 1999) during the choice between multiple alternatives (Bogacz & Gurney, 2007). These tests provide an optimal criterion for when to stop the integration of sensory information and execute an action. In this article, we refer to decision-making models as optimal if they minimize decision time for any level of accuracy (Dragalin et al., 1999; Wald & Wolfwitz, 1948; see section 2 for details).

All the theoretical studies mentioned above rely on two assumptions: (1) in each moment of time, sensory evidence in support of each alternative comes from a gaussian distribution, and (2) the distributions of the evidence supporting all alternatives have the same variance. However, both of these assumptions are oversimplified since sensory neurons transmit information as discrete spikes, and the variance of neuronal firing rate is proportional to its mean in sensory areas (Shadlen & Newsome, 1998).

This article analyzes whether the conclusions of the theoretical studies, suggesting that the neural decision circuits approximate the SPRT or the MSPRT, still hold when more realistic assumptions are made regarding the representation of evidence in sensory areas. It shows that indeed, simple integration strategies can implement the optimal statistical tests in two and multiple alternative tasks if it is assumed that evidence is represented by spikes of sensory neurons and the variance of neuronal firing rate is proportional to its mean.

The article is organized as follows. Section 2 reviews the current theories proposing that the brain approximates the optimal statistical tests for decision making. Section 3 formalizes the decision-making problem on the basis of spiking inputs. Sections 4 and 5 describe the models of neural circuits that implement the optimal decision procedures for two and multiple alternatives, respectively. Section 6 demonstrates that the simulations of decision networks that approximate spiking input with a gaussian variable with variance proportional to the mean may generate surprising spurious results under certain conditions. Section 7 discusses the directions of future work. Mathematical details are contained in appendixes.

## 2. Review of the Theory of Optimal Decision Making

### 2.1. Neurophysiology of Decision.

In a typical motion discrimination task used to study the neural basis of decision, an animal is presented with a stimulus consisting of a circular display filled with moving dots (Britten, Shadlen, Newsome, & Movshon, 1992, 1993). A certain proportion of the dots moves coherently in one direction, while the other dots move randomly. The animal has to identify the direction of coherent motion of the majority of dots and indicates its response by making a saccade in the corresponding direction. In this article, we focus on the reaction time version of this task (Roitman & Shadlen, 2002), in which the animal is free to indicate its choice at any time after stimulus onset.

During this task the firing rates of motion-sensitive neurons in area MT encode the amount of coherent motion in the neurons’ preferred direction at a particular moment of time (Britten et al., 1993). Hence, these neurons provide evidence supporting alternative choices. On the basis of experimental data, it has been proposed that the evidence for a particular direction of motion is integrated over time by neurons in cortical areas (including the lateral intraparietal area and the frontal eye field) involved in controlling eye movement (Schall, 2001; Shadlen & Newsome, 2001), and the decision is made as soon as the firing rate of integrator neurons corresponding to a particular alternative exceeds a certain decision threshold (Kiani, Hanks, & Shadlen, 2008; Roitman & Shadlen, 2002).

### 2.2. Decision Problem.

For a decision task with *N* alternatives (*N* ⩾ 2), let us denote the firing rate of sensory neurons (e.g., in area MT) selective for alternative *i* at time *t* by *x _{i}*(

*t*). Most of the theoretical work on decision making assumes that

*x*(

_{i}*t*) come from a gaussian distribution with mean μ

_{i}and the same standard deviation σ for all alternatives (Bogacz et al., 2006; Bogacz & Gurney, 2007; Bogacz, Usher, Zhang, & McClelland, 2007; Gold & Shadlen, 2001, 2002; McMillan & Holmes, 2006; Shadlen & Newsome, 1998; Usher & McClelland, 2001). The problem of decision making can be simply posed as the identification of which μ

_{i}is the highest. The closer μ

_{i}to each other and the higher σ, the more difficult the decision, and these theoretical works for simplicity assumed that the difficulty remains fixed between trials. The decision problem can be formulated as deciding between

*N*statistical hypotheses

*H*, each stating that μ

_{i}_{i}is the highest (see appendix A for a precise statement of the hypotheses).

*x*(

_{i}*t*) at any particular moment of time

*t*would produce many incorrect choices. Hence, solving decision problems accurately requires sampling of

*x*(

_{i}*t*) for a period of time. Such sampling may involve the accumulation of the evidence over time: and choosing the alternative with the highest

*y*(

_{i}*t*). Under this strategy, the longer the integration period, the more accurate the answer.

However, this description leaves open the central question of when to stop the integration and make a response. The simplest solution to this question is to make a decision as soon as any *y _{i}*(

*t*) exceeds a preassigned threshold. This strategy is known as the race model (Vickers, 1970), but it is not the optimal one.

### 2.3. Optimal Decision-Making Theory.

*N*= 2) is provided by the SPRT. Let us denote the entirety of sensory evidence available up to time

*t*by

*input*(

*t*) = {

*x*(

_{i}*s*):1 ⩽

*i*⩽

*N*, 1 ⩽

*s*⩽

*t*}. To implement the SPRT, the ratio of the conditional probabilities of

*input*(

*t*) given the two hypotheses being correct is computed at each time step: If in a given step the ratio of equation 2.2 is above threshold α, then hypothesis

*H*

_{1}is accepted; if it is below a lower threshold 1/α, then

*H*

_{2}is accepted; otherwise the decision process continues, and the next sample of sensory evidence is observed. It has been shown that for the assumptions about the distributions of

*x*(

_{i}*t*) given in section 2.2, the logarithm of equation 2.2 is proportional to the difference

*y*

_{1}(

*t*) −

*y*

_{2}(

*t*); hence, the SPRT can be implemented by a simple procedure in which a decision is made as soon as

*y*

_{1}(

*t*) −

*y*

_{2}(

*t*) exceeds a positive or a negative threshold

*z*or −

*z*(Gold & Shadlen, 2001, 2002; Laming, 1968). This argument is reiterated in appendix A. It has been demonstrated that this simple procedure (and thus the SPRT) can be approximated in various simple neuronal architectures (Bogacz et al., 2006; Brown et al., 2005; Mazurek et al., 2003; Shadlen & Newsome, 1998; Usher & McClelland, 2001; Wong & Wang, 2006). Furthermore, the diffusion model, which assumes that the brain employs this procedure during decision making, has been very successful in describing patterns of reaction times from a wide range of 2AFC tasks (Laming, 1968; Ratcliff, 1978, 1988; Ratcliff & McKoon, 2008).

As mentioned in section 1, the SPRT is optimal in the following sense: among all the methods of decision making giving a certain probability of error, it minimizes the expected number of samples required (Wald & Wolfwitz, 1948), or, in other words, for any accuracy level, it minimizes the average decision time (DT). Hence, for example, when both the SPRT and the race model are used to identify which of two inputs has higher mean, and in both models decision thresholds are set to give the same accuracy, the SPRT will give the lower average DT.

What is the ecological relevance of this optimality property? In many situations, animals aim at maximizing their total reward and need to choose threshold *z*, which balances the demands of speed and accuracy. In a wide range of tasks involving sequences of choices in which rewards can be obtained for correct choices, there exists a unique threshold *z _{o}* maximizing the rate of receiving rewards (Bogacz et al., 2006). Importantly, the optimality property of the SPRT (i.e., minimization of DT for any accuracy) implies that the SPRT with threshold

*z*maximizes the reward rate (Bogacz et al., 2006; Bogacz, 2009). Thus, for example, the SPRT with threshold

_{o}*z*yields a higher reward rate than the race model with any threshold. An algorithm allowing rapid learning of

_{o}*z*has been proposed (Simen, Cohen, & Holmes, 2006), and recent experimental data suggest that a significant fraction of human participants indeed set the threshold close to

_{o}*z*(Simen et al., in press; Bogacz, Hu, Holmes, & Cohen, in press).

_{o}Let us describe how the above implementation of SPRT can account for the observation that accuracy decreases with choice difficulty. In the general formulation of SPRT in equation 2.2, the value of threshold α alone determines the accuracy (Wald, 1947). However, in the SPRT implementation in which a choice is made when *y*_{1}(*t*) − *y*_{2}(*t*) exceeds thresholds *z* or −*z*, the accuracy depends on both *z* and the difficulty of the choice. This is because *y*_{1}(*t*) − *y*_{2}(*t*) is proportional to the logarithm of equation 2.2 with a proportionality constant depending on task difficulty (see appendix A). Thus, making a choice when *y*_{1}(*t*) − *y*_{2}(*t*) exceeds a fixed threshold *z* or −*z* corresponds to making a choice when the likelihood ratio of equation 2.2 exceeds a threshold that depends on task difficulty.

Finally, it is interesting to point out that to minimize DT for any accuracy (or to maximize the reward rate), the exact value of the likelihood ratio of equation 2.2 does not need to be known—only a quantity (*y*_{1}(*t*) − *y*_{2}(*t*)) that is proportional to it with an unknown proportionality constant (depending on task difficulty).

*N*-alternative forced-choice (NAFC) tasks (

*N*>2), there are two tests known as the MSPRT that are asymptotically optimal; they minimize DT for a fixed probability of error when this probability is asymptotically small (Dragalin et al., 1999). Furthermore, it has been demonstrated numerically that both MSPRTs give faster DT than simpler decision strategies for larger probabilities of error (Bogacz & Gurney, 2007; McMillan and Holmes, 2006). One of the MSPRTs computes the conditional probability

*p*(

_{i}*t*) =

*P*(

*H*|

_{i}*input*(

*t*)) and the choice is made as soon as any of

*p*(

_{i}*t*) exceeds a threshold (Dragalin et al., 1999). From Bayes’ theorem, we have For the assumptions about the distributions of

*x*(

_{i}*t*) given in section 2.2, the logarithm of

*p*(

_{i}*t*), which we denote

*l*(

_{i}*t*) = log

*p*(

_{i}*t*), is equal to (Bogacz & Gurney, 2007) where

*g*is a constant depending on task difficulty (however, its value little affects the performance; see Bogacz & Gurney, 2007). Thus, the MSPRT can be implemented by a procedure that at each moment of time

*t*and for each alternative

*i*computes

*l*(

_{i}*t*), and the decision is made as soon as any

*l*(

_{i}*t*) exceeds a certain decision threshold.

Bogacz and Gurney (2007) have shown that if *y _{i}*(

*t*) are provided by cortical integrators, then equation 2.4 may be computed in a neuronal network with striking similarities to the anatomy of the basal ganglia. In particular, in their model, the second term in equation 2.4 is computed by a network involving the subthalamic nucleus and the external part of the globus pallidus. This model is supported by data showing that the experimentally observed relationships between the input and the firing rate of neurons in these two basal nuclei closely match those required to compute the second term in equation 2.4 (Hallworth, Wilson, & Bevan, 2003; Nambu & Llinas, 1994; Wilson, Weyrick, Terman, Hallworth, & Bevan, 2004). Hence, Bogacz and Gurney (2007) proposed that the MSPRT may be approximated by the cortico-basal ganglia circuit.

## 3. Decision Problem for Spiking Inputs in a General Task

For an NAFC task, let us assume that there exist *N* populations of sensory neurons supporting *N* alternatives, and each population *i* (*i* = 1, 2, 3, …, *N*) consists of *M* (*M* ⩾ 1) homogeneous sensory neurons with the same mean firing rate μ_{i}. For consistency with section 2, let us denote the total number of spikes produced by the *j*th (*j* = 1, 2, 3, …, *M*) neuron supporting alternative *i* in the time interval [0, *t*] by *Y*_{i,j}(*t*).

Let us further assume that *Y*_{i,j}(*t*) come from independent Poisson processes with intensities λ_{i}. Recall that for a Poisson process with intensity λ_{i}, the probability of the total number of spikes during [0, *t*] has Poisson distribution with mean λ_{i}*t* and variance λ_{i}*t*, which is consistent with the observation that the variance of the neural firing rate is proportional to its mean. As the ideal observer of sensory neurons knows the timing of all spikes, the problem of decision making can be simply defined as finding which λ_{i} is the highest on the basis of the observed spikes of all sensory neurons. The closer λ_{i} to each other, the more difficult the decision is. Following the previous theoretical works, we assume for simplicity that the difficulty remains fixed between trials.

## 4. Optimal Choice Between Two Alternatives

For 2AFC tasks (*N* = 2) with spiking inputs, the decision problem is simplified to identifying which one of the two sensory populations has the higher mean firing rate, that is, which of λ_{1} and λ_{2} is higher. Since the interspike interval of a Poisson spike train is irregular, any decision strategy based on a single spike would produce many incorrect choices. Alternatively, an integration strategy similar to that for continuous gaussian inputs may be applied that counts the number of spikes for a period of time to obtain a more accurate estimation of the sensory evidence.

Here *Y*_{1}(*t*) and *Y*_{2}(*t*) denote the total number of spikes from the two sensory populations until time *t*, respectively. The decision is made when one of the integrators reaches a decision threshold. Since the Poisson counter-model is based on accumulating evidence separately for each alternative, it can be considered as a discrete analog of the race model (Vickers, 1970). Hence for consistency, we will refer to the Poisson counter model as the spiking race model. However, the spiking race model is not optimal, as it does not implement the SPRT.

*Y*

_{1}(

*t*) and

*Y*

_{2}(

*t*) are defined in equation 4.1. The procedure continues until

*Y*(

*t*) exceeds a positive or a negative threshold ±

*z*(

*z*>0), and selects the first alternative if

*Y*(

*t*) exceeds

*z*, or the second alternative if

*Y*(

*t*) is below −

*z*. The decision time (DT) is defined as the time latency to reach one of the thresholds.

The SPRT for spiking inputs can be performed in an incremental fashion. The decision process starts with *Y*(0) = 0 at *t* = 0. When any neuron from the first population produces a spike, *Y*(*t*) is increased by 1. Analogously, if any neuron from the second population produces a spike, *Y*(*t*) is decreased by 1. Hence the SPRT for spiking inputs can be implemented by accumulating the difference of spike counts supporting the two alternatives, which is exactly the same procedure as for the gaussian inputs. Hence if the sensory evidence is represented by the Poisson spike trains, the optimal decision strategy can still be approximated by the previously proposed neural networks (Bogacz et al., 2006; Brown et al., 2005; Mazurek et al., 2003; Shadlen & Newsome, 2001; Usher & McClelland, 2001; Wong & Wang, 2006). We hereafter refer to the model in which *Y*(*t*) is integrated until it reaches one of the two thresholds as the spiking SPRT model. In the rest of this section, we investigate the performance of this model.

The time courses of the decision variable *Y*(*t*) on correct and error trials are shown in Figure 1a. Note that *Y*(*t*) has discrete values since it represents the difference of spike counts of the two sensory populations. For λ_{1}>λ_{2}, the population supporting the first alternative is likely to produce more spikes than the population supporting the second alternative in a unit of time. Hence, *Y*(*t*) is more likely to be positive and reach the upper threshold *z*, resulting in correct choices. But the irregular spike trains from the Poisson processes could also drive *Y*(*t*) to the lower threshold −*z*, resulting in incorrect choices. The DT distributions are heavily skewed toward longer times, as observed in experimental data (Luce, 1986; Ratcliff, 1978; Ratcliff & Smith, 2004).

To evaluate the performance of the spiking SPRT model, Figure 1b compares the mean DT of the spiking SPRT model with that of the spiking race model for different levels of accuracy. The spiking SPRT model indeed achieves shorter DT than the spiking race model, as predicted by the optimality of the SPRT (Barnard, 1946; Wald & Wolfwitz, 1948).

*z*: where λ

_{+}= max(λ

_{1}, λ

_{2}), and λ

_{−}= min(λ

_{1}, λ

_{2}). The accuracy of the spiking SPRT model increases as the task becomes easier (i.e., λ

_{−}/λ

_{+}decreases) or as the threshold

*z*becomes larger. Note that as

*z*→ ∞, the accuracy of the spiking SPRT model converges to 100%, and DT goes to infinity. Moreover, the DT of the spiking SPRT model decreases monotonically as

*M*increases. By contrast, the accuracy is independent of the size of the sensory populations

*M*. This counterintuitive result comes from the fact that although for larger sensory populations the decision threshold would be reached faster, the probability of reaching one particular threshold (i.e., the probability of making a correct or incorrect decision) remains unchanged. However, it is important to caution that this independence of accuracy from

*M*holds for the simplified assumption we made that the sensory neurons are independent. For a more realistic situation of correlated firing of sensory neurons (Zohary, Shadlen, & Newsome, 1994) the accuracy is unlikely to be independent of

*M*, but we were unable to find the analytic expressions for the accuracy and DT for correlated inputs.

Figure 2 shows the accuracy and DT of the spiking SPRT model with different levels of sensory inputs. In order to model more biologically realistic inputs, the mean firing rates of the two sensory populations are selected on the basis of neural recording data during the motion discrimination task (Figure 2 in Britten et al., 1993; and cf. section 2.1). For each motion coherence level, λ_{1} and λ_{2} are set to be the mean firing rates of the selected MT neuron when the stimulus moves toward the neuron's preferred and null directions, respectively. The simulation results are shown to correspond closely to the theoretical predictions from equations 4.3 and 4.4. As the motion coherence increases from 2.0% to 51.2%, the change of the accuracy and mean DT of the spiking SPRT model shows the same pattern as observed in the experimental data (Roitman & Shadlen, 2002). For low coherence, since λ_{1} and λ_{2} are similar, the spike counts supporting the two alternatives would also be similar. As a result, *Y*(*t*) needs a relatively long time to reach the thresholds, producing more errors and large DT. Conversely, if λ_{1} and λ_{2} are distinct from each other (i.e., large coherence), the difference of the spike counts supporting the two alternatives quickly reaches the thresholds, resulting in more accurate choices and small DT.

## 5. Optimal Choice Between Multiple Alternatives

### 5.1. Optimal Decision Procedure for a General Task.

In this section we consider an NAFC task (*N*>2) and assume that the population supporting the correct alternative has mean firing rate λ_{+}, while all other populations supporting the *N* − 1 incorrect alternatives have the same mean firing rate λ_{−}, where λ_{+}>λ_{−}. An analogous assumption has been made in previous models of NAFC task with gaussian inputs (Bogacz & Gurney, 2007; Bogacz et al., 2007; McMillan & Holmes, 2006). For consistency, we hereafter set λ_{1} = λ_{+} (i.e., λ_{i} = λ_{−} for *i* ≠ 1) and assume the first alternative is the correct choice.

_{i}, the MSPRT can be implemented by the same procedure as for gaussian inputs. Specifically, the MSPRT requires

*N*neural integrators that count the total number of spikes from neurons supporting each alternative. Let us denote the activity of the integrator

*i*at time

*t*by

*y*(

_{i}*t*), where

*Y*

_{i,j}(

*t*) has the same definition as in equation 4.1. Then the MSPRT requires computing and the decision should be made as soon as any

*l*(

_{i}*t*) exceeds a decision threshold

*z*. Here the gain parameter

*g*is a constant scaling factor. To faithfully implement the MSPRT, the gain needs to be set to the following optimal value: Note that equation 5.2 describes the same function of the activity of cortical integrators as equation 2.4 that has been proposed to be approximated in the basal ganglia (Bogacz & Gurney, 2007). Therefore, the MSPRT can also be implemented in the model of the cortico-basal ganglia circuit (Bogacz & Gurney, 2007) when the sensory evidence is carried by discrete spikes. We hereafter refer to the model described by equations 5.1 and 5.2 as the spiking MSPRT model. In the rest of this section we investigate the performance of this model and the effect of the gain parameter.

Figure 3a compares the DT of the spiking race model, the leaky competing accumulator (LCA) model proposed by Usher and McClelland (2001), and the spiking MSPRT model, for a fixed accuracy. As the number of alternatives increases, all the models produce longer DT. For *N* = 2, the LCA model and the spiking MSPRT model achieve similar DT. This was not unexpected because for gaussian inputs, both the LCA model and the MSPRT model can be reduced to the SPRT for *N* = 2 (Bogacz et al., 2006). Hence, the simulation result supports that this relationship still holds when sensory evidence is represented by spike trains. For *N*>2, the spiking MSPRT model achieves the shortest DT among all the models.

The spiking MSPRT model includes the ratio λ_{+}/λ_{−} in the gain parameter *g*, which suggests that it requires the task difficulty (expressed by parameters λ_{+} and λ_{−}) to be known a priori. There are, however, two solutions to this problem. First, Bogacz and Gurney (2007) have shown that for gaussian inputs, to minimize DT, the gain parameter *g* has to be set to either the optimal value *g* = *g** or to a higher value. When *g* has lower values (*g* < *g**), the performance of their model decreases to that of the LCA model. Figure 3b shows that the spiking MSPRT model follows the same pattern. If *g*>*g**, the model achieves similar DT. If *g* < *g**, the DT increases as *g* decreases. For relatively small *g* (*g*/*g** < 0.02 in Figure 3b), the DT of the spiking MSPRT model converges to the DT of the LCA model with the same level of inputs. Hence, if *g* is set above the value determined by λ_{+} and λ_{−}, the performance of the model does not deteriorate. Thus, the optimal performance may achieved by setting *g* sufficiently high, even if the precise value of λ_{+} and λ_{−} is not known.

The second alternative solution is to follow a very interesting approach proposed recently by Beck et al. (2008): a model of decision making that does not require the knowledge of task difficulty. They achieved it by making very clever assumptions about information coding by sensory neurons. In appendix D we show that the assumption of Beck et al. corresponds in the spiking MSPRT model to assuming that the ratio λ_{+}/λ_{−} is always fixed for all levels of difficulty (this would imply that on difficult trials, both λ_{+} and λ_{−} are lower than on easy trials, but their ratio always remains constant). If the ratio λ_{+}/λ_{−} is always constant, it is reasonable to assume that its value can be well estimated by the neural decision circuit. In the spiking MSPRT model, only the ratio λ_{+}/λ_{−} (rather than λ_{+} and λ_{−} individually) influences the gain parameter *g* (see equation 5.3). Hence if we follow the approach of Beck et al. and assume that the ratio λ_{+}/λ_{−} is known, then the parameter *g* can always be set appropriately (no matter what the individual values of λ_{+} and λ_{−} are).

### 5.2. Optimal Decision Procedure for Spiking Inputs from a Tuning Curve.

The previous section assumed that neurons supporting all the incorrect alternatives have the same firing rate. However, in many situations, sensory neurons have different firing rates for different stimuli. For example, in the motion discrimination task described in section 2, the MT neurons selective for rightward motion would have the highest firing rates when a stimulus with coherent rightward motion is presented, but would also have increased activity for stimuli with motion in similar directions, for example, toward the upper-right corner of the screen (Britten et al., 1993; Dubner & Zeki, 1971; Maunsell & Van Essen, 1983a, 1983b; Zeki, 1980). More specifically, the mean firing rate of sensory neurons varies as a function of the stimulus, and this function is called the tuning curve. Figure 4a illustrates the tuning curve of one sensory neuron selective for rightward motion. The neuron has the maximum firing rate λ_{+} when the coherent motion direction is the neuron's preferred direction (equal to 0 degrees for this neuron); otherwise, the firing rates vary depending on the difference between its preferred direction and the stimulus direction.

*N*directions equally distributed around 360 degrees (e.g., Churchland, Kiani, & Shadlen, 2008) and denote the direction of the coherent motion by θ

_{0}(indicating the correct alternative). For a sensory neuron in area MT with preferred direction θ, its tuning curve can be approximately described by a gaussian function (Snowden, Treue, & Andersen, 1992): where λ

_{θ}(θ

_{0}) is the mean firing rate of the neuron specified by the tuning function Φ(θ, θ

_{0}). λ

_{+}and λ

_{−}denote the firing rates of the neuron when the coherent direction is the neuron's preferred or null direction, and σ describes the width of the tuning curve.

*d*(θ, θ

_{0}) denotes the distance between θ and θ

_{0}and is defined as The preferred direction of the neurons supporting alternative

*i*is θ

_{i}= 360°(

*i*− 1)/

*N*, and hence their mean firing rate for stimulus θ

_{0}is For the NAFC task described above, appendix C.2 shows that the MSPRT can be implemented by computing Here

*Y*(

_{k}*t*) is the total number of spikes from the

*k*th population up to time

*t*, which is given by equation 5.1. To implement the MSPRT, alternative

*i*should be selected as soon as any

*l*(

_{i}*t*) exceeds a decision threshold.

When the spike trains are generated from tuning curves, the neural integrator selective for alternative *i* needs to compute equation 5.8, which involves the sum of the spike counts from all the *N* sensory populations and is weighted by the logarithm of their firing rate if alternative *i* were correct. Such a computation can be performed if the neural integrators selective for a particular direction receive the strongest input from sensory neurons selective for this direction, and also smaller inputs from sensory neurons preferring similar directions. Exactly the same weighting as in equation 5.8 was proposed by Jazayeri and Movshon (2006), who showed that it also allows optimal representation of sensory information by neural integrators. Note that equation 5.7 describes the same function of the activity of cortical integrators as equation 2.4 that has been proposed to be evaluated in the basal ganglia (Bogacz & Gurney, 2007). Hence a network involving cortical integrators connected with sensory neurons as described above and the model of the basal ganglia (Bogacz & Gurney, 2007) can implement the MSPRT when the spiking inputs are generated from tuning curves. We refer to the model described by equations 5.7 and 5.8 as the tuned MSPRT model.

To illustrate the optimality of the tuned MSPRT model, Figure 4b compares DT (for a fixed accuracy) of the tuned MSPRT model and simpler strategies in which the integrators accumulate the input from only one corresponding population of sensory neurons: the spiking race model and the LCA model. The spiking inputs of all the models are generated from the tuning curve defined in equation 5.4. All three models produce longer DT as *N* increases, as in Figure 3a. However, Figure 4b shows that for inputs generated from the tuning curve, the spiking race and the LCA models produce much longer DT for *N* ⩾ 8. This is due to the fact that for large *N*, the difference between the populations’ preferred directions θ_{i} are small (as the *N* directions are equally distributed around 360 degrees) so that the firing rates of the adjacent populations are similar, which makes it more difficult to discriminate the population with the highest mean firing rate (i.e., the correct alternative) and results in long DT. Importantly, for large *N*, the tuned MSPRT model achieves DTs several times shorter than those of the simpler models.

In the tuned MSPRT model, λ_{+} and λ_{−} define the range of firing rates of the MT neurons for a given task difficulty. For example, in a difficult task, λ_{+} and λ_{−} will be close to each other, and according to equation 5.4, the MT neurons will have similar firing rates for different directions of motion. The weights of connections between MT and integrator neurons in equation 5.8 depend on λ_{+} and λ_{−} via equations 5.4 and 5.6, which suggests that the model requires the task difficulty to be known a priori. However, as for the spiking MSPRT model, following the method of Beck et al. (2008), appendix D shows that the task difficulty does not need to be known in the tuned MSPRT model when the ratio λ_{+}/λ_{−} is assumed to be constant.

## 6. Can We Approximate Sensory Input by Gaussian Distribution?

Many experimental studies have reported that the variance in firing rate of sensory neurons is proportional to its mean (Britten et al., 1993; Shadlen & Newsome, 1998; Tolhurst, Movshon, & Dean, 1983). Consequently, several previous studies have used gaussian inputs with a similar variance-mean relationship in simulations of decision processes (Bogacz et al., 2007; Diederich & Busemeyer, 2006; Niwa & Ditterich, 2008), but we show that such types of inputs may lead to spurious conclusions from the simulations.

Appendix A shows that for gaussian inputs with variance proportional to the mean, the SPRT cannot be implemented by linear integration of evidence. Instead the SPRT is implemented for such inputs by a procedure in which the difference of squares of sensory evidence is integrated until it exceeds a positive or a negative threshold. This procedure is unusual from a biological computation point of view and has not yet been supported by either physiological or behavioral data.

To illustrate the problems that may occur in the simulations using gaussian inputs with variance proportional to the mean, we simulate two decision procedures for 2AFC tasks with such inputs: one involving the integration of the difference between inputs and a second involving the integration of the difference between the squares of inputs. In the simulation, the inputs were accumulated in integrators in time steps of δ. Hence, at every time step, the evidence supporting the first alternative was generated from a gaussian distribution with mean and variance μ_{1}δ, while the evidence supporting the second alternative was generated from a gaussian distribution with mean and variance μ_{2}δ.

Figure 5 illustrates the mean DT at a certain accuracy for the two procedures with different δ. For δ above 0.2 s, both procedures perform equally, but note that this value of δ is close to DT, so on the majority of simulated trials, the threshold is reached after a single integration step. When the decision is made on the basis of a single sample from each sensory input, the squaring of the sensory inputs does not change the outcome of decision, and therefore the two procedures perform equally. By contrast, for smaller δ, the integration of squares is superior. Furthermore, as δ decreases, DT of the procedure involving the integration of squares converges to 0, which is not biologically realistic because it implies that a decision with a high accuracy may be made after an arbitrary short time on the basis of noisy inputs.

Let us provide an intuition for this surprising numerical phenomenon. When the variance is proportional to the mean, each sample carries a significant amount of information even if δ decreases to 0. For example, if the evidence supporting the first alternative is particularly high or low in a given step, it indicates that the first alternative has higher variance, and hence is the correct one. This information is extracted by the procedure integrating squares. Since DT is equal to the product of the number of samples and δ and each sample carries significant information, the DT required to gain a desired accuracy decreases to 0 as δ decreases to 0.

If one simulates a nonlinear decision network sharing properties with the model integrating squares of inputs (e.g., in which at each step, the inputs with the higher values have larger influence over the final choice), then one may underestimate DT (for fixed accuracy) if one uses the gaussian inputs with variance proportional to the mean.^{1}

For spiking inputs or for gaussian inputs with constant variance across alternatives, DT (for fixed accuracy) little depends on δ. Paradoxically, for the gaussian inputs with variance proportional to the mean, which seem to be closer to biology than gaussian inputs with constant variance, the DT may strongly depend on δ, which may produce unrealistic results of simulations.

For Poisson spike trains, the implementation of SPRT is to integrate the difference between the spike counts. On the other hand, for gaussian inputs with variance proportional to the mean, the implementation of SPRT would be to integrate the differences between the squares of sensory evidence. How can the two distinct procedures both be optimal, given the fact that a Poisson distribution can be approximated by a gaussian distribution for sufficiently large values of the mean? To investigate this question, Figure 6 shows histograms of three types of sensory evidence from one time step δ = 0.01 s: the difference of two Poisson spike counts, the difference of two gaussian inputs, and the difference of the squares of two gaussian inputs. For large population size (*M* = 100 in Figure 6), the distribution of Poisson inputs approaches a gaussian distribution as expected. In addition, for large *M*, the difference of the squares of gaussian inputs has a similar (but scaled) distribution to the difference of the gaussian inputs. Therefore, the two SPRT procedures for the gaussian inputs and the Poisson inputs converge for large *M*.

## 7. Conclusion

### 7.1. Summary.

This study has considered the decision problem under an assumption that the sensory evidence is represented by discrete Poisson spike trains. We have shown that for two alternatives, the SPRT can be implemented by a simple integration strategy as for gaussian inputs. For multiple alternatives, we have shown that the MSPRT can also be implemented by the same network involving cortical integrators and the basal ganglia model, as for gaussian inputs. Finally, if the firing rates of sensory neurons come from tuning curves, we have shown that each neural integrator in the cortex needs to receive the sum of the spike trains from all sensory neurons, weighted by the similarity between the preferred directions of the integrator and given sensory populations.

### 7.2. Variability of Difficulty Between Trials.

As mentioned in section 2.2, all previous work on neural implementations of the SPRT and the MSPRT for simplicity assumed that the task difficulty remains constant between trials, and the same simplifying assumption is made throughout this article. In many experimental studies of decision making, the choice difficulty was indeed constant between trials (e.g., Forstmann et al., 2008; Simen et al., in press; Bogacz, Hu et al., in press). However, in many other experimental paradigms, the choice difficulty varied between trials (e.g., Roitman & Shadlen, 2002).

It is interesting to point out that although we showed in sections 5.1 and 5.2 that the task difficulty does not need to be known for the models described in this article to work, the models do not implement the SPRT or the MSPRT for tasks with difficulty varying across trials. This is because the statistical hypotheses about the probability distributions of sensory inputs in each of the models (see equations B.1and C.1 assume that λ_{+} and λ_{−} are constant. To derive the procedures maximizing the reward rate for tasks with difficulty varying across trials, one would have to define more general hypotheses. This can be one of the directions of future work.

### 7.3. Future Work.

One extension of this study is to consider more realistic spike train statistics. In this article, we assumed that the spike trains of sensory neurons are generated from the Poisson processes, in which the mean spike count is equal to its variance. However, data from in vivo experiments suggested that the variance-mean ratio of the spike trains from cortical neurons is slightly larger than 1 (1–1.5) (Shadlen & Newsome, 1998). This larger variance relative to the mean may arise from synaptic background activity from other cortical areas (Softky & Koch, 1993; Wolfart, Debay, Le Masson, Destexhe, & Bal, 2005), or it may also be caused by the intrinsic irregularity of neurons. One may consider a more general class of stochastic processes (e.g., the renewal processes), whose variance-mean ratio is a constant (Smith, 1959). Another extension would be to consider the correlation between firing rate of sensory neurons (Zohary et al., 1994).

## Appendix A: SPRT for Gaussian Inputs

This appendix derives the procedures implementing the SPRT for the assumption that the evidence *x _{i}*(

*t*) supporting alternative

*i*comes from gaussian distribution with mean μ

_{i}and standard deviation σ

_{i}, which we denote . Initially we loosen the assumptions of section 2.2, as we do not assume that σ

_{1}= σ

_{2}. After deriving the procedure for these generalized assumptions, we will consider two cases: (1) σ

_{1}= σ

_{2}, that is, both integrators receive equally noisy evidence, as in section 2.2 and (2) σ

^{2}

_{i}∼ μ

_{i}, the variance in firing rate of sensory neurons is proportional to its mean, as reported by Shadlen and Newsome (1998).

_{+}and σ

_{+}, and by μ

_{−}and σ

_{−}for the incorrect alternative. As mentioned in section 2.3, the process of decision making can be defined as distinguishing between two hypotheses

*H*

_{1}and

*H*

_{2}, corresponding to the first and second alternatives being correct: We now derive the SPRT (see equation 2.2) for the hypotheses defined above. By taking the logarithm of equation 2.2, we infer that the sampling should continue as long as the difference between the logs of the conditional probabilities remains between the (new) decision thresholds: Let us calculate the log of the conditional probability of

*input*(

*t*) given that the first alternative is correct: In the above equation,

*f*

_{μ,σ}denotes gaussian probability density function with mean μ and standard deviation σ, hence: Analogously: The difference of the logs of the probabilities becomes

_{+}= σ

_{−}= σ, then the first term in the square bracket in the last line of the above equation cancels and it simplifies to: Thus, for assumption 1, the SPRT is implemented by a simple procedure of integrating the difference of sensory evidence until it exceeds a positive or negative threshold (Gold & Shadlen, 2001, 2002; Laming, 1968).

^{2}

_{i}∼ μ

_{i}, then the second term in the square brackets in the last line of equation A.7cancels, and it simplifies to Thus for assumption 2, the SPRT is implemented by the procedure in which the difference of squares of sensory evidence is integrated until it exceeds a positive or negative threshold.

## Appendix B: The Spiking SPRT Model

This appendix describes the optimal decision procedure for 2AFC tasks when the sensory evidence is carried by the Poisson spike trains. Appendix B.1 derives the procedures implementing the SPRT. Appendixes B.2 and B.3 calculate the accuracy and mean DT of the spiking SPRT model. The calculation in appendixes B.2 and B.3 is analogous to the derivation for gaussian inputs by Shadlen, Hanks, Churchland, Kiani, and Yang (2006).

### B.1> The SPRT for Spiking Inputs.

For a 2AFC task described in section 3.1, let us assume that the spike trains of the neurons supporting the two alternatives are generated from independent Poisson processes *Y*_{1, j}(*t*) and *Y*_{2, j}(*t*), with intensities λ_{1}>0 and λ_{2}>0, respectively. Let two random variables *Y*_{1}(*t*) and *Y*_{2}(*t*) be the total number of incoming spikes from the *M* neurons in the two populations up to time *t*. According to the property of the Poisson process, they follow the Poisson distribution with associated parameters λ_{1}*Mt* and λ_{2}*Mt*.

_{+}= max(λ

_{1}, λ

_{2}), and denote the firing rate of the neurons in the other population by λ

_{−}= min(λ

_{1}, λ

_{2}), and hence λ

_{+}>λ

_{−}>0. The decision problem can be defined as distinguishing between two hypotheses

*H*

_{1}and

*H*

_{2}: Denote the entirety of sensory evidence available up to time

*t*by

*input*(

*t*) = {

*Y*

_{1}(

*t*),

*Y*

_{2}(

*t*)}. Follow the same calculation as in appendix A, the conditional probability of

*input*(

*t*) given the

*H*

_{1}being correct is given by (cf. equation A.4): where

*f*

_{λ}denotes the Poisson probability density function with mean and variance λ: Substituting equation B.3in B.2 we obtain: where

*C*is a constant independent to the hypothesis. Analogously: Hence the difference of the logs of the probabilities becomes as

### B.2. The Probability of a Correct Decision.

We divide the decision time into discrete intervals with tiny length δ>0. Let *X ^{k}*

_{1}and

*X*

^{k}_{2}be the total number of incoming spikes from the

*M*neurons in the two populations during the interval [(

*k*− 1)δ,

*k*δ].

*X*

^{k}_{1}and

*X*

^{k}_{2}follow the Poisson distributions with associated parameters λ

_{1}

*M*δ and λ

_{2}

*M*δ. Let us denote the difference between

*X*

^{k}_{1}and

*X*

^{k}_{2}by

*X*(

^{k}*X*=

^{k}*X*

^{k}_{1}−

*X*

^{k}_{2}), and denote the difference between the total spike counts between the two sensory populations during [0,

*k*δ] by

*Y*. Note that

^{k}*Y*is a discrete version of the decision variable of the SPRT procedure (see equation B.8 and can be calculated by taking the sum of

^{k}*X*at each time interval.

^{k}*X*. The MGF of a random variable υ is defined by (Kenney & Keeping, 1951): where

^{k}*E*denotes the expectation and θ is the argument of the MGF

**M**(θ). For

*X*

^{k}_{1}and

*X*

^{k}_{2}which follow the Poisson distribution, we have (Papoulis, 1986): Since

*X*is a sequence of independent and identically distributed random variables, we can drop the superscript, and the MGF of

^{k}*X*is We assume after

^{k}*n*time steps the decision variable {

*Y*} exactly hits one of the thresholds ±

^{k}*z*(i.e., no overshoot), and denote the terminal state by

*Y*. We also assume {

^{n}*Y*} hits

^{k}*z*with probability

*P*

_{1}(i.e., the hypothesis

*H*

_{1}is selected), while {

*Y*} hits −

^{k}*z*with probability

*P*

_{2}= 1 −

*P*

_{1}. Since

*Y*can be equal to only one of the two threshold values, the mean of

^{n}*Y*is

^{n}*Y*also has its MGF: For a sequential procedure that computes the decision variable {

^{n}*Y*} by taking the sum of the random variables

^{k}*X*(i.e., the SPRT), it must satisfy Wald's fundamental identity (Wald, 1947): Equation B.15must be satisfied for all values of θ. We seek θ that satisfies

^{k}**M**

_{X}(θ) = 1 (because for this value , and hence we will be able to solve equation B.14for

*P*

_{1}). Hence from equation B.12 we need to solve and the solutions are As there exists θ = θ

_{1}that makes

**M**

_{X}(θ) = 1, substitute θ

_{1}from equation B.17in B.15 and we have: Relating equations B.14and B.18 we have a function of

*P*

_{1}: Solving equation B.19 we obtain the solution for

*P*

_{1}: Recall the initial hypothesis: if λ

_{1}>λ

_{2}, then λ

_{1}= λ

_{+}, λ

_{2}= λ

_{−}, and the probability of making a correct decision (i.e., select

*H*

_{1}) is Similarly, if λ

_{1}< λ

_{2}, then the correct alternative is

*H*

_{2}, and λ

_{1}= λ

_{−}, λ

_{2}= λ

_{+}. The probability of making a correct decision is Therefore the accuracy of the spiking SPRT model is independent of which alternative is actually correct and is given by equation B.21

### B.3. The Expected Decision Time.

*n*when {

*Y*} reaches the terminal state

^{k}*Y*= ±

^{n}*z*. To calculate

*E*[

*n*], we again start from Wald's identity: We first differentiate equation B.23with respect to θ, and obtain Let θ = θ

_{0}= 0; then

**M**

_{X}(θ) = 1 and

**M**′

_{X}(θ) =

*E*[

*X*] = λ

_{1}

*M*δ − λ

_{2}

*M*δ. Substituting them in equation B.24 we have and hence: When the decision process terminates at step

*n*,

*Y*=

^{k}*Y*. Substituting equation B.13 into B.26 we have Then substituting the expression for

^{n}*P*

_{1}from equation B.20in B.27 we have If λ

_{1}>λ

_{2},

*H*

_{1}is true and λ

_{1}= λ

_{+}, λ

_{2}= λ

_{−}. The mean DT is Similarly, if λ

_{1}< λ

_{2}, λ

_{1}= λ

_{−}, and λ

_{2}= λ

_{+}, then Hence the mean number of observation

*n*is given by equation B.29 regardless of which alternative is actually correct, and the mean DT is

## Appendix C: The Spiking and Tuned MSPRT Models

This appendix derives the asymptotically optimal decision procedures (the MSPRT) for NAFC tasks when the sensory evidence is carried by the Poisson spike trains. Appendix C.1 considers a general case that the populations supporting all incorrect alternatives have the same mean firing rates. In appendix C.2 we assume that the mean firing rates of sensory populations are generated from tuning curves. The calculation is analogous to the derivation of the MSPRT for gaussian inputs by Bogacz and Gurney (2007).

### C.1. The MSPRT for Spiking Inputs.

_{+}(i.e., the true alternative), given that all other populations have the same mean firing rate λ

_{−}(λ

_{+}>λ

_{−}). The process of decision making can be defined as distinguishing between

*N*hypotheses

*H*(1 ⩽

_{i}*i*⩽

*N*) corresponding to the

*i*th alternative being correct: The asymptotically optimal decision procedure for such a decision task is given by the MSPRT (Dragalin et al., 1999). The MSPRT is to compute the conditional probability

*p*(

_{i}*t*) =

*P*(

*H*|

_{i}*input*(

*t*)) and selects the alternative

*i*as soon as

*p*(

_{i}*t*) exceeds a decision threshold (Dragalin et al., 1999, and cf. Bogacz & Gurney, 2007), as described in equation 2.3. If there is no prior knowledge about the decision task, all hypotheses have the same prior probability

*P*(

*H*) = 1/

_{i}*N*. Then equation 2.3 can be simplified to (Bogacz & Gurney, 2007) and the logarithm of

*p*(

_{i}*t*) is The conditional probability of

*input*(

*t*) given

*H*being correct is given by where

_{i}*f*

_{λ}denotes the Poisson probability density function with mean and variance λ, given in equation B.3 Substituting equation B.3 in C.4 we obtain

### C.2. The MSPRT for Spiking Inputs from Tuning Curve.

_{0}is one of the

*N*directions that are equally distributed around 360 degrees. Hence, the

*N*hypotheses

*H*(1 ⩽

_{i}*i*⩽

*N*) of the decision task are As described in section 5.2, for a given coherent direction θ

_{0}= θ

_{i}, the neurons selective for the direction θ

_{l}(1 ⩽

*l*⩽

*N*) have the firing rate , which is specified by the tuning curve defined in equation 5.4. Hence the hypotheses can be represented as and the decision task is to distinguish between the

*N*hypothesis on the basis of the Poisson spike trains. Let us denote the entirety of sensory evidence available up to time

*t*by

*input*(

*t*) = {

*Y*(

_{l}*t*): 1 ⩽

*l*⩽

*N*}, where

*Y*has the same deftinition as in appendix B.1. Similar to the calculation in appendix C.1, the MSPRT requires us to compute the logarithm of the conditional probability of

_{l}*input*(

*t*) given

*H*being correct (cf. Jazayeri & Movshon, 2006): Note that the last three terms on the right-hand side of equation C.11do not depend on the hypotheses, and hence denoting their sum by

_{i}*C*, we have: Following the same calculation as in appendix C.1, the procedure implementing the MSPRT with spiking inputs from tuning curves is to compute and make a choice as soon as any log

*p*(

_{i}*t*) exceeds the decision threshold.

## Appendix D: Optimal Decision Making Without Knowledge of Task Difficulty

This appendix considers the problem of whether the task difficulty has to be known a priori in the spiking and tuned MSPRT models. Following the notation in appendix B.2, let *X ^{k}_{i}* be the numbers of incoming spikes from one of

*M*neurons in the

*i*th population during the interval [(

*k*− 1)δ,

*k*δ], and let vector

**X**

^{k}(

**X**

^{k}= {

*X*

^{k}_{1},

*X*

^{k}_{2}, …,

*X*}) be the total number of spikes from all sensory neurons during the

^{k}_{N}*k*th interval.

*i*being correct can be decomposed into the following product (see equation 4 in Beck et al., 2008), where

**c**is a collection of nuisance parameters (e.g., motion coherence, population size). Φ is a function that does not depend on alternative

*i*being considered, and

*h*is a function that depends on

*i*, but not on parameters related to task difficulty (i.e., λ

_{−}and λ

_{+}).

**X**

^{k}in the

*k*th interval given is equal to We now show under what assumptions equation D.2can be decomposed according to equation D.1 Algebraic manipulations on D.2 give Note that the terms in the braces do not depend on

*i*, so they can form function Φ in equation D.1. The last term of equation D.3can be decomposed as in equation D.1, only if one assumes that λ

_{+}/λ

_{−}is constant (denoted by η) and does not vary with task difficulty. Then the function

*h*can be defined as In the tuned MSPRT model, the probability of observing

**X**

^{k}in the

*k*th interval is equal to According to equations 5.4 and 5.5, can be written as where

*F*(

*i*,

*j*) is a function of

*i*and

*j*describing the shape of the tuning curve, independently from its maximum and minimum values (that depend on λ

_{+}and λ

_{−}). Let us transform the right term of equation D.5Substituting equation D.7in D.5 we have If we keep the assumption that the tuning curves of sensory neurons have identical shape (similar to the assumption made by Ma, Beck, Latham, & Pouget, 2006), is a constant independent of

*i*, and then the terms in the braces in equation D.8can form function Φ in equation D.2 Similarly to the spiking MSPRT model in equation D.3 when λ

_{+}/λ

_{−}is a constant, equation D.8can be decomposed according to D.1and hence the tuned MSPRT model does not require the task difficulty to be known a priori.

## Acknowledgments

This work was supported by EPSRC grant EP/C514416/1. We thank Angela Onslow for reading an earlier version of the manuscript and very useful comments.