## Abstract

Experimental data indicate that perceptual decision making involves integration of sensory evidence in certain cortical areas. Theoretical studies have proposed that the computation in neural decision circuits approximates statistically optimal decision procedures (e.g., sequential probability ratio test) that maximize the reward rate in sequential choice tasks. However, these previous studies assumed that the sensory evidence was represented by continuous values from gaussian distributions with the same variance across alternatives. In this article, we make a more realistic assumption that sensory evidence is represented in spike trains described by the Poisson processes, which naturally satisfy the mean-variance relationship observed in sensory neurons. We show that for such a representation, the neural circuits involving cortical integrators and basal ganglia can approximate the optimal decision procedures for two and multiple alternative choice tasks.

## 1.  Introduction

Making choices on the basis of sensory information is one of the critical abilities for animals and humans. Neurophysiological studies suggest that during perceptual decision making, the neurons in sensory areas provide noisy evidence represented in their firing rates (Britten, Shadlen, Newsome, & Movshon, 1993; Newsome & Paré 1988; Shadlen & Newsome, 1998). Experimental studies further indicate that during the decision process, this sensory evidence is accumulated over time by neural integrators in certain cortical areas (Kim & Shadlen, 1999; Schall, 2001; Shadlen & Newsome, 2001), and the decision is made as soon as the activity of the integrator neurons exceeds a certain threshold (Mazurek, Roitman, Ditterich, & Shadlen, 2003; Roitman & Shadlen, 2002; Schall, 2001, 2002).

On the basis of these and other data, theoretical studies have proposed that during the decision process, the brain approximates optimal statistical tests: the sequential probability ratio test (SPRT) (Barnard, 1946; Wald, 1947) during the choice between two alternatives (Bogacz et al., 2006; Gold & Shadlen, 2001, 2002, 2007; Laming, 1968) and the multihypothesis SPRT (MSPRT) (Baum & Veeravalli, 1994; Dragalin, Tartakovsky, & Veeravalli, 1999) during the choice between multiple alternatives (Bogacz & Gurney, 2007). These tests provide an optimal criterion for when to stop the integration of sensory information and execute an action. In this article, we refer to decision-making models as optimal if they minimize decision time for any level of accuracy (Dragalin et al., 1999; Wald & Wolfwitz, 1948; see section 2 for details).

All the theoretical studies mentioned above rely on two assumptions: (1) in each moment of time, sensory evidence in support of each alternative comes from a gaussian distribution, and (2) the distributions of the evidence supporting all alternatives have the same variance. However, both of these assumptions are oversimplified since sensory neurons transmit information as discrete spikes, and the variance of neuronal firing rate is proportional to its mean in sensory areas (Shadlen & Newsome, 1998).

This article analyzes whether the conclusions of the theoretical studies, suggesting that the neural decision circuits approximate the SPRT or the MSPRT, still hold when more realistic assumptions are made regarding the representation of evidence in sensory areas. It shows that indeed, simple integration strategies can implement the optimal statistical tests in two and multiple alternative tasks if it is assumed that evidence is represented by spikes of sensory neurons and the variance of neuronal firing rate is proportional to its mean.

The article is organized as follows. Section 2 reviews the current theories proposing that the brain approximates the optimal statistical tests for decision making. Section 3 formalizes the decision-making problem on the basis of spiking inputs. Sections 4 and 5 describe the models of neural circuits that implement the optimal decision procedures for two and multiple alternatives, respectively. Section 6 demonstrates that the simulations of decision networks that approximate spiking input with a gaussian variable with variance proportional to the mean may generate surprising spurious results under certain conditions. Section 7 discusses the directions of future work. Mathematical details are contained in appendixes.

## 2.  Review of the Theory of Optimal Decision Making

### 2.1.  Neurophysiology of Decision.

In a typical motion discrimination task used to study the neural basis of decision, an animal is presented with a stimulus consisting of a circular display filled with moving dots (Britten, Shadlen, Newsome, & Movshon, 1992, 1993). A certain proportion of the dots moves coherently in one direction, while the other dots move randomly. The animal has to identify the direction of coherent motion of the majority of dots and indicates its response by making a saccade in the corresponding direction. In this article, we focus on the reaction time version of this task (Roitman & Shadlen, 2002), in which the animal is free to indicate its choice at any time after stimulus onset.

During this task the firing rates of motion-sensitive neurons in area MT encode the amount of coherent motion in the neurons’ preferred direction at a particular moment of time (Britten et al., 1993). Hence, these neurons provide evidence supporting alternative choices. On the basis of experimental data, it has been proposed that the evidence for a particular direction of motion is integrated over time by neurons in cortical areas (including the lateral intraparietal area and the frontal eye field) involved in controlling eye movement (Schall, 2001; Shadlen & Newsome, 2001), and the decision is made as soon as the firing rate of integrator neurons corresponding to a particular alternative exceeds a certain decision threshold (Kiani, Hanks, & Shadlen, 2008; Roitman & Shadlen, 2002).

### 2.2.  Decision Problem.

For a decision task with N alternatives (N ⩾ 2), let us denote the firing rate of sensory neurons (e.g., in area MT) selective for alternative i at time t by xi(t). Most of the theoretical work on decision making assumes that xi(t) come from a gaussian distribution with mean μi and the same standard deviation σ for all alternatives (Bogacz et al., 2006; Bogacz & Gurney, 2007; Bogacz, Usher, Zhang, & McClelland, 2007; Gold & Shadlen, 2001, 2002; McMillan & Holmes, 2006; Shadlen & Newsome, 1998; Usher & McClelland, 2001). The problem of decision making can be simply posed as the identification of which μi is the highest. The closer μi to each other and the higher σ, the more difficult the decision, and these theoretical works for simplicity assumed that the difficulty remains fixed between trials. The decision problem can be formulated as deciding between N statistical hypotheses Hi, each stating that μi is the highest (see appendix A for a precise statement of the hypotheses).

This problem is, however, nontrivial, because the firing rates of sensory neurons are noisy and a simple strategy of choosing the alternative with the highest xi(t) at any particular moment of time t would produce many incorrect choices. Hence, solving decision problems accurately requires sampling of xi(t) for a period of time. Such sampling may involve the accumulation of the evidence over time:
2.1
and choosing the alternative with the highest yi(t). Under this strategy, the longer the integration period, the more accurate the answer.

However, this description leaves open the central question of when to stop the integration and make a response. The simplest solution to this question is to make a decision as soon as any yi(t) exceeds a preassigned threshold. This strategy is known as the race model (Vickers, 1970), but it is not the optimal one.

### 2.3.  Optimal Decision-Making Theory.

The optimal solution of the above decision problem for simple, two-alternative forced-choice (2AFC) tasks (N = 2) is provided by the SPRT. Let us denote the entirety of sensory evidence available up to time t by input(t) = {xi(s):1 ⩽ iN, 1 ⩽ st}. To implement the SPRT, the ratio of the conditional probabilities of input(t) given the two hypotheses being correct is computed at each time step:
2.2
If in a given step the ratio of equation 2.2 is above threshold α, then hypothesis H1 is accepted; if it is below a lower threshold 1/α, then H2 is accepted; otherwise the decision process continues, and the next sample of sensory evidence is observed. It has been shown that for the assumptions about the distributions of xi(t) given in section 2.2, the logarithm of equation 2.2 is proportional to the difference y1(t) − y2(t); hence, the SPRT can be implemented by a simple procedure in which a decision is made as soon as y1(t) − y2(t) exceeds a positive or a negative threshold z or −z (Gold & Shadlen, 2001, 2002; Laming, 1968). This argument is reiterated in appendix A. It has been demonstrated that this simple procedure (and thus the SPRT) can be approximated in various simple neuronal architectures (Bogacz et al., 2006; Brown et al., 2005; Mazurek et al., 2003; Shadlen & Newsome, 1998; Usher & McClelland, 2001; Wong & Wang, 2006). Furthermore, the diffusion model, which assumes that the brain employs this procedure during decision making, has been very successful in describing patterns of reaction times from a wide range of 2AFC tasks (Laming, 1968; Ratcliff, 1978, 1988; Ratcliff & McKoon, 2008).

As mentioned in section 1, the SPRT is optimal in the following sense: among all the methods of decision making giving a certain probability of error, it minimizes the expected number of samples required (Wald & Wolfwitz, 1948), or, in other words, for any accuracy level, it minimizes the average decision time (DT). Hence, for example, when both the SPRT and the race model are used to identify which of two inputs has higher mean, and in both models decision thresholds are set to give the same accuracy, the SPRT will give the lower average DT.

What is the ecological relevance of this optimality property? In many situations, animals aim at maximizing their total reward and need to choose threshold z, which balances the demands of speed and accuracy. In a wide range of tasks involving sequences of choices in which rewards can be obtained for correct choices, there exists a unique threshold zo maximizing the rate of receiving rewards (Bogacz et al., 2006). Importantly, the optimality property of the SPRT (i.e., minimization of DT for any accuracy) implies that the SPRT with threshold zo maximizes the reward rate (Bogacz et al., 2006; Bogacz, 2009). Thus, for example, the SPRT with threshold zo yields a higher reward rate than the race model with any threshold. An algorithm allowing rapid learning of zo has been proposed (Simen, Cohen, & Holmes, 2006), and recent experimental data suggest that a significant fraction of human participants indeed set the threshold close to zo (Simen et al., in press; Bogacz, Hu, Holmes, & Cohen, in press).

Let us describe how the above implementation of SPRT can account for the observation that accuracy decreases with choice difficulty. In the general formulation of SPRT in equation 2.2, the value of threshold α alone determines the accuracy (Wald, 1947). However, in the SPRT implementation in which a choice is made when y1(t) − y2(t) exceeds thresholds z or −z, the accuracy depends on both z and the difficulty of the choice. This is because y1(t) − y2(t) is proportional to the logarithm of equation 2.2 with a proportionality constant depending on task difficulty (see appendix A). Thus, making a choice when y1(t) − y2(t) exceeds a fixed threshold z or −z corresponds to making a choice when the likelihood ratio of equation 2.2 exceeds a threshold that depends on task difficulty.

Finally, it is interesting to point out that to minimize DT for any accuracy (or to maximize the reward rate), the exact value of the likelihood ratio of equation 2.2 does not need to be known—only a quantity (y1(t) − y2(t)) that is proportional to it with an unknown proportionality constant (depending on task difficulty).

For the case of N-alternative forced-choice (NAFC) tasks (N>2), there are two tests known as the MSPRT that are asymptotically optimal; they minimize DT for a fixed probability of error when this probability is asymptotically small (Dragalin et al., 1999). Furthermore, it has been demonstrated numerically that both MSPRTs give faster DT than simpler decision strategies for larger probabilities of error (Bogacz & Gurney, 2007; McMillan and Holmes, 2006). One of the MSPRTs computes the conditional probability pi(t) = P(Hi | input(t)) and the choice is made as soon as any of pi(t) exceeds a threshold (Dragalin et al., 1999). From Bayes’ theorem, we have
2.3
For the assumptions about the distributions of xi(t) given in section 2.2, the logarithm of pi(t), which we denote li(t) = log pi(t), is equal to (Bogacz & Gurney, 2007)
2.4
where g is a constant depending on task difficulty (however, its value little affects the performance; see Bogacz & Gurney, 2007). Thus, the MSPRT can be implemented by a procedure that at each moment of time t and for each alternative i computes li(t), and the decision is made as soon as any li(t) exceeds a certain decision threshold.

Bogacz and Gurney (2007) have shown that if yi(t) are provided by cortical integrators, then equation 2.4 may be computed in a neuronal network with striking similarities to the anatomy of the basal ganglia. In particular, in their model, the second term in equation 2.4 is computed by a network involving the subthalamic nucleus and the external part of the globus pallidus. This model is supported by data showing that the experimentally observed relationships between the input and the firing rate of neurons in these two basal nuclei closely match those required to compute the second term in equation 2.4 (Hallworth, Wilson, & Bevan, 2003; Nambu & Llinas, 1994; Wilson, Weyrick, Terman, Hallworth, & Bevan, 2004). Hence, Bogacz and Gurney (2007) proposed that the MSPRT may be approximated by the cortico-basal ganglia circuit.

## 3.  Decision Problem for Spiking Inputs in a General Task

For an NAFC task, let us assume that there exist N populations of sensory neurons supporting N alternatives, and each population i (i = 1, 2, 3, …, N) consists of M (M ⩾ 1) homogeneous sensory neurons with the same mean firing rate μi. For consistency with section 2, let us denote the total number of spikes produced by the jth (j = 1, 2, 3, …, M) neuron supporting alternative i in the time interval [0, t] by Yi,j(t).

Let us further assume that Yi,j(t) come from independent Poisson processes with intensities λi. Recall that for a Poisson process with intensity λi, the probability of the total number of spikes during [0, t] has Poisson distribution with mean λit and variance λit, which is consistent with the observation that the variance of the neural firing rate is proportional to its mean. As the ideal observer of sensory neurons knows the timing of all spikes, the problem of decision making can be simply defined as finding which λi is the highest on the basis of the observed spikes of all sensory neurons. The closer λi to each other, the more difficult the decision is. Following the previous theoretical works, we assume for simplicity that the difficulty remains fixed between trials.

## 4.  Optimal Choice Between Two Alternatives

For 2AFC tasks (N = 2) with spiking inputs, the decision problem is simplified to identifying which one of the two sensory populations has the higher mean firing rate, that is, which of λ1 and λ2 is higher. Since the interspike interval of a Poisson spike train is irregular, any decision strategy based on a single spike would produce many incorrect choices. Alternatively, an integration strategy similar to that for continuous gaussian inputs may be applied that counts the number of spikes for a period of time to obtain a more accurate estimation of the sensory evidence.

The above strategy is employed in the Poisson counter model (Pike, 1966; Smith & Van Zandt, 2000). The model assumes that spikes from sensory neurons supporting the two alternatives are counted by two separate integrators:
4.1

Here Y1(t) and Y2(t) denote the total number of spikes from the two sensory populations until time t, respectively. The decision is made when one of the integrators reaches a decision threshold. Since the Poisson counter-model is based on accumulating evidence separately for each alternative, it can be considered as a discrete analog of the race model (Vickers, 1970). Hence for consistency, we will refer to the Poisson counter model as the spiking race model. However, the spiking race model is not optimal, as it does not implement the SPRT.

Appendix B.1 shows that for spiking inputs, the SPRT can be implemented by computing the difference between the total numbers of spikes generated by the two sensory populations. More specifically, the SPRT requires computing
4.2
where Y1(t) and Y2(t) are defined in equation 4.1. The procedure continues until Y(t) exceeds a positive or a negative threshold ±z (z>0), and selects the first alternative if Y(t) exceeds z, or the second alternative if Y(t) is below −z. The decision time (DT) is defined as the time latency to reach one of the thresholds.

The SPRT for spiking inputs can be performed in an incremental fashion. The decision process starts with Y(0) = 0 at t = 0. When any neuron from the first population produces a spike, Y(t) is increased by 1. Analogously, if any neuron from the second population produces a spike, Y(t) is decreased by 1. Hence the SPRT for spiking inputs can be implemented by accumulating the difference of spike counts supporting the two alternatives, which is exactly the same procedure as for the gaussian inputs. Hence if the sensory evidence is represented by the Poisson spike trains, the optimal decision strategy can still be approximated by the previously proposed neural networks (Bogacz et al., 2006; Brown et al., 2005; Mazurek et al., 2003; Shadlen & Newsome, 2001; Usher & McClelland, 2001; Wong & Wang, 2006). We hereafter refer to the model in which Y(t) is integrated until it reaches one of the two thresholds as the spiking SPRT model. In the rest of this section, we investigate the performance of this model.

The time courses of the decision variable Y(t) on correct and error trials are shown in Figure 1a. Note that Y(t) has discrete values since it represents the difference of spike counts of the two sensory populations. For λ12, the population supporting the first alternative is likely to produce more spikes than the population supporting the second alternative in a unit of time. Hence, Y(t) is more likely to be positive and reach the upper threshold z, resulting in correct choices. But the irregular spike trains from the Poisson processes could also drive Y(t) to the lower threshold −z, resulting in incorrect choices. The DT distributions are heavily skewed toward longer times, as observed in experimental data (Luce, 1986; Ratcliff, 1978; Ratcliff & Smith, 2004).

Figure 1:

Spiking SPRT model. (a). Evolutions of the decision variable and DT distributions of the spiking SPRT model. The model was simulated with the following parameters: λ1 = 50.75 Hz, λ2 = 41.25 Hz, M = 1, z = 9, and Y(0) = 0. The solid curves denote correct trials, while the broken curve shows one error trial. The two histograms illustrate the distribution of the DT from correct (top) and error (bottom) trials from a total of 10,000 trials. (b) Mean DT of the spiking SPRT model and the spiking race model against the corresponding accuracy. The models were simulated with the same parameters as in panel a. Different points on the curves correspond to different values of the decision threshold. The decision thresholds for both models were varied from z = 1 to the maximum value that gives 99% accuracy. Data are averaged over 10,000 trials.

Figure 1:

Spiking SPRT model. (a). Evolutions of the decision variable and DT distributions of the spiking SPRT model. The model was simulated with the following parameters: λ1 = 50.75 Hz, λ2 = 41.25 Hz, M = 1, z = 9, and Y(0) = 0. The solid curves denote correct trials, while the broken curve shows one error trial. The two histograms illustrate the distribution of the DT from correct (top) and error (bottom) trials from a total of 10,000 trials. (b) Mean DT of the spiking SPRT model and the spiking race model against the corresponding accuracy. The models were simulated with the same parameters as in panel a. Different points on the curves correspond to different values of the decision threshold. The decision thresholds for both models were varied from z = 1 to the maximum value that gives 99% accuracy. Data are averaged over 10,000 trials.

To evaluate the performance of the spiking SPRT model, Figure 1b compares the mean DT of the spiking SPRT model with that of the spiking race model for different levels of accuracy. The spiking SPRT model indeed achieves shorter DT than the spiking race model, as predicted by the optimality of the SPRT (Barnard, 1946; Wald & Wolfwitz, 1948).

Appendixes B.2 and B.3 derive the analytical expressions of the accuracy and associated DT of the spiking SPRT model with threshold z:
4.3
4.4
where λ+ = max(λ1, λ2), and λ = min(λ1, λ2). The accuracy of the spiking SPRT model increases as the task becomes easier (i.e., λ+ decreases) or as the threshold z becomes larger. Note that as z → ∞, the accuracy of the spiking SPRT model converges to 100%, and DT goes to infinity. Moreover, the DT of the spiking SPRT model decreases monotonically as M increases. By contrast, the accuracy is independent of the size of the sensory populations M. This counterintuitive result comes from the fact that although for larger sensory populations the decision threshold would be reached faster, the probability of reaching one particular threshold (i.e., the probability of making a correct or incorrect decision) remains unchanged. However, it is important to caution that this independence of accuracy from M holds for the simplified assumption we made that the sensory neurons are independent. For a more realistic situation of correlated firing of sensory neurons (Zohary, Shadlen, & Newsome, 1994) the accuracy is unlikely to be independent of M, but we were unable to find the analytic expressions for the accuracy and DT for correlated inputs.

Figure 2 shows the accuracy and DT of the spiking SPRT model with different levels of sensory inputs. In order to model more biologically realistic inputs, the mean firing rates of the two sensory populations are selected on the basis of neural recording data during the motion discrimination task (Figure 2 in Britten et al., 1993; and cf. section 2.1). For each motion coherence level, λ1 and λ2 are set to be the mean firing rates of the selected MT neuron when the stimulus moves toward the neuron's preferred and null directions, respectively. The simulation results are shown to correspond closely to the theoretical predictions from equations 4.3 and 4.4. As the motion coherence increases from 2.0% to 51.2%, the change of the accuracy and mean DT of the spiking SPRT model shows the same pattern as observed in the experimental data (Roitman & Shadlen, 2002). For low coherence, since λ1 and λ2 are similar, the spike counts supporting the two alternatives would also be similar. As a result, Y(t) needs a relatively long time to reach the thresholds, producing more errors and large DT. Conversely, if λ1 and λ2 are distinct from each other (i.e., large coherence), the difference of the spike counts supporting the two alternatives quickly reaches the thresholds, resulting in more accurate choices and small DT.

Figure 2:

Behavioral performance of the spiking SPRT model in the motion discrimination task. The model was simulated for 5000 trials with parameters M = 1, z = 9, and the values of λ1 and λ2 for different motion coherence levels estimated from the neural recording data in the study by Britten et al. (1993). (a) Accuracy is plotted against the motion coherence. (b) Mean DT is plotted against the motion coherence. Points with error bars show the results of simulations. The error bars show the standard error of the mean. The circle points are the theoretical predictions from equations 4.3 and 4.4.

Figure 2:

Behavioral performance of the spiking SPRT model in the motion discrimination task. The model was simulated for 5000 trials with parameters M = 1, z = 9, and the values of λ1 and λ2 for different motion coherence levels estimated from the neural recording data in the study by Britten et al. (1993). (a) Accuracy is plotted against the motion coherence. (b) Mean DT is plotted against the motion coherence. Points with error bars show the results of simulations. The error bars show the standard error of the mean. The circle points are the theoretical predictions from equations 4.3 and 4.4.

## 5.  Optimal Choice Between Multiple Alternatives

### 5.1.  Optimal Decision Procedure for a General Task.

In this section we consider an NAFC task (N>2) and assume that the population supporting the correct alternative has mean firing rate λ+, while all other populations supporting the N − 1 incorrect alternatives have the same mean firing rate λ, where λ+. An analogous assumption has been made in previous models of NAFC task with gaussian inputs (Bogacz & Gurney, 2007; Bogacz et al., 2007; McMillan & Holmes, 2006). For consistency, we hereafter set λ1 = λ+ (i.e., λi = λ for i ≠ 1) and assume the first alternative is the correct choice.

Recall that for NAFC tasks, an asymptotically optimal decision procedure is provided by the MSPRT. In appendix C.1 we show that for the above assumption about λi, the MSPRT can be implemented by the same procedure as for gaussian inputs. Specifically, the MSPRT requires N neural integrators that count the total number of spikes from neurons supporting each alternative. Let us denote the activity of the integrator i at time t by yi(t),
5.1
where Yi,j(t) has the same definition as in equation 4.1. Then the MSPRT requires computing
5.2
and the decision should be made as soon as any li(t) exceeds a decision threshold z. Here the gain parameter g is a constant scaling factor. To faithfully implement the MSPRT, the gain needs to be set to the following optimal value:
5.3
Note that equation 5.2 describes the same function of the activity of cortical integrators as equation 2.4 that has been proposed to be approximated in the basal ganglia (Bogacz & Gurney, 2007). Therefore, the MSPRT can also be implemented in the model of the cortico-basal ganglia circuit (Bogacz & Gurney, 2007) when the sensory evidence is carried by discrete spikes. We hereafter refer to the model described by equations 5.1 and 5.2 as the spiking MSPRT model. In the rest of this section we investigate the performance of this model and the effect of the gain parameter.

Figure 3a compares the DT of the spiking race model, the leaky competing accumulator (LCA) model proposed by Usher and McClelland (2001), and the spiking MSPRT model, for a fixed accuracy. As the number of alternatives increases, all the models produce longer DT. For N = 2, the LCA model and the spiking MSPRT model achieve similar DT. This was not unexpected because for gaussian inputs, both the LCA model and the MSPRT model can be reduced to the SPRT for N = 2 (Bogacz et al., 2006). Hence, the simulation result supports that this relationship still holds when sensory evidence is represented by spike trains. For N>2, the spiking MSPRT model achieves the shortest DT among all the models.

Figure 3:

Mean DT of decision-making models with spiking inputs in the general NAFC task. (a) Mean DT of the spiking race model, the LCA model, and the spiking MSPRT model for different numbers of alternatives. For all the models, the mean firing rates of the N sensory populations are λ1 = 50.75 Hz, λ2 = ⋅ ⋅ ⋅ = λN = 41.25 Hz (values estimated from Britten et al., 1993), and the population size M = 3. The spiking MSPRT model was simulated with the optimal gain g* = 0.21. The LCA model was simulated with inhibition and decay set to 10. For each value of N, the thresholds of the models were found numerically that gave 90% ± 0.2% accuracy. The search for the threshold was repeated 20 times, and the DT from the 20 iterations was averaged to construct the data points. The standard error bars are smaller than the data points. (b) Effect of the value of the gain parameter of the spiking MSPRT model. The spiking MSPRT model was simulated with the following parameters: N = 10, λ1 = 56.49 Hz, and λ2 = ⋅ ⋅ ⋅ = λN = 37.50 Hz (values estimated from Britten et al., 1993). The mean DT is plotted against the ratio of the gain value g and the optimal gain g* with M= 1. The dashed line shows the mean DT produced by the LCA model with the same inputs and inhibition and decay set to 10.

Figure 3:

Mean DT of decision-making models with spiking inputs in the general NAFC task. (a) Mean DT of the spiking race model, the LCA model, and the spiking MSPRT model for different numbers of alternatives. For all the models, the mean firing rates of the N sensory populations are λ1 = 50.75 Hz, λ2 = ⋅ ⋅ ⋅ = λN = 41.25 Hz (values estimated from Britten et al., 1993), and the population size M = 3. The spiking MSPRT model was simulated with the optimal gain g* = 0.21. The LCA model was simulated with inhibition and decay set to 10. For each value of N, the thresholds of the models were found numerically that gave 90% ± 0.2% accuracy. The search for the threshold was repeated 20 times, and the DT from the 20 iterations was averaged to construct the data points. The standard error bars are smaller than the data points. (b) Effect of the value of the gain parameter of the spiking MSPRT model. The spiking MSPRT model was simulated with the following parameters: N = 10, λ1 = 56.49 Hz, and λ2 = ⋅ ⋅ ⋅ = λN = 37.50 Hz (values estimated from Britten et al., 1993). The mean DT is plotted against the ratio of the gain value g and the optimal gain g* with M= 1. The dashed line shows the mean DT produced by the LCA model with the same inputs and inhibition and decay set to 10.

The spiking MSPRT model includes the ratio λ+ in the gain parameter g, which suggests that it requires the task difficulty (expressed by parameters λ+ and λ) to be known a priori. There are, however, two solutions to this problem. First, Bogacz and Gurney (2007) have shown that for gaussian inputs, to minimize DT, the gain parameter g has to be set to either the optimal value g = g* or to a higher value. When g has lower values (g < g*), the performance of their model decreases to that of the LCA model. Figure 3b shows that the spiking MSPRT model follows the same pattern. If g>g*, the model achieves similar DT. If g < g*, the DT increases as g decreases. For relatively small g (g/g* < 0.02 in Figure 3b), the DT of the spiking MSPRT model converges to the DT of the LCA model with the same level of inputs. Hence, if g is set above the value determined by λ+ and λ, the performance of the model does not deteriorate. Thus, the optimal performance may achieved by setting g sufficiently high, even if the precise value of λ+ and λ is not known.

The second alternative solution is to follow a very interesting approach proposed recently by Beck et al. (2008): a model of decision making that does not require the knowledge of task difficulty. They achieved it by making very clever assumptions about information coding by sensory neurons. In appendix D we show that the assumption of Beck et al. corresponds in the spiking MSPRT model to assuming that the ratio λ+ is always fixed for all levels of difficulty (this would imply that on difficult trials, both λ+ and λ are lower than on easy trials, but their ratio always remains constant). If the ratio λ+ is always constant, it is reasonable to assume that its value can be well estimated by the neural decision circuit. In the spiking MSPRT model, only the ratio λ+ (rather than λ+ and λ individually) influences the gain parameter g (see equation 5.3). Hence if we follow the approach of Beck et al. and assume that the ratio λ+ is known, then the parameter g can always be set appropriately (no matter what the individual values of λ+ and λ are).

### 5.2.  Optimal Decision Procedure for Spiking Inputs from a Tuning Curve.

The previous section assumed that neurons supporting all the incorrect alternatives have the same firing rate. However, in many situations, sensory neurons have different firing rates for different stimuli. For example, in the motion discrimination task described in section 2, the MT neurons selective for rightward motion would have the highest firing rates when a stimulus with coherent rightward motion is presented, but would also have increased activity for stimuli with motion in similar directions, for example, toward the upper-right corner of the screen (Britten et al., 1993; Dubner & Zeki, 1971; Maunsell & Van Essen, 1983a, 1983b; Zeki, 1980). More specifically, the mean firing rate of sensory neurons varies as a function of the stimulus, and this function is called the tuning curve. Figure 4a illustrates the tuning curve of one sensory neuron selective for rightward motion. The neuron has the maximum firing rate λ+ when the coherent motion direction is the neuron's preferred direction (equal to 0 degrees for this neuron); otherwise, the firing rates vary depending on the difference between its preferred direction and the stimulus direction.

Figure 4:

The tuned MSPRT model. (a) Tuning curve describing the mean firing rates of one sensory neuron as a function of the coherent motion direction. The tuning curves were generated from equation 5.4 with parameters: λ+= 68 Hz, λ= 30 Hz (values estimated from Britten et al., 1993), σ = 46.5°, (with values estimated from Snowden, Treue, & Andersen, 1992). The thick line marked with λθ(0) denotes the firing rate of the neurons when coherent direction is its preferred direction θ0 = 0°. (b) Mean DT of the spiking race model, the LCA model, and the tuned MSPRT model is plotted against the number of alternatives. All three models are subjected to the same spiking inputs from tuning curves shown in a, and M = 2. The LCA model was simulated with inhibition and decay set to 10. The mean DT for 90% ± 0.2% accuracy was obtained by the same method applied in Figure 3a. The standard error bars are smaller than the data points.

Figure 4:

The tuned MSPRT model. (a) Tuning curve describing the mean firing rates of one sensory neuron as a function of the coherent motion direction. The tuning curves were generated from equation 5.4 with parameters: λ+= 68 Hz, λ= 30 Hz (values estimated from Britten et al., 1993), σ = 46.5°, (with values estimated from Snowden, Treue, & Andersen, 1992). The thick line marked with λθ(0) denotes the firing rate of the neurons when coherent direction is its preferred direction θ0 = 0°. (b) Mean DT of the spiking race model, the LCA model, and the tuned MSPRT model is plotted against the number of alternatives. All three models are subjected to the same spiking inputs from tuning curves shown in a, and M = 2. The LCA model was simulated with inhibition and decay set to 10. The mean DT for 90% ± 0.2% accuracy was obtained by the same method applied in Figure 3a. The standard error bars are smaller than the data points.

This section extends the analysis of section 5.1 to a more realistic scenario in which the mean firing rates of sensory neurons come from a tuning curve. Let us consider a multiple alternative version of the motion discrimination task in which the dots may move in one of N directions equally distributed around 360 degrees (e.g., Churchland, Kiani, & Shadlen, 2008) and denote the direction of the coherent motion by θ0 (indicating the correct alternative). For a sensory neuron in area MT with preferred direction θ, its tuning curve can be approximately described by a gaussian function (Snowden, Treue, & Andersen, 1992):
5.4
where λθ0) is the mean firing rate of the neuron specified by the tuning function Φ(θ, θ0). λ+ and λ denote the firing rates of the neuron when the coherent direction is the neuron's preferred or null direction, and σ describes the width of the tuning curve. d(θ, θ0) denotes the distance between θ and θ0 and is defined as
5.5
The preferred direction of the neurons supporting alternative i is θi = 360°(i − 1)/N, and hence their mean firing rate for stimulus θ0 is
5.6
For the NAFC task described above, appendix C.2 shows that the MSPRT can be implemented by computing
5.7
5.8
Here Yk(t) is the total number of spikes from the kth population up to time t, which is given by equation 5.1. To implement the MSPRT, alternative i should be selected as soon as any li(t) exceeds a decision threshold.

When the spike trains are generated from tuning curves, the neural integrator selective for alternative i needs to compute equation 5.8, which involves the sum of the spike counts from all the N sensory populations and is weighted by the logarithm of their firing rate if alternative i were correct. Such a computation can be performed if the neural integrators selective for a particular direction receive the strongest input from sensory neurons selective for this direction, and also smaller inputs from sensory neurons preferring similar directions. Exactly the same weighting as in equation 5.8 was proposed by Jazayeri and Movshon (2006), who showed that it also allows optimal representation of sensory information by neural integrators. Note that equation 5.7 describes the same function of the activity of cortical integrators as equation 2.4 that has been proposed to be evaluated in the basal ganglia (Bogacz & Gurney, 2007). Hence a network involving cortical integrators connected with sensory neurons as described above and the model of the basal ganglia (Bogacz & Gurney, 2007) can implement the MSPRT when the spiking inputs are generated from tuning curves. We refer to the model described by equations 5.7 and 5.8 as the tuned MSPRT model.

To illustrate the optimality of the tuned MSPRT model, Figure 4b compares DT (for a fixed accuracy) of the tuned MSPRT model and simpler strategies in which the integrators accumulate the input from only one corresponding population of sensory neurons: the spiking race model and the LCA model. The spiking inputs of all the models are generated from the tuning curve defined in equation 5.4. All three models produce longer DT as N increases, as in Figure 3a. However, Figure 4b shows that for inputs generated from the tuning curve, the spiking race and the LCA models produce much longer DT for N ⩾ 8. This is due to the fact that for large N, the difference between the populations’ preferred directions θi are small (as the N directions are equally distributed around 360 degrees) so that the firing rates of the adjacent populations are similar, which makes it more difficult to discriminate the population with the highest mean firing rate (i.e., the correct alternative) and results in long DT. Importantly, for large N, the tuned MSPRT model achieves DTs several times shorter than those of the simpler models.

In the tuned MSPRT model, λ+ and λ define the range of firing rates of the MT neurons for a given task difficulty. For example, in a difficult task, λ+ and λ will be close to each other, and according to equation 5.4, the MT neurons will have similar firing rates for different directions of motion. The weights of connections between MT and integrator neurons in equation 5.8 depend on λ+ and λ via equations 5.4 and 5.6, which suggests that the model requires the task difficulty to be known a priori. However, as for the spiking MSPRT model, following the method of Beck et al. (2008), appendix D shows that the task difficulty does not need to be known in the tuned MSPRT model when the ratio λ+ is assumed to be constant.

## 6.  Can We Approximate Sensory Input by Gaussian Distribution?

Many experimental studies have reported that the variance in firing rate of sensory neurons is proportional to its mean (Britten et al., 1993; Shadlen & Newsome, 1998; Tolhurst, Movshon, & Dean, 1983). Consequently, several previous studies have used gaussian inputs with a similar variance-mean relationship in simulations of decision processes (Bogacz et al., 2007; Diederich & Busemeyer, 2006; Niwa & Ditterich, 2008), but we show that such types of inputs may lead to spurious conclusions from the simulations.

Appendix A shows that for gaussian inputs with variance proportional to the mean, the SPRT cannot be implemented by linear integration of evidence. Instead the SPRT is implemented for such inputs by a procedure in which the difference of squares of sensory evidence is integrated until it exceeds a positive or a negative threshold. This procedure is unusual from a biological computation point of view and has not yet been supported by either physiological or behavioral data.

To illustrate the problems that may occur in the simulations using gaussian inputs with variance proportional to the mean, we simulate two decision procedures for 2AFC tasks with such inputs: one involving the integration of the difference between inputs and a second involving the integration of the difference between the squares of inputs. In the simulation, the inputs were accumulated in integrators in time steps of δ. Hence, at every time step, the evidence supporting the first alternative was generated from a gaussian distribution with mean and variance μ1δ, while the evidence supporting the second alternative was generated from a gaussian distribution with mean and variance μ2δ.

Figure 5 illustrates the mean DT at a certain accuracy for the two procedures with different δ. For δ above 0.2 s, both procedures perform equally, but note that this value of δ is close to DT, so on the majority of simulated trials, the threshold is reached after a single integration step. When the decision is made on the basis of a single sample from each sensory input, the squaring of the sensory inputs does not change the outcome of decision, and therefore the two procedures perform equally. By contrast, for smaller δ, the integration of squares is superior. Furthermore, as δ decreases, DT of the procedure involving the integration of squares converges to 0, which is not biologically realistic because it implies that a decision with a high accuracy may be made after an arbitrary short time on the basis of noisy inputs.

Figure 5:

The mean DT of decision procedures with gaussian inputs with variance proportional to the mean. The models were simulated with parameter μ1 = 55 Hz, μ2 = 37 Hz, and the time step δ varies from 0.0001 s to 0.5 s. The thresholds of the models were found numerically that gives 90% ± 0.2% accuracy.

Figure 5:

The mean DT of decision procedures with gaussian inputs with variance proportional to the mean. The models were simulated with parameter μ1 = 55 Hz, μ2 = 37 Hz, and the time step δ varies from 0.0001 s to 0.5 s. The thresholds of the models were found numerically that gives 90% ± 0.2% accuracy.

Let us provide an intuition for this surprising numerical phenomenon. When the variance is proportional to the mean, each sample carries a significant amount of information even if δ decreases to 0. For example, if the evidence supporting the first alternative is particularly high or low in a given step, it indicates that the first alternative has higher variance, and hence is the correct one. This information is extracted by the procedure integrating squares. Since DT is equal to the product of the number of samples and δ and each sample carries significant information, the DT required to gain a desired accuracy decreases to 0 as δ decreases to 0.

If one simulates a nonlinear decision network sharing properties with the model integrating squares of inputs (e.g., in which at each step, the inputs with the higher values have larger influence over the final choice), then one may underestimate DT (for fixed accuracy) if one uses the gaussian inputs with variance proportional to the mean.1

For spiking inputs or for gaussian inputs with constant variance across alternatives, DT (for fixed accuracy) little depends on δ. Paradoxically, for the gaussian inputs with variance proportional to the mean, which seem to be closer to biology than gaussian inputs with constant variance, the DT may strongly depend on δ, which may produce unrealistic results of simulations.

For Poisson spike trains, the implementation of SPRT is to integrate the difference between the spike counts. On the other hand, for gaussian inputs with variance proportional to the mean, the implementation of SPRT would be to integrate the differences between the squares of sensory evidence. How can the two distinct procedures both be optimal, given the fact that a Poisson distribution can be approximated by a gaussian distribution for sufficiently large values of the mean? To investigate this question, Figure 6 shows histograms of three types of sensory evidence from one time step δ = 0.01 s: the difference of two Poisson spike counts, the difference of two gaussian inputs, and the difference of the squares of two gaussian inputs. For large population size (M = 100 in Figure 6), the distribution of Poisson inputs approaches a gaussian distribution as expected. In addition, for large M, the difference of the squares of gaussian inputs has a similar (but scaled) distribution to the difference of the gaussian inputs. Therefore, the two SPRT procedures for the gaussian inputs and the Poisson inputs converge for large M.

Figure 6:

The histograms of (a) the difference of Poisson inputs, (b) the difference of gaussian inputs, and (c) the difference of the squares of gaussian inputs. The models were simulated with the time step δ = 0.01 s and M = 100. The means of the two alternatives are 55 Hz and 45 Hz, and the variances of the two alternatives are equal to the means. For each model, the histogram was obtained from 1 million trials.

Figure 6:

The histograms of (a) the difference of Poisson inputs, (b) the difference of gaussian inputs, and (c) the difference of the squares of gaussian inputs. The models were simulated with the time step δ = 0.01 s and M = 100. The means of the two alternatives are 55 Hz and 45 Hz, and the variances of the two alternatives are equal to the means. For each model, the histogram was obtained from 1 million trials.

## 7.  Conclusion

### 7.1.  Summary.

This study has considered the decision problem under an assumption that the sensory evidence is represented by discrete Poisson spike trains. We have shown that for two alternatives, the SPRT can be implemented by a simple integration strategy as for gaussian inputs. For multiple alternatives, we have shown that the MSPRT can also be implemented by the same network involving cortical integrators and the basal ganglia model, as for gaussian inputs. Finally, if the firing rates of sensory neurons come from tuning curves, we have shown that each neural integrator in the cortex needs to receive the sum of the spike trains from all sensory neurons, weighted by the similarity between the preferred directions of the integrator and given sensory populations.

### 7.2.  Variability of Difficulty Between Trials.

As mentioned in section 2.2, all previous work on neural implementations of the SPRT and the MSPRT for simplicity assumed that the task difficulty remains constant between trials, and the same simplifying assumption is made throughout this article. In many experimental studies of decision making, the choice difficulty was indeed constant between trials (e.g., Forstmann et al., 2008; Simen et al., in press; Bogacz, Hu et al., in press). However, in many other experimental paradigms, the choice difficulty varied between trials (e.g., Roitman & Shadlen, 2002).

It is interesting to point out that although we showed in sections 5.1 and 5.2 that the task difficulty does not need to be known for the models described in this article to work, the models do not implement the SPRT or the MSPRT for tasks with difficulty varying across trials. This is because the statistical hypotheses about the probability distributions of sensory inputs in each of the models (see equations B.1and C.1 assume that λ+ and λ are constant. To derive the procedures maximizing the reward rate for tasks with difficulty varying across trials, one would have to define more general hypotheses. This can be one of the directions of future work.

### 7.3.  Future Work.

One extension of this study is to consider more realistic spike train statistics. In this article, we assumed that the spike trains of sensory neurons are generated from the Poisson processes, in which the mean spike count is equal to its variance. However, data from in vivo experiments suggested that the variance-mean ratio of the spike trains from cortical neurons is slightly larger than 1 (1–1.5) (Shadlen & Newsome, 1998). This larger variance relative to the mean may arise from synaptic background activity from other cortical areas (Softky & Koch, 1993; Wolfart, Debay, Le Masson, Destexhe, & Bal, 2005), or it may also be caused by the intrinsic irregularity of neurons. One may consider a more general class of stochastic processes (e.g., the renewal processes), whose variance-mean ratio is a constant (Smith, 1959). Another extension would be to consider the correlation between firing rate of sensory neurons (Zohary et al., 1994).

## Appendix A:  SPRT for Gaussian Inputs

This appendix derives the procedures implementing the SPRT for the assumption that the evidence xi(t) supporting alternative i comes from gaussian distribution with mean μi and standard deviation σi, which we denote . Initially we loosen the assumptions of section 2.2, as we do not assume that σ1 = σ2. After deriving the procedure for these generalized assumptions, we will consider two cases: (1) σ1 = σ2, that is, both integrators receive equally noisy evidence, as in section 2.2 and (2) σ2i ∼ μi, the variance in firing rate of sensory neurons is proportional to its mean, as reported by Shadlen and Newsome (1998).

Let us denote the mean and the standard deviation of the evidence supporting the correct alternative by μ+ and σ+, and by μ and σ for the incorrect alternative. As mentioned in section 2.3, the process of decision making can be defined as distinguishing between two hypotheses H1 and H2, corresponding to the first and second alternatives being correct:
A.1
A.2
We now derive the SPRT (see equation 2.2) for the hypotheses defined above. By taking the logarithm of equation 2.2, we infer that the sampling should continue as long as the difference between the logs of the conditional probabilities remains between the (new) decision thresholds:
A.3
Let us calculate the log of the conditional probability of input(t) given that the first alternative is correct:
A.4
In the above equation, fμ,σ denotes gaussian probability density function with mean μ and standard deviation σ, hence:
A.5
Analogously:
A.6
The difference of the logs of the probabilities becomes
A.7
Let us now consider the two sets of assumptions stated in the beginning of the appendix. When we assume (1) that both integrators receive equally noisy evidence, σ+ = σ = σ, then the first term in the square bracket in the last line of the above equation cancels and it simplifies to:
A.8
Thus, for assumption 1, the SPRT is implemented by a simple procedure of integrating the difference of sensory evidence until it exceeds a positive or negative threshold (Gold & Shadlen, 2001, 2002; Laming, 1968).
Surprisingly, when we assume (2) that the variance in firing rate of sensory neurons is proportional to its mean, σ2i ∼ μi, then the second term in the square brackets in the last line of equation A.7cancels, and it simplifies to
A.9
Thus for assumption 2, the SPRT is implemented by the procedure in which the difference of squares of sensory evidence is integrated until it exceeds a positive or negative threshold.

## Appendix B:  The Spiking SPRT Model

This appendix describes the optimal decision procedure for 2AFC tasks when the sensory evidence is carried by the Poisson spike trains. Appendix B.1 derives the procedures implementing the SPRT. Appendixes B.2 and B.3 calculate the accuracy and mean DT of the spiking SPRT model. The calculation in appendixes B.2 and B.3 is analogous to the derivation for gaussian inputs by Shadlen, Hanks, Churchland, Kiani, and Yang (2006).

### B.1>  The SPRT for Spiking Inputs.

For a 2AFC task described in section 3.1, let us assume that the spike trains of the neurons supporting the two alternatives are generated from independent Poisson processes Y1, j(t) and Y2, j(t), with intensities λ1>0 and λ2>0, respectively. Let two random variables Y1(t) and Y2(t) be the total number of incoming spikes from the M neurons in the two populations up to time t. According to the property of the Poisson process, they follow the Poisson distribution with associated parameters λ1Mt and λ2Mt.

Let us first denote the higher of the two mean firing rates of sensory neurons by λ+ = max(λ1, λ2), and denote the firing rate of the neurons in the other population by λ = min(λ1, λ2), and hence λ+>0. The decision problem can be defined as distinguishing between two hypotheses H1 and H2:
B.1
Denote the entirety of sensory evidence available up to time t by input(t) = {Y1(t), Y2(t)}. Follow the same calculation as in appendix A, the conditional probability of input(t) given the H1 being correct is given by (cf. equation A.4):
B.2
where fλ denotes the Poisson probability density function with mean and variance λ:
B.3
Substituting equation B.3in B.2 we obtain:
B.4
where C is a constant independent to the hypothesis. Analogously:
B.5
Hence the difference of the logs of the probabilities becomes as
B.6
Note that the first term of the right side of equation B.6does not depend on the hypothesis. The log ratio and the threshold in equation A.3 can be transformed to
B.7
Hence the SPRT with spiking inputs can be implemented as
B.8

### B.2.  The Probability of a Correct Decision.

We divide the decision time into discrete intervals with tiny length δ>0. Let Xk1 and Xk2 be the total number of incoming spikes from the M neurons in the two populations during the interval [(k − 1)δ, kδ]. Xk1 and Xk2 follow the Poisson distributions with associated parameters λ1Mδ and λ2Mδ. Let us denote the difference between Xk1 and Xk2 by Xk (Xk = Xk1Xk2), and denote the difference between the total spike counts between the two sensory populations during [0, kδ] by Yk. Note that Yk is a discrete version of the decision variable of the SPRT procedure (see equation B.8 and can be calculated by taking the sum of Xk at each time interval.

To calculate the accuracy of the spiking SPRT model, it is necessary to obtain the moment generating function (MGF) of Xk. The MGF of a random variable υ is defined by (Kenney & Keeping, 1951):
B.9
where E denotes the expectation and θ is the argument of the MGF M(θ). For Xk1 and Xk2 which follow the Poisson distribution, we have (Papoulis, 1986):
B.10
B.11
Since Xk is a sequence of independent and identically distributed random variables, we can drop the superscript, and the MGF of Xk is
B.12
We assume after n time steps the decision variable {Yk} exactly hits one of the thresholds ±z (i.e., no overshoot), and denote the terminal state by Yn. We also assume {Yk} hits z with probability P1 (i.e., the hypothesis H1 is selected), while {Yk} hits −z with probability P2 = 1 − P1. Since Yn can be equal to only one of the two threshold values, the mean of Yn is
B.13
Yn also has its MGF:
B.14
For a sequential procedure that computes the decision variable {Yk} by taking the sum of the random variables Xk (i.e., the SPRT), it must satisfy Wald's fundamental identity (Wald, 1947):
B.15
Equation B.15must be satisfied for all values of θ. We seek θ that satisfies MX(θ) = 1 (because for this value , and hence we will be able to solve equation B.14for P1). Hence from equation B.12 we need to solve
B.16
and the solutions are
B.17
As there exists θ = θ1 that makes MX(θ) = 1, substitute θ1 from equation B.17in B.15 and we have:
B.18
Relating equations B.14and B.18 we have a function of P1:
B.19
Solving equation B.19 we obtain the solution for P1:
B.20
Recall the initial hypothesis: if λ12, then λ1 = λ+, λ2 = λ, and the probability of making a correct decision (i.e., select H1) is
B.21
Similarly, if λ1 < λ2, then the correct alternative is H2, and λ1 = λ, λ2 = λ+. The probability of making a correct decision is
B.22
Therefore the accuracy of the spiking SPRT model is independent of which alternative is actually correct and is given by equation B.21

### B.3.  The Expected Decision Time.

The mean of DT is proportional to the expectation of the number of samples n when {Yk} reaches the terminal state Yn = ±z. To calculate E[n], we again start from Wald's identity:
B.23
We first differentiate equation B.23with respect to θ, and obtain
B.24
Let θ = θ0 = 0; then MX(θ) = 1 and MX(θ) = E[X] = λ1Mδ − λ2Mδ. Substituting them in equation B.24 we have
B.25
and hence:
B.26
When the decision process terminates at step n, Yk = Yn. Substituting equation B.13 into B.26 we have
B.27
Then substituting the expression for P1 from equation B.20in B.27 we have
B.28
If λ12, H1 is true and λ1 = λ+, λ2 = λ. The mean DT is
B.29
Similarly, if λ1 < λ2, λ1 = λ, and λ2 = λ+, then
B.30
Hence the mean number of observation n is given by equation B.29 regardless of which alternative is actually correct, and the mean DT is
B.31

## Appendix C:  The Spiking and Tuned MSPRT Models

This appendix derives the asymptotically optimal decision procedures (the MSPRT) for NAFC tasks when the sensory evidence is carried by the Poisson spike trains. Appendix C.1 considers a general case that the populations supporting all incorrect alternatives have the same mean firing rates. In appendix C.2 we assume that the mean firing rates of sensory populations are generated from tuning curves. The calculation is analogous to the derivation of the MSPRT for gaussian inputs by Bogacz and Gurney (2007).

### C.1.  The MSPRT for Spiking Inputs.

For the general NAFC task described in section 5.1, the decision task is to identify which sensory population has the higher mean firing rate λ+ (i.e., the true alternative), given that all other populations have the same mean firing rate λ+). The process of decision making can be defined as distinguishing between N hypotheses Hi (1 ⩽ iN) corresponding to the ith alternative being correct:
C.1
The asymptotically optimal decision procedure for such a decision task is given by the MSPRT (Dragalin et al., 1999). The MSPRT is to compute the conditional probability pi(t) = P(Hi | input(t)) and selects the alternative i as soon as pi(t) exceeds a decision threshold (Dragalin et al., 1999, and cf. Bogacz & Gurney, 2007), as described in equation 2.3. If there is no prior knowledge about the decision task, all hypotheses have the same prior probability P(Hi) = 1/N. Then equation 2.3 can be simplified to (Bogacz & Gurney, 2007)
C.2
and the logarithm of pi(t) is
C.3
The conditional probability of input(t) given Hi being correct is given by
C.4
where fλ denotes the Poisson probability density function with mean and variance λ, given in equation B.3 Substituting equation B.3 in C.4 we obtain
C.5
The last two terms on the right-hand side of equation C.5do not depend on i, and hence denoting their sum by C, we have
C.6
where
C.7
Substituting equation C.7in C.3 we obtain the procedure of the spiking MSPRT model:
C.8

### C.2.  The MSPRT for Spiking Inputs from Tuning Curve.

In this appendix we derive the procedure implementing the MSPRT when the firing rates of sensory neurons are generated from tuning curves. For the multiple alternative task described in section 5.2, the coherent direction of the stimulus θ0 is one of the N directions that are equally distributed around 360 degrees. Hence, the N hypotheses Hi (1 ⩽ iN) of the decision task are
C.9
As described in section 5.2, for a given coherent direction θ0 = θi, the neurons selective for the direction θl (1 ⩽ lN) have the firing rate , which is specified by the tuning curve defined in equation 5.4. Hence the hypotheses can be represented as
C.10
and the decision task is to distinguish between the N hypothesis on the basis of the Poisson spike trains. Let us denote the entirety of sensory evidence available up to time t by input(t) = {Yl(t): 1 ⩽ lN}, where Yl has the same deftinition as in appendix B.1. Similar to the calculation in appendix C.1, the MSPRT requires us to compute the logarithm of the conditional probability of input(t) given Hi being correct (cf. Jazayeri & Movshon, 2006):
C.11
Note that the last three terms on the right-hand side of equation C.11do not depend on the hypotheses, and hence denoting their sum by C, we have:
C.12
Following the same calculation as in appendix C.1, the procedure implementing the MSPRT with spiking inputs from tuning curves is to compute
C.13
and make a choice as soon as any log pi(t) exceeds the decision threshold.

## Appendix D:  Optimal Decision Making Without Knowledge of Task Difficulty

This appendix considers the problem of whether the task difficulty has to be known a priori in the spiking and tuned MSPRT models. Following the notation in appendix B.2, let Xki be the numbers of incoming spikes from one of M neurons in the ith population during the interval [(k − 1)δ, kδ], and let vector Xk (Xk = {Xk1, Xk2, …, XkN}) be the total number of spikes from all sensory neurons during the kth interval.

Beck et al. (2008) have shown that a decision process involving linear integration of sensory evidence will be independent of task difficulty if the conditional probability of the firing of sensory neurons given alternative i being correct can be decomposed into the following product (see equation 4 in Beck et al., 2008),
D.1
where c is a collection of nuisance parameters (e.g., motion coherence, population size). Φ is a function that does not depend on alternative i being considered, and h is a function that depends on i, but not on parameters related to task difficulty (i.e., λ and λ+).
In the spiking MSPRT model, the probability of observing Xk in the kth interval given is equal to
D.2
We now show under what assumptions equation D.2can be decomposed according to equation D.1 Algebraic manipulations on D.2 give
D.3
Note that the terms in the braces do not depend on i, so they can form function Φ in equation D.1. The last term of equation D.3can be decomposed as in equation D.1, only if one assumes that λ+ is constant (denoted by η) and does not vary with task difficulty. Then the function h can be defined as
D.4
In the tuned MSPRT model, the probability of observing Xk in the kth interval is equal to
D.5
According to equations 5.4 and 5.5, can be written as
D.6
where F(i,j) is a function of i and j describing the shape of the tuning curve, independently from its maximum and minimum values (that depend on λ+ and λ). Let us transform the right term of equation D.5
D.7
Substituting equation D.7in D.5 we have
D.8
If we keep the assumption that the tuning curves of sensory neurons have identical shape (similar to the assumption made by Ma, Beck, Latham, & Pouget, 2006), is a constant independent of i, and then the terms in the braces in equation D.8can form function Φ in equation D.2 Similarly to the spiking MSPRT model in equation D.3 when λ+ is a constant, equation D.8can be decomposed according to D.1and hence the tuned MSPRT model does not require the task difficulty to be known a priori.

## Acknowledgments

This work was supported by EPSRC grant EP/C514416/1. We thank Angela Onslow for reading an earlier version of the manuscript and very useful comments.

## Notes

1

Indeed, in our earlier study (Bogacz et al., 2007) we made such a simulation, but recently we repeated it for spiking inputs (Bogacz, Usher, Zhang, & McClelland, in press) and obtained a qualitatively different pattern of results.

## References

References
Barnard
,
G. A.
(
1946
).
Sequential tests in industrial statistics
.
Journal of the Royal Statistical Society
,
8
(
1
),
1
26
.
Baum
,
C.
, &
Veeravalli
,
V.
(
1994
).
A sequential procedure for multihypothesis testing
.
IEEE Transactions on Information Theory
,
40
,
1994
2007
.
Beck
,
J.
,
Ma
,
W. J.
,
Kiani
,
R.
,
Hanks
,
T.
,
Churchland
,
A. K.
,
Roitman
,
J.
, et al. (
2008
).
Probabilistic population codes for Bayesian decision making
.
Neuron
,
60
,
1142
1152
.
Bogacz
,
R.
(
2009
).
Optimal decision-making theories
. In
J.-C. Dreher & L. Tremblay
(Eds.),
Handbook of reward and decision making
(pp.
375
397
).
Orlando, FL
:
.
Bogacz
,
R.
,
Brown
,
E.
,
Moehlis
,
J.
,
Hu
,
P.
,
Holmes
,
P.
, &
Cohen
,
J.
(
2006
).
The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced choice
.
Psychological Review
,
113
,
700
765
.
Bogacz
,
R.
, &
Gurney
,
K.
(
2007
).
The basal ganglia and cortex implement optimal decision making between alternative actions
.
Neural Computation
,
19
,
442
477
.
Bogacz
,
R.
,
Hu
,
P.
,
Holmes
,
P.
, &
Cohen
,
J.
(in press).
Do humans produce the speed-accuracy tradeoff that maximizes reward rate?
Quarterly Journal of Experimental Psychology
.
Bogacz
,
R.
,
Usher
,
M.
,
Zhang
,
J.
, &
McClelland
,
J.
(
2007
).
Extending a biologically inspired model of choice: Multi-alternatives, nonlinearity and value-based multidimensional choice
.
Philosophical Transactions of the Royal Society, Series B
,
362
,
1655
1670
.
Bogacz
,
R.
,
Usher
,
M.
,
Zhang
,
J.
, &
McClelland
,
J.
(in press).
Extending a biologically inspired model of choice: Multi-alternatives, nonlinearity and value-based multidimensional choice
. In
A. Seth, T. Prescott, & J. Bryson
(Eds.),
Modelling natural action selection
.
Cambridge
:
Cambridge University Press
.
Britten
,
K.
,
,
M.
,
Newsome
,
W.
, &
Movshon
,
J.
(
1992
).
The analysis of visual motion: A comparison of neuronal and psychophysical performance
.
Journal of Neuroscience
,
12
,
4745
4765
.
Britten
,
K.
,
,
M.
,
Newsome
,
W.
, &
Movshon
,
J.
(
1993
).
Responses of neurons in macaque MT to stochastic motion signals
.
Visual Neuroscience
,
10
,
1157
1169
.
Brown
,
E.
,
Gao
,
J.
,
Holmes
,
P.
,
Bogacz
,
R.
,
Gilzenrat
,
M.
, &
Cohen
,
J. D.
(
2005
).
Simple neural networks that optimize decisions
.
International Journal of Bifurcation and Chaos
,
15
(
3
),
803
826
.
Churchland
,
A. K.
,
Kiani
,
R.
, &
,
M. N.
(
2008
).
Decision making with multiple alternatives
.
Nature Neuroscience
,
11
,
693
702
.
Diederich
,
A.
, &
Busemeyer
,
J.
(
2006
).
Modeling the effects of payoff on response bias in a perceptual discrimination task: Bound-change, drift-rate-change, or two-stage-processing hypothesis
.
Perception and Psychophysics
,
68
(
2
),
194
207
.
Dragalin
,
V. P.
,
Tartakovsky
,
A.
, &
Veeravalli
,
V. V.
(
1999
).
Multihypothesis sequential probability ratio tests. I. asymptoticoptimality
.
Information Theory, IEEE Transactions on Information Theory
,
45
(
7
),
2448
2461
.
Dubner
,
R.
, &
Zeki
,
Z.
(
1971
).
Response properties and receptive fields of cells in an anatomically defined region of the superior temporal sulcus in the monkey
.
Brain Research
,
35
(
2
),
528
532
.
Forstmann
,
B. U.
,
Dutilh
,
G.
,
Brown
,
S.
,
Neumann
,
J.
,
von Cramon
,
D. Y.
,
Ridderinkhof
,
K. R.
, et al. (
2008
).
Striatum and pre-SMA facilitate decision-making under time pressure
.
,
105
(
45
),
17538
17542
.
Gold
,
J.
, &
,
M.
(
2001
).
Neural computations that underlie decisions about sensory stimuli
.
Trends in Cognitive Sciences
,
5
,
10
16
.
Gold
,
J.
, &
,
M.
(
2002
).
Banburismus and the brain: Decoding the relationship between sensory stimuli, decisions and reward
.
Neuron
,
36
,
299
308
.
Gold
,
J.
, &
,
M.
(
2007
).
The neural basis of decision making
.
Annual Review of Neuroscience
,
30
,
535
574
.
Hallworth
,
N.
,
Wilson
,
C.
, &
Bevan
,
M.
(
2003
).
Apamin-sensitive small conductance calcium-activated potassium channels, through their selective coupling to voltagegated calcium channels, are critical determinants of the precision, pace, and pattern of action potential generation in rat subthalamic nucleus neurons in vitro
.
Journal of Neuroscience
,
23
,
7525
7542
.
Jazayeri
,
M.
, &
Movshon
,
J. A.
(
2006
).
Optimal representation of sensory information by neural populations
.
Nature Neuroscience
,
9
,
690
696
.
Kenney
,
J. F.
, &
Keeping
,
E. S.
(
1951
).
Mathematics of statistics
(2nd ed.).
Princeton, NJ
:
Van Nostrand
.
Kiani
,
R.
,
Hanks
,
T.
, &
,
M.
(
2008
).
Bounded integration in parietal cortex underlies decisions even when viewing duration is dictated by the environment
.
Journal of Neuroscience
,
28
(
12
),
3017
3029
.
Kim
,
J. N.
, &
,
M. N.
(
1999
).
Neural correlates of a decision in the dorsolateral prefrontal cortex of the macaque
.
Nature Neuroscience
,
2
,
176
185
.
Laming
,
D.
(
1968
).
Information theory of choice reaction time.
New York
:
Wiley
.
Luce
,
R.
(
1986
).
Response times and their role in inferring elementary mental organization.
New York
:
Oxford University Press
.
Ma
,
W. J.
,
Beck
,
J. M.
,
Latham
,
P. E.
, &
Pouget
,
A.
(
2006
).
Bayesian inference with probabilistic population codes
.
Nature Neuroscience
,
9
,
1432
1438
.
Maunsell
,
J.
, &
Van Essen
,
D. C.
(
1983a
).
Functional properties of neurons in middle temporal visual area of the macaque monkey. I. Selectivity for stimulus direction, speed, and orientation
.
Journal of Neurophysiology
,
49
(
5
),
1127
1147
.
Maunsell
,
J.
, &
Van Essen
,
D. C.
(
1983b
).
Functional properties of neurons in middle temporal visual area of the macaque monkey. II. Binocular interactions and sensitivity to binocular disparity
.
Journal of Neurophysiology
,
49
(
5
),
1148
1167
.
Mazurek
,
M.
,
Roitman
,
J.
,
Ditterich
,
J.
, &
,
M.
(
2003
).
A role for neural integrators in perceptual decision making
.
Cerebral Cortex
,
13
,
1257
1269
.
McMillan
,
T.
, &
Holmes
,
P.
(
2006
).
The dynamics of choice among multiple alternatives
.
Journal of Mathematical Psychology
,
50
,
30
57
.
Nambu
,
A.
, &
Llinas
,
R.
(
1994
).
Electrophysiology of globus pallidus neurons in vitro
.
Journal of Neurophysiology
,
72
,
1127
1139
.
Newsome
,
W. T.
, &
Paré
,
E. B.
(
1988
).
A selective impairment of motion perception following lesions of the middle temporal visual area (MT)
.
Journal of Neuroscience
,
8
,
2201
22711
.
Niwa
,
M.
, &
Ditterich
,
J.
(
2008
).
Perceptual decisions between multiple directions of visual motion
.
Journal of Neuroscience
,
28
(
17
),
4435
4445
.
Papoulis
,
A.
(
1986
).
Probability, random variables, and stochastic processes
(2nd ed.).
New York
:
McGraw-Hill
.
Pike
,
A. R.
(
1966
).
Stochastic models of choice behaviour: Response probabilities and latencies of finite Markov chain systems
.
British Journal of Mathematical and Statistical Psychology
,
19
(
1
),
15
32
.
Ratcliff
,
R.
(
1978
).
A theory of memory retrieval
.
Psychological Review
,
95
,
238
255
.
Ratcliff
,
R.
(
1988
).
Continuous versus discrete information processing: Modeling accumulation of partial information
.
Psychological Review
,
95
,
238
255
.
Ratcliff
,
R.
, &
McKoon
,
G.
(
2008
).
The diffusion decision model: Theory and data for two-choice decision tasks
.
Neural computation
,
20
(
4
),
873
922
.
Ratcliff
,
R.
, &
Smith
,
P. L.
(
2004
).
A comparison of sequential sampling models for two-choice reaction time
.
Psychology Review
,
111
,
333
367
.
Roitman
,
J. D.
, &
,
M. N.
(
2002
).
Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task
.
Journal of Neuroscience
,
22
,
9475
9489
.
Schall
,
J.
(
2001
).
Neural basis of deciding, choosing and acting
.
Nature Reviews Neuroscience
,
2
,
33
42
.
Schall
,
J.
(
2002
).
The neural selection and control of saccades by the frontal eye field
.
Philosophical Transactions of the Royal Society B: Biological Sciences
,
357
,
1073
1082
.
,
M.
,
Hanks
,
T.
,
Churchland
,
A.
,
Kiani
,
R.
, &
Yang
,
T.
(
2006
).
The speed and accuracy of a simple perceptual decision: A mathematical primer
. In
K. Doya, S. Ishii, R. Rao, & A. Pouget
(Eds.),
Bayesian brain: Probabilistic approaches to neural coding.
Cambridge, MA
:
MIT Press
.
,
M. N.
, &
Newsome
,
W. T.
(
1998
).
The variable discharge of cortical neurons: Implications for connectivity, computation, and information coding
.
Journal of Neuroscience
,
18
,
3870
3896
.
,
M. N.
, &
Newsome
,
W. T.
(
2001
).
Neural basis of a perceptual decision in the parietal cortex (area lip) of the rhesus monkey
.
Journal of Neurophysiology
,
86
,
1916
1936
.
Simen
,
P.
,
Cohen
,
J.
, &
Holmes
,
P.
(
2006
).
Rapid decision threshold modulation by reward rate in a neural network
.
Neural Networks
,
19
,
1013
1026
.
Simen
,
P.
,
Contreras
,
D.
,
Buck
,
C.
,
Hu
,
P.
,
Holmes
,
P.
, &
Cohen
,
J.
(in press).
Reward rate optimization in two-alternative decision making: Empirical tests of theoretical predictions
.
Journal of Experimental Psychology: Human Perception and Performance
.
Smith
,
P. L.
, &
Van Zandt
,
T.
(
2000
).
Time-dependent Poisson counter models of response latency in simple judgment
.
British Journal of Mathematical and Statistical Psychology
,
53
(
2
),
293
315
.
Smith
,
W. L.
(
1959
).
On the cumulants of renewal processes
.
Biometrika
,
46
(
1
),
1
29
.
Snowden
,
R. J.
,
Treue
,
S.
, &
Andersen
,
R. A.
(
1992
).
The response of neurons in areas V1 and MT of the alert rhesus monkey to moving random dot patterns
.
Experimental Brain Research
,
88
,
389
400
.
Softky
,
W. R.
, &
Koch
,
C.
(
1993
).
The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs
.
Journal of Neuroscience
,
113
,
334
350
.
Tolhurst
,
D. J.
,
Movshon
,
J. A.
, &
Dean
,
A. F.
(
1983
).
The statistical reliability of signals in single neurons in cat and monkey visual cortex
.
Vision Research
,
23
(
8
),
775
785
.
Usher
,
M.
, &
McClelland
,
J. L.
(
2001
).
On the time course of perceptual choice: The leaky competing accumulator model
.
Psychological Review
,
108
,
550
592
.
Vickers
,
D.
(
1970
).
Evidence for an accumulator model of psychophysical discrimination
.
Ergonomics
,
13
(
1
),
37
58
.
Wald
,
A.
(
1947
).
Sequential analysis
.
New York
:
Wiley
.
Wald
,
A.
, &
Wolfwitz
,
J.
(
1948
).
Optimum character of the sequential probability ratio test
.
Annals of Mathematical Statistics
,
19
,
326
339
.
Wilson
,
C.
,
Weyrick
,
A.
,
Terman
,
D.
,
Hallworth
,
N.
, &
Bevan
,
M.
(
2004
).
A model of reverse spike frequency adaptation and repetitive firing of subthalamic nucleus neurons
.
Journal of Neurophysiology
,
91
,
1963
1980
.
Wolfart
,
J.
,
Debay
,
D.
,
Le Masson
,
G.
,
Destexhe
,
A.
, &
Bal
,
T.
(
2005
).
Synaptic background activity controls spike transfer from thalamus to cortex
.
Nature Neuroscience
,
8
,
1760
1767
.
Wong
,
K.
, &
Wang
,
X.
(
2006
).
A recurrent network mechanism of time integration in perceptual decisions
.
Journal of Neuroscience
,
26
(
4
),
1314
1328
.
Zeki
,
S. M.
(
1980
).
The response properties of cells in the middle temporal area (area MT) of owl monkey visual cortex
.
Proceedings of the Royal Society of London, Series B
,
207
(
1167
),
239
248
.
Zohary
,
E.
,
,
M.
, &
Newsome
,
W.
(
1994
).
Correlated neuronal discharge rate and its implications for psychophysical performance
.
Nature
,
370
,
140
143
.