## Abstract

Stimuli from the environment that guide behavior and inform decisions are encoded in the firing rates of neural populations. Neurons in these populations, however, do not spike independently: spike events are correlated from cell to cell. To what degree does this apparent redundancy affect the accuracy with which decisions can be made and the computations required to decide optimally? We explore these questions for two illustrative models of correlation among cells. The two models are statistically identical at the level of pairwise correlations but differ in the higher-order statistics that describe the simultaneous activity of larger groups of cells. We find that correlations can diminish the performance attained by an ideal decision maker to either a small or a large extent, depending on the nature of the higher-order correlations. Moreover, although this optimal performance can in some cases be obtained using the standard integration-to-bound operation, in others it requires a nonlinear computation on incoming spikes. Overall, we conclude that a given level of pairwise correlations, even when restricted to identical neural populations, need not indicate redundancies that diminish decision-making performance.

## 1. Introduction

Sensory information is often encoded in irregularly spiking neural populations. One well-studied example is given by direction-selective cells in area MT, whose firing rates depend on the degree and direction of coherent motion in the visual field (Britten, Shadlen, Newsome, & Movshon, 1992, 1993; Newsome, Britten, Movshon, & Shadlen, 1989; Salzman, Murasugi, Britten, & Newsome, 1992; Shadlen, Britten, Newsome, & Movshon, 1996). Individual neurons in MT, as in many other brain areas, exhibit noisy and variable spiking (Newsome et al., 1989) and can be modeled by Poisson point processes (Softky & Koch, 1993; Tuckwell, 1989). Moreover, this variable spiking is generally not independent from cell to cell. Returning to our example, a number of studies have measured pairwise correlations in MT during direction discrimination tasks as well as during smooth-pursuit eye movements (Huang & Lisberger, 2009; Bair, Zohary, & Newsome, 2001; Zohary, Shadlen, & Newsome, 1994; Cohen & Newsome, 2008). Although these measurements are experimentally subtle, a number of studies suggest a pairwise spike count correlation near … (Cohen & Kohn, 2011, summarize these observations for a number of brain areas).

What are the consequences of correlated spike variability for the speed and accuracy of sensory decisions? The role of pairwise correlations in stimulus encoding has been the subject of many prior studies (Salinas & Sejnowski, 2001; Latham & Roudi, 2011; Averbeck, Latham, & Pouget, 2006; Abbott & Dayan, 1999). The results are rich, showing that correlations can have positive, negative, or neutral effects on encoded information. Our study serves to extend this body of work in two ways. First (as done in a different context by Ganmor, Segev, & Schneidman, 2011, and Montani et al., 2009), we contrast the effect of correlations that have the same pairwise level but a different structure at higher orders.

Second (as in Cohen & Newsome, 2008, and Beck et al., 2008), we consider the impact of correlations on decisions that unfold over time, formed by combining a sequence of samples from the sensory populations. A classic example that we use to describe and motivate our studies is the moving-dots direction discrimination task. Here, a fraction of the dots in a visual display move coherently in a given direction, while the remainder move randomly; the task is to identify the direction of coherent motion from two possible alternatives. Decisions become increasingly accurate as subjects take (or are given) longer to make them.

In analyzing decisions that develop over time, we use a central result from sequential analysis: the sequential probability ratio test (SPRT; Wald & Wolfowitz, 1948; Gold & Shadlen, 2002), which sums the log-likelihood ratios of independent observations from a sampling distribution until a predetermined evidence threshold is reached. The SPRT is optimal in that it requires the minimum expected number of samples to reach a required level of accuracy in deciding between two task alternatives.
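For concreteness, here is a minimal Python sketch of the SPRT applied to a single stream of binned Poisson spike counts, testing rate `lam1` (hypothesis $H_1$) against `lam0` ($H_0$). It illustrates Wald's test in general rather than the population models studied below; the rates, bin width, threshold, and the helper names `poisson_draw` and `sprt_poisson` are illustrative assumptions.

```python
import math
import random


def poisson_draw(mean, rng):
    """Poisson variate: number of arrivals in [0, 1) of a process with rate `mean`."""
    if mean <= 0:
        return 0
    count, t = 0, rng.expovariate(mean)
    while t < 1.0:
        count += 1
        t += rng.expovariate(mean)
    return count


def sprt_poisson(lam1, lam0, dt, theta, true_rate, rng, max_steps=100_000):
    """SPRT between Poisson rates lam1 (H1) and lam0 (H0) from binned counts.

    Each bin adds log P(k | lam1) - log P(k | lam0)
        = k * log(lam1 / lam0) - (lam1 - lam0) * dt,
    and sampling stops once the running sum leaves (-theta, theta).
    Returns (accepted_h1, bins_used).
    """
    log_ratio = math.log(lam1 / lam0)
    drift = (lam1 - lam0) * dt
    evidence, n = 0.0, 0
    while -theta < evidence < theta and n < max_steps:
        k = poisson_draw(true_rate * dt, rng)
        evidence += k * log_ratio - drift
        n += 1
    return evidence >= theta, n


rng = random.Random(1)
lam1, lam0, dt, theta = 50.0, 30.0, 0.01, 2.0   # illustrative values only
trials = [sprt_poisson(lam1, lam0, dt, theta, lam1, rng) for _ in range(2000)]
fc = sum(ok for ok, _ in trials) / len(trials)
mean_bins = sum(n for _, n in trials) / len(trials)
print(f"fraction correct {fc:.3f}, mean bins {mean_bins:.1f}")
```

Because each bin's log-likelihood ratio is linear in the spike count `k`, this particular SPRT is just spike counting with a constant offset per bin, which previews why linear integration can be optimal for independent Poisson inputs.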

We pose two related questions based on the SPRT. First, how does the presence of correlated spiking in the sampled pools affect the speed and accuracy of decisions produced by the SPRT? Our focus is on how the structure of population-wide correlations determines the answer. Second, how does the presence of correlated spiking affect the computations that are necessary to perform the SPRT? This question is intriguing, because the SPRT may be performed using the simple, linear computation of integrating spikes over time and across the populations for a surprisingly broad class of inputs, including independent Poisson spike trains (Zhang & Bogacz, 2010; Bogacz, Brown, Holmes, & Cohen, 2006). Thus, in this setting, optimal decisions can be made by integrator circuits (Bogacz et al., 2006; Goldman, Compte, & Wang, 2009; Cain & Shea-Brown, 2012). Our goal here is to determine whether and when this continues to hold true for correlated neural populations.

We answer these questions for two illustrative models of correlated, Poissonian spiking. We emphasize that the spikes that these models produce are indistinguishable at the level of both single cells and pairs of cells. However, they differ in higher-order correlations in that they can be distinguished only by examining the statistics of three or more neurons. In the first model, correlations are introduced using shared spike events across the entire pool. In this case, optimal inference via the SPRT produces fast and accurate decisions but depends on a nonlinear computation. As a result, the simpler computation of spike integration requires, on average, longer times to reach the same level of accuracy. In contrast, when shared spiking events are more frequent but are common to fewer neurons within a pool, performance under the SPRT is significantly diminished. However, in this case, both SPRT and spike integration perform comparably, so a linear computation can produce decisions that are close to optimal.

## 2. Models of Evidence Accumulation and Encoding

### 2.1. Model Neural Populations and the Decision Task.

Two populations of neurons encode the coherence $C$ of the moving dots via their firing rates $\lambda_{\mathrm{pref}}$ and $\lambda_{\mathrm{null}}$; here the subscripts indicate, respectively, the “preferred” and “null” populations, which correspond to the motion direction of the visual stimulus versus the alternate direction. In this way, the firing rate of neurons encoding the preferred direction is higher than that of neurons encoding the null direction, $\lambda_{\mathrm{pref}} > \lambda_{\mathrm{null}}$. Following Wang (2002) (see also Mazurek, Roitman, Ditterich, & Shadlen, 2003; Britten et al., 1993), we model this relationship as linear in $C$.

Throughout the text, we present results at $C = 6.4\%$; however, the results do not depend on this particular coherence or on its precise relationship to firing rate.

Each population consists of $N$ neurons firing spikes via a homogeneous Poisson process, with rate $\lambda_{\mathrm{pref}}$ or $\lambda_{\mathrm{null}}$. We use the notation $x_k(t)$ to indicate a spike train. Integrating these processes over successive time windows of length $\Delta t$ provides two time series of $N$-dimensional vectors of Poisson random variables; these independent vectors provide the input to the decision-making models. Specifically, for the $k$th neuron in a pool, on the $i$th time step,

$$S^{i}_{k} = \int_{(i-1)\Delta t}^{i\Delta t} x_k(t)\,dt.$$

The properties of Poisson processes imply that $S^{i}_{k}$ is independent of $S^{j}_{k}$ for $i \neq j$, that is, for different time steps.

Each decision is a hypothesis test between $H_0$ and $H_1$ that ends when either $H_0$ or $H_1$ is accepted. In this study, we consider decision-making tasks at a fixed level of difficulty, so that $\lambda_{\mathrm{pref}}$ and $\lambda_{\mathrm{null}}$ do not vary from trial to trial (i.e., this hypothesis test is simple, not composite).

### 2.2. Accumulating Spikes and Evidence Over Time.

Evidence is accumulated in a random walk $E_n$ whose increments $W_i$ are drawn independently from a sampling distribution $W$. We specify this distribution below; for now, we note that the random walk takes the general form

$$E_n = \sum_{i=1}^{n} W_i.$$

In an unbiased drift-diffusion model of decision making, accumulation continues as long as $-\theta < E_n < \theta$ (Ratcliff, 1978; Gold & Shadlen, 2002; Bogacz et al., 2006). The number of increments necessary to cross one of the two thresholds, multiplied by the duration $\Delta t$ of each increment, defines the decision time; this is a random variable, as it varies from trial to trial. Crossing the threshold corresponding to $H_1$ is interpreted as a correct trial; the fraction of correct (*FC*) trials defines the accuracy of the model. Together, the expected (mean) decision time (*DT*) and accuracy (*FC*) determine the performance of a decision-making model.

The *DT* and *FC* of such a model can be computed from two quantities: the expected increment $E[W]$ and the nontrivial root $h_0$ of the moment-generating function of the sampling distribution. Importantly, these formulas are exact under the assumption that the final increment in $E_n$ does not overshoot the threshold, a point we return to below. Given the moment-generating function (MGF) of the sampling distribution, $M_W(s) = E[e^{sW}]$, the value $s = h_0$ is defined by the implicit relationship

$$M_W(h_0) = E\left[e^{h_0 W}\right] = 1, \qquad h_0 \neq 0.$$

To understand intuitively why this value is key to computing the *DT* and *FC*, we follow Shadlen, Hanks, Churchland, Kiani, and Yang (2006). First recall that the MGF of the sum of $n$ i.i.d. random variables is the MGF of their common sampling distribution raised to the $n$th power. In a drift-diffusion model, the process is terminated after a random number of steps. It therefore stands to reason that a quantity based on the sampling distribution that does not change with the specific number of samples drawn on a given trial would play a special role. The quantity $E[e^{h_0 E_n}] = M_W(h_0)^n = 1$ fits that description, by construction. Informally (for more details, see Shadlen et al., 2006), denoting by $\mathcal{N}$ the random number of steps taken to reach threshold, we have that

$$E\left[e^{h_0 E_{\mathcal{N}}}\right] = 1.$$

However, under the assumption of no overshoot, $E_{\mathcal{N}} = \theta$ or $E_{\mathcal{N}} = -\theta$ with probabilities *FC* and $1 - FC$, respectively, yielding

$$FC\, e^{h_0\theta} + (1 - FC)\, e^{-h_0\theta} = 1.$$

Solving this relationship for the accuracy *FC*, we arrive at a formula depending on $h_0$ and the diffusion threshold:

$$FC = \frac{1}{1 + e^{h_0\theta}}. \tag{2.14}$$

Similarly, one arrives at Wald's identity,

$$E\left[E_{\mathcal{N}}\right] = E[\mathcal{N}]\,E[W],$$

which can be rearranged and combined with equation 2.14 to yield a formula for *DT*:

$$DT = \Delta t\, E[\mathcal{N}] = \frac{\Delta t\,\theta\,(2FC - 1)}{E[W]} \tag{2.16}$$

(see Bruss & Robertson, 1991, and Shadlen et al., 2006, for more details). We notice here that as $\theta$ increases (and assuming $h_0 < 0$), both *FC* and *DT* increase: the so-called speed-accuracy trade-off. Because the remaining two parameters, $h_0$ and $E[W]$, are computed directly from the increment distribution, we will compare decision-making models by plotting *DT* versus *FC* parametrically in $\theta$. We call these speed-accuracy trade-off plots because they describe, for a given model, how speed and accuracy trade off as the decision threshold is modified.
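The two no-overshoot results can be evaluated directly. Solving the relation $FC\,e^{h_0\theta} + (1-FC)\,e^{-h_0\theta} = 1$ gives $FC = 1/(1+e^{h_0\theta})$, and Wald's identity then gives $DT = \Delta t\,\theta\,(2FC-1)/E[W]$. The Python sketch below traces a speed-accuracy curve parametrically in $\theta$; the function names and the values of $h_0$, $E[W]$, and $\Delta t$ are illustrative assumptions, not numbers from this study.

```python
import math


def fraction_correct(theta, h0):
    """Accuracy with no overshoot: FC = 1 / (1 + exp(h0 * theta)), with h0 < 0."""
    return 1.0 / (1.0 + math.exp(h0 * theta))


def decision_time(theta, h0, mean_increment, dt):
    """Mean decision time with no overshoot: DT = dt * theta * (2 FC - 1) / E[W]."""
    fc = fraction_correct(theta, h0)
    return dt * theta * (2.0 * fc - 1.0) / mean_increment


def speed_accuracy_curve(h0, mean_increment, dt, thetas):
    """(FC, DT) pairs traced parametrically in the threshold theta."""
    return [(fraction_correct(th, h0), decision_time(th, h0, mean_increment, dt))
            for th in thetas]


# Illustrative numbers only: h0 = -1 with E[W] = 0.05 per 1-ms step.
for fc, mean_dt in speed_accuracy_curve(-1.0, 0.05, 0.001, [0.5, 1.0, 2.0, 3.0]):
    print(f"FC = {fc:.3f}   DT = {1000 * mean_dt:.1f} ms")
```

Since $2FC - 1 = \tanh(-h_0\theta/2)$, the same formula can be written $DT = \Delta t\,\theta\tanh(-h_0\theta/2)/E[W]$, the familiar drift-diffusion form.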

In summary, a model's speed and accuracy depend on its increment distribution only through the expected increment $E[W]$ and the nontrivial root $h_0$ of its moment-generating function. We now return to the definition of the random increments $W$ for the different models we wish to compare. First, in the spike integration (SI) model, increments are constructed by counting the spikes emitted in a window by the preferred pool and subtracting the number emitted by the null pool:

$$W_i = \sum_{k=1}^{N} S^{i}_{k,\mathrm{pref}} - \sum_{k=1}^{N} S^{i}_{k,\mathrm{null}}.$$

This is equivalent to the time evolution of a neural integrator model that receives spikes as impulses with opposite signs from the preferred and null populations. This integrate-to-bound model is an analog of the drift-diffusion model (DDM) whose inputs are not white noise but Poisson spikes (Ratcliff, 1978; Bogacz et al., 2006; Zhang & Bogacz, 2010; Beck et al., 2008; see also Mazurek et al., 2003).
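For independent neurons, the spike-integration increment is a difference of two Poisson counts, so $h_0$ has a closed form: the MGF condition $\lambda_{\mathrm{pref}}(e^{s}-1) + \lambda_{\mathrm{null}}(e^{-s}-1) = 0$ has the nontrivial root $h_0 = \ln(\lambda_{\mathrm{null}}/\lambda_{\mathrm{pref}})$, and $E[W] = N\Delta t(\lambda_{\mathrm{pref}} - \lambda_{\mathrm{null}})$. The Python sketch below checks this numerically; the rates, pool size, bin width, and function names are illustrative assumptions.

```python
import math


def si_log_mgf(s, lam_pref, lam_null, n_neurons, dt):
    """log E[exp(s W)] for W = (pooled preferred count) - (pooled null count),
    with independent Poisson neurons: each pooled count is Poisson(N * rate * dt)."""
    return n_neurons * dt * (lam_pref * math.expm1(s) + lam_null * math.expm1(-s))


def si_independent_h0(lam_pref, lam_null):
    """Nontrivial root of the MGF condition: h0 = log(lam_null / lam_pref) < 0."""
    return math.log(lam_null / lam_pref)


def si_independent_mean_increment(lam_pref, lam_null, n_neurons, dt):
    """E[W] = N * dt * (lam_pref - lam_null)."""
    return n_neurons * dt * (lam_pref - lam_null)


lam_pref, lam_null, n_neurons, dt = 42.0, 38.0, 100, 0.001   # illustrative values
h0 = si_independent_h0(lam_pref, lam_null)
print(h0, si_log_mgf(h0, lam_pref, lam_null, n_neurons, dt))
```

With this root, $FC = 1/(1 + (\lambda_{\mathrm{null}}/\lambda_{\mathrm{pref}})^{\theta})$, which is the SPRT accuracy for the rescaled threshold $\theta\ln(\lambda_{\mathrm{pref}}/\lambda_{\mathrm{null}})$; this is consistent with the equivalence of spike integration and the SPRT for independent inputs reported by Zhang and Bogacz (2010).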

### 2.3. The Case of Independent Neurons.

### 2.4. Correlated Neural Populations: The Additive and Subtractive Models.

We next describe two models for introducing correlations into the Poisson spike trains of each neural population. These models have identical first-order and pairwise statistics but differ in their higher-order correlations; that is, the models cannot be distinguished by examining pairs of neurons alone but require simultaneous observation of larger groups of cells. The distinction between the models is that one features common spiking events that occur across the entire population, while in the other, common events are shared only by random subsets of neurons. Both models are studied in Kuhn, Aertsen, and Rotter (2003) and Staude, Grün, and Rotter (2010) and rely on shared input from a single correlating process to generate the correlations in each pool. These authors termed the two models SIP and MIP for, respectively, single- and multiple-interaction process; here we use the descriptors *additive* and *subtractive*. In both models, a realization of correlated spike trains that provide the input to the accumulation models is achieved with a common correlating train.

Before describing the models in detail, we note that in this study, these models are statistical approaches chosen to illustrate a range of impacts that correlations can have on decision making (see also Gutnisky & Josić, 2010, and Niebur, 2007). In contrast, in neurobiological networks, correlated spiking arises through a complex interplay of many mechanisms, including recurrent connectivity and shared feedforward interactions (e.g., Aertsen, Gerstein, Habib, & Palm, 1989; Shadlen & Newsome, 1998; Smith & Kohn, 2008). While beyond the scope of this article, avenues for bridging the gap between statistical and network-based models of correlations in the context of decision making are considered in section 6.

The first case is the additive model, in which the spike train for each neuron is generated as the sum of two homogeneous Poisson point processes. This model might, for example, capture the effect of shared pulses of activity arising from common sensory or modulatory events upstream. The first Poisson train is generated with a firing rate of $(1-\rho)\lambda$, where $\lambda$ is the intended firing rate of the neuron and $\rho$ is the intended pairwise spike count correlation between any two neurons in the pool. The second train, with a rate of $\rho\lambda$, is added to every neuron in the pool and serves as the common source of correlations. An example of this model of spike train generation is depicted in the rastergrams in Figures 1A and 1B; the common spike events are evident as shared spikes across the entire population.

The second case is the subtractive correlations model, in which correlated spikes are generated through random, independent deletions from an original “mother” spike train; we refer to this as the correlating spike train (Kuhn et al., 2003). Correlations with this structure might arise, for example, as a consequence of synaptic failure in connections from a common pool of upstream neurons. In order to achieve a firing rate of $\lambda$ spikes per second for each neuron in the pool, with a pairwise correlation $\rho$ between any two individual neurons, the correlating train has a rate of $\lambda/\rho$ spikes per second. Then, for each neuron in the pool, each spike of this train is included i.i.d. with probability $\rho$. In our model, there is a separate correlating spike train for each of the two independent populations. An example of the subtractive model of spike train generation is depicted in the rastergrams in Figures 1D and 1E.

In summary, both models include correlated spike events that originate from a single mother train. Although they produce identical correlations among cell pairs, these events are distributed in different ways across the population. We note that the results of Zhang and Bogacz (2010) can be seen as the limiting case $\rho \to 0$ of either the additive or the subtractive model.
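The two constructions can be simulated in a few lines to confirm that they agree at the level of pairs. In the additive model each count is a private Poisson count with mean $(1-\rho)\lambda T$ plus a shared one with mean $\rho\lambda T$, so $\mathrm{Var} = \lambda T$ and $\mathrm{Cov} = \rho\lambda T$. In the subtractive model a mother count with mean $\lambda T/\rho$ is thinned with probability $\rho$, so $\mathrm{Cov} = \rho^2 \cdot \lambda T/\rho = \rho\lambda T$ as well. The Python sketch below assumes illustrative values for the rate, window $T$, $\rho$, and pool size; the function names are hypothetical.

```python
import math
import random


def poisson_draw(mean, rng):
    """Poisson variate: number of arrivals in [0, 1) of a process with rate `mean`."""
    if mean <= 0:
        return 0
    count, t = 0, rng.expovariate(mean)
    while t < 1.0:
        count += 1
        t += rng.expovariate(mean)
    return count


def additive_counts(n_neurons, rate, rho, window, rng):
    """Additive model: private train at (1 - rho)*rate per neuron,
    plus one common train at rho*rate added to every neuron."""
    common = poisson_draw(rho * rate * window, rng)
    return [poisson_draw((1 - rho) * rate * window, rng) + common
            for _ in range(n_neurons)]


def subtractive_counts(n_neurons, rate, rho, window, rng):
    """Subtractive model: mother train at rate/rho; each neuron keeps each
    mother spike independently with probability rho."""
    mother = poisson_draw(rate / rho * window, rng)
    return [sum(rng.random() < rho for _ in range(mother))
            for _ in range(n_neurons)]


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / math.sqrt(vx * vy)


rng = random.Random(0)
rate, rho, window, n_neurons, trials = 40.0, 0.2, 0.1, 3, 20000   # illustrative
stats = {}
for name, model in [("additive", additive_counts), ("subtractive", subtractive_counts)]:
    samples = [model(n_neurons, rate, rho, window, rng) for _ in range(trials)]
    first = [s[0] for s in samples]
    second = [s[1] for s in samples]
    stats[name] = (sum(first) / trials, pearson(first, second))
    print(name, "mean count %.2f, pairwise corr %.3f" % stats[name])
```

Both models should report a mean count near $\lambda T = 4$ and a pairwise correlation near $\rho = 0.2$; they differ only in how the shared events are spread across the pool.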

## 3. Subtractive Correlations and Decision-Making Performance

### 3.1. The SPRT Decision-Making Model.

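We first compute the quantities $h_0$ and $E[W]$ that define the speed and accuracy of the SPRT (see equations 2.14–2.16) for two pools with subtractive correlations. As this computation is done in continuous time, it is natural to take $\Delta t \to 0$. Doing so, we find

$$h_0 = -1, \tag{3.1}$$

$$E[W] = \Delta t\,\frac{1-(1-\rho)^N}{\rho}\,\left(\lambda_{\mathrm{pref}} - \lambda_{\mathrm{null}}\right)\ln\frac{\lambda_{\mathrm{pref}}}{\lambda_{\mathrm{null}}}. \tag{3.2}$$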

Comparing these values against those of the independent SPRT given in equations 2.19 and 2.20, we see that the only effect of correlations is a scaling of the expected increment by the factor $[1-(1-\rho)^N]/\rho$. In the limit $\rho \to 0$, this scale factor approaches $N$, which in turn reduces decision time (by equation 2.16, *DT* is inversely proportional to the scale factor). However, as $\rho \to 1$, the scale factor itself approaches 1; this agrees with the intuition that as all neurons become perfectly redundant, the performance should resemble that of a single neuron. In fact, the SPRT acting on a given sample can be seen as inferring the firing rate of the correlating train from a derived vector of noisy random variables. As $N$ gets large, then, performance should be limited by that of an SPRT performed on the correlating mother trains themselves. This is precisely what happens when $N \to \infty$ in equation 3.2: we obtain $E[W] = \Delta t\,(\lambda_{\mathrm{pref}}/\rho - \lambda_{\mathrm{null}}/\rho)\ln(\lambda_{\mathrm{pref}}/\lambda_{\mathrm{null}})$, corresponding to decision making based on mother spikes of rate $\lambda_{\mathrm{pref}}/\rho$ and $\lambda_{\mathrm{null}}/\rho$.
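The limits of the subtractive scale factor are easy to check numerically. The Python sketch below evaluates $[1-(1-\rho)^N]/\rho$; this form is our reading of the small-$\Delta t$ argument for the subtractive model (a pool with rate $\lambda$ emits nonzero spike vectors at rate $(\lambda/\rho)[1-(1-\rho)^N]$, each worth $\pm\ln(\lambda_{\mathrm{pref}}/\lambda_{\mathrm{null}})$ of evidence). The pool sizes and $\rho$ values are illustrative, and the function name is hypothetical.

```python
def subtractive_sprt_scale(n_neurons, rho):
    """Factor [1 - (1 - rho)^N] / rho multiplying the single-neuron SPRT drift
    dt * (lam_pref - lam_null) * log(lam_pref / lam_null)."""
    if rho == 0:
        return float(n_neurons)          # independent limit
    return (1.0 - (1.0 - rho) ** n_neurons) / rho


for rho in (1e-9, 0.05, 0.1, 0.2, 0.3, 1.0):
    print(rho, round(subtractive_sprt_scale(100, rho), 3))
```

For $N = 100$ the factor falls from about 100 at vanishing $\rho$ to about 20 at $\rho = 0.05$ and about 5 at $\rho = 0.2$; as $N \to \infty$ it saturates at $1/\rho$, the mother-train limit.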

One consequence of this interpretation is that the particular realization of a spike vector (in a sufficiently small time bin $\Delta t$) carries no evidence about the decision between $H_1$ and $H_0$ beyond its identity as either the zero vector **0** or not. This follows from the construction of the subtractive correlations model: the spike deletions that create the realization of the spike vector do not depend on the firing rate of the population. Concretely, then, the increments (or decrements) are based solely on whether the vector of spikes in the preferred (or null) pool contains any spikes at all; the actual number of spikes is irrelevant in the SPRT.

It follows that the accumulation process $E_n$ is a discrete-space random walk, with steps of $0$ and $\pm\ln(\lambda_{\mathrm{pref}}/\lambda_{\mathrm{null}})$. To see this, note that for sufficiently small $\Delta t$, there are only three possibilities for how spikes will be emitted from the two populations. First, neither pool produces any spikes. This event provides no information to distinguish the firing rates of the pools, so the increment is 0. Second and third, exactly one of the pools produces a nonzero vector of spikes, caused by i.i.d. deletions from its mother spike train. If the spiking pool is the preferred one, each possible nonzero spike vector increments the accumulator by $\ln(\lambda_{\mathrm{pref}}/\lambda_{\mathrm{null}})$; the opposite sign occurs if the null pool spikes. Events in which both pools spike are of higher order in $\Delta t$ and thus become negligible for small time windows.

The discrete nature of the SPRT increments causes the *FC* curve in Figure 2A to take on only discrete values of accuracy: raising $\theta$ within an interval between consecutive multiples of $\ln(\lambda_{\mathrm{pref}}/\lambda_{\mathrm{null}})$ does not improve accuracy, because on the final, threshold-crossing step $E_n$ overshoots the threshold. This also explains why some of the *FC* values at a given $\theta$ do not lie on the theoretical line defined by equation 2.14; that equation is exact only in the case of zero overshoot past the threshold. We return to this point later in the main text and also in appendix C.
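The plateaus can be reproduced with a toy walk. Assume, as in the argument above, that each nonzero step is $\pm a$ with $a = \ln(\lambda_{\mathrm{pref}}/\lambda_{\mathrm{null}})$ and up-to-down odds $e^{a}$ (so that $h_0 = -1$). The walk can then stop only on a multiple of $a$, and the gambler's-ruin formula gives $FC = 1/(1 + e^{-a\lceil\theta/a\rceil})$, which depends on $\theta$ only through $\lceil\theta/a\rceil$. The Python sketch below compares this with a simulation and with equation 2.14; the step size is an illustrative stand-in and the function names are hypothetical.

```python
import math
import random


def sprt_fc_discrete(theta, step):
    """FC for a walk with steps of +/-step (zero steps ignored) and
    P(up)/P(down) = exp(step). The walk can only stop at a multiple of
    `step`, so the effective threshold is step * ceil(theta / step)."""
    m = math.ceil(theta / step)
    return 1.0 / (1.0 + math.exp(-step * m))


def simulate_fc(theta, step, trials, rng):
    """Monte Carlo estimate of the same accuracy."""
    p_up = 1.0 / (1.0 + math.exp(-step))      # P(up | nonzero step)
    m = math.ceil(theta / step)
    correct = 0
    for _ in range(trials):
        k = 0                                  # position in units of `step`
        while -m < k < m:
            k += 1 if rng.random() < p_up else -1
        correct += k >= m
    return correct / trials


step = 0.5                                    # illustrative stand-in for log(lam_pref / lam_null)
rng = random.Random(2)
for theta in (1.01, 1.2, 1.5, 1.6):
    # plateau value versus the no-overshoot prediction 1 / (1 + exp(-theta))
    print(theta, round(sprt_fc_discrete(theta, step), 4),
          round(1.0 / (1.0 + math.exp(-theta)), 4))
est = simulate_fc(1.2, step, 20000, rng)
print("simulated FC at theta = 1.2:", round(est, 4))
```

Between multiples of the step the discrete accuracy stays flat and lies above the no-overshoot curve; the two agree exactly when $\theta$ is itself a multiple of the step.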

We next insert the values for $h_0$ and $E[W]$ computed above into equations 2.14 and 2.16 and plot the resulting speed-accuracy curves relating *DT* and *FC* parametrically in the threshold $\theta$ (see Figure 2B). (We plot the full *FC* and *DT* functions, although in practice only discrete values of performance along each of the lines are achievable, as indicated by the dots for one of the correlation values; see the caption.) By comparing speed-accuracy curves for values of $\rho$ ranging from 0 to 0.3, we see our first main result: introducing subtractive correlations within neural populations substantially diminishes the best possible decision performance, that obtained via the SPRT. We next derive the analogous results for the simpler spike integration model.

### 3.2. The Spike Integration Decision-Making Model.

For spike integration under subtractive correlations, we compute the expected increment $E[W]$:

$$E[W] = N\,\Delta t\,\left(\lambda_{\mathrm{pref}} - \lambda_{\mathrm{null}}\right). \tag{3.3}$$

The nontrivial root $h_0$ of the MGF is found to be the implicit solution of

$$\lambda_{\mathrm{pref}}\left[\left(1-\rho+\rho e^{h_0}\right)^N - 1\right] + \lambda_{\mathrm{null}}\left[\left(1-\rho+\rho e^{-h_0}\right)^N - 1\right] = 0. \tag{3.4}$$

Here we see that correlations affect the performance of the model only through $h_0$, as the expected increment is the same as in the independent case (see equation 2.22). Moreover, performance under spike integration is diminished to a degree comparable to the performance loss of the SPRT. To illustrate this, Figure 3A plots the speed-accuracy trade-off curves for both models of decision making under subtractive correlations, for the same values of $\rho$. As it must (Wald & Wolfowitz, 1948), the SPRT shows its optimal character: at a given level of accuracy, it requires, on average, fewer samples than spike integration. However, the difference is very slight. This yields our next main result: for the subtractive model of correlations across neural populations, nearly optimal decisions are produced by the simple operation of linear integration over time.
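The root $h_0$ for spike integration under subtractive correlations has no closed form in general, but it is easy to find numerically. Treating each pool's summed count as compound Poisson (mother spikes at rate $\lambda/\rho$, each copied to a Binomial$(N, \rho)$ number of neurons), $h_0$ solves $\lambda_{\mathrm{pref}}[(1-\rho+\rho e^{h_0})^N - 1] + \lambda_{\mathrm{null}}[(1-\rho+\rho e^{-h_0})^N - 1] = 0$. The Python sketch below brackets the negative root and bisects; it assumes $\lambda_{\mathrm{pref}} > \lambda_{\mathrm{null}}$, and the rates, pool size, and function name are illustrative.

```python
import math


def subtractive_si_h0(lam_pref, lam_null, n_neurons, rho, iters=200):
    """Negative root h0 of
        lam_pref*[(1-rho+rho*e^s)^N - 1] + lam_null*[(1-rho+rho*e^-s)^N - 1] = 0.
    Assumes lam_pref > lam_null, so the left side is negative just left of s = 0."""
    def pool_term(s):
        # (1 - rho + rho*e^s)^N - 1, computed with expm1/log1p for accuracy
        return math.expm1(n_neurons * math.log1p(rho * math.expm1(s)))

    def f(s):
        return lam_pref * pool_term(s) + lam_null * pool_term(-s)

    hi, lo = -1e-6, -1.0
    while f(lo) <= 0:          # widen the bracket until f changes sign
        lo *= 2
    for _ in range(iters):     # bisection, keeping f(lo) > 0 >= f(hi)
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam_pref, lam_null, n_neurons = 42.0, 38.0, 100    # illustrative values
for rho in (1e-6, 0.05, 0.1, 0.2, 0.3, 1.0):
    print(rho, subtractive_si_h0(lam_pref, lam_null, n_neurons, rho))
```

Two limits follow from the equation and serve as checks: as $\rho \to 0$ the root approaches the independent value $\ln(\lambda_{\mathrm{null}}/\lambda_{\mathrm{pref}})$, and at $\rho = 1$ it is $\ln(\lambda_{\mathrm{null}}/\lambda_{\mathrm{pref}})/N$, because every spike is copied to all $N$ neurons.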

Having established this, we pause to note a subtlety in our analysis. Figures 3B and 3C show *FC* and *DT* as functions of $\theta$, for both simulated data and equations 2.14 and 2.16. The solid lines are the graphs of those equations as written (using the values of $h_0$ and $E[W]$ in equations 3.4 and 3.3), and the mismatch between the lines and the data is a consequence of overshoot past the threshold. The broken line is a graph of the same formulas with $\theta$ shifted by an offset computed as the sample mean of the overshoot distribution (see Figure 6 and the discussion in appendix C; also Ghosh & Sen, 1991; Lee et al., 1994). This correction term helps the *FC* and *DT* equations better approximate the data when overshoot is possible. Interestingly, however, parametric plots like Figure 3A already take this effect into account, because a constant shift in $\theta$ moves *FC* and *DT* together along the same parametric curve.

## 4. Additive Correlations and Decision-Making Performance

### 4.1. The SPRT Decision-Making Model.

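Proceeding as for subtractive correlations, we compute the quantities $h_0$ and $E[W]$ that determine the *FC* and *DT* curves, as the window size $\Delta t \to 0$:

$$h_0 = -1, \tag{4.1}$$

$$E[W] = \Delta t\,\left[N(1-\rho) + \rho\right]\left(\lambda_{\mathrm{pref}} - \lambda_{\mathrm{null}}\right)\ln\frac{\lambda_{\mathrm{pref}}}{\lambda_{\mathrm{null}}}. \tag{4.2}$$

Comparing these with equations 2.19 and 2.20, we see that, as in the subtractive correlations model, the only difference from the independent case is a scaling factor on the average increment $E[W]$ in equation 4.2. To explain the form of the scale factor, note that the spike vector from each pool is composed of $N$ independent spike trains firing at rate $(1-\rho)\lambda$ and a single (highly redundant) spike train firing at rate $\rho\lambda$.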

As in the subtractive correlations model, $E_n$ here also becomes a discrete random walk, with increments $0$ and $\pm\ln(\lambda_{\mathrm{pref}}/\lambda_{\mathrm{null}})$. This can be seen by noting that in a sufficiently small window, only two kinds of events are possible: no spikes occur at all, or a single spike event occurs in one of the two pools (either a spike in one neuron from its independent train or a common spike shared by all neurons of that pool). The first case is uninformative about either $H_1$ or $H_0$. If the event is a spike in a given neuron of the preferred pool, for example, the second case occurs with probability $(1-\rho)\lambda_{\mathrm{pref}}\Delta t$ under $H_1$ and $(1-\rho)\lambda_{\mathrm{null}}\Delta t$ under $H_0$; for a common spike, the corresponding probabilities are $\rho\lambda_{\mathrm{pref}}\Delta t$ and $\rho\lambda_{\mathrm{null}}\Delta t$. Taking the log ratio, we find that the increment, $\ln(\lambda_{\mathrm{pref}}/\lambda_{\mathrm{null}})$, is independent of correlations. The resulting decision accuracy (*FC*) is plotted versus threshold in Figure 4A and is qualitatively similar to the subtractive correlations case, with plateaus following from the discrete nature of $E_n$. However, the speed-accuracy trade-off pictured in Figure 4B is very different from that found in the subtractive correlations model.

In particular, we see our third main result: the impact of additive correlations on optimal (SPRT) decision performance is relatively minor. For example, in the presence of pairwise correlations as strong as $\rho = \ldots$, the mean decision time required to reach a typical value of accuracy increases by only a few milliseconds compared with the independent case, instead of by hundreds of milliseconds as for subtractive correlations. Equation 4.2 offers an intuitive explanation for this fact: *DT* is inversely proportional to $E[W]$, and the additive scale factor $N(1-\rho)+\rho$ diminishes far more slowly with $\rho$ than the subtractive factor $[1-(1-\rho)^N]/\rho$ (see equation 3.2).
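The contrast between the two SPRT scale factors can be tabulated directly. The Python sketch below assumes the factors $N(1-\rho)+\rho$ (additive: $N$ private trains at rate $(1-\rho)\lambda$ plus one common train at rate $\rho\lambda$, each event worth the same evidence) and $[1-(1-\rho)^N]/\rho$ (subtractive). Because $h_0 = -1$ in both cases, *FC* depends only on $\theta$, so at fixed accuracy *DT* scales as $1/E[W]$. The pool size is illustrative and the function names are hypothetical.

```python
def additive_sprt_scale(n_neurons, rho):
    """N private trains at (1 - rho)*rate plus one common train at rho*rate."""
    return n_neurons * (1.0 - rho) + rho


def subtractive_sprt_scale(n_neurons, rho):
    """[1 - (1 - rho)^N] / rho, with the independent limit N at rho = 0."""
    if rho == 0:
        return float(n_neurons)
    return (1.0 - (1.0 - rho) ** n_neurons) / rho


n_neurons = 100   # illustrative pool size
for rho in (0.0, 0.05, 0.1, 0.2, 0.3):
    a = additive_sprt_scale(n_neurons, rho)
    s = subtractive_sprt_scale(n_neurons, rho)
    # at fixed FC, DT relative to the independent case is N / scale
    print(f"rho={rho:.2f}  additive {a:6.1f} (DT x{n_neurons / a:4.2f})  "
          f"subtractive {s:6.1f} (DT x{n_neurons / s:5.1f})")
```

For $N = 100$ and $\rho = 0.2$, for example, the additive factor is 80.2 against about 5 for the subtractive one, so at a fixed accuracy the SPRT slows by a factor of about 1.25 rather than about 20.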

### 4.2. The Spike Integration Decision-Making Model.

For spike integration under additive correlations, we compute the nontrivial root of the MGF, $h_0$, and the expected increment $E[W]$:

$$\lambda_{\mathrm{pref}}\left[N(1-\rho)\left(e^{h_0}-1\right) + \rho\left(e^{N h_0}-1\right)\right] + \lambda_{\mathrm{null}}\left[N(1-\rho)\left(e^{-h_0}-1\right) + \rho\left(e^{-N h_0}-1\right)\right] = 0,$$

$$E[W] = N\,\Delta t\,\left(\lambda_{\mathrm{pref}} - \lambda_{\mathrm{null}}\right).$$

By comparing with equation 2.22, we see that, as for spike integration in the subtractive case, correlations affect only the value of $h_0$, not the expected increment. Substituting these values into equations 2.14 and 2.16, we then plot the speed-accuracy trade-off curves for this model under the assumption of no overshoot in Figure 5A. When decisions are made by spike integration, correlations have a significant impact on performance (black lines), in contrast to the SPRT case (solid gray lines, reproduced from Figure 4B). Overall, the degree of performance loss is comparable to that under subtractive correlations (broken gray lines, reproduced from Figure 3B). This is our fourth main result: for additive correlations, if decisions are made by spike integration instead of the SPRT, correlations substantially reduce decision performance.
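As in the subtractive case, $h_0$ for spike integration must be found numerically. Here each pool's summed count is a Poisson count from the private trains plus $N$ times a Poisson count from the common train, which leads to the condition stated in the docstring below. The Python sketch assumes $\lambda_{\mathrm{pref}} > \lambda_{\mathrm{null}}$, illustrative rates, and a moderate pool size ($N$ up to a few hundred) so that $e^{N}$ does not overflow; the function name is hypothetical.

```python
import math


def additive_si_h0(lam_pref, lam_null, n_neurons, rho, iters=200):
    """Negative root h0 of
        lam_pref*[N(1-rho)(e^s - 1) + rho(e^{Ns} - 1)]
      + lam_null*[N(1-rho)(e^{-s} - 1) + rho(e^{-Ns} - 1)] = 0.
    Assumes lam_pref > lam_null and N small enough that exp(N) is finite."""
    def pool_term(s):
        # cumulant generating function of one pool's summed count, per unit rate*dt
        return n_neurons * (1.0 - rho) * math.expm1(s) + rho * math.expm1(n_neurons * s)

    def f(s):
        return lam_pref * pool_term(s) + lam_null * pool_term(-s)

    hi, lo = -1e-6, -1.0
    while f(lo) <= 0:          # widen the bracket until f changes sign
        lo *= 2
    for _ in range(iters):     # bisection, keeping f(lo) > 0 >= f(hi)
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam_pref, lam_null, n_neurons = 42.0, 38.0, 100    # illustrative values
for rho in (1e-12, 0.05, 0.1, 0.2, 0.3, 1.0):
    print(rho, additive_si_h0(lam_pref, lam_null, n_neurons, rho))
```

The magnitude of $h_0$, and with it the accuracy $FC = 1/(1+e^{h_0\theta})$ at a fixed threshold, falls quickly as $\rho$ grows, while $E[W]$ stays fixed. Note that the $\rho \to 0$ limit is approached only for very small $\rho$, because the $e^{-N h_0}$ term can dominate even when $\rho$ is tiny.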

However, the assumption that integrated spikes do not overshoot the decision threshold is suspect under the additive model of correlations, because the threshold-crossing step may result from every neuron in a pool spiking simultaneously. Indeed, when the number of neurons in the pool is large (as in the cases we consider), additive correlations can cause significant overshooting of thresholds. Importantly, and unlike for subtractive correlations, this effect cannot be compensated by a constant offset in the decision threshold.

Figure 5B demonstrates the consequences for the speed-accuracy trade-off. Here, when the spike integration model is simulated directly, we see a surprising nonmonotonic relationship between *FC* and *DT* in the presence of additive correlations of strength $\rho = \ldots$. This violates the usual intuition that accuracy should increase as decisions become slower. The explanation is that as the decision threshold is raised, *DT* increases correspondingly while accuracy suffers, because a trial becomes less likely to finish before a (relatively rare) spike in the correlating spike train of one of the two pools causes the accumulator to jump far beyond the threshold.

For large thresholds, the sequential sampling theory of equations 2.14 and 2.16, which assumes no overshoot, accurately approximates the simulated data; however, for low values of $\theta$, the approximation is poor. In fact, the inset to Figure 5B shows that in this regime, the decision-making performance of the spike integration model is far better described by the theory for the SPRT. The intuition behind this observation is that for short reaction times, there is only a small probability of a shared spike that would send the integrator far past the threshold. Accumulation therefore occurs one spike at a time (for sufficiently small $\Delta t$), with each spike arriving from an independent spike train. As we have seen, integrating independent spikes is equivalent to the SPRT. It is only at longer decision times, when the chance of having integrated a large common spike event is greater, that a significant impact of correlations appears.

Figure 6 provides further evidence for this scenario. Density plots of the distribution of the overshoot (conditioned on crossing the upper threshold) for both additive and subtractive correlations are shown as a function of the decision threshold, with particular overshoot distributions plotted at $\theta = \ldots$ and $\theta = 250$. For the additive correlations model, a significant fraction of the trials terminate with zero overshoot at low values of $\theta$ (because large correlating events are relatively rare), implying that many trials underwent optimal accumulation of evidence without experiencing a common, correlating spike event, as discussed above.

Overall, the monotonic dependence of accuracy (*FC*) on decision time (*DT*) follows from the invariance of the moments of the overshoot distribution under changes in the threshold value $\theta$; this is particularly true for the first moment (see appendix C). Figure 7 (Additive) demonstrates that for the additive correlations model, these moments continue to fluctuate over a larger range of $\theta$, and with larger magnitude. This explains the unusual shape of the speed-accuracy trade-off curve pictured in Figure 5B, which (unlike that of the subtractive correlations model) cannot be explained by a constant shift in $\theta$.

## 5. Nonlinear Computations and Optimal Performance via the SPRT

When the neurons in each pool spike independently, Zhang and Bogacz (2010) demonstrated that linear summation of spikes across the two pools at each time step implements the SPRT. Because the SPRT is optimal in the sense of minimizing *DT* for a prescribed level of *FC*, the conclusion is that linear integration of spikes across pools, and then across time, provides an optimal decision-making strategy. However, is this optimality of linear integration confined to the case of independent activity within the pool?

Above, we showed that when correlations are introduced into this model, it is no longer true that each spike should be given the same “weight,” as in linear integration. Moreover, knowing only the pairwise correlations and firing rates does not allow one to write down the function that should be applied to incoming spikes in order to implement the SPRT, although in both of our models this function takes the form of a difference: the same nonlinearity is applied to the activity of each pool, and the two results are subtracted. This dependence on higher-order statistics is demonstrated in Figure 7 by the fact that the nonlinearities for subtractive correlations (panel B) and additive correlations (panel C) take significantly different forms.

*N*) increases.
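One way to see why additive correlations call for a nonlinear readout is to compute the log-likelihood ratio of a single pool's summed spike count under the two candidate rates. The Python sketch below does this for the additive model, under the simplifying assumption that only the pool count (not the full spike vector) is observed in each bin; it is not claimed to reproduce the nonlinearities plotted in the figures, and the rates, bin width, pool size, and function names are illustrative assumptions.

```python
import math


def log_poisson_pmf(k, mean):
    """log P(K = k) for K ~ Poisson(mean); handles mean = 0."""
    if mean == 0:
        return 0.0 if k == 0 else -math.inf
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def log_additive_pool_pmf(n, rate, n_neurons, rho, dt):
    """log P(pool count = n) when the count is I + N*Z, with
    I ~ Poisson(N (1 - rho) rate dt) (private spikes) and
    Z ~ Poisson(rho rate dt) (common events copied to all N neurons)."""
    a = rho * rate * dt
    b = n_neurons * (1.0 - rho) * rate * dt
    terms = [log_poisson_pmf(j, a) + log_poisson_pmf(n - n_neurons * j, b)
             for j in range(n // n_neurons + 1)]
    top = max(terms)                       # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def pool_nonlinearity(n, lam_pref, lam_null, n_neurons, rho, dt):
    """psi(n): log-likelihood ratio (preferred vs. null rate) of a pool count n,
    measured relative to n = 0. An SPRT-like increment for two pools with
    counts n_a and n_b would be psi(n_a) - psi(n_b)."""
    def llr(k):
        return (log_additive_pool_pmf(k, lam_pref, n_neurons, rho, dt)
                - log_additive_pool_pmf(k, lam_null, n_neurons, rho, dt))
    return llr(n) - llr(0)


lam_pref, lam_null, n_neurons, rho, dt = 42.0, 38.0, 100, 0.2, 0.001   # illustrative
for n in (1, 2, 3, 100, 101, 102):
    print(n, round(pool_nonlinearity(n, lam_pref, lam_null, n_neurons, rho, dt), 4))
```

In this example $\psi$ grows by about $\ln(\lambda_{\mathrm{pref}}/\lambda_{\mathrm{null}})$ per spike for small counts, but a count of exactly $N$, which is far more likely to come from one common event than from $N$ private spikes, is worth only about one spike's evidence. Linear spike integration would instead weight it $N$ times.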

## 6. Discussion

Correlated spiking among the neurons that encode sensory evidence appears ubiquitous. Such correlations might arise from any number of neuroanatomical features, the simplest being overlapping feedforward connectivity, which can cause collective fluctuations across a population (Binder & Powers, 2001; Shadlen & Newsome, 1998; De La Rocha, Doiron, Shea-Brown, Josić, & Reyes, 2007; Mazurek et al., 2003). They can also result from sensory events that affect an entire population or from rapid modulatory effects. Moreover, for large neural populations, it appears that accurate descriptions of population-wide activity can require not only the typically measured pairwise correlations but higher-order correlations as well (Montani et al., 2009; Ganmor et al., 2011; Yu et al., 2011).

The aim of our study is to improve our understanding of how correlated activity in these populations can affect the speed and accuracy of decisions that require accumulating sensory information over time. Faced with the wide range of possible mechanisms and structures of correlations alluded to above, we chose to focus on two models for population-wide correlations that illustrate a key distinction in how correlations can occur. These models have identical first-order and pairwise statistics, but differ in how each common spiking event involves either a small subset of the neurons (the subtractive case) or each neuron in the pool (the additive correlations case) (Kuhn et al., 2003; Staude et al., 2010).

Figure 9 quantifies this difference. Based on calculations in appendix D, we plot the joint cumulant across *k* neurons in a pool under both subtractive and additive correlations. This statistic, computed over a subset of the neurons in a pool, provides a generalization of the notion of covariance to higher orders. In this way, it can be used to distinguish the collective activity resulting from the additive and subtractive models. While the additive model possesses a constant joint cumulant no matter how many neurons are included, the joint cumulant of *k* neurons falls off geometrically for the subtractive case. We conjecture that this is a statistical signature that could suggest when other, more general patterns of correlated activity—measured experimentally or arising in mechanistic models of neural circuits (Mazurek et al., 2003)—will produce similar effects on decisions. Exploring this conjecture using models and data is a target of our future research.
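The contrast can be made concrete with a short calculation. Under our reading of the two constructions (private trains at rate $(1-\rho)\lambda$ plus a common train at $\rho\lambda$; or a mother train at $\lambda/\rho$ thinned with probability $\rho$), standard Poisson and compound-Poisson cumulant identities give, for $k \geq 2$ distinct neurons in a window of length $T$, a joint cumulant of $\rho\lambda T$ for the additive model and $\lambda T\rho^{k-1}$ for the subtractive model. The Python sketch below evaluates these and checks $k = 3$ by simulation; it is not the calculation of appendix D, and the parameters and function names are illustrative assumptions.

```python
import random


def poisson_draw(mean, rng):
    """Poisson variate: number of arrivals in [0, 1) of a process with rate `mean`."""
    if mean <= 0:
        return 0
    count, t = 0, rng.expovariate(mean)
    while t < 1.0:
        count += 1
        t += rng.expovariate(mean)
    return count


def joint_cumulant_additive(k, mean_count, rho):
    """k >= 2 distinct neurons: only the shared train contributes, and every
    cumulant of a Poisson variable equals its mean (rho * mean_count)."""
    return rho * mean_count


def joint_cumulant_subtractive(k, mean_count, rho):
    """Compound Poisson: (mother mean) * E[product of k keep-indicators]
    = (mean_count / rho) * rho**k."""
    return mean_count * rho ** (k - 1)


def third_joint_cumulant(samples):
    """Sample estimate of E[(X1 - m1)(X2 - m2)(X3 - m3)] from count triples."""
    n = len(samples)
    means = [sum(s[i] for s in samples) / n for i in range(3)]
    return sum((s[0] - means[0]) * (s[1] - means[1]) * (s[2] - means[2])
               for s in samples) / n


rng = random.Random(3)
mean_count, rho, trials = 5.0, 0.3, 40000      # mean_count = rate * T (illustrative)
additive, subtractive = [], []
for _ in range(trials):
    common = poisson_draw(rho * mean_count, rng)
    additive.append([poisson_draw((1 - rho) * mean_count, rng) + common for _ in range(3)])
    mother = poisson_draw(mean_count / rho, rng)
    subtractive.append([sum(rng.random() < rho for _ in range(mother)) for _ in range(3)])

k3_add = third_joint_cumulant(additive)
k3_sub = third_joint_cumulant(subtractive)
for k in (2, 3, 4, 5):
    print(k, joint_cumulant_additive(k, mean_count, rho),
          round(joint_cumulant_subtractive(k, mean_count, rho), 4))
print("Monte Carlo, k = 3:", round(k3_add, 2), round(k3_sub, 2))
```

At $k = 2$ the two expressions coincide (both equal the pairwise covariance $\rho\lambda T$), while for larger $k$ the subtractive cumulant shrinks by a factor of $\rho$ per added neuron and the additive one stays constant.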

We summarize our main findings as follows. For both models of correlated spiking, decisions produced by a simple, linear spike integration model (i.e., a neural integrator) become slower and less accurate as correlations increase. However, a strong difference appears for decisions made using the optimal decision strategy (SPRT). Here, additive correlations have only a minor impact on decision performance, while subtractive correlations continue to strongly diminish this performance. The conclusion is that decision-making circuits faced with sensory populations that have subtractive correlations will invariably produce diminished decision performance and stand to gain little by implementing computations more complex than a simple integration of spikes over time and neurons. However, in the presence of additive correlations, circuit mechanisms that implement or approximate the SPRT (perhaps by a nonlinearity such as that shown in Figure 8 applied to the sum of incoming spikes) stand to produce substantially better decision performance than their linear counterparts.

In other contexts, nonlinear computations have also been shown to improve discrimination between two alternatives. Field and Rieke (2002; also see Field, Sampath, & Rieke, 2005) demonstrated the importance of a thresholding nonlinearity in pooling the responses of rod cells, where this nonlinearity served to reject background noise. Closer to the present setting, gating inhibition that prevents accumulation of noise samples before the onset of an evidence-encoding stimulus can account for visual search performance (Purcell, Schall, Logan, & Palmeri, 2012), and recent results suggest that related nonlinearities can improve performance for mistuned neural integrators (Cain, Barreiro, Shadlen, & Shea-Brown, 2011; see also Cain & Shea-Brown, 2012).

Our cases in which correlations decrease performance (in particular, when spikes are linearly integrated) are consistent with several prior studies of the role of correlated activity in decision making (Zohary et al., 1994; Britten, Newsome, Shadlen, Celebrini, & Movshon, 1996; Cohen & Newsome, 2009). We note, however, two differences between our models and theirs. The first is the mechanism through which correlated spikes are generated: whereas we use additive and subtractive models based on Poisson processes, Britten et al. (1996) and Cohen and Newsome (2009) use a multivariate gaussian description of spike counts. The second is that in these studies, decisions are rendered after a duration that is fixed before the trial begins: either a single duration (Britten et al., 1996) or one that is drawn from a distribution of reaction times (Cohen & Newsome, 2009). This is different from our setting, where incoming signals on each trial determine the reaction time through a bound crossing.

Our result, in the case of subtractive correlations, that linear integration of spikes closely approximates the optimal decision-making strategy is similar to the findings of Beck et al. (2008). Specifically, they model a dense range of differently tuned populations and find that optimal Bayesian inference can be based on linear integration of inputs for a wide set of correlation models. Our additive case, however, behaves differently, as nonlinearities are needed to achieve the optimal strategy.

An aim of future work is extending the setting of our study to include orientation tuning curves as in Cohen and Newsome (2009) and Beck et al. (2008). This is more realistic for many decision tasks (including the direction discrimination task) and will also allow progress toward models with multiple decision alternatives. An important challenge will come from defining pairwise correlations that vary as a function of preferred tuning orientation (see Zohary et al., 1994, and Cohen & Newsome, 2008), while also including the full structure of correlating events across multiple cells in a realistic way. For example, in this article, additive correlating events occurred independently in the two populations; future work could take a more graded approach, in which only some events have an impact on the entire sensory population (as in an eyeblink or possibly an attentional shift during a visual task).

Moreover, cells with similar orientation preferences would be more likely to share additive common spike events (these will occur with a frequency determined by their pairwise correlations, which can be higher for cells with similar response properties: Zohary et al., 1994; Cohen & Newsome, 2008).

At least from the perspective of the simple class of decoders that arose in this article, which combine the total population output of all cells that favor either of the two task alternatives, this raises interesting complexities. Common spike events would involve different subsets of cells. Thus, correlated spiking will be more graded and could be harder to discount than in the simpler homogeneous case we study. We therefore speculate that additive correlations will have a stronger impact on decision-making performance for heterogeneous populations. However, richer decoders with nonlinearities that act separately on the output of different neurons could mitigate this. The issue will require careful quantitative study before solid conclusions can be reached.

As long as each neuron remains modeled as a Poisson point process, the sequential accumulation theory used here will carry over directly. For example, models that introduce correlations by common gain fluctuations would provide a multiplicative model of joint fluctuations and may be amenable to our approach. This points to another limitation of this study and an opportunity for future work: the lack of temporal correlations in the statistics of the inputs. A model of correlations that includes spikes from a correlating train that are temporally jittered (Gutnisky & Josić, 2010; Staude et al., 2010) could provide a starting place for a model of the input trains; however, defining updates to the likelihood ratio for the two competing hypotheses will be more difficult. Nevertheless, it will be interesting to see how our results carry over; in particular, there will be many more combinations of spike events that will contribute to increments for both spike integration and SPRT decision models.

While we therefore view this study as a first step in exploring many possibilities, our findings demonstrate how the population-wide structure of correlations—beyond pairwise correlation coefficients—can have a strong impact on the speed and accuracy of decisions and the circuit operations necessary to achieve optimal performance. This suggests that multielectrode and imaging technologies, together with theoretical work on neural coding, will continue to play an exciting role in understanding the structure of basic computations like decision making over time.

## Appendix A: Sequential Probability Ratio Test

### A.1. Nontrivial Root of the Moment-Generating Function.

The nontrivial root $h_0$ of the moment-generating function (MGF) of the increment distribution is required to compute *FC* and *DT* of an independently sampled sequential hypothesis test (via equations 2.14 and 2.16). For the SPRT, the increment distribution is given in equation 2.18 as

$$W^i = \log \frac{P(S^i_p, S^i_n \mid H_1)}{P(S^i_p, S^i_n \mid H_0)}.$$

The “correct” hypothesis $H_1$ is in the numerator in order to orient a crossing of the positive decision threshold with a correct choice. Correspondingly, the probability of observing a given sample $(S^i_p, S^i_n)$ is known under the assumption of this hypothesis and by definition follows the distribution

$$P(S^i_p, S^i_n \mid H_1) = P(S^i_p \mid H_1)\, P(S^i_n \mid H_1),$$

where the independence of the spike count vectors $S^i_p$ and $S^i_n$ from the two separate pools has allowed the factoring of the distribution. Dropping the sampling index $i$ for notational convenience, the MGF can then be computed as

$$M(t) = E_{H_1}\!\left[e^{tW}\right] = \sum_{s_p,\, s_n} P(s_p, s_n \mid H_1)^{1+t}\, P(s_p, s_n \mid H_0)^{-t}.$$

The nontrivial root ($h_0 = -1$) can then be seen by inspection (see equation 2.19):

$$M(-1) = \sum_{s_p,\, s_n} P(s_p, s_n \mid H_0) = 1.$$

We note that this computation is fully general, without any assumptions on the structure of correlations within and across pools.
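The inspection argument can be checked numerically for distributions with arbitrary correlations. The sketch below uses hypothetical names, and its probability tables are illustrative values, not data. It defines two joint PMFs over the spike count vector of a two-neuron pool and takes the SPRT increment to be the log likelihood ratio with $H_1$ in the numerator. It then evaluates the MGF of that increment under $H_1$.

```python
import math

# Illustrative joint PMFs of a two-neuron spike count vector under H1 and H0;
# keys are (count of neuron 1, count of neuron 2), values are deliberately correlated.
P1 = {(0, 0): 0.50, (0, 1): 0.10, (1, 0): 0.10, (1, 1): 0.30}
P0 = {(0, 0): 0.60, (0, 1): 0.15, (1, 0): 0.15, (1, 1): 0.10}


def sprt_mgf(t, p1, p0):
    """E_H1[exp(t * W)] with W = log(p1(s) / p0(s)); needs p0(s) > 0 where p1(s) > 0."""
    return sum(p1[s] * math.exp(t * math.log(p1[s] / p0[s])) for s in p1)


def sprt_mean_increment(p1, p0):
    """E_H1[W], the Kullback-Leibler divergence KL(p1 || p0)."""
    return sum(p1[s] * math.log(p1[s] / p0[s]) for s in p1)


print(sprt_mgf(0.0, P1, P0), sprt_mgf(-1.0, P1, P0))  # both 1 up to rounding
print(sprt_mgf(-0.5, P1, P0))                         # below 1 between the roots
print(sprt_mean_increment(P1, P0))                    # positive
```

The MGF is convex in *t*, so it lies below 1 between *t* = 0 and the nontrivial root, as the value at *t* = -0.5 illustrates.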

### A.2. *E[w]*, Independent Activity


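The other parameter of the *FC* and *DT* functions, the expected increment *E*[*W*], is computed for independent spike count distributions (no correlations; see equation 2.20) as follows (see also Zhang & Bogacz, 2010). Because the log ratio in equation 2.18 then splits into a sum over neurons, *E*[*W*] is the sum, over the neurons of both pools, of the Kullback-Leibler divergence between each neuron's spike count distributions under $H_1$ and $H_0$. For Poisson spike counts, each of these divergences is proportional to the sampling interval. When this quantity is substituted into equation 2.16, the sampling interval cancels, implying that *DT* is not a function of the sampling increment size. We compute this quantity for correlated spike count distributions next.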
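The following numerical check of this cancellation is a sketch under stated assumptions. It assumes Poisson spike counts with hypothetical rates `rate_pref` for the preferred pool and `rate_null` for the null pool under $H_1$ (swapped under $H_0$), with `n` neurons per pool. For the decision time, it uses Wald's no-overshoot expression with symmetric thresholds at $\pm\theta$ and $h_0 = -1$. The function `closed_form_increment` is our own evaluation of the summed per-neuron divergences, not an equation quoted from the main text.

```python
import math


def poisson_pmf(k, mean):
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def kl_poisson(a, b, kmax=100):
    """KL(Poisson(a) || Poisson(b)) by truncated summation over counts."""
    # log[pmf(k; a) / pmf(k; b)] = k * log(a / b) - a + b
    return sum(poisson_pmf(k, a) * (k * math.log(a / b) - a + b) for k in range(kmax))


def sprt_mean_increment(n, rate_pref, rate_null, dt):
    """E[W] under H1: per-neuron divergences summed over both independent pools."""
    a, b = rate_pref * dt, rate_null * dt
    return n * (kl_poisson(a, b) + kl_poisson(b, a))


def closed_form_increment(n, rate_pref, rate_null, dt):
    return n * dt * (rate_pref - rate_null) * math.log(rate_pref / rate_null)


def sprt_decision_time(theta, mean_w, dt):
    """Wald, no overshoot, bounds at +/-theta, h0 = -1: dt * E[X] / E[W]."""
    fc = 1.0 / (1.0 + math.exp(-theta))
    return dt * theta * (2.0 * fc - 1.0) / mean_w


for dt in (0.001, 0.01):
    mean_w = sprt_mean_increment(10, 40.0, 30.0, dt)
    print(dt, mean_w, sprt_decision_time(2.0, mean_w, dt))
```

The loop should print different mean increments for the two sampling intervals but the same decision time, because the mean increment scales with the sampling interval.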

### A.3. *E[w]*, Additive Correlations Model


When neurons within pools are correlated, the joint probability mass function of the spike count vector is no longer decomposable into the product of the marginal distributions (the critical step between equations A.8 and A.9). However, an expression for *E*[*W*] can be obtained in the limit of a short sampling interval by repeatedly expanding in Taylor series about a zero-length interval throughout the computation.

Conditioned on the number of spikes in the common spike train, the distribution of $\mathbf{s}_p$ becomes the product of the conditioned marginal distributions. However, this is true only for the first factor in the summand of equation A.16. To continue, we must expand the log ratio of the probability distributions in the sampling interval, using the law of total probability. Moreover, the *N*-term summation in equation A.16 need only be over spike counts $s_i \in \{0, 1\}$, as higher values will produce contributions of higher than first order in the sampling interval. Two cases emerge for the expansion. If $s_i = 0$ for any $i$, the common train cannot have spiked (each of its spikes appears in every neuron of the pool), which simplifies the log ratio. But if $s_i = 1$ for all $i$, we can compute the expression directly by total probability, as there are only four possible ways for the event to originate. To first order in the sampling interval, this single element of the sum offers no order-one contribution (it is multiplied by the probability of the event, which is itself first order in the sampling interval). The same computation applies to the null pool after exchanging *p* for *n*; adding everything together gives the final result, to first order in the sampling interval. We note here that in the limit of vanishing correlations, we reproduce the results that would be expected from equation A.11. Also, a more intuitive and tractable computation can be done for an analogous additively correlated Bernoulli process, resulting in the same solution.
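For a small pool, the law-of-total-probability step can be written out exactly, without the short-interval expansion. This sketch assumes that the additive model adds a shared Poisson train with rate $c\lambda$ to private trains with rate $(1-c)\lambda$, where $\lambda$ is a neuron's firing rate and *c* the pairwise correlation coefficient. The function names are hypothetical. The sketch compares the log likelihood ratio for two count vectors with the same total, (2, 0) and (1, 1).

```python
import math


def pois(k, mean):
    """Poisson PMF; mean may be 0 (then all mass is at k = 0)."""
    return mean**k * math.exp(-mean) / math.factorial(k)


def additive_pool_pmf(counts, rate, dt, c):
    """Joint PMF of one pool's count vector under the assumed additive model.

    Law of total probability over m, the number of shared-train spikes: each
    shared spike appears in every neuron, so m <= min(counts), and given m the
    remaining counts come from independent private trains.
    """
    shared, private = c * rate * dt, (1 - c) * rate * dt
    return sum(
        pois(m, shared) * math.prod(pois(s - m, private) for s in counts)
        for m in range(min(counts) + 1)
    )


def pool_llr(counts, rate_h1, rate_h0, dt, c):
    """One pool's contribution to the SPRT increment: log P(counts|H1) / P(counts|H0)."""
    return math.log(additive_pool_pmf(counts, rate_h1, dt, c)
                    / additive_pool_pmf(counts, rate_h0, dt, c))


for c in (0.0, 0.3):
    print(c, pool_llr((2, 0), 40.0, 30.0, 0.01, c), pool_llr((1, 1), 40.0, 30.0, 0.01, c))
```

With `c = 0.0`, the two log ratios coincide, because independent Poisson likelihood ratios depend only on the summed count. With `c = 0.3` they differ, since (1, 1) could have come from a single shared spike. This is one way to see that, under additive correlations, the SPRT increment is no longer a function of the summed spike count alone.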

### A.4. *E[w]*, Subtractive Correlations Within Pools


The computation of *E*[*W*] is the same as for the additive correlations model, up to equation A.14. In this case, however, the spike count distribution is that of the subtractive model. Taking the case of no spikes in the correlating train first, we notice that it is impossible for any spikes to occur without a spike in the correlating spike train, which simplifies the expression. Interestingly, after conditioning on the number of correlating spikes, the probability of the zero vector (or any vector $\mathbf{s}_p$) is the same under both $H_0$ and $H_1$. We then expand to first order in the sampling interval. In the case of no correlating spikes, only zero-order terms must be computed. In the case of a single correlating spike, the summation only carries over the values 0 and 1 for each element of $\mathbf{s}_p$. The case $\mathbf{s}_p = \mathbf{0}$ provides no contribution at zero order, as can be seen by equation A.38. For any other case, there will be a degeneracy in the expansion of the log, caused by an absence of order-0 terms. Therefore, to first order in the sampling interval, combining equations A.32, A.35, A.38, and A.44 gives the expression for the preferred pool. As before, exchanging *p* for *n* takes care of the expression for the null pool, and adding the two gives the final result. Once again, in the limit of vanishing correlations, we reproduce the results that would be expected from equation A.11.
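An exact counterpart for the subtractive model can be sketched in the same way. The sketch assumes that the spikes of a Poisson correlating (mother) train with rate $\lambda/c$ are copied to each neuron independently with probability *c*, and that *c* is the same under $H_0$ and $H_1$. The function names are hypothetical. Only mother-spike counts at least as large as the largest neuron count contribute. Given the mother count, the pool's count vector has the same probability under both hypotheses, so the hypotheses differ only through the mother train.

```python
import math


def pois(k, mean):
    """Poisson PMF computed in log space (mean > 0), safe for large k."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def subtractive_pool_pmf(counts, rate, dt, c, m_max=150):
    """Joint PMF of one pool's count vector under the assumed subtractive model.

    Law of total probability over m, the number of correlating (mother) spikes,
    which is Poisson with mean rate * dt / c. No neuron can spike without a
    mother spike, so only m >= max(counts) contributes. Given m, each count is
    Binomial(m, c); with c fixed, this factor is the same under H0 and H1.
    """
    total = 0.0
    for m in range(max(counts), m_max):
        given_m = math.prod(math.comb(m, s) * c**s * (1 - c) ** (m - s) for s in counts)
        total += pois(m, rate * dt / c) * given_m
    return total


def pool_llr(counts, rate_h1, rate_h0, dt, c):
    """log P(counts | H1) / P(counts | H0) for one pool."""
    return math.log(subtractive_pool_pmf(counts, rate_h1, dt, c)
                    / subtractive_pool_pmf(counts, rate_h0, dt, c))


print(pool_llr((0, 0, 0), 40.0, 30.0, 0.01, 0.3))
print(pool_llr((1, 1, 1), 40.0, 30.0, 0.01, 0.3))
```

For the zero vector in a pool of `N` neurons, the sum over mother counts has the closed form `exp(-(rate * dt / c) * (1 - (1 - c) ** N))`. This gives a convenient check of the truncated series.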

## Appendix B: Spike Integration

### B.1. Independent Spiking.

Computing *FC* and *DT* for the spike integration accumulation model relies on computation of the MGF for the sampling distribution. We begin with several identities that will be useful below. The MGF for the sum of *N* independent random variables is the product of their individual MGFs:

$$M_{X_1 + \cdots + X_N}(t) = \prod_{j=1}^{N} M_{X_j}(t).$$

Given that the MGF for a scaled random variable $aX$ is $M_{aX}(t) = M_X(at)$, it follows that $M_{-X}(t) = M_X(-t)$. Finally, the MGF for a Poisson random variable with mean $\mu$ is

$$M_X(t) = \exp\big[\mu\,(e^{t} - 1)\big].$$

Given the definition of the increment variable in equation 2.17 and noting that each spike count random variable is independent, we can combine these observations to construct the MGF for the sampling random variable over a time window. Setting this MGF equal to one, the nontrivial root can then be calculated (see equation 2.21). Because the MGF is known explicitly, the computation of the expected increment is simple (see equation 2.22).
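These identities combine into a closed form that can be checked in a few lines. The sketch assumes that the increment of equation 2.17 is the summed spike count of the preferred pool minus that of the null pool, with `n` neurons per pool. The rate names `rate_pref` and `rate_null` are hypothetical and illustrative, not the main text's notation.

```python
import math


def increment_mgf(t, n, rate_pref, rate_null, dt):
    """MGF of w = (summed preferred counts) - (summed null counts), independent Poisson.

    The product rule for independent sums, M_{-X}(t) = M_X(-t), and the Poisson
    MGF exp(mean * (e^t - 1)) combine into a single exponential.
    """
    return math.exp(n * dt * (rate_pref * (math.exp(t) - 1)
                              + rate_null * (math.exp(-t) - 1)))


def nontrivial_root(rate_pref, rate_null):
    # Exponent = 0 is a quadratic in e^t with roots 1 and rate_null / rate_pref.
    return math.log(rate_null / rate_pref)


def mean_increment(n, rate_pref, rate_null, dt):
    # Derivative of the MGF at t = 0.
    return n * dt * (rate_pref - rate_null)


h0 = nontrivial_root(40.0, 30.0)
print(h0, increment_mgf(h0, 10, 40.0, 30.0, 0.01), mean_increment(10, 40.0, 30.0, 0.01))
```

Setting the exponent to zero gives a quadratic in $e^t$ whose roots are 1 (the trivial root *t* = 0) and `rate_null / rate_pref`. So $h_0$ depends only on the ratio of the rates, not on the pool size or the time window.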

### B.2. Additive Correlations Model.

In the additive correlations model, the pooled spike count of each population is the sum of two parts: spikes from the common train, which get counted *N* times, and spikes from the *N* independent spike trains, which get counted once each. The MGF for the shared spike train can be computed directly from the definition, using its probability mass function (PMF). Because each shared spike is counted *N* times, the shared train contributes $M_C(Nt)$ to the MGF of the pooled count, where $M_C$ is the MGF of its spike count. The MGF for the independent spike trains follows from section B.1, giving the form of the MGF of the increment over a time window. After rearranging, $h_0$ is implicitly defined as the nontrivial root of the resulting equation. In the limit of vanishing correlations, we recover the solution from section B.1. The expected increment can be directly computed by differentiating the MGF at zero. This expression is the same as in the independent case (see equation 2.22), as expected. Unlike for the SPRT (section A.3), no limits were necessary to compute the parameters for the *FC* and *DT* functions.
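The implicit root can be found numerically. This sketch assumes the additive construction above, with a shared-train rate of `c * rate` and private-train rates of `(1 - c) * rate`, where *c* is the pairwise correlation coefficient. All names are hypothetical. The MGF of the increment is the exponential of the time window times a function of *t*, so the root does not depend on the window, and the sketch works with that function directly. Bisection suffices because the function is convex, vanishes at *t* = 0, and has a positive slope there.

```python
import math


def additive_exponent(t, n, rate_pref, rate_null, c):
    """log MGF of the integrator increment per unit time (assumed additive model).

    Each pool: a shared train (rate c * rate) whose spikes are counted n times,
    plus n private trains (rate (1 - c) * rate) counted once. The null pool's
    counts are subtracted, so it enters with t -> -t.
    """
    def pool(rate, sign):
        return (c * rate * (math.exp(sign * n * t) - 1)
                + n * (1 - c) * rate * (math.exp(sign * t) - 1))
    return pool(rate_pref, 1.0) + pool(rate_null, -1.0)


def negative_root(f, tol=1e-12):
    """Bisection for the root h0 < 0 of a convex f with f(0) = 0 and f'(0) > 0."""
    lo, hi = -1.0, -1e-9          # f(hi) < 0 just left of the origin
    while f(lo) <= 0:             # step left until f is positive again
        lo *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def h0_additive(n, rate_pref, rate_null, c):
    # The MGF is exp(window * exponent), so the root is independent of the window.
    return negative_root(lambda t: additive_exponent(t, n, rate_pref, rate_null, c))


for c in (0.0, 0.1, 0.3):
    print(c, h0_additive(10, 40.0, 30.0, c))
```

With `c = 0.0` the root equals `log(rate_null / rate_pref)`, as in section B.1. Increasing `c` moves the root toward zero, so for a fixed threshold the predicted fraction correct of the integrator falls.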

### B.3. Subtractive Correlations Model.

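In the subtractive correlations model, each pooled spike count follows a binomial distribution once we condition on the number of spikes in the correlating train. Averaging the MGFs of these binomial distributions over the Poisson-distributed number of correlating spikes gives the MGF of the pooled count. After applying equation B.7 with this MGF for both the preferred and null populations, we find an implicit relationship for the nontrivial real root $t = h_0$ that does not depend on the length of the time window. As before, the expected increment can be directly computed by differentiation, and we find the same expression as in the additive correlation case.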
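The same bisection applies to the subtractive model. This sketch assumes that the correlating train fires at `rate / c` and that each of its spikes is copied to each neuron independently with probability `c`. Given *k* correlating spikes, a pool's summed count is then binomial with `n * k` trials. The names are hypothetical. Averaging the binomial MGF over the Poisson count of correlating spikes gives the exponent used below, and the time window again factors out.

```python
import math


def subtractive_exponent(t, n, rate_pref, rate_null, c):
    """log MGF of the integrator increment per unit time (assumed subtractive model).

    Given k correlating spikes, a pool's summed count is Binomial(n * k, c); with
    k ~ Poisson(window * rate / c), the pool's log MGF is
    window * (rate / c) * ((1 - c + c * e^t) ** n - 1).
    """
    def pool(rate, sign):
        return rate / c * ((1 - c + c * math.exp(sign * t)) ** n - 1)
    return pool(rate_pref, 1.0) + pool(rate_null, -1.0)


def negative_root(f, tol=1e-12):
    """Bisection for the root h0 < 0 of a convex f with f(0) = 0 and f'(0) > 0."""
    lo, hi = -1.0, -1e-9          # f(hi) < 0 just left of the origin
    while f(lo) <= 0:             # step left until f is positive again
        lo *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def h0_subtractive(n, rate_pref, rate_null, c):
    return negative_root(lambda t: subtractive_exponent(t, n, rate_pref, rate_null, c))


for c in (1e-4, 0.3, 1.0):
    print(c, h0_subtractive(10, 40.0, 30.0, c))
```

In the limit `c = 1`, all neurons fire together and the root becomes `log(rate_null / rate_pref) / n`. Small `c` approaches the independent root of section B.1.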

## Appendix C: Speed and Accuracy Functions with Overshoot

The main-text expressions for *FC* and *DT* simplify the general expressions derived by Wald (1944), which involve conditional expectations of the state variable on the decision step (equation C.1). Specifically, equations 2.14 and 2.16 hold under the assumption that the value of the state variable on the decision step is exactly equal to the decision threshold. In practice, however, this no-overshoot assumption may not provide a particularly good approximation.

A correction term based on the mean of the overshoot distribution (that is, the distribution of the random variable defined by the excess distance over either the positive or negative threshold on the threshold crossing step) is suggested by Lee et al. (1994). This correction is based on the Taylor expansion of the conditional expectations in equation C.1 and takes the form of a shift in the decision threshold. A correction of this form is relevant to our analysis, as the performance of the two models is compared parametrically in the threshold to isolate the effects of the speed-accuracy trade-off imparted by freely adjusting the threshold.

Consider the value of the state variable on the decision step, conditioned on crossing the first threshold, and let the overshoot (its excess over the threshold) be a random variable with mean *u*. Expanding the conditional expectation (dropping the conditional notation for convenience) via a Taylor series (the so-called delta method) about the threshold yields Wald's approximation plus truncation terms. Here we see that if the state variable lands exactly on the threshold, each term in the expansion becomes zero, and Wald's approximation holds exactly. If the state variable overshoots the threshold, error will accumulate at each term in the expansion as a function of the moments of the overshoot distribution. If instead the expansion is performed about the threshold shifted by the mean overshoot *u*, a threshold-shifted approximation expresses the truncation error in terms of the second and higher centered moments of the overshoot distribution. In practice, the overshoot is often nonzero. However, if its mean can be calculated and $h_0 < 0$, the latter approximation might be more favorable, as long as the higher-order moments do not grow too large. For the decision time, using this alternative approximation is exactly correct and results in no additional error. *DT* depends on the state variable at the decision step only through its conditional expectation, so the expansion about the conditional mean terminates at first order.
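The following sketch shows how the no-overshoot and threshold-shifted approximations enter the speed and accuracy functions. It assumes symmetric thresholds at $\pm\theta$, an increment with $h_0 < 0$ and positive mean, and the same mean overshoot `overshoot` at both thresholds. It illustrates the threshold-shift idea and is not a reproduction of the exact correction of Lee et al. (1994). The fraction correct follows from Wald's identity, $E[e^{h_0 X}] = 1$ for the state variable $X$ at the decision step, with each conditional expectation replaced by its value at the (possibly shifted) threshold. The decision time follows from $E[X] = E[\text{steps}]\,E[w]$. All names are hypothetical.

```python
import math


def fraction_correct(h0, theta, overshoot=0.0):
    """P(first crossing at +theta) from Wald's identity E[exp(h0 * X)] = 1.

    Bounds at +/-theta and h0 < 0 are assumed. Each conditional expectation
    of exp(h0 * X) at the decision step is replaced by its value at the
    threshold shifted outward by the mean overshoot (0 gives Wald's
    no-overshoot approximation).
    """
    return 1.0 / (1.0 + math.exp(h0 * (theta + overshoot)))


def decision_time(h0, theta, mean_w, dt, overshoot=0.0):
    """Mean decision time from Wald's identity E[X] = E[steps] * E[w].

    E[X] is linear in the conditional means of X at the two bounds,
    +/-(theta + overshoot), so no further expansion is needed here.
    """
    bound = theta + overshoot
    fc = fraction_correct(h0, theta, overshoot)
    return dt * bound * (2.0 * fc - 1.0) / mean_w


# SPRT (h0 = -1) with and without a hypothetical mean overshoot of 0.3.
for u in (0.0, 0.3):
    print(u, fraction_correct(-1.0, 2.0, u), decision_time(-1.0, 2.0, 0.29, 0.01, u))
```

Setting `overshoot=0` recovers the usual expressions, for example a fraction correct of $1/(1+e^{-\theta})$ for the SPRT ($h_0 = -1$).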

## Appendix D: Joint Cumulants for the Additive and Subtractive Model

Staude et al. (2010) suggest that cumulants provide a “natural and intuitive higher-order generalization of the covariance” for multineuron spiking. The two models of correlated activity examined here are indistinguishable when examining only first-order (i.e., mean firing rate) or second-order (i.e., pairwise correlations) statistics. Here, we derive the joint cumulants for each of these two models to clarify how the spike count distributions produced by the two models differ at higher orders.

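We compute the joint cumulant of the spike counts of *k* members of a pool using the law of total cumulance, conditioning on the count $Z$ of the common (correlating) spike train. (Because of the homogeneity of the pool, we express the *k*th joint cumulant as calculated on the first *k* neurons, but the same expression holds for any *k*-sized subset of the pool.)

$$\kappa(S_1, \ldots, S_k) = \sum_{\pi \in \Pi} \kappa\big(\kappa(S_{B_1} \mid Z), \ldots, \kappa(S_{B_{|\pi|}} \mid Z)\big)$$

Here $\Pi$ is the set of all partitions $\pi = \{B_1, \ldots, B_{|\pi|}\}$ of $\{1, \ldots, k\}$ (see the example of equation D.4), and $\kappa(S_{B_j} \mid Z)$ is the conditional joint cumulant over the set of all spike counts indexed by an element of $B_j$, that is, the set $\{S_i : i \in B_j\}$.

Each conditional joint cumulant vanishes when $|B_j| > 1$, owing to the conditional independence of each neuron given the common spike train. Moreover, from the definition of the cumulant, the term of equation D.1 for any partition that contains such a block $B_j$ will also be zero. This implies that the only partition that contributes in equation D.1 is the partition into singletons, $\{\{1\}, \ldots, \{k\}\}$ ($i = 1$ in the example of equation D.4); thus,

$$\kappa(S_1, \ldots, S_k) = \kappa\big(E[S_1 \mid Z], \ldots, E[S_k \mid Z]\big),$$

where we have used the fact that the first cumulant is simply the expected value. Using the cumulant-generating function of $Z$, we then obtain a formula for the joint cumulant, and hence its value for each of the two models of correlations (assuming a given firing rate).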

For *k* > 2, we see the signature of the differences in the structure of the correlations. For the subtractive correlations model, the joint cumulant decays geometrically as more and more neurons are considered. In contrast, the joint cumulant remains constant for the additive correlations model.
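The closed-form joint cumulants implied by this derivation can be tabulated directly. The sketch assumes the same two constructions as above. The additive model adds a shared Poisson train to private trains, and the subtractive model copies the spikes of a mother train with probability *c*. Here *c* is the pairwise correlation coefficient and `mean` the expected spike count of one neuron in the counting window; the names are hypothetical. Every cumulant of a Poisson count equals its mean. The remaining work is therefore the scaling of $E[S_i \mid Z]$ with the correlating train's count, written $Z$ here.

```python
def joint_cumulant(k, mean, c, model):
    """k-th joint cumulant of the counts of k distinct neurons in one pool.

    By the law of total cumulance and conditional independence, this equals
    kappa_k(E[S_1 | Z], ..., E[S_k | Z]), where Z is the correlating train's
    count; every cumulant of a Poisson count equals its mean.
    """
    if model == "additive":
        # E[S_i | Z] = Z + (1 - c) * mean with Z ~ Poisson(c * mean); the
        # constant shift changes only the first cumulant.
        return mean if k == 1 else c * mean
    if model == "subtractive":
        # E[S_i | Z] = c * Z with Z ~ Poisson(mean / c), so the result is
        # c**k * (mean / c).
        return c ** (k - 1) * mean
    raise ValueError(f"unknown model: {model!r}")


for k in range(1, 6):
    print(k, joint_cumulant(k, 2.0, 0.3, "additive"), joint_cumulant(k, 2.0, 0.3, "subtractive"))
```

Both models give the same first and second cumulants. For *k* ≥ 3, the additive value stays at `c * mean`, while the subtractive value shrinks by a factor of *c* with each added neuron.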

## Acknowledgments

We thank Yu Hu, Adrienne Fairhall, and Michael Shadlen for their valuable comments on the manuscript. We gratefully acknowledge the support of a Career Award at the Scientific Interface from the Burroughs Wellcome Fund and NSF grant CAREER DMS-1026125 (E.S.B.), and the University of Washington eScience Institute's Hyak computer cluster.