Abstract

Stimuli from the environment that guide behavior and inform decisions are encoded in the firing rates of neural populations. Neurons in these populations, however, do not spike independently: spike events are correlated from cell to cell. To what degree does this apparent redundancy affect the accuracy with which decisions can be made and the computations required to decide optimally? We explore these questions for two illustrative models of correlation among cells. The two models are statistically identical at the level of pairwise correlations but differ in the higher-order statistics that describe the simultaneous activity of larger cell groups. We find that the presence of correlations can diminish the performance attained by an ideal decision maker to either a small or a large extent, depending on the nature of the higher-order correlations. Moreover, although this optimal performance can in some cases be obtained using the standard integration-to-bound operation, in others it requires a nonlinear computation on incoming spikes. Overall, we conclude that a given level of pairwise correlations, even when restricted to identical neural populations, may not always indicate redundancies that diminish decision-making performance.

1.  Introduction

Sensory information is often encoded in irregularly spiking neural populations. One well-studied example is given by direction-selective cells in area MT, whose firing rates depend on the degree and direction of coherent motion in the visual field (Britten, Shadlen, Newsome, & Movshon, 1992, 1993; Newsome, Britten, Movshon, & Shadlen, 1989; Salzman, Murasugi, Britten, & Newsome, 1992; Shadlen, Britten, Newsome, & Movshon, 1996). Individual neurons in MT, as in many other brain areas, exhibit noisy and variable spiking (Newsome et al., 1989) and can be modeled by Poisson point processes (Softky & Koch, 1993; Tuckwell, 1989). Moreover, this variable spiking is generally not independent from cell to cell. Returning to our example, a number of studies have measured pairwise correlations in MT during direction discrimination tasks as well as smooth-pursuit eye movements (Huang & Lisberger, 2009; Bair, Zohary, & Newsome, 2001; Zohary, Shadlen, & Newsome, 1994; Cohen & Newsome, 2008). Although this measurement is a subtle endeavor experimentally, a number of studies suggest a value near (Cohen & Kohn, 2011, summarize these observations for a number of brain areas.)

What are the consequences of correlated spike variability for the speed and accuracy of sensory decisions? The role of pairwise correlations in stimulus encoding has been the subject of many prior studies (Salinas & Sejnowski, 2001; Latham & Roudi, 2011; Averbeck, Latham, & Pouget, 2006; Abbott & Dayan, 1999). The results are rich, showing that correlations can have positive, negative, or neutral effects on encoded information. Our study serves to extend this body of work in two ways. First (as done in a different context by Ganmor, Segev, & Schneidman, 2011, and Montani et al., 2009), we contrast the effect of correlations that have the same pairwise level but a different structure at higher orders.

Second (as in Cohen & Newsome, 2008, and Beck et al., 2008), we consider the impact of correlations on decisions that unfold over time by combining a sequence of samples observed over time in the sensory populations. A classic example that we use to describe and motivate our studies is the moving dots direction discrimination task. Here, a fraction of dots in a visual display move coherently in a given direction, while the remainder exhibit random motion; the task is to identify the direction from two possible alternatives. Decisions become increasingly accurate as subjects take (or are given) longer to make the decision.

In analyzing decisions that develop over time, we use a central result from sequential analysis: the sequential probability ratio test (SPRT; Wald & Wolfowitz, 1948; Gold & Shadlen, 2002), which linearly sums the log odds of independent observations from a sampling distribution until a predetermined evidence threshold is reached. The SPRT is the optimal statistical test in that it gives the minimum expected number of samples for a required level of accuracy in deciding between two task alternatives.

We pose two related questions based on the SPRT. First, how does the presence of correlated spiking in the sampled pools affect the speed and accuracy of decisions produced by the SPRT? Our focus is on how the structure of population-wide correlations determines the answer. Second, how does the presence of correlated spiking affect the computations that are necessary to perform the SPRT? This question is intriguing, because the SPRT may be performed using the simple, linear computation of integrating spikes over time and across the populations for a surprisingly broad class of inputs, including independent Poisson spike trains (Zhang & Bogacz, 2010; Bogacz, Brown, Holmes, & Cohen, 2006). Thus, in this setting, optimal decisions can be made by integrator circuits (Bogacz et al., 2006; Goldman, Compte, & Wang, 2009; Cain & Shea-Brown, 2012). Our goal here is to determine whether and when this continues to hold true for correlated neural populations.

We answer these questions for two illustrative models of correlated, Poissonian spiking. We emphasize that the spikes that these models produce are indistinguishable at the level of both single cells and pairs of cells. However, they differ in higher-order correlations in that they can be distinguished only by examining the statistics of three or more neurons. In the first model, correlations are introduced using shared spike events across the entire pool. In this case, optimal inference via the SPRT produces fast and accurate decisions but depends on a nonlinear computation. As a result, the simpler computation of spike integration requires, on average, longer times to reach the same level of accuracy. In contrast, when shared spiking events are more frequent but are common to fewer neurons within a pool, performance under the SPRT is significantly diminished. However, in this case, both SPRT and spike integration perform comparably, so a linear computation can produce decisions that are close to optimal.

2.  Models of Evidence Accumulation and Encoding

2.1.  Model Neural Populations and the Decision Task.

We begin by introducing the notation for the two decision-making models that will be compared. In this study, we consider discrimination between two alternatives and therefore model two populations of neurons that encode the strength of evidence for each alternative. Returning to the moving dots task for illustration, each population could be the set of MT cells that are selective for motion in a given direction. The two populations represent the dot motion C via their firing rates $\lambda_P$ and $\lambda_N$; the subscripts indicate, respectively, the “preferred” and “null” populations, which correspond to the motion direction of the visual stimulus and to the alternate direction. The firing rate of neurons encoding the preferred direction is therefore higher than that of neurons encoding the null direction: $\lambda_P > \lambda_N$. Following Wang (2002) (see also Mazurek, Roitman, Ditterich, & Shadlen, 2003; Britten et al., 1993), we model this relationship as linear:
formula
2.1
formula
2.2

Throughout the text, we present results at C=6.4; however, the results do not depend on this particular value of dot motion or its precise relationship to firing rate.

In our model, we assume that each population consists of N neurons firing spikes via a homogeneous Poisson process, with rate $\lambda_P$ or $\lambda_N$. We use the notation $x_k(t)$ to indicate the spike train of the kth neuron. Integrating these processes over a time interval $\Delta t$ provides two time series of N-dimensional vectors of Poisson random variables; these independent vectors provide the input to the decision-making models. Specifically, for the kth neuron in a pool, on the ith time step,
$$S_i^k = \int_{(i-1)\Delta t}^{i\Delta t} x_k(t)\,dt. \tag{2.3}$$
The properties of Poisson processes imply that $S_i^k$ is independent of $S_j^k$ for $i \neq j$, that is, across different time steps.
However, the outputs of different neurons in the same time step are not in general independent. Following experimental observations that neurons with similar directional tuning tend to be correlated, while those with very different tuning are not (Zohary et al., 1994; Cohen & Newsome, 2008), we model neurons from different pools as independent and those within a single pool as correlated with a correlation coefficient $\rho$:
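$$\operatorname{corr}\left(S_{i,P}^k, S_{i,P}^l\right) = \operatorname{corr}\left(S_{i,N}^k, S_{i,N}^l\right) = \rho \quad (k \neq l), \qquad \operatorname{corr}\left(S_{i,P}^k, S_{i,N}^l\right) = 0. \tag{2.4}$$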
In vector notation, the joint probability distribution of the spike counts of the two pools therefore factorizes:
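$$P\left(\mathbf{S}_{i,P}, \mathbf{S}_{i,N}\right) = P\left(\mathbf{S}_{i,P}\right)\,P\left(\mathbf{S}_{i,N}\right). \tag{2.5}$$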
Next, we introduce notation for decision making between the two task alternatives. The task of determining, say, direction in the moving dots task is that of determining which of the two pools fires spikes with the higher firing rate. We frame this as decision making between the hypotheses,
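$$H_1:\ \text{pool } P \text{ fires at rate } \lambda_P \text{ and pool } N \text{ at rate } \lambda_N, \tag{2.6}$$
$$H_0:\ \text{pool } P \text{ fires at rate } \lambda_N \text{ and pool } N \text{ at rate } \lambda_P, \tag{2.7}$$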
where each alternative corresponds to a decision about the motion direction. This formalism allows us to define accuracy as the fraction of trials on which the correct hypothesis, H1, is accepted. In this study, we consider decision-making tasks at a fixed level of difficulty, so that $\lambda_P$ and $\lambda_N$ do not vary from trial to trial (i.e., the hypothesis test is simple, not composite).

2.2.  Accumulating Spikes and Evidence Over Time.

We relate the decision-making task to a discrete random walk, which follows in turn from the sequential accumulation of independent and identically distributed (i.i.d.) realizations from the sampling distribution Wi. We specify this distribution below; for now, we note that the random walk takes the general form
$$E_0 = 0, \tag{2.8}$$
$$E_n = E_{n-1} + W_n = \sum_{i=1}^{n} W_i. \tag{2.9}$$
In an unbiased drift-diffusion model of decision making, accumulation continues as long as $-\theta < E_n < \theta$ (Ratcliff, 1978; Gold & Shadlen, 2002; Bogacz et al., 2006). The number of increments needed to cross one of the two thresholds, multiplied by the increment duration $\Delta t$, defines the decision time; this is a random variable, as it varies from trial to trial. Crossing the upper threshold, which corresponds to H1, is interpreted as a correct trial; the fraction of correct (FC) trials defines the accuracy of the model. Together, the expected (mean) decision time (DT) and accuracy (FC) determine the performance of a decision-making model.
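A minimal sketch of this stopping rule follows. It makes no assumption about the increment distribution: `run_walk` and its `sample_increment` argument are hypothetical names introduced for illustration, not code from the study, and the ±1 increments in the demonstration are a toy choice rather than a model of spiking.

```python
import random
from typing import Callable, Tuple


def run_walk(sample_increment: Callable[[], float], theta: float,
             max_steps: int = 10_000_000) -> Tuple[int, bool]:
    """Accumulate E_n = E_{n-1} + W_n from E_0 = 0 until E_n leaves (-theta, theta).

    Returns the number of increments n and True if the upper (H1, "correct")
    threshold was crossed, False if the lower threshold was crossed.
    """
    e = 0.0
    for n in range(1, max_steps + 1):
        e += sample_increment()
        if e >= theta:
            return n, True
        if e <= -theta:
            return n, False
    raise RuntimeError("no threshold crossing within max_steps")


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy increments (hypothetical): +1 with probability 0.6, otherwise -1.
    trials = [run_walk(lambda: 1.0 if rng.random() < 0.6 else -1.0, 3.0)
              for _ in range(20_000)]
    fc = sum(correct for _, correct in trials) / len(trials)
    mean_steps = sum(n for n, _ in trials) / len(trials)
    print(f"FC estimate {fc:.3f}, mean number of increments {mean_steps:.2f}")
```

The fraction of `True` outcomes estimates FC, and multiplying the mean number of increments by $\Delta t$ estimates DT.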
First described by Wald (1944), formulas for the mean decision time and accuracy of an i.i.d.-sampled cumulative stochastic process can be computed from the diffusion threshold $\theta$ and two properties of the sampling distribution: its mean and the nontrivial real solution $h_0$ of $M_W(s) = 1$, where $M_W$ is its moment-generating function. Importantly, these formulas are exact under the assumption that the final increment of $E_n$ does not overshoot the threshold, a point we return to below. Given the moment-generating function (MGF) of the sampling distribution,
$$M_W(s) = E\left[e^{sW}\right], \tag{2.10}$$
the value s=h0 is defined by the implicit relationship
$$M_W(h_0) = 1, \qquad h_0 \neq 0. \tag{2.11}$$
To understand intuitively why this value is key to computing DT and FC, we follow Shadlen, Hanks, Churchland, Kiani, and Yang (2006). First, recall that the MGF of the sum of n i.i.d. random variables is the MGF of their common sampling distribution raised to the nth power. In a drift-diffusion model, the process is terminated after a random number of steps. It therefore stands to reason that a quantity, based on the sampling distribution, that does not change with the specific number of samples drawn on a given trial would play a special role. The value $h_0$ fits that description by construction, since $M_W(h_0)^n = 1$ for every n. Informally (for more details, see Shadlen et al., 2006), denoting by $T$ the random number of steps taken to reach threshold, we have that
$$E\left[e^{h_0 E_T}\right] = E\left[M_W(h_0)^T\right] = 1. \tag{2.12}$$
However, under the assumption of no overshoot, $E_T = \theta$ or $E_T = -\theta$ with probabilities FC and 1−FC, respectively, yielding
$$\mathrm{FC}\,e^{h_0\theta} + (1 - \mathrm{FC})\,e^{-h_0\theta} = 1. \tag{2.13}$$
Solving this relationship for the accuracy FC, we arrive at a formula depending on h0 and the diffusion threshold:
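$$\mathrm{FC} = \frac{1 - e^{-h_0\theta}}{e^{h_0\theta} - e^{-h_0\theta}} = \frac{1}{1 + e^{h_0\theta}}. \tag{2.14}$$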
Similarly, one arrives at
$$E[T]\,E[W] = E[E_T] = \theta\,\mathrm{FC} - \theta\,(1 - \mathrm{FC}), \tag{2.15}$$
which can be rearranged and combined with equation 2.14 to yield a formula for DT:
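$$\mathrm{DT} = \Delta t\,E[T] = \frac{\theta\,\Delta t\,(2\,\mathrm{FC} - 1)}{E[W]} = \frac{\theta\,\Delta t}{E[W]}\tanh\left(-\frac{h_0\theta}{2}\right). \tag{2.16}$$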
(see Bruss & Robertson, 1991, and Shadlen et al., 2006, for more details). Note that as $\theta$ increases (and assuming $h_0 < 0$), both FC and DT increase: the so-called speed-accuracy trade-off. Because the remaining two parameters, $h_0$ and E[W], are computed directly from the increment distribution, we compare decision-making models by plotting DT versus FC parametrically in $\theta$. We call these speed-accuracy trade-off plots because they describe, for a given model, how speed trades against accuracy as the decision threshold is varied.
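The two Wald relations can be evaluated directly. The sketch below assumes the no-overshoot forms FC = 1/(1 + e^{h0 θ}) and DT = θΔt(2FC − 1)/E[W], which follow from solving equations 2.13 and 2.15; the function names are hypothetical. Sweeping θ traces out one speed-accuracy trade-off curve.

```python
import math
from typing import Iterable, List, Tuple


def wald_fc(h0: float, theta: float) -> float:
    """Accuracy with no overshoot: FC = 1 / (1 + exp(h0 * theta))."""
    return 1.0 / (1.0 + math.exp(h0 * theta))


def wald_dt(h0: float, mean_w: float, theta: float, dt: float) -> float:
    """Mean decision time with no overshoot: DT = theta * dt * (2 FC - 1) / E[W]."""
    return theta * dt * (2.0 * wald_fc(h0, theta) - 1.0) / mean_w


def speed_accuracy_curve(h0: float, mean_w: float, dt: float,
                         thetas: Iterable[float]) -> List[Tuple[float, float]]:
    """(FC, DT) pairs traced out parametrically in the threshold theta."""
    return [(wald_fc(h0, th), wald_dt(h0, mean_w, th, dt)) for th in thetas]


if __name__ == "__main__":
    # Toy walk: steps +1 (prob. 0.6) or -1, so h0 = ln(2/3) and E[W] = 0.2.
    print(speed_accuracy_curve(math.log(2 / 3), 0.2, 1.0, [1.0, 2.0, 3.0]))
```

As a check, for this toy walk with θ = 3 the formulas give FC = 27/35 ≈ 0.771 and a mean of 57/7 ≈ 8.14 increments. These values match the classical gambler's-ruin results exactly, because ±1 steps never overshoot an integer threshold.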
This theory allows accuracy and decision times to be computed from three quantities: the diffusion threshold $\theta$, the mean E[W] of the sampling distribution, and the nontrivial solution $h_0$ of $M_W(h_0) = 1$. We now return to the definition of the random increments W for the different models we wish to compare. First, in the spike integration (SI) model, increments are constructed by counting the spikes emitted in a window $\Delta t$ by the preferred pool and subtracting the number emitted by the null pool. This is equivalent to the time evolution of a neural integrator model that receives spikes as impulses with opposite signs from the preferred and null populations. This integrate-to-bound model is an analog of the drift-diffusion model (DDM) with inputs that are not white noise but rather Poisson spikes:
$$W_i = \sum_{k=1}^{N} S_{i,P}^k - \sum_{k=1}^{N} S_{i,N}^k \tag{2.17}$$
(Ratcliff, 1978; Bogacz et al., 2006; Zhang & Bogacz, 2010; Beck et al., 2008; see also Mazurek et al., 2003).
Second, in the sequential probability ratio test (SPRT), the increment is defined as the log-odds ratio of observing the spike count from both of the pools under each of the two competing hypotheses:
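$$W_i = \ln\frac{P\left(\mathbf{S}_{i,P}, \mathbf{S}_{i,N} \mid H_1\right)}{P\left(\mathbf{S}_{i,P}, \mathbf{S}_{i,N} \mid H_0\right)}. \tag{2.18}$$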
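For independent Poisson pools, both increments can be computed from a single window of spike counts. The sketch below assumes illustrative rates λ_P = 25 and λ_N = 20 spikes/s (hypothetical values, not those used in the study). It draws Poisson counts with Knuth's multiplication method, and all function names are hypothetical.

```python
import math
import random
from typing import Sequence


def poisson_sample(rng: random.Random, mean: float) -> int:
    """Draw a Poisson(mean) count with Knuth's multiplication method (fine for small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def log_poisson_pmf(count: int, mean: float) -> float:
    return count * math.log(mean) - mean - math.lgamma(count + 1)


def si_increment(pref: Sequence[int], null: Sequence[int]) -> int:
    """Spike integration: spikes from the preferred pool minus spikes from the null pool."""
    return sum(pref) - sum(null)


def sprt_increment_independent(pref: Sequence[int], null: Sequence[int],
                               lam_p: float, lam_n: float, dt: float) -> float:
    """Log-likelihood ratio of one window of counts from independent Poisson neurons.

    Under H1 the preferred pool fires at lam_p and the null pool at lam_n;
    under H0 the two rates are swapped.
    """
    log_h1 = (sum(log_poisson_pmf(s, lam_p * dt) for s in pref)
              + sum(log_poisson_pmf(s, lam_n * dt) for s in null))
    log_h0 = (sum(log_poisson_pmf(s, lam_n * dt) for s in pref)
              + sum(log_poisson_pmf(s, lam_p * dt) for s in null))
    return log_h1 - log_h0


if __name__ == "__main__":
    rng = random.Random(0)
    lam_p, lam_n, dt, n = 25.0, 20.0, 0.005, 10  # hypothetical values
    pref = [poisson_sample(rng, lam_p * dt) for _ in range(n)]
    null = [poisson_sample(rng, lam_n * dt) for _ in range(n)]
    print(si_increment(pref, null), sprt_increment_independent(pref, null, lam_p, lam_n, dt))
```

Writing out the Poisson log-probabilities shows that the $\Delta t$-dependent terms cancel between two pools of equal size. The SPRT increment therefore equals $\ln(\lambda_P/\lambda_N)$ times the SI increment for every realization.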

2.3.  The Case of Independent Neurons.

Zhang and Bogacz (2010) present an analysis of speed and accuracy of decision making based on independent neural pools; for completeness and to help contrast this result with the correlated case, we give the key calculations in sections A.1 and B.1. Here, choosing increments via the SPRT yields
$$h_0 = -1, \tag{2.19}$$
$$E[W] = N\,(\lambda_P - \lambda_N)\ln\frac{\lambda_P}{\lambda_N}\,\Delta t. \tag{2.20}$$
Under the spike integration model, Zhang and Bogacz (2010; see also section B.1) find that
$$h_0 = -\ln\frac{\lambda_P}{\lambda_N}, \tag{2.21}$$
$$E[W] = N\,(\lambda_P - \lambda_N)\,\Delta t. \tag{2.22}$$
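Therefore, because the SI values of $h_0$ and E[W] differ from the SPRT values only by factors of $\ln(\lambda_P/\lambda_N)$, the change of variables $\theta \to \theta\ln(\lambda_P/\lambda_N)$ in equations 2.14 and 2.16 shows that spike integration with threshold $\theta$ performs exactly as the SPRT with threshold $\theta\ln(\lambda_P/\lambda_N)$; that is, spike integration can implement the SPRT. The implication is that simply counting spikes, positive for one pool and negative for the other, can implement statistically optimal decisions when the neural pools are independent (Zhang & Bogacz, 2010).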
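The change of variables can be checked numerically. This sketch assumes the closed forms h0 = −1, E[W] = N(λ_P − λ_N)ln(λ_P/λ_N)Δt for the SPRT and h0 = −ln(λ_P/λ_N), E[W] = N(λ_P − λ_N)Δt for spike integration, with hypothetical parameter values. It confirms that SI with threshold θ and the SPRT with threshold θ ln(λ_P/λ_N) give identical FC and DT.

```python
import math
from typing import Tuple


def fc(h0: float, theta: float) -> float:
    return 1.0 / (1.0 + math.exp(h0 * theta))


def dt_mean(h0: float, mean_w: float, theta: float, dt: float) -> float:
    return theta * dt * (2.0 * fc(h0, theta) - 1.0) / mean_w


def si_params(lam_p: float, lam_n: float, n: int, dt: float) -> Tuple[float, float]:
    """(h0, E[W]) for spike integration with independent pools."""
    return -math.log(lam_p / lam_n), n * (lam_p - lam_n) * dt


def sprt_params(lam_p: float, lam_n: float, n: int, dt: float) -> Tuple[float, float]:
    """(h0, E[W]) for the SPRT with independent pools."""
    return -1.0, n * (lam_p - lam_n) * math.log(lam_p / lam_n) * dt


def si_mgf_root_residual(h: float, lam_p: float, lam_n: float) -> float:
    """log M_W(h) for the SI increment, divided by the positive factor N * dt; zero at h0."""
    return lam_p * (math.exp(h) - 1.0) + lam_n * (math.exp(-h) - 1.0)


if __name__ == "__main__":
    lam_p, lam_n, n, dt = 25.0, 20.0, 100, 0.001  # hypothetical values
    h_si, w_si = si_params(lam_p, lam_n, n, dt)
    h_sp, w_sp = sprt_params(lam_p, lam_n, n, dt)
    for theta_si in (1.0, 3.0, 10.0):
        theta_sprt = theta_si * math.log(lam_p / lam_n)
        print(fc(h_si, theta_si), fc(h_sp, theta_sprt),
              dt_mean(h_si, w_si, theta_si, dt), dt_mean(h_sp, w_sp, theta_sprt, dt))
```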

2.4.  Correlated Neural Populations: The Additive and Subtractive Models.

We next describe two models for introducing correlations into the Poisson spike trains of each neural population. These models have identical first-order and pairwise statistics but differ in their higher-order correlations; that is, the models cannot be distinguished by examining pairs of neurons alone but require simultaneous observation of larger groups of cells. The distinction between the models is that in one, common spiking events occur across the entire population, whereas in the other they involve only random subsets of neurons. Both models are studied in Kuhn, Aertsen, and Rotter (2003) and Staude, Grün, and Rotter (2010) and rely on shared input from a single correlating process to generate the correlations in each pool. These authors termed the two models SIP and MIP for, respectively, single- and multiple-interaction process; here we use the descriptors additive and subtractive. In both models, the correlated spike trains that provide the input to the accumulation models are generated from a common correlating train.

Before describing the models in detail, we note that in this study, these models are statistical approaches chosen to illustrate a range of impacts that correlations can have on decision making (see also Gutnisky & Josić, 2010, and Niebur, 2007). In contrast, in neurobiological networks, correlated spiking arises through a complex interplay of many mechanisms, including recurrent connectivity and shared feedforward interactions (e.g., Aertsen, Gerstein, Habib, & Palm, 1989; Shadlen & Newsome, 1998; Smith & Kohn, 2008). Although bridging the gap between statistical and network-based models of correlations is beyond the scope of this article, avenues for doing so in the context of decision making are considered in section 6.

The first case is the additive model, in which the spike train for each neuron is generated as the sum of two homogeneous Poisson point processes. This model might, for example, capture the effect of shared pulses of activity arising from common sensory or modulatory events upstream. The first Poisson train is generated with a firing rate of $(1-\rho)\lambda$, where $\lambda$ is the intended firing rate of the neuron and $\rho$ is the intended pairwise spike count correlation between any two neurons in the pool. The second train, with a rate of $\rho\lambda$, is added to every neuron in the pool and serves as the common source of correlations. An example of this model of spike train generation is depicted in the rastergrams in Figures 1A and 1B; the common spike events are evident as shared spikes across the entire population.
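The following is a sketch of the additive construction in discrete windows of length $\Delta t$. It assumes an independent train of rate (1 − ρ)λ per neuron plus one shared Poisson count per window, drawn at rate ρλ and added to every neuron, which gives each neuron rate λ and each pair correlation ρ. The function names and the parameter values in the demonstration are hypothetical.

```python
import math
import random
from typing import List, Sequence


def poisson_sample(rng: random.Random, mean: float) -> int:
    """Draw a Poisson(mean) count with Knuth's multiplication method (fine for small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def additive_window(rng: random.Random, n: int, lam: float, rho: float,
                    dt: float) -> List[int]:
    """Spike counts of n neurons in one window (additive model).

    Each neuron gets its own Poisson count at rate (1 - rho) * lam, plus one
    shared Poisson count at rate rho * lam that is added to every neuron.
    """
    shared = poisson_sample(rng, rho * lam * dt)
    return [poisson_sample(rng, (1.0 - rho) * lam * dt) + shared for _ in range(n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rng = random.Random(0)
    lam, rho, dt = 20.0, 0.2, 0.05  # hypothetical values
    windows = [additive_window(rng, 5, lam, rho, dt) for _ in range(20_000)]
    first, second = [w[0] for w in windows], [w[1] for w in windows]
    # The sample mean should approximate lam * dt, and the correlation should approximate rho.
    print(sum(first) / len(first), pearson(first, second))
```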

Figure 1:

Spike integration (SI) and SPRT for a single trial, with (A–C) subtractive (termed MIP in Kuhn et al., 2003) correlations and (D–F) additive (termed SIP in Kuhn et al., 2003) correlations. Rastergrams at C=6.4 for preferred (A, D) and null (B, E) populations of five neurons, correlation within pools . In C and F these spikes are either integrated (black line) or provide input for the SPRT (gray line), until a decision threshold is reached. The threshold has been set so that all four cases yield the same mean reaction time (in C, and , and in F, and ; in both cases the SPRT lines have been scaled for plotting purposes). The SPRT accumulator crosses the “correct,” upper threshold, as opposed to the “incorrect,” lower threshold for the spike integrator. The evolution of the spike integration process is not simply a scaled version of the SPRT under either model of correlations.

The second case is the subtractive correlations model, in which correlated spikes are generated through random, independent deletions from an original “mother” spike train; we refer to this as the correlating spike train (Kuhn et al., 2003). Correlations with this structure might arise, for example, as a consequence of synaptic failure in connections from a common pool of upstream neurons. To achieve a firing rate of $\lambda$ spikes per second for each neuron in the pool, with a pairwise correlation $\rho$ between any two neurons, the correlating train has a rate of $\lambda/\rho$ spikes per second. Then, for each neuron in the pool, each spike of this train is retained, i.i.d., with probability $\rho$. In our model, there is a separate correlating spike train for each of the two independent populations. An example of the subtractive model of spike train generation is depicted in the rastergrams in Figures 1D and 1E.
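An analogous sketch of the subtractive construction follows. It assumes a mother train of rate λ/ρ whose spikes each neuron keeps independently with probability ρ, so that each neuron fires at rate λ and each pair has correlation ρ. The names and values are hypothetical, and the Poisson and correlation helpers are repeated so that the block runs on its own.

```python
import math
import random
from typing import List, Sequence


def poisson_sample(rng: random.Random, mean: float) -> int:
    """Draw a Poisson(mean) count with Knuth's multiplication method (fine for small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def subtractive_window(rng: random.Random, n: int, lam: float, rho: float,
                       dt: float) -> List[int]:
    """Spike counts of n neurons in one window (subtractive model).

    The mother train fires at rate lam / rho; each neuron independently keeps
    each mother spike with probability rho.
    """
    mother = poisson_sample(rng, lam / rho * dt)
    return [sum(rng.random() < rho for _ in range(mother)) for _ in range(n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rng = random.Random(0)
    lam, rho, dt = 20.0, 0.2, 0.05  # hypothetical values
    windows = [subtractive_window(rng, 5, lam, rho, dt) for _ in range(20_000)]
    first, second = [w[0] for w in windows], [w[1] for w in windows]
    # The sample mean should approximate lam * dt, and the correlation should approximate rho.
    print(sum(first) / len(first), pearson(first, second))
```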

In summary, the two models include correlated spike events that originate from a single mother train. Although they produce identical correlations among cell pairs, these events are distributed in different ways across the population. We note that the results of Zhang and Bogacz (2010) can be seen as the limiting case $\rho \to 0$ of either the additive or the subtractive model.

3.  Subtractive Correlations and Decision-Making Performance

3.1.  The SPRT Decision-Making Model.

We now study the impact of subtractive correlations on decision-making performance. Recall that within a time window $\Delta t$, the spike counts from each neuron form a vector of random variables that are independent from window to window. These independent vectors provide the evidence for each of the two alternatives, which is then weighed via the log likelihood ratio at each step of the SPRT. In sections A.1 and A.4, we compute the values $h_0$ and E[W] that define the speed and accuracy of the SPRT (see equations 2.14 to 2.16) for two pools with subtractive correlations. As this computation is done in continuous time, it is natural to take $\Delta t \to 0$. Doing so, we find
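$$h_0 = -1, \tag{3.1}$$
$$E[W] = \frac{1 - (1-\rho)^N}{\rho}\,(\lambda_P - \lambda_N)\ln\frac{\lambda_P}{\lambda_N}\,\Delta t. \tag{3.2}$$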

Comparing these values with those of the independent SPRT given in equations 2.19 and 2.20, we see that the only effect of correlations is a scaling of the expected increment by the factor $(1-(1-\rho)^N)/\rho$. In the limit $\rho \to 0$, this scale factor approaches N, which in turn reduces decision time (DT is inversely proportional to the scale factor via equation 2.16). However, as $\rho \to 1$, the scale factor approaches 1; this agrees with the intuition that as all neurons become perfectly redundant, performance should resemble that of a single neuron. In fact, the SPRT applied to a given sample can be seen as inferring the firing rate of the correlating train from a derived vector of noisy random variables. As N gets large, then, performance should be limited by that of an SPRT performed on the correlating mother trains themselves. This is precisely what happens when $N \to \infty$ in equation 3.2: we obtain $E[W] = (\lambda_P - \lambda_N)\ln(\lambda_P/\lambda_N)\,\Delta t/\rho$, corresponding to decision making based on mother spikes of rates $\lambda_P/\rho$ and $\lambda_N/\rho$.
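The scale factor and its limits can be checked directly. This sketch assumes the form (1 − (1 − ρ)^N)/ρ, that is, the rate at which a pool emits a nonzero spike vector, (λ/ρ)(1 − (1 − ρ)^N), divided by λ. The function names and rates are hypothetical.

```python
import math


def subtractive_scale(rho: float, n: int) -> float:
    """Factor multiplying (lam_p - lam_n) * ln(lam_p / lam_n) * dt in the SPRT's E[W]."""
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must lie in (0, 1]")
    return (1.0 - (1.0 - rho) ** n) / rho


def sprt_mean_increment_subtractive(lam_p: float, lam_n: float, rho: float,
                                    n: int, dt: float) -> float:
    return subtractive_scale(rho, n) * (lam_p - lam_n) * math.log(lam_p / lam_n) * dt


if __name__ == "__main__":
    for rho in (1e-6, 0.05, 0.1, 0.3, 1.0):
        print(rho, subtractive_scale(rho, 240))
    print(sprt_mean_increment_subtractive(25.0, 20.0, 0.1, 240, 0.001))  # hypothetical rates
```

Small ρ recovers N independent neurons, ρ = 1 recovers a single neuron, and large N saturates at 1/ρ, the mother-train limit.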

One consequence of this interpretation is that the particular realization of a spike vector (in a sufficiently small time bin $\Delta t$) carries no evidence about H1 versus H0 beyond whether or not it is the zero vector 0. This follows from the construction of the subtractive correlations model: the spike deletions that create the realization of the spike vector do not depend on the firing rate of the population. Concretely, then, the increments (or decrements) are based solely on whether the vector of spikes in the preferred (or null) pool contains any spikes at all; the actual number of spikes is irrelevant in the SPRT.

It follows that the accumulation process $E_n$ is a discrete-space random walk, with steps of 0 and $\pm\ln(\lambda_P/\lambda_N)$. To see this, note that for sufficiently small $\Delta t$, there are only three possibilities for how spikes will be emitted from the two populations. First, neither the preferred nor the null pool produces any spikes. This event provides no information to distinguish the firing rates of the pools, so the increment is 0. Second, one of the pools produces a vector of spikes caused by i.i.d. deletions from its mother spike train. If the spiking pool is the preferred one, each possible nonzero spike vector increments the accumulator by $\ln(\lambda_P/\lambda_N)$; the opposite sign occurs if the null pool spikes. Events in which both pools spike are of higher order in $\Delta t$ and thus become negligible for small time windows.
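The leading-order SPRT update under subtractive correlations reduces to a test of which pool, if any, produced a nonzero spike vector. The sketch below illustrates this small-$\Delta t$ approximation with hypothetical names and rates. It drops the O($\Delta t$) terms contributed by all-zero vectors.

```python
import math
from typing import Sequence


def sprt_step_subtractive(pref: Sequence[int], null: Sequence[int],
                          lam_p: float, lam_n: float) -> float:
    """Leading-order (small dt) SPRT increment under subtractive correlations.

    Only whether each pool's spike vector is nonzero matters; how many spikes
    it contains, and in which neurons, carries no further evidence.
    """
    step = math.log(lam_p / lam_n)
    return (step if any(pref) else 0.0) - (step if any(null) else 0.0)


if __name__ == "__main__":
    print(sprt_step_subtractive([3, 1, 2], [0, 0, 0], 25.0, 20.0))  # hypothetical rates
```

By contrast, the SI increment for `pref = [3, 1, 2]` and a silent null pool would be 6, whereas this SPRT step is simply $\ln(\lambda_P/\lambda_N)$.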

The discrete nature of the SPRT increments causes the FC curve in Figure 2A to take on only discrete values of accuracy. A small increase in $\theta$ above a multiple of $\ln(\lambda_P/\lambda_N)$ will not improve accuracy, because $E_n$ on the final, threshold-crossing step will overshoot the threshold. This also explains why some of the FC values at a given $\theta$ do not lie on the theoretical line defined by equation 2.14; that equation is exact only in the case of zero overshoot past the threshold. We return to this point later in the main text and in appendix C.
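Because the nonzero steps all have size a = ln(λ_P/λ_N), accuracy can change only when θ passes a multiple of a. The sketch below assumes the small-$\Delta t$ walk, in which each nonzero step is an up-step with probability λ_P/(λ_P + λ_N); zero steps do not affect which threshold is reached. It computes the exact FC from the gambler's-ruin formula and compares it with equation 2.14 evaluated at h0 = −1. Names and rates are hypothetical.

```python
import math


def sprt_fc_discrete(theta: float, lam_p: float, lam_n: float) -> float:
    """Exact FC for a walk with steps +a or -a, a = ln(lam_p / lam_n), thresholds +-theta > 0.

    The walk effectively stops at +-k*a with k = ceil(theta / a), and the
    gambler's-ruin result is 1 / (1 + (lam_n / lam_p) ** k).
    """
    a = math.log(lam_p / lam_n)
    k = math.ceil(theta / a)
    return 1.0 / (1.0 + (lam_n / lam_p) ** k)


def sprt_fc_continuous(theta: float) -> float:
    """No-overshoot formula 1 / (1 + exp(h0 * theta)) with h0 = -1."""
    return 1.0 / (1.0 + math.exp(-theta))


if __name__ == "__main__":
    a = math.log(25.0 / 20.0)  # hypothetical rates
    for mult in (1.0, 1.5, 2.0, 2.5):
        print(mult, sprt_fc_discrete(mult * a, 25.0, 20.0), sprt_fc_continuous(mult * a))
```

The two agree at multiples of a. Between multiples, the discrete value sits on a plateau above the continuous curve, because overshoot effectively raises the threshold.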

Figure 2:

Subtractive correlations significantly diminish decision performance under SPRT (C=6.4, N=240). (A) The discrete nature of the SPRT diffusion process implies that only discrete values of accuracy are possible. These occur at values of $\theta$ that are multiples of $\ln(\lambda_P/\lambda_N)$. (Similar results hold for decision time; not shown.) The solid dots are simulations of the SPRT, and gray dots are exact values taken at multiples of the log ratio; the interpolating line is equation 2.14. (B) Accuracy (see equation 2.14) and decision time (see equation 2.16) are plotted parametrically as a function of threshold for eight different values of $\rho$ (linearly spaced on [0,.35] with a double-thickness line at ). Performance of the simulation at multiples of the log ratio of firing rates is plotted as solid dots, with theoretical values in gray (gray dots are enlarged so that they can be distinguished).

We next insert the values for $h_0$ and E[W] computed above into equations 2.14 and 2.16 and plot the resulting speed-accuracy curves, relating DT and FC parametrically in the threshold $\theta$ (see Figure 2B). (We plot the full FC and DT functions, although only discrete values of performance along each of the lines are achievable in practice, as indicated by the dots for the case; see the caption.) By comparing speed-accuracy curves for values of $\rho$ ranging from 0 to 0.3, we see our first main result: introducing subtractive correlations within neural populations substantially diminishes the best possible decision performance, that obtained via the SPRT. We next derive the analogous results for the simpler spike integration model.

3.2.  The Spike Integration Decision-Making Model.

Next, we consider decision-making performance for the simpler model in which spikes are simply integrated over time as opposed to the likelihood ratio computation of the SPRT. In this case, the moment-generating function of the difference in spike counts from the two pools is more straightforward (see section B.3) and provides an easy computation of E[W]:
$$E[W] = N\,(\lambda_P - \lambda_N)\,\Delta t. \tag{3.3}$$
The nontrivial solution $h_0$ of $M_W(h_0) = 1$ is given implicitly by
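$$\lambda_P\left[\left(1 - \rho + \rho e^{h_0}\right)^N - 1\right] + \lambda_N\left[\left(1 - \rho + \rho e^{-h_0}\right)^N - 1\right] = 0. \tag{3.4}$$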
Here we see that correlations affect the performance of the model only through $h_0$, as the expected increment is the same as in the independent case (see equation 2.22). Moreover, performance under spike integration is diminished to a degree comparable to the performance loss of the SPRT. To illustrate this, Figure 3A plots the speed-accuracy trade-off curves for both models of decision making under subtractive correlations, for the same values of $\rho$. As it must (Wald & Wolfowitz, 1948), the SPRT shows its optimality: at a given level of accuracy, it requires, on average, fewer samples than spike integration. However, the difference is very slight. This yields our next main result: for the subtractive model of correlations across neural populations, nearly optimal decisions are produced by the simple operation of linear integration over time.

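Equation 3.4 has no closed-form solution, but its nontrivial root is easy to bracket. The sketch below assumes the form λ_P[(1 − ρ + ρe^{h})^N − 1] + λ_N[(1 − ρ + ρe^{−h})^N − 1] = 0, obtained from the compound-Poisson MGF of each pool's summed count. It uses bisection on [−ln(λ_P/λ_N), 0), where Bernoulli's inequality makes the left end nonnegative. The function name and rates are hypothetical.

```python
import math


def si_h0_subtractive(lam_p: float, lam_n: float, rho: float, n: int,
                      iters: int = 200) -> float:
    """Nontrivial root h0 < 0 of eq. 3.4 by bisection (assumes lam_p > lam_n, 0 < rho <= 1)."""

    def pow_minus_one(h: float) -> float:
        # (1 - rho + rho * e^h)^n - 1, written with expm1/log1p for accuracy near h = 0
        return math.expm1(n * math.log1p(rho * math.expm1(h)))

    def f(h: float) -> float:
        return lam_p * pow_minus_one(h) + lam_n * pow_minus_one(-h)

    lo = -math.log(lam_p / lam_n)  # f(lo) >= 0 by Bernoulli's inequality
    hi = lo / (10.0 * n)           # close to 0, where f < 0 since f'(0) = n*rho*(lam_p - lam_n) > 0
    if f(hi) >= 0.0:
        raise ValueError("failed to bracket the nontrivial root")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) >= 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for rho in (1e-9, 0.05, 0.1, 0.3, 1.0):
        print(rho, si_h0_subtractive(25.0, 20.0, rho, 240))
```

The root recovers the independent value −ln(λ_P/λ_N) as ρ → 0 and equals −ln(λ_P/λ_N)/N at ρ = 1. Combined with E[W] = N(λ_P − λ_N)Δt, it gives the SI speed-accuracy curves.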
Figure 3:

For the subtractive model of spike correlations, decision-making performance of the spike integration model is comparable to the SPRT and is well described by equations 2.14 and 2.16 despite overshoot past the decision threshold. (A) Gray lines are reproductions of speed-accuracy curves from the SPRT (see Figure 2), and black lines are speed-accuracy curves for spike integration. (B, C) Overshoot past the decision boundaries reduces the validity of Wald's approximations, but a constant shift in threshold can help mitigate the effect (see Ghosh & Sen, 1991; Lee, Park, & Kim, 1994; and appendix C). Such a shift is automatically accounted for when comparing curves that are parametric in $\theta$ (panel A, for example).

Having established this, we pause to note a subtlety in our analysis. Figures 3B and 3C show FC and DT as a function of $\theta$ for both simulated data and plots of equations 2.14 and 2.16. The solid lines are graphs of those equations as written (using the values for $h_0$ and E[W] in equations 3.4 and 3.3), and the mismatch between the lines and the data is a consequence of overshoot past the threshold. The broken line is a graph of the same formulas with $\theta$ shifted by an offset computed as the sample mean of the overshoot distribution (see Figure 6 as well as the discussion in appendix C; also Ghosh & Sen, 1991; Lee et al., 1994). This correction term helps the FC and DT equations better approximate the data when overshoot is possible. Interestingly, however, parametric plots like Figure 3A already take this effect into account.

4.  Additive Correlations and Decision-Making Performance

4.1.  The SPRT Decision-Making Model.

As described in section 2.4, the additive model of spike train correlations also uses a common spike train to generate correlations, but does so in a manner that gives a distinct population-wide correlation structure. We now derive the consequences for decision-making performance under the SPRT. In sections A.1 and A.3, we find the expressions for the parameters of the FC and DT curves as the window size $\Delta t \to 0$:
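$$h_0 = -1, \tag{4.1}$$
$$E[W] = \left[N(1-\rho) + \rho\right](\lambda_P - \lambda_N)\ln\frac{\lambda_P}{\lambda_N}\,\Delta t. \tag{4.2}$$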
Comparing these with equations 2.19 and 2.20, we see that, as in the subtractive correlations model, the only difference from the independent case is a scaling factor, $N(1-\rho) + \rho$, on the average increment E[W] in equation 4.2. To explain the form of the scale factor, note that the spike vector from each pool is composed of N independent spike trains firing at rate $(1-\rho)\lambda$ and a single (highly redundant) spike train firing at rate $\rho\lambda$.

As in the subtractive correlations model, $E_n$ here also becomes a discrete random walk, with increments of 0 and $\pm\ln(\lambda_P/\lambda_N)$. This can be seen by noting that, in a sufficiently small window, only two kinds of events are possible: either no spikes occur at all, or a single spike event occurs in one of the two pools. The first case is uninformative about either H1 or H0. In the second case, if the event occurred in the preferred pool, for example, it has probability $(1-\rho)\lambda_P\Delta t$ under H1 and $(1-\rho)\lambda_N\Delta t$ under H0 when it is an independent spike in a given neuron, and probabilities $\rho\lambda_P\Delta t$ and $\rho\lambda_N\Delta t$ when it is a shared spike common to the whole pool. Taking the log ratio, we find that the increment, $\ln(\lambda_P/\lambda_N)$, is independent of correlations. The resulting decision accuracy (FC) is plotted versus threshold in Figure 4A and is qualitatively similar to the subtractive correlations case, with plateaus following from the discrete nature of $E_n$. However, the speed-accuracy trade-off pictured in Figure 4B is very different from that found in the subtractive correlations model.

Figure 4:

Additive correlations do not significantly diminish decision performance under the SPRT. (A) The discrete diffusion with a fixed increment gives the same accuracy as the subtractive correlations case (see Figure 2A) at each threshold value. Because of the absence of overshoot, the FC and DT relationships can be applied exactly. (B) However, the resulting speed-accuracy curves are very different. In particular, the impact of correlations on the speed-accuracy trade-off is much smaller than for subtractive correlations (see Figure 2B, noting that here the abscissa ranges up to 30 ms, in contrast to 800 ms). Here, only three correlation strengths (the two largest being 0.15 and 0.3) are plotted for clarity.

In particular, we see our third main result: the impact of additive correlations on optimal (SPRT) decision performance is relatively minor. For example, even in the presence of strong pairwise correlations, the mean decision time required to reach a typical level of accuracy increases by only a few milliseconds compared with the independent case, instead of by hundreds of milliseconds as for subtractive correlations. Equation 4.2 offers an intuitive explanation: DT is inversely proportional to E[W], and E[W] diminishes much more slowly with increasing correlations in the additive case than in the subtractive case (see equation 3.2).

4.2.  The Spike Integration Decision-Making Model.

What about the ability of the simple spike integrator to make decisions when confronted with additive correlations? Proceeding as in the subtractive-correlations case, we derive an implicit relationship for h0 and an expression for the expected increment E[W]:
formula
4.3
formula
4.4
By comparing with equation 2.22, we see that, as for spike integration in the subtractive case, correlation affects only the value of h0, not the expected increment. Substituting these values into equations 2.14 and 2.16, we plot in Figure 5A the speed-accuracy trade-off curves for this model under the assumption of no overshoot. It appears that when decisions are made by spike integration, correlations have a significant impact on performance (black lines), in contrast to the SPRT case (solid gray lines, reproduced from Figure 4B). Overall, the degree of performance loss is comparable to that under subtractive correlations (broken gray lines, reproduced from Figure 3B). This is our fourth main result: for additive correlations, if decisions are made by spike integration instead of the SPRT, correlations significantly reduce decision performance.
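The implicit relationship for h0 can be solved numerically. The sketch below assumes a hypothetical additive construction (per pool, N private Poisson trains at rate (1 − c)λ whose spikes add 1 to the pooled count, and one shared train at rate cλ whose events add N) and the increment W = (pooled preferred count) − (pooled null count); equation 4.3 itself is not reproduced, and the names are ours. Since all counts are Poisson, the log-MGF per unit time is a sum of terms rate·(exp(jump·h) − 1), and h0 is its negative root. Its derivative at zero, E[W]/Δt = N(λp − λn), does not involve c, in line with the statement that correlations change only h0.

```python
import math


def log_mgf_rate(h, lam_p, lam_n, n_neurons, c):
    """(1/dt) log E[exp(h W)] for W = S_p - S_n under the assumed additive model."""
    private = n_neurons * (1.0 - c)
    return (private * lam_p * math.expm1(h) + c * lam_p * math.expm1(n_neurons * h)
            + private * lam_n * math.expm1(-h) + c * lam_n * math.expm1(-n_neurons * h))


def nontrivial_root(lam_p, lam_n, n_neurons, c, iters=200):
    """Bisection for h0 < 0 with log_mgf_rate(h0) = 0 (assumes lam_p > lam_n)."""
    def f(h):
        return log_mgf_rate(h, lam_p, lam_n, n_neurons, c)

    lo = 2.0 * math.log(lam_n / lam_p)  # f(lo) > 0 for these jump sizes
    hi = -1e-9                           # f(hi) < 0 because the drift is positive
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def drift_rate(lam_p, lam_n, n_neurons, c):
    """E[W]/dt, the derivative of log_mgf_rate at h = 0; the c-dependence cancels."""
    private = n_neurons * (1.0 - c)
    return private * (lam_p - lam_n) + c * n_neurons * (lam_p - lam_n)


for c in (0.0, 0.15, 0.3):
    h0 = nontrivial_root(40.0, 20.0, 100, c)
    fc = 1.0 / (1.0 + math.exp(h0 * 20.0))  # Wald FC at an illustrative bound of 20 spikes
    print(f"c={c}: h0={h0:.4f}, E[W]/dt={drift_rate(40.0, 20.0, 100, c):.1f}, FC(20)={fc:.3f}")
```

With c = 0 the root is log(λn/λp); correlations pull h0 toward zero, which lowers the predicted accuracy at a given threshold while leaving the drift untouched.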
Figure 5:

Decision-making performance of the spike integrator model with additive correlations is comparable to that with subtractive correlations: correlations significantly decrease performance. (A) Black lines give the speed-accuracy trade-off predicted using h0 and E[W] from equations 4.3 and 4.4 (thereby assuming no overshoot of the decision threshold). Performance is similar to the subtractive-correlations case (broken gray lines) and significantly worse than performing the SPRT on inputs with additive correlations (solid gray lines). (B) At a fixed correlation strength, for example, major differences arise between this theory (again, solid black line, reproduced from panel A) and simulations of the model (dots), especially at short reaction times. This is a consequence of significant overshoot of En over the decision threshold on the threshold-crossing step. (Inset) At short reaction times, the simulations actually perform closer to the SPRT (gray line, reproduced from Figure 4A); see the text.

However, the assumption that integrated spikes do not overshoot the decision threshold might seem suspect under the additive model of correlations, as the threshold-crossing step might result from every neuron in a pool spiking simultaneously. In fact, when the number of neurons in the pool is large (as in the cases we consider), additive correlations can indeed cause significant overshooting of thresholds. Importantly, and unlike for subtractive correlations, this effect cannot be compensated by a constant offset in the decision threshold.

Figure 5B demonstrates the consequences for the speed-accuracy trade-off. Here, when the spike integration model is simulated directly, we see a surprising nonmonotonic relationship between FC and DT in the presence of additive correlations. This violates the usual intuition that accuracy should increase at slower decision speeds. The explanation is that as the decision threshold is raised, DT increases while accuracy suffers: longer trials are more likely to be ended by a (relatively rare) spike in the correlating spike train of one of the two pools, which makes the accumulator jump far beyond the threshold.

For large thresholds, the sequential sampling theory of equations 2.14 and 2.16, which assumes no overshoot, accurately approximates the simulated data; for low thresholds, however, the approximation is poor. In fact, the inset to Figure 5B shows that in this regime, the decision-making performance of the spike integration model is far better described by the theory for the SPRT. The intuition behind this observation is that for short reaction times, there is only a small probability of a shared spike that would send the integrator significantly over the threshold. Accumulation therefore occurs one spike at a time (for a sufficiently small time window), with each spike arriving from an independent spike train. As we have seen, integrating independent spikes is equivalent to the SPRT. It is only at longer decision times, when the chance of having integrated a large common spike event is larger, that a significant impact of correlations appears.

Figure 6 provides further evidence for this scenario. Density plots of the distribution of the overshoot (conditioned on crossing the upper threshold) for both additive and subtractive correlations are shown as a function of the decision threshold, with particular overshoot distributions plotted at two threshold values (the second being 250). For the additive correlations model, a significant fraction of the trials terminate with zero overshoot at low thresholds (because large correlating events are relatively rare), implying that many trials underwent optimal accumulation of evidence without experiencing a common, correlating spike event, as discussed above.
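The role of overshoot can be explored with a direct simulation. The sketch below assumes a hypothetical additive construction in which, per pool, N private Poisson trains at rate (1 − c)λ move the accumulator En by ±1 and one shared train at rate cλ moves it by ±N. It runs each trial until En leaves (−θ, θ) and records FC, mean DT, the mean overshoot past the upper bound, and the fraction of upper crossings with zero overshoot. Parameter values are illustrative, and we make no claim that they reproduce Figures 5 or 6.

```python
import random


def simulate_integrator(lam_p, lam_n, n_neurons, c, theta, trials=2000, seed=0):
    """Monte Carlo of spike integration E = (pooled preferred) - (pooled null) spikes.

    Returns FC, mean DT (s), and overshoot statistics for upper-bound crossings.
    """
    rng = random.Random(seed)
    private = n_neurons * (1.0 - c)
    jumps = [+1, -1, +n_neurons, -n_neurons]
    weights = [private * lam_p, private * lam_n, c * lam_p, c * lam_n]  # event rates under H1
    total = sum(weights)
    n_correct, time_sum, overshoots = 0, 0.0, []
    for _ in range(trials):
        e, t = 0, 0.0
        while -theta < e < theta:
            t += rng.expovariate(total)              # waiting time to the next event
            e += rng.choices(jumps, weights)[0]      # which kind of event occurred
        time_sum += t
        if e >= theta:
            n_correct += 1
            overshoots.append(e - theta)
    mean_overshoot = sum(overshoots) / len(overshoots) if overshoots else float("nan")
    zero_fraction = overshoots.count(0) / len(overshoots) if overshoots else float("nan")
    return {
        "fc": n_correct / trials,
        "dt": time_sum / trials,
        "mean_overshoot": mean_overshoot,
        "zero_overshoot_fraction": zero_fraction,
    }


for theta in (5, 20, 60):
    r = simulate_integrator(40.0, 20.0, 50, 0.2, theta, trials=1000)
    print(theta, round(r["fc"], 3), round(1e3 * r["dt"], 1), r["mean_overshoot"], r["zero_overshoot_fraction"])
```

With c = 0 every jump has size 1, so the overshoot is always zero and FC reduces to the gambler's-ruin value 1/(1 + (λn/λp)^θ); any positive overshoot in the correlated runs comes from shared events.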

Figure 6:

Overshoot distributions for spike integration under additive and subtractive correlations. The random variable X denotes the overshoot of the accumulator past the threshold, conditioned on crossing the upper threshold (similar results for the lower threshold are not shown). The probability mass function (PMF) of X varies with the threshold, and two vertical slices through this density are shown, at two threshold values (the second being 250). Here the overshoot distributions are discrete because the increments are integer valued. For plotting purposes, the vertical axis has been split in the additive case to allow plotting of the outlier point at zero. The black line indicates E[X] as the threshold varies; crucially, under the additive correlations model this quantity varies significantly, and continues to vary at higher thresholds, resulting in the nonmonotonic speed-accuracy trade-off pictured in Figure 5.

Overall, the monotonic dependence of accuracy (FC) on decision time (DT) follows from the invariance of the moments of the overshoot distribution with respect to changes in the threshold; this is particularly true for the first moment (see appendix C). Figure 6 (additive) demonstrates that, for the additive correlations model, these moments continue to fluctuate over a larger range of thresholds and with larger magnitude. This explains the unusual shape of the speed-accuracy trade-off curve pictured in Figure 5B, which (unlike in the subtractive correlations model) cannot be explained by a constant shift in the threshold.

Figure 7:

Increments for the SPRT are nonlinear when input spikes are correlated. (A) For both additive and subtractive correlations, the spike integration model of decision making implies a linear mapping between the number of spikes in the preferred and null populations and the increment to the accumulator. (B) With subtractive correlations, a severe nonlinearity means that only increments of −1, 0, or +1 occur. This stands in direct contrast to the optimality of linear summation in the zero-correlations case. (C) A nonlinear computation also appears as a consequence of the additive correlations model; however, the nonlinearity is much less severe than in the subtractive model. (All results pictured hold in the limit of a vanishing time window.)

5.  Nonlinear Computations and Optimal Performance via the SPRT

When the neurons in each pool spike independently, Zhang and Bogacz (2010) demonstrated that linear summation of spikes across the two pools at each time step implements the SPRT. Because the SPRT is optimal in the sense of minimizing DT for a prescribed level of FC, the conclusion is that linear integration of spikes across pools, and then across time, provides an optimal decision-making strategy. However, is this optimality of linear integration confined to the case of independent activity within the pool?
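For independent pools of equal size, this linearity is exact rather than approximate. The Poisson log-likelihood ratio of one window's counts reduces to (pooled preferred count − pooled null count)·log(λp/λn), because the rate-dependent −(λp − λn)Δt terms cancel between the two pools. The sketch below checks this numerically for random count vectors; the rates, window length, and function names are hypothetical.

```python
import math
import random


def poisson_logpmf(k: int, mean: float) -> float:
    """log P(K = k) for K ~ Poisson(mean)."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def exact_llr(counts_p, counts_n, lam_p, lam_n, dt):
    """log P(counts | H1) - log P(counts | H0) for independent Poisson neurons.

    H1: preferred pool at lam_p, null pool at lam_n; H0 swaps the two rates.
    """
    llr = 0.0
    for k in counts_p:
        llr += poisson_logpmf(k, lam_p * dt) - poisson_logpmf(k, lam_n * dt)
    for k in counts_n:
        llr += poisson_logpmf(k, lam_n * dt) - poisson_logpmf(k, lam_p * dt)
    return llr


def linear_increment(counts_p, counts_n, lam_p, lam_n):
    """Spike integration: every spike carries the same weight log(lam_p / lam_n)."""
    return (sum(counts_p) - sum(counts_n)) * math.log(lam_p / lam_n)


rng = random.Random(0)
for _ in range(3):
    counts_p = [rng.randint(0, 3) for _ in range(10)]  # equal pool sizes are required
    counts_n = [rng.randint(0, 3) for _ in range(10)]
    print(exact_llr(counts_p, counts_n, 40.0, 20.0, 0.05), linear_increment(counts_p, counts_n, 40.0, 20.0))
```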

Above, we showed that when correlations are introduced into this model, it is no longer true that each spike should be given the same “weight,” as in linear integration. Moreover, knowing the pairwise correlations and firing rates alone does not allow one to write down the function that should be applied to incoming spikes in order to implement the SPRT, although in both of our models this function takes the form of a difference between the outputs of a nonlinearity applied to each pool. This dependence on higher-order statistics is demonstrated in Figure 7 by the fact that the nonlinearities for subtractive correlations (panel B) and additive correlations (panel C) take significantly different forms.

For subtractive correlations, the nonlinearity pictured in Figure 7B that implements the SPRT (up to a change in threshold) takes the form
formula
5.1
formula
5.2
At first glance, it is surprising that such a severe nonlinearity, applied to two correlated spiking pools, results in nearly the same performance as simple spike integration (see Figure 3). The intuition here is that optimal inference essentially requires performing spike integration on the correlating spike train, as spike deletions add no information about the firing rate. This random walk, whose increments take one of three values (−1, 0, or +1), is approximated by linear integration in the limit as the size of the pool (N) increases.
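The sketch below illustrates the kind of rule described here, under an assumed subtractive construction: each pool is driven by a mother Poisson train whose spikes are copied to each neuron independently with probability c. In a small window, any spike in a pool then signals one mother spike, so the SPRT increment is log(λp/λn) times −1, 0, or +1, whereas spike integration weights every spike. This is our reading of equations 5.1 and 5.2, not a transcription of them, and the function names are hypothetical.

```python
import math


def pool_indicator(counts) -> int:
    """Hypothetical per-pool nonlinearity: 1 if any neuron in the pool spiked."""
    return 1 if any(k > 0 for k in counts) else 0


def subtractive_sprt_increment(counts_p, counts_n, lam_p, lam_n) -> float:
    """SPRT increment for one small window under the assumed subtractive model.

    Any spike in a pool signals a single spike of that pool's mother train, whose
    rate ratio between hypotheses is lam_p / lam_n.
    """
    return math.log(lam_p / lam_n) * (pool_indicator(counts_p) - pool_indicator(counts_n))


def linear_increment(counts_p, counts_n, lam_p, lam_n) -> float:
    """Spike integration, for comparison: every spike counts."""
    return math.log(lam_p / lam_n) * (sum(counts_p) - sum(counts_n))


window_p = [1, 0, 1, 1, 0]   # three neurons echo one mother spike
window_n = [0, 0, 0, 0, 0]
print(subtractive_sprt_increment(window_p, window_n, 40.0, 20.0))  # equals log(2)
print(linear_increment(window_p, window_n, 40.0, 20.0))            # equals 3 * log(2)
```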
Another perspective on the nonlinearities that enable optimal computation is that they leverage knowledge about the mechanism of correlations to improve performance. In the additive correlations model, the nonlinear function depicted in Figure 7C is, as in the subtractive correlations case, a consequence of applying a nonlinearity to each pool and then subtracting. However, in this case, the form is not as drastic: a shared spike event coming from the correlating train registers as only a single spike:
formula
5.3
formula
5.4
Intuitively, this strategy uses the fact that, for a sufficiently small window of integration, a simultaneous spike in every neuron in a pool has only one explanation, and it therefore uses the correlating spike train as an additional independent input in the likelihood ratio. At low thresholds, this does not confer much of an advantage; however, as the threshold increases, higher accuracy is achievable at much shorter decision times. The nonlinearity, pictured in Figure 8A, also offers an intuition as to why spike integration performs almost optimally for low threshold values: when spikes from the correlating train are rare (or can be properly weighted), spike integration implements the SPRT (see Figure 8B).
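A corresponding sketch for additive correlations, again our own reading rather than a transcription of equations 5.3 and 5.4: within a small window, if every neuron in a pool spiked, those N spikes are attributed to one shared event and counted once, and any remaining spikes are treated as private spikes. Both kinds of event then carry the same weight log(λp/λn). The function names are hypothetical.

```python
import math


def discounted_pool_count(counts) -> int:
    """Hypothetical per-pool nonlinearity for the additive model.

    If every neuron spiked, N of the spikes are credited to a single shared event
    (counted once); any extra spikes are counted as private spikes.
    """
    total = sum(counts)
    if counts and min(counts) >= 1:
        return total - (len(counts) - 1)
    return total


def additive_sprt_increment(counts_p, counts_n, lam_p, lam_n) -> float:
    """Difference of the pool nonlinearities, weighted by log(lam_p / lam_n)."""
    diff = discounted_pool_count(counts_p) - discounted_pool_count(counts_n)
    return math.log(lam_p / lam_n) * diff


shared_event = [1] * 50                 # all 50 neurons fire together
private_spike = [0] * 49 + [1]          # one neuron fires alone
print(discounted_pool_count(shared_event), sum(shared_event))    # discounted vs raw count
print(additive_sprt_increment(shared_event, [0] * 50, 40.0, 20.0))
print(additive_sprt_increment(private_spike, [0] * 50, 40.0, 20.0))
```

A raw spike integrator would move 50 units on the shared event and 1 unit on the private spike; the discounted rule moves one unit in both cases.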
Figure 8:

Optimal performance via spike integration under additive correlations can be realized with a simple nonlinearity. (A) A nonlinearity discounts the contribution to the accumulator of a shared spike event (see equation 5.3). (B) Spike integration with this nonlinearity, suggested by Figure 7C, brings the performance of the decision-making model (black dots) into agreement with the SPRT (gray line). Without this nonlinearity to discount shared events, performance suffers (gray dots, reproduced from Figure 5B, inset).

6.  Discussion

Correlated spiking among the neurons that encode sensory evidence appears ubiquitous. Such correlations might arise from any number of neuroanatomical features, the simplest being overlapping feedforward connectivity, which can cause collective fluctuations across a population (Binder & Powers, 2001; Shadlen & Newsome, 1998; De La Rocha, Doiron, Shea-Brown, Josić, & Reyes, 2007; Mazurek et al., 2003). They can also result from sensory events that have an impact on an entire population or from rapid modulatory effects. Moreover, for large neural populations, it appears that accurate descriptions of population-wide activity can require not only the typically measured pairwise correlations but higher-order correlations as well (Montani et al., 2009; Ganmor et al., 2011; Yu et al., 2011).

The aim of our study is to improve our understanding of how correlated activity in these populations can affect the speed and accuracy of decisions that require accumulating sensory information over time. Faced with the wide range of possible mechanisms and structures of correlations alluded to above, we chose to focus on two models of population-wide correlations that illustrate a key distinction in how correlations can occur. These models have identical first-order and pairwise statistics but differ in whether each common spiking event involves a small subset of the neurons (the subtractive case) or every neuron in the pool (the additive case) (Kuhn et al., 2003; Staude et al., 2010).

Figure 9 quantifies this difference. Based on calculations in appendix  D, we plot the joint cumulant across k neurons in a pool under both subtractive and additive correlations. This statistic, computed over a subset of the neurons in a pool, provides a generalization of the notion of covariance to higher orders. In this way, it can be used to distinguish the collective activity resulting from the additive and subtractive models. While the additive model possesses a constant joint cumulant no matter how many neurons are included, the joint cumulant of k neurons falls off geometrically for the subtractive case. We conjecture that this is a statistical signature that could suggest when other, more general patterns of correlated activity—measured experimentally or arising in mechanistic models of neural circuits (Mazurek et al., 2003)—will produce similar effects on decisions. Exploring this conjecture using models and data is a target of our future research.
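Under assumed constructions for the two models (additive: private trains at rate (1 − c)λ plus a pool-wide shared train at rate cλ; subtractive: a mother train at rate λ/c whose spikes are copied to each neuron independently with probability c), standard Poisson identities give joint cumulants of k distinct neurons' counts in a window T of cλT for the additive model (any k ≥ 2) and λT·c^(k−1) for the subtractive model. Both equal cλT at k = 2, so the pairwise statistics coincide, while the subtractive cumulants decay geometrically. The sketch below tabulates these closed forms and provides a Monte Carlo estimator for the third-order case. Appendix D is not reproduced here, and its parameterization may differ.

```python
import math
import random


def joint_cumulant_additive(k: int, rate: float, c: float, T: float) -> float:
    """Joint cumulant of k >= 2 distinct neurons' counts in the assumed additive model.

    Private trains are independent, so only the shared Poisson train (mean
    c * rate * T, all of whose cumulants equal its mean) contributes.
    """
    return c * rate * T


def joint_cumulant_subtractive(k: int, rate: float, c: float, T: float) -> float:
    """Same quantity in the assumed subtractive model: mother train at rate / c,
    each mother spike copied to each neuron independently with probability c."""
    return (rate / c) * T * c ** k


def poisson_sample(mean: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def third_cumulant_mc(model: str, rate: float, c: float, T: float,
                      samples: int = 40000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[(X1 - m1)(X2 - m2)(X3 - m3)] for three neurons."""
    rng = random.Random(seed)
    rows = []
    for _ in range(samples):
        if model == "additive":
            shared = poisson_sample(c * rate * T, rng)
            rows.append([poisson_sample((1.0 - c) * rate * T, rng) + shared for _ in range(3)])
        else:
            mother = poisson_sample(rate * T / c, rng)
            rows.append([sum(rng.random() < c for _ in range(mother)) for _ in range(3)])
    means = [sum(row[i] for row in rows) / samples for i in range(3)]
    return sum((r[0] - means[0]) * (r[1] - means[1]) * (r[2] - means[2]) for r in rows) / samples


rate, c, T = 20.0, 0.3, 0.1
for k in (2, 3, 4, 5):
    print(k, joint_cumulant_additive(k, rate, c, T), joint_cumulant_subtractive(k, rate, c, T))
```

Calling third_cumulant_mc("additive", 20.0, 0.3, 0.1) or third_cumulant_mc("subtractive", 20.0, 0.3, 0.1) should land near the closed-form values for k = 3 (0.6 and 0.18), up to sampling noise.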

Figure 9:

The joint cumulants of the additive and subtractive models differ for pools of more than two neurons. Under the additive model, the joint cumulants of the spike counts from N neurons are constant for all N>2. In contrast, the joint cumulants of the subtractive model decay geometrically as the pool size increases; this difference helps to characterize the differences in higher-order correlations between the two models. (See appendix D for supplementary computations.)

We summarize our main findings as follows. For both models of correlated spiking, decisions produced by a simple, linear spike integration model (i.e., a neural integrator) become slower and less accurate as correlations increase. However, a strong difference appears for decisions made using the optimal decision strategy (SPRT). Here, additive correlations have only a minor impact on decision performance, while subtractive correlations continue to strongly diminish this performance. The conclusion is that decision-making circuits faced with subtractively correlated sensory populations will invariably produce diminished decision performance and have little to gain by implementing computations more complex than a simple integration of spikes over time and neurons. However, in the presence of additive correlations, circuit mechanisms that implement or approximate the SPRT (perhaps by a nonlinearity such as that shown in Figure 8 applied to the sum of incoming spikes) stand to produce substantially better decision performance than their linear counterparts.

In other contexts, nonlinear computations have also been shown to improve discrimination between two alternatives. Field and Rieke (2002; also see Field, Sampath, & Rieke, 2005) demonstrated the importance of a thresholding nonlinearity in pooling the responses of rod cells, where this nonlinearity served to reject background noise. Closer to the present setting, gating inhibition that prevents accumulation of noise samples before the onset of evidence-encoding stimulus can account for visual search performance (Purcell, Schall, Logan, & Palmeri, 2012), and recent results suggest that related nonlinearities can improve performance for mistuned neural integrators (Cain, Barreiro, Shadlen, & Shea-Brown, 2011; see also Cain & Shea-Brown, 2012).

Our cases in which correlations decrease performance (in particular, when spikes are linearly integrated) are consistent with several prior studies of the role of correlated activity in decision making (Zohary et al., 1994; Britten, Newsome, Shadlen, Celebrini, & Movshon, 1996; Cohen & Newsome, 2009). We note, however, two differences in our models. The first is the mechanism through which correlated spikes are generated: whereas we use additive and subtractive models based on Poisson processes, Britten et al. (1996) and Cohen and Newsome (2009) use a multivariate gaussian description of spike counts. The second is that in these studies, decisions are rendered after a duration that is fixed before the trial begins: either a single duration (Britten et al., 1996) or one that is drawn from a distribution of reaction times (Cohen & Newsome, 2009). This is different from our setting, where incoming signals on each trial determine the reaction time through a bound crossing.

Our result, in the case of subtractive correlations, that linear integration of spikes closely approximates the optimal decision-making strategy is similar to findings of Beck et al. (2008). Specifically, they model a dense range of differently tuned populations and find that optimal Bayesian inference can be based on linear integration of inputs for a wide set of correlation models. Our additive case, however, behaves differently, as nonlinearities are needed to achieve the optimal strategy.

An aim of future work is extending the setting of our study to include orientation tuning curves as in Cohen and Newsome (2009) and Beck et al. (2008). This is more realistic for many decision tasks (including the direction discrimination task) and will also allow progress toward models with multiple decision alternatives. An important challenge will come from defining pairwise correlations that vary as a function of preferred tuning orientation (see Zohary et al., 1994, and Cohen & Newsome, 2008), while also including the full structure of correlating events across multiple cells in a realistic way. For example, in this article, additive correlating events occurred independently in the two populations; future work could take a more graded approach, in which only some events have an impact on the entire sensory population (as in an eyeblink or possibly an attentional shift during a visual task).

Moreover, cells with similar orientation preferences would be more likely to share additive common spike events (these will occur with a frequency determined by their pairwise correlations, which can be higher for cells with similar response properties: Zohary et al., 1994; Cohen & Newsome, 2008).

At least from the perspective of the simple class of decoders that arose in this article, which combine the total population output of all cells that favor either of the two task alternatives, this raises interesting complexities. Common spike events would involve different subsets of cells. Thus, correlated spiking will be more graded and could be harder to discount than in the simpler homogeneous case we study. We therefore speculate that additive correlations will have a stronger impact on decision-making performance for heterogeneous populations. However, richer decoders with nonlinearities that act separately on the output of different neurons could mitigate this. The issue will require careful quantitative study before solid conclusions can be reached.

As long as each neuron remains modeled as a Poisson point process, the sequential accumulation theory used here will carry over directly. For example, models that introduce correlations by common gain fluctuations would provide a multiplicative model of joint fluctuations and may be amenable to our approach. This points to another limitation of this study and an opportunity for future work: the lack of temporal correlations in the statistics of the inputs. A model of correlations that includes spikes from a correlating train that are temporally jittered (Gutnisky & Josić, 2010; Staude et al., 2010) could provide a starting place for a model of the input trains; however, defining updates to the likelihood ratio for the two competing hypotheses will be more difficult. Nevertheless, it will be interesting to see how our results carry over; in particular, there will be many more combinations of spike events that will contribute to increments for both spike integration and SPRT decision models.

While we therefore view this study as a first step in exploring many possibilities, our findings demonstrate how the population-wide structure of correlations—beyond pairwise correlation coefficients—can have a strong impact on the speed and accuracy of decisions and the circuit operations necessary to achieve optimal performance. This suggests that multielectrode and imaging technologies, together with theoretical work on neural coding, will continue to play an exciting role in understanding the structure of basic computations like decision making over time.

Appendix A:  Sequential Probability Ratio Test

A.1.  Nontrivial Root of the Moment-Generating Function.

The nontrivial real root of the moment-generating function (MGF) of a sampling distribution is critical to finding FC and DT of an independently sampled sequential hypothesis test (via equations 2.14 and 2.16). For the SPRT, the increment distribution is given in equation 2.18 as
formula
A.1
The “correct” hypothesis H1 is in the numerator in order to associate a crossing of the positive decision threshold with a correct choice. Correspondingly, the probability of observing a given sample Sip, Sin is known under the assumption of this hypothesis and, by definition, follows the distribution
formula
A.2
where the independence of the spike count vectors from the two separate pools, Sip and Sin, allows the distribution to be factored. Dropping the sampling index i for notational convenience, the MGF can then be computed as
formula
A.3
The nontrivial root h0 can then be seen by inspection (see equation 2.19):
formula
A.4
We note that this computation is fully general, without any assumptions on the structure of correlations within and across pools.
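The inspection argument can be checked numerically. With the MGF written as E[exp(hW)] under H1 and W = log(P1/P0) (our assumption about the convention), setting h = −1 turns each term P1·(P0/P1) into P0, so the sum is 1 for any pair of distributions with the same support, correlated or not. The toy probability tables below are arbitrary and serve only as an illustration.

```python
import math


def mgf_of_llr(p1: dict, p0: dict, h: float) -> float:
    """E_{H1}[exp(h W)] with W = log(p1(s) / p0(s)), for PMFs given as dicts."""
    return sum(prob * math.exp(h * math.log(prob / p0[s])) for s, prob in p1.items() if prob > 0)


# Toy joint PMFs over (preferred count, null count) pairs; the entries are
# arbitrary (hypothetical) and need not factor, only the supports must match.
p1 = {(0, 0): 0.50, (1, 0): 0.25, (0, 1): 0.10, (2, 0): 0.10, (1, 1): 0.05}
p0 = {(0, 0): 0.50, (1, 0): 0.10, (0, 1): 0.25, (2, 0): 0.05, (1, 1): 0.10}

for h in (-1.5, -1.0, -0.5, 0.0, 0.5):
    print(h, mgf_of_llr(p1, p0, h))
```

The MGF equals 1 at h = 0 and at h = −1 and dips below 1 in between, which is the shape that makes −1 the nontrivial root.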

A.2.  E[W], Independent Activity

The other parameter of the sampling distribution that is critical to computing the FC and DT functions, E[W], is computed for independent spike count distributions (see equation 2.20) as follows (see also Zhang & Bogacz, 2010):
formula
A.5
formula
A.6
formula
A.7
formula
A.8
formula
A.9
formula
A.10
formula
A.11
When this quantity is substituted into equation 2.16, the window size cancels, implying that DT does not depend on the size of the sampling window. We compute this quantity for correlated spike count distributions next.
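Summing directly over Poisson counts gives a quick numerical check that E[W] is proportional to the window length, so that the window length cancels in DT. The closed form N·Δt·(λp − λn)·log(λp/λn) used for comparison is our own calculation under the independent Poisson assumptions, not a transcription of equation A.11; names and rates are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(K = k) for K ~ Poisson(mean), computed in log space."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def expected_increment(lam_p, lam_n, n_neurons, dt, kmax=60):
    """E[W] under H1 by direct summation over one neuron's Poisson counts.

    For independent neurons the SPRT increment is a sum of per-neuron terms
    k * log(lam_p / lam_n) - (lam_p - lam_n) * dt (preferred pool) and the
    mirror-image terms for the null pool, so E[W] is n_neurons times the sum
    of the two per-neuron expectations.
    """
    a = math.log(lam_p / lam_n)
    pref = sum(poisson_pmf(k, lam_p * dt) * (k * a - (lam_p - lam_n) * dt) for k in range(kmax))
    null = sum(poisson_pmf(k, lam_n * dt) * (-k * a - (lam_n - lam_p) * dt) for k in range(kmax))
    return n_neurons * (pref + null)


for dt in (0.001, 0.01, 0.05):
    print(dt, expected_increment(40.0, 20.0, 100, dt) / dt)  # E[W]/dt, essentially the same for every dt
```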

A.3.  E[W], Additive Correlations Model

When neurons within pools are correlated, the joint distribution of the spike count vector no longer factors into the product of the marginal distributions (the critical step between equations A.8 and A.9). However, an expression for E[W] can be obtained in the limit of a vanishing window size by repeatedly expanding in Taylor series about zero window size throughout the computation.

First, we simplify the expression for the expected increment by using the independence of the two pools:
formula
A.12
Next we expand each term to first order in the window size. Below, we demonstrate the expansion for only the “preferred” population; the calculation for the null pool follows by exchanging the two firing rates. By using the law of total expectation, conditioned on the number of spikes in the common spike train “shared” across the pool, we have
formula
A.13
formula
A.14
formula
A.15
Taking first the case in which the shared spike train is silent,
formula
A.16
The aim here is to take advantage of the conditioning. Because the spike counts of neurons within the same pool are conditionally independent given the number of spikes in the correlating spike train, the joint distribution across the vector sp becomes the product of the conditioned marginal distributions. However, this is true only for the first factor in the summand of equation A.16. To continue, we must expand the log ratio of the probability distributions, using the law of total probability, in the window size:
formula
A.17
formula
A.18
formula
A.19
formula
A.20
formula
A.21
formula
A.22
Moreover, the N-term summation in equation A.16 need only run over si ∈ {0, 1}, as higher values produce contributions of higher than first order in the window size. Two cases emerge for the expansion: if si=0 for any i, we have
formula
A.23
But if si=1 for all i, we can compute the expression directly by total probability, as there are only four possible ways for the event to originate. To first order in the window size, this is
formula
A.24
formula
A.25
Therefore, this single element of the sum offers no order-one contribution (it is multiplied by a factor that is itself of at least first order in the window size); thus,
formula
A.26
The case of a single shared spike is simpler, as only zero-order terms must be kept (due to the first-order coefficient in equation A.15). Recycling the expansion from equation A.23, we have, to zero order,
formula
A.27
Finally, combining equations A.15, A.26, and A.27, we have that
formula
A.28
formula
A.29
formula
A.30
Repeating the exercise for the other component of equation A.12 amounts to exchanging p for n; adding everything together gives the final result, to first order in the window size:
formula
A.31
We note here that in the limit of vanishing correlations, we reproduce the result expected from equation A.11. Also, a more intuitive and tractable computation can be done for an analogous additively correlated Bernoulli process, resulting in the same solution.

A.4.  E[W], Subtractive Correlations Within Pools

In the case of subtractive correlations within pools, the derivation of E[W] is the same as the additive correlation case, up to equation A.14. In this case, however, we now have
formula
A.32
Taking first the case in which the correlating spike train is silent, we notice that it is impossible for any spikes to occur without a spike in the correlating spike train:
formula
A.33
Because of this, we can simplify:
formula
A.34
formula
A.35
Interestingly, after conditioning on the number of correlating spikes, the probability of the zero vector (or any vector sp) is the same under both H0 and H1:
formula
A.36
We then expand to first order in the window size:
formula
A.37
formula
A.38
In the case of a single correlating spike, only zero-order terms must be computed. When computing
formula
A.39
the summation need only run over the values 0 and 1 for each element of sp. The case sp=0 provides no contribution at zero order, as can be seen from equation A.38. For any other case, there will be a degeneracy in the expansion of the log, caused by an absence of order-zero terms:
[Equation A.40]
[Equation A.41]
[Equation A.42]
Therefore, to first order in ,
[Equation A.43]
[Equation A.44]
Combining equations A.32, A.35, A.38, and A.44, we find that
[Equation A.45]
As before, exchanging p for n takes care of the expression for the null pool, and adding the two together gives
[Equation A.46]
Once again, as and , we reproduce the results that would be expected from equation A.11.
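A matching short-window sketch for the subtractive model assumes, hypothetically, that each pool's correlating train fires at rate λ/c and that each of its spikes is copied into each of the N cells independently with probability c (so each cell fires at λ with pairwise correlation c). Correlating spikes that reach no cell are invisible. Consistent with the observation that, conditioned on the correlating train, the spike-pattern probabilities agree under H0 and H1, only the rate of visible events carries evidence, so a single thinned Poisson stream per pool enters the Kullback-Leibler rate.

```python
import math


def poisson_kl_rate(r_true: float, r_alt: float) -> float:
    """KL divergence rate (nats per unit time) of a Poisson stream at r_true vs r_alt."""
    if r_true == 0.0:
        return r_alt
    return r_true * math.log(r_true / r_alt) - r_true + r_alt


def expected_llr_rate_subtractive(lam_p: float, lam_n: float, n_cells: int, c: float) -> float:
    """Expected log-likelihood-ratio drift per unit time under H1 (subtractive model sketch)."""
    if not 0.0 < c <= 1.0:
        raise ValueError("c must lie in (0, 1]")
    # Probability that a correlating spike reaches at least one of the n_cells cells.
    p_visible = 1.0 - (1.0 - c) ** n_cells
    total = 0.0
    for true_rate, alt_rate in ((lam_p, lam_n), (lam_n, lam_p)):
        # Visible events form a thinned Poisson stream at (rate / c) * p_visible; their
        # spike patterns have the same distribution under both hypotheses and cancel.
        total += poisson_kl_rate(true_rate / c * p_visible, alt_rate / c * p_visible)
    return total


for c in (1e-6, 0.1, 1.0):
    print(c, expected_llr_rate_subtractive(40.0, 20.0, 100, c))
```

Here the sum collapses to ((1 − (1 − c)^N)/c)(λp − λn) log(λp/λn). For N = 100 and c = 0.1 the prefactor is just under 10, against 100 for independent cells.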

Appendix B:  Spike Integration

B.1.  Independent Spiking.

Computing FC and DT for the spike integration accumulation model relies on computation of the MGF for the sampling distribution. We begin with several identities that will be useful below. The MGF for the sum of N independent random variables is
$$M_{X_1+\cdots+X_N}(t) = \prod_{i=1}^{N} M_{X_i}(t) \tag{B.1}$$
Given that the MGF for a random variable , it follows that
[Equation B.2]
Finally, the MGF for a Poisson random variable is
$$M_X(t) = \exp\left[\lambda\left(e^{t}-1\right)\right], \qquad X \sim \mathrm{Poiss}(\lambda) \tag{B.3}$$
Given the definition of the increment variable in equation 2.17 and noting that each spike count random variable is independent, we can combine these observations to construct the MGF for the sampling random variable over a time window :
[Equation B.4]
Now the nontrivial root can be calculated (see equation 2.21):
[Equation B.5]
Because the MGF is known explicitly, the computation of the expected increment is simple (see equation 2.22):
[Equation B.6]
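These steps can be checked numerically. The Python sketch below assumes, as a hypothetical reading of equation 2.17, that the increment over a window T is the total count of N independent Poisson cells in the preferred pool minus that of N cells in the null pool, so that the identities for sums, scaling, and Poisson variables give its log-MGF directly. It finds the nontrivial (negative) root by bisection and estimates the expected increment as the slope of the log-MGF at zero; under this assumption the two should match log(λn/λp) and N(λp − λn)T.

```python
import math
from functools import partial
from typing import Callable


def log_mgf_independent(t: float, lam_p: float, lam_n: float, n_cells: int, window: float) -> float:
    """log E[exp(t * increment)] for increment = preferred count - null count."""
    # A Poisson(mu) count has log-MGF mu * (e^t - 1); the null count enters with -t.
    return n_cells * window * (lam_p * math.expm1(t) + lam_n * math.expm1(-t))


def nontrivial_root(log_mgf: Callable[[float], float], tol: float = 1e-12) -> float:
    """Negative root of a convex log-MGF with positive slope at zero, found by bisection."""
    hi = -1e-8  # just left of the trivial root t = 0, where log_mgf is negative
    if log_mgf(hi) >= 0.0:
        raise ValueError("expected a positive mean increment")
    lo = -1.0
    while log_mgf(lo) <= 0.0:  # widen the bracket until log_mgf turns positive
        lo *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_mgf(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_increment(log_mgf: Callable[[float], float], step: float = 1e-5) -> float:
    """Slope of the log-MGF at zero, i.e. the expected increment, by central difference."""
    return (log_mgf(step) - log_mgf(-step)) / (2.0 * step)


k_indep = partial(log_mgf_independent, lam_p=40.0, lam_n=20.0, n_cells=100, window=0.01)
print(nontrivial_root(k_indep), math.log(20.0 / 40.0))
print(mean_increment(k_indep), 100 * (40.0 - 20.0) * 0.01)
```

Bisection is used only to keep the sketch dependency-free; any bracketing root finder would serve.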

B.2.  Additive Correlations Model.

When additive correlations are introduced within pools, the spike count distribution MGF over a time period can still be broken into the product of two separate MGFs, one each for the preferred and null pools, which are identical in form but differ in their Poisson rate parameters (indicated by the semicolon):
[Equation B.7]
For the preferred pool, the spike count can be broken into two independent contributions: spikes from the shared (i.e., correlating) spike train that get counted N times (firing at a rate ) and spikes from the N independent spike trains that get counted once (each firing at a rate ):
[Equation B.8]
The MGF for the shared spike train can be computed directly from the definition, using its probability mass function (PMF),
[Equation B.9]
and thus,
[Equation B.10]
The MGF for the independent spike trains follows from section B.1, giving the form of the MGF of the increment over a time as
[Equation B.11]
After rearranging, h0 is implicitly defined as the nontrivial root of
[Equation B.12]
As , we recover the solution from section B.1. The expected increment can be directly computed as
[Equation B.13]
Note that this last expression is the same as in the independent case (see equation 2.22), as expected, and that unlike the SPRT, no limits in were necessary to compute the parameters for the FC and DT functions.
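The implicit root condition for the additive model can be solved the same way. This Python sketch assumes, hypothetically, that the shared train fires at rate cλ and each of the N private trains at (1 − c)λ, so each pool's count is a compound Poisson variable whose log-MGF is the sum of the two contributions described above. In this sketch the root moves from log(λn/λp) at c = 0 to log(λn/λp)/N at c = 1, while the slope at zero stays at N(λp − λn)T for every c.

```python
import math
from functools import partial


def log_mgf_additive(t, lam_p, lam_n, n_cells, c, window):
    """log-MGF of (preferred count - null count) under assumed additive-model rates."""

    def pool(s, lam):
        # Shared train: rate c*lam, each event adds n_cells spikes to the pool count.
        # Private trains: n_cells of them at rate (1 - c)*lam, one spike per event.
        return window * (c * lam * math.expm1(n_cells * s) + n_cells * (1 - c) * lam * math.expm1(s))

    return pool(t, lam_p) + pool(-t, lam_n)


def nontrivial_root(log_mgf, tol=1e-12):
    """Negative root of a convex log-MGF with positive slope at zero, found by bisection."""
    lo, hi = -1.0, -1e-8  # log_mgf(hi) < 0 just left of the trivial root t = 0
    while log_mgf(lo) <= 0.0:
        lo *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_mgf(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_increment(log_mgf, step=1e-5):
    """Slope of the log-MGF at zero (expected increment), by central difference."""
    return (log_mgf(step) - log_mgf(-step)) / (2.0 * step)


for c in (0.0, 0.3, 1.0):
    k_add = partial(log_mgf_additive, lam_p=40.0, lam_n=20.0, n_cells=100, c=c, window=0.01)
    print(c, nontrivial_root(k_add), mean_increment(k_add))
```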

B.3.  Subtractive Correlations Model.

With subtractive correlations, we again derive an MGF for the spike count vector of an individual pool and apply equation B.7. In this case, however, the number of spikes in a pool, conditioned on the number of spikes in that pool's correlating train, is binomially distributed. Thus, applying the law of total probability,
[Equation B.14]
using the definitions for the PMFs of the Poiss and Binom[N, k; p] distributions, we have
[Equation B.15]
After applying equation B.7 with this MGF for both the preferred and null populations, we find an implicit relationship for the nontrivial real root t=h0 that does not depend on :
[Equation B.16]
As before, the expected increment can be directly computed by differentiation, and we find the same expression as in the additive correlation case:
[Equation B.17]
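For the subtractive model, summing the binomial pool counts over a Poisson number of correlating spikes gives a compound Poisson MGF of the form exp(μ((1 − c + c e^t)^N − 1)), with μ the expected number of correlating spikes in the window. The Python sketch below assumes, hypothetically, that μ = λT/c so that each cell fires at rate λ. Because the window length multiplies every term of the root condition, it drops out of h0, while the slope at zero is again N(λp − λn)T.

```python
import math
from functools import partial


def log_mgf_subtractive(t, lam_p, lam_n, n_cells, c, window):
    """log-MGF of (preferred count - null count) under assumed subtractive-model rates."""

    def pool(s, lam):
        # Correlating train ~ Poisson(lam * window / c); given k of its spikes the pool count
        # is Binomial(n_cells * k, c), so log-MGF = mu * ((1 + c * (e^s - 1))**n_cells - 1).
        return lam * window / c * math.expm1(n_cells * math.log1p(c * math.expm1(s)))

    return pool(t, lam_p) + pool(-t, lam_n)


def nontrivial_root(log_mgf, tol=1e-12):
    """Negative root of a convex log-MGF with positive slope at zero, found by bisection."""
    lo, hi = -1.0, -1e-8  # log_mgf(hi) < 0 just left of the trivial root t = 0
    while log_mgf(lo) <= 0.0:
        lo *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_mgf(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_increment(log_mgf, step=1e-5):
    """Slope of the log-MGF at zero (expected increment), by central difference."""
    return (log_mgf(step) - log_mgf(-step)) / (2.0 * step)


for window in (0.01, 1.0):
    k_sub = partial(log_mgf_subtractive, lam_p=40.0, lam_n=20.0, n_cells=100, c=0.3, window=window)
    print(window, nontrivial_root(k_sub), mean_increment(k_sub))
```

The `expm1`/`log1p` form keeps the computation accurate when c is small and the bracketed power is close to 1.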

Appendix C:  Speed and Accuracy Functions with Overshoot

The identities provided in equations 2.14 and 2.16 are very useful; however, they are simplifications of the full formulas for FC and DT (assuming ) derived by Wald (1944), which are
[Equation C.1]
[Equation C.2]
Specifically, equations 2.14 and 2.16 hold under the assumption that the value of the state variable on the decision step is exactly equal to the decision threshold. In practice, however, this no-overshoot assumption may not provide a particularly good approximation.

Lee et al. (1994) suggest a correction term based on the mean of the overshoot distribution, that is, the distribution of the excess distance past either the positive or the negative threshold on the threshold-crossing step. This correction is based on a Taylor expansion of the conditional expectations in equation C.1 and takes the form of a shift in the decision threshold. A correction of this form is relevant to our analysis because the performance of the two models is compared parametrically in the threshold, to isolate the effects of the speed-accuracy trade-off imparted by freely adjusting the threshold.

Denote the value of En conditioned on crossing the first threshold as , and let X denote the overshoot random variable, with mean μX. Expanding the conditional expectation (although dropping the conditional notation for convenience) via a Taylor series centered on this mean (the so-called delta method), we have
[Equation C.3]
Choosing yields an expression for the truncation error of Wald's approximation:
[Equation C.4]
[Equation C.5]
Here we see that if , each term in the expansion becomes zero, and Wald's approximation holds exactly. If En overshoots , error will accumulate at each term in the expansion as a function of the moments of the overshoot distribution. If instead the expansion is performed about , a threshold-shifted approximation expresses the truncation error in terms of the second and higher centered moments of the overshoot distribution:
[Equation C.6]
[Equation C.7]
In practice, the overshoot is often not zero. However, if its mean can be calculated and h0<0, the threshold-shifted approximation may carry a smaller truncation error, as long as the higher-order centered moments of the overshoot do not grow too large. For the decision time, the threshold-shifted approximation is exact and introduces no additional error.
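The effect of overshoot can be illustrated by simulation. The Python sketch below integrates the count difference of the additive model (with a hypothetical shared-train rate cλ and private rates (1 − c)λ, as in Appendix B) until it leaves the interval (−θ, θ); because shared events move the walk by N at once, it can overshoot either boundary. It compares the simulated fraction correct with Wald's no-overshoot formula, obtained from E[exp(h0 S)] = 1 at the stopping time, and with a threshold-shifted version in which each boundary is moved by the simulated mean overshoot past it. This is one way to apply a shift of the kind described above, not necessarily the exact form used by Lee et al. (1994); all function names and parameter values are illustrative.

```python
import itertools
import math
import random
from functools import partial


def log_mgf_additive(t, lam_p, lam_n, n_cells, c):
    """Per-unit-time log-MGF of the count difference (hypothetical additive rates)."""

    def pool(s, lam):
        # Shared events add n_cells spikes at rate c*lam; private events add one at (1-c)*lam each.
        return c * lam * math.expm1(n_cells * s) + n_cells * (1 - c) * lam * math.expm1(s)

    return pool(t, lam_p) + pool(-t, lam_n)


def nontrivial_root(log_mgf, tol=1e-12):
    """Negative root of a convex log-MGF that has positive slope at zero (bisection)."""
    lo, hi = -1.0, -1e-8  # log_mgf(hi) < 0 just left of the trivial root t = 0
    while log_mgf(lo) <= 0.0:
        lo *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_mgf(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def run_trial(lam_p, lam_n, n_cells, c, theta, rng):
    """Integrate jumps until the walk leaves (-theta, theta); return (hit_upper, overshoot)."""
    jumps = (1, n_cells, -1, -n_cells)
    rates = (n_cells * (1 - c) * lam_p, c * lam_p, n_cells * (1 - c) * lam_n, c * lam_n)
    cum_rates = list(itertools.accumulate(rates))
    s = 0
    while -theta < s < theta:
        # Which boundary is hit depends only on the order of events, so event times
        # are skipped and each jump is drawn with probability proportional to its rate.
        s += rng.choices(jumps, cum_weights=cum_rates)[0]
    return (True, s - theta) if s >= theta else (False, -theta - s)


def wald_fc(h0, upper, lower):
    """P(hit +upper before -lower) from E[exp(h0 * S)] = 1, ignoring overshoot."""
    return (1.0 - math.exp(-h0 * lower)) / (math.exp(h0 * upper) - math.exp(-h0 * lower))


def compare_fc(lam_p=22.0, lam_n=20.0, n_cells=10, c=0.2, theta=20, n_trials=1000, seed=0):
    """Simulated fraction correct versus Wald's formula with and without a threshold shift."""
    rng = random.Random(seed)
    h0 = nontrivial_root(partial(log_mgf_additive, lam_p=lam_p, lam_n=lam_n, n_cells=n_cells, c=c))
    trials = [run_trial(lam_p, lam_n, n_cells, c, theta, rng) for _ in range(n_trials)]
    up = [x for hit, x in trials if hit]
    down = [x for hit, x in trials if not hit]
    mu_up = sum(up) / len(up) if up else 0.0
    mu_down = sum(down) / len(down) if down else 0.0
    return {
        "simulated": len(up) / n_trials,
        "wald": wald_fc(h0, theta, theta),
        "shifted": wald_fc(h0, theta + mu_up, theta + mu_down),
        "mean_overshoot_up": mu_up,
        "mean_overshoot_down": mu_down,
    }


print(compare_fc())
```

With c = 0 every jump is ±1, so for integer θ the walk lands exactly on a boundary, the simulated overshoot is zero, and the shifted and unshifted formulas coincide.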

Appendix D:  Joint Cumulants for the Additive and Subtractive Models

Staude et al. (2010) suggest that cumulants provide a “natural and intuitive higher-order generalization of the covariance” for multineuron spiking. The two models of correlated activity examined here are indistinguishable when examining only first-order (i.e., mean firing rate) or second-order (i.e., pairwise correlations) statistics. Here, we derive the joint cumulants for each of these two models to clarify how the spike count distributions produced by the two models differ at higher orders.

The derivation relies on the conditional independence of the spike counts for each neuron in a pool, conditioned on the spike count in the common spike train. Let be the random variables giving spike counts in windows of size from each of the neurons in a correlated pool, and let be the spike count in the common spike train. The law of total cumulance (Brillinger, 1969) allows a relatively simple expression of the joint cumulant on k members of (Because of the homogeneity of the pool, we express the kth joint cumulant as calculated on , but the same expression holds for any k-sized subset of ):
[Equation D.1]
Here is the set of all partitions of , for example,
[Equation D.2]
[Equation D.3]
[Equation D.4]
and is the conditional joint cumulant over the set of all spike counts indexed by an element of Bj—that is, the set .
In our special case, whenever |Bj|>1, owing to the conditional independence of each neuron given the common spike train. Moreover, from the definition of the cumulant, the term of equation D.1 for the partition that contains such a block Bj will also be zero. This implies that the only that contributes in equation D.1 is (i=1 in the example of equation D.4); thus,
[Equation D.5]
where we have used the fact that the first cumulant is simply the expected value. Using the cumulant-generating function, we then have a formula for the joint cumulant:
[Equation D.6]
Thus, for the two models of correlations (assuming a firing rate ), we have:
Subtractive:
[Equation D.7]
[Equation D.8]
[Equation D.9]
[Equation D.10]
[Equation D.11]
Additive:
[Equation D.12]
[Equation D.13]
[Equation D.14]
[Equation D.15]
[Equation D.16]
Comparing equations D.11 and D.16 (see also Figure 9), we see agreement for as expected. These correspond to the intended firing rate and pairwise covariance of neurons within the pool. However, for k>2, we see the signature of the differences in the structure of the correlations. For the subtractive correlations model, the joint cumulant decays geometrically as more and more neurons are considered. In contrast, the joint cumulant remains constant for the additive correlations model.
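The contrast can also be seen numerically. The Python sketch below assumes a parametrization not restated in this appendix: the additive model adds a shared Poisson train at rate cλ to private trains at rate (1 − c)λ, and the subtractive model copies each spike of a common train at rate λ/c into each cell with probability c. Applying the conditional-expectation form described around equation D.5 (E[Xi | common count] is the common count plus a constant in the additive case, and c times the common count in the subtractive case) gives a kth joint cumulant of cλT for k ≥ 2 in the additive case and c^(k−1)λT in the subtractive case. A Monte Carlo estimate checks k ≤ 3, where the joint cumulant equals the joint central moment; the function names are hypothetical.

```python
import numpy as np


def joint_cumulant_theory(k, lam, window, c, model):
    """kth joint cumulant of k distinct cells' counts in one pool (assumed rates)."""
    mean_count = lam * window
    if model == "additive":
        # E[X | M] = M + const with M ~ Poisson(c * mean_count): the constant only affects k = 1.
        return mean_count if k == 1 else c * mean_count
    if model == "subtractive":
        # E[X | M] = c * M with M ~ Poisson(mean_count / c): c**k * mean_count / c.
        return c ** (k - 1) * mean_count
    raise ValueError(f"unknown model: {model}")


def sample_counts(model, lam, window, c, n_cells, n_samples, rng):
    """Draw spike counts of n_cells cells in one window, one row per sample."""
    if model == "additive":
        shared = rng.poisson(c * lam * window, size=(n_samples, 1))
        private = rng.poisson((1 - c) * lam * window, size=(n_samples, n_cells))
        return shared + private
    if model == "subtractive":
        common = rng.poisson(lam * window / c, size=(n_samples, 1))
        # Each common spike is kept in each cell independently with probability c.
        return rng.binomial(common, c, size=(n_samples, n_cells))
    raise ValueError(f"unknown model: {model}")


def empirical_joint_cumulant(counts, k):
    """Sample joint cumulant of the first k cells; the central-moment shortcut needs k <= 3."""
    if not 1 <= k <= 3:
        raise ValueError("this shortcut is only valid for k <= 3")
    block = counts[:, :k].astype(float)
    if k == 1:
        return block.mean()
    centered = block - block.mean(axis=0)
    return np.prod(centered, axis=1).mean()


rng = np.random.default_rng(0)
for model in ("additive", "subtractive"):
    counts = sample_counts(model, 5.0, 1.0, 0.2, 3, 400_000, rng)
    for k in (1, 2, 3):
        print(model, k, joint_cumulant_theory(k, 5.0, 1.0, 0.2, model), empirical_joint_cumulant(counts, k))
```

Under these assumptions both models share the first two cumulants (λT and cλT), while at k = 3 the additive value stays at cλT and the subtractive value drops to c²λT.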

Acknowledgments

We thank Yu Hu, Adrienne Fairhall, and Michael Shadlen for their valuable comments on the manuscript. We gratefully acknowledge the support of a Career Award at the Scientific Interface from the Burroughs Wellcome Fund and NSF grant CAREER DMS-1026125 (E.S.B.), and the University of Washington eScience Institute's Hyak computer cluster.

References

Abbott, L. F., & Dayan, P. (1999). The effect of correlated variability on the accuracy of a population code. Neural Computation, 11(1), 91–101.
Aertsen, A., Gerstein, G., Habib, M., & Palm, G. (1989). Dynamics of neuronal firing correlation: Modulation of “effective connectivity.” Journal of Neurophysiology, 61(5), 900–917.
Averbeck, B. B., Latham, P. E., & Pouget, A. (2006). Neural correlations, population coding and computation. Nature Reviews Neuroscience, 7(5), 358–366.
Bair, W., Zohary, E., & Newsome, W. T. (2001). Correlated firing in macaque visual area MT: Time scales and relationship to behavior. Journal of Neuroscience, 21(5), 1676–1697.
Beck, J. M., Ma, W. J., Kiani, R., Hanks, T., Churchland, A. K., Roitman, J., et al. (2008). Probabilistic population codes for Bayesian decision making. Neuron, 60(6), 1142–1152.
Binder, M., & Powers, R. (2001). Relationship between simulated common synaptic input and discharge synchrony in cat spinal motoneurons. Journal of Neurophysiology, 86(5), 2266–2275.
Bogacz, R., Brown, E., Moehlis, J., Holmes, P., & Cohen, J. D. (2006). The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review, 113(4), 700–765.
Brillinger, D. R. (1969). The calculation of cumulants via conditioning. Annals of the Institute of Statistical Mathematics, 21(1), 215–218.
Britten, K. H., Newsome, W. T., Shadlen, M. N., Celebrini, S., & Movshon, J. A. (1996). A relationship between behavioral choice and the visual responses of neurons in macaque MT. Visual Neuroscience, 13, 87–100.
Britten, K. H., Shadlen, M. N., Newsome, W. T., & Movshon, J. A. (1992). The analysis of visual motion: A comparison of neuronal and psychophysical performance. Journal of Neuroscience, 12(12), 4745–4765.
Britten, K. H., Shadlen, M. N., Newsome, W. T., & Movshon, J. A. (1993). Responses of neurons in macaque MT to stochastic motion signals. Visual Neuroscience, 10(6), 1157–1169.
Bruss, F. T., & Robertson, J. B. (1991). “Wald's lemma” for sums of order statistics of i.i.d. random variables. Advances in Applied Probability, 23(3), 612–623.
Cain, N., Barreiro, A., Shadlen, M., & Shea-Brown, E. (2011). A favorable tradeoff between robustness and performance in sequential decision tasks. In Computational and Systems Neuroscience Abstracts.
Cain, N., & Shea-Brown, E. (2012). Computational models of decision making: Integration, stability, and noise. Current Opinion in Neurobiology, 22, 1–7.
Cohen, M. R., & Kohn, A. (2011). Measuring and interpreting neuronal correlations. Nature Neuroscience, 14(7), 811–819.
Cohen, M. R., & Newsome, W. T. (2008). Context-dependent changes in functional circuitry in visual area MT. Neuron, 60(1), 162–173.
Cohen, M. R., & Newsome, W. T. (2009). Estimates of the contribution of single neurons to perception depend on timescale and noise correlation. Journal of Neuroscience, 29(20), 6635–6648.
De La Rocha, J., Doiron, B., Shea-Brown, E., Josić, K., & Reyes, A. (2007). Correlation between neural spike trains increases with firing rate. Nature, 448(7155), 802–806.
Field, G., & Rieke, F. (2002). Nonlinear signal transfer from mouse rods to bipolar cells and implications for visual sensitivity. Neuron, 34(5), 773–785.
Field, G., Sampath, A., & Rieke, F. (2005). Retinal processing near absolute threshold: From behavior to mechanism. Annual Review of Physiology, 67, 491–514.
Ganmor, E., Segev, R., & Schneidman, E. (2011). Sparse low-order interaction network underlies a highly correlated and learnable neural population code. Proceedings of the National Academy of Sciences, 108(23), 9679–9684.
Ghosh, B. K., & Sen, P. K. (1991). Handbook of sequential analysis. New York: M. Dekker.
Gold, J. I., & Shadlen, M. N. (2002). Banburismus and the brain: Decoding the relationship between sensory stimuli, decisions, and reward. Neuron, 36(2), 299–308.
Goldman, M., Compte, A., & Wang, X. (2009). Neural integrator models. In L. R. Squire (Ed.), Encyclopedia of neuroscience (Vol. 6, pp. 165–178). Orlando, FL: Academic Press.
Gutnisky, D. A., & Josić, K. (2010). Generation of spatiotemporally correlated spike trains and local field potentials using a multivariate autoregressive process. Journal of Neurophysiology, 103(5), 2912–2930.
Huang, X., & Lisberger, S. G. (2009). Noise correlations in cortical area MT and their potential impact on trial-by-trial variation in the direction and speed of smooth-pursuit eye movements. Journal of Neurophysiology, 101(6), 3012–3030.
Kuhn, A., Aertsen, A., & Rotter, S. (2003). Higher-order statistics of input ensembles and the response of simple model neurons. Neural Computation, 15(1), 67–101.
Latham, P. E., & Roudi, Y. (2011). Role of correlations in population coding. arXiv preprint arXiv:1109.6524 [q-bio.NC].
Lee, J., Park, C., & Kim, B. (1994). An estimation method for the excess over the boundaries in the SPRT and its applications. Sequential Analysis, 13(2), 127–144.
Mazurek, M. E., Roitman, J. D., Ditterich, J., & Shadlen, M. N. (2003). A role for neural integrators in perceptual decision making. Cerebral Cortex, 13(11), 1257–1269.
Montani, F., Ince, R. A. A., Senatore, R., Arabzadeh, E., Diamond, M. E., & Panzeri, S. (2009). The impact of high-order interactions on the rate of synchronous discharge and information transmission in somatosensory cortex. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 367(1901), 3297–3310.
Newsome, W. T., Britten, K. H., Movshon, J. A., & Shadlen, M. N. (1989). Single neurons and the perception of motion. In D. M.-K. Lam & C. D. Gilbert (Eds.), Neural mechanisms of visual perception (pp. 171–198). Woodlands, TX: Portfolio Pub. Co.
Niebur, E. (2007). Generation of synthetic spike trains with defined pairwise correlations. Neural Computation, 19(7), 1720–1738.
Purcell, B. A., Schall, J. D., Logan, G. D., & Palmeri, T. J. (2012). From salience to saccades: Multiple-alternative gated stochastic accumulator model of visual search. Journal of Neuroscience, 32(10), 3433–3446.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85(2), 59–108.
Salinas, E., & Sejnowski, T. (2001). Correlated neuronal activity and the flow of neural information. Nature Reviews Neuroscience, 2(8), 539–550.
Salzman, C. D., Murasugi, C. M., Britten, K. H., & Newsome, W. T. (1992). Microstimulation in visual area MT: Effects on direction discrimination performance. Journal of Neuroscience, 12(6), 2331–2355.
Shadlen, M. N., Britten, K. H., Newsome, W. T., & Movshon, J. A. (1996). A computational analysis of the relationship between neuronal and behavioral responses to visual motion. Journal of Neuroscience, 16(4), 1486–1510.
Shadlen, M., Hanks, T., Churchland, A., Kiani, R., & Yang, T. (2006). The speed and accuracy of a simple perceptual decision: A mathematical primer. In K. Doya, S. Ishii, A. Pouget, & R. P. N. Rao (Eds.), Bayesian brain: Probabilistic approaches to neural coding (pp. 209–237). Cambridge, MA: MIT Press.
Shadlen, M., & Newsome, W. (1998). The variable discharge of cortical neurons: Implications for connectivity, computation, and information coding. Journal of Neuroscience, 18(10), 3870–3896.
Smith, M. A., & Kohn, A. (2008). Spatial and temporal scales of neuronal correlation in primary visual cortex. Journal of Neuroscience, 28(48), 12591–12603.
Softky, W., & Koch, C. (1993). The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs. Journal of Neuroscience, 13(1), 334–350.
Staude, B., Grün, S., & Rotter, S. (2010). Higher-order correlations and cumulants. In S. Grün & S. Rotter (Eds.), Analysis of parallel spike trains (pp. 253–280). New York: Springer.
Tuckwell, H. C. (1989). Stochastic processes in the neurosciences. Philadelphia: Society for Industrial and Applied Mathematics.
Wald, A. (1944). On cumulative sums of random variables. Annals of Mathematical Statistics, 15, 283–296.
Wald, A., & Wolfowitz, J. (1948). Optimum character of the sequential probability ratio test. Annals of Mathematical Statistics, 19(3), 326–339.
Wang, X.-J. (2002). Probabilistic decision making by slow reverberation in cortical circuits. Neuron, 36(5), 955–968.
Yu, S., Yang, H., Nakahara, H., Santos, G., Nikolić, D., & Plenz, D. (2011). Higher-order interactions characterized in cortical activity. Journal of Neuroscience, 31(48), 17514–17526.
Zhang, J., & Bogacz, R. (2010). Optimal decision making on the basis of evidence represented in spike trains. Neural Computation, 22(5), 1113–1148.
Zohary, E., Shadlen, M. N., & Newsome, W. T. (1994). Correlated neuronal discharge rate and its implications for psychophysical performance. Nature, 370, 140–143.