Stimulus from the environment that guides behavior and informs decisions is encoded in the firing rates of neural populations. Neurons in the populations, however, do not spike independently: spike events are correlated from cell to cell. To what degree does this apparent redundancy have an impact on the accuracy with which decisions can be made and the computations required to optimally decide? We explore these questions for two illustrative models of correlation among cells. Each model is statistically identical at the level of pairwise correlations but differs in higher-order statistics that describe the simultaneous activity of larger cell groups. We find that the presence of correlations can diminish the performance attained by an ideal decision maker to either a small or large extent, depending on the nature of the higher-order correlations. Moreover, although this optimal performance can in some cases be obtained using the standard integration-to-bound operation, in others it requires a nonlinear computation on incoming spikes. Overall, we conclude that a given level of pairwise correlations, even when restricted to identical neural populations, may not always indicate redundancies that diminish decision-making performance.
Sensory information is often encoded in irregularly spiking neural populations. One well-studied example is given by direction-selective cells in area MT, whose firing rates depend on the degree and direction of coherent motion in the visual field (Britten, Shadlen, Newsome, & Movshon, 1992, 1993; Newsome, Britten, Movshon, & Shadlen, 1989; Salzman, Murasugi, Britten, & Newsome, 1992; Shadlen, Britten, Newsome, & Movshon, 1996). Individual neurons in MT, as in many other brain areas, exhibit noisy and variable spiking (Newsome et al., 1989) and can be modeled by Poisson point processes (Softky & Koch, 1993; Tuckwell, 1989). Moreover, this variable spiking is generally not independent from cell to cell. Returning to our example, a number of studies have measured pairwise correlations in MT during direction discrimination tasks as well as smooth-pursuit eye movements (Huang & Lisberger, 2009; Bair, Zohary, & Newsome, 2001; Zohary, Shadlen, & Newsome, 1994; Cohen & Newsome, 2008). Although this measurement is a subtle endeavor experimentally, a number of studies suggest a value near (Cohen & Kohn, 2011, summarize these observations for a number of brain areas.)
What are the consequences of correlated spike variability for the speed and accuracy of sensory decisions? The role of pairwise correlations in stimulus encoding has been the subject of many prior studies (Salinas & Sejnowski, 2001; Latham & Roudi, 2011; Averbeck, Latham, & Pouget, 2006; Abbott & Dayan, 1999). The results are rich, showing that correlations can have positive, negative, or neutral effects on encoded information. Our study serves to extend this body of work in two ways. First (as done in a different context by Ganmor, Segev, & Schneidman, 2011, and Montani et al., 2009), we contrast the effect of correlations that have the same pairwise level but a different structure at higher orders.
Second (as in Cohen & Newsome, 2008, and Beck et al., 2008), we consider the impact of correlations on decisions that unfold over time and are formed by combining a sequence of samples from the sensory populations. A classic example that we use to describe and motivate our studies is the moving dots direction discrimination task. Here, a fraction of dots in a visual display move coherently in a given direction, while the remainder exhibit random motion; the task is to identify the direction from two possible alternatives. Decisions become increasingly accurate as subjects take (or are given) longer to make the decision.
In analyzing decisions that develop over time, we use a central result from sequential analysis. This is the sequential probability ratio test (SPRT; Wald & Wolfowitz, 1948; Gold & Shadlen, 2002), which linearly sums the log odds of independent observations from a sampling distribution until a predetermined evidence threshold is reached. The SPRT is the optimal statistical test in that it gives the minimum expected number of samples for a required level of accuracy in deciding among two task alternatives.
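The accumulate-to-threshold structure of the SPRT can be sketched in a few lines. This is a minimal illustration rather than the implementation used in this study: the function name `sprt`, its arguments, and the symmetric bounds at plus and minus `threshold` are assumptions made here for concreteness.

```python
def sprt(log_odds_increments, threshold):
    """Sum log-odds increments until the evidence reaches +/- threshold.

    Returns (choice, n_samples). choice is +1 (upper bound), -1 (lower
    bound), or 0 if the samples run out before either bound is reached.
    """
    evidence = 0.0
    n_used = 0
    for w in log_odds_increments:
        n_used += 1
        evidence += w  # linear summation of the log odds of each sample
        if evidence >= threshold:
            return 1, n_used
        if evidence <= -threshold:
            return -1, n_used
    return 0, n_used
```

Raising `threshold` trades speed for accuracy: more samples are needed on average, but the bound that is reached is more likely to be the correct one.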
We pose two related questions based on the SPRT. First, how does the presence of correlated spiking in the sampled pools affect the speed and accuracy of decisions produced by the SPRT? Our focus is on how the structure of population-wide correlations determines the answer. Second, how does the presence of correlated spiking affect the computations that are necessary to perform the SPRT? This question is intriguing, because the SPRT may be performed using the simple, linear computation of integrating spikes over time and across the populations for a surprisingly broad class of inputs, including independent Poisson spike trains (Zhang & Bogacz, 2010; Bogacz, Brown, Holmes, & Cohen, 2006). Thus, in this setting, optimal decisions can be made by integrator circuits (Bogacz et al., 2006; Goldman, Compte, & Wang, 2009; Cain & Shea-Brown, 2012). Our goal here is to determine whether and when this continues to hold true for correlated neural populations.
We answer these questions for two illustrative models of correlated, Poissonian spiking. We emphasize that the spikes that these models produce are indistinguishable at the level of both single cells and pairs of cells. However, they differ in higher-order correlations in that they can be distinguished only by examining the statistics of three or more neurons. In the first model, correlations are introduced using shared spike events across the entire pool. In this case, optimal inference via the SPRT produces fast and accurate decisions but depends on a nonlinear computation. As a result, the simpler computation of spike integration requires, on average, longer times to reach the same level of accuracy. In contrast, when shared spiking events are more frequent but are common to fewer neurons within a pool, performance under the SPRT is significantly diminished. However, in this case, both SPRT and spike integration perform comparably, so a linear computation can produce decisions that are close to optimal.
2. Models of Evidence Accumulation and Encoding
2.1. Model Neural Populations and the Decision Task.
Throughout the text, we present results at C=6.4; however, the results do not depend on this particular value of dot motion or its precise relationship to firing rate.
2.2. Accumulating Spikes and Evidence Over Time.
2.3. The Case of Independent Neurons.
2.4. Correlated Neural Populations: The Additive and Subtractive Models.
We next describe two models for introducing correlations into the Poisson spike trains of each neural population. These models have identical first-order and pairwise statistics but differ in their higher-order correlations; that is, the models cannot be distinguished by examining pairs of neurons alone but require simultaneous observation of larger groups of cells. The distinction between the models is that one features common spiking events that occur across the entire population, and the other only random subsets of neurons. Both models are studied in Kuhn, Aertsen, and Rotter (2003) and Staude, Grün, and Rotter (2010) and rely on shared input from a single correlating process to generate the correlations in each pool. These authors termed the two models SIP and MIP for, respectively, single- and multiple-interaction process; here we use the descriptors additive and subtractive. In both models, a realization of correlated spike trains that provide the input to the accumulation models is achieved with a common correlating train.
Before describing the models in detail, we note that in this study, these models are statistical approaches chosen to illustrate a range of impacts that correlations can have on decision making (see also Gutnisky & Josić, 2010, and Niebur, 2007). In contrast, in neurobiological networks, correlated spiking arises through a complex interplay of many mechanisms, including recurrent connectivity and shared feedforward interactions (e.g., Aertsen, Gerstein, Habib, & Palm, 1989; Shadlen & Newsome, 1998; Smith & Kohn, 2008). While beyond the scope of this article, avenues for bridging the gap between statistical and network-based models of correlations in the context of decision making are considered in section 6.
The first case is the additive model, in which the spike train for each neuron is generated as the sum of two homogeneous Poisson point processes. This model might, for example, capture the effect of shared pulses of activity arising from common sensory or modulatory events upstream. The first Poisson train is generated with an overall firing rate of (1 − ρ)λ, where λ is the intended firing rate of the neuron and ρ is the intended pairwise spike count correlation between any two neurons in the pool. The second train, with a rate of ρλ, is added to every neuron in the pool and serves as the common source of correlations. An example of this model of spike train generation is depicted in the rastergrams in Figures 1A and 1B; the common spike events are evident as shared spikes across the entire population.
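A count-based sketch of the additive construction is given below. It assumes the standard single-interaction-process rates, namely a private train of rate (1 − ρ)λ per neuron and one shared train of rate ρλ per pool, and it works with spike counts in a window rather than with spike times. The function name `additive_counts` and its arguments are hypothetical.

```python
import numpy as np

def additive_counts(n_trials, n_neurons, rate, rho, window, rng):
    """Spike counts for one pool under the additive model.

    rate is the per-neuron firing rate (spikes/s), rho the target pairwise
    count correlation, window the counting window (s).
    Returns an integer array of shape (n_trials, n_neurons).
    """
    # Private spikes: independent across neurons and trials.
    private = rng.poisson((1.0 - rho) * rate * window,
                          size=(n_trials, n_neurons))
    # Common spikes: one draw per trial, added to every neuron in the pool.
    common = rng.poisson(rho * rate * window, size=(n_trials, 1))
    return private + common
```

Under these assumptions each neuron's count is Poisson with mean `rate * window`, and any two neurons share covariance `rho * rate * window`, so their count correlation is `rho`.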
The second case is the subtractive correlations model, in which correlated spikes are generated through random, independent deletions from an original “mother” spike train; we refer to this as the correlating spike train (Kuhn et al., 2003). Correlations with this structure might arise, for example, as a consequence of synaptic failure in connections from a common pool of upstream neurons. In order to achieve a firing rate of λ spikes per second for each neuron in the pool, with a pairwise correlation ρ between any two individual neurons, the correlating train has a rate of λ/ρ spikes per second. Then, for each neuron in the pool, a spike is included from this train i.i.d. with a probability of ρ. In our model, there is a separate correlating spike train for each of the two independent populations. An example of the subtractive model of spike train generation is depicted in the rastergrams in Figures 1D and 1E.
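The subtractive construction can be sketched in the same count-based way. The sketch assumes the standard multiple-interaction-process parameters: a mother train of rate λ/ρ from which each neuron independently keeps each spike with probability ρ. The name `subtractive_counts` and its arguments are hypothetical, and ρ must be strictly positive.

```python
import numpy as np

def subtractive_counts(n_trials, n_neurons, rate, rho, window, rng):
    """Spike counts for one pool under the subtractive model.

    rate is the per-neuron firing rate (spikes/s), rho (> 0) the target
    pairwise count correlation, window the counting window (s).
    Returns an integer array of shape (n_trials, n_neurons).
    """
    # Number of mother spikes in the window, one draw per trial.
    mother = rng.poisson(rate / rho * window, size=(n_trials, 1))
    # Each neuron keeps each mother spike independently with probability
    # rho, so its count is binomial given the mother count.
    return rng.binomial(mother, rho, size=(n_trials, n_neurons))
```

Thinning a Poisson train of rate λ/ρ with probability ρ leaves each neuron with rate λ, and the covariance between two neurons is ρ² times the mother variance, which gives a count correlation of ρ, matching the additive sketch at second order.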
In summary, the two models include correlated spike events that originate from a single mother train. Although they produce identical correlations among cell pairs, these events are distributed in different ways across the entire population. We note that the results of Zhang and Bogacz (2010) can be seen as a limiting case, as correlations vanish, of either the additive or subtractive model.
3. Subtractive Correlations and Decision-Making Performance
3.1. The SPRT Decision-Making Model.
Comparing these values against those of the independent SPRT given in equations 2.19 and 2.20, we see that the only effect of correlations is a scaling of the expected increment via . In the limit of vanishing correlations, this scale factor approaches N, which in turn reduces decision time (the scale factor is inversely proportional to DT via equation 2.16). However, as the correlation approaches 1, the scale factor itself approaches 1; this agrees with the intuition that as all neurons become perfectly redundant, the performance should resemble that of a single neuron. In fact, the mechanism of the SPRT on a given sample can be seen as inferring the firing rate of the correlating train from a derived vector of noisy random variables. As N gets large, then, performance should be limited by performing an SPRT on the correlating mother trains themselves. This is precisely what happens when in equation 3.2: we obtain corresponding to decision making based on mother spikes of rate and .
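The limits described above can be checked numerically. The expression used below is an assumption reconstructed for illustration, not a quotation of equation 3.2: it takes the scale factor to be the rate of nonzero spike vectors in a pool, (λ/ρ)(1 − (1 − ρ)^N), divided by the single-neuron rate λ. The function name `subtractive_scale` is hypothetical.

```python
def subtractive_scale(n_neurons, rho):
    """Assumed scale factor: (1 - (1 - rho)**N) / rho, with value N at rho = 0.

    Interpreted here as the effective number of independent neurons in a
    pool of n_neurons with subtractive pairwise correlation rho.
    """
    if rho == 0.0:
        return float(n_neurons)  # limiting value as rho -> 0
    return (1.0 - (1.0 - rho) ** n_neurons) / rho
```

With this form, the factor tends to N for vanishing correlation, equals 1 for perfectly redundant neurons, and saturates at 1/ρ as N grows, which is the sense in which large pools are limited by the mother train itself.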
One consequence of this interpretation is that the particular realization of a spike vector (in a sufficiently small time bin ) carries no evidence about the decision of H1 versus H0, beyond its identity as either the zero vector 0 or not. Of course, this is a consequence of the construction of the subtractive correlations model, as the spike deletions that create the realization of the spike vector have no dependence on the firing rate of the population. Concretely, then, the increments (or decrements) are based solely on whether the vector of spikes in the preferred (or null) pool contains any spikes at all; the actual number of spikes is irrelevant in the SPRT.
It follows that the accumulation process En is a discrete-space random walk, with steps . To see this, note that for sufficiently small , there are only three possibilities for how spikes will be emitted from the two populations. First, both the preferred and null pools could produce no spikes. This event provides no information to distinguish the firing rates of the pools, so the increment is 0. Second, one of the pools could produce a vector of spikes caused by i.i.d. deletions from the mother spike train. If the spiking pool is the preferred one, each possible nonzero spike vector will increment the accumulator by the log of the ratio ; the opposite sign occurs if the null pool spikes. Events in which both pools spike are of higher order in and thus become negligible for small time windows.
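The size of the step can be checked with a short calculation. The notation below is assumed for this sketch and is not fixed by the surrounding text: N neurons per pool, pairwise correlation ρ, time bin Δt, a mother train of rate λ/ρ with inclusion probability ρ, and per-neuron rates λ_1 and λ_0 for the preferred pool under H1 and H0, respectively.

```latex
% Probability of one particular nonzero spike vector x containing k spikes
% (1 <= k <= N) in a pool with per-neuron rate \lambda, to first order in \Delta t:
P(\mathbf{x} \mid \lambda) \;\approx\; \frac{\lambda}{\rho}\,\Delta t \;\rho^{k}\,(1-\rho)^{N-k}

% The deletion factor \rho^{k}(1-\rho)^{N-k} does not depend on \lambda, so it
% cancels in the likelihood ratio when the spiking pool is the preferred one:
\log \frac{P(\mathbf{x} \mid \lambda_1)}{P(\mathbf{x} \mid \lambda_0)}
  \;=\; \log \frac{\lambda_1}{\lambda_0}
```

Under these assumptions every nonzero vector from the preferred pool produces the same increment, regardless of how many spikes it contains, and the silent pool contributes a factor that tends to 1 as Δt becomes small.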
The discrete nature of the SPRT effect causes the FC curve in Figure 2A to take on only discrete values of accuracy; a small increase in the threshold above a multiple of the step size will not improve accuracy because En on the final, threshold-crossing step will overshoot the threshold. This also explains why some of the FC values at a given threshold do not lie on the theoretical line defined by equation 2.14; that equation is exactly true only in the case of zero overshoot past the threshold. We return to this point later in the main text and also in appendix C.
We next insert the values for h0 and E[W] computed above into equations 2.14 and 2.16 and plot the resulting speed-accuracy curves relating DT and FC parametrically in the threshold (see Figure 2B). (We plot the full FC and RT functions, although only discrete values of performance along each of the lines are achievable in practice, as indicated by the dots for the case; see the caption.) By comparing speed-accuracy curves for different values of the pairwise correlation ranging from 0 to 0.3, we see our first main result: introducing subtractive correlations within neural populations substantially diminishes the best possible decision performance, that is, the performance obtained via the SPRT. We next derive the analogous results for the simpler spike integration model.
3.2. The Spike Integration Decision-Making Model.
Having established this, we pause to note a subtlety in our analysis. Figures 3B and 3C show FC and DT as a function of the threshold for both simulated data and plots of equations 2.14 and 2.16. The solid lines are the graphs of those equations as written (using the values for h0 and E[W] in equations 3.4 and 3.3), and the mismatch between the lines and the data is a consequence of overshoot past the threshold. The broken line is a graph of the same formulas, with a shift in the threshold, an offset computed as the sample mean of the overshoot distribution (see Figure 6 as well as the discussion in appendix C; also Ghosh & Sen, 1991; Lee et al., 1994). This correction term helps the FC and DT equations better approximate the data when there is potential overshoot. Interestingly, however, parametric plots like Figure 3A already take this effect into account.
4. Additive Correlations and Decision-Making Performance
4.1. The SPRT Decision-Making Model.
As in the subtractive correlations model, En here also becomes a discrete random walk with increment . This can be seen by noting that for either pool, in a sufficiently small window, only one of two events is possible: no spikes occur at all, or a single spike occurs in one neuron, in one of the two pools. The first case is uninformative about either H1 or H0. The second case occurs with probability under H1 and under H0 (here, if the spike occurred in the preferred pool, for example). Taking the log ratio, we find our increment is independent of correlations. The resulting decision accuracy (FC) is plotted versus threshold in Figure 4A and is qualitatively similar to the subtractive correlations case, with plateaus following from the discrete nature of En. However, the speed-accuracy trade-off pictured in Figure 4B is very different from that found in the subtractive correlations model.
In particular, we see our third main result: the impact of additive correlations on optimal (SPRT) decision performance is relatively minor. For example, in the presence of pairwise correlations as strong as , the mean decision time required to reach a typical value of accuracy is increased by only a few milliseconds compared with the independent case, instead of by hundreds of milliseconds for subtractive correlations. Equation 4.2 offers an intuitive explanation for this fact: E[W] is inversely proportional to DT and does not diminish nearly as fast for additive correlations as for subtractive correlations (see equation 3.2).
4.2. The Spike Integration Decision-Making Model.
However, the assumption that integrated spikes do not overshoot the decision threshold might seem suspect under the additive model of correlations, as there is a possibility that the threshold-crossing step might occur as a result of every neuron in a pool spiking simultaneously. In fact, when the number of neurons in the pool is large (as in the cases we consider), additive correlations can indeed cause significant overshooting of thresholds. Importantly, and unlike for subtractive correlations, this effect cannot be compensated by a constant offset in the decision threshold.
Figure 5B demonstrates the consequences for the speed-accuracy trade-off. Here, when the spike integration model is simulated directly, we see a surprising nonmonotonic relationship between FC and DT in the presence of additive correlations of strength . This violates the usual intuition that accuracy should increase at slower decision speeds. The explanation comes from the fact that as the decision threshold is raised, DT correspondingly increases while accuracy suffers, a consequence of not finishing a trial before a (relatively rare) spike in a correlating spike train in one of the two pools causes the accumulator to jump far beyond the threshold.
For large thresholds, the sequential sampling theory of equations 2.14 and 2.16, which assumes no overshoot, accurately approximates the simulated data; however, for low values of the threshold, the approximation is poor. In fact, the inset to Figure 5B shows that in this regime, the decision-making performance of the spike integration model is far better described by the theory predicted by the SPRT. The intuition behind this observation is that for short reaction times, there is only a small probability of a shared spike that will send the integrator significantly over the threshold. This allows accumulation to occur one spike at a time (for sufficiently small time bins), where each spike arrives from an independent spike train. As we have seen, the process of integrating independent spikes is equivalent to the SPRT. It is only at longer decision times, when the chances of having integrated a large common spike event are larger, that a significant impact of correlations appears.
Figure 6 provides further evidence for this scenario. Density plots of the distribution of the overshoot (conditioned on crossing the upper threshold) for both additive and subtractive correlations are shown as a function of the decision threshold, with particular overshoot distributions plotted at and 250. For the additive correlations model, a significant fraction of the trials terminate with zero overshoot at low values of the threshold (because, for example, large correlating events are relatively rare), implying that many trials underwent optimal accumulation of evidence without experiencing a common, correlating spike event, as discussed above.
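Overshoot distributions of this kind can be explored with a small simulation of the spike integration model. The sketch below is illustrative only: the pool size, firing rates, correlation, bin width, and trial count are hypothetical defaults, not the values used for Figure 6, and the function names are assumptions. The accumulator adds the spike count of the preferred pool and subtracts that of the null pool in each bin.

```python
import numpy as np

def pool_count(model, n_neurons, rate, rho, dt, rng):
    """Total spike count of one pool in a single bin of width dt (s)."""
    if model == "additive":
        # Private spikes summed over the pool, plus pool-wide common
        # events that add one spike to every neuron at once.
        private = rng.poisson(n_neurons * (1.0 - rho) * rate * dt)
        common = rng.poisson(rho * rate * dt)
        return private + n_neurons * common
    if model == "subtractive":
        # Each mother spike is kept independently by each neuron (prob rho).
        mother = rng.poisson(rate / rho * dt)
        return rng.binomial(n_neurons * mother, rho)
    raise ValueError(f"unknown model: {model!r}")

def upper_overshoots(model, theta, n_neurons=50, rate_pref=30.0,
                     rate_null=20.0, rho=0.2, dt=1e-3, n_trials=200, seed=0):
    """Overshoot past +theta on trials that end at the upper threshold."""
    rng = np.random.default_rng(seed)
    overshoots = []
    for _ in range(n_trials):
        x = 0
        while abs(x) < theta:  # integrate until either bound is reached
            x += pool_count(model, n_neurons, rate_pref, rho, dt, rng)
            x -= pool_count(model, n_neurons, rate_null, rho, dt, rng)
        if x >= theta:
            overshoots.append(x - theta)
    return np.array(overshoots)
```

Comparing, for example, the fraction of zero-overshoot trials returned for `"additive"` and `"subtractive"` across a range of `theta` gives a rough, simulation-based counterpart to the density plots described above.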
Overall, the monotonic dependence of accuracy (FC) on decision time (DT) follows from the invariance of the moments of the overshoot distribution relative to changes in the threshold value; this is particularly true for the first moment (see appendix C). Figure 7 (Additive) demonstrates that these moments continue to fluctuate over a larger range of thresholds, and with larger magnitude, for the additive correlations model. This serves to explain the strange shape of the speed-accuracy trade-off curve pictured in Figure 5B that (unlike the subtractive correlations model) cannot be explained by a constant shift in the threshold.
5. Nonlinear Computations and Optimal Performance via the SPRT
When the neurons in each pool spike independently, Zhang and Bogacz (2010) demonstrated that linear summation of spikes across the two pools at each time step implements the SPRT. Because the SPRT is optimal in the sense of minimizing DT for a prescribed level of FC, the conclusion is that linear integration of spikes across pools, and then across time, provides an optimal decision-making strategy. However, is this optimality of linear integration confined to the case of independent activity within the pool?
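The independent case can be verified directly. The following sketch assumes notation not fixed in this passage: two pools of N independent Poisson neurons, spike counts n^P_i and n^N_i in a common time bin, and per-neuron rates λ_1 (preferred pool) and λ_0 (null pool) under H1, with the rates swapped under H0.

```latex
% The exponential and factorial terms of the Poisson probabilities are the
% same under both hypotheses (equal pool sizes), so they cancel:
\log \frac{P(\mathbf{n}^{P}, \mathbf{n}^{N} \mid H_1)}{P(\mathbf{n}^{P}, \mathbf{n}^{N} \mid H_0)}
  = \sum_{i=1}^{N} \left[ n^{P}_{i} \log\frac{\lambda_1}{\lambda_0}
      + n^{N}_{i} \log\frac{\lambda_0}{\lambda_1} \right]
  = \log\frac{\lambda_1}{\lambda_0}
      \left( \sum_{i=1}^{N} n^{P}_{i} - \sum_{i=1}^{N} n^{N}_{i} \right)
```

The log-likelihood increment is therefore proportional to the difference in total spike counts between the pools, so summing spikes with equal weight and rescaling the threshold reproduces the SPRT in this setting.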
Above, we showed that when correlations are introduced into this model, it is no longer true that each spike should be given the same “weight,” as in linear integration. Moreover, knowing the pairwise correlations and firing rates alone does not allow one to write down a rule for the function that should be applied to incoming spikes in order to implement the SPRT, although in these cases, this function takes the form of the difference between the results of a nonlinearity applied to each of the two pools. This dependence on higher-order statistics is demonstrated in Figure 7 by the fact that the nonlinearities for subtractive correlations (panel B) and additive correlations (panel C) take a significantly different form.
Correlated spiking among the neurons that encode sensory evidence appears ubiquitous. Such correlations might arise from any number of neuroanatomical features, the simplest being overlapping feedforward connectivity, which can cause collective fluctuations across a population (Binder & Powers, 2001; Shadlen & Newsome, 1998; De La Rocha, Doiron, Shea-Brown, Josić, & Reyes, 2007; Mazurek et al., 2003). They can also result from sensory events that have an impact on an entire population or from rapid modulatory effects. Moreover, for large neural populations, it appears that accurate descriptions of population-wide activity can require not only the typically measured pairwise correlations but higher-order correlations as well (Montani et al., 2009; Ganmor et al., 2011; Yu et al., 2011).
The aim of our study is to improve our understanding of how correlated activity in these populations can affect the speed and accuracy of decisions that require accumulating sensory information over time. Faced with the wide range of possible mechanisms and structures of correlations alluded to above, we chose to focus on two models for population-wide correlations that illustrate a key distinction in how correlations can occur. These models have identical first-order and pairwise statistics but differ in whether each common spiking event involves a small subset of the neurons (the subtractive case) or every neuron in the pool (the additive case) (Kuhn et al., 2003; Staude et al., 2010).
Figure 9 quantifies this difference. Based on calculations in appendix D, we plot the joint cumulant across k neurons in a pool under both subtractive and additive correlations. This statistic, computed over a subset of the neurons in a pool, provides a generalization of the notion of covariance to higher orders. In this way, it can be used to distinguish the collective activity resulting from the additive and subtractive models. While the additive model possesses a constant joint cumulant no matter how many neurons are included, the joint cumulant of k neurons falls off geometrically for the subtractive case. We conjecture that this is a statistical signature that could suggest when other, more general patterns of correlated activity—measured experimentally or arising in mechanistic models of neural circuits (Mazurek et al., 2003)—will produce similar effects on decisions. Exploring this conjecture using models and data is a target of our future research.
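The contrast between a constant and a geometrically decaying joint cumulant can be sketched numerically. The formulas below are assumptions stated for illustration rather than the expressions of appendix D: they follow from treating spike counts in a window T as compound Poisson, with a common train of rate ρλ (additive) or a mother train of rate λ/ρ thinned with probability ρ (subtractive). The function names are hypothetical.

```python
def joint_cumulant_additive(k, rate, rho, window=1.0):
    """Joint cumulant of the counts of k distinct neurons, additive model."""
    if k == 1:
        return rate * window       # first cumulant: the mean count
    return rho * rate * window     # only the shared train couples neurons


def joint_cumulant_subtractive(k, rate, rho, window=1.0):
    """Joint cumulant of the counts of k distinct neurons, subtractive model."""
    # Mother rate times the probability that all k neurons keep a spike.
    return (rate / rho) * window * rho ** k
```

Under these assumptions the two functions agree for k = 1 and k = 2, as required for matched means and pairwise covariances, and differ from k = 3 onward, where the subtractive cumulant shrinks by a factor of ρ with each additional neuron.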
We summarize our main findings as follows. For both models of correlated spiking, decisions produced by a simple, linear spike integration model (i.e., a neural integrator) become slower and less accurate as correlations increase. However, a strong difference appears for decisions made using the optimal decision strategy (SPRT). Here, additive correlations have only a minor impact on decision performance, while subtractive correlations continue to strongly diminish this performance. The conclusion is that decision-making circuits, faced with sensory populations that have subtractive correlations, will invariably produce diminished decision performance and stand little to gain by implementing computations more complex than a simple integration of spikes over time and neurons. However, in the presence of additive correlations, circuit mechanisms that implement or approximate the SPRT (perhaps by a nonlinearity such as that shown in Figure 8 applied to the sum of incoming spikes) stand to produce substantially better decision performance than their linear counterparts.
In other contexts, nonlinear computations have also been shown to improve discrimination between two alternatives. Field and Rieke (2002; also see Field, Sampath, & Rieke, 2005) demonstrated the importance of a thresholding nonlinearity in pooling the responses of rod cells, where this nonlinearity served to reject background noise. Closer to the present setting, gating inhibition that prevents accumulation of noise samples before the onset of evidence-encoding stimulus can account for visual search performance (Purcell, Schall, Logan, & Palmeri, 2012), and recent results suggest that related nonlinearities can improve performance for mistuned neural integrators (Cain, Barreiro, Shadlen, & Shea-Brown, 2011; see also Cain & Shea-Brown, 2012).
Our cases in which correlations decrease performance (in particular, when spikes are linearly integrated) are consistent with several prior studies of the role of correlated activity in decision making (Zohary et al., 1994; Britten, Newsome, Shadlen, Celebrini, & Movshon, 1996; Cohen & Newsome, 2009). We note, however, two differences in our models. The first is the mechanism through which correlated spikes are generated: whereas we use additive and subtractive models based on Poisson processes, Britten et al. (1996) and Cohen and Newsome (2009) use a multivariate gaussian description of spike counts. The second is that in these studies, decisions are rendered after a duration that is fixed before the trial begins: either a single duration (Britten et al., 1996) or one that is drawn from a distribution of reaction times (Cohen & Newsome, 2009). This is different from our setting, where incoming signals on each trial determine the reaction time through a bound crossing.
Our result, in the case of subtractive correlations, that linear integration of spikes closely approximates the optimal decision-making strategy is similar to findings of Beck et al. (2008). Specifically, they model a dense range of differently tuned populations and find that optimal Bayesian inference can be based on linear integration of inputs for a wide set of correlation models. Our additive case, however, behaves differently, as nonlinearities are needed to achieve the optimal strategy.
An aim of future work is extending the setting of our study to include orientation tuning curves as in Cohen and Newsome (2009) and Beck et al. (2008). This is more realistic for many decision tasks (including the direction discrimination task) and will also allow progress toward models with multiple decision alternatives. An important challenge will come from defining pairwise correlations that vary as a function of preferred tuning orientation (see Zohary et al., 1994, and Cohen & Newsome, 2008), while also including the full structure of correlating events across multiple cells in a realistic way. For example, in this article, additive correlating events occurred independently in the two populations; future work could take a more graded approach, in which only some events have an impact on the entire sensory population (as in an eyeblink or possibly an attentional shift during a visual task).
Moreover, cells with similar orientation preferences would be more likely to share additive common spike events (these will occur with a frequency determined by their pairwise correlations, which can be higher for cells with similar response properties: Zohary et al., 1994; Cohen & Newsome, 2008).
At least from the perspective of the simple class of decoders that arose in this article, which combines the total population output of all cells that favor either of the two task alternatives, this raises interesting complexities. Common spike events would involve different subsets of cells. Thus, correlated spiking will be more graded and could be harder to discount than for the simpler homogeneous case we study. We therefore speculate that additive correlations will have a stronger impact on decision-making performance for heterogeneous populations. However, richer decoders with nonlinearities that act separately on the output of different neurons could mediate this. The issue will require careful quantitative study before solid conclusions can be reached.
As long as each neuron remains modeled as a Poisson point process, the sequential accumulation theory used here will carry over directly. For example, models that introduce correlations by common gain fluctuations would provide a multiplicative model of joint fluctuations and may be amenable to our approach. This points to another limitation of this study and an opportunity for future work: the lack of temporal correlations in the statistics of the inputs. A model of correlations that includes spikes from a correlating train that are temporally jittered (Gutnisky & Josić, 2010; Staude et al., 2010) could provide a starting place for a model of the input trains; however, defining updates to the likelihood ratio for the two competing hypotheses will be more difficult. Nevertheless, it will be interesting to see how our results carry over; in particular, there will be many more combinations of spike events that will contribute to increments for both spike integration and SPRT decision models.
While we therefore view this study as a first step in exploring many possibilities, our findings demonstrate how the population-wide structure of correlations—beyond pairwise correlation coefficients—can have a strong impact on the speed and accuracy of decisions and the circuit operations necessary to achieve optimal performance. This suggests that multielectrode and imaging technologies, together with theoretical work on neural coding, will continue to play an exciting role in understanding the structure of basic computations like decision making over time.
Appendix A: Sequential Probability Ratio Test
A.1. Nontrivial Root of the Moment-Generating Function.
A.2. E[w], Independent Activity
A.3. E[w], Additive Correlations Model
When neurons within pools are correlated, the joint PDF of the spike count vector is no longer decomposable into the product of the marginal distributions (the critical step between equations A.8 and A.9). However, an expression for E[W] can be obtained in the limit as by repeatedly expanding via Taylor series about throughout the computation.
A.4. E[w], Subtractive Correlations Within Pools
Appendix B: Spike Integration
B.1. Independent Spiking.
B.2. Additive Correlations Model.
B.3. Subtractive Correlations Model.
Appendix C: Speed and Accuracy Functions with Overshoot
A correction term based on the mean of the overshoot distribution (that is, the distribution of the random variable defined by the excess distance over either the positive or negative threshold on the threshold-crossing step) is suggested by Lee et al. (1994). This correction is based on the Taylor expansion of the conditional expectations in equation C.1 and takes the form of a shift in the decision threshold. A correction of this form is relevant to our analysis, as the performance of two models is compared parametrically in the threshold to isolate the effects of the speed-accuracy trade-off imparted by freely adjusting the threshold.
Appendix D: Joint Cumulants for the Additive and Subtractive Models
Staude et al. (2010) suggest that cumulants provide a “natural and intuitive higher-order generalization of the covariance” for multineuron spiking. The two models of correlated activity examined here are indistinguishable when examining only first-order (i.e., mean firing rate) or second-order (i.e., pairwise correlations) statistics. Here, we derive the joint cumulants for each of these two models to clarify how the spike count distributions produced by the two models differ at higher orders.
We thank Yu Hu, Adrienne Fairhall, and Michael Shadlen for their valuable comments on the manuscript. We gratefully acknowledge the support of a Career Award at the Scientific Interface from the Burroughs Wellcome Fund and NSF grant CAREER DMS-1026125 (E.S.B.), and the University of Washington eScience Institute's Hyak computer cluster.