## Abstract

Neurons in many brain areas exhibit high trial-to-trial variability, with spike counts that are overdispersed relative to a Poisson distribution. Recent work (Goris, Movshon, & Simoncelli, 2014) has proposed to explain this variability in terms of a multiplicative interaction between a stochastic gain variable and a stimulus-dependent Poisson firing rate, which produces quadratic relationships between spike count mean and variance. Here we examine this quadratic assumption and propose a more flexible family of models that can account for a more diverse set of mean-variance relationships. Our model contains additive gaussian noise that is transformed nonlinearly to produce a Poisson spike rate. Different choices of the nonlinear function can give rise to qualitatively different mean-variance relationships, ranging from sublinear to linear to quadratic. Intriguingly, a rectified squaring nonlinearity produces a linear mean-variance function, corresponding to responses with a constant Fano factor. We describe a computationally efficient method for fitting this model to data and demonstrate that a majority of neurons in a V1 population are better described by a model with a nonquadratic relationship between mean and variance. Finally, we demonstrate a practical use of our model via an application to Bayesian adaptive stimulus selection in closed-loop neurophysiology experiments, which shows that accounting for overdispersion can lead to dramatic improvements in adaptive tuning curve estimation.

## 1 Introduction

Quantifying neural variability is crucial for understanding how neurons process and transmit information. This has motivated a large body of work that seeks to characterize the signal and noise governing neural responses to sensory stimuli. A simple but popular approach to this problem models neural spike counts using the Poisson distribution. Under this model, spike count mean and variance are equal. Deviations from this relationship are often characterized by the Fano factor, defined as the ratio of the spike count variance to the mean (Geisler & Albrecht, 1997; Eden & Kramer, 2010; Shadlen & Newsome, 1998). The Fano factor for a Poisson neuron is therefore equal to one. A Fano factor less than one indicates sub-Poisson variability, a condition referred to as underdispersion; a Fano factor greater than one indicates greater-than-Poisson variability, commonly known as overdispersion. A substantial literature has shown that Fano factors in a variety of brain areas differ substantially from one (Shadlen & Newsome, 1998; Gur, Beylin, & Snodderly, 1997; Barberini, Horwitz, & Newsome, 2001; Baddeley et al., 1997; Gershon, Wiener, Latham, & Richmond, 1998). Recent work has shown that not only do neural responses not obey a Poisson distribution; the overall mean-variance relationship is often not well characterized by a line, meaning the data are not consistent with a single Fano factor (Pillow & Scott, 2012b; Gao, Busing, Shenoy, & Cunningham, 2015; Stevenson, 2016; Goris, Movshon, & Simoncelli, 2014; Wiener & Richmond, 2003; Moshitch & Nelken, 2014). Rather, the spike count variance changes nonlinearly with the mean.

Here we develop a flexible model for overdispersed spike count data that captures a variety of different nonlinear mean-variance relationships. Our approach extends recent work from Goris et al. (2014), which described overdispersed spike responses in the early visual pathway using a Poisson model with multiplicative noise affecting the trial-to-trial spike rate. This produces a quadratic relationship between mean and variance, so the Fano factor increases linearly with mean spike count.

By contrast, our model seeks to describe a more diverse range of mean-variance relationships while maintaining tractability for simulation and fitting. Our model, which we refer to as the *flexible overdispersion model*, consists of a stimulus-dependent term, additive gaussian noise, a point nonlinearity, and conditionally Poisson spiking (see Figure 1). This framework includes the multiplicative Goris model as a special case when the nonlinearity is exponential but includes the flexibility to exhibit other behaviors as well—for example, Fano factors that decrease with increasing spike rate, which arises for a rectified-linear nonlinearity, and constant slope linear relationships, corresponding to a constant Fano factor, which arises for a rectified squaring nonlinearity. We show that the model can be tractably fit to data using the Laplace approximation to compute likelihoods and requires fewer parameters than other models for non-Poisson spike count data such as the modulated binomial or generalized count models (Gao et al., 2015; Stevenson, 2016). We apply our model to the V1 data set presented in Goris et al. (2014) and show that a rectified power law nonlinearity provides a better description of most individual neurons than a purely multiplicative model.

In addition to exploring the properties of our model, as fit to spiking data from V1 recordings, we also explore the practical use of our model. As an example, we use our flexible overdispersion model to develop an application to adaptive stimulus selection in closed-loop experiments, also known as active learning, for characterizing multidimensional tuning curves. These methods seek to minimize the amount of time required to estimate tuning curves by selecting stimuli that are as useful or informative as possible about the tuning curve. We use the flexible overdispersion model in place of the standard Poisson model and demonstrate a marked improvement for tuning curve estimation for both simulated experiments and color-tuning maps in awake fixating monkeys.

## 2 Background: Fano Factor and the modulated Poisson Model

A common approach to the phenomenon of overdispersion is to regard the Fano factor, , as a constant that characterizes the generic degree of overdispersion in neural firing (Geisler & Albrecht, 1997; Shadlen & Newsome, 1998). However, this description is adequate only if spike count variance scales linearly with the mean. Recent work has shown that spike responses in four different early visual areas exhibit variance that grows superlinearly with the mean, meaning that overdispersion (and the Fano factor) increase with the firing rate, inconsistent with a single Fano factor (Goris et al., 2014). Moreover, the Fano factor falls short of providing a complete description of neural spike count statistics, as it does not specify a full probability distribution over spike counts. Such a description is necessary for quantifying information in neural population codes. In light of these shortcomings, we feel that the Fano factor should be set aside as the default statistic for characterizing neural overdispersion; rather, researchers should consider the full curve describing how variance changes as a function of mean.

^{1}where is the distribution of a (unobserved) stochastic gain variable , which is assumed to have mean , and variance . The conditional distribution of the response given the stimulus requires marginalizing over , where denotes the Poisson distribution (see equation 2.1). When has a gamma distribution, this integral can be evaluated in closed form and results in a negative binomial distribution, a popular model for overdispersed spike counts (Pillow & Scott, 2012b; Goris et al., 2014; Linderman, Adams, & Pillow, 2016). (See appendix A for a derivation of this relationship.)

Goris et al. (2014) showed that this model provides a more accurate description of mean-variance relationships than the Poisson model for neurons from multiple visual areas (LGN, V1, V4, and MT). However, as we will show in the following, the quadratic relationship between variance and mean imposed by the Goris model does not accurately apply to all neurons.

## 3 The Flexible Overdispersion Model

The choice of nonlinearity allows the model the flexibility to produce different forms of overdispersion, that is, mean-variance curves with different shapes. When is exponential, we recover the Goris model framework with multiplicative noise, since , meaning the noise multiplicatively modulates with the stimulus-induced Poisson firing rate . However, as we will show, other choices of lead to different overdispersion curves from the quadratic mean-variance curve implied by multiplicative noise, as depicted in Figure 2.

### 3.1 Spike Count Mean and Variance

For arbitrary choices of , the spike count mean and variance have no closed-form expression and must be computed by numerical integration. However, for several simple choices for , we can evaluate these quantities analytically or approximate them closely using the delta method (see Table 1).

Nonlinearity . | Mean . | Variance . |
---|---|---|

Nonlinearity . | Mean . | Variance . |
---|---|---|

Notes: Here denotes the normal cumulative density function and , with denoting the modified Bessel function of the first kind and weighting coefficients , , , . To calculate the approximate variances for , we used the delta method (valid for small ) to obtain a general approximation: .

### 3.2 Capturing Different Forms of Overdispersion

To illustrate the model's flexibility, we consider several specific choices for , namely, exponential, and rectified power functions, for , where . Figure 2 shows the different mean-variance relationships produced by these four different choices for , at three different values of noise level . This demonstrates that simple nonlinear transforms that grow at different rates can capture important classes of behavior.

For concave nonlinearities like the rectified square root, , the spike count variance is actually larger at low spike rates than at high spike rates, and the Fano factor drops rapidly to 1 with increasing spike rate (see Figure 2, column 1). Such responses are thus highly overdispersed at low spike counts but indistinguishable from Poisson at high spike counts. A linear-rectified nonlinearity, , on the other hand, produces linear variance-mean curves with unit slope but with a positive intercept that grows with noise variance . Although they exhibit a constant degree of overdispersion relative to Poisson, such neurons have a falling Fano factor because the ratio of variance to mean is smaller at high firing rates (see Figure 2, column 2). Interestingly, a rectified squaring nonlinearity, , gives rise to linear variance-mean curves with slopes that vary as a function of noise variance . This is the only model of the four considered that generates a constant Fano factor across spike rates (see Figure 2, column 3).

For the exponential nonlinearity, , both the spike count mean and variance can be calculated analytically by exploiting the fact that has a log-normal distribution (see Table 1). As noted above, this corresponds to the multiplicative noise setting considered by Goris et al. (2014) and exhibits variance that grows quadratically as a function of mean (see Figure 2, column 4). However, we will show in section 5 that this model behaves very differently from the multiplicative model with a gamma-distributed gain variable (i.e., which produces negative-binomial distributed spike counts discussed in Goris et al., 2014). This indicates that all multiplicative noise models are not equal; differences in the distribution of the multiplicative noise variable can give rise to major differences in higher-order moments, which in turn can produce very different fits to data.

It is also worth noting that a model with latent gaussian noise and exponential nonlinearity is also the default model considered in an extensive literature on factor or latent linear dynamical system models for multivariate spike count data (Macke et al., 2011; Buesing, Macke, & Sahani, 2012; Archer, Koster, Pillow, & Macke, 2014; Rabinowitz, Goris, Cohen, & Simoncelli, 2015; Ecker, Denfield, Bethge, & Tolias, 2016; Gao, Archer, Paninski, & Cunningham, 2016; Zhao & Park, 2017). The majority of this literature, however, has focused on capturing covariance across time or across neurons and has devoted less attention to the issue of how count variance grows with mean for single neurons.

Finally, we note that despite and often requiring numerical integration to obtain, we can guarantee certain properties on the parametric curve, as parameterized by . First, we can guarantee that both and are smooth functions of (although the derivatives may be discontinuous). Second, so long as the function is positive and monotonically increasing, is also a monotonically increasing function of . This indicates that can truly be considered a function of since each value of maps onto only one value of . No similar guarantee could be made for , however, indicating that local minima can exist in the mean-variance relation. Local minima occur when the function has regions that are concave and have small derivatives, as occurs for (see Figure 2). In fact, if were constructed to have multiple concave regions with small derivatives, multiple local minima and maxima could be present in the mean-variance relationship. Finally, we note that since , we always have , indicating that while the amount of overdispersion may vary significantly, our model always produces greater-than-Poisson spike count variance.

## 4 Model Fitting

In order to apply the flexible overdispersion model to neural data, we need a tractable method for fitting the model parameters. Consider a data set composed of a set of stimuli and the measured spike counts of a single neuron where denotes the th response of the neuron to stimulus , for . The model parameters consist of a stimulus-dependent term for each stimulus, denoted , the noise variance , and (optionally) the exponent governing the soft-rectification power nonlinearity. (We consider the variable to stand for any parameters governing the nonlinearity, although for the exponential nonlinearity, there are no additional parameters to fit.)

To demonstrate that this method yields an accurate approximation to the log likelihood, we compare the true computed log likelihood (via numerical integration) with the Laplace approximation. Figure 3 depicts the log-likelihood landscapes for different spike counts () for both the numerically integrated log likelihood and the Laplace approximation. The ratio between numerically computed and Laplace-based log likelihoods (right-most column) demonstrates that errors are greatest at small spike counts (where the Poisson likelihood deviates most severely from the gaussian), but is negligible even in this regime for the exponential nonlinearity.

We can use the above methodology to address the particular case where is the soft rectification raised to a power () or an exponential function . Full details about evaluating and maximizing the log-likelihood using the Laplace approximation are provided in appendix B.

## 5 Results: V1 Spike Counts

To test the ability of the flexible overdispersion model to capture the variability of real neurons, we fit to data collected from macaque primary visual cortex V1 (Graf, Kohn, Jazayeri, & Movshon, 2011), one of the data sets considered in Goris et al. (2014). This data set contained five recording sessions with multielectrode array recordings of neural population responses to 72 distinct visual grating orientations. A total of 654 neurons were recorded, and each orientation was presented 50 times. Trials consisted of drifting grating stimulus presentations for 1280 ms, with a 1280 ms rest period in between. From this data set, we chose the 112 well-tuned neurons for model fitting. Well-tuned neurons were determined as in previous studies by applying a threshold to the difference between the minimum and maximum mean firing rate over all stimuli orientations (Graf et al., 2011).

We fit the model parameters using the Laplace-approximation-based method described in section 4 for both the soft-rectification-power function (soft-rect-p; see equation 3.7) and the exponential nonlinearity. For comparison, we also fit the data with the negative-binomial (NB) model from Goris et al. (2014). All model fits had 72 parameters to describe the stimulus drive for each orientation and a noise variance parameter , and one extra parameter *p* for the model with the soft-rect-p nonlinearity. Example mean-variance curves (as well as example draws from the model with the inferred parameters) are plotted against the true data in Figure 4. While for some neurons the negative binomial model and our model resulted in near-identical mean-variance curves, for other neurons, the flexibility of the soft-rectification function produced better qualitative fits. The fact that our model is a log-likelihood fit means that the justification for the mean-variance curves might not be completely apparent from the means and variances of the data. This quality is apparent in neurons 3 and 4 in Figure 4, where the data plotted as mean-variance plots seem ambiguous as to what model parameters might be appropriate, but the mean Fano factor plots below indicate that the model should capture the trend of the data to have lower Fano factors at higher firing rates, a property that our model captures. Figure 5 depicts the range of mean-variance relationships inferred for a sampling of neurons in the data set. The range of curves observed for even this single data set implies that a single mean-variance relationship is insufficient to capture the entire range of neural variability. Figure 5 further emphasizes this point by showing histograms of the inferred parameters (in the case of soft rectification, this includes the latent noise standard deviation and the soft-rectification power ).

We quantified the performance of different models by comparing the Akaike information criterion (AIC) across all three fits (power soft rectification, exponential, and negative binomial). The calculated AIC values, displayed as histograms in Figure 6, demonstrate that the softrect-p model provided the best fit for the majority of neurons, achieving the highest AIC values for 83.4% of the 112 neurons analyzed. This result indicates that despite the AIC penalization for requiring an additional parameter, the softrect-p fit best modeled approximately 83% of the tuned V1 neurons in the Graf data set. Interestingly, we also note that the exponential nonlinearity model, a model commonly used as an inverse link function in machine learning and neuroscience (Nelder & Baker, 1972), was a worse model for of the neurons according to AIC, even having performed worse than the NB model (as shown in Figure 6). This result may seem contrary to the fact that the model with exponential nonlinearity belongs to the same class of multiplicative-noise models as the negative binomial and thus also exhibits a quadratic mean-variance curve. Figure 4 demonstrates this behavior, showing that data fits to both quadratic models can still differ significantly in their curvature. The implication here is that while the full mean-variance curve may provide basic overdispersion properties, it is insufficient to describe a neural spiking model. Specifically, model fitting with a full probabilistic model implies assumptions on higher-order moments that will affect the mean-variance curve fit.

To emphasize the potential insufficiency of the mean-variance curve as a model summary, Figure 7 shows an example case where the exponential nonlinearity yielded a very different mean-variance curve than the negative binomial model, despite the fact that both are multiplicative noise models defined by a stochastic gain variable with a mean of 1. Specifically, for this neuron, both the exponential and NB models seem to have significantly overestimated the variance as a function of the mean. The exponential fit overestimation was so extreme that the curve for the exponential fit is barely visible in the mean-variance plot. After ruling out potential numerical or approximation error with the Laplace approximation technique, we observed that while the mean-variance curves did not match the data, the full count distributions did seem to match the data. The upper-right and upper-left plots show that overall, both models fit the data distributions for both high and low mean spike counts. Plotting the spike count distributions on a log scale, however, shows that despite matching the data well at low firing rates, both models have higher tails than what would be expected. These heavy tails contributed directly to the overfitting of the spike count variance. In fact, the exponential nonlinearity's spike count distribution has a much heavier tail than even the NB model, correlating with how the exponential nonlinearity model nonlinearity modeled the variance to a higher degree. To understand why the erroneous fits were obtained, we observe that up to a certain value of spike counts, the tails of the data distribution actually look heavy-tailed. In fact the log distributions shown in the lower-left and lower-right panels of Figure 7 do follow the data. Thus, heavy tails stemmed from matching the data in the maximum likelihood (ML) estimation. Unlike the data, however, our models are unbounded, indicating that heavy tails could have an impact on the model variance calculations in ways that would not affect the data fits. This analysis demonstrates that attempting to fit the higher-order moment of the model (as ML-type methods do) may be detrimental to accurately fitting the mean-variance curve.

## 6 Application: Adaptive Closed-Loop Experiments

While improved models of neural spiking processes are important from a basic scientific understanding standpoint, these models also have applications as replacing the Poisson model as a front-end in data analysis and efficient recording methods. To illustrate one of the potential uses of the flexible overdispersion model, we developed an application to adaptive stimulus selection for closed-loop neurophysiology experiments. Such methods, known as *active learning* in machine learning and *adaptive optimal experimental design* in statistics, seek to select stimuli based on the stimuli and responses observed so far during an experiment in order to characterize the neuron as quickly and efficiently as possible (Lindley, 1956; Bernardo, 1979; MacKay, 1992; Chaloner & Verdinelli, 1995; Cohn, Ghahramani, & Jordan, 1996; Paninski, 2005). Adaptive stimulus selection is particularly useful in settings where data are limited or expensive to collect and can substantially reduce the number of trials needed for fitting an accurate model of neural responses (Paninski, Pillow, & Lewi, 2007; Benda, Gollisch, Machens, & Herz, 2007; Lewi, Butera, & Paninski, 2009; DiMattina & Zhang, 2011, 2013; Bölinger & Gollisch, 2012; Park & Pillow, 2012; Park, Weller, Horwitz, & Pillow, 2014; Pillow & Park, 2016).

Here we introduce a method for adaptive stimulus selection for estimating a neuron's multidimensional tuning curve, or firing rate map, under the flexible overdispersion model. Our method involves computing the posterior distribution over the tuning curve given the previously observed responses in an experiment and selects the stimulus for which the value of the tuning curve has maximal posterior variance. We illustrate the performance gain using this adaptive learning method for estimating color tuning curves of V1 neurons recorded in awake, fixating monkeys (Park et al., 2014). For simplicity, we use the exponential nonlinearity , which has the advantage of having analytical expressions for the mean and variance of the spike count given an input level and noise variance .

### 6.1 GP Tuning Curve Model with Flexible Overdispersion

Building on prior work (Rad & Paninski, 2010; Park et al., 2014; Pillow & Park, 2016), we model tuning curves by placing a GP prior over the input function . Here can be considered a log of the tuning curve plus a constant, since the tuning curve (expected spike count given a stimulus) for the exponential nonlinearity is (see Table 1).

The necessary steps in our stimulus selection algorithm include include updating the posterior over after each trial, updating hyperparameters using the Laplace approximation-based marginal likelihood, and selecting the stimulus for which the tuning curve has maximal posterior variance.

### 6.2 Inference and Laplace Approximation

The tuning curve inference problem here is similar to that of fitting the flexible overdispersion model (see section 4), except that we have added a prior over the values of that encourages smoothness and shrinks them toward zero. After each trial, we maximize marginal likelihood under the Laplace approximation to update the hyperparameters governing the GP prior's scale and smoothness, as well as the noise variance .

Let denote the data set after trials, which is simply a paired list of stimuli and spike counts collected so far in the experiment. For the purposes of our derivation, let denote the stimulus input values for the observed stimulus set , and let and be the unobserved gaussian input noise and observed spike counts on each trial, respectively.

*total-data posterior*, after each trial: where is the GP-induced gaussian prior over the function values (see equation 6.1) with covariance matrix , and is the distribution of the (unobserved) overdispersion noise for each trial. The full set of hyperparameters is denoted .

### 6.3 Hyperparameter Optimization

Figure 8 shows the consequences of overdispersed spike responses on the estimation of tuning curves and their hyperparameters. We generated a 2D firing rate map as a mixture of gaussian bumps (left), and simulated spike counts from the flexible overdispersion model (with exponential nonlinearity) at a random collection of 2D locations. We then performed inference using either the Poisson model (center) or the flexible overdispersion model (right). We found that the Poisson estimate suffered from systematic underestimation of the GP length scale, resulting in an estimate that is significantly rougher than the true map. In essence, the Poisson model attributes super-Poisson variability of the responses as reflecting structure in the firing rate map itself rather than noise, thus necessitating a shorter length scale .

### 6.4 Adaptive Stimulus Selection Method

To adaptively select stimuli during an experiment, we search for the stimulus for which the posterior variance of the tuning curve is maximal. This approach, commonly known as uncertainty sampling, is motivated by the idea that we would like to minimize posterior uncertainty about the tuning curve at all points within the stimulus range.

### 6.5 Results: Adaptive Stimulus Selection for Color Tuning Curves

We applied our method to the problem of estimating color tuning functions for V1 neurons recorded in awake, fixating monkeys under two paradigms. In the first paradigm, Gabor patches (gaussian windowed sinusoidal gratings) were drifted across the receptive field. The orientation, spatial frequency, and direction of drift in each Gabor patch were tailored to each neuron. In the second paradigm, gaussian-smoothed rectangular bars drifted across the receptive field. The length and width of the bar, as well as its direction of drift, were tailored to each cell. Figure 9 shows a schematic illustration of the stimulus space, the raw experimental data, and the adaptive stimulus stimulus selection protocol.

In interleaved trials, individual V1 neurons were tested with stimuli that were chosen either adaptively (using our method; see Figure 9C) or nonadaptively. In nonadaptive trials, stimulus selection was independent of responses. In adaptive trials, stimulus selection was based on the posterior variance of the firing rate map in the overdispersed GP-Poisson model. Results showed that the adaptive method yielded faster convergence and more accurate firing map estimates than the nonadaptive method. As shown in Figure 10, the estimates of the Poisson model are much coarser than would be expected. As a result, the flexible over-dispersion model achieved on average a four-times-higher likelihood on test data compared to the Poisson model.

In the closed-loop design, we selected the stimulus that had the highest posterior variance of firing rate map in each trial. In the open-loop design, we sampled the stimuli uniformly. In Figure 11, we show the estimated tuning functions from the drifting bar paradigm using all data collected from adaptive and non-adaptive designs, as well as estimates using a quarter of the data collected from each design. The estimates obtained using the closed-loop design look more similar to the estimates obtained using all the data. Finally, we computed the average prediction error using data collected from each design. The closed-loop method achieved substantially lower prediction error and faster convergence than the open-loop method.

## 7 Discussion

We have presented a flexible model for super-Poisson spike count responses and demonstrated both its properties when fit to V1 spiking data and its practical use by applying the new model to closed-loop neurophysiology experiments. Our model, which introduces extra-Poisson variability via a gaussian noise term added to the stimulus response level and passed through a nonlinearity, provides flexibility in accounting for neural overdispersion and permits efficient model fitting. We find that this model better fits individual mean-variance relationships for neurons in V1. Furthermore, we find that when used as a likelihood in lieu of the Poisson likelihood, our model can significantly improve closed-loop estimation of color tuning curves in V1.

### 7.1 Relationship to Previous Work

A wide variety of other approaches to overdispersed spike count distributions have been proposed in the recent literature. These include mixtures of Poisson distributions (Wiener & Richmond, 2003; Shidara, Mizuhiki, & Richmond, 2005) and the Twiddle distribution, which is characterized by a polynomial mean-variance relationship for some and (Moshitch & Nelken, 2014; Koyama, 2015; Gershon et al., 1998). Although these models offer substantial improvements over models that specify a single Fano factor, they are still restricted in the forms of overdispersion they can capture, and in some cases they are more difficult to connect to mechanistic interpretations (i.e., how such counts might be generated). In contrast to these approaches, recent work on neural partitioning and negative binomial spike counts (Onken, Grünewälder, Munk, & Obermayer, 2009; Goris et al., 2014; Pillow & Scott, 2012a) instead seeks more descriptive, interpretable models of overdispersion via hierarchical (or doubly stochastic) modeling.

Other methods to quantify neural overdispersion rely on describing more general count distributions, such as the Bernoulli or Conway-Maxwell-Poisson (COM-Poisson) distributions (Zhu, Sellers, Morris, & Shmueli, 2017; Sellers, Borle, & Shmueli, 2012), rather than layering on top of the Poisson distribution (Gao et al., 2015; Stevenson, 2016). These models may be limited by requiring many additional parameters (Gao et al., 2015) or may require that the mean-variance curves be monotonically increasing (Stevenson, 2016), reducing the model's ability to explain neurons that exhibit higher variance at lower means. While completely replacing the Poisson distribution is satisfying in that the mean-variance relationship is no longer restricted at any stage by Poisson behavior, we feel that layering additional variability inside the Poisson rate is more intuitive (Goris et al., 2014). Specifically, we find that such hierarchical models can isolate variables responsible for overdispersion and allow mechanistic theories to attempt to explain these factors.

Our model can be considered as a significant generalization of the mixture-of-Poisson models (Wiener & Richmond, 2003; Shidara et al., 2005), as our model is essentially an infinite mixture of Poissons, as well as a generalization of prior work using latent hierarchical dispersion variables (Goris et al., 2014). In terms of the power law models of (Moshitch & Nelken, 2014; Koyama, 2015), our model also includes as a subset any power law where the exponent is between one and two (i.e., for ). Although our model is incapable of achieving or , the achievable range of includes experimentally observed values in the literature, for example, (Shadlen & Newsome, 1998) or for cat auditory cortex (Moshitch & Nelken, 2014). Interestingly, however, our analysis demonstrates that power-law-type characterizations might still be insufficient for assessing neural overdispersion. In particular, our model exploration indicates that a full discussion of the firing statistics should also include the full count distributions (see Figure 7). Our results demonstrated that our model more accurately captures the variation in V1 neural responses, as compared to both traditional and more recent Poisson-based models.

### 7.2 Interpretation and Future Directions

The origins of neural response variability are still a subject of debate (Renart & Machens, 2014; Masquelier, 2013), and we admit that our model does not attempt to explain how overdispersion arises. Our model only implies that the unknown nuisance variable is well modeled by an addition of stimulus-dependent drive and gaussian noise, followed by a nonlinearity and Poisson spike generation. In terms of the nonlinearity used, we found that the single-parameter power soft-rectification function was flexible enough to account for a large range of neural behavior, despite its reliance on a single parameter. The inference cost for the flexible overdispersion model therefore remains modest and is, moreover, justified by the substantial gain in model expressibility (e.g., as shown by the AIC results in Figure 6). Other nonlinearities are certainly possible in this framework, and the general properties we describe in section 3 guarantee that aside from perhaps requiring numerical integration, the model will be well behaved (i.e., the variance-mean curve will remain a proper function). For example, we found that the exponential nonlinearity allowed for more computationally efficient calculations for the closed-loop inference procedure.

While we focus here on explaining super-Poisson behavior of neural firing, we note that our model does not attempt to account for the *under*dispersion sometimes observed in certain neural circuits, such as retina or auditory cortex, that exhibit higher degrees of response regularity (DeWeese & Zador, 2002; Kara, Reinagel, & Reid, 2000; Gur et al., 1997; Maimon & Assad, 2009). Comparable flexible models for underdispersed spike counts therefore pose an important open problem for future work. In terms of neural populations, our model focuses only on explaining the spiking process for a single neuron. The increasing number of simultaneously recorded neurons thus poses the future challenge of explaining correlations in overdispersion between neurons (e.g., Goris et al., 2014). Such population overdispersion models would carry through the benefits of more accurate spike count models that we observe here for single-neuron closed-loop experiments to cases where entire populations are analyzed simultaneously. Finally, we note that we have modeled spike counts using a single time bin size, corresponding roughly to the window during which each stimulus was presented. We have therefore not addressed how spike count variability varies with time binning, which may be important for understanding population coding of information on different timescales. In preliminary work, we have found that fits to spike counts in smaller time bins yield different parameters for the model in Goris et al. (2014) and flexible overdispersion model than those we have shown here (Charles & Pillow, 2017). Therefore, an important future direction for future research will be to extend current overdispersion models to provide a single, consistent account for spike count variability in arbitrary time bin sizes.

## Appendix A: Modulated Poisson and the NB model

Here we review the connection between the modulated Poisson model introduced by Goris et al. (2014) and the negative binomial model. The modulated Poisson model was introduced generally as having a Poisson rate modulated by a stochastic gain with a distribution satisfying and . The general mean-variance relationship of the marginal spike counts under this model was shown to always satisfy a quadratic relationship (i.e., . However, fitting the model parameters to data required additional assumptions about the distribution of . In particular, Goris et al. used a gamma distribution, which makes the integral over analytically tractable, producing a negative binomial distribution over spike counts.

## Appendix B: Parameter Optimization Using Laplace Approximation

The general method we present for estimating the parameters of the nonlinearity that modulates neural spiking is outlined in algorithm 1. For a particular nonlinearity (i.e., the soft rectification raised to a power in equation 3.7, these steps can be written more explicitly. Here we fit data using both the exponential nonlinearity, for which the fitting algorithm is provided in algorithm 3, and the power-soft-rectification nonlinearity, corresponding to the model fitting in algorithm 2). Note that the estimation of and the gradient descent steps for and can be accomplished analytically; however, the estimation of would typically need to be calculated numerically.

## Acknowledgments

We are grateful to Arnulf Graf and J. Anthony Movshon for V1 spike count data used for fitting and validating the flexible overdispersion model. A.S.C. was supported by the NIH NRSA Training Grant in Quantitative Neuroscience (T32MH065214). M.P. was supported by the Gatsby Charitable Foundation. J.W.P. and G.H. were supported by NIH grant EY018849. J.W.P. was supported by grants from the McKnight Foundation, Simons Collaboration on the Global Brain (SCGB AWD1004351), the NSF CAREER Award (IIS-1150186), and an NIMH grant (MH099611).

## References

## Notes

^{1}

Note that in contrast to Goris et al. (2014), we define in units of spikes per bin, removing the bin size from the ensuing equations.

## Author notes

A.S.C. and M.P. share first authorship.

eLife