## Abstract

A central theme in computational neuroscience is determining the neural correlates of efficient and accurate coding of sensory signals. Diversity, or heterogeneity, of intrinsic neural attributes is known to exist in many brain areas and is thought to significantly affect neural coding. Recent theoretical and experimental work has argued that in uncoupled networks, coding is most accurate at intermediate levels of heterogeneity. Here we consider this question with data from in vivo recordings of neurons in the electrosensory system of weakly electric fish subject to the same realization of noisy stimuli; we use a generalized linear model (GLM) to assess the accuracy of (Bayesian) decoding of the stimulus given a population spiking response. The long recordings enable us to consider many uncoupled networks and a relatively wide range of heterogeneity, as well as many instances of the stimuli, thus enabling us to address this question with statistical power. The GLM decoding is performed on a single long time series of data to mimic realistic conditions, rather than using trial-averaged data for better model fits. For a variety of fixed network sizes, we generally find that the optimal levels of heterogeneity are at intermediate values, and this holds for all core components of the GLM. These results are robust to several measures of decoding performance, including the absolute value of the error, the error weighted by the uncertainty of the estimated stimulus, and the correlation between the actual and estimated stimulus. Although a quadratic fit to decoding performance as a function of heterogeneity is statistically significant, the result is highly variable with low $R^2$ values. Taken together, intermediate levels of neural heterogeneity are indeed a prominent attribute for efficient coding even within a single time series, but the performance is highly variable.

## 1  Introduction

Animals are able to robustly detect important sensory stimuli, using neurons as the central processing units. Thus, sensory neurons, even at initial stages of processing, must accurately encode noisy stimuli to faithfully propagate sensory information to subsequent brain regions not directly exposed to the stimuli. The coordinated activity of populations of neurons can successfully encode stimuli despite the fact that individual neurons respond noisily and nonlinearly to inputs (Dayan & Abbott, 2001). Intrinsic cellular diversity has been proposed as a crucial attribute for efficient coding in many sensory systems, such as olfactory (Padmanabhan & Urban, 2010), auditory (Ahn et al., 2014), electrosensory (Marsat & Maler, 2010), and visual (Chelaru & Dragoi, 2008). Going beyond the question of whether intrinsic diversity matters at all, Tripathy, Padmanabhan, Gerkin, and Urban (2013) showed that an intermediate level of diversity could enhance population coding in uncoupled mitral cells in the olfactory bulb. Moreover, Mejias and Longtin (2012) theoretically analyzed how this optimal level of heterogeneity enhances neural coding in uncoupled populations of spiking neurons. However, despite these influential works, we still do not know how well networks can optimally decode stimuli at intermediate heterogeneity without using trial averaging in experimental data and with stochastic models. Also, a more detailed statistical analysis of how neural heterogeneity is related to accurate or optimal decoding of noisy input signals would provide deeper insights into this important question.

Here we address the relevance of diversity in populations of sensory neurons for decoding noisy input stimuli. We fit generalized linear models (GLMs) (Pillow, 2007; Pillow et al., 2008; Pillow, Ahmadian, & Paninski, 2011) to in vivo electrophysiological recordings from the weakly electric fish Apteronotus leptorhynchus, all from the same animal. GLMs are commonly used in many systems and have many desired features, including the existence of an optimal fit to the encoding model and Bayesian decoding via the maximum a posteriori (MAP) estimate of the stimulus given the population spiking response (Pillow, Ahmadian, & Paninski, 2011). The rich data set we use consists of long time series of spiking responses for each of the seven recorded cells, all subject to the same realization of random stimulus input. A network consists of a random selection of neurons; in addition, these data enable a given neuron to have a variety of encoding models fit to different segments of time, resulting in a wide range of heterogeneous networks. In a given time interval, the task of decoding the same random input with a network model fit to a random prior time mimics realistic conditions, as opposed to decoding the trial- or time-averaged statistics of inputs (Tripathy et al., 2013) or using trial- or time-averaged experimental data to fit a statistical model well. In considering decoding of stimuli over physiologically relevant time periods of several hundred milliseconds, we have many time intervals of the noisy input with which to assess decoding. We are therefore able to estimate statistical measures of model fits with sufficient statistical power to assess whether the diversity of the components of the GLM is significant for accurate decoding.

We find that intermediate levels of heterogeneity are indeed a signature of accurate decoding of stimuli, and this result holds for multiple metrics of decoding performance. Specifically, we show that decoding performance measured by the $l_1$-norm of the error has a minimum at an interior value when performing quadratic regression of error against intrinsic heterogeneity. When we incorporate Bayesian uncertainty by weighting each time bin inversely with the posterior variance of the MAP estimate, we see similar results, but the quality of fit of the quadratic regression is worse than for the $l_1$-norm. A notable addition in our analysis is the use of $R^2$ as a quality-of-fit statistic to assess the strength of the quadratic regression beyond its statistical significance via $p$-values.

In addition to these two error metrics, we consider the Pearson's correlation between the actual and decoded stimulus to assess how well the two entities covary (see Ostojic & Brunel, 2011 and René, Longtin, & Macke, 2020, who used a similar measure of error, among other measures). We find again that correlations are maximal at intermediate levels of heterogeneity. In almost all cases with these three metrics for decoding performance, we find that the quadratic regression fit is statistically significant but noisy with small $R^2$ values. We also observe that decoding performance is enhanced as the network size increases in terms of both reduced error and increased correlation strength, consistent with theories of population coding (Shamir & Sompolinsky, 2006; Yarrow, Challis, & Seriès, 2012). Although decoding performance is variable for a particular level of heterogeneity, our detailed study shows that intermediate levels of heterogeneity in all of the core attributes of the GLM can result in optimal decoding of stimuli and that this trend holds for all of the network sizes we considered.

## 2  Results

The data set we use consists of long time series ($\sim$14 s) of spiking responses from seven cells subject to random stimulus inputs (see section 3 for more on population size). We utilize GLMs to model each neuron's encoding of the stimulus $x_t$ into Poisson-distributed binary responses. Figure 1A is a schematic of the model, which includes a filter applied to the stimulus $(k)$, a filter for the neuron's own recent spike history $(h)$, and a constant bias shift $(b)$. The filtered stimulus and filtered spike history are then connected with the spiking response at time $t$ by a static nonlinearity (or log link). Examples of the components of the GLM for the seven cells are shown in Figure 1B, where the GLM is fit using maximum likelihood. In this example, all seven cells are fit to the exact same time period of 200 ms (80 time bins of 2.5 ms), so they all have the same instance of stimulus input. Note how the two filters $(k,h)$ and the shift $(b)$ vary across the seven neurons under the same conditions, indicative of intrinsic heterogeneity in the electrosensory lateral line lobe (ELL) cells. A GLM for each cell will also vary depending on the time period used to fit, even within the same neuron, because of its dynamic response.
Figure 1:

Generalized linear model (GLM) fit to experimental data. (A) Schematic of the model. The input stimulus $x_t$ is filtered (convolved) with $k$, shifted by a fixed amount $b$, and any preceding spikes contribute through the postspike filter $h$ before input to a static exponential nonlinear function (see equation 2.1). (B) Maximum likelihood fits of the GLM to the seven recorded cells, using the same 200 ms of data (80 time bins of 2.5 ms each), where all seven cells receive the exact same stimulus input. The result is seven GLM instances consisting of $(b,k,h)$. The top bar chart shows the (bias) constant $b$, followed by the stimulus filter (middle) and postspike filter (bottom), with time bins of size 2.5 ms. (C) GLM results on out-of-sample data from the time interval immediately following the period used to fit (in-sample fits from B). In the absence of trial-averaged experimental data, we consider the cumulative spikes from both the GLM, $\int_0^t \lambda(t')\,dt'$ (solid curves; theoretically the expected value of the cumulative spike count), and the experimental data (dots), color-coded by cell number as in (B).

The GLM models a neuron's response as an instantaneous firing rate $\lambda(t)$, which is the relative intensity of neural firing, dependent on filters $k$ and $h$, stimulus $x_t$, and spike history $r_t$:
$\lambda(t)=\exp\big(b+x_t\cdot k+r_t\cdot h\big).$
(2.1)
The model prediction of the number of spikes over a larger time interval $(0,T)$ is the integral of the instantaneous firing rate, $\mu=\int_0^T \lambda(t)\,dt$, a quantity we use to assess the encoding models in the absence of trial-averaged data that would likely yield more precise fits. In Figure 1C, we compare the cumulative predicted spikes through time against the actual cumulative spikes from the experiments in the immediate time period (400 ms) following the time segment used to fit the GLMs (again, 200 ms with 80 bins of 2.5 ms). Here we use a shorter encoding time period than decoding as an extreme demonstration, because the in-sample length is usually much larger than the out-of-sample length (Pillow et al., 2008, 2011). The results vary in accuracy but can be surprisingly good (i.e., neurons 3, 4, 5, 6 for these time periods), despite the stochastic nature of neural firing in real data. Trial-averaged fitting and prediction are commonly used to demonstrate how a neuron on average encodes the stimulus into a response, but these results indicate that single-trial predictions can still be effective while also incorporating another aspect of biological realism.
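As a concrete illustration of equation 2.1, the conditional intensity can be evaluated on binned data roughly as follows. This is a minimal Python sketch; the function name `glm_intensity` and the convention that filters are stored most-recent-bin-first are our assumptions, not details from the original analysis.

```python
import numpy as np

def glm_intensity(b, k, h, x, spikes):
    """Conditional intensity lambda(t) of equation 2.1 on a binned grid.

    b      : scalar bias
    k      : stimulus filter, most recent bin first
    h      : postspike filter, most recent bin first
    x      : binned stimulus, shape (T,)
    spikes : binary spike train, shape (T,)
    """
    T = len(x)
    lam = np.empty(T)
    for t in range(T):
        # filtered stimulus: k dotted with the most recent stimulus bins
        xs = x[max(0, t - len(k) + 1): t + 1][::-1]
        drive = b + np.dot(k[:len(xs)], xs)
        # postspike filter applied to strictly preceding spikes
        rs = spikes[max(0, t - len(h)): t][::-1]
        drive = drive + np.dot(h[:len(rs)], rs)
        lam[t] = np.exp(drive)  # static exponential nonlinearity
    return lam
```

With both filters set to zero, this reduces to a constant rate $e^b$, which serves as a quick sanity check.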
The encoding models, even fit to this short time period, are effective in capturing the spiking behavior of the different neurons, so we employ them to reconstruct stimuli from the population spiking in the experimental data. The decoding of the stimulus is modeled by Bayes' rule, with $p(\cdot)$ denoting the relevant likelihoods (consistent with Pillow et al., 2011):
$p(x_t|r_t)=\frac{p(r_t|x_t)\,p(x_t)}{p(r_t)}.$
(2.2)
The Bayesian prior $p(x_t)$ is determined by the characteristics of the stimulus in a training set. The stimulus $x_t$ is approximately normally distributed (see Figure 2A). It also has an autocorrelation structure at 2.5 ms time bins (see Figure A1A for raw auto- and partial-correlation functions with 0.5 ms time bins). We use a multivariate normal distribution for the prior likelihood $p(x_t)$ of the stimulus time series, incorporating the same mean and autocorrelation structure as the training stimulus (see section 4 for further details). To generate the conditional spike likelihood $p(r_t|x_t)$, we use the GLM conditioned on the stimulus and postspike filters, $p(r_t|x_t,k,h)$. Meanwhile, individual spike counts follow a Poisson distribution, $\lambda^r e^{-\lambda}/r!$, so the likelihood of the spike train is a Poisson process by construction.
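The construction of a multivariate normal prior from the training stimulus can be sketched as below. This assumes a stationary (Toeplitz) covariance built from the empirical autocovariance; the helper name `stimulus_prior` is ours, and the paper's actual construction (section 4) may differ in detail.

```python
import numpy as np
from scipy.linalg import toeplitz

def stimulus_prior(train_x, T):
    """Gaussian prior for a length-T stimulus segment: the mean and a
    stationary (Toeplitz) covariance built from the training stimulus'
    empirical autocovariance, mirroring the prior p(x_t) described above."""
    train_x = np.asarray(train_x, dtype=float)
    mu = train_x.mean()
    d = train_x - mu
    n = len(d)
    # empirical autocovariance at lags 0 .. T-1
    acov = np.array([np.dot(d[: n - lag], d[lag:]) / n for lag in range(T)])
    return np.full(T, mu), toeplitz(acov)
```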
Figure 2:

Using maximum a posteriori (MAP) estimation to decode the stimulus from spike trains. (A) Statistics of the input stimulus for the entire time. (B) Schematic of how MAP is accomplished after the GLM is fit. (C) Decoding performances range from poor (top row) to good (bottom row); each column has a fixed network size of uncoupled neurons (1, 4, and 7, respectively). Neurons were randomly selected (except for $N=7$), and we use the same random segment of time to fit all of the neurons' GLMs in a given network, but the time segment used to fit the GLM could differ for a given network size (i.e., for $N=7$, the top panel was fit to a different time segment from the bottom). Here and for all subsequent figures, we use 1397.5 ms (559 bins of 2.5 ms) to fit the GLM encoding model and decode for 417.5 ms (167 bins of 2.5 ms), approximately 30% of the encoding time. In all six instances, decoding was performed on the time period immediately following the random segment used to fit the GLM. See equation 2.4 for a definition of “Error.”

We model the decoded stimulus from this information via the maximum a posteriori (MAP) estimate (i.e., the mode of the posterior distribution, which combines the likelihood with the prior). Given $k$ and $h$ from the encoding model and given a spike train $r_t$, we want to find the most likely stimulus $x_t$ (see Figure 2B). That is, maximizing $\ln[p(x_t|r_t)]$, the log likelihood of the stimulus conditioned on the spike train, we obtain (Pillow et al., 2011; Tripathy et al., 2013):
$x_t^{\mathrm{MAP}} \equiv \arg\max_{x_t} \ln[p(x_t|r_t)].$
(2.3)

Figure 2C shows comparisons of the decoded stimulus (black) and the actual stimulus (cyan) for different network sizes ($N=1,4,7$). Here and for all subsequent figures and results, we use 1397.5 ms (559 bins of 2.5 ms) to fit the GLM encoding model and decode for 417.5 ms (167 bins of 2.5 ms), approximately 30% of the in-sample training period. As the cell population increases, the decoded stimulus falls more in line with the actual stimulus in these examples. However, with a fixed population size, the results can vary dramatically from poor (top row) to good (bottom row) decoding. Given this variability, we aim to statistically quantify how intrinsic heterogeneity is related to the quality of decoding performance by generating many random networks and considering many random time segments to decode.
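The MAP estimate of equation 2.3 can be sketched with a generic optimizer, as below. This is deliberately simplified relative to the actual decoder: we assume an instantaneous scalar stimulus gain per neuron, drop the postspike term, and use `scipy.optimize.minimize` rather than the optimization machinery of Pillow et al. (2011); the function name `map_decode` and the bin width `dt` are our assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def map_decode(spikes, ks, bs, prior_mean, prior_cov, dt=0.0025):
    """MAP stimulus estimate (equation 2.3), heavily simplified: each
    neuron has an instantaneous scalar stimulus gain and no postspike
    term, so lambda_i(t) = exp(bs[i] + ks[i] * x_t).

    spikes : spike counts per bin, shape (N, T)
    """
    Cinv = np.linalg.inv(prior_cov)

    def neg_log_post(x):
        drive = bs[:, None] + ks[:, None] * x[None, :]
        # Poisson log-likelihood, up to a constant in x:
        # sum_{i,t} [ r_{i,t} * log(lambda*dt) - lambda*dt ]
        ll = np.sum(spikes * drive - np.exp(drive) * dt)
        d = x - prior_mean  # Gaussian log prior
        return -(ll - 0.5 * d @ Cinv @ d)

    return minimize(neg_log_post, prior_mean, method="L-BFGS-B").x
```

With no spikes and a tight prior, the estimate stays close to the prior mean, as expected from the Bayesian trade-off.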

### 2.1  Heterogeneity versus Decoding Error

Networks were constructed by randomly selecting cells from among all seven neurons to obtain a prescribed network size, and GLMs for each cell were fit on random segments of time prior to the decoding time segment. Note that a given network of cells decodes stimuli in the same time segment. A given network has three measures of heterogeneity, one for each component of the encoding models (stimulus filter, postspike filter, and constant bias); we define heterogeneity as the variability across these respective components. Following the convention in Tripathy et al. (2013), the stimulus filter heterogeneity and postspike filter heterogeneity are both the mean of the Euclidean distances between all distinct pairs of filter vectors, while the bias heterogeneity is the standard deviation of the $b$ values.
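These heterogeneity measures are straightforward to compute; a minimal sketch (the helper names are ours) follows.

```python
import numpy as np
from itertools import combinations

def filter_heterogeneity(filters):
    """Mean Euclidean distance over all distinct pairs of filters
    (stimulus filters k or postspike filters h), one filter per row."""
    dists = [np.linalg.norm(f1 - f2) for f1, f2 in combinations(filters, 2)]
    return float(np.mean(dists))

def bias_heterogeneity(biases):
    """Bias heterogeneity: standard deviation of the constants b."""
    return float(np.std(biases))
```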

We calculate the mean absolute value of the difference between the decoded stimulus and the actual stimulus (i.e., the $l_1$-error), which we label as “Error” in the figures. Since decoding of the stimulus is accomplished with a Bayesian model, we can assess the uncertainty of the MAP estimate of the stimulus. So we also consider a Hessian-weighted error that weights each of the error values by $\frac{1/C_{jj}}{\sum_k 1/C_{kk}}$, where $C_{jj}$ is the $j$th diagonal entry of the inverse of the Hessian matrix (the matrix of second derivatives of $-\ln(p(x_t|r_t))$ with respect to $x_{t_j}$). Since the stimulus has a gaussian distribution, the inverse of the Hessian matrix is precisely the posterior covariance of the MAP estimate of $x_t$ (Pillow et al., 2011). For stimulus duration $T$, the two types of error we consider are
$E=\sum_{j=1}^{T} w_j\,\big|x_{t_j}^{\mathrm{MAP}}-x_{t_j}\big|, \quad \text{where } w_j=\frac{1}{T} \text{ for Error},$
(2.4)
$w_j=\frac{1/C_{jj}}{\sum_k 1/C_{kk}} \quad \text{for Hessian-weighted error}.$
(2.5)
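The two error measures above can be sketched as follows (the function name `decoding_errors` is ours; `c_diag` stands in for the diagonal entries $C_{jj}$ of the inverse Hessian):

```python
import numpy as np

def decoding_errors(x_map, x_true, c_diag):
    """The two error measures of equations 2.4 and 2.5.

    x_map, x_true : decoded and actual stimulus, shape (T,)
    c_diag        : C_jj, diagonal of the inverse Hessian (the posterior
                    variance of each stimulus bin under the MAP estimate)
    """
    abs_err = np.abs(np.asarray(x_map) - np.asarray(x_true))
    l1_error = abs_err.mean()                  # w_j = 1/T
    w = 1.0 / np.asarray(c_diag, dtype=float)
    w = w / w.sum()                            # w_j = (1/C_jj) / sum_k (1/C_kk)
    hessian_weighted = np.sum(w * abs_err)
    return l1_error, hessian_weighted
```

When all $C_{jj}$ are equal, the two errors coincide, since the weights reduce to $1/T$.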
Thus, for each network, there is a bias heterogeneity value, a stimulus filter heterogeneity value, a postspike filter heterogeneity value, and two error terms. Across a sample of networks of a fixed size, we regress the decoding error against each heterogeneity type using quadratic regression to determine whether there is an intermediate global minimum. Note that these heterogeneity types are highly correlated with each other, so including them together as regressors would introduce severe multicollinearity. Thus, we fit three separate error regression models for each of the two decoding error types. Each data point in Figure 3 (and Figures A2 to A5) is constructed by randomly selecting among the seven cells, the training time intervals, and the validation time interval. Since the filters for an individual cell can vary in time, the same cell selected on different time intervals can produce distinct encoding models. Thus, our random selection allows for selecting the same cell more than once, which can lead to more homogeneous networks. For example, if we selected cell A at time $t$ and again at time $t+k$, we would likely see more similarity in their model constructions than if we selected cell A at time $t$ and cell B at time $t+k$. The end result is a relatively wide range of networks in terms of heterogeneity.

In Figure 3, the plots of decoding error as a function of stimulus heterogeneity indicate that heterogeneity has some impact on decoding error. For exposition purposes, only networks of size $N=2$, 4, and 6 cells are shown (see Figures A2 to A5 and Tables 1, 2, and A1 for the complete set of results). Most models have statistically significant quadratic coefficients ($\alpha=0.001$) for smaller populations. However, with sample sizes in the thousands, small $p$-values are easily achievable, as minor associations can be exaggerated. While a quadratic relationship may be statistically significant, it may not be practically significant. We therefore also consider the quality of fit, which can be judged by the $R^2$ of the regression models. As is evident from Figure 3, there is a considerable amount of noise in decoding error plotted against heterogeneity. These results are consistent with Tripathy et al. (2013), where a quadratic relationship was found to be statistically significant; although the $R^2$ values were not reported there, their plots demonstrated a high level of variability in error not explained by heterogeneity. Note that because we are regressing on error rather than on a response variable directly, we would expect dampened $R^2$ values for the quadratic fits. There is only so much error that can be mitigated before an additional variable's influence is overwhelmed by the system's natural variability, something that is especially noticeable with noisy neurons.
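The regression diagnostic used throughout (a quadratic fit, its $R^2$, and a check that the fitted minimum is interior to the observed heterogeneity range) can be sketched as below. This is a plain least-squares version; `quadratic_fit` is a hypothetical helper, and the published fits additionally report $F$-test $p$-values, omitted here.

```python
import numpy as np

def quadratic_fit(het, err):
    """Least-squares fit of err ~ c0 + c1*het + c2*het^2, returning the
    coefficients, R^2, the fitted vertex, and whether that vertex is an
    interior minimum of the observed heterogeneity range."""
    het = np.asarray(het, dtype=float)
    err = np.asarray(err, dtype=float)
    X = np.column_stack([np.ones_like(het), het, het**2])
    coef, *_ = np.linalg.lstsq(X, err, rcond=None)
    resid = err - X @ coef
    r2 = 1.0 - np.sum(resid**2) / np.sum((err - err.mean())**2)
    vertex = -coef[1] / (2.0 * coef[2])   # extremum of the parabola
    interior = bool(coef[2] > 0 and het.min() < vertex < het.max())
    return coef, r2, vertex, interior
```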
Figure 3:

Decoding error as a function of intrinsic cell heterogeneity within the network. Results from a large, random sample of various network sizes, where cells are randomly selected (out of seven) and their GLM attributes ($k,h,b$) are trained on a random time period before the decoding time segment. Note that a given network of cells decodes stimuli in the same time segment. (A, C, E) The $l_1$-norm of the error (Error) as a function of stimulus filter heterogeneity for network sizes 2, 4, and 6. (B, D, F) Same respective network sizes and random samples, except plotting the Hessian-weighted error. See section 4 for details on how random samples are generated.

For the unweighted Error, the $R^2$ values are somewhat notable, especially for larger networks (see Table 1), while for the Hessian-weighted error, the $R^2$ values are generally much smaller (see Table 2), although an intermediate minimum is still found. So when the uncertainty of the decoder is accounted for, heterogeneity of the network has less influence on decoding.

Besides the differences between the $l_1$-norm of the error and the Hessian-weighted error, note that all of these quadratic models are statistically significant with minuscule $p$-values due to the large sample size, but the actual fits can be of questionable merit (e.g., $R^2=0.0019$ for stimulus filter heterogeneity versus the Hessian-weighted error at network size $N=3$). This contrast demonstrates the usefulness of quality-of-fit metrics when employing regression to reveal a fuller story. With the Hessian-weighted error, the intermediate levels of heterogeneity for optimal decoding are not as robust as with the $l_1$-error. In particular, for postspike filter heterogeneity at network sizes $N=2$, 3, 4, 5, the optimal level of heterogeneity is not in the interior of the heterogeneity domain (see Figures A4 and A5 and Table 2). However, a similar limitation in explanatory power occurs for postspike filter heterogeneity against the $l_1$-norm; although the optimal heterogeneity value is in the interior for $N=3$, 4, the heterogeneity values are relatively small (see Figure A2). In general, this suggests possible limitations of postspike filter heterogeneity as an overall characterization of network heterogeneity.

### 2.2  Heterogeneity versus Correlation

Besides evaluating heterogeneity levels that minimize error, we determine decoding effectiveness by measuring the Pearson's correlation between the actual and decoded stimuli, where larger correlation values coincide with better decoding. While error measures how closely the decoded stimulus matches the actual stimulus in precision, correlation measures how well the decoded stimulus covaries with the actual stimulus, with less emphasis on the precise values of the stimuli. It is conceivable that downstream neurons do not need exact copies of the incoming stimuli to propagate the necessary information, so the error metrics may be too demanding. Ostojic and Brunel (2011) used Pearson's correlation to measure the agreement between the predicted instantaneous firing rate of a GLM-like model and simulations of a spiking model; René et al. (2020) also used Pearson's correlation to measure the accuracy of a mesoscopic mean-field-like reduction of a large spiking model. One issue with correlation is that two time series can have high correlation while their basic statistics (e.g., mean, variance) are very different (see the Methods section in Ostojic & Brunel, 2011). Thus, the correlation coefficient is not intended to replace error as a measure in our study but rather to provide more thorough information. Although both correlation and error as measures of decoding accuracy have potential flaws, they complement each other well (Ostojic & Brunel, 2011; René et al., 2020).

Table 1:
Statistics of Quadratic Fit to (Average) $l1$-Norm of Error as a Function of Stimulus Filter ($k$) and Postspike Filter ($h$) Heterogeneity.
| Heterogeneity | Statistic | $N=2$ | $N=3$ | $N=4$ | $N=5$ | $N=6$ | $N=7$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| $k$ | $R^2$ | 0.0293 | 0.0140 | 0.0347 | 0.0534 | 0.0725 | 0.0974 |
| $k$ | $p$-value for $F$-statistic vs. constant | $3.9\times10^{-42}$ | $1.8\times10^{-20}$ | $7.1\times10^{-50}$ | $4.9\times10^{-77}$ | $2.2\times10^{-105}$ | $3.9\times10^{-143}$ |
| $h$ | $R^2$ | 0.0103 | 0.01794 | 0.0606 | 0.0382 | 0.0530 | 0.0888 |
| $h$ | $p$-value for $F$-statistic vs. constant | $3.6\times10^{-15}$ | $6.7\times10^{-26}$ | $1.4\times10^{-87}$ | $6.7\times10^{-55}$ | $2.3\times10^{-76}$ | $5.2\times10^{-130}$ |

Note: The number in bold corresponds to quadratic fits where the optimal level of heterogeneity is not in the interior.

Table 2:
Statistics of Quadratic Fit to (Average) Hessian-Weighted Error as a Function of Stimulus Filter ($k$) and Postspike Filter ($h$) Heterogeneity.
| Heterogeneity | Statistic | $N=2$ | $N=3$ | $N=4$ | $N=5$ | $N=6$ | $N=7$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| $k$ | $R^2$ | 0.0068 | 0.0019 | 0.0060 | 0.0191 | 0.0315 | 0.0401 |
| $k$ | $p$-value for $F$-statistic vs. constant | $3.6\times10^{-10}$ | 0.0022 | $4.5\times10^{-9}$ | $1.5\times10^{-27}$ | $3.0\times10^{-45}$ | $1.6\times10^{-57}$ |
| $h$ | $R^2$ | 0.0014 | 0.0020 | 0.0101 | 0.0171 | 0.0181 | 0.0356 |
| $h$ | $p$-value for $F$-statistic vs. constant | 0.0032 | $3.8\times10^{-4}$ | $7.2\times10^{-15}$ | $1.2\times10^{-24}$ | $4.3\times10^{-26}$ | $4.5\times10^{-51}$ |

Note: The numbers in bold correspond to quadratic fits where the optimal level of heterogeneity is not in the interior of the heterogeneity domain.

Pearson's correlation of the two time series directly is inappropriate because the assumption of independent observations is violated when the time series have nontrivial autocorrelations. The Pearson's correlation (also the cross-correlation at lag 0) is affected by the cross-correlation and autocorrelations between the two time series at other lags. As a result, these Pearson's correlation coefficient estimates are mathematically biased relative to the true correlation strength, so we use a method known as prewhitening (Brockwell, Davis, & Calder, 2002) to be statistically rigorous. Essentially, if we fit an effective model to a time series, the residuals should be a white noise process. That is, for the two time series $Y_t$ and $X_t$ and their corresponding best-fit models $\hat{Y}_t$ and $\hat{X}_t$, the following should be two white noise processes: $e_t=Y_t-\hat{Y}_t$ and $w_t=X_t-\hat{X}_t$. The Pearson's correlation between $e_t$ and $w_t$ (also their cross-correlation function at lag 0) is then an unbiased estimate of the correlation strength between the two original time series. (For more details and the method of model fitting used on all samples, see section 4.)
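A minimal sketch of prewhitening follows, using a least-squares AR($p$) fit for each series and correlating the residuals. The AR order $p=5$ here is an arbitrary illustration; the paper's actual model fitting is described in section 4, and the helper names are ours.

```python
import numpy as np

def ar_residuals(y, p=5):
    """Residuals of a least-squares AR(p) fit; for an adequate order p,
    these should be approximately white noise."""
    y = np.asarray(y, dtype=float)
    # design matrix of lagged values: column i holds y at lag i+1
    lags = np.column_stack([y[p - i - 1: len(y) - i - 1] for i in range(p)])
    X = np.column_stack([np.ones(len(lags)), lags])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return y[p:] - X @ coef

def prewhitened_correlation(x, y, p=5):
    """Pearson correlation between the AR residuals of two series, i.e.,
    the lag-0 cross-correlation after prewhitening both."""
    ex, ey = ar_residuals(x, p), ar_residuals(y, p)
    return float(np.corrcoef(ex, ey)[0, 1])
```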

After prewhitening, we fit a quadratic model between heterogeneity and the adjusted (prewhitened) correlation, which is found to be statistically significant for all network sizes and all heterogeneity types, albeit using a higher significance level ($\alpha=0.05$) (see Table 3 for stimulus filter and postspike filter heterogeneity; see Table A2 for all). As with the two error types, we see fluctuating $R^2$ values as population size increases, with the best fits found for populations of size 6 or 7. However, the $R^2$ values for the different heterogeneity types and network sizes are all relatively small, sometimes even less than $R^2=0.01$ and never more than $R^2=0.04$; the plots in Figure 4 confirm these results. For networks with two cells ($N=2$), the adjusted correlation appears fairly constant across the domain of stimulus heterogeneity, confirmed by the relatively flat quadratic curve and the weak quality of fit ($R^2=0.0018$), despite the quadratic model being “statistically significant.” The curve is more pronounced for $N=4$ and $N=6$, in line with larger (yet still small) $R^2$ values ($R^2=0.0076$ and $R^2=0.0212$, respectively).

Note that as the network size increases, we see fewer homogeneous networks, with minimal data for stimulus filter heterogeneity less than 0.15 at $N=4$ and $N=6$. We also see a diminishing number of extremely heterogeneous networks, as the distribution of heterogeneity tightens toward intermediate values (albeit still positively skewed). Likewise, for the other heterogeneity measures, the standard deviation generally decreases as population size increases (see Figures A6 and A7). The statistical significance of these models suggests that there is still some underlying dependence on heterogeneity, but the effect is not strong, and network size has an interacting effect. We next explore the notable direct effects that population size has on decoding performance, in terms of both error types and the adjusted correlation.

Table 3:
Statistics of Quadratic Fit to Correlation as a Function of Stimulus Filter ($k$) and Postspike Filter ($h$) Heterogeneity.
| Heterogeneity | Statistic | $N=2$ | $N=3$ | $N=4$ | $N=5$ | $N=6$ | $N=7$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| $k$ | $R^2$ | 0.0018 | 0.0123 | 0.0076 | 0.0149 | 0.0212 | 0.0357 |
| $k$ | $p$-value for $F$-statistic vs. constant | 0.003 | $6.0\times10^{-18}$ | $2.2\times10^{-11}$ | $1.4\times10^{-21}$ | $1.6\times10^{-30}$ | $2.9\times10^{-51}$ |
| $h$ | $R^2$ | 0.0199 | 0.0159 | 0.0166 | 0.0156 | 0.0232 | 0.0356 |
| $h$ | $p$-value for $F$-statistic vs. constant | $1.1\times10^{-28}$ | $4.6\times10^{-23}$ | $5.3\times10^{-24}$ | $1.4\times10^{-22}$ | $2.4\times10^{-33}$ | $4.4\times10^{-51}$ |

Notes: Correlations were prewhitened for the two time series to ensure unbiased estimates. Bold text corresponds to quadratic fits where the optimal level of heterogeneity is not in the interior.

Figure 4:

Using Pearson's correlation between actual stimulus and estimate ($xMAP$), with prewhitening applied to both (see the main text for justification and section 4 for details), for the same network sizes ($N=2$, 4, 6) as before in Figure 3.

Figure 5:

Summary of how decoding generally improves with network size. (A) The average $l1$-norm of error (black) and the minimum of the quadratic fits for stimulus filter heterogeneity (blue), postspike filter heterogeneity (green), and bias heterogeneity (red) all tend to decrease with increasing network size. (B) Similar to panel A but with Hessian-weighted error. (C) The panel uses the Pearson's correlation after both series are prewhitened. The exceptions to where optimal decoding is at intermediate levels of heterogeneity are in panel B, with postspike filter heterogeneity (green dots) for $N=2$, 3, 4, and 5; in panel C, with postspike filter heterogeneity for $N=5$; and with bias heterogeneity (red dot) for $N=2$.

### 2.3  Effect of Population Size on Decoding

We have already observed that the impact of neural heterogeneity on decoding error changes as the network size increases. As more neurons are added to the network, the result is generally better decoding of the stimulus. From Figure 5A, we can see that the $l1$-norm of error, averaged over all samples (black), decreases with increased population size, though the decrease diminishes and saturates around $N=6$. Also, the minimum of the quadratic fit decreases for all three forms of heterogeneity (colored curves) except from $N=5$ to 6 for postspike filter heterogeneity (green). With a one-way analysis of variance (ANOVA), the mean difference in error between population sizes is statistically significant ($F=450.18$, $p$-value $≈0$). Notably, this is not simply an artifact of massive sample sizes, as the effect size is moderate ($η^2=0.059$). We observe a significant difference in pairwise comparisons for population sizes 2 through 6, measured via the Tukey honest significant difference (HSD) test (Montgomery, 2017), but there is not a significant difference between population sizes 6 and 7 as the decrease in error saturates. The results are similar with the Hessian-weighted error (see Figure 5B, black curve); an increased number of cells leads to more effective decoding, but the marginal improvement diminishes with increased population size. As before with the $l1$-norm of error, the impact of increased population is still statistically significant ($F=108.58$, $p$-value $=2.887×10^{-114}$), but the practical impact is diminished, with a small effect size of $η^2=0.014$. In addition, based on the Tukey HSD test, population sizes 4 and 5 do not appear to be significantly different from one another, and population sizes 5, 6, and 7 do not appear to be significantly different, as the marginal improvement in error declines faster for Hessian-weighted error than for the $l1$-norm of error.
Once again, the comparison between the types of error reveals how uncertainty affects these measures of average error. Average decreases in error still occur when incorporating uncertainty about the MAP estimates, but some of network size's explanatory power on error is lost when adjusting for uncertainty.
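The ANOVA and effect-size computations above can be sketched as follows. The letter's analysis was done in Matlab (with Tukey HSD via its multiple-comparison tools); this Python sketch uses synthetic error samples whose means decrease with population size, and computes the $F$-statistic with `scipy.stats.f_oneway` plus $η^2$ (between-group sum of squares over total sum of squares). All numbers are made up for the demonstration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# synthetic l1-errors for population sizes N = 2..7: mean decreases with N
groups = [rng.normal(1.0 - 0.08 * (n - 2), 0.3, size=500) for n in range(2, 8)]

F, p = stats.f_oneway(*groups)              # one-way ANOVA across sizes

# eta-squared effect size: between-group SS over total SS
all_vals = np.concatenate(groups)
grand = all_vals.mean()
ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_total = np.sum((all_vals - grand) ** 2)
eta_sq = ss_between / ss_total
```

A large $F$ with a modest $η^2$ illustrates how significance and effect size can be reported side by side, as in the text.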

With prewhitened correlation values, we assess the impact of population size with another ANOVA and set of Tukey HSD tests. All networks are independently and randomly sampled from the time series, but a Pearson's correlation will have a skewed distribution whenever it is not centered around $ρ=0$. Thus, a Fisher $z$ transformation is necessary to create approximately normal distributions for each group; the transformed value for Pearson's correlation coefficient $r$ is simply $z=arctanh(r)$ (Vrbik, 2005). In Figure 5C, we see a similar improvement in decoding as the network size increases, which we find to be statistically significant with ANOVA on our transformed values ($F=1411.64$, $p$-value $≈0$) with a notably large effect size ($η^2=0.181$), much larger than the effect sizes previously observed with the $l1$-norm and Hessian-weighted errors. The prewhitened correlations are significantly different among all population sizes (albeit using very large sample sizes), and the correlation strength improves at each step up in network size, with a diminishing marginal improvement similar to the two error types. The exceptions to these results are specifically for bias heterogeneity at $N=2$ and postspike filter heterogeneity at $N=5$ (see Figure 5C, red and green, respectively), where the optimal level of heterogeneity occurs outside the domain and not at intermediate values (see Figures A6 and A7, and blue entries in Table A2). Overall, the prewhitened correlation values for the networks prove to be useful metrics when assessing the impact of population size on decoding efficiency.
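The need for the Fisher $z$ transformation can be seen in a small simulation: sample correlations centered away from zero are skewed, and $z=\operatorname{arctanh}(r)$ largely removes that skew. This sketch uses synthetic bivariate gaussian data ($ρ=0.8$); the sample sizes are illustrative, not the paper's.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
rho, n_pairs, n_nets = 0.8, 30, 2000
cov = [[1.0, rho], [rho, 1.0]]

# sampling distribution of Pearson's r is skewed whenever rho != 0
r = np.empty(n_nets)
for i in range(n_nets):
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=n_pairs)
    r[i] = np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]

z = np.arctanh(r)                  # Fisher z transformation

skew_r = stats.skew(r)             # noticeably negative
skew_z = stats.skew(z)             # near zero: approximately normal
```

The transformed values can then be fed into ANOVA, which assumes approximately normal groups.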

### 2.4  Remarks about Time Intervals for Encoding and Decoding

We have chosen fixed time periods for encoding (fitting the GLM) and decoding for all figures (1397.5 ms and 417.5 ms, respectively). To the best of our knowledge, there does not appear to be a systematic or algorithmic way of determining ideal time lengths for these entities, or for the time bins (we have chosen 2.5 ms). We perform many simulations to assess how the statistics of the errors ($l1$-error and Hessian-weighted error) change with the encoding and decoding time intervals, while neglecting the effects of heterogeneity, which should be qualitatively similar (see Figure 6). For these simulations, we generate 6500 random networks as before, but fix $N=4$ because it is an intermediate network size that has the most possibilities for choosing different cells.

Figure 6:

Showing how error ($l1$-error in panels A and C, Hessian-weighted error in panels B and D) varies with different encoding times (to fit all GLMs) and decoding times for $N=4$ cell networks. (A, B) Box plots of the errors for five encoding times (each encoding time corresponds to a color) as a function of the decoding time in seconds. The times we use throughout the letter (green, indicated by arrow) have very similar box plots to larger encoding times and larger decoding times, evidence that our choice might be large enough for convergence. (C, D) Similar box plots but grouping the decoding time period as a percentage of the encoding time period, reflecting that the out-of-sample (decoding) time period is much shorter than the in-sample (encoding) period. Box plots: Shaded rectangles show the interquartile range (25th to 75th percentile), the middle bar is the median (50th percentile), and the whiskers capture the entire range excluding any outliers, out of 6500 randomly generated networks.

Figures 6A and 6B show error distributions via box plots for five different encoding time periods (each color corresponding to a fixed time), as a function of the decoding time. The median error and spread of errors (whiskers) generally decrease as times increase, perhaps an indication of convergence. Figures 6C and 6D present a similar plot but group the decoding time as a percentage of the five different encoding times; again we see for these values that increasing encoding times leads to lower median error and less spread of errors (whiskers). Our choice throughout this letter appears to have very similar box plots compared to larger encoding times and larger decoding times.

## 3  Discussion

Addressing how neurons in various sensory systems achieve coding efficiency has been an active area of investigation both theoretically and experimentally. Using in vivo voltage recordings from a weakly electric fish electrosensory system and a frequently used Bayesian statistical model (GLM), we analyzed how intrinsic heterogeneity of uncoupled networks of distinct neurons affects decoding from downstream neurons. We leveraged the long electrophysiological recordings to obtain large data sets for each recorded cell that enabled us to address this question with thorough and proper statistical analyses, considering three important metrics of decoding accuracy. We found that intermediate levels of heterogeneity robustly lead to the best decoding performance, consistent with the results of Tripathy et al. (2013). However, an important caveat is that the relative decoding performances were quite noisy, so conclusions about statistical significance should always be accompanied by a quality-of-fit metric such as $R^2$.

This letter demonstrates a careful statistical application on data from the electrosensory system provided to us from Marsat and Maler (2010) that may be instructive for other neural systems, but we did not address some details of heterogeneity and coding specific to this system. In the electrosensory system, prior work has shown that heterogeneities in the neural attributes of P-units, which are upstream of the ELL, have a strong effect on some response properties: with fast-frequency stimuli and a slower envelope frequency, only the slower envelope-frequency responses were affected by heterogeneities, while the fast-frequency P-unit responses were invariant to neural heterogeneity (Metzen & Chacron, 2015). Also, the heterogeneity of ON and OFF cell types in the ELL generally led to improved population coding of envelope signals (Hofmann & Chacron, 2020); here, the data provided consisted only of ON cells. The stimuli we considered consist of random amplitude modulations with gaussian statistics rather than more natural stimuli with faster and slower envelope frequencies. These more natural stimuli are known to elicit heterogeneous responses of ELL pyramidal cells where some inputs are “cancelled,” perhaps by parallel fiber inputs to ELL (Bol, Marsat, Harvey-Girard, Longtin, & Maler, 2011; Bol, Marsat, Mejías, Maler, & Longtin, 2013; Mejias, Marsat, Bol, Maler, & Longtin, 2013). These cancellation mechanisms, which are likely a large factor in the resulting ELL cell heterogeneity, were not addressed here but would be interesting to explore further with a more anatomically realistic Bayesian model. The GLM assumes all statistics are stationary when in fact the electrosensory system is known to have longer timescales (plasticity) that can violate these assumptions.
However, nonstationarities from envelope frequencies or up-down dynamics that are slow (Melanson, Mejias, Jun, Maler, & Longtin, 2017) may not appreciably alter the GLM decoding performance because the statistics are close to stationary.

Although we have recordings from only 7 distinct neurons, and it is likely that more distinct sensory neurons are required for real coding of signals, note that the exact number of relevant neurons in the lateral segment of the ELL needed to code stimuli (and indeed in many other brain regions) is unknown. Our work is a theoretical exploration of the signatures of decoding in a subset of the population over relevant lengths of time, where individual neurons could have learned how to encode in prior time periods. The use of these 7 cells and encoding models fit to different time periods results in appreciable ranges of heterogeneity. Our use of 7 ELL pyramidal cells is of a similar order as in other published works on this system: 12 cells in Mejias et al. (2013) across different animals, 15 cells in Ly and Marsat (2018), 5 cells in Bastian (1986), 12 cells (and 9 pairs) in Simmonds and Chacron (2015), and 14 cells in Huang, Zhang, and Chacron (2016), but much smaller than others: 50 cells in Hofmann and Chacron (2020) and 77 cells in Chacron, Maler, and Bastian (2005). With the limited number of distinct neurons and all the aforementioned limitations, we still observe effective neural coding at intermediate levels of heterogeneity.

Like Tripathy et al. (2013), we developed encoding and decoding models for uncoupled networks and used the same measures for heterogeneity. We observed a similar quadratic relationship between heterogeneity and the $l1$-norm of error and confirmed these results as statistically significant, but we did not see the same diminishing impact of heterogeneity as the population size increased. In fact, the best quadratic fits occurred at relatively larger network sizes 6 and 7. Tripathy et al. (2013) do not address the practical significance of the quadratic relationship as thoroughly as we did here; they relied on statistical significance in a case of large sample sizes where small $p$-values are more easily attained. Though statistical significance is an important baseline, indicative of a relationship, quality-of-fit metrics such as $R2$ are also important, especially with the noisy data naturally produced in neural networks. Tripathy et al. (2013) used trial-averaged data, which helps to mitigate the noisiness of neurons in order to discern underlying dynamics in the encoding model. Our focus was on the arguably more biologically realistic case of single trials in a long time series. Of course there are merits to trial averaging, especially for encoding models, and one might observe larger $R2$ values in this framework. Tripathy et al. (2013) also employed greedy search methods to produce a more uniform spread of heterogeneity, while we chose a simpler uniform random selection that did not result in the same uniformity of heterogeneity, but did reveal natural clustering at intermediate levels of heterogeneity, particularly as network size increased.

We also incorporated additional metrics to analyze decoding efficiency. Since Bayesian methods also quantify uncertainty, we implemented a Hessian-weighted absolute error. By comparing this error metric to the $l1$-norm of error, we could observe the weakening of the quadratic regression models and the ANOVA models when weighting error by uncertainty. Such a phenomenon implies that heterogeneity and population size explain some of the same neural dynamics as the relative level of uncertainty in the decoded stimuli, though properly exploring this is beyond the scope of our study. Finally, since decoding of noisy spiking outputs will naturally produce some error that can only be partially accounted for by heterogeneity or network size, we considered Pearson's correlation between the actual and decoded stimulus time series for thoroughness. Perhaps downstream neurons do not need exact copies of the incoming stimuli to decode, in which case the prior two error measures could be misleading. The correlation metric is problematic in isolation because two series can be highly correlated yet have drastically different values. Following Ostojic and Brunel (2011) and René et al. (2020), we used several different metrics to provide a fuller picture of our results. Despite these different measures of decoding, we generally obtain the same results.

This letter has focused on the intrinsic heterogeneity between neurons in terms of how they code stimuli and how their own spiking dynamics vary. The data set we use contains separate recordings of an awake animal, not simultaneous recordings, which necessitated our assumption of uncoupled networks. Coupled networks are more challenging and would certainly require considering other forms of heterogeneity, for example, network heterogeneity that can interact nonlinearly with intrinsic heterogeneity (Ly, 2015; Ly & Marsat, 2018; Wendling & Ly, 2019), and also a more complicated GLM with coupling (Pillow et al., 2008). Addressing this research question with simultaneous recordings in coupled networks is an important future direction but beyond the scope of this study.

The GLM employed here allows efficient and robust computations of both the encoding model and decoding of input signals, in contrast to biophysical models or integrate-and-fire type spiking models, where a maximum a posteriori estimate of the input stimulus is computationally expensive to calculate, especially with the time intervals we considered, with hundreds of dimensions for each network (Ladenbauer, McKenzie, English, Hagens, & Ostojic, 2019). There has been a plethora of theoretical advances to capture how noisy stimuli nonlinearly alter the statistics of spiking in biophysical models (incomplete list: Dayan & Abbott, 2001; René et al., 2020), but such models are less suitable for the framework here because the GLM enables robust fits to data and stimulus estimates for all model networks we considered. However, Ladenbauer et al. (2019) recently used integrate-and-fire type models, a Bayesian framework, and maximum likelihood to successfully estimate network inputs, as well as various intrinsic and network connectivity attributes, with validations from in vivo and in vitro data. These authors developed methods to infer precise statistical quantities of neural networks that did not rely on particular realizations of the data or models, whereas we focused on estimating the particular realizations of noisy stimuli from particular spike trains in the data, repeating this procedure for each network.

Although we considered a single time series for each neuron without trial-averaged data, we generally observe an increase in performance as population size increases. This observation is consistent with theories of population coding, where larger network sizes are known to theoretically increase the Fisher information and accuracy of an optimal linear estimator (Shamir & Sompolinsky, 2006), among other metrics. Others have studied how Fisher (Yarrow et al., 2012; Sompolinsky, Yoon, Kang, & Shamir, 2001) and mutual information (Yarrow et al., 2012) change (increase) with population size, results that are at least consistent with what we observe here even though we only consider networks up to size 7. These and other studies (Sompolinsky et al., 2001; Shamir & Sompolinsky, 2006; Ecker, Berens, Tolias, & Bethge, 2011; Yarrow et al., 2012) theoretically accounted for the effects of noise correlations along with heterogeneity and population size, an issue we do not address here because of separate cellular in vivo recordings. A recent study by Gjorgjieva, Meister, and Sompolinsky (2019) found with uncoupled GLM-like models with a threshold nonlinearity that in addition to homogeneous networks performing poorly, intermediate proportions of ON/OFF cells performed optimally via maximizing mutual information and minimizing error with the optimal linear estimator. This study is consistent with the results here.

## 4  Methods

Matlab code implementing these methods can be found here: https://github.com/wendlingk/NeuroGLM. The code is based on previously written code by Pillow et al. (2008), Park, Meister, Huk, and Pillow (2014), and Tripathy et al. (2013).

The experimental data were provided by the Marsat Lab from a prior paper (Marsat & Maler, 2010). All seven cells are from the same animal and the same area, termed the lateral segment, and all are ON cells subject to spatially global random amplitude modulation (RAM) stimuli.

### 4.1  Generalized Linear Model

We adopt the previous GLM construction detailed in Pillow et al. (2011) without coupling, where neural spike trains are modeled by an inhomogeneous Poisson process, with 1 representing a spike and 0 representing no spike. A key function that represents the expectation of spiking is $λ(t)$, the rate intensity at time $t$, also known as the instantaneous firing rate. The function $λ(t)$ contains components that relate nicely to some of the underlying physiology: stimulus filtering, dependence on spike history, and a constant (bias) shift $b$. The stimulus vector $x_t$ is transformed by a temporal filter called the stimulus filter $k$, which is a series of coefficient estimates for different lags in time. Dependence on spike history is modeled by another temporal filter, the postspike filter $h$, which captures the refractory period and other dynamics (i.e., adaptation, depression). Since $h$ also affects the spike train $r_t$, one can think of the instantaneous firing rate as $λ(t|x_t,r_t)$, where $x_t$ and $r_t$ are vectors in the time window immediately prior to time $t$.

The encoding part of the GLM requires estimating $\beta=[b;k;h]$, where $b$ is the constant bias correction and $k$ and $h$ are the two filters. The model matrix is $A=[\mathbf{1}\;X\;R]$, where $X$ is the stimulus matrix and $R$ is the spike history matrix. For time $t$, let the stimulus and postspike history vectors be $x_t$ and $r_t$, respectively, and a single row of the model matrix be $A_t'=[1\;x_t'\;r_t']$. Our linear predictor then is
$A_t'\beta = b + x_t'k + r_t'h.$
(4.1)
The GLM is appealing for both its accuracy and the feature that the likelihood function has a single global maximum over the parameters $\beta$ when the nonlinearity linking the linear predictor to the response is convex and log-concave (Pillow et al., 2011; Tripathy et al., 2013).
As indicated by Myers, Montgomery, Vining, and Robinson (2012), the “canonical link” for a Poisson distribution is the log link ($g(a)=ln(a)$, where $a$ is the response); it ensures nonnegative predictions and has been commonly employed (Pillow et al., 2011). With this link function and the linear predictor for a mean response $μt$ at time $t$, we have $g(μt)=ln(μt)=At'β$. With the inverse function, we can then predict the instantaneous firing rate for time $t$:
$\lambda(t|x_t,r_t) = e^{b + x_t'k + r_t'h}.$
(4.2)
This construction is a special GLM, known more generally in statistics as a Poisson regression model (Myers et al., 2012) and, in the context of neural encoding models, as a linear-nonlinear Poisson (LNP) cascade model. In the neural coding literature, the exponentiated (inverse) link is referred to as the static or point nonlinearity (Paninski, 2004; Simoncelli, Paninski, & Pillow, 2004; Pillow, 2007).
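The LNP construction above can be sketched as a short simulation of a single uncoupled cell: the rate at each bin is the exponentiated linear predictor $b + x_t'k + r_t'h$, and spikes are drawn bin by bin. All filter shapes and rates below are made-up toy values, not the fitted ones from the letter.

```python
import numpy as np

rng = np.random.default_rng(3)
dt = 0.0025                       # 2.5 ms bins, as in the letter
T = 2000                          # number of bins to simulate
nk, nh = 32, 72                   # filter lengths in bins (illustrative)

b = np.log(20.0)                  # bias: ~20 Hz baseline rate
k = 0.5 * np.exp(-np.arange(nk) / 8.0)          # toy stimulus filter
h = -4.0 * np.exp(-np.arange(1, nh + 1) / 4.0)  # toy postspike (refractory) filter

x = rng.normal(0.0, 1.0, size=T)  # gaussian stimulus
spikes = np.zeros(T)
rate = np.zeros(T)
for t in range(T):
    xt = x[max(0, t - nk + 1):t + 1][::-1]      # recent stimulus, newest first
    rt = spikes[max(0, t - nh):t][::-1]         # recent spike history
    lin = b + k[:len(xt)] @ xt + h[:len(rt)] @ rt
    rate[t] = np.exp(lin)                       # instantaneous firing rate (Hz)
    spikes[t] = rng.random() < rate[t] * dt     # at most one spike per bin
```

The negative postspike filter suppresses the rate right after each spike, the refractory effect described in the text.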

#### 4.1.1  Maximum Likelihood Estimators for the GLM

With a specified response distribution, linear predictor, and link function, we use maximum likelihood estimators (MLEs) to estimate $\beta_i=[b_i;k_i;h_i]$ for each of the $N$ cells over a time duration $T$. Since $\mu_t=e^{A_t'\beta}$ for a single cell at time $t$, across a whole time segment, $\mu=\int_0^T e^{A_t'\beta}\,dt$. Note also that $A_i=[\mathbf{1}\;X_i\;R_i]$ is the model matrix for each cell, so $A_{t,i}'$ represents row $t$ of neuron model $i$. With sample responses $r_{t,i}$ and means $\mu_i$, the likelihood function that utilizes the PDF of a Poisson distribution is
$L(\beta,x_t)=p(r_t|x_t,\beta)=\prod_{i,t} f_i(r_{t,i})=\prod_{i,t}\frac{\mu_i^{r_{t,i}}e^{-\mu_i}}{r_{t,i}!}=\frac{\prod_{i,t}\mu_i^{r_{t,i}}\exp\left(-\sum_{i,t}\mu_i\right)}{\prod_{i,t}r_{t,i}!}.$
Maximizing this function directly would be difficult, but a useful principle of MLEs is that we can obtain the same MLE of $\beta$ from the log transform of the likelihood function, after which we substitute $\mu_i=\int_0^T e^{A_{t,i}'\beta_i}\,dt$:
$\ln[p(r_t|x_t,\beta)] = \sum_{i,t}r_{t,i}\ln(\mu_i)-\sum_{i,t}\mu_i-\sum_{i,t}\ln(r_{t,i}!) = \sum_{i,t}r_{t,i}\ln\left(\int_t e^{A_{t,i}'\beta_i}\,dt\right)-\sum_{i,t}\int_t e^{A_{t,i}'\beta_i}\,dt-\sum_{i,t}\ln(r_{t,i}!) = \sum_{i,t}r_{t,i}\left(A_{t,i}'\beta_i+\ln(dt)\right)-\sum_{i,t}\int_t e^{A_{t,i}'\beta_i}\,dt-\sum_{i,t}\ln(r_{t,i}!).$
(4.3)
The first term is multiplied by the spike train value at time $t$ for cell $i$, so it is nonzero only at spike times $t_{\alpha,i}$. There are constant terms that we ignore because they do not affect $\beta_i$: $\ln(dt)$ and the last summation term, which we collectively denote by $C$. We then substitute in the linear predictor for $A_{t,i}'\beta_i$ and get
$\ln p(r_t|x_t,\beta) = \sum_{i,t}\left[r_{t,i}A_{t,i}'\beta_i - e^{A_{t,i}'\beta_i}\,dt\right] + C$
(4.4)
$= \sum_{i,t}\left[r_{t,i}\left(b_i + x_{t,i}'k_i + r_{t,i}'h_i\right) - e^{b_i + x_{t,i}'k_i + r_{t,i}'h_i}\,dt\right] + C.$
(4.5)
This has to be calculated numerically. A common estimation method for GLMs is iteratively reweighted least squares (Myers et al., 2012). Pillow et al. (2011) specifically use the conjugate gradient method for their LNP cascade models. We use a quasi-Newton algorithm as the optimization routine for the function fminunc in Matlab. This method steps in the direction of steepest descent, determined by the negation of the log-likelihood gradient $\nabla f(\beta)=\frac{\partial}{\partial\beta}\ln[p(r_t|x_t,\beta)]$, and it also constructs the Hessian matrix, the second derivative of the negative log likelihood. Since the neurons are uncoupled, the GLM parameter estimates can be calculated independently for each cell. Differentiating with respect to $\beta_i$ for cell $i$, the gradient and the Hessian for the likelihood are the following:
$\nabla_i f(\beta_i) = \sum_t\left[r_{t,i}A_{t,i} - A_{t,i}e^{A_{t,i}'\beta_i}\,dt\right]$
(4.6)
$H_i = \sum_t A_{t,i}e^{A_{t,i}'\beta_i}A_{t,i}'\,dt.$
(4.7)
The algorithm iterates down the steepest descent determined by the negation of the gradient until the gradient is approximately zero. At this point, we have the following structure with some constant $Di$, a Hessian matrix $Hi$, and an optimal set of parameter estimates $βi*$:
$\nabla_i f(\beta_i) = H_i\beta_i^* + D_i = 0.$
(4.8)
Thus, the optimal set of parameter estimates is $\beta_i^* = -H_i^{-1}D_i$.
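The maximum likelihood step can be sketched numerically. The letter uses Matlab's fminunc (quasi-Newton); here is an illustrative Python sketch with SciPy's BFGS on synthetic Poisson spike-count data, supplying the gradient of the negative log likelihood (equation 4.6 with the sign flipped, in discrete time). The regressors and parameter values are made up for the demonstration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
dt, T, p = 0.0025, 4000, 5          # bin width (s), bins, parameter count
# model matrix A = [1 X R]; random regressors stand in for stimulus/history
A = np.hstack([np.ones((T, 1)), rng.normal(0.0, 1.0, (T, p - 1))])
beta_true = np.array([np.log(15.0), 0.4, -0.3, 0.2, -0.5])
r = rng.poisson(np.exp(A @ beta_true) * dt)   # spike counts per bin

def nll(beta):
    """Negative log likelihood, dropping beta-independent terms."""
    eta = A @ beta
    return np.sum(np.exp(eta) * dt - r * eta)

def grad(beta):
    """Gradient of the negative log likelihood."""
    return A.T @ (np.exp(A @ beta) * dt - r)

res = minimize(nll, np.zeros(p), jac=grad, method="BFGS")
beta_hat = res.x                     # recovers beta_true up to sampling noise
```

Because the log link makes the negative log likelihood convex, any local minimum found by the quasi-Newton iteration is the global one, mirroring the uniqueness property noted in the text.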

#### 4.1.2  Optimal Filter Lengths

Both the stimulus filter and the postspike filter are implemented as vectors; here we describe the method used to set the optimal vector lengths. If the vectors are too short, we risk excluding relevant information and biasing the predictions. However, if they are too long, we include variables that lack strong predictive power and risk overfitting the training data. We minimize the negative log likelihood (NLL) over a large random sample. We split the time series into 10 training sets, and in each set, we systematically consider combinations of stimulus filter lengths and postspike filter lengths. Note that $2\,\mathrm{NLL}$ asymptotically follows a chi-squared distribution, and independent chi-squared variables can be summed. Since the 10 training sets are independent of one another, we add all of the NLL values for the training sets to get a summed negative log-likelihood value for every filter length combination. From the heat map in Figure A1C, we can see that the optimal combinations have stimulus filter lengths of $80\,ms$ and $90\,ms$ and a postspike filter length of $180\,ms$. We choose to use fewer coefficients in the stimulus filter for computational efficiency.
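The filter-length selection can be sketched in miniature. The letter sums NLLs over 10 training sets for each (stimulus, postspike) length pair; as a simplified, illustrative variant, the sketch below selects a stimulus-filter length alone and scores each candidate on held-out data, so that overly long filters are penalized rather than always winning. All lengths, fold counts, and data are synthetic.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(9)
dt = 0.0025
k_true = np.array([1.2, 0.8, 0.4, 0.2])   # true filter spans 4 lags

def lagmat(x, nk):
    """Intercept column plus x delayed by 0..nk-1 bins."""
    T = len(x)
    cols = [np.ones(T)] + [np.r_[np.zeros(lag), x[:T - lag]] for lag in range(nk)]
    return np.column_stack(cols)

def make_fold(T=800):
    x = rng.normal(0.0, 1.0, size=T)
    lam = np.exp(np.log(20.0) + lagmat(x, len(k_true))[:, 1:] @ k_true)
    return x, rng.poisson(lam * dt)

def heldout_nll(x, r, nk):
    """Fit on the first half (encoding), score NLL on the second half."""
    X, half = lagmat(x, nk), len(x) // 2
    nll = lambda b, s: np.sum(np.exp(X[s] @ b) * dt - r[s] * (X[s] @ b))
    res = minimize(lambda b: nll(b, slice(0, half)), np.zeros(nk + 1),
                   method="BFGS")
    return nll(res.x, slice(half, len(x)))

folds = [make_fold() for _ in range(5)]
candidates = [2, 4, 6, 8]
summed = {nk: sum(heldout_nll(x, r, nk) for x, r in folds) for nk in candidates}
best_len = min(summed, key=summed.get)    # too-short filters lose clearly
```

As in the heat-map procedure, the summed NLL over independent folds is the selection criterion; filters shorter than the true dependency incur a clear likelihood cost.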

### 4.2  Decoding of Stimulus

For each cell in the uncoupled network, the GLM is constructed from a random segment in time of length $200ms$ (80 time bins of $2.5ms$ each), where each cell could be fit to a different time period. The decoding of the stimulus, or MAP estimate, is performed on a random $417.5ms$ time interval after the latest time segment used to fit the model. Note that a given network of cells decodes stimuli in the same time segment.

To construct the prior stimulus, we incorporate the autocorrelation structure from the stimulus used in the training set. First, we randomly generate a normally distributed time series $x_0$ with the same parameters as the training stimulus ($N(0,1)$). Then, after calculating a Cholesky factorization of one of the training covariance matrices, $C=L^TL$, we produce an instance of the prior stimulus, $x_{prior}=Lx_0$, that is used as the starting point for the optimization algorithm. The log likelihood of the prior, given that it is gaussian with mean $\mu=0$ and covariance matrix $V$, can be calculated as follows:
$L(\mu,V)=p(x_t|\mu,V)=\prod_{i=1}^{N}p(x_{t,i})=\prod_{i=1}^{N}(2\pi)^{-1/2}|V|^{-1/2}\exp\left(-\tfrac{1}{2}x_{t,i}'V^{-1}x_{t,i}\right),$
$\ln p(x_t)=-\frac{N}{2}\ln(2\pi)-\frac{1}{2}\ln|V|-\sum_{i=1}^{N}\tfrac{1}{2}x_{t,i}'V^{-1}x_{t,i}.$
(4.9)
Recall the implementation of Bayes's rule in equation 2.2. In order to determine the log likelihood of the posterior, we will take the log of both sides of the equation, substitute $p(xt|μ,V)$ from equation 4.9 and $p(rt|xt,β)$ from equation 4.4, and label terms invariant in $xt$ as constant $C$:
$\ln p(x_t|r_t)=\ln p(r_t|x_t,\beta)+\ln p(x_t|\mu,V)-\ln p_i(r_{t,i}|\lambda)$
$=\sum_{i,t}\left[r_{t,i}\left(b_i+x_{t,i}'k_i+r_{t,i}'h_i\right)-e^{b_i+x_{t,i}'k_i+r_{t,i}'h_i}\,dt\right]-\frac{N}{2}\ln(2\pi)-\frac{1}{2}\ln|V|-\sum_i\tfrac{1}{2}x_{t,i}'V^{-1}x_{t,i}-\ln p_i(r_{t,i}|\lambda)$
$=-\sum_{i,t}\left[e^{b_i+x_{t,i}'k_i+r_{t,i}'h_i}\,dt-r_{t,i}x_{t,i}'k_i\right]-\sum_i\tfrac{1}{2}x_{t,i}'V^{-1}x_{t,i}+C.$
(4.10)
We want to determine what stimulus $xt$ minimizes the negation of the log-likelihood function given above (i.e., the mode vector that would maximize the posterior likelihood $p(xt|rt,β,μ,V)$). The methodology is similar to the MLE for the encoding GLM. As before, we employ the quasi-Newton algorithm with the Matlab function fminunc, but here we incorporate our prior $xprior$ as an initial condition for the iterative process that arrives at $xMAP$. For an individual neuron, the gradient and the Hessian for the negative log likelihood of $p(xt|rt)$ are the following:
$\nabla_i f(x_t)=k_i e^{b_i+x_t'k_i+r_{t,i}'h_i}\,dt-r_{t,i}k_i+V^{-1}x_t,$
(4.11)
$H_i=k_i e^{b_i+x_t'k_i+r_{t,i}'h_i}k_i'\,dt+V^{-1}.$
(4.12)
As the neurons are reconstructing a composite stimulus, the gradients and the Hessian matrices for all $N$ cells in a population are summed together for the iterative process. As before, the algorithm iterates down the steepest descent to minimize negative log likelihood. At the completion of the routine (when the composite gradient is approximately zero), the gradient has the following structure with some constant vector $D$, a composite Hessian matrix $H$, and an optimal stimulus $xt*$:
$\sum_{i=1}^{N}\nabla_i f(x_t)=Hx_t^*+D=0.$
(4.13)
Thus, the optimal decoding of the stimulus is $x_t^*=-H^{-1}D$.

### 4.3  Prewhitening of Time Series for Unbiased Correlations

Prewhitening the actual and decoded time series is accomplished by transforming each into a white noise process. Given a time series $Y_t$ and a model $\hat{Y}_t$, we can transform the time series into white noise $e_t=Y_t-\hat{Y}_t$, assuming that we have a well-fit model. The best-fit model likely involves a similar number of time series parameters for both the actual and decoded stimulus, given their similarities.

Before we can fit an ARIMA time series model, we must confirm the stationarity of the time series, selecting the augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test (adftest and kpsstest in Matlab) for our purposes. The null hypothesis of the ADF test is that there is a unit root present in the time series, while the alternative hypothesis indicates a type of stationarity. Meanwhile, the KPSS test is reversed, with the alternative hypothesis indicating a unit root, while the null hypothesis is that the time series is trend-stationary. Essentially, for each time series, we want to reject the null hypothesis of the ADF test, using significance level $α=0.001$, and fail to reject the null hypothesis of the KPSS test, using $α=0.1$. The contrasting significance levels make it difficult to confirm the alternative hypothesis of stationarity with the ADF test and easier to confirm the alternative hypothesis of a unit root process with the KPSS test, so we can carefully ensure stationarity. When we run the ADF test on all the actual and decoded stimuli, we reject the null hypothesis of a unit root in almost all cases. When conducting the KPSS test on the same sample of time series, though, we reject the null hypothesis of trend-stationarity a notable number of times in favor of the alternative, a unit root present in the time series. Since we have some evidence for nonstationarity, we next take the difference of every time series ($Yt-Yt-1$ for $t=2,…,T$) to resolve this issue while maintaining consistency across the entire sample of networks. Once we have differenced all time series, we find desirable results across the whole sample of time series, with all rejecting the null hypothesis with the ADF test and all failing to reject the null hypothesis with the KPSS test.
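The letter runs Matlab's adftest and kpsstest; as an informal illustration of why first differencing resolves the unit-root problem, the sketch below builds a random walk (a unit-root process), differences it, and compares lag-1 autocorrelations (near 1 before differencing, near 0 after). This is a diagnostic sketch only, not a replacement for the formal ADF/KPSS tests.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 5000
eps = rng.normal(0.0, 1.0, size=T)
y = np.cumsum(eps)                 # random walk: unit root, nonstationary

d = np.diff(y)                     # first difference Y_t - Y_{t-1}

def acf1(s):
    """Sample autocorrelation at lag 1."""
    s = s - s.mean()
    return np.dot(s[1:], s[:-1]) / np.dot(s, s)

acf_walk = acf1(y)                 # close to 1 for the unit-root process
acf_diff = acf1(d)                 # close to 0 after differencing
```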

With consistent stationarity confirmed on once-differenced ($d=1$) time series, we fit an $ARIMA(p, d=1, q)$ model to $Y_t - Y_{t-1}$ for each of them. In a representative example in Figure A1B, the autocorrelation function (ACF) has significant lags before a rapid drop-off to insignificance beyond $5ms$, while the partial autocorrelation function (PACF) shows a damped oscillatory pattern of significant lags before insignificance past $8ms$ (see note 5). Together, these dynamics suggest employing an even number of autoregressive lags $p$ to capture the oscillatory pattern, with a complementary number of moving average lags $q$. Here, the best model constructions tend to have half as many moving average lags as autoregressive lags, likely because of this oscillatory pattern. Sampling and analyzing a subset of time series, we find the best fit is $ARIMA(6,1,3)$ or $ARIMA(8,1,4)$, followed by $ARIMA(4,1,2)$. Our goal is to efficiently fit thousands of time series with reliable models, so we use a try/catch routine within our iterations, replacing one ARIMA structure with another if matrix invertibility or stability issues occur. Balancing quality of fit with parsimony, we select $ARIMA(6,1,3)$ as the initial attempt for each time series, using the estimate function in Matlab. If a model fails with this attempt, we next try $ARIMA(8,1,4)$ and, finally, $ARIMA(4,1,2)$.
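The try/catch fallback over model orders can be sketched as follows. This is a simplified Python illustration, with a least-squares AR($p$) fit standing in for Matlab's estimate of a full ARIMA model; the first-choice-then-fallback ordering over orders mirrors the routine described above.

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares fit of an AR(p) model to a (differenced) series x.
    Returns coefficients a with x_t ~ sum_k a_k * x_{t-k}."""
    X = np.column_stack([x[p - k - 1 : len(x) - k - 1] for k in range(p)])
    a, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return a

def fit_with_fallback(x, orders=(6, 8, 4)):
    """Mimic the try/catch routine: attempt each model order in turn
    and return the first fit that succeeds."""
    for p in orders:
        try:
            return p, fit_ar(x, p)
        except np.linalg.LinAlgError:
            continue  # numerical failure: fall back to the next order
    raise RuntimeError("no model order could be fit")

rng = np.random.default_rng(1)
# AR(2) toy series (a stand-in for a differenced stimulus).
x = np.zeros(2000)
for t in range(2, len(x)):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + rng.standard_normal()

p, coeffs = fit_with_fallback(x)
print(p, coeffs[:2])  # the first attempted order succeeds here
```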

Every time series is effectively fit by one of these three ARIMA models. Using the infer function, we then obtain the residual vector, that is, the fitted model values subtracted from the actual values. This prewhitening method results in residual vectors that are white noise processes. The Pearson correlation (the cross-correlation function at lag 0) for these pairs of residual vectors is then an unbiased estimate of the linear correlation strength (Brockwell et al., 2002).
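A condensed sketch of this residual-correlation step in Python, with a known AR(1) filter standing in for the fitted ARIMA models purely for illustration: two series built from correlated innovations are prewhitened by subtracting the model prediction, and the residual vectors are correlated at lag 0.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two series sharing correlated white-noise innovations, each passed
# through the same AR(1) filter (stand-in for actual/decoded stimuli).
T, phi = 4000, 0.9
e1 = rng.standard_normal(T)
e2 = 0.6 * e1 + 0.8 * rng.standard_normal(T)  # correlated innovations

def ar1_filter(e, phi):
    y = np.zeros_like(e)
    for t in range(1, len(e)):
        y[t] = phi * y[t - 1] + e[t]
    return y

y1, y2 = ar1_filter(e1, phi), ar1_filter(e2, phi)

# Prewhitening: subtract the (here, known) model prediction, leaving
# the innovations; correlate the residual vectors at lag 0.
r1 = y1[1:] - phi * y1[:-1]
r2 = y2[1:] - phi * y2[:-1]
rho = np.corrcoef(r1, r2)[0, 1]
print(round(rho, 2))
```

Correlating the raw filtered series instead would conflate the shared AR dynamics with the innovation correlation, which is exactly the bias prewhitening removes.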

### 4.4  Remarks for Heterogeneity Analysis

For a given network size ($N=2$, 3, 4, 5, 6, 7), 6500 random networks were generated, but our completely random selection resulted in extreme values of heterogeneity that we considered outliers. Thus, we estimated the CDF of the heterogeneity values using a kernel smoothing function (ksdensity in Matlab with bandwidth set to 0.1) and removed the top 1.5% of points. At least $98.5\%$ of points remained (see Table 4 for the total number of points removed out of 6500).
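This trimming step can be sketched in Python as follows. The logistic kernel below is a simple stand-in for the Gaussian kernel of Matlab's ksdensity, and the gamma-distributed heterogeneity values are toy data; only the bandwidth (0.1) and the 98.5% retention threshold follow the text.

```python
import numpy as np

def trim_upper_tail(vals, keep=0.985, bw=0.1):
    """Kernel-smoothed CDF estimate (logistic kernel as a stand-in for
    ksdensity's Gaussian kernel), then drop the upper tail."""
    vals = np.asarray(vals, dtype=float)
    # Smoothed CDF at each sample: average of per-kernel CDFs.
    z = (vals[:, None] - vals[None, :]) / bw
    cdf = (1.0 / (1.0 + np.exp(-1.7 * z))).mean(axis=1)  # 1.7 ~ probit match
    return vals[cdf <= keep]

rng = np.random.default_rng(3)
h = rng.gamma(2.0, 0.5, size=2000)  # toy heterogeneity values
kept = trim_upper_tail(h)
# Roughly the lower 98.5% of points survive the trim.
print(len(kept), len(h))
```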

Table 4:
After Estimating the CDF with Kernel Smoothing, the Total Number of Points Removed in the Tail Out of 6500 Random Samples, Keeping the Lower 98.5 Percentile.

| Heterogeneity \ Network Size | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- |
| $k$ | 87 | 73 | 94 | 97 | 95 | 98 |
| $h$ | 96 | 96 | 94 | 91 | 95 | 93 |
| $b$ | 91 | 83 | 97 | 94 | 96 | 92 |

Note: Remaining heterogeneity values are in Figures 3, 4, and A2 to A7.

## Appendix

Table A1:
Statistics of Quadratic Fit to (Average) $l_1$-Norm of Error (Top) and Hessian-Weighted Error (Bottom) as a Function of Heterogeneity for Stimulus Filter $k$, Postspike Filter $h$, and Constant $b$.

$l_1$-norm of error:

| Heterogeneity | | $N=2$ | $N=3$ | $N=4$ | $N=5$ | $N=6$ | $N=7$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| $k$ | $R^2$ | 0.0293 | 0.0140 | 0.0347 | 0.0534 | 0.0725 | 0.0974 |
| | $p$-value for F-statistic versus constant | $3.9\times10^{-42}$ | $1.8\times10^{-20}$ | $7.1\times10^{-50}$ | $4.9\times10^{-77}$ | $2.2\times10^{-105}$ | $3.9\times10^{-143}$ |
| $h$ | $R^2$ | 0.0103 | 0.01794 | 0.0606 | 0.0382 | 0.0530 | 0.0888 |
| | $p$-value for F-statistic versus constant | $3.6\times10^{-15}$ | $6.7\times10^{-26}$ | $1.4\times10^{-87}$ | $6.7\times10^{-55}$ | $2.3\times10^{-76}$ | $5.2\times10^{-130}$ |
| $b$ | $R^2$ | 0.0141 | 0.0164 | 0.0357 | 0.0577 | 0.0620 | 0.0918 |
| | $p$-value for F-statistic versus constant | $1.6\times10^{-20}$ | $9.1\times10^{-24}$ | $3.3\times10^{-51}$ | $2.1\times10^{-83}$ | $1.2\times10^{-89}$ | $1.3\times10^{-134}$ |

Hessian-weighted error:

| Heterogeneity | | $N=2$ | $N=3$ | $N=4$ | $N=5$ | $N=6$ | $N=7$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| $k$ | $R^2$ | 0.0068 | 0.0019 | 0.0060 | 0.0191 | 0.0315 | 0.0401 |
| | $p$-value for F-statistic versus constant | $3.6\times10^{-10}$ | 0.0022 | $4.5\times10^{-9}$ | $1.5\times10^{-27}$ | $3.0\times10^{-45}$ | $1.6\times10^{-57}$ |
| $h$ | $R^2$ | 0.0014 | 0.0020 | 0.0101 | 0.0171 | 0.0181 | 0.0356 |
| | $p$-value for F-statistic versus constant | 0.0032 | $3.8\times10^{-4}$ | $7.2\times10^{-15}$ | $1.2\times10^{-24}$ | $4.3\times10^{-26}$ | $4.5\times10^{-51}$ |
| $b$ | $R^2$ | 0.0029 | 0.0047 | 0.0062 | 0.0222 | 0.0290 | 0.0329 |
| | $p$-value for F-statistic versus constant | $8.3\times10^{-5}$ | $2.5\times10^{-7}$ | $2.4\times10^{-9}$ | $6.5\times10^{-32}$ | $1.4\times10^{-41}$ | $3.2\times10^{-47}$ |

Note: The numbers in bold correspond to quadratic fits where the optimal level of heterogeneity is not in the interior.

Table A2:
Statistics of Quadratic Fit to Pearson's Correlation (with Prewhitening) as a Function of Heterogeneity for Stimulus Filter $k$, Postspike Filter $h$, and Constant $b$.

| Heterogeneity | | $N=2$ | $N=3$ | $N=4$ | $N=5$ | $N=6$ | $N=7$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| $k$ | $R^2$ | 0.0018 | 0.0123 | 0.0076 | 0.0149 | 0.0212 | 0.0357 |
| | $p$-value for F-statistic versus constant | 0.003 | $6.0\times10^{-18}$ | $2.2\times10^{-11}$ | $1.4\times10^{-21}$ | $1.6\times10^{-30}$ | $2.9\times10^{-51}$ |
| $h$ | $R^2$ | 0.0199 | 0.0159 | 0.0166 | 0.0156 | 0.0232 | 0.0356 |
| | $p$-value for F-statistic versus constant | $1.1\times10^{-28}$ | $4.6\times10^{-23}$ | $5.3\times10^{-24}$ | $1.4\times10^{-22}$ | $2.4\times10^{-33}$ | $4.4\times10^{-51}$ |
| $b$ | $R^2$ | 0.0008 | 0.0027 | 0.0032 | 0.0048 | 0.0160 | 0.0251 |
| | $p$-value for F-statistic versus constant | 0.0237 | $1.8\times10^{-4}$ | $3.9\times10^{-5}$ | $1.9\times10^{-7}$ | $4.3\times10^{-23}$ | $4.0\times10^{-36}$ |

Note: The numbers in bold correspond to quadratic fits where the optimal level of heterogeneity is not in the interior.

Figure A1:

(A) For the raw stimulus sampled at $0.5ms$ and then differenced once to ensure stationarity, the autocorrelation function (ACF, top) shows an autoregressive process on the stimulus values, while the partial autocorrelation function (PACF, bottom) indicates a binary oscillatory pattern with some autocorrelation on the moving average. (B) Similar to panel A but for a once-differenced stimulus chosen randomly: $y_t = x_t - x_{t-1}$, for length $200ms$. For prewhitening, the chosen model was ARIMA(6,1,3), followed by ARIMA(8,1,4) and ARIMA(4,1,2) in cases where the initial choice could not be constructed. (C) We systematically determined the length of the stimulus filter $k$ and the lag of the last postspike basis vector by considering 168 pairs of these values for a network with all 7 cells. We chose the pair that gave the smallest summed negative log likelihood (i.e., the largest maximum likelihood) after fitting to 10 segments of time, each of length approximately $2sec$. The magenta oval indicates our choice (the $180ms$ peak for the last basis vector of $h$ corresponds to a total lag of $240ms$; see Figure 1B); the magenta oval with dashed outline had a similar log likelihood but required more computational resources.

Figure A2:

The $l_1$-norm of the error as a function of all three forms of heterogeneity: stimulus filter (left column), postspike filter (middle column), and bias heterogeneity (right column). The network sizes are $N=2$ (top row), $N=3$ (middle row), and $N=4$ (bottom row); we ensured $N$ cells were selected for each network. See section 4 for how random samples were generated. In all cases, a quadratic regression fit shows an optimal level of heterogeneity in the domain. With $N=3,4$ for postspike filter heterogeneity, the optimal levels occur at smaller values of heterogeneity.

Figure A3:

Similar to Figure 8 but with the remaining network sizes ($N=5,6,7$). The $l_1$-norm of the error as a function of all three forms of heterogeneity: stimulus filter (left column), postspike filter (middle column), and bias heterogeneity (right column). The network sizes are $N=5$ (top row), $N=6$ (middle row), and $N=7$ (bottom row); we ensured $N$ cells were selected for each network. See section 4 for how random samples were generated. In all cases, a quadratic regression fit shows an optimal level of heterogeneity in the domain.

Figure A4:

The Hessian-weighted error as a function of all three forms of heterogeneity: stimulus filter (left column), postspike filter (middle column), and bias heterogeneity (right column). The network sizes are $N=2$ (top row), $N=3$ (middle row), and $N=4$ (bottom row); we ensured $N$ distinct cells were selected for each network. See section 4 for how random samples were generated. In almost all cases, a quadratic regression fit shows an optimal level of heterogeneity in the domain; the exceptions are for postspike filter heterogeneity for $N=2,3,4$ (middle column).

Figure A5:

Similar to Figure A4 but with the remaining network sizes ($N=5,6,7$). The Hessian-weighted error as a function of all three forms of heterogeneity: stimulus filter (left column), postspike filter (middle column), and bias heterogeneity (right column). The network sizes are $N=5$ (top row), $N=6$ (middle row), and $N=7$ (bottom row); we ensured $N$ distinct cells were selected for each network. See section 4 for how random samples were generated. In all but one case, a quadratic regression fit shows an optimal level of heterogeneity in the domain; the exception is postspike filter heterogeneity for $N=5$ (middle column, top row).

Figure A6:

When using Pearson's correlation after prewhitening as a measure of decoding accuracy, there is an optimal intermediate level of heterogeneity for all types, for $N=2,3,4$, with only one exception: bias heterogeneity for $N=2$ (right column, top row).

Figure A7:

Similar to Figure A6 but with larger network sizes ($N=5,6,7$). There is an optimal level of heterogeneity in the domain in all cases except for postspike filter heterogeneity with $N=5$ (middle column, top row).

## Notes

1

The $l_1$-norm is the mean of the unweighted absolute value of the difference between the decoded stimuli and the actual stimuli.

2

GLM filter lengths and associated parameters were determined systematically; parameters were varied, and we chose the parameters with largest maximum likelihood. See Figure A1C and accompanying text.

3

The Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm is the quasi-Newton algorithm used in Matlab.

4

The $180ms$ refers to the peak time of the last basis vector used for $h$, resulting in a total postspike filter length of $240ms$.

5

ACF and PACF are obtained via the autocorr and parcorr functions in Matlab's Econometrics Toolbox.

## Acknowledgments

C.L. is supported by a grant from the Simons Foundation (355173). We thank Gary Marsat (WVU) for generously providing us the data from experiments.

## References

Ahn, J., Kreeger, L., Lubejko, S., Butts, D., & MacLeod, K. (2014). Heterogeneity of intrinsic biophysical properties among cochlear nucleus neurons improves the population coding of temporal information. Journal of Neurophysiology, 111(11), 2320–2331.

Bastian, J. (1986). Gain control in the electrosensory system mediated by descending inputs to the electrosensory lateral line lobe. Journal of Neuroscience, 6(2), 553–562.

Bol, K., Marsat, G., Harvey-Girard, E., Longtin, A., & Maler, L. (2011). Frequency-tuned cerebellar channels and burst-induced LTD lead to the cancellation of redundant sensory inputs. Journal of Neuroscience, 31(30), 11028–11038.

Bol, K., Marsat, G., Mejías, J. F., Maler, L., & Longtin, A. (2013). Modeling cancelation of periodic inputs with burst-STDP and feedback. Neural Networks, 47, 120–133.

Brockwell, P. J., Davis, R. A., & Calder, M. V. (2002). Introduction to time series and forecasting. Berlin: Springer.

Chacron, M., Maler, L., & Bastian, J. (2005). Feedback and feedforward control of frequency tuning to naturalistic stimuli. Journal of Neuroscience, 25(23), 5521–5532.

Chelaru, M. I., & Dragoi, V. (2008). Efficient coding in heterogeneous neuronal populations. Proceedings of the National Academy of Sciences, 105(42), 16344–16349.

Dayan, P., & Abbott, L. (2001). Theoretical neuroscience: Computational and mathematical modeling of neural systems. Cambridge, MA: MIT Press.

Ecker, A. S., Berens, P., Tolias, A. S., & Bethge, M. (2011). The effect of noise correlations in populations of diversely tuned neurons. Journal of Neuroscience, 31(40), 14272–14283.

Gjorgjieva, J., Meister, M., & Sompolinsky, H. (2019). Functional diversity among sensory neurons from efficient coding principles. PLOS Computational Biology, 15(11), e1007476.

Hofmann, V., & Chacron, M. J. (2020). Neuronal on- and off-type heterogeneities improve population coding of envelope signals in the presence of stimulus-induced noise. Scientific Reports, 10(1), 1–16.

Huang, C. G., Zhang, Z. D., & Chacron, M. J. (2016). Temporal decorrelation by SK channels enables efficient neural coding and perception of natural stimuli. Nature Communications, 7, 11353.

Ladenbauer, J., McKenzie, S., English, D. F., Hagens, O., & Ostojic, S. (2019). Inferring and validating mechanistic models of neural microcircuits based on spike-train data. Nature Communications, 10(1), 1–17.

Ly, C. (2015). Firing rate dynamics in recurrent spiking neural networks with intrinsic and network heterogeneity. Journal of Computational Neuroscience, 39, 311–327.

Ly, C., & Marsat, G. (2018). Variable synaptic strengths controls the firing rate distribution in feedforward neural networks. Journal of Computational Neuroscience, 44, 75–95.

Marsat, G., & Maler, L. (2010). Neural heterogeneity and efficient population codes for communication signals. Journal of Neurophysiology, 104(5), 2543–2555.

Mejias, J., & Longtin, A. (2012). Optimal heterogeneity for coding in spiking neural networks. Physical Review Letters, 108(22), 228102.

Mejias, J. F., Marsat, G., Bol, K., Maler, L., & Longtin, A. (2013). Learning contrast-invariant cancellation of redundant signals in neural systems. PLOS Computational Biology, 9(9), e1003180.

Melanson, A., Mejias, J. F., Jun, J. J., Maler, L., & Longtin, A. (2017). Nonstationary stochastic dynamics underlie spontaneous transitions between active and inactive behavioral states. eNeuro, 4(2).

Metzen, M. G., & Chacron, M. J. (2015). Neural heterogeneities determine response characteristics to second-, but not first-order stimulus features. Journal of Neuroscience, 35(7), 3124–3138.

Montgomery, D. C. (2017). Design and analysis of experiments. Hoboken, NJ: Wiley.

Myers, R. H., Montgomery, D. C., Vining, G. G., & Robinson, T. J. (2012). Generalized linear models: With applications in engineering and the sciences. Hoboken, NJ: Wiley.

Ostojic, S., & Brunel, N. (2011). From spiking neuron models to linear-nonlinear models. PLOS Computational Biology, 7(1), e1001056.

Padmanabhan, K., & Urban, N. N. (2010). Intrinsic biophysical diversity decorrelates neuronal firing while increasing information content. Nature Neuroscience, 13(10), 1276–1282.

Paninski, L. (2004). Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15(4), 243–262.

Park, I. M., Meister, M. L., Huk, A. C., & Pillow, J. W. (2014). Encoding and decoding in parietal cortex during sensorimotor decision-making. Nature Neuroscience, 17(10), 1395–1403.

Pillow, J. (2007). Likelihood-based approaches to modeling the neural code. In K. Doya, S. Ishii, A. Pouget, & R. Rao (Eds.), Bayesian brain: Probabilistic approaches to neural coding (pp. 53–70). Cambridge, MA: MIT Press.

Pillow, J. W., Ahmadian, Y., & Paninski, L. (2011). Model-based decoding, information estimation, and change-point detection techniques for multineuron spike trains. Neural Computation, 23(1), 1–45.

Pillow, J. W., Shlens, J., Paninski, L., Sher, A., Litke, A. M., Chichilnisky, E., & Simoncelli, E. P. (2008). Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature, 454(7207), 995–999.

René, A., Longtin, A., & Macke, J. H. (2020). Inference of a mesoscopic population model from population spike trains. Neural Computation, 32(8), 1448–1498.

Shamir, M., & Sompolinsky, H. (2006). Implications of neuronal diversity on population coding. Neural Computation, 18(8), 1951–1986.

Simmonds, B., & Chacron, M. J. (2015). Activation of parallel fiber feedback by spatially diffuse stimuli reduces signal and noise correlations via independent mechanisms in a cerebellum-like structure. PLOS Computational Biology, 11(1), e1004034.

Simoncelli, E. P., Paninski, L., & Pillow, J. (2004). Responses with stochastic stimuli. In M. S. Gazzaniga (Ed.), The cognitive neurosciences III (pp. 327–338). Cambridge, MA: MIT Press.

Sompolinsky, H., Yoon, H., Kang, K., & Shamir, M. (2001). Population coding in neuronal systems with correlated noise. Physical Review E, 64(5), 051904.

Tripathy, S. J., Padmanabhan, K., Gerkin, R. C., & Urban, N. N. (2013). Intermediate intrinsic diversity enhances neural population coding. Proceedings of the National Academy of Sciences, 110(20), 8248–8253.

Vrbik, J. (2005). Population moments of sampling distributions. Computational Statistics, 20(4), 611–621.

Wendling, K., & Ly, C. (2019). Firing rate distributions in a feedforward network of neural oscillators with intrinsic and network heterogeneity. Mathematical Biosciences and Engineering, 16, 2023–2048.

Yarrow, S., Challis, E., & Seriès, P. (2012). Fisher and Shannon information in finite neural populations. Neural Computation, 24(7), 1740–1780.