## Abstract

To understand neural activity, two broad categories of models exist: statistical and dynamical. While statistical models possess rigorous methods for parameter estimation and goodness-of-fit assessment, dynamical models provide mechanistic insight. In general, these two categories of models are separately applied; understanding the relationships between these modeling approaches remains an area of active research. In this letter, we examine this relationship using simulation. To do so, we first generate spike train data from a well-known dynamical model, the Izhikevich neuron, with a noisy input current. We then fit these spike train data with a statistical model (a generalized linear model, GLM, with multiplicative influences of past spiking). For different levels of noise, we show how the GLM captures both the deterministic features of the Izhikevich neuron and the variability driven by the noise. We conclude that the GLM captures essential features of the simulated spike trains, but for near-deterministic spike trains, goodness-of-fit analyses reveal that the model does not fit well in a statistical sense: the essential random component of the GLM is not captured.

## 1 Introduction

As recordings of neural activity become increasingly sophisticated, the resulting data become increasingly complex. Making sense of these data often requires more sophisticated approaches than visualization and simple summary statistics. One more advanced approach is the development and application of a model. Models serve both to characterize observed data and summarize the collective scientific knowledge of the brain. In neuroscience, as in many other fields, these models are typically segregated into two categories: dynamical or mechanistic models, and statistical models. Dynamical models arise as an application of mathematical rules motivated by biophysical laws. These models tend to be deterministic—if not in practice, at least in spirit—and provide a mechanistic explanation for many dynamic brain activities. Statistical models are typically designed to capture data structure; these models often do not rely on neuronal biophysics.

The generalized linear model (GLM) has been an essential part of modern statistics since its introduction by Nelder and Wedderburn (1972), and today these models are ubiquitous in statistical analyses across many fields. This includes biological processes, where in recent years GLMs have been used to describe coding properties and history dependence in neural spiking data (Kass & Ventura, 2001; Pillow et al., 2008; Sarma et al., 2012; Truccolo, Eden, Fellows, Donoghue, & Brown, 2005). The popularity of GLMs stems from the many advantageous features of this model class. The model defines a link function between the predictors and responses that ensures that the negative log likelihood is convex, allowing fast, optimal estimation of parameters while still admitting flexible model capabilities through general basis functions (McCullagh & Nelder, 1989). As with statistical models in general, the GLM quantifies both predictable and unpredictable structure in data and possesses efficient procedures for parameter estimation and model diagnostics, as well as powerful tools to assess goodness of fit. An advantage of including uncertainty as an intrinsic model component at the design stage is that this random component can, to some extent, compensate for an incomplete specification of the factors influencing the observed data.

Dynamical models have been a pillar in data modeling for neuroscientists since Lapicque's 1907 integrate-and-fire model (Brunel & van Rossum, 2007). Contrary to statistical models, these models typically focus on deterministic processes and are often built to explicitly represent mechanistic features of the data of interest. As such, dynamical models are often biophysically meaningful by construction, which may not be the case for a statistical model. Dynamical models are diverse, ranging from the biophysically realistic model of Hodgkin and Huxley (1952) to the more abstract model of Izhikevich (2003). The latter implements a simple mathematical design, while still producing realistic neural behaviors, such as bursting. Because of these properties, the Izhikevich model is a common choice, for example, in large-scale simulations of neurons with different behavior (Izhikevich & Edelman, 2008) and to simulate the impact of current input to large cortical networks (Ali, Sellers, & Fröhlich, 2013).

Historically, dynamical models have been used to examine deterministic or aggregate features of observed neuronal behavior. While dynamical models are often interpretable in terms of biophysical features, estimating parameters from individual spike trains is a complicated task. Most mechanistic models possess multiple parameters, which typically remain experimentally unconstrained. For example, the Hodgkin-Huxley (1952) model possesses at least seven parameters (e.g., capacitance, reversal potentials, and maximal conductances for the sodium, potassium, and leak currents) with wide ranges of possible values. In computational neuroscience, a common procedure for estimating these parameters is hand-tuning to produce simulated model dynamics that qualitatively match the desired neuronal activity (Prinz, Billimoria, & Marder, 2003). Hand-tuning approaches usually require a great deal of time and expertise (Traub, Wong, Miles, & Michelson, 1991; Nadim, Olsen, de Schutter, & Calabrese, 1995; Vanier & Bower, 1999). Moreover, once a set of suitable parameters is found, it is often unclear whether the solution is unique or whether other model formulations exist that are compatible with the data (Prinz, Bucher, & Marder, 2004). Statistical models are often used to estimate features and describe associations but may not be directly physically interpretable.

Both modeling paradigms—dynamical and statistical—provide distinct advantages and disadvantages. Ideally, methods would exist to leverage the advantages of both approaches while mitigating their weaknesses. For example, a unified approach would allow researchers to better interpret statistical model results in terms of mechanistic features while exploiting the mathematical theory of statistical models. This theory includes tools of parameter estimation, but also rigorous procedures to validate the model against data. The aim of this letter is to investigate how simple GLMs perform when used to describe the spiking patterns obtained from simulated Izhikevich neurons with noisy input. The noise is included to resemble realistic physical observations and, more important, introduce and control variability of the data.

Previous work has analyzed the relationship between integrate-and-fire models and statistical techniques (Brunel & Latham, 2003; Paninski, 2006; Hertäg, Durstewitz, & Brunel, 2014). In these approaches, integrate-and-fire dynamical models were augmented by a stochastic component, and a statistical approach was used to estimate the firing rate. Here we pursue a different strategy, starting with a well-known class of history-dependent statistical models and analyzing the models' capabilities in capturing general features of various spike train patterns generated from a dynamical model. The class of GLMs in this exposition is very flexible, and, contrary to a dynamical model, there are no model assumptions regarding the specific mechanisms related to the physical processes that generate the spike train data, such as refractoriness. Recently, Weber and Pillow (in press) showed that the GLM framework replicates multiple spiking patterns from the Izhikevich model, but to the authors' knowledge, no work has been presented that analyzes how well the GLM class captures features of different spike patterns under the varying influence of noise.

In this letter, we show that the performance of the particular class of multiplicatively separable, history-dependent GLMs varies with the amount of noise added to the input and with the large-scale firing properties of the Izhikevich neuron. This class of GLMs is commonly used in spike train modeling (Pillow et al., 2008; Macke et al., 2011; Ahmadian, Pillow, & Paninski, 2011; Latimer, Chichilnisky, Rieke, & Pillow, 2014). We examine how well the GLMs capture the intrinsic features of the individual neuron types, such as tonic spiking and bursting, as well as the variability in the individual spike trains originating from the noisy input. We first present the simulation model and define how variation is injected through both the input and the model parameters. Then the GLM design is outlined along with tools used for model assessment. In a series of numerical analyses, we demonstrate how GLMs behave for various levels of noise and different deterministic model features. We discuss in what ways the noise may affect interpretation of the GLM and the use of this model class for spike train modeling by evaluating the goodness of fit of the models in terms of capturing both predictable and unpredictable structure. Finally, we highlight some of the shortcomings of simple multiplicatively separable GLMs and discuss extensions to the standard model that can mitigate some of the limitations identified in this letter.

## 2 Methods

### 2.1 Simulation of Neuron Activity

The dynamics in equation 2.1 generate spikes whenever the voltage variable v passes a fixed threshold of 30. Immediately following a spike at time t (where v(t) ≥ 30), both variables are reset: v is reset to a fixed value given by the parameter c, and u is increased by an amount d. The parameters (a, b, c, d) determine the spike behavior; the parameters a and b act as a decay rate and a sensitivity parameter, respectively, and the parameters c and d determine how the variables are reset following a spike.

In order to introduce variation in simulations of equation 2.1, we let I(t) = I₀ + σξ(t), where I₀ and σ are constants and ξ(t) denotes a standard white noise process (Øksendal, 2007). The parameter σ controls the level of noise in the process and, in turn, the variability of spiking behavior. We may interpret σξ(t) as the noise in the input driving the neuron. Another source of variability comes from the different parameter values used to simulate the spike trains. In this letter, we model six types of neurons that produce distinct firing patterns (Izhikevich, 2003) when the input is constant. When σ = 0, the simulation model 2.1 is deterministic, and the neurons behave completely regularly, while for σ > 0, the model evolves stochastically. Assuming an Itô interpretation, model 2.1 was simulated using an Euler-Maruyama scheme with time step Δt = 0.1 ms, fixed initial conditions, and varying parameters a, b, c, and d. We note that the 0.1 ms discretization is finer than that used in Izhikevich (2003) and ensures that at most one spike occurs per discrete time bin. These simulations were then used to obtain spike trains. Although the Izhikevich neuron lacks the detailed biophysical mechanisms of more complex neuron models, this simple model serves as an excellent test bed for fitting the GLMs to diverse, realistic spike train behaviors and exemplifying the issues in interpreting the results.
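The simulation scheme above can be sketched in a few lines. The tonic-spiking parameters (a, b, c, d) = (0.02, 0.2, −65, 6) and baseline input I₀ = 14 follow Table 1 (with the conventional negative reset value c); the function name, initial conditions, and noise level are illustrative assumptions, not the authors' code.

```python
import numpy as np

def simulate_izhikevich(a, b, c, d, I0, sigma, T=20000.0, dt=0.1, seed=0):
    """Euler-Maruyama simulation of an Izhikevich neuron with white-noise input.

    Returns spike times in ms. Initial conditions (v, u) = (c, b*c) are an
    illustrative assumption."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    v, u = c, b * c
    spikes = []
    for i in range(n):
        # noisy input I0 + sigma*xi(t); the 1/sqrt(dt) factor makes the
        # dt-weighted increment equal to sigma*sqrt(dt)*N(0,1) (Ito / E-M)
        I = I0 + sigma * rng.standard_normal() / np.sqrt(dt)
        v += dt * (0.04 * v**2 + 5.0 * v + 140.0 - u + I)
        u += dt * a * (b * v - u)
        if v >= 30.0:            # threshold crossing: record spike and reset
            spikes.append(i * dt)
            v, u = c, u + d
    return np.array(spikes)

# tonic-spiking parameters from Table 1, moderate noise, 2 s of simulation
spikes = simulate_izhikevich(0.02, 0.2, -65.0, 6.0, 14.0, sigma=2.0, T=2000.0)
```

The 0.1 ms step guarantees at most one spike per bin, so the spike times are strictly increasing.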

### 2.2 Model Design

The approximate log likelihood, equation 2.3, shows that choosing the form of the intensity is a crucial step of the model design. To include history dependence, we include a trailing history of the past spike times, and in order to reduce the dimensionality of the model, we also introduce indicator basis functions that account for spiking in windows of preceding time bins, wider than the observation bins of width Δ. In order to simplify the model formulation, we assume that the influences of the previous spikes on the intensity are multiplicatively separable, a common model assumption when working with GLMs. Thus, when modeling the log intensity process, we deal with a sum of history components.

The resulting (exponential) filter acts as a multiplicative modulation of the baseline intensity at each lagged time bin.
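The indicator-basis construction and the multiplicative form of the intensity can be sketched as follows. The widths follow section 3 (1 ms indicator bins, each spanning 10 observation bins of 0.1 ms, over a 100 ms history); the function names and parameter values are illustrative assumptions.

```python
import numpy as np

def history_design(y, n_lags=100, w=10):
    """Indicator-basis design matrix for a binary spike train y.

    Column j indicates whether any spike occurred between j*w and (j+1)*w
    observation bins in the past, i.e., in the j-th 1 ms indicator window."""
    T = len(y)
    X = np.zeros((T, n_lags))
    for j in range(n_lags):
        for k in range(j * w + 1, (j + 1) * w + 1):
            if k >= T:
                break
            X[k:, j] = np.maximum(X[k:, j], y[:T - k])
    return X

# multiplicative separability: exp of a sum = product of per-lag modulations
beta0 = np.log(0.005)        # baseline intensity per bin (illustrative)
beta = np.zeros(100)
beta[:20] = -5.0             # suppress spiking for ~20 ms after a spike
y = np.zeros(500)
y[100] = 1
y[370] = 1
X = history_design(y)
lam = np.exp(beta0 + X @ beta)   # per-bin intensity
```

Because the link is exponential, each nonzero filter coefficient contributes a multiplicative factor exp(βⱼ) to the baseline intensity whenever a past spike falls in its window.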

### 2.3 Penalized GLM Regression

Data with negligible variability can lead to convergence issues of parameter estimates in equation 2.5. If there is little or no variation in the data, there is a high chance of including perfect predictors in the model that will cause instability in the estimation of model parameters since the likelihood surface will be close to flat in certain directions (see Wedderburn, 1976). Using a penalized regression is one way to handle these convergence issues and keep the parameter estimates finite.
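As a concrete illustration of penalized fitting, the sketch below implements an L1-penalized Poisson regression by proximal gradient descent. This is a simple stand-in for the cyclical coordinate descent used by glmnet, with illustrative names and simulated data; the intercept is left unpenalized, as is conventional.

```python
import numpy as np

def lasso_poisson(X, y, lam, lr=0.05, n_iter=2000):
    """L1-penalized Poisson regression (log link) via proximal gradient descent."""
    n, p = X.shape
    b0, b = 0.0, np.zeros(p)
    for _ in range(n_iter):
        mu = np.exp(b0 + X @ b)                 # fitted Poisson means
        b0 -= lr * np.sum(mu - y) / n           # gradient step, intercept
        b -= lr * (X.T @ (mu - y)) / n          # gradient step, coefficients
        # proximal (soft-threshold) step enforcing the L1 penalty
        b = np.sign(b) * np.maximum(np.abs(b) - lr * lam, 0.0)
    return b0, b

# small check: one informative predictor, two pure-noise predictors
rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 3))
y_sim = rng.poisson(np.exp(0.5 * X[:, 0]))
b0_hat, b_hat = lasso_poisson(X, y_sim, lam=0.05)
```

The soft-threshold step is what shrinks redundant coefficients exactly to zero, keeping the estimates finite even when some predictors are nearly perfect.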

The penalization parameter λ determines how strong the shrinkage effect is and implies that the optimization depends on some fixed λ. Choosing λ is a crucial part of using penalized regression, and therefore the choice should agree with the aim of the analysis. For the purposes presented in this exposition, we settled on two options that both depend on a goodness-of-fit test of an estimated model (see section 2.5). The first choice was simply to choose λ such that the KS statistic was minimized; we refer to this choice as *optimal KS*, where KS denotes the Kolmogorov-Smirnov test presented in section 2.5. The other choice was to choose the maximum penalization λ such that the goodness-of-fit p-value was nonsignificant with respect to a predetermined significance level. In what follows, we chose a threshold significance level of 0.05. We refer to this second choice as *maximum λ*. Obviously, the two differ in behavior. The first choice of λ selects models with a better goodness of fit, whereas the second selects sparser models due to the choice of penalization in equation 2.6. We use the second choice (maximum λ) as the default to impose sparsity. However, to illustrate certain points, we also refer to models chosen by optimal KS in some parts. In cases where no p-values were above the threshold or an optimal value could not be determined, λ was set to the smallest value tried in the model estimation procedure. This occurred for 66% of the simulations for nonbursting neurons and 12% for bursting neurons. We note that this smallest λ still includes a small penalty, so that divergence of estimates is avoided, but fewer parameters are set to zero by the LASSO regression.
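The maximum-λ rule amounts to a scan over a penalty path. The λ values and p-values below are illustrative placeholders, not results from the letter; in practice each p-value would come from the KS test of the model fitted at that λ.

```python
# maximum-lambda rule: pick the largest penalty whose goodness-of-fit
# p-value is still at or above the 0.05 threshold; if none qualifies,
# fall back to the smallest lambda tried (which keeps a small penalty).
lambdas = [0.001, 0.01, 0.05, 0.1, 0.5]     # hypothetical penalty path
pvalues = [0.90, 0.61, 0.08, 0.03, 0.001]   # hypothetical KS p-values per fit

admissible = [l for l, p in zip(lambdas, pvalues) if p >= 0.05]
lam_chosen = max(admissible) if admissible else min(lambdas)
```

With the placeholder values above, the rule selects λ = 0.05, the sparsest model that still passes the goodness-of-fit threshold.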

### 2.4 Approximate Covariance Matrix

### 2.5 Goodness of Fit

In order to evaluate a statistical model, one must take into account both its structural and its random components. Intuitively, one can think of this as measuring the model's ability to capture the structure of the observed data (structural component) and its generalizability, that is, its ability to compensate through the statistical distribution for features not accounted for in the structural component (random component). Thus, with deterministic input, it is possible that a statistical model describes the observed features to near perfection. However, such a model will rarely predict the features of other data with only slight variations from the first. This lack of generalizability, caused by an inadequate fit of the random component in deterministic settings, will lead to poor statistical model diagnostics.

In addition to the KS statistic, equation 2.8, the relative deviance was used to evaluate the model fit. While the KS statistic measures how the empirical model deviates from a theoretical model, the relative deviance can reveal the amount of structure present in the data. The relative deviance measures how well an estimated model captures data features. The smallest model deviance for a given data set occurs with the saturated model, where the number of parameters equals the number of data points. Due to the equal number of parameters and data points, there are no data left to estimate data variability. Therefore, the saturated model will completely describe the observed data set but will not generalize well to another data set. As such, we could say that the saturated model is the “most structural” model we can define for a specific data set because it is purely descriptive. The other extreme is the null model, the one-parameter model including only an intercept. For the null model, only a single parameter is estimated, in this case defining a homogeneous Poisson process with no influence of past spiking. It is thus an example of the “most stochastic” model we can define, in the sense that it maximizes the entropy of the spike counting process. The relative deviance measures where an estimated model lies on the spectrum between these two extremes, with a relative deviance of zero corresponding to a fit equal to the saturated model and a relative deviance of one corresponding to a fit equal to the null model.

It is important to note that in a traditional deviance analysis, the goodness of fit of multiple models is compared by assessing their deviance on the same data set. Here, we examine, for one particular class of GLMs, the relative deviance across different data sets with different levels of input current noise. As such, a lower relative deviance at a different noise level should not be interpreted as an improved fit, but as a model whose description of the data structure is closer to that of a saturated model.
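The two diagnostics of this section can be sketched compactly. The function names are illustrative; the rescaled intervals follow the time-rescaling theorem (Exp(1) under a well-fitting model), and the relative deviance is taken as model deviance over null deviance, since the saturated Poisson deviance is zero.

```python
import numpy as np

def rescaled_isis(spike_bins, lam):
    """Time-rescaling: integrate the per-bin intensity between successive
    spikes; under a well-fitting model the results are i.i.d. Exp(1)."""
    Lam = np.cumsum(lam)
    return np.diff(Lam[np.asarray(spike_bins)])

def ks_statistic(z):
    """Kolmogorov-Smirnov distance between rescaled intervals and Exp(1)."""
    z = np.sort(z)
    n = len(z)
    F = 1.0 - np.exp(-z)                      # Exp(1) model CDF
    return max(np.max(np.arange(1, n + 1) / n - F),
               np.max(F - np.arange(n) / n))

def poisson_deviance(y, mu):
    """Deviance relative to the saturated model (whose deviance is zero)."""
    y = np.asarray(y, float)
    mu = np.asarray(mu, float)
    term = np.zeros_like(mu)
    pos = y > 0
    term[pos] = y[pos] * np.log(y[pos] / mu[pos])   # 0*log(0) taken as 0
    return 2.0 * np.sum(term - (y - mu))

def relative_deviance(y, mu):
    """0 = fits like the saturated model, 1 = fits like the null model."""
    mu_null = np.full(len(y), np.asarray(y, float).mean())
    return poisson_deviance(y, mu) / poisson_deviance(y, mu_null)
```

For example, a model whose fitted means equal the homogeneous (null) rate has relative deviance exactly 1, while a saturated fit has deviance 0.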

## 3 Results

| Neuron Type | a | b | c | d | I |
|---|---|---|---|---|---|
| Tonic spiking | 0.02 | 0.20 | -65 | 6 | 14 |
| Phasic spiking | 0.02 | 0.25 | -65 | 6 | 1 |
| Tonic bursting | 0.02 | 0.20 | -50 | 2 | 10 |
| Phasic bursting | 0.02 | 0.25 | -55 | 0.05 | 1 |
| Mixed mode | 0.02 | 0.20 | -55 | 4 | 10 |
| Spike frequency adaptation | 0.01 | 0.20 | -65 | 8 | 20 |


Source: Parameter values and neuron type from Izhikevich (2004).

For each type of neuron, 10 spike trains were simulated for 29 evenly spaced values of σ. Hence, the simulations ranged from almost deterministic (σ ≈ 0) to almost completely random spiking. As σ is increased, the intrinsic features of the individual neuron types become progressively noise driven and indistinguishable. For large σ, the spiking activity is based more on the randomness of the input than on the model parameters. Each simulation consisted of observations with time step Δt = 0.1 ms, corresponding to 20 seconds of observations for each simulation. The spike trains were derived from the simulated voltage trajectories by determining when v ≥ 30. Figure 1 presents the simulated spike trains for each type of neuron for varying σ values. An L1-penalized regression for a Poisson GLM of the form 2.5 was fitted for each simulated spike train, where the value of λ in equation 2.6 was based on either optimal KS or maximum λ (see section 2.3), depending on the analysis.
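The conversion from simulated spike times to the binary observation sequence used for fitting can be sketched as follows; the function name is illustrative, and the 0.1 ms bin width matches the simulation step, so each bin holds at most one spike.

```python
import numpy as np

def bin_spikes(spike_times_ms, T_ms=20000.0, dt=0.1):
    """Map spike times (ms) to a binary sequence: y[i] = 1 if a spike
    falls in bin [i*dt, (i+1)*dt)."""
    n = int(round(T_ms / dt))
    y = np.zeros(n, dtype=int)
    idx = np.floor(np.asarray(spike_times_ms) / dt).astype(int)
    y[idx[idx < n]] = 1
    return y

# three spikes in a 100 ms window -> 1,000 bins of width 0.1 ms
y = bin_spikes([5.0, 31.7, 58.4], T_ms=100.0)
```

The resulting binary sequence is the response vector of the Poisson GLM, with the history design matrix built from its own past.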

The history dependence of the GLM was set to 100 ms, corresponding to 1,000 time steps for a discretization step of Δt = 0.1 ms. With the indicator width set to 1 ms, the model, equation 2.5, had a maximum of 100 + 1 parameters. Some of the parameters are shrunk to zero by the penalized regression procedure, and as such, the resulting filters consisted of fewer than 101 nonzero parameters.

The numerical analysis was carried out in the statistical programming environment R (R Core Team, 2016), using the package glmnet to perform penalized regression for GLMs.

### 3.1 Tonic Spiking and Bursting with Intermediate Noise

We first present a more detailed analysis of the tonic spiking and tonic bursting neurons for a single value of σ to illustrate the model fitting and the goodness-of-fit analyses. These neuron types were chosen specifically to investigate how the GLM handles the regular behavior of a tonic spiking neuron, which possesses a unimodal interspike interval (ISI) distribution, versus the switching behavior of a bursting neuron, which produces a bimodal ISI distribution. The value of σ was chosen such that the neurons display both a predictable structure of interest and variability, as evident from Figures 1a and 1c. The analyses for the two types of neurons are presented in Figures 2 and 3 for the tonic spiking and tonic bursting neurons, respectively. These figures display an interval of simulated spiking activity for the neuron, with the estimated intensity function below, the corresponding histogram of ISIs, the histogram of rescaled spike times, the estimated filter, a KS plot of empirical versus theoretical quantiles with approximate 95% confidence bounds, and a plot of the residual process. Penalization was set by maximum λ with a KS p-value threshold of 0.05 for both analyses.

The estimated intensity for the tonic spiking neuron (see Figure 2b) displays peaks that coincide with the spikes of the potential (see Figure 2a). There is a rise in the intensity prior to spiking and an instantaneous drop immediately following an observed spike, which implies that the model captures the predictable structure well. The ISI histogram for the tonic spiking neuron (see Figure 2d) is unimodal with a mean of 26.6 ms, and the width of the histogram implies some variability between spike times, which is expected for the chosen level of σ. The estimated filter (see Figure 2e) displays a refractory period of up to 20 ms and shows peaks around multiples of 26.6 ms, indicating that the model captures the regularity of spiking; this result agrees well with the type of neuron considered. The multiple peaks in the filter suggest that at this level of σ, the current probability of spiking is well predicted by the most recent spike as well as the previous two spikes. The rescaled spike time histogram (see Figure 2f), overlaid with an Exp(1) distribution (red line), shows that the model agrees well with the theoretical distribution, which is further supported by the KS plot (see Figure 2g). Furthermore, the KS statistic yields a p-value of 0.088, in accordance with the choice of λ such that the p-value exceeds 0.05. For reference, using optimal KS instead to set λ, the KS statistic was 0.022 with a p-value of 0.88. Inspection of the residuals (the first quarter of the total residual process is shown in Figure 2c) does not give rise to concerns regarding time-dependent trends.

For the tonic bursting model in Figure 3, the estimated intensity (see Figure 3b) shows that bursting is well captured; we note that increases in the estimated intensity occur just before the times of action potential bursts (see Figure 3a). The rise of the intensity before a burst varies slightly between individual bursts, and the drop after a period of bursting is not as sharp as in the case of tonic spiking. These observations suggest that it is more complicated to capture the beginnings and ends of bursts than single spikes. We note that the filter (see Figure 3e) does not possess a second peak between 40 ms and 50 ms, as seen in the ISI histogram (see Figure 3d). Instead, there is an inflection in the inhibitory trough of the filter starting around 40 ms. When convolved with the past bursting activity, this inflection leads to a rapid rise in intensity approximately 40 ms after the final spike in the previous burst. The KS plot in Figure 3g and the residual process in Figure 3c provide further evidence that the GLM captures the structure in the bursting data well.

The rescaled ISI histogram in Figure 3d agrees with the Exp(1) distribution, and the KS statistic is smaller than that for the tonic spiking neuron. However, the corresponding p-value is significantly lower at 0.008, because the tonic bursting neuron produced many more spikes than the tonic spiking neuron. The larger number of spikes results in a much lower p-value for tonic bursting, since the KS test is sensitive to the number of observed spikes. Using optimal KS to set λ for the tonic bursting neuron did not improve the model fit; the KS statistic was 0.040 with a p-value of 0.009.

### 3.2 Spike History Filters

We now turn to a more general analysis of the models across a wider range of firing properties and noise levels σ. Figure 4 presents the estimated GLM filters for combinations of simulated neuron types and σ values. The penalization parameter was set by maximum λ with a threshold of 0.05 (see section 2.3). Filters for regularly spiking neurons, such as the tonic spiking neuron (Figure 4a) and the spike frequency adaptation neuron (Figure 4f), display peaks in the filters for low to intermediate values of σ. As σ approaches zero, the filters approach collections of delta functions, with mass concentrated around multiples of the deterministic ISI period. For example, for the deterministic tonic spiking neuron, the ISI is approximately 27 ms; hence, the peaks in the filter appear near 27, 54, and 81 ms, in agreement with the regular ISIs of the tonic spiking neuron. For the bursting neurons (Figures 4c and 4d), there is a narrow peak at approximately 2 ms to 5 ms, except for very small σ. Decreases in the filter follow the peaks at short times, indicating that refractory periods are captured by the model for all combinations of type and σ. However, the refractory period varies as σ increases: either increasing (for tonic spiking), decreasing (for phasic spiking), or remaining approximately unchanged (for tonic bursting). This indicates a minimal absolute refractory period for bursting neurons that is independent of the input noise. Common to all filters is a flattening as σ increases, most visibly for regularly spiking neurons (tonic spiking and spike frequency adaptation).

This implies that by increasing the injected noise σ, the spiking approaches a homogeneous Poisson process, and it becomes progressively difficult to extract mechanistic structure, such as the interval of regular spiking (see Figure 4a) or interburst activity.

### 3.3 Goodness of Fit

KS tests were performed for all combinations of neuron type and σ to assess each model's goodness of fit. Figure 5 presents an overview of these results. The plot displays the log-transformed KS statistic (see equation 2.8) and the relative deviance for each model. The KS statistics were log-transformed to emphasize trends as σ varies. Furthermore, in order to present the implications of changing σ, the penalization parameter was set by optimal KS, such that different values of σ could lead to different penalizations. Had we instead chosen penalization by maximum λ, the KS statistics would not reveal an optimal range of σ.

The goodness of fit for all simulated neuron types in Figure 5 improves quickly as σ increases from 0 to 2, suggesting that for low values of σ, the GLM has difficulty capturing the near-deterministic spiking distribution but does well at capturing this distribution once some variability is present. The corresponding p-values are above 0.05 over a range of σ for all neuron types, except phasic bursting and tonic bursting. For phasic bursting, the p-values exceed 0.05 only for larger σ, whereas for tonic bursting, the p-values never exceed 0.05. Except for mixed mode and spike frequency adaptation, all types display optimal values in a range of σ, whereas these two types seem to settle at a plateau for larger σ. The optimal range of σ, both its width and location, depends on the type of neuron.

For the relative deviances, we chose to use maximum λ. The increasing trend in the relative deviance suggests that for low values of σ, there is more predictable structure captured by the model, whereas for higher noise levels, the estimated model approaches a homogeneous Poisson model, which includes only a baseline firing parameter. This general trend is consistent with our intuition for the model behavior; as the randomness of the input overshadows features of the Izhikevich model dynamics, the spike trains become better described by a homogeneous Poisson process.

We conclude from these two assessments that an optimal range for σ exists in which the GLM captures both the variability and the predictable structure of the spike trains simulated using the noisy Izhikevich neurons. For very low values of σ, there is little variability to be captured by the GLM, while for very high values of σ, there is little predictable structure beyond the baseline firing rate. However, a general optimal range of σ cannot be explicitly defined, as it depends on the type of neuron. Thus, multiple goodness-of-fit measures should be used when analyzing data to assess the degree to which the model fits the variability (e.g., KS test) and the predictable structure in the model (e.g., relative deviance).

### 3.4 Model Structure for Extreme Values

The results in Figure 4 display the estimated filter parameters as σ varies. We note that the maximum estimated filter values are capped at a value of 3 in Figure 4, so the full extent of the peak values is not directly visualized. The peaks, increasing in height and decreasing in width, resemble delta functions as the noise level approaches zero. In the other extreme of large σ, most of the visible structure in the filter vanishes, except for a refractory period. Except in the case of bursting neurons, the filters flatten (i.e., approach 1) for large σ, implying that less neuron-specific structure is captured by the models. This finding is consistent with the results in Figure 5, where the relative deviances approach 1 (the null model) with low values of the corresponding KS statistics, indicating that the simulated spike trains are well approximated by a GLM with an intercept (baseline) term only, which corresponds to a homogeneous Poisson process.

The estimated correlation matrices (see section 2.4) for the tonic spiking neuron in Figure 6 reveal that for low values of σ, there is positive correlation (red) between parameters that are close to each other in the temporal dimension along the diagonal, but at the three peaks, the positive correlation extends further in time. There is negative correlation (blue) among variables that are far apart in the temporal dimension, but most interesting, the clusters of variables associated with each peak exhibit negative correlation.

The negative correlation between peak variables suggests that past spiking at these lags is highly correlated and that as predictors, they are largely redundant. We note that the neuron spike probability can be affected by the first, second, or third last-recorded spike, but it is the total accumulated effect of the past 100 ms that determines the current spike probability. As such, the filter should be evaluated as a whole rather than at individual points in time.
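A standard construction for the approximate covariance of GLM parameter estimates is the inverse of the observed Fisher information; the sketch below (illustrative names, unpenalized Poisson model with log link) converts it to a correlation matrix of the kind displayed in Figure 6.

```python
import numpy as np

def approx_correlation(X, b0, b):
    """Correlation matrix of GLM coefficient estimates, from the inverse
    observed Fisher information of a Poisson model with log link."""
    mu = np.exp(b0 + X @ b)                  # fitted means
    info = X.T @ (X * mu[:, None])           # observed information matrix
    cov = np.linalg.inv(info)                # approximate covariance
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4))
C = approx_correlation(X, -2.0, np.zeros(4))
```

Strong negative entries in such a matrix flag pairs of filter coefficients that act as nearly interchangeable predictors, which is why the filter should be read as a whole.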

As σ increases, the correlation structure starts to disappear, and for higher values of σ, the correlation structure in the model is greatly reduced. Once more, this is in line with the findings in Figure 5, as models for very noisy data capture mostly baseline activity.

For the tonic bursting neuron, the correlation matrices (not shown) displayed little or no structure beyond a few milliseconds off-diagonal. Based on Figure 4, this lack of correlation structure is not surprising. For higher values of σ, the filter captured only the increased probability of a spike at short delay, and as the remaining part of the filter approaches 1, the (irregular) interburst periods were not captured. Thus, any relatively strong dependence beyond the intraburst interval is not expected as σ increases. As in the case of the tonic spiking filters, the entire filter should be interpreted for tonic bursting: it is not only the time of the last spike that determines the current probability but the 100 ms history of spikes, which, in the case of bursting, can include multiple spikes in a burst or multiple burst periods.

## 4 Discussion

In this letter, we have examined how well a commonly used class of GLMs performs when used to capture both the predictable structure and the variability of spike trains derived from simulated, noisy Izhikevich neurons. A useful model is one that allows for clear, simple interpretations, which depends critically on the form of the model. When referring to GLMs in this letter, we implicitly mean the specific class of multiplicatively separable, history-dependent GLMs. These GLMs were designed to capture the influence of past spiking on the current firing intensity using a simple set of indicator basis functions and assuming that the influences of previous spikes are multiplicatively separable. This indicator basis is well suited to display the approximation of delta functions when the noise level converges to 0 and the spike trains become increasingly deterministic. However, using indicator basis functions increases the number of parameters compared to other choices, such as spline bases, and can lead to the problem of perfect separation. Therefore, it was necessary to use a penalized regression to fit the models with indicator basis functions for low noise levels, where only a few parameters suffice to describe the structure. For this reason, we opted for an L1 penalization, corresponding to LASSO regression, which shrinks redundant parameters to 0 by promoting sparsity. The implementation of the LASSO regression used here relies on cyclical coordinate descent (Friedman et al., 2010) to optimize the likelihood, whereas the standard GLM routine in R uses Fisher scoring to find maxima. Even when the penalization parameter was set to zero (i.e., unconstrained regression), these two methods could yield rather different results when the likelihood surface is nearly flat. However, even in such cases, the overall model predictions and diagnostics were indistinguishable.

As expected, the estimated filters resembled delta functions for low noise levels (see Figure 4). In general, the GLMs captured both the predictable structure and the variability of the input data. For near-deterministic spike trains, the variation is negligible, and as a result, the interpretation and quality of the fit become highly sensitive to the choice of basis functions. Other commonly used basis functions, such as splines, which impose smoothness on the influence of past spikes, are likely to lead to poorly fitting models and incorrect interpretations when the spiking process is exactly or nearly deterministic. Although we focus here on simulations, we expect similar issues to arise when applying the GLM in equation 2.5 to regular spike train activity recorded from a real neuron.

In contrast to low-noise regimes, spike trains simulated with a strong, noisy input signal produced spike patterns consistent with a homogeneous Poisson process (see Figure 1). In the corresponding filters for these highly stochastic spike trains, any structure beyond a refractory period following an observed spike vanished. For tonic spiking, there was no positive modulation in very noisy regimes, whereas for tonic bursting, the intraburst interval at short delays was captured while the interburst interval at longer delays was missing. This loss of structure was also evident from the relative deviance in Figure 5, which approached 1 as the noise level increased.
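For concreteness, one common definition of relative deviance is the fitted model's Poisson deviance divided by that of a homogeneous (null) model; a value near 1 means the fitted intensity explains no more than a constant rate. The short Python sketch below is our own illustration of this definition (the function name and toy data are assumptions, not the letter's code):

```python
import numpy as np

def relative_deviance(y, lam_model):
    """Poisson deviance of the fitted model divided by that of a constant-rate
    (null) model; values near 1 mean the history terms add essentially nothing."""
    def dev(lam):
        with np.errstate(divide="ignore", invalid="ignore"):
            term = np.where(y > 0, y * np.log(y / lam), 0.0)  # y = 0 bins contribute 0
        return 2.0 * np.sum(term - (y - lam))
    return dev(lam_model) / dev(np.full_like(lam_model, y.mean()))

# a sharp (nearly perfect) model yields a ratio near 0; the null model yields exactly 1
y = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
lam_sharp = np.where(y > 0, 0.95, 0.05)
ratio = relative_deviance(y, lam_sharp)  # well below 1
```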

Our analyses suggest that multiple goodness-of-fit diagnostics are necessary to determine the extent to which statistical models capture the predictable structure and variability of neural spiking. In very low-noise regimes, measures that relate primarily to variability, such as the KS statistic, may indicate lack of fit even for models that capture the predictable structure of the data well. When the random component is ill-fitting, the model does not generalize well to data of similar but slightly mismatched structure, resulting in poor values of the KS measure. In general, goodness-of-fit measures should account for how well a model captures both the variability (e.g., KS test) and the predictable structure (e.g., relative deviance) in the data.
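The time-rescaling construction behind the KS diagnostic can be sketched in a few lines of Python (a generic illustration under standard point-process theory, not the letter's code): the fitted intensity is integrated between spikes, the rescaled intervals are transformed to uniform variables, and a one-sample KS test compares them to U(0,1).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def ks_time_rescaling(spike_times, intensity, dt):
    """Time-rescaling KS diagnostic: integrate the model intensity between
    spikes; under a correct model the transformed intervals are U(0,1)."""
    cum = np.cumsum(intensity) * dt                  # Lambda(t) on the time grid
    idx = (np.asarray(spike_times) / dt).astype(int)
    taus = np.diff(cum[idx])                         # rescaled interspike intervals
    z = 1.0 - np.exp(-taus)                          # uniform if the model is correct
    return stats.kstest(z, "uniform")

# homogeneous Poisson spikes at 10 Hz, tested against the true constant intensity
dt, T, rate = 1e-3, 200.0, 10.0
grid = np.arange(0.0, T, dt)
spikes = grid[rng.random(grid.size) < rate * dt]
stat, p = ks_time_rescaling(spikes, np.full(grid.size, rate), dt)
# with the true intensity, rejections occur only at the nominal test level
```

Note that this diagnostic probes only the variability component: a model that reproduces the mean structure but misstates the randomness will still fail it, which is exactly the low-noise behavior discussed above.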

Analyzing tonic spiking and tonic bursting neurons, we found that the GLM captures both types quite well, although the KS statistic remained significant for tonic bursting. This significance arose even though the KS statistics for both types were of similar magnitude; it was driven by the larger number of observed spikes for the tonic bursting neuron compared to the tonic spiking neuron, which tightens the confidence bounds of the KS test. Moreover, by interpreting the estimated filters, we found that both types exhibited a clear structure inherent to the neuron type, with features that are intuitively understandable.

In conclusion, the results of this letter suggest that for the commonly used GLM for spike train data with multiplicatively separable history dependence, there is a range of input noise values for the Izhikevich neurons in which the GLM optimally captures both predictable structure and variability. This range depends on which properties are intrinsic to the spike train: intraburst intervals are captured even at high noise levels, while regular spiking and interburst intervals are not. At low noise levels, the filters estimated with the basis functions implemented here approximate delta functions, which can predict the data structure but do not generalize well to data from slightly altered generative models. Thus, the choice of basis functions becomes crucial to the credibility of the model for spike trains with little or no variation. Although an optimal range is evident from the results in this letter, extensions of the model formulation in equation 2.5, such as equations 4.1 and 4.2, could lead to improved model descriptions of specific neuron properties such as bursting and thus widen the optimal range. Expanding the optimal working range of these GLMs would be of interest, for instance, for classifying neurons according to both type and noise level.

## Notes

^{1}

We chose here to denote the penalization by a symbol other than the standard λ, in order not to confuse the penalization factor in the model-fitting procedure with the intensity process, which is denoted λ.

## Acknowledgments

J.Ø. is a participant in the Dynamical Systems Interdisciplinary Network, University of Copenhagen. M.A.K. was partially supported by the NSF CAREER grant NSF-DMS 1451384. U.T.E. was partially supported by a grant from the Simons Foundation (320136).