## Abstract

One approach for understanding the encoding of information by spike trains is to fit statistical models and then test their goodness of fit. The time-rescaling theorem provides a goodness-of-fit test consistent with the point process nature of spike trains. The interspike intervals (ISIs) are rescaled (as a function of the model's spike probability) to be independent and exponentially distributed if the model is accurate. A Kolmogorov-Smirnov (KS) test between the rescaled ISIs and the exponential distribution is then used to check goodness of fit. This rescaling relies on assumptions of continuously defined time and instantaneous events. However, spikes have finite width, and statistical models of spike trains almost always discretize time into bins. Here we demonstrate that finite temporal resolution of discrete time models prevents their rescaled ISIs from being exponentially distributed. Poor goodness of fit may be erroneously indicated even if the model is exactly correct. We present two adaptations of the time-rescaling theorem to discrete time models. In the first we propose that instead of assuming the rescaled times to be exponential, the reference distribution be estimated through direct simulation by the fitted model. In the second, we prove a discrete time version of the time-rescaling theorem that analytically corrects for the effects of finite resolution. This allows us to define a rescaled time that is exponentially distributed, even at arbitrary temporal discretizations. We demonstrate the efficacy of both techniques by fitting generalized linear models to both simulated spike trains and spike trains recorded experimentally in monkey V1 cortex. Both techniques give nearly identical results, reducing the false-positive rate of the KS test and greatly increasing the reliability of model evaluation based on the time-rescaling theorem.

## 1. Introduction

One strategy for understanding the encoding and maintenance of information by neural activity is to fit statistical models of the temporally varying and spike history–dependent spike probability (conditional intensity function) to experimental data. Such models can then be used to deduce the influence of stimuli and other covariates on the spiking. Numerous model types and techniques for fitting them exist, but all require a test of model goodness of fit, which is crucial to determine a model's accuracy before making inferences from it. Any measure of goodness of fit to spike train data must take the binary nature of such data into account (e.g., discretized in time, a spike train is a series of zeros and ones). This makes standard goodness-of-fit tests, which often rely on assumptions of asymptotic normality, problematic. Further, typical distance measures such as the average sum of squared deviations between recorded data values and estimated values from the model often cannot be computed for point process data.

One technique, proposed by Brown, Barbieri, Ventura, Kass, and Frank (2001) for checking the goodness of fit of statistical models of neural spiking, makes use of the time-rescaling theorem. This theorem states that if the conditional intensity function is known, then the interspike intervals (ISIs) of any spike train (or indeed any point process) can be rescaled so that they are Poisson with unit rate, that is, independent and exponentially distributed. Checking goodness of fit is then easily accomplished by comparing the rescaled ISI distribution to the exponential distribution using a Kolmogorov-Smirnov (KS) test (Press, Teukolsky, Vetterling, & Flannery, 2007; Massey, 1951). The beauty of this approach is not only its theoretical rigor, but also its simplicity, as the rescaling requires only the calculation of a single integral. Further, a second transformation takes the exponentially distributed rescaled times to a uniform distribution, and the KS test can then be performed graphically using a simple plot of the cumulative density function (CDF) of the rescaled times versus the CDF of the uniform distribution to determine if the rescaled times lie within analytically defined confidence bounds. Due to its many appeals, the time-rescaling theorem has been extensively used to test model goodness of fit to spike train data (Frank, Eden, Solo, Wilson, & Brown, 2002; Truccolo, Eden, Fellows, Donoghue, & Brown, 2005; Czanner et al., 2008; Song et al., 2006).
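The rescaling and the graphical KS test described above can be sketched in a few lines. The following is a minimal illustration, not the authors' code: `rescale_isis` and `ks_distance_vs_uniform` are hypothetical helper names, the 50 Hz rate and other numbers are arbitrary, and the integral is approximated by a simple Riemann sum.

```python
import numpy as np

def rescale_isis(spike_times, rate_fn, dt=1e-4):
    """Rescaled ISIs z_i = integral of lambda(t) over each ISI (Riemann sum on a fine grid)."""
    z = []
    for t0, t1 in zip(spike_times[:-1], spike_times[1:]):
        grid = np.arange(t0, t1, dt)
        z.append(np.sum(rate_fn(grid)) * dt)
    return np.array(z)

def ks_distance_vs_uniform(z):
    """Transform to b = 1 - exp(-z) (uniform under H0) and measure the KS distance."""
    b = np.sort(1.0 - np.exp(-z))
    n = len(b)
    return np.max(np.abs(b - (np.arange(1, n + 1) - 0.5) / n))

# Demo: a homogeneous Poisson train at 50 Hz, rescaled with its true rate.
rng = np.random.default_rng(0)
spike_times = np.cumsum(rng.exponential(1.0 / 50.0, size=1500))
z = rescale_isis(spike_times, lambda t: np.full_like(t, 50.0))
ks = ks_distance_vs_uniform(z)
```

When the model rate matches the generating rate, as here, the KS distance stays well below the 95% bound of $1.36/\sqrt{N}$ discussed below.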

There are, however, certain neurophysiological situations in which the standard time-rescaling approach can give misleading results, indicating poor goodness of fit when model fit may in fact be very good. This is a consequence of the practical numerical consideration that when a statistical model is fit to spike data, one almost always discretizes time into bins. The time-rescaling theorem applies exactly to a continuous time point process, that is, it assumes infinite temporal precision and instantaneous events (spikes). In a practical neuroscience setting, however, we usually do not have infinite temporal precision. First, a spike is an event that lasts for a finite (∼1 msec) period of time, and any temporal resolution far below this lacks physical relevance.^{1} Second, from a computational perspective, the fitting of statistical models requires much less computer time when the temporal discretization is coarser. Temporal discretization therefore imposes both physical and practical numerical constraints on the problem.

Often the probability per bin of a spike is small, and the distinction between continuous and discrete time is of no concern, because the width of a spike is very short compared to the average interspike interval. Nevertheless, there are cases in which firing rates can be very high due to strong stimuli, and ISIs short due to burst-type dynamics, and here the per bin spike probability can be large even at 1 msec resolution or less. Such situations can arise in, for example, primate visual experiments, where neurons can be extremely active (De Valois, Yund, & Hepler, 1982; MacEvoy, Hanks, & Paradiso, 2007; also see section 3 of this article), exhibiting firing rates of up to 100 Hz or more. In such situations, it is important to ensure that the rescaled ISIs are still (approximately) exponentially distributed and, if not, to determine the correct distribution before performing the KS test.

Our aim in this article is to develop simple and easily applied goodness-of-fit tests for the discrete time case. We first restate the standard, continuous time form of the time-rescaling theorem for point processes and then demonstrate the discretization problem using a simple homogeneous Bernoulli (discretized homogeneous Poisson) process. We show theoretically that the discrete nature of the Bernoulli process results in first a lower bound on the smallest possible rescaled ISI, and second, because there can be only one spike per bin, a spike probability less than that which would be estimated by a continuous time model. These differences lead to biases in the KS plot caused by fundamental differences in the shapes of the geometric and exponential distributions, not by poor spike sampling or poor numerical integration techniques. We demonstrate further that these biases persist for more complicated simulated neural data with inhomogeneous firing rates and burst-type spike history effects. We show that the biases increase when spike history effects are present.

We then propose two computationally tractable modifications to the time-rescaling theorem applicable to discrete time data. The first is similar in spirit to a bootstrap and involves direct simulation of confidence bounds on the rescaled ISI distribution using the statistical model being tested. In the second method, by randomly choosing exact spike times within each bin and introducing a correction to the fitted discrete spike probabilities, we define an analytic rescaled time that is exponentially distributed at arbitrary temporal discretizations. Use of this analytical method gives results nearly identical to the numerical approach. In this article, we use generalized linear models (GLMs) with logistic link functions (McCullagh & Nelder, 1989; Wasserman, 2004). However, we emphasize that both procedures apply to any discrete time statistical model of the time-varying spike probability, not only GLMs. We demonstrate both approaches using simulated data and also data recorded from real V1 neurons during monkey vision experiments. In all our examples, the KS plot biases are eliminated. Models whose KS plots originally lay outside the 95% confidence bounds are demonstrated to in fact be very well fit to the data, with the modified KS plots lying well within the bounds. In addition to providing more accurate statistical tests for discrete time spiking models, our approaches allow the use of larger time bin sizes and therefore can substantially decrease the computation time required for model fitting.

## 2. Theory

The time-rescaling theorem states that if $0 < u_1 < u_2 < \cdots < u_n \le T$ are the spike times of a point process with conditional intensity function $\lambda(t|H_t)$, where $H_t$ denotes the history of spiking up to time $t$, then the rescaled ISIs

$$z_i = \int_{u_{i-1}}^{u_i} \lambda(t|H_t)\,dt \tag{2.1}$$

are independent and exponentially distributed with unit rate. Although we often suppress the explicit dependence on $H_t$ in our notation, such conditioning on the previous spiking history is always implied. Intuitively, the ISIs are stretched or shrunk as a function of total spike probability over the ISI interval so that the rescaled ISIs are centered about a mean of 1. Several proofs of this theorem exist (Brown et al., 2001). Here we present a simple proof of the exponential distribution of the rescaled ISIs. A proof of their independence is in appendix A.

The proof proceeds by discretizing time into bins of width $\delta$, writing down the probability of each discrete ISI, and then taking the continuous time limit $\delta \to 0$. The discrete time bins are indexed by $k$, and the bins within which the spikes occur are denoted $k_i$. Further, we define $p_k$ as the discrete probability mass of a spike in bin $k$; like $\lambda(t|H_t)$, it should be taken as conditionally dependent on the previous spiking history.

The probability of the $i$th ISI is the probability that there is a spike in bin $k_i$ given that the preceding spike was located in bin $k_{i-1}$:

$$P(L_i) = p_{k_i} \prod_{l=1}^{L_i-1} \left(1 - p_{k_{i-1}+l}\right), \tag{2.2}$$

where $L_i$ is defined such that $k_{i-1} + L_i = k_i$. This is simply the product of the probabilities that there are no spikes in bins $k_{i-1}+1, \ldots, k_i - 1$, multiplied by the probability that there is a spike in bin $k_i$. For simplicity, we now drop the $i$ subscripts, writing $k$ for the bin of the preceding spike and $k + L$ for the bin of the current spike.

We next assume small $p_k$ for all bins. This implies that $1 - p_{k+l} \approx e^{-p_{k+l}}$, allowing the above equation to be rewritten as

$$P(L) \approx p_{k+L}\, \exp\!\left(-\sum_{l=1}^{L} p_{k+l}\right). \tag{2.3}$$

Note that the upper limit of the sum has been changed from $L-1$ to $L$, with the justification that we are in a regime where all the $p$'s are very small. We define the lower and upper spike times as $t_{i-1} = k\delta$ and $t = (k+L)\delta$, define $\lambda(t)$ such that $p_k = \lambda(k\delta)\,\delta$,^{2} and also define the ISI probability density $P(t)$ such that $P(L) = P(t)\,\delta$. After substituting these into equation 2.3 and converting the sum to an integral, we obtain

$$P(t) = \lambda(t)\, \exp\!\left(-\int_{t_{i-1}}^{t} \lambda(t')\,dt'\right).$$

Consulting equation 2.1, we note that the integral in the exponent is, by definition, the rescaled time $z$. Further, applying the fundamental theorem of calculus to this integral gives $dz/dt = \lambda(t)$.^{3} Changing variables from $t$ to $z$, we finally obtain

$$P(z) = P(t)\left|\frac{dt}{dz}\right| = e^{-z},$$

which is now exponentially distributed and completes the proof.
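The proof can also be checked empirically. The sketch below (our own illustrative rate function and parameters, not from the article) simulates an inhomogeneous Poisson process by Lewis-Shedler thinning, rescales the ISIs with the exact cumulative intensity, and measures the KS distance to the uniform distribution.

```python
import numpy as np

def lam(t):
    """Example time-varying rate (Hz): 50 Hz mean with 80% sinusoidal modulation."""
    return 50.0 * (1.0 + 0.8 * np.sin(2 * np.pi * t))

def cum_lam(t):
    """Exact integral of lam from 0 to t."""
    return 50.0 * t + (40.0 / (2 * np.pi)) * (1.0 - np.cos(2 * np.pi * t))

rng = np.random.default_rng(1)
T, lam_max = 30.0, 90.0

# Lewis-Shedler thinning: propose events at the maximal rate, accept with prob lam/lam_max.
t, spikes = 0.0, []
while True:
    t += rng.exponential(1.0 / lam_max)
    if t > T:
        break
    if rng.uniform() < lam(t) / lam_max:
        spikes.append(t)
spikes = np.array(spikes)

# Time-rescaling with the exact cumulative intensity: the z_i should be Exp(1).
z = np.diff(cum_lam(spikes))
b = np.sort(1.0 - np.exp(-z))
n = len(b)
ks = np.max(np.abs(b - (np.arange(1, n + 1) - 0.5) / n))
```

With the exact continuous time intensity, the rescaled ISIs pass the KS comparison even though the underlying rate is strongly modulated.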

In practice, the conditional intensity is estimated with a discrete time model, and the rescaled ISIs are computed by summing the fitted spike per bin probabilities over each ISI,

$$z_i \approx \sum_{l=1}^{L_i} p_{k_{i-1}+l}. \tag{2.7}$$

To perform the KS test graphically, the rescaled ISIs are further transformed as $b_i = 1 - e^{-z_i}$, which are uniformly distributed on $[0, 1]$ if the $z_i$ are exponential with unit rate. The $b_i$ are sorted into ascending order and plotted along the $y$-axis versus the uniform grid of values $\bar{b}_i = (i - 1/2)/N$ along the $x$-axis, where $N$ is the number of ISIs and $i = 1, \ldots, N$. If the rescaled ISIs $b_i$ are indeed uniformly distributed, then this plot should lie along the 45 degree line. Essentially, the cumulative density function (CDF) of the rescaled ISIs $b_i$ is being plotted against the CDF of the uniform distribution (the $\bar{b}_i$'s). We show an example of such a plot in Figure 1. Such a plot can be thought of as a visualization of a KS test, which compares two CDFs, and is usually referred to as a KS plot. Formally, we can state the null hypothesis $H_0$ of this test as follows:

$H_0$: Given a model of the conditional intensity function that is statistically adequate, the experimentally recorded ISIs can be rescaled so that they are distributed in the same manner as a Poisson process (exponentially distributed) with unit rate.

Under the null hypothesis, the maximum distance between the two CDFs will, in 95% of cases, be less than $1.36/\sqrt{N}$, where $N$ is the number of rescaled ISIs (Brown et al., 2001; Johnson & Kotz, 1970). Equivalently, the plotted line of rescaled ISIs will lie within the bounds $\bar{b}_i \pm 1.36/\sqrt{N}$ in 95% of cases under the null hypothesis. It should be kept in mind that this is not equivalent to saying that the line of rescaled ISIs lying within these bounds implies a 95% chance of the model being correct.

### 2.1. Temporal Discretization Imposes KS Plot Bias.

The proof above required the assumption that the per bin spike probability $p_k$ is small. In many experimental situations, however, $p_k$ is in fact large. For example, 50 Hz spiking sampled at 1 msec implies $p_k \approx 0.05$, and under many conditions, the firing rate can be much higher, at least over some subset of the recording (e.g., during bursting). In such cases, the rescaled times will not be exponentially distributed, and the KS plot will exhibit significant biases (divergences from the 45 degree line) even if the discrete time model for $p_k$ is exactly correct. We demonstrate this in Figure 1, where two KS plots generated using the exact same spikes and time-varying firing rate are shown, but a temporal discretization was imposed for one of the plots.

These biases originate in two distinct consequences of discretizing a continuous process. First, there is a lower bound on the smallest possible ISI (one bin), which leads to a lower bound on the smallest possible rescaled time $z$. Second, because only a single spike per bin is allowed, using a discrete time model to estimate the firing rate of a continuous time process results in an underestimation of the firing rate. To demonstrate these issues fully, we now consider the simple case of a homogeneous Bernoulli process with a constant spike probability $p_k = p$ per bin, for which the CDF of the $z$'s can be calculated analytically and the KS plot determined exactly.

For a homogeneous Bernoulli process, the ISIs take the values $t = n\delta$, where $n$ is an integer greater than zero and $\delta$ is the bin width. The rescaled ISIs are

$$z = np, \tag{2.8}$$

and the discrete probability distribution of interspike interval times (and rescaled times) is

$$P_B(n) = p\,(1-p)^{\,n-1}. \tag{2.9}$$

As in equation 2.2, this is merely the product of the probability of no spike for $n-1$ bins, followed by the probability of a spike in the last ($n$th) bin. The $B$ subscript indicates the Bernoulli process.

$P_B(n)$ is not an exponential distribution, as would be expected for a homogeneous Poisson process. It is a geometric distribution, although in the limit of small $p$ it reduces to an exponential distribution.^{4} The CDF of this ISI distribution is easily calculated by summing the geometric series and combining terms:

$$\mathrm{CDF}_B(n) = \sum_{m=1}^{n} p\,(1-p)^{\,m-1} = 1 - (1-p)^n. \tag{2.10}$$

To get the CDF of the rescaled ISIs $z$, equation 2.8 is inverted to get $n = z/p$ and substituted into equation 2.10:

$$\mathrm{CDF}_B(z) = 1 - (1-p)^{\,z/p}. \tag{2.11}$$

In Figure 2, we use equation 2.11 to generate the KS plot for various per bin spike probabilities $p$. Even at $p = 0.04$, which would correspond to 40 Hz firing at 1 msec discretization, the CDF is highly nonuniform, with a steplike structure caused by the discrete values that the rescaled ISIs can take. Such "steps" will be smoothed out if an inhomogeneous Bernoulli process is used instead. There is, however, another, more serious divergence from uniformity: a distinct positive bias at low (close to 0) rescaled ISIs and a distinct negative bias at high (close to 1) rescaled ISIs. This bias will not disappear if the process is made inhomogeneous.
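Because equation 2.11 is in closed form, the size of the bias can be checked numerically. The sketch below (our own illustrative choice of $p$ and of the number of ISIs) evaluates the deviation between the exact geometric CDF and the CDF implied by the exponential assumption, and compares it with the 95% KS bound $1.36/\sqrt{N}$ for roughly 10 minutes of 40 Hz spiking ($N \approx 24{,}000$ ISIs).

```python
import numpy as np

p = 0.04                       # spike probability per bin (40 Hz at 1 msec)
N = 24_000                     # roughly 10 minutes of ISIs at 40 Hz
n = np.arange(1, 2000)

cdf_geom = 1.0 - (1.0 - p) ** n        # exact CDF of the rescaled ISIs z = n p
b = 1.0 - np.exp(-n * p)               # CDF predicted by the exponential assumption

# The geometric CDF is a step function: check the deviation on both sides of each step.
dev_after = cdf_geom - b
dev_before = np.concatenate(([0.0], cdf_geom[:-1])) - b
max_dev = max(np.abs(dev_after).max(), np.abs(dev_before).max())

ks_bound_95 = 1.36 / np.sqrt(N)
print(max_dev, ks_bound_95)
```

Even at this moderate $p$, the deviation (dominated by the first step, near the smallest possible rescaled ISI) exceeds the 95% bound by a wide margin, so an exactly correct model would still be rejected.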

A clearer way to visualize these divergences is to plot the difference between the KS plot and the 45 degree line (a differential KS plot), together with the 95% confidence bounds $\pm 1.36/\sqrt{N}$, where $N$ is again the number of rescaled ISIs. Plotted this way, one can clearly see the positive bias at low values of the rescaled ISIs and the negative bias at high values of the rescaled ISIs. We emphasize that since these KS and differential KS plots are calculated using the exact homogeneous Bernoulli distribution, the biases are not finite sampling effects.

To understand the source of the biases, consider discretizing a homogeneous Poisson process with rate $\lambda$, and suppose that there is a spike at time $t = 0$. Then for a homogeneous Poisson process, the probability density for the waiting time $t_w$ until the next spike is $P(t_w) = \lambda e^{-\lambda t_w}$. Integrating, the probability that the next spike lies within any interval $[t_1, t_2]$ can be obtained:

$$P(t_1 \le t_w \le t_2) = \int_{t_1}^{t_2} \lambda e^{-\lambda t}\,dt = e^{-\lambda t_1} - e^{-\lambda t_2}.$$

Defining the bin index $n$ such that $t_1 = (n-1)\delta$ and $t_2 = n\delta$ and discretizing, we get

$$P_B(n) = e^{-\lambda (n-1)\delta} - e^{-\lambda n\delta} = \left(1 - e^{-\lambda\delta}\right) e^{-\lambda (n-1)\delta} = p\,(1-p)^{\,n-1},$$

where we have defined $p = 1 - e^{-\lambda\delta}$ in the last line. Discretizing time transforms the homogeneous Poisson process into a homogeneous Bernoulli process, but with a per bin probability of a spike $p = 1 - e^{-\lambda\delta}$ rather than $\lambda\delta$. In fact, by expanding the exponential as a Taylor series, it can be seen that

$$p = 1 - e^{-\lambda\delta} = \lambda\delta - \frac{(\lambda\delta)^2}{2} + \cdots \le \lambda\delta. \tag{2.13}$$

The continuous Poisson process still has an expected number of spikes $\lambda\delta$ per interval of width $\delta$, but such an interval could contain more than one spike. In contrast, the discrete Bernoulli process can have only 0 or 1 spikes per bin. Therefore, the per bin "spike probability" $p$ calculated above is not the expected number of spikes of the continuous point process within an interval $\delta$. It is the probability of a *first* spike within an interval $\delta$, which is, of course, less than the total expected number of spikes. Any chance of there being more than one spike in a time window $\delta$ has been eliminated by discretizing into bins.

The breakdown of the first-order expansion of the exponential is the source of the negative KS plot bias at high ($z$ close to 1) rescaled ISIs. It is a fundamental consequence of discretizing a continuous time point process and is closely connected to how the conditional intensity function is generally defined, that is, as the small bin size limit of a counting process (see Snyder, 1975). More specifically, $\lambda(t|H_t)\,\delta$ is, in the small-$\delta$ limit, the probability of a single spike in an infinitesimal interval $[t, t+\delta)$. As shown above, this probability is actually $p = 1 - e^{-\lambda\delta}$, and the equality $p = \lambda\delta$ holds only in the limit. Thus, $\lambda\delta$ is not a good approximation for $p$ when the bin size is too large, and this causes the time-rescaling theorem to break down.
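The relation $p = 1 - e^{-\lambda\delta}$ is easy to confirm by simulation. The following sketch (arbitrary rate, bin width, and duration of our choosing) bins a simulated Poisson train, allowing at most one spike per bin, and compares the fraction of occupied bins with both $\lambda\delta$ and $1 - e^{-\lambda\delta}$.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, delta, T = 200.0, 0.002, 200.0    # rate (Hz), bin width (s), duration (s)

# Simulate a homogeneous Poisson process and bin it (at most one "spike" per bin).
isis = rng.exponential(1.0 / lam, size=int(lam * T * 2))
times = np.cumsum(isis)
times = times[times < T]
occupied = np.unique((times / delta).astype(int))   # bins containing >= 1 spike
frac = len(occupied) / (T / delta)                  # empirical per bin spike probability

print(frac, lam * delta, 1.0 - np.exp(-lam * delta))
```

Here $\lambda\delta = 0.4$ while $1 - e^{-\lambda\delta} \approx 0.33$; the empirical per bin probability tracks the latter, illustrating how naive discretization underestimates the rate.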

### 2.2. Inhomogeneous Bernoulli Processes.

The same positive (negative) bias in the KS plot at low (high) rescaled ISIs remains when the spiking process is not homogeneous Bernoulli. We now define three inhomogeneous spiking models in continuous time and subsequently discretize them. We use these inhomogeneous discrete time models to simulate spikes and then calculate the rescaled ISIs using the exact discrete time model used to generate the spikes in the first place. The goal is to show that even if the exact discrete time generative model is known, the continuous time-rescaling theorem can fail for sufficiently coarse discretizations.

The three models were an inhomogeneous Bernoulli process, a homogeneous Bernoulli process with spike history dependence, and an inhomogeneous Bernoulli process with spike history dependence. In the history-dependent models, the spike probability in bin $k$ depends on $k - k_{ls}$, where $k_{ls}$ is defined as the most recent bin prior to bin $k$ that has a spike in it. We then simulated 10 minutes' worth of spikes for each model and discretization.^{5} After generating the spikes, we then calculated the rescaled times and CDF difference plots according to the continuous time-rescaling theorem.

Figures 3C and 3D show the results for the inhomogeneous Bernoulli model. Comparison with Figure 2 reveals that the main effect of inhomogeneity is to smooth out the steps. The positive (negative) biases at low (high) rescaled times remain, and, as expected, they are smaller for finer temporal discretizations. Figures 3E to 3H show the results when the spike history–dependent term is added to both the homogeneous and inhomogeneous Bernoulli models. The important point is that the biases are worse for both models when spike history effects are included, even though the models are constructed so that the mean firing rate remains 40 Hz. The reason is that the history-dependent term is constructed so that the spike train exhibits burstlike behavior. Specifically, after a short (2 msec) refractory period there is an increased probability of a spike. This increases the number of short ISIs. It also increases the smallest possible rescaled ISI *z* because the probability of a spike goes up immediately following a prior spike, and this ends up shifting distributional weight to short ISIs. This is an important point because it implies that in real experimentally recorded spike trains, which may exhibit burst-type behavior, the bias in the KS plot will be worse than would be expected by a simple estimate based on the mean firing rate, as given in equation 2.13.

### 2.3. Unbiased Discrete Time Rescaling Test Using Model Simulation.

In a previous section, we showed analytically that when discrete time models are used, the rescaled ISIs may not be exponentially distributed even if the model is exactly correct and that this manifests in the KS plot as systematic biases. Our first proposed solution (we present a second in the following section) to the bias problem is not to assume that the rescaled ISIs are exponentially (or uniformly) distributed, but to instead use a procedure similar to bootstrapping. This proceeds by noting that if a candidate model accurately describes the recorded spikes, then the rescaled ISI distribution of the spikes and the rescaled ISI distribution expected by the fitted model should be statistically indistinguishable. If instead the model form is inappropriate to describe the spiking data, then the rescaled ISI distribution expected by the candidate model will not match that of the experimentally recorded spikes, because the model does not describe the recorded spikes accurately. Although the expected distribution of rescaled ISIs is implicitly defined by the fitted model, in practice an explicit analytical form for this distribution may be hard to come by. It can, however, be sampled numerically using the fitted model to generate spikes and rescaling the resulting ISIs as a function of the model used to generate them.^{6}

Specifically, after a candidate model is proposed and fit to the recorded spike train data (any type of candidate model may be used as long as it provides an estimate of the conditional intensity function ), we use the model to simulate spikes, rescale the resulting simulated ISIs, and then use a two-sample KS test to determine if the sample of estimated rescaled ISIs and the sample of experimentally recorded rescaled ISIs are consistent with being drawn from the same underlying distribution (Press et al., 2007). Formally, the null hypothesis of the KS test has been changed from that stated in section 2 and adapted to the case of discrete time data using a model-based approach:

$H_0'$: Given a model of the conditional intensity function that is statistically adequate, the set of experimentally recorded ISIs can be rescaled so that they are distributed in the same manner as a sample of rescaled ISIs generated by the statistical model itself.

There are now two sample sizes: the size of the experimentally recorded rescaled ISI distribution $N_{exp}$, dictated by experiment, and the size of the simulated distribution $N_{sim}$, which we can choose. The two-sample KS test determines the maximum difference between the CDFs of the experimentally recorded rescaled ISIs and the set of rescaled ISIs simulated using the model. If the maximum difference between the two CDFs is less than a certain value, specifically

$$D_{0.05} = 1.36\,\sqrt{\frac{N_{exp} + N_{sim}}{N_{exp}\, N_{sim}}},$$

then the null hypothesis cannot be rejected at the $\alpha = 0.05$ significance level (Press et al., 2007). Alternatively, a differential KS plot (as already discussed) will have 95% confidence bounds of $\pm 1.36\sqrt{(1 + \gamma)/N_{exp}}$, where we have written $\gamma = N_{exp}/N_{sim}$.

Since $N_{exp}$ is fixed by experiment, the test will be strictest (tightest confidence bounds) as $\gamma \to 0$ or, equivalently, as $N_{sim}$ increases. Formally, increasing $N_{sim}$ increases the power of the KS test and reduces the number of false positives (false rejections of the null hypothesis). Fortunately, $N_{sim}$ need not be overly large. Already at $N_{sim} = 20\,N_{exp}$, the confidence bounds are only a factor of $\sqrt{1.05} \approx 1.02$ wider than they would be in the infinite limit ($N_{sim} \to \infty$), in which the exact distribution would be known. This implies that a simulated spike train 20 times longer than the original, experimentally recorded spike train provides a test power close to optimal and is sufficient to approximate the confidence bounds extremely well. In section 3, we use simulated spike trains 100 times the original experimental length ($N_{sim} \approx 100\,N_{exp}$), which widens the confidence bounds by a factor of only $\sqrt{1.01} \approx 1.005$.

Specifically, this technique proceeds as follows:

**Procedure for Numerical Correction**

1. Discretize the spike train into bins of width $\delta$, and fit a discrete time statistical model of the spike per bin probabilities $p_k$.
2. Rescale the experimentally recorded ISIs using equation 2.7 to obtain the set of rescaled ISIs $\{z_{exp}\}$.
3. Use the statistical model to estimate the rescaled ISI distribution: simulate spike trains totaling $N_{sim}/N_{exp}$ times the length of the original experimentally recorded spike train, and rescale the simulated ISIs to obtain $\{z_{sim}\}$.
4. Construct the CDFs of both $\{z_{exp}\}$ and $\{z_{sim}\}$, take their difference, and plot it on the interval $[0, 1]$ with the confidence bounds $\pm 1.36\sqrt{(1 + \gamma)/N_{exp}}$.
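The steps above can be sketched as follows for an illustrative homogeneous Bernoulli model (the helper name and parameters are ours; in real use, the rescaling would sum the fitted time-varying $p_k$):

```python
import numpy as np

def rescaled_isis(spikes01, p):
    """Continuous-style rescaling, z = sum of p over each ISI (homogeneous case: z = L * p)."""
    k = np.flatnonzero(spikes01)
    return np.diff(k) * p

rng = np.random.default_rng(3)
p, nbins = 0.2, 200_000                 # deliberately coarse: e.g., 200 Hz at 1 msec
exp_train = rng.random(nbins) < p       # "experimental" spikes (the model is exact here)
sim_train = rng.random(20 * nbins) < p  # model simulation with N_sim ~ 20 N_exp

z_exp = np.sort(rescaled_isis(exp_train, p))
z_sim = np.sort(rescaled_isis(sim_train, p))

# Two-sample KS distance between the empirical CDFs, evaluated on the pooled sample.
pooled = np.concatenate([z_exp, z_sim])
cdf_exp = np.searchsorted(z_exp, pooled, side="right") / len(z_exp)
cdf_sim = np.searchsorted(z_sim, pooled, side="right") / len(z_sim)
d_two = np.max(np.abs(cdf_exp - cdf_sim))
bound_two = 1.36 * np.sqrt((len(z_exp) + len(z_sim)) / (len(z_exp) * len(z_sim)))

# For contrast: one-sample KS of z_exp against the (wrong) exponential assumption.
b = 1.0 - np.exp(-z_exp)
d_exp = np.max(np.abs(b - (np.arange(1, len(b) + 1) - 0.5) / len(b)))
```

With these settings, the two-sample distance stays small even at a deliberately coarse $p = 0.2$, while the one-sample comparison against the exponential assumption fails by more than an order of magnitude.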

### 2.4. Discrete Time Version of the Time-Rescaling Theorem.

We now prove a discrete time version of the time-rescaling theorem that corrects for both sources of KS plot bias. Specifically, we demonstrate how to define a new rescaled time $z'$ that is exponentially distributed at arbitrary temporal discretization. The proof given here assumes that an underlying continuous time point process is sampled at finite resolution $\delta$.

*Suppose a continuous time point process is sampled at finite resolution so that the observation interval $(0, T]$ is partitioned into bins of width $\delta$. Denote the bin in which the ith spike is located as $k_i$ and that of the next spike as bin $k_{i+1} = k_i + L_i$. Let $p_{k_i+l}$ be the discrete time conditional spike probabilities evaluated in bins $k_i + l$ for $l = 1, \ldots, L_i$. Define the random variable*

$$z'_i = \sum_{l=1}^{L_i - 1} q_{k_i+l} + \xi_i,$$

*where*

$$q_{k_i+l} = -\log\left(1 - p_{k_i+l}\right),$$

*and $\xi_i$ is a random variable determined by first drawing a uniform random variable $r_i \in [0, 1]$ and then calculating*

$$\xi_i = -\log\left(1 - r_i\, p_{k_i+L_i}\right).$$

*Then $z'_i$ has an exponential PDF with unit rate. For clarity of notation, we drop the subscript $i$ in the following proof; it should be taken as implicit*.

Consider the $i$th spike in bin $k$ and the next spike in bin $k + L$. If we knew the underlying continuous time conditional intensity function $\lambda(t)$ and the exact time $t^*$ of the spike in bin $k + L$, then using the continuous time version of the time-rescaling theorem, we could write the probability density of this event as

$$P(t^*) = \lambda(t^*)\, \exp\!\left(-\int_{t_i}^{t^*} \lambda(t)\,dt\right),$$

which is exponentially distributed in the rescaled time $z = \int_{t_i}^{t^*} \lambda(t)\,dt$. Since we know neither $\lambda(t)$ nor $t^*$ precisely, we must recast $z$ in terms of what we do know: the discrete bin-wise probabilities $p_{k+l}$.

First, we note how the $p_{k+l}$'s can be written in terms of the underlying continuous process $\lambda(t)$. Consider any bin $k + l$, spanning the interval $((k+l-1)\delta, (k+l)\delta]$. Since discretization enforces at most one spike per bin, $p_{k+l}$ does not equal the integral of $\lambda(t)$ over the bin, but rather the probability (measured from the start of the bin) that the first spike waiting time is less than $\delta$:

$$p_{k+l} = 1 - \exp\!\left(-\int_{(k+l-1)\delta}^{(k+l)\delta} \lambda(t)\,dt\right).$$

Defining $q_{k+l}$ as the integral of $\lambda(t)$ over bin $k + l$, the $q_{k+l}$ can be written directly in terms of $p_{k+l}$:

$$q_{k+l} = -\log\left(1 - p_{k+l}\right).$$

Within the last bin, $k + L$, we do not know the functional form of $\lambda(t)$; we can pick any form as long as it obeys the constraint that its integral over the bin equals $q_{k+L} = -\log(1 - p_{k+L})$. One simple choice is a constant, $\lambda(t) = q_{k+L}/\delta$.^{7}

It then follows that the rescaled time, now measured from the end of bin $k$, can be written as

$$z' = \sum_{l=1}^{L-1} q_{k+l} + \xi, \qquad \xi = \int_{(k+L-1)\delta}^{t^*} \lambda(t)\,dt,$$

which expresses it entirely in terms of what we know: the $q_{k+l}$'s (implicitly, the $p_{k+l}$'s). We write $z'$ to distinguish it from the rescaled time of the continuous time version of the theorem, which directly sums the $p_{k+l}$ and does not require random sampling of the exact spike time $t^*$.

It remains to draw the exact spike time $t^*$, and hence $\xi$, within bin $k + L$. Given that the spike lies in this bin, the probability density of the first spike waiting time $s$, measured from the start of the bin, is

$$P(s) = \frac{\lambda(s)\, e^{-\int_0^{s} \lambda(s')\,ds'}}{1 - e^{-q_{k+L}}}, \qquad 0 \le s \le \delta. \tag{2.31}$$

The denominator is a normalization obtained by integrating the numerator between 0 and $\delta$.^{8} To draw from this distribution, we integrate it to obtain its CDF, set this CDF equal to a uniform random variable $r$, and then solve for $s$. With the constant choice $\lambda = q_{k+L}/\delta$, the corresponding contribution to the rescaled time is

$$\xi = \frac{q_{k+L}}{\delta}\, s = -\log\left(1 - r\, p_{k+L}\right).$$

This completes the proof.^{9}

There are two differences between the discrete and continuous versions of the theorem. The first is that the quantities summed are $q_{k+l} = -\log(1 - p_{k+l})$ rather than $p_{k+l}$ (or $\lambda\delta$). The second difference is that we randomly choose an exact spike time according to the distribution given in equation 2.31. This is done because there is no information about where exactly in bin $k + L$ the spike is located, and for the rescaled time to be exponentially distributed, it must be continuously valued. In the continuous time limit, both of these distinctions vanish.

The hypothesis for testing goodness of fit is now exactly the same as that of the original time-rescaling theorem, except that the rescaling is modified to take into account the discretization. Reintroducing the subscript *i* to denote the individual spike times *k _{i}*, the procedure for performing the KS test is simply described.

**Procedure for Analytical Correction**

1. Discretize the spike train into bins of width $\delta$, with the spikes located in bins $k_i$, and fit a discrete time statistical model resulting in the spike per bin probabilities $p_k$.
2. Compute $q_k = -\log(1 - p_k)$ and, for each ISI, draw a uniform random variable $r_i$ and form the rescaled time $z'_i = \sum_{l=1}^{L_i - 1} q_{k_i+l} - \log(1 - r_i\, p_{k_i+L_i})$.
3. Transform to $b_i = 1 - e^{-z'_i}$, sort into ascending order, and compare with the uniform CDF via a KS plot with 95% confidence bounds $\pm 1.36/\sqrt{N}$.
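A compact sketch of this rescaling (our variable names; a homogeneous model with a deliberately coarse $p = 0.3$ is used purely for illustration):

```python
import numpy as np

def discrete_rescaled_isis(spike_bins, p, rng):
    """Discrete time rescaling: z' = sum_{l=1}^{L-1} q_{k+l} + xi,
    with q_k = -log(1 - p_k) and xi = -log(1 - r * p_{k+L}), r ~ U(0, 1)."""
    q = -np.log(1.0 - p)
    z = np.empty(len(spike_bins) - 1)
    for i, (k0, k1) in enumerate(zip(spike_bins[:-1], spike_bins[1:])):
        partial = q[k0 + 1:k1].sum()                  # the L-1 whole bins between spikes
        xi = -np.log(1.0 - rng.uniform() * p[k1])     # random position within the last bin
        z[i] = partial + xi
    return z

rng = np.random.default_rng(4)
p = np.full(100_000, 0.3)          # deliberately coarse discretization
train = rng.random(len(p)) < p
bins = np.flatnonzero(train)

z = discrete_rescaled_isis(bins, p, rng)
b = np.sort(1.0 - np.exp(-z))
n = len(b)
ks = np.max(np.abs(b - (np.arange(1, n + 1) - 0.5) / n))
```

Even at this very coarse discretization, the corrected rescaled times pass the KS comparison against the uniform distribution, whereas the naive rescaling $z = Lp$ would fail badly.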

## 3. Results

In this section we fit GLMs to spike trains both simulated and experimentally recorded in awake monkey V1 cortex during visual stimulation. We then check goodness of fit using both the standard KS test and our methods. We demonstrate dramatic and nearly identical improvement in KS test accuracy for both techniques. Although we emphasize that any discrete time statistical model may be used, we chose a GLM, specifically the logistic regression (logit link function) form, because of the discrete binary nature of spike train data. Standard linear regression assumes continuous variables and is therefore inappropriate for the problem. Further reasons for using GLMs are their already wide application to the analysis of neural spiking activity (Frank et al., 2002; Truccolo et al., 2005; Czanner et al., 2008; Paninski, 2004a; Kass & Ventura, 2001), their optimality properties (Pawitan, 2001), and the ease of fitting them via maximum likelihood. Methods for fitting GLMs exist in most statistical packages, including Matlab and R.

### 3.1. Simulated Data.

Using the three continuous time point process models of the previous section (inhomogeneous Poisson, homogeneous Poisson with spike history dependence, and inhomogeneous Poisson with spike history dependence), we simulated 10 minutes of spikes from each model at very fine 10^{−10} msec discretization, essentially continuous time. These spike trains are the experimental data. We emphasize that all of our simulated data used a realistic mean firing rate of 40 Hz, and that many experimental situations exist for which the mean firing rates are much higher (De Valois et al., 1982; MacEvoy et al., 2007). The spikes were then discretized into 1 msec bins, and a GLM was fit to each simulated spike train. This procedure mimics the usual approach taken in fitting a GLM to real data. We used a logistic regression type GLM (logit link function) appropriate for discrete time binary data. Each model's spike train was fit using one of the following GLM forms:

- $\mathrm{logit}(p_k) = \sum_j \theta_j\, B_j(k)$ (inhomogeneous model)
- $\mathrm{logit}(p_k) = \theta_0 + \sum_{r=1}^{R} \gamma_r\, g(k - r)$ (homogeneous model with spike history)
- $\mathrm{logit}(p_k) = \sum_j \theta_j\, B_j(k) + \sum_{r=1}^{R} \gamma_r\, g(k - r)$ (inhomogeneous model with spike history)

The $B_j(k)$ are periodic B-spline basis functions with knots spaced 50 msec apart. These are continuously defined (even though we use discrete time bins), temporally localized basis functions, similar in shape to gaussians, which can be computed recursively (Wasserman, 2007). Their use allows for a PSTH-like fit, but one that is cubic polynomial smooth. The *g*(*k*−*r*) are indicator functions equal to 1 if the most recent spike is *r* bins prior to *k* and 0 otherwise. This functional form is standard (Truccolo et al., 2005; Czanner et al., 2008). The $\theta_j$ and $\gamma_r$ are parameters to be fit via maximum likelihood.
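For readers who want to see the mechanics, the following is a minimal, illustrative logistic GLM fit via iteratively reweighted least squares on simulated data. It is not the authors' code: the B-spline terms are replaced by a bare intercept for brevity, and only the history indicators $g(k - r)$ are included. In practice, one would use the GLM routines in Matlab or R mentioned above.

```python
import numpy as np

def fit_logistic_irls(X, y, iters=25):
    """Maximum likelihood logistic regression via iteratively reweighted least squares."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        eta = X @ w
        mu = 1.0 / (1.0 + np.exp(-eta))          # predicted spike probability per bin
        W = mu * (1.0 - mu)
        # Newton step: solve (X^T W X) dw = X^T (y - mu)
        w += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - mu))
    return w

rng = np.random.default_rng(5)
nbins, p_true = 50_000, 0.05
y = (rng.random(nbins) < p_true).astype(float)

# Design matrix: intercept plus history indicators g(k - r), r = 1..3
# (1 if the most recent spike was exactly r bins in the past).
X = np.ones((nbins, 4))
last = -10
for k in range(nbins):
    for r in (1, 2, 3):
        X[k, r] = 1.0 if last == k - r else 0.0
    if y[k]:
        last = k

w = fit_logistic_irls(X, y)
p_hat = 1.0 / (1.0 + np.exp(-w[0]))   # baseline per bin probability (no recent spike)
```

Because the simulated spikes here are history-independent, the fitted history weights should be near zero and the intercept should recover the true per bin probability.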

Next, following the first procedure described in section 2, we used the fitted GLM to simulate 100 ten-minute spike trains, rescaled both the experimental and simulated ISIs, and constructed both the KS and CDF difference plots. The results are shown in Figure 4, where the blue lines correspond to the comparison of the experimental CDF with the uniform CDF, and the red lines to the comparison of the experimental CDF with the CDF estimated from the GLM, as described in section 2.3. For all three models, the differential KS plots reveal strong biases when the experimental rescaled ISIs are compared with the uniform distribution and a complete elimination of the bias when the distribution simulated from the GLM is used. Further, use of the GLM simulated distribution makes the difference between the differential KS plot lying within or outside the 95% confidence bounds. This was true even when spike history effects were included and the KS plot biases were much worse than in their absence. Finally, we applied the analytical discrete time-rescaling theorem described in section 2.4 and plotted the results in green. The analytically corrected differential KS plot is nearly identical to the numerically simulated one. This indicates that the analytical correction, which is simpler to apply, is sufficient to test model goodness of fit.

### 3.2. Monkey V1 Receptive Field Data.

Next we used spiking data recorded in V1 of two adult female rhesus monkeys (*Macaca mulatta*) during a fixation task. (See appendix B for details on the experimental procedure.) The visual stimuli consisted of a high-contrast light bar (50 cd/m^{2}; bar width, 0.2° or 5 pixels) moving with a constant velocity (425 pixels/s). The bar was presented in a square aperture (600 × 600 pixels) centered over the receptive fields of the neurons being recorded. During stimulus presentation, the monkey was required to maintain fixation within a virtual window centered on the fixation point.

Here *x _{j}* represents the *j*th covariate encoding the stimulus input (which may take the form of a feature of the visual stimulus, a PSTH profile, or a specific basis function). The *g*(*k*−*r*) is, as in the previous section, an indicator function representing whether the most recent spike was in the past *r*th temporal window, and α _{r} represents the associated weight coefficient (a negative α _{r} implies an inhibitory effect that might account for the refractory period of neuronal firing, and a positive α _{r} implies an excitatory effect). The first term on the right-hand side of equation 2.1 is summarized by a stimulus-dependent response factor, and the last term represents a spiking history–dependent factor. *J* denotes the number of knots, or control points. Note that the value of each control point affects the shape of the fit only locally, owing to the piecewise definition. For the data shown here, 12 control points are nonevenly placed on the 2 s time interval.

As with our simulated data, we see in Figure 5 that when the standard KS test was used, the KS and differential KS plots lay outside the 95% confidence bounds. However, when temporal discretization was taken into account and our two techniques were used, the plots lay well within the confidence bounds, and the GLM was shown to fit the data very well. Thus, again, the simple analytical method is found to be sufficient to account for discretization-induced KS plot bias.

## 4. Discussion

It is vital to check a model's goodness of fit before making inferences from it. The time-rescaling theorem provides a powerful yet simple-to-implement statistical test applicable to a spike train, or other point process, for which the data are binary rather than continuously valued. The theorem states that the ISIs of a continuous time point process can be rescaled (through a variable transformation) so that they are independent and exponentially distributed. The rescaled ISIs can then be compared to the exponential distribution using a KS test or further rescaled to a uniform distribution and the KS test performed graphically (Brown et al., 2001). Each ISI is rescaled as a function of the time-varying spike probability over that particular ISI. Thus, time rescaling considers the probabilities of individual ISIs and provides a much stronger statistical test than, for example, tests based on the unscaled ISI distribution. Practical numerical considerations dictate that the fitting of a statistical model usually requires the discretization of time into bins. For the purposes of the time-rescaling theorem, if the spike rate is low, ISIs long, and the probability per bin of a spike small, the distinction between discrete and continuous time will often not be important. In this article, we addressed the case where the spike rate is high, ISIs short, and the probability per bin of a spike large so that the distinction between discrete and continuous time matters.
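The continuous time statement can be checked numerically. The sketch below (rate function and constants are our own toy choices) draws an inhomogeneous Poisson train by Lewis–Shedler thinning, rescales each ISI by the integrated intensity, and compares the result to the uniform distribution:

```python
import numpy as np

rng = np.random.default_rng(2)

T, lam_max = 600.0, 60.0
lam = lambda t: 40.0 + 20.0 * np.sin(2.0 * np.pi * t)   # rate in Hz (toy choice)
Lam = lambda t: 40.0 * t - (20.0 / (2.0 * np.pi)) * np.cos(2.0 * np.pi * t)  # its integral

# Thinning: an exact continuous-time draw from the inhomogeneous Poisson process.
n = rng.poisson(lam_max * T)
cand = np.sort(rng.uniform(0.0, T, n))
spikes = cand[rng.random(n) < lam(cand) / lam_max]

# Time rescaling: tau_i is the integral of lam over the i-th ISI;
# z = 1 - exp(-tau) is uniform on [0, 1) when the model is exact.
tau = np.diff(Lam(spikes))
z = np.sort(1.0 - np.exp(-tau))

# One-sample KS statistic against the uniform CDF; compare with 1.36/sqrt(N).
F = np.arange(1, z.size + 1) / z.size
D = np.abs(F - z).max()
```

Because the spike times here are genuinely continuous, *D* respects the standard KS bound; the discrete time complications arise only once these events are binned.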

When the probability per time bin of a spike is not sufficiently small, the standard, continuous time KS plot exhibits biases at both low and high rescaled ISIs. The source of these biases is twofold and originates in the consequences of discretizing a continuous time point process. First, the uncertainty as to where exactly in a bin a spike is located causes discrete time models to place a lower bound on the size of the smallest rescaled ISI *z*. This leads to a positive KS plot bias at low *z*. Second, because discrete binary models allow only for a single spike per bin, they estimate per bin spike probabilities *p _{k}* that are less than the integral of the conditional intensity over bin *k*. We demonstrated both of these points theoretically using a homogeneous Poisson process, which we discretized into a homogeneous Bernoulli process, and also in our proof of the discrete time version of the theorem. These biases can be numerically relevant even at moderate spike rates and reasonable temporal discretizations. In this article, we considered mainly 40 Hz spiking at 1 msec discretization (*p*=0.04), but under some neurophysiological conditions, the spike rate can be much higher. For example, the awake monkey data presented in section 2 exhibited firing rates that at times exceeded 100 Hz.
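The low-*z* bias is easy to reproduce. In this sketch (our own toy code), an exactly correct homogeneous Bernoulli model is rescaled with the naive continuous time formula; the complete absence of rescaled ISIs below 1 − e^{−*p*} alone pushes the KS statistic far outside its bound:

```python
import numpy as np

rng = np.random.default_rng(3)

# Homogeneous Bernoulli train: 1 ms bins, p = 0.04 per bin (~40 Hz).
p, n_bins = 0.04, 600_000
spike_bins = np.flatnonzero(rng.random(n_bins) < p)

# Naive continuous-time rescaling: integrate a constant rate lam*dt ~ p
# over whole bins, ignoring the discretization entirely.
tau = np.diff(spike_bins) * p
z = np.sort(1.0 - np.exp(-tau))

F = np.arange(1, z.size + 1) / z.size
D = np.abs(F - z).max()
bound = 1.36 / np.sqrt(z.size)   # 95% KS bound
# The model is exactly correct, yet D exceeds the bound: z has no mass
# below 1 - exp(-p) ~ 0.039, while the uniform CDF rises immediately.
```

Note that the failure is deterministic in character: every ISI spans at least one full bin, so the smallest possible rescaled value is 1 − e^{−*p*} ≈ 0.039, several times the width of the confidence band for a 10 minute recording.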

Under such conditions, KS plots will exhibit biases at both low and high rescaled ISIs, which cannot be removed through more accurate numerical integration techniques or increased data sampling. In fact, sampling a longer spike train will make the issue more critical because the 95% confidence bounds on the KS plot scale as 1.36/√*N _{exp}*, where *N _{exp}* is the number of experimentally recorded ISIs. In cases of long recording times, the confidence bounds can be quite tight, and it can be difficult to see variations in the fit using the standard KS plot even if those variations are statistically significant. We therefore introduced a new type of plot, the differential KS plot, in which we plot the difference between the CDFs of the empirical and simulated ISI distributions along with analytical 95% confidence bounds. This new type of plot displays the same information as the original KS plot but in a more visually accessible manner.
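Constructing a differential KS plot requires only the CDF difference and the flat ±1.36/√*N* lines; a minimal sketch (with synthetic uniform rescaled ISIs standing in for a well-fit model) is:

```python
import numpy as np

rng = np.random.default_rng(4)

# Rescaled ISIs from a perfectly fit model are uniform on [0, 1).
z = np.sort(rng.random(20_000))
F_emp = np.arange(1, z.size + 1) / z.size

# Differential KS plot: empirical CDF minus model CDF, plotted against z,
# together with flat analytic 95% bounds at +/- 1.36/sqrt(N).
diff = F_emp - z
bound = 1.36 / np.sqrt(z.size)
inside = bool(np.abs(diff).max() < bound)   # expected True ~95% of the time
```

Deviations of the curve `diff` from zero are read directly at the scale of the band, which is what makes small but significant misfits visible.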

To handle KS plot bias, we proposed and implemented two procedures, both capable of testing the statistical sufficiency of any model that provides a measure of the discrete time conditional intensity function. The first procedure operates purely in discrete time and uses numerical simulation, in a manner similar in spirit to a bootstrap, to estimate the distribution of rescaled ISIs directly from a fitted statistical model. Model goodness of fit is tested by comparing the estimated and experimentally recorded rescaled ISI distributions using a KS test. The confidence bounds on this two-sample KS test scale as 1.36(1/*N _{exp}* + 1/*N _{sim}*)^{1/2}, where *N _{sim}* is the number of simulated ISIs. This procedure is therefore computationally tractable because a simulated spike train 20 times longer than the original experimentally recorded spike train will result in a KS test with confidence bounds only 1.02 times as wide as if the exact rescaled ISI distribution were known. For the second technique, we presented and proved a discrete time version of the time-rescaling theorem. This presumes an underlying continuous time point process that is sampled at finite resolution Δ, analytically corrects for the discretization, and defines a rescaled time that is exponentially distributed at arbitrary temporal discretizations. We applied these two techniques to both simulated spike trains and spike trains recorded in awake monkey V1 cortex and demonstrated an elimination of KS plot bias when our techniques were used. The performance of both techniques was nearly identical, revealing high goodness of fit even when the fitted model failed the standard continuous time application of the KS test. Therefore, either method might be used, although the analytical method is perhaps preferable, if only because it is quicker to compute.
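The analytical correction admits a compact implementation. The sketch below is our own realization of the idea, not the article's code: with *q _{k}* = −log(1 − *p _{k}*), each rescaled ISI sums *q* over the bins strictly between spikes and adds, for the spiking bin itself, a draw from an exponential truncated to [0, *q*]; the resulting rescaled time is exponentially distributed when the model is correct.

```python
import numpy as np

rng = np.random.default_rng(5)

def discrete_rescale(spike_bins, p):
    """Discretely corrected rescaling: sum q_k = -log(1 - p_k) over the bins
    strictly inside each ISI, then add a truncated-exponential draw on
    [0, q] for the bin containing the spike."""
    q = -np.log1p(-p)
    Q = np.concatenate(([0.0], np.cumsum(q)))
    a, b = spike_bins[:-1], spike_bins[1:]
    inner = Q[b] - Q[a + 1]                      # sum of q over bins a+1 .. b-1
    u = rng.random(b.size)
    delta = -np.log1p(-u * (1.0 - np.exp(-q[b])))  # exponential truncated to [0, q_b]
    return inner + delta                         # ~ Exp(1) if the model is correct

# Exactly correct homogeneous Bernoulli "model": 40 Hz at 1 ms bins.
p = np.full(600_000, 0.04)
bins = np.flatnonzero(rng.random(p.size) < p)
xi = discrete_rescale(bins, p)
z = np.sort(1.0 - np.exp(-xi))                   # now uniform on [0, 1)

F = np.arange(1, z.size + 1) / z.size
D = np.abs(F - z).max()                          # compare with 1.36/sqrt(N)
```

For a homogeneous Bernoulli process one can verify directly that (m − 1)q plus the truncated draw is Exp(1) for a geometric ISI of m bins, so the same data that failed the naive rescaling now pass the KS test.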

The discrete time-rescaling theorem is appropriate for point process type data such as spike trains, which are equally well described by either their spike times or their interspike intervals. It is, however, a test of model sufficiency, namely, whether a proposed statistical model is sufficient to describe the data. It does not, in and of itself, address issues of model complexity (overfitting) or whether the model form chosen is appropriate for describing the data in the first place.^{10} Overfitting can be guarded against by splitting one's data into training and test data. After fitting the model parameters using the training data, the fitted model and the discrete time-rescaling theorem can be applied to the test data. Of course, we do not mean to imply that the discrete time-rescaling theorem is the only statistical test that should be employed for selecting and validating an appropriate model. Other statistical tests and criteria—for example, log likelihood ratio tests and the Akaike and Bayesian information criteria—should also be employed to rigorously judge goodness of fit and model complexity.

One might reasonably ask, Why not simply fit a statistical model with extremely fine temporal discretization so that the time-rescaling theorem applies in its standard form? There are several issues. First, spikes are not instantaneous events but are spread out in time on the order of 1 msec or slightly less. Second, experimenters often exclude apparent spikes that occur less than a msec (or thereabouts) apart in a recording, as it is difficult to distinguish spike wave forms that essentially lie on top of each other. For both of these reasons, defining spikes as instantaneous events is physically problematic. Although the continuous time point process framework is theoretically appealing, there is usually no reason not to consider the data in discrete time, fit a discrete time model, and perform a discrete time goodness-of-fit test. Finally, there is the important issue of computation time and computer memory. When recording times are long and the number of spikes large, confidence bounds on the KS test will be very tight. Extremely fine temporal discretization will then be required for the biases to be less than the width of the confidence bounds. The amount of memory and computation time required under these conditions can rapidly become prohibitive. Further, since using the discrete time-rescaling theorem is almost as quick and simple a procedure as the standard KS test, we can see no reason not to use it. In closing, a failure of the standard KS test does not immediately imply poor model fit. Biases induced by temporal discretization may be a factor and should be considered before rejecting the model.

## Appendix A: Independence of Rescaled Times

Because the mapping from the spike times *t _{i}* (or the spike bins *k _{i}*) to the rescaled times is one-to-one, the corresponding events are equivalent. The joint CDF of the rescaled times therefore factors into the product of the individual CDFs, where the last step follows from the multiplication rule of probability (Miller, 2006).

## Appendix B: Experimental Procedures

Experimental procedures were approved by the National Committee on Animal Welfare (Regierungspraesidium Hessen, Darmstadt) in compliance with the guidelines of the European Community for the care and use of laboratory animals (European Union directive 86/609/EEC). Neuronal spiking activity was recorded in awake, head-fixed monkeys in the opercular region of V1 (RF centers, 2–5° of eccentricity) and, on some occasions, from the superior bank of the calcarine sulcus (8–12° of eccentricity).

Quartz-insulated tungsten-platinum electrodes (diameter 80 μm, 0.3–1.0 MΩ impedance; Thomas Recording) were used to record extracellular activity from three to five sites in both superficial and deep layers of the striate cortex (digitally bandpass filtered, 0.7–6.0 kHz; Plexon Inc.). Spikes were detected by amplitude thresholding, which was set interactively based on online visualization of the spike waveforms (typically, 2–3 SD above the noise level). Trials with artifacts, and trials during which the monkey did not maintain fixation or responded incorrectly or not at all, were rejected.

## Acknowledgments

We thank Sergio Neuenschwander and Bruss Lima for the generous use of their data. This work was supported by NIH grants K25 NS052422-02, DP1 OD003646-01, MH59733-07, the Hertie Foundation, the Max Planck Society, and EU grant FP6-2005-NEST-Path-043309.

## Notes

^{1}

This statement applies if one considers the spike as an event, as we do here. If one instead is interested in the shape and timing of the spike waveform—for example, the exact time of the waveform peak—then submillisecond temporal resolutions may be physically relevant.

^{2}

The per bin spike probability is *p _{k}* = ⟨λ(*t*)⟩ _{k}Δ, where the average is taken over the time bin *k*. This definition holds only when the bin size Δ is very small. We will show that for moderately sized bins, *p _{k}* is less than the integral of the conditional intensity over the bin, and that this leads to biases in the KS plot.

^{4}

Setting and , , when the limit is taken.

^{5}

For the spike history–dependent models, the generation of a spike in bin *k* modifies the firing probabilities in subsequent bins. Thus, the simulation proceeded bin by bin, and on generation of a spike, the firing probabilities in the following bins were updated according to equation 2.18 before generating the next observation (spike or no spike) in bin *k*+1.

^{6}

Most generally, the conditional intensity function will have the form λ(*t _{k}* | *x*(*t _{k}*), *H*(*t _{k}*)), where *x*(*t _{k}*) is the set of time-varying external covariates and *H*(*t _{k}*) is the previous spiking history. As the originally recorded spike train was of length *T* and the external covariates were defined over this time interval, it is simplest to simulate multiple spike trains of the length of the original recording time *T*. For each spike train simulation, *x*(*t _{k}*) remains the same, but *H*(*t _{k}*) will differ depending on the exact spike times.

^{7}

In fact, any form for λ(*t*) within bin *k* + *L* could be chosen. Choosing it to be constant merely allows easier random sampling.

^{8}

Or Bayes' rule could be used.

^{9}

Since the spike's location within its bin is chosen randomly, rescaling will give slightly different results if performed multiple times. For all results presented in this article, such variation was negligible when considered at the scale of the KS plot's 95% confidence bounds. Further, 95% confidence bounds on the variability can be calculated analytically for a discretized homogeneous Poisson process; these bounds are always smaller than those of a two-sample KS test.

^{10}

In this article, we used GLMs. Such models are widely applied to the analysis of spike train data. They also have an interpretation as a sort of integrate-and-fire neuron (see, e.g., Paninski, 2004b). However, nothing in the discrete time-rescaling theorem precludes its use for testing the fit of a statistical model of the spiking probability that is not a GLM.