## Abstract

Statistical models of neural activity are integral to modern neuroscience. Recently interest has grown in modeling the spiking activity of populations of simultaneously recorded neurons to study the effects of correlations and functional connectivity on neural information processing. However, any statistical model must be validated by an appropriate goodness-of-fit test. Kolmogorov-Smirnov tests based on the time-rescaling theorem have proven to be useful for evaluating point-process-based statistical models of single-neuron spike trains. Here we discuss the extension of the time-rescaling theorem to the multivariate (neural population) case. We show that even in the presence of strong correlations between spike trains, models that neglect couplings between neurons can be erroneously passed by the univariate time-rescaling test. We present the multivariate version of the time-rescaling theorem and provide a practical step-by-step procedure for applying it to testing the sufficiency of neural population models. Using several simple analytically tractable models and more complex simulated and real data sets, we demonstrate that important features of the population activity can be detected only using the multivariate extension of the test.

## 1. Introduction

The strongly recurrent topologies of neuronal networks couple the activity of their component neurons together. Such coupling often manifests in experimentally observable correlations between the spiking of individual neurons. However, the importance of these correlations, and neuronal coupling in general, for information processing is still strongly debated. A possible first step toward understanding the computational role of neuronal correlations is to define and fit statistical models of neural population spiking that include the effect of possible mutual dependencies among neurons. After effectively characterizing the population activity through such a model, one can then proceed to ask whether these dependencies are in fact important for information processing or neural computation, or both. Numerous models and techniques for fitting them exist; however, all models must be checked for goodness of fit. In the case of neural population models, it is necessary for the activity of the population, not merely the activity of the individual neurons, to be accurately described. This is important because even if the statistics of the individual neurons are accurately described, this does not necessarily imply that the collective activity of the population is. The goal of this letter is to present a practical statistical test applicable to population models of neural spiking.

A goodness-of-fit measure must be geared toward both the model being tested and the data to which the model is being fit. In the case of population spiking activity, the data may be effectively represented as a set of point processes (binarized spike trains). Two main modeling approaches for such data have recently become popular in the neuroscience community. In the first, one directly models the joint distribution of spike patterns across neurons. An example of this approach is the pairwise Ising (maximum entropy) model first proposed for the analysis of population activity by Martignon et al. (2000) and others (Schneidman, Berry, Segev, & Bialek, 2006). Using this technique, they were able to demonstrate that significant pairwise correlations are present in at least some neuronal populations. Ising models treat the neuronal population activity as stationary in time (Tang et al., 2008), what is modeled is a time-independent joint probability distribution of the many-neuron spiking pattern. However, neural population activity is dynamic and highly variable with time, and a classical Ising model cannot capture this.^{1}

A second approach, which is capable of modeling dynamic changes in spiking probability at arbitrarily fine temporal scales, is the point-process-based generalized linear model (GLM) technique promoted independently by Brown, Paninski, Pillow, and others (Brown, Nguyen, Frank, Wilson, & Solo, 2001; Brown, Barbieri, Eden, & Frank, 2003; Paninski, 2003; Kass, Ventura, & Brown, 2005; Truccolo, Eden, Fellows, Donoghue, & Brown, 2005; Pillow, 2007). This method models the time-varying spiking probability (conditional intensity function) of each neuron in the population independently but conditioned on the past history of all spikes in the neuronal population. The assumption being made is one of causality: due to the time delays involved in the synaptic transmission of information, neurons interact with some slight time lag.^{2} Thus, although the model is independently defined and fit to the spikes of each neuron, the collection of models describing all neurons still constitutes a population model. This approach has also been used to characterize pairwise dependencies in neuronal populations and demonstrate their importance for describing the spike statistics of individual neurons (see, e.g., Pillow et al., 2008). But what about the importance of the correlations for the collective spike statistics of the population?

This is the issue addressed by this letter: How can one determine how accurately a set of single-neuron models describes the population statistics? We describe and present a practical methodology for such a statistical test, based on a multivariate form of the time-rescaling theorem for point processes. In its univariate form, the time-rescaling theorem states that if the conditional intensity function of a point process is known, then its interspike intervals (ISIs) can be transformed, or rescaled, as an integral over that conditional intensity function so that they are independent and exponentially distributed. In contrast, the multivariate form of the theorem states that the ISIs of a population of point processes can be rescaled so that not only are each neuron's rescaled ISIs exponentially distributed and independent of each other but they are also independent of all other neurons’ rescaled ISIs.

We propose a step-by-step procedure for turning the multivariate version of the time-rescaling theorem into a goodness-of-fit measure for point-process-based population models. We demonstrate that even if each neuron's own spiking activity is sufficiently described by its conditional intensity function, the collective activity of the population may not be and that the distinction between the two cases can be made by applying the multivariate test. Toward this end, we apply the tests derived from both the univariate and multivariate time-rescaling theorem to simplified toy models, more complicated many-neuron models, and also spikes from real neuronal populations recorded in awake monkey V1 during natural scenes stimulation. Thus, the multivariate extension of the time-rescaling theorem provides a means for more accurately detecting situations in which couplings between neurons significantly modulate the population dynamics and are likely to be of importance for neuronal information processing.

## 2. Methods

### 2.1. Point-Process-Based Neural Models.

*H*and may be formally written as where

_{t}*N*(

*t*) is the counting function of the point process, stating the number of events inside the time interval [0,

*t*). If known exactly, the conditional intensity completely characterizes the statistical properties of the individual point process under weak assumptions. In practice, one generally hypothesizes a parametric form (which may be a function of stimuli, the neurons’ own spiking history, and so on) and fits it to the recorded spikes, generally using maximum likelihood estimation.

^{3}This approach has become standard for simultaneously quantifying the influence of multiple external covariates (such as stimuli) and also the neuron's own spike history on the output spikes (Brillinger, 1988; Chornoboy, Schramm, & Karr, 1988; Kass & Ventura, 2001; Brown et al., 2003; Paninski, 2003; Truccolo et al., 2005; Okatan, Wilson, & Brown, 2005; Kass et al., 2005; Rigat, de Gunst, & van Pelt, 2006; Pillow, 2007; Stevenson, Rebesco, Miller, & Körding, 2008; Pillow et al., 2008; Czanner et al., 2008; Wojcik, Mochol, Jakuczan, Wypych, & Waleszczyk, 2009).

*K*neurons in the population: Models that describe activity of only a single neuron are referred to as univariate models. In contrast, models that attempt to explain the firing of a whole population of neurons are referred to as multivariate models. This distinction is irrespective of whether the models make use of only the neuron's own spiking activity

*H*or that of other neurons or external variables.

_{t}### 2.2. The Univariate Time-Rescaling Theorem.

Regardless of whether a univariate or multivariate model is being considered, it must be tested for compatibility with the observed data. One possible test applicable to point process models, such as spike trains, is based on the time-rescaling theorem. This theorem and the associated test were introduced by Brown et al. to the neuroscientific community to evaluate univariate models, that is, models of single-neuron spike trains (Brown, Barbieri, Ventura, Kass, & Frank, 2002).^{4} The theorem is a special case of the theorems discussed by Meyer (1971) and Papangelou (1972) and has been applied to point process models in other scientific areas since then, for example, seismology (Ogata, 1988). We now briefly review the univariate version of the time-rescaling theorem before presenting the multivariate version and describing a new procedure for using the theorem to test multivariate models.

^{5}This rescaling takes the form where is the set of spike times and is the conditional intensity function of the single neuron that is to be modeled. Several proofs of this theorem exist, and we refer readers to Brown et al. (2002) for details. Although the can be compared to the exponential distribution, it is useful to note that a second transformation will rescale the ISIs to a uniform distribution: General practice is to sort the transformed variables

*z*into ascending order and plot them along the

_{n}*y*-axis versus the uniform grid of values , where

*N*is the number of interspike intervals and . If the transformed variables

*z*are indeed uniformly distributed, then the points should lie along the 45 degree line. Analytically defined confidence bounds for the maximal deviance from the diagonal can be determined and for even small sample sizes (

_{n}*N*>35) are well approximated by for the 95% confidence level (Massey, 1951; Brown et al., 2003). Essentially the cumulative density function (CDF) of

*z*s is being plotted against the CDF of the uniform distribution (the

_{n}*b*s). It can be thought of as a visualization of a Kolmogorov-Smirnov (KS) test, which compares two CDFs and is usually referred to as a KS plot. In practice, the residual

_{n}*z*−

_{n}*b*is plotted against

_{n}*b*. For this differential KS plot, the points should be close to the

_{n}*x*-axis inside the horizontal confidence bounds . (See Figure 1B for an example of such a plot.)

Beyond being uniformly distributed, the *z _{n}*s also have to be independent of each other. Since the independence of the quantities of a time series is hard to estimate, analysis is usually constrained to look for dependencies between subsequent intervals via a scatter plot or inspecting the autocorrelation structure of interspike interval lengths (Truccolo et al., 2005). For a scatter plot,

*z*

_{n+1}is plotted against

*z*, and in case of independence, the points should uniformly fill the unit area.

_{n}Any of these tests is designed to detect a specific departure of the rescaled times from the Poisson structure. Passing all tests is no guarantee that the rescaled times are indeed Poissonian. For example, noting that the (linear) autocorrelation between intervals up to a certain lag is not significantly different from zero implies neither uncorrelatedness for arbitrary lags nor independence, since uncorrelatedness is a weaker condition than independence; hence, intervals could be uncorrelated but still dependent (see the recent developments on copula models by Onken, Grünewälder, Munk, & Obermayer, 2009, as an example of neuroscientific relevance).

The KS test that is used to test the exponentiality of the rescaled intervals is adapted from the original work of Brown et al. (2002). However, we note that the KS test is not the only one suitable for comparing a set of samples with a proposed theoretical distribution (for a review, see Cox, 1955; Cox & Lewis, 1966; Naus, 1979). One possibility is the test, which, however, requires more samples and is less powerful than the KS test (Lilliefors, 1969; Massey, 1951). An alternative would be to plot confidence bounds based on the probability of individual data points to deviate from the diagonal, rather than testing the maximal deviance as in the KS test. More alternatives have been formulated, for example, by using the Anderson-Darling (Anderson & Darling, 1952), Kuiper (1962), or Cramer–von Mises statistics (Anderson, 1962). One may ask why there is such a variety of different tests. The answer is simple: There are many alternative hypotheses, against the null hypothesis, which states that the samples come from a given distribution. Generally, possible deviances are even more extensive for the hypothesis of a Poisson process. Each test is particularly sensitive to certain alternative hypotheses. A goodness-of-fit procedure should reject proposed models across a broad range of possible deviances from the true model. For this purpose, the KS statistics have been proven to be robust. If, however, the expected shape of the deviance is known, other tests than the KS test might be more powerful (Daley & Vere-Jones, 2002).

The time-rescaling transformation is turned into a goodness-of-fit measure by designing statistical tests that allow the rejection of the null hypothesis that the data from the single neuron were generated by the proposed model. However, if we are recording from multiple neurons, a more general version of the theorem applies, and more rigorous tests can be performed.

### 2.3. The Multivariate Time-Rescaling Theorem.

Within the conditional intensity function formalism, modeling the activity of several neurons at the same time amounts to modeling the conditional intensity functions of each neuron. A conceivable strategy to test such a population model would be to apply the univariate time-rescaling theorem to each of the neurons separately and reject the model if any of the individual tests fails (Truccolo et al., 2005; Rigat et al., 2006). However, this strategy may be insufficient since any test of the collective statistics must take into account the correlations across spike trains. A simple example serves to make this clear. Consider a unit rate Poissonian spike train and a second spike train that is an exact copy of the first one, except that all spikes are shifted in time by a constant lag. Each of the spike trains can obviously be individually described as a unit rate Poisson process, and such a population model would pass the univariate version of the time-rescaling test (apart from false rejections that are controlled by the significance level of the test). This description strikingly misses the strong temporally lagged correlation between the two spike trains. The population model is, in short, wrong. A refined version of this example is more thoroughly discussed in section 3.1.

Without a test that depends on the collective statistics, it is impossible to determine if all interactions are correctly described, whether or not the entire population is included in the conditioning. Since the univariate time-rescaling theorem is not sufficient to test population models, it needs to be generalized. How does the time-rescaling theorem extend to the multivariate case? As stated in the original work of Meyer (1971) and Papangelou (1972), each rescaled point process in the population should be a unit-rate Poisson train that is independent of all of the other rescaled point processes.^{6} The independence between the rescaled processes of different neurons is a nontrivial outcome of the generalized theorems. To our knowledge, this generalized theorem has yet to be applied to neuroscientific data.

It is important to distinguish between the statement of the mathematical theorem and the statistical tests that make use of it to validate neural models. Here, assuming that we have rescaled each individual spike train with its estimated population-dependent according to the theorem, how do we test that these rescaled processes are independent Poisson processes, and how do we design a statistical test that is able to reject the null hypothesis that the population's collective activity is generated according to a specific model? One strategy is to note that if each rescaled spike train constitutes a Poisson process independent of the others, then their superposition should also be a Poisson process. Along this line, we now develop the testing scheme that is based on the mathematical formulation of the time-rescaling theorem and that is shown schematically in Figure 1 and summarized in section 2.4.

*K*spike trains is rescaled according to equation 2.4, that is, the transformed spike times of neuron

*i*are given by According to the theorem, all rescaled processes should be unit-rate Poissonian. Hence, the first step is to apply the univariate testing procedure to each of the rescaled processes using the methods described in section 2.2. Since more than one statistical test is performed, an appropriate multiple-testing correction has to be applied on the significance level. One common choice is to use Bonferroni-corrected

^{7}significance levels for the individual tests (e.g., 5/

*K*%). If any of these tests fails, at least one neuron is not properly modeled and the overall population model has to be rejected.

The next step is to test the independence of the rescaled processes, as required by the multivariate theorem. To this end, we note that a superposition of Poisson processes on the same domain leads again to a Poisson process with a rate equal to the sum of the individual rates. The rescaled processes, however, in general have different lengths; their domain is given by: with , where *T* is the length of the trial in the original process. To obtain support on the unit domain, each transformed spike time of neuron *i* is multiplied by a factor of . All processes can now be superposed into a single point process on the unit domain. To recover unit rate, time is finally rescaled with a factor of . The Poissonian nature (independent and exponentially distributed intervals) of the superposed process can be tested with the same tools that are used for the univariate version, notably the KS test on the intervals and scatter plots of intervals for a qualitative assessment of the independence of consecutive intervals.

Even if the individual processes and their superposition are Poisson processes, this does not guarantee the independence of the processes (Jacod, 1975, but see He & Wang, 1996). Therefore, as a final test, we define a mark sequence by placing discrete marks *m _{k}* on each spike denoting the corresponding neuron it comes from (). If the individual point processes are Poisson and independent, the resulting superposition will also be a Poisson process with unit rate. Moreover, the marks will form an independent multinomial series with weights given by . The mark sequence should be an independent and identically distributed (i.i.d) sample from a multinomial series, that is, a random sequence of numbers. One out of many conceivable tests of this is based on cross-tabulating the pairs of consecutive numbers that appear in the sequence. Let denote the sample frequencies, based on the observed number of spikes. If the sequence is random, hence uncorrelated, the frequency of observing a particular pair of numbers is given by . Let the entries of the cross-table

*c*be normalized by the total number of entries. The deviation from the observed and theoretical frequencies is assessed by a test whose test statistic is given by and follows a distribution with (

_{ij}*K*−1)

^{2}degrees of freedom. The population model is rejected if either the KS test or the test rejects its null hypothesis based on a significantly low p-value.

### 2.4. Proposed Test Procedure.

The statistical tests presented in section 2.3 are summarized in the following step-by-step procedure:

Apply the time-rescaling transformation to each spike train. Test each of the

*K*rescaled processes for its Poissonian structure (using the classical univariate testing procedure of section 2.2).Let be the total rescaled time of each neuron

*i*—. Denote by the ratio of the total rescaled time of neuron*i*compared to the total rescaled time of the entire population. Normalize the rescaled ISIs of each neuron*i*by so that its average rescaled firing rate becomes (as opposed to 1).Superimpose all the rescaled processes into a single point process. Place discrete marks

*m*at each spike denoting the corresponding neuron it comes from ()._{k}Test that the superimposed process is Poisson with unit rate using the same tests as were applied for the univariate case.

Test the independence of the mark series using cross-tabulation of pairs of subsequent marks and a test.

## 3. Results

We now present a series of example cases for which a population of neurons passes the univariate time-rescaling theorem but not the multivariate extension. These will clearly show that unless the multivariate versions of the tests are used, neuronal models that ignore interactions may be judged adequate even when the interactions between neurons are in fact significant and strong. We begin with a simple toy model of two neurons that are mutually coupled. Two more analytically tractable examples are presented in the appendix. Next, we present results using simulated data generated from a many-neuron model with second-order couplings. Finally we show results obtained by fitting generalized linear models (GLMs) with cross-history terms to a population of V1 neurons recorded in awake macaque monkeys during natural scenes stimulation. Unless otherwise stated, the discrete time version of the univariate time-rescaling theorem (Haslinger et al., 2010) is used.

### 3.1. Example: Mutually Coupled Neurons.

Assume a system of two model neurons, each mutually coupled with stochastic synaptic delays. That means that after neuron 1 emits a spike, neuron 2 will spike with a delay drawn from a stationary probability distribution. Because the coupling is mutual, the next spike of neuron 1 is generated after a delay to the second neuron's spike. The spike times *t*^{(1)}_{n} and *t*^{(2)}_{n} form a bivariate, simple point process. Formally, this corresponds to an alternating renewal process: intervals are drawn from one of two fixed distributions in an alternating fashion. Let the two delay distributions be normal with means and variances (with and truncated at zero to avoid negative intervals). Without loss of generality, we define the first spike of neuron 1 to be at *t*=0. Figure 2 shows an example spike train. For the numerical examples, we chose s, s, s, and s and *N*=10, 000 spikes for each neuron.

*t*and

*p*(

*t*) denotes the probability density function (PDF) on the interspike intervals (Brown et al., 2003). In our case,

*p*(

*t*) equals the PDF of a gaussian variable:

*z*can be directly calculated from the original intervals via where is the c.d.f (cumulative density function) of a gaussian distribution with specified mean and variance. The transformation will result in rescaled spike trains that form unit-rate Poisson processes (up to statistical fluctuations). This can be numerically confirmed by the KS test (to test the exponentiality) and qualitatively using the scatter plots (to test dependence of subsequent intervals) (see Figures 3A, 4A, and 4B). From this finding, the null hypothesis of the data being generated by the univariate model cannot be rejected.

However, this model is incorrect; the spike trains of the neurons are not independent. Following section 2.4, we find that the superposed spike train is not Poissonian as a consequence of the interdependence between the two rescaled point processes. The multivariate KS test, multivariate scatter plots, and test for independence of the mark series all fail (see Figures 3B and 4C; KS test *KS*(*N*=20, 000)=.059, *p*<.001, *p*-value for the linear correlation between intervals *p*<.001 (Pearson's ), , *p*<.001). Hence, following the multivariate formulation of the time-rescaling theorem, we have to reject this model, which lacks interneuron interactions.

To build a sufficient model, we have to take the interactions into account. The correct coupled model is:

- •
For neuron 1: The

*i*th spike has a delay to the (*i*−1)th spike of neuron 2 according to a normal distribution with parameters and . - •
For neuron 2: The

*i*th spike has a delay to the*i*th spike of neuron 1 according to a normal distribution with parameters and .

Using this model to rescale both spike trains (using the direct transformation of equation 3.3, we find that the superposed process is indistinguishable from a Poissonian one (see Figures 3B and 4D, *KS*(*N*=20, 000)=.0043, *p*>.05, n.s. (not significant); *p*-value for the linear correlation between intervals being zero *p*>.05, n. s. (Pearson's )). Moreover, the mark distribution is an independent realization of a binomial distribution (, *p*>.05, n. s.).

### 3.2. Correlated Many-Neuron Model.

As a more complex example, we present the application of the proposed test to a simulated data set with many interacting neurons. Several methods have been proposed to generate spike trains with a given (pair-wise) correlational structure (Niebur, 2007; Krumin & Shoham, 2009; Macke, Berens, Ecker, Tolias, & Bethge, 2009; Gutnisky & Josic, 2010). Gutnisky and Josic (2010) describe a method to generate pair-wise correlated binary vectors by sampling from a multivariate autoregressive process whose covariance matrix is a function of the desired correlational structure. We used this method to generate simulated spike trains of length *T* = 3 s and *N* = 30 neurons. Each of them is assigned an average firing rate of Hz. A sequence of binary vectors is generated, representing the activation of each neuron per time bin. We set the width of the time bin to 1 ms. Exact spike times are sampled according to the procedure described by Haslinger et al. (2010). This ensures that the point process of the population is orderly and the multivariate theorem can be applied. The processes are restricted to have no significant autocorrelation and cross-correlations that decay exponentially in time with a time constant of ms. The correlations are assumed to be positive and of the same peak value for all pairs of neurons. The overall strength of the introduced pair-wise correlations can be controlled by modulating .

*t*is modeled as a nonlinear transformation of a linear sum of activities of the other neurons:

*m*=6 basis splines for the history of each other neuron. Knot points are spaced on a support of up to 15 ms. Let the

_{e}*j*th spline be of shape . Then the contributing term to the above equation is where denotes the spike times of neuron

*i*, and denotes the coefficients of the regression model. All model parameters are estimated using maximum likelihood. Such a logistic regression model completely captures the pair-wise cross-correlations present in the data. If each neuron is modeled this way, the population model is not rejected by the multivariate test for arbitrary correlation strengths (e.g., for : KS test

*KS*(

*N*=1843)=.012,

*p*>.05, n. s., ,

*p*>.05, n. s.).

### 3.3. Natural Stimulation of Macaque V1.

*m*=11 basis splines. Knot points are spaced on a logarithmic scale up to 64 ms. Let the

_{h}*j*th spline be of shape . Then the contributing term to equation 3.7 for neuron

*i*is where denotes the spike times of neuron

*i*. The stimulus term is modeled using basis splines so that the firing rate can vary as a function of the time since stimulus onset.

*m*=74 basis splines

_{s}*B*(

_{j}*t*) with an equidistant spacing over the trial length were used. The stimulus term is thus given by , , and are parameters determined by maximum likelihood estimation using the standard iteratively reweighted least-squares algorithm (McCullagh & Nelder, 1989). All 42 independent neuron models passed univariate KS tests (significance level 5/42%). However the multivariate procedure rejected the set of independent neuron models as a sufficient model of the collective population activity (see Figure 6A; KS test

*KS*(

*N*=81085)=.0096,

*p*<.001, ,

*p*>.05, n. s.).

In the above, the same spline filters were used as with the self-history terms. This model with cross-couplings yields 100% performance on the univariate testing, as did the uncoupled model. This time, however, the multivariate test does not reject the population model (*p*-value for interspike intervals of superposed process, KS test *KS*(*N*=81085)=.0033, *p*>.05, n. s., , *p*>.05, n.s.; see Figure 6B). The cross-interaction terms thus improve the model performance considerably and would not have been detected as significant using the univariate test procedure.

## 4. Discussion

Simultaneous recordings from neural populations have become commonplace. Although treating neurons in a population as independent is still a popular modeling approach (Nirenberg, Carcieri, Jacobs, & Latham, 2001; Oram, Hatsopoulos, Richmond, & Donoghue, 2001; Petersen, Panzeri, & Diamond, 2001; Averbeck & Lee, 2003; Jacobs et al., 2009), it has become increasingly clear that interactions between neurons matter, at least for describing the collective spike statistics, and likely for neuronal computation, although this is still hotly debated. Thus robust multivariate population models (Iyengar, 2001; Brown, Kass, & Mitra, 2004; Truccolo et al., 2005) for characterizing such activity are critical, as are appropriate statistical tests for validating these models. As yet, however, few tests of the collective statistics are being employed. When modeling neural populations, most prior studies either neglected any goodness-of-fit analysis (Chornoboy et al., 1988; Brillinger, 1988; Itskov, Curto, & Harris, 2008), or used univariate tests such as the classical time-rescaling theorem separately for each modeled spike train (Truccolo et al., 2005; Rigat et al., 2006). Using both simulated and real data, we demonstrated that this approach is insufficient because it does not test the collective statistics of the population. We presented a multivariate version of the time-rescaling theorem and proposed a test methodology suitable for determining the statistical sufficiency of neural population models. Our methods generalize time-rescaling-based goodness-of-fit tests to population models and provide one option for testing their statistical sufficiency, limited only by the finite amount of data and the statistical power of the tests.

The efficacy of our methods was demonstrated using both simulated (for which the true neuronal coupling is known) and real data sets (for which it must be inferred). The examples of mutually coupled neurons (see section 3.1) and a larger population of neurons (in section 3.2) show substantial cross-neuron dependencies that might easily be missed if each neuron is looked at only in isolation. In the appendix, we present further examples of practical interest. The example with synchronous triplets covers the case of higher-order spike patterns. Assessing the significance of higher-order spike patterns has been proven difficult in other studies (Martignon et al., 2000; Pipa, Riehle, & Grün, 2007; Pipa, Wheeler, Singer, & Nikolić, 2008). Here, a model that fails to incorporate significant higher-order interactions is likely to be rejected by tests based on the multivariate time-rescaling theorem. The second example in the appendix represents the class of common-input models (Kulkarni & Paninski, 2007; Paninski et al., 2010) or more generally latent variable models in which parts of the network that contribute to spiking are unobserved (Smith & Brown, 2003; Eden, Frank, Barbieri, Solo, & Brown, 2004; Jackson, 2004; Nykamp, 2007; Koyama & Paninski, 2009). Here, the proposed test procedure is able to detect the dependencies that are introduced by the common input or globally acting latent processes and is therefore a strong tool for these kind of studies. Finally, we applied our method to a set of experimental data obtained from the visual cortex of the macaque monkey. It demonstrates that using the univariate theorem may erroneously indicate a good fit for independent encoding models. The lack of fit is detected by the multivariate extension and is (at least partly) corrected for by including additional cross-interaction terms in the GLM.

One has to carefully distinguish the mathematical theorem and the statistical tests that are performed. The multivariate theorem states that rescaling the individual spike trains with a proper model intensity results in a set of statistically independent Poisson processes. We chose a particular procedure to test this hypothesis by superposing all rescaled processes to obtain again a Poisson process with an independent mark sequence. Since in general, the superposition of arbitrary independent point processes will asymptotically result in a Poisson-like point process (but see Lindner, 2006), it is necessary to check that the individual processes are Poissonian (step 1 of the test procedure of section 2.4). The time rescaling requires not only the superposed but also the individual processes to be Poissonian. In addition, through this first step, the univariate version of the time-rescaling theorem is included, and the multivariate test has by definition at least the same test power. One alternative to the superposition of the processes into a single point process after time rescaling is to study the cross-correlations between pairs of the rescaled processes. If the processes are mutually independent, their cross-correlation vanishes for arbitrary lags. While this procedure is in principle feasible, it requires on the order of *K*^{2} tests and controls only for pair-wise correlations. Our particular choice of the testing procedure has the considerable advantage that it can also detect higher-order dependencies, and the outcome is just a single point process that has to be tested for its Poisson properties. Hence, the proposed testing method inherits all the possibilities for tests that are already available for the univariate case. Although the rescaling and superposition potentially obscure temporal correlations that have not been properly modeled, we believe that the procedure works well when correlations are stationary in time and extend over the entire recording time. The additional test on the independence of the mark sequence is a simple and well-established statistical test, although there are certainly many other statistical tests to assess the randomness of an integer sequence (for a review, see Marsaglia, 1985).

The test procedure we have proposed is derived from the multivariate time-rescaling theorem for point processes and is applicable to all population models based on the conditional intensity formalism. Modeling the conditional intensity of a point process covers a range of model classes that have been used in the neuroscientific context. Inhomogeneous Poisson processes (Olson, Gettner, Ventura, Carta, & Kass, 2000; Kass & Ventura, 2001), as well as stationary renewal processes for the interspike interval distributions—typically involving gamma (Brown et al., 2003; Maimon & Assad, 2009), log normal (Barbieri et al., 2001) or inverse gaussian distributions (Iyengar & Liao, 1997))—can be described using the conditional intensity. Moreover, inhomogeneous Markov interval processes (IMI) are a subclass of conditional intensity models (Johnson, 1996; Kass & Ventura, 2001; Brown et al., 2002; Muller, Buesing, Schemmel, & Meier, 2007; Wojcik et al., 2009). The conditional intensity can also be nonparametrically estimated (Berry & Meister, 1998; Jacobs et al., 2009). Finally, Poisson and Bernoulli GLMs are inside the conditional intensity framework and therefore amenable to the proposed goodness-of-fit procedure. Besides modeling spike history effects, these GLMs can study the influence of external covariates, such as stimuli or the spiking activity of neighboring neurons (Harris, Csicsvari, Hirase, Dragoi, & Buzsaki, 2003; Paninski, 2003; Kass et al., 2005; Okatan et al., 2005; Ventura, Cai, & Kass, 2005; Pillow et al., 2008; Stevenson et al., 2008; Lewi, Butera, & Paninski, 2009; Toyoizumi, Rad, & Paninski, 2009). Since the conditional intensity works on the level of point processes, it cannot be applied to models for continuous data (like modeling neuronal voltage dynamics or locally averaged electrical activity of a network of neurons). It can nevertheless be used to measure the influence of local field potentials to the spiking activity of single neurons (Andersen, Musallam, & Pesaran, 2004; Quiroga & Panzeri, 2009).

A distinction should be made between absolute and relative goodness-of-fit measures. For instance, the classical time-rescaling theorem was designed as an absolute goodness-of-fit test, yielding a binary decision about a specific model. Another example is the test on the deviance of a regression model (Montgomery, Peck, & Vining, 2006). Relative goodness of fit measures compare different models and may be useful even if none of the models corresponds to the true model that generated the data. There are, in general, many measures available for assessing relative goodness of fit of neuronal models. Most commonly, the likelihood of a part of the data that was not used for fitting is evaluated (Pillow et al., 2008; Shlens et al., 2009; Itskov et al., 2008). If the experimental paradigm allows it, a model can be judged based on its ability to decode input stimuli or other conditions (Brown et al., 2004; Kass et al., 2005; Churchland, Yu, Sahani, & Shenoy, 2007; Stevenson et al., 2008). Furthermore, information-theoretic measures like the BIC (Schwarz, 1978) or AIC (Akaike, 1973) can be used (see Truccolo et al., 2005; Czanner et al., 2008). Often the power of a model to predict times of spiking is quantified (Gerstner & Naud, 2009; Truccolo, Hochberg, & Donoghue, 2010). In the case of deterministic models, their predictions can be compared to the observed data set using spike train metrics (e.g., the coincidence factor—Kistler, Gerstner, & Hemmen, 1997; Jolivet et al., 2008). In the case of stochastic models, evaluations have been made using receiver-operating characteristics (Truccolo et al., 2010) and, more extensively, the (univariate) time-rescaling theorem (see Ogata, 1988; Barbieri et al., 2001; Smith & Brown, 2003; Brown et al., 2003; Rigat et al., 2006; Koyama & Kass, 2008; Wojcik et al., 2009; Shimokawa & Shinomoto, 2009). It should be noted that both the classical and the multivariate extension of the time-rescaling theorem can be used as a relative measure by ranking models according to the absolute value of KS statistics.

Passing any absolute goodness-of-fit test does not guarantee that one has found the true model that underlies the data. It merely indicates that the observed data are consistent with a specific model within the bounds that are set by the power of the employed statistical tests. The time-rescaling procedure can be seen as a residual analysis that detects structure in the data that have not been accounted for by the model. In this way, it serves as a diagnostic tool. For example, in the first step of the multivariate procedure, each neuron is individually tested. If one or more models are rejected at this stage, the behavior of these neurons was not accurately described. If all neurons pass the individual test but the population model is rejected based on the interval distribution of the superposed process or the mark sequence, then dependencies or higher-order interactions are contained in the residual traces and indicate an incomplete population model. By performing the superposition only on a subset of neurons, the cause of the model failure may be further localized.

In this letter, we have provided one possible test of population goodness of fit. Our methodology is simple to implement and builds on well-established univariate tests of statistical sufficiency. Other techniques should be employed as well, but the multivariate time-rescaling theorem and its derived tests are a powerful addition to the statistical sufficiency test toolbox and should be widely applicable.

## Appendix: Additional Examples

Here, we give two more simple examples that are analytically tractable. With these, we demonstrate the need to extend the univariate time-rescaling theorem: Even when we can estimate the conditional intensities separately for each neuron, the testing for model sufficiency has to be done in a unified way.

### A.1. Example: Neurons with Synchronous Triplets.

Consider three neurons, each with its own ground firing rate . On top of that, synchronous triplet events are inserted independently with a rate of . The resulting processes are again Poisson processes with rate ; however, these processes will be correlated through the triplet events and are hence dependent. Note that we can summarize the three processes with one marked “simple point process,” where a mark *S* is placed for each triplet event and for spikes from the corresponding ground processes with rate . The observed spike train of neuron *i* consists of the superposition of the events with marks *S* and *i*.

For the numerical example, we used the same stationary rate of Hz for every neuron, and triplets were inserted with a rate of Hz. The simulated spike trains had a length of *T* = 200 s, using a time discretization of ms. Similar to the example in section 3.2, exact spike times are sampled using the procedure of the discrete time-rescaling procedure (Haslinger et al., 2010) so that the resulting point process is simple. The synchronicity of the triplet events is therefore confined to a temporal window of ms. Example spike trains are shown in Figure 7.

If the univariate test was applied to each neuron individually, then a simple model such as for the activity of the individual neurons would already be statistically sufficient to describe the spikes (see Figure 8A). A constant-rate model corresponds to a uniform rescaling of all intervals with a constant factor. However, this model fails the multivariate test, as can be seen in Figure 8B (*KS*(*N*=29316)=.015, *p*<.001, , *p*<.001). Time-rescaling the individual spike trains does not affect the dependencies between the spike trains since all three spike trains are rescaled by the same constant factor.

*S*). For each spike train, consider the remaining spikes and model the occurrences of these with a Poisson model of rate each. When time is discretized into small bins of width , we get: After time rescaling, one obtains three processes whose Poisson nature cannot be rejected. The same holds for the superposed process (

*KS*(

*N*=29316)=.005,

*p*>.05, n. s., ,

*p*>.05, n.s.; see Figure 8B).

### A.2. Example: Neurons with Common Input.

This example is an instantiation of the multiple interaction process (MIP) of Kuhn, Aertsen, and Rotter (2003). First, a latent Poisson process with constant ground rate is generated. The spike trains of the observed neurons are independently thinned versions of ; each spike from the ground process is included in the spike train of neuron *i* with probability *p _{i}*. The resulting processes are Poisson processes with rate . However, these processes will be correlated, hence dependent.

For the numerical example, we used a latent ground process with a stationary rate of 50 Hz and six neurons with *p _{i}*=

*p*=0.2. A possible realization of such a process is shown in Figure 9. Exact spike times are sampled using the procedure of the discrete time-rescaling procedure, and the resulting point process is simple.

A simple model for the conditional intensity such as for the activity of the individual neurons would already be sufficient as each daughter process constitutes a Poisson process itself (see Figure 10A). The time-rescaling procedure scales each spike train by a constant factor to obtain unit rate Poisson processes. This would clearly not affect the dependencies between the spike trains. Using the tests of the multivariate time-rescaling theorem, however, we reject the independent neuron population model (see Figure 10B; *KS*(*N*=5897)=.13, *p*<.001, , *p*<.001).

*KS*(

*N*=5897)=.013,

*p*>.05, n. s., ,

*p*>.05, n.s.; see Figure 10B).

### A.3. Experimental Details.

Data were obtained with one rhesus monkey. Experimental procedures were approved by local authorities (Regierungspraesidium Hessen, Darmstadt, Germany) and were in accord with the guidelines of the European Community for the care and use of laboratory animals (European Union directive 86/609/EEC).

For each trial, fixation was required throughout the trial length of 5 seconds. Test stimuli consisted of natural scene movies recorded with a digital video camera (resolution 960 times 720 pixels at 30 frames per second, noninterleaved, Panasonic DVCPRO-HD format). Video sequences consisted of images of leaves, garden trees, or scenes in our laboratory obtained after a single panning movement of the camera (i.e., the video sequences always contained a single predominant global movement component).

All video clips were fully desaturated and converted into bit map image sequences cropped to a size of 936 times 702 pixels. The sequences were displayed at 100 Hz (the same frame was presented twice) using a standard graphical board controlled by ActiveStim (average luminance, 10 cd/*m*^{2}).

Recordings were made from the opercular region of V1 (receptive fields centers, 2.0 to 3.0 degrees eccentricity) and from the superior bank of the calcarine sulcus (10.0 to 13.0 degrees eccentricity). Electrodes were inserted independently into the cortex via guide tubes positioned above the dura (diameter, 300 m; Ehrhardt Söhne, Germany), assembled in a customized recording device. Quartz-insulated tungsten-platinum electrodes (Thomas Recording, Germany; diameter, 80 m) with impedances ranging from 0.3 to 1.0 M were used to record simultaneously the extracellular activity from four to five sites in both superficial and deep layers of the cortex.

The spiking activity of small groups of neurons (MUA) was obtained by amplifying (1000×) and bandpass filtering (0.7 to 6.0 kHz) the recorded signals with a customized 32 channel Plexon preamplifier connected to an HST16o25 headset (Plexon Inc., U.S.A.). Additional 10× signal amplification was done by on-board amplifiers (E-series acquisition boards, National Instruments, U.S.A.). The signals were digitized and stored using a LabVIEW-based acquisition system developed in our laboratory. Spikes were detected by amplitude thresholding, which was set interactively after online visualization of the spike waveforms (typically two to three standard deviations above noise level). Spike events and corresponding waveforms were sampled at 32 kS/s (spike waveform length, 1.2 ms).

Offline spike sorting was performed using a dynamic template-matching method implemented in a custom software package. Sorting was initiated by an automatic procedure that defined up to 12 different clusters. Afterward, various displays, such as tuning curves, autocorrelograms, and measurements of recording stability, were used to guide interactively which cluster to merge or delete. Only clusters well separated in 2D and 3D plots of spike principal component analysis scores were assigned to single units if a refractory period was confirmed in interspike interval distributions.

## Acknowledgments

We are grateful to Sergio Neuenschwander and Bruss Lima for supplying the macaque V1 recordings discussed in section 3.3. This work was supported by NIH grant K25 NS052422-02, the Max Planck Society, and EU Grant PHOCUS, 240763, FP7-ICT-2009-C. F.G. acknowledges partial support by the Stiftung Polytechnische Gesellschaft (Frankfurt am Main, Germany) and support by the Swiss National Science Foundation under grant number 200020-117975.

## Notes

^{1}

It should be noted that Marre and others have extended the Ising model to include spike history dependence by adding spikes in time bins up to some truncated past to the Ising model and effectively inflating the population size (Martignon et al., 2000; Tang et al., 2008; Marre, Boustani, Frégnac, & Destexhe, 2009). This results in greatly increased model fitting times because for models of the full joint distribution, a normalization constant (partition function) must be calculated that involves a sum over the probability of all 2^{K} possible patterns (across *K* neurons) of spikes. This is extremely computationally intensive and must usually be done using Monte Carlo techniques, although we note that several mean field methods have recently been proposed (Broderick, Dudik, Tkacik, Schapire, & Bialek, 2007; Roudi, Tyrcha, & Hertz, 2009).

^{2}

The size of this lag need be no more than one time bin, and the bin size may be arbitrarily small.

^{4}

It appeared as a “rate-rescaling theorem” in Barbieri, Quirk, Frank, Wilson, and Brown (2001).

^{5}

The time-rescaling theorem applies exactly when continuous time is used and the spikes are defined as instantaneous events. As a practical matter, most statistical models discretize time into bins, and this discretization can cause a well-fitted model to, in certain instances, be erroneously rejected if the firing rate is too high compared to the bin size. Two of the authors have recently proposed a discrete-time version of the time-rescaling theorem that eliminates such problems (Haslinger, Pipa, & Brown, 2010).

^{7}

When a statistical test is repeated *K* times and the global false-positive rate shall be bounded by , the Bonferroni-corrected significance levels for each individual test are given by . Note that in the context of goodness-of-fit evaluation, false positives correspond to erroneously rejecting a true model. Another possibility is to control the false discovery rate by the Benjamini-Hochberg (Benjamini & Hochberg, 1995) or Simes's procedure (Simes, 1986), for example.