## Abstract

This letter presents a general framework based on reproducing kernel Hilbert spaces (RKHS) to mathematically describe and manipulate spike trains. The main idea is the definition of inner products to allow spike train signal processing from basic principles while incorporating their statistical description as point processes. Moreover, because many inner products can be formulated, a particular definition can be crafted to best fit an application. These ideas are illustrated by the definition of a number of spike train inner products. To further elicit the advantages of the RKHS framework, a family of these inner products, the cross-intensity (CI) kernels, is analyzed in detail. This inner product family encapsulates the statistical description from the conditional intensity functions of spike trains. The problem of their estimation is also addressed. The simplest of the spike train kernels in this family provides an interesting perspective on others' work, as will be demonstrated in terms of spike train distance measures. Finally, as an application example, the RKHS framework is used to derive a clustering algorithm for spike trains from simple principles.

## 1. Introduction

Spike trains can be observed when studying either real or artificial neurons. In neurophysiological studies, spike trains result from the activity of neurons in multiple single-unit recordings by ignoring the stereotypical shape of action potentials (Dayan & Abbott, 2001). More recently, there has also been great interest in using spike trains for biologically inspired computation paradigms such as the liquid-state machine (Maass, Natschläger, & Markram, 2002; Maass & Bishop, 1998) and spiking neural networks (Bohte, Kok, & Poutré, 2002; Maass & Bishop, 1998). Regardless of the nature of the process producing the spike trains, the ultimate goal is to analyze, classify, and decode the information expressed by spike trains.

A spike train is a sequence of ordered spike times corresponding to the time instants, in an observation interval 𝒯, at which a neuron fires. Unfortunately, this formulation does not allow the application of the usual signal processing operations to filter, eigendecompose, classify, or cluster spike trains, which have been proven so useful when manipulating real-world signals and form the basis for extracting more information from experimental data. From a different perspective, spike trains are realizations of stochastic point processes. Therefore, they can be analyzed statistically to infer the underlying process they represent. The main limitation in this formulation is that when multiple spike trains are analyzed, they typically need to be assumed independent to avoid handling the high-dimensional joint distribution.

Nevertheless, statistical analysis of spike trains is quite important, as can be asserted from the large number of methodologies proposed in the literature (see Brown, Kass, & Mitra, 2004, for a review). One of the fundamental descriptors of spike trains is the intensity function of the process giving rise to the observed spike train. If the spike train is assumed to be well modeled by an inhomogeneous Poisson process, then many methods have been proposed for the estimation of the intensity function (Kass, Ventura, & Cai, 2003; Richmond, Optican, & Spitzer, 1990; Reiss, 1993). However, in the general case, the problem is intractable since the intensity function depends on the whole history of the realization. Recently Kass, Ventura, and Brown (2005) proposed a new spike train model simple enough to be estimated from data and yet sufficiently powerful to cope with processes more general than renewal processes. This work by Kass et al. was extended by Truccolo, Eden, Fellows, Donoghue, and Brown (2005) to allow more general dependencies. Still, these advances depend on the availability of multiple realizations (e.g., spike trains from several trials) or spike trains of many seconds, and they provide no tools to either the practitioner or the theoretician on how to analyze single realizations of multiple spike trains.

Instead, we submit that a systematic description of the theory behind single realizations of multiple spike train analysis generalizing the methods of cross-correlation (Perkel, Gerstein, & Moore, 1967) is still needed and will enable the development of new operators for spike trains capable of transcending the results obtained with current techniques. Indeed, applying cross-correlation to spike timings is not straightforward and is the reason that, traditionally, it is applied to “binned” data. But most important, binning is related to instantaneous firing rate estimation, and thus cross-correlation of binned spike trains cannot account for deviations from the Poisson point process model. The caveats associated with binned spike trains, in particular for temporal coding, motivated the development of methodologies directly involving spike times. This is noticeable in several spike train measures (Victor & Purpura, 1997; van Rossum, 2001; Schreiber, Fellous, Whitmer, Tiesinga, & Sejnowski, 2003) and recent attempts to use kernels to estimate and generalize these distances (Schrauwen & Campenhout, 2007). Yet in spite of the fact that distances are very useful in classification and pattern analysis, they do not provide a suitable foundation to carry out and develop spike train signal processing algorithms.^{1}

In this letter, a reproducing kernel Hilbert space (RKHS) framework for spike trains is introduced with two key advantages: (1) mapping spike trains to the RKHS allows the study of spike trains as continuous-time random functionals, thereby bypassing the limitations that lead to the use of binning, and (2) these functionals incorporate a statistical description of spike trains. In this space, a number of different signal processing algorithms to filter, eigendecompose, classify, or cluster spike trains can then be developed using the linearity of the space and its inner product. Unlike approaches based on discrete representations of spike trains (such as binning) in which the dimensionality of the space becomes a problem, the dimensionality of the space in the RKHS framework is naturally dealt with through the inner product.

For continuous and discrete random processes, RKHS theory has already been proven essential in a number of applications, such as statistical signal processing (Parzen, 1959) and detection (Kailath, 1971; Kailath & Duttweiler, 1972), as well as statistical learning theory (Schölkopf, Burges, & Smola, 1999; Vapnik, 1995; Wahba, 1990). Indeed, as Parzen (1959) showed, several statistical signal processing algorithms can be stated and solved easily as optimization problems in the RKHS. Although frequently overlooked, RKHS theory is perhaps an even more pivotal concept in machine learning (Schölkopf et al., 1999; Vapnik, 1995) because it is the reason for the famed kernel trick that allows the otherwise seemingly impossible task of deriving and applying these algorithms.

In the following, we introduce a number of inner products for spike trains illustrating the generality of this methodology. We follow a systematic approach that builds the RKHS from the ground up based on the intensity functions of spike trains and basic requirements for the construction of an RKHS. As a result, we obtain a general and mathematically precise methodology that can be easily interpreted intuitively. Then these inner products are studied in detail to show that they have the necessary properties for spike train signal processing, and we propose how they can be estimated. Moreover, we discuss the RKHS and congruent spaces associated with the simplest of these inner product forms, adding to our understanding.^{2} We then build on this knowledge by showing that previous work in spike train measures arises naturally and effortlessly in one of the constructed RKHSs and is elegantly unified in this framework. Finally, we demonstrate a practical application of the RKHS by showing how clustering of spike trains can be easily achieved using a spike train kernel.

## 2. Inner Product for Spike Times

Denote the *m*th spike time in a spike train indexed by *i* as *t _{m}^{i}* ∈ 𝒯, with *m* ∈ {1, 2, …, *N _{i}*} and *N _{i}* the number of spike times in the spike train. To simplify the notation, the explicit reference to the spike train index will be omitted if it is irrelevant or obvious from the context.

The simplest inner product that can be defined for spike trains operates with only two spike times at a time, as observed by Carnell and Richardson (2005). In the general case, such an inner product can be defined in terms of a kernel function defined on 𝒯 × 𝒯 into the reals, with 𝒯 the interval of spike times. Let κ denote such a kernel. Conceptually, this kernel operates in the same way as the kernels operating on data samples in machine learning (Schölkopf et al., 1999) and information-theoretic learning (Príncipe, Xu, & Fisher, 2000). Although it operates with only two spike times, it will play a major role whenever we operate with complete realizations of spike trains. Indeed, the estimator for one of the spike train kernels defined next relies on this simple kernel as an elementary operation for computation or composite operations.

To take advantage of RKHS theory, κ is required to be a symmetric positive-definite kernel. By the Moore-Aronszajn theorem (Aronszajn, 1950), κ then induces an RKHS and a mapping of spike times into it: Φ_{m} is the element in the RKHS corresponding to *t _{m}* (that is, the transformed spike time). In this letter, we further restrict attention to shift-invariant kernels, for which κ(*t _{m}*, *t _{n}*) = κ(*t _{m}* − *t _{n}*). For any symmetric, shift-invariant, and positive-definite kernel, it is known that κ(0) ≥ |κ(θ)|.^{3} This is important in establishing κ as a similarity measure between spike times. As usual, an inner product should intuitively measure some form of interdependence between spike times. However, the conditions posed do not restrict this study to a single kernel. On the contrary, any kernel satisfying the above requirements is theoretically valid and understood under the framework proposed here, although the practical results may vary.

An example of a family of kernels satisfying these requirements is the radial basis functions,

κ(θ) = exp(−|θ|^{p}), 0 < *p* ≤ 2.

Some well-known kernels, such as the widely used gaussian and Laplacian kernel, are special cases of this family for *p* = 2 and *p* = 1, respectively.
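As a rough sketch of how such a kernel behaves, the following snippet implements this radial-basis family on spike-time differences. The function name, the scale parameter `tau`, and the example values are our own illustrative assumptions, not part of the letter.

```python
import math

def spike_time_kernel(theta, p=2.0, tau=1.0):
    """Radial-basis kernel on a spike-time difference theta (seconds).

    p = 2 gives a gaussian-shaped kernel and p = 1 a Laplacian-shaped one;
    tau (our addition) sets the time scale over which two spike times are
    considered similar.
    """
    return math.exp(-abs(theta / tau) ** p)

# kappa(0) >= |kappa(theta)|: the kernel peaks at zero lag, so it acts
# as a similarity measure between spike times.
print(spike_time_kernel(0.0))                 # 1.0
print(spike_time_kernel(0.01, p=1.0, tau=0.01))
```

Note that larger spike-time differences always yield smaller kernel values, which is exactly the property κ(0) ≥ |κ(θ)| discussed above.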

In the RKHS, the inner product of two transformed spike times, Φ_{m} and Φ_{n}, is given by ⟨Φ_{m}, Φ_{n}⟩ = κ(*t _{m}* − *t _{n}*). Put differently, from the geometry of the transformed spike times, the kernel function is proportional to the cosine of the angle between two points in this space. Because the kernel is nonnegative, the maximum angle is π/2, which restricts the manifold of transformed spike times to a small area of the surface of the sphere. With the metric induced by the kernel (the geodesic, or angular, distance of equation 2.5), the manifold of the transformed points forms a Riemannian space. This space is not a linear space. Its span, however, is obviously a linear space. In fact, it equals the RKHS associated with the kernel. Computing with the transformed points will almost surely yield points outside the manifold of transformed spike times. This means that such points cannot be mapped back to the input space directly. This restriction, however, is generally not a problem since most applications deal exclusively with the projections of points in the space, and if a representation in the input space is desired, it may be obtained from the projection to the manifold of transformed input points.

Building on κ, an inner product for spike trains can be obtained by directly matching spike times across the two spike trains; we refer to the resulting kernel as the *spiking pattern matching* (SPM) kernel.

## 3. Inner Products for Spike Trains

At the end of the previous section, we briefly illustrated in the SPM kernel how inner products for spike trains can be built from kernels for spike times, as traditionally done in machine learning. Obviously many other spike train kernels that operate directly from data characteristics could be defined for diverse applications in a similar manner. However, in doing so, it is often unclear what statistical structure is embodied, or what point process model is assumed, by the kernel.

Rather than doing this directly, in this section, we first define general inner products for spike trains from the intensity functions, which are fundamental statistical descriptors of the point processes. This bottom-up construction of the kernels for spike trains is unlike the approach taken in the previous section and is rarely taken in machine learning, but it provides direct access to the properties of the kernels defined and the RKHS they induce. In other words, in the methodology presented in this section, we focus on the inner product as a statistical descriptor and only then derive the corresponding estimators from data.

A spike train can be interpreted as a realization of an underlying stochastic point process (Snyder, 1975). In general, to completely characterize a point process, the conditional intensity function λ(*t* ∣ *H _{t}*) is needed, where *t* ∈ 𝒯 denotes the time coordinate and *H _{t}* is the history of the process up to time *t*. Notice that to be a well-defined function of time, λ(*t* ∣ *H _{t}*) requires a realization (so that *H _{t}* can be established), as always occurs when dealing with spike trains. This shall be implicitly assumed henceforth.

The most general of the cross-intensity kernels is defined directly as an inner product of the conditional intensity functions of two spike trains *s _{i}* and *s _{j}*,

I(s_i, s_j) = ∫_𝒯 λ_{s_i}(t ∣ H_t^i) λ_{s_j}(t ∣ H_t^j) dt. (3.2)

A first simplification of practical importance is to restrict the dependence on the history to the time since the last spike; that is, to consider conditional intensity functions of the form λ(*t* ∣ *t* − *t*_{*}), where *t*_{*} is the spike time immediately preceding *t*. This restricted form gives rise to inhomogeneous Markov interval (IMI) processes (Kass & Ventura, 2001). In this way, it is possible to estimate the intensity functions from spike trains and then utilize the above inner product definition to operate with them. This view is very interesting to the analysis of spike trains, but since we aim to compare the general principles presented to more typical approaches, it will not be pursued in this letter.

A second simplification is to neglect the dependence on the history *H _{t}* altogether, which yields an intensity function solely depending on time, that is, λ(*t* ∣ *H _{t}*) = λ(*t*). This expression is a direct consequence of the general limit theorem for point processes (Snyder, 1975) and is the reason that, for example, the combined set of spike trains corresponding to multiple trials is quite well modeled as a Poisson process (Kass & Ventura, 2001). An alternate perspective is to directly assume Poisson processes to be a reasonable model for spike trains. The difference between the two perspectives is that in the second case, the intensity functions can be estimated from single realizations in a plausible manner. In any case, the kernel becomes simply

I(s_i, s_j) = ∫_𝒯 λ_{s_i}(t) λ_{s_j}(t) dt, (3.5)

which we refer to as the memoryless cross-intensity (mCI) kernel.

Starting from the most general definition of inner product, we have proposed several kernels from constrained forms of conditional intensity functions for use in applications. The definition of equation 3.2 can be thought of as giving rise to a family of cross-intensity kernels defined explicitly as an inner product, as is important for signal processing. Specific kernels are obtained from equation 3.2 by imposing some particular form on how to account for the dependence on the history of the process or by allowing a nonlinear coupling between spike trains. Two fundamental advantages of this construction methodology are: (1) it is possible to obtain a continuous functional space where no binning is necessary, and (2) the generality of the approach allows inner products to be crafted to fit a particular problem one is trying to solve.

The kernels defined so far in this section are linear operators in the space spanned by the intensity functions and are the ones that relate the most with existing analysis methods for spike trains. However, kernels between spike trains can be made nonlinear by introducing a nonlinear weighting between the intensity functions in the inner product. With this approach, additional information can be extracted from the data since the nonlinearity implicitly incorporates in the measurement higher-order couplings between the intensity functions. This is of special importance for the study of doubly stochastic point processes, as some theories of brain function have put forward (Lowen & Teich, 2005).

From the suggested definitions, the memoryless cross-intensity (mCI) kernel given in equation 3.5 clearly adopts the simplest form since the influence of the history of the process is neglected by the kernel. This simple kernel defines an RKHS that is equivalent to cross-correlation analysis, so widespread in spike train analysis (Paiva, Park, & Príncipe, 2008), but this derivation clearly shows that it is the simplest of the cases. Still, it fits the goal of this letter as an example of the RKHS framework since it provides an interestingly broad perspective on several other works in the literature and suggests how methods can be reformulated to operate directly with spike trains, as we show next.

## 4. Analysis of Cross-Intensity Kernels

### 4.1. Properties.

In this section we present some relevant properties of the CI kernels defined in the general form of equation 3.2. In addition to the knowledge they provide, they are necessary for establishing that the CI kernels are well defined, induce an RKHS, and aid in the understanding of the following sections.

**Property 1**. CI kernels are symmetric, nonnegative, and linear operators in the space of the intensity functions.

Because the CI kernels operate on elements of *L*_{2}(𝒯) and correspond to the usual dot product in *L*_{2}, this property is a direct consequence of the properties inherited from that space. More specifically, this property guarantees that the CI kernels are valid inner products.

**Property 2**. CI kernels are symmetric positive-definite kernels.

The proof is given in the appendix. Through the work of Moore (1916) and due to the Moore-Aronszajn theorem (Aronszajn, 1950), the following two properties result as corollaries of property 2:

**Property 3**. For any finite set of spike trains, the matrix of pairwise CI kernel evaluations is nonnegative definite.

**Property 4**. There exists a Hilbert space for which a CI kernel is a reproducing kernel.

Actually, property 3 can be obtained explicitly by verifying that the inequality of equation 4.1 is implied by equations A.1 and A.2 in the proof of property 2 (see the appendix).

Properties 2 through 4 are equivalent in the sense that any of these properties implies the other two. In our case, property 2 is used to establish the other two. The most important consequence of these properties, explicitly stated through property 4, is that each CI kernel induces a unique RKHS; in particular, so does the mCI kernel.

Properties 2 through 5 can also be easily proved for the nonlinear CI kernels. For the definition in equation 3.6, the results in Berg et al. (1984) can be used to establish that the norm of the difference of intensity functions is symmetric negative definite and, consequently, that its exponential is a symmetric and positive-definite kernel, thus proving property 3. Properties 2, 4, and 5 follow as corollaries. Similarly, for the definition in equation 3.7, the proof of the properties follows the same route as for the general linear CI kernel, using the linearity of the RKHS associated with the scalar kernel.

### 4.2. Estimation.

In general, the conditional intensity function can be estimated from data using the statistical framework proposed by Truccolo et al. (2005), in which the logarithm of the conditional intensity is modeled with a generalized linear model (GLM),

log λ(t_n ∣ H_n) = Σ_{m=1}^{q} θ_m g_m(ν_m(t_n)),

where *t _{n}* is the *n*th discrete-time instant, the *g _{m}*'s are general transformations of independent functions ν_{m}(·), the θ_{m}'s are the parameters of the GLM, and *q* is the number of parameters. Thus, GLM estimation can be used under a Poisson distribution with a log link function. The terms *g _{m}*(ν_{m}(*t _{n}*)) are called the predictor variables in the GLM framework, and if one considers the conditional intensity to depend only linearly on the spiking history, then the *g _{m}*'s can be simply delays. In general, the intensity can depend nonlinearly on the history or an external factor such as stimuli. Based on the estimated conditional intensity function, any of the inner products introduced in section 3 can be evaluated numerically.

_{m}Although quite general, the approach by Truccolo et al. (2005) has a main drawback: since *q* must be larger than the average interspike interval, a large number of parameters need to be estimated, thus requiring long spike trains (more than 10 seconds). Notice that estimation of the conditional intensity function without greatly sacrificing the temporal precision requires small bins, which means that *q*, and therefore the duration of the spike train used for estimation, must be increased.

In the case of the mCI kernel, defined in equation 3.5, a much simpler estimator can be derived. We now focus on this case. Since we are interested in estimating the mCI kernel from single-trial spike trains, for this kernel and for the reasons presented before, it is assumed henceforth that spike trains are realizations of Poisson processes. Then, using kernel smoothing (Dayan & Abbott, 2001; Reiss, 1993; Richmond et al., 1990) for the estimation of the intensity function, we can derive an estimator for the kernel. The advantage of this route is that a statistical interpretation is available while simultaneously approaching the problem from a practical point of view. Moreover, in this case, the connection between the mCI kernel and κ will now become obvious.

Specifically, given a spike train *s _{i}* comprising spike times {*t _{m}^{i}* : *m* = 1, …, *N _{i}*}, the estimated intensity function is

λ̂_{s_i}(t) = Σ_{m=1}^{N_i} h(t − t_m^i), (4.4)

where *h* is the smoothing function. This function must be nonnegative and integrate to one over the real line (just like a probability density function, PDF). Commonly used smoothing functions are the gaussian, Laplacian, and α-function, among others.

From a filtering perspective, equation 4.4 can be seen as a linear convolution between the filter impulse response given by *h*(*t*) and the spike train written as a sum of Dirac functionals centered at the spike times. In particular, binning is nothing but a special case of this procedure, in which *h* is a rectangular window and the spike times are first quantized according to the width of the rectangular window (Dayan & Abbott, 2001). Moreover, it is interesting to observe that the intensity estimation as shown above is directly related to the problem of PDF estimation with Parzen windows (Parzen, 1962) except for a normalization term, a connection made clear by Diggle and Marron (1988).
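The smoothing procedure of equation 4.4 can be sketched as follows. Here a causal one-sided exponential is used as *h*; the function name and the time constant `tau` are our own illustrative choices, not specifications from the letter.

```python
import math

def estimate_intensity(spike_times, t, tau=0.05):
    """Kernel-smoothed intensity estimate (in the spirit of equation 4.4)
    at time t.

    Uses a causal one-sided exponential smoothing function
    h(u) = (1/tau) * exp(-u/tau) for u >= 0, which is nonnegative and
    integrates to one over the real line, as required of h.
    """
    rate = 0.0
    for tm in spike_times:
        u = t - tm
        if u >= 0.0:                      # causal: only past spikes contribute
            rate += math.exp(-u / tau) / tau
    return rate

spikes = [0.10, 0.12, 0.30]
print(estimate_intensity(spikes, 0.13))   # high: two spikes just occurred
print(estimate_intensity(spikes, 0.05))   # 0.0: no spike has occurred yet
```

Replacing the exponential with a rectangular `h` (after quantizing the spike times) would recover binning as the special case described above.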

Substituting the estimated intensity functions, equation 4.4, into the mCI kernel definition, equation 3.5, and exchanging the order of integration and summation yields the estimator

Î(s_i, s_j) = Σ_{m=1}^{N_i} Σ_{n=1}^{N_j} κ(t_m^i − t_n^j), (4.5)

where κ is the kernel obtained as the autocorrelation of the smoothing function *h* with itself. A well-known example for *h* is the gaussian function, in which case κ is also the gaussian function (with σ scaled by √2). Another example for *h* is the one-sided exponential function, which yields κ as the Laplacian kernel. In general, if a kernel is selected first and *h* is assumed to be symmetric, then κ equals the autocorrelation of *h*, and thus *h* can be found by evaluating the inverse Fourier transform of the square root of the Fourier transform of κ.

The accuracy of this estimator depends only on the accuracy of the estimated intensity functions. If enough data are available such that the estimation of the intensity functions can be made exact, then the mCI kernel estimation error is zero. Despite this direct dependency, the estimator effectively bypasses the estimation of the intensity functions and operates directly on the spike times of the whole realization without loss of resolution and in a computationally efficient manner since it takes advantage of the typically sparse occurrence of events.

As equation 4.5 shows, if κ is chosen such that it satisfies the requirements in section 2, then the mCI kernel ultimately corresponds to a linear combination of κ operating on all pairwise spike time differences, one pair of spike times at a time. In other words, the mCI kernel is a linear combination of the pairwise inner products between spike times of the spike trains. Put this way, we can now clearly see how the mCI inner product estimator builds on the inner product on spike times presented in section 2, denoted by κ.
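A minimal sketch of the resulting estimator: a double sum of κ over all pairwise spike-time differences, here with a Laplacian κ (which arises, up to a scale factor, as the autocorrelation of a one-sided exponential *h*). The function name and the time constant `tau` are illustrative assumptions.

```python
import math

def mci_kernel(spikes_i, spikes_j, tau=0.01):
    """Estimate the mCI kernel from two single-trial spike trains as in
    equation 4.5: a double sum of a Laplacian kernel kappa over all
    pairwise spike-time differences (normalization constants omitted).
    """
    total = 0.0
    for tm in spikes_i:
        for tn in spikes_j:
            total += math.exp(-abs(tm - tn) / tau)
    return total

s1 = [0.1, 0.2, 0.3]
s2 = [0.1, 0.21, 0.35]
print(mci_kernel(s1, s2))   # larger when spike timings are close
print(mci_kernel(s1, s1))   # >= 3: includes the three exact self-pairs
```

The cost is O(*N _{i}N _{j}*) kernel evaluations, which is what makes the estimator efficient for sparse spike trains: no discretization of the time axis is ever needed.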

## 5. RKHS Induced by the Memoryless Cross-Intensity Kernel and Congruent Spaces

Some considerations about the RKHS induced by the mCI kernel and its congruent spaces are made in this section. The relationship between this RKHS and its congruent spaces provides alternative perspectives and a better understanding of the mCI kernel. Figure 1 provides a diagram of the relationships among the various spaces discussed next.

### 5.1. Space Spanned by Intensity Functions.

In the introduction of the mCI kernel, the usual dot product in *L*_{2}(𝒯), the space of square integrable functions defined on 𝒯, was utilized. The definition of the inner product in this space provides an intuitive understanding of the reasoning involved. The space obtained from the span of all intensity functions, with the inner product defined in equation 3.5, is clearly a Hilbert space. Notice that this space also contains functions that are not valid intensity functions, resulting from the linear span of the space (intensity functions are always nonnegative). However, since our interest is mainly in the evaluation of the inner product, this is of no consequence. The main limitation is that this space is not an RKHS. This should be clear because its elements are functions defined on 𝒯, whereas elements in the RKHS must be functions defined on the space of spike trains.

### 5.2. Induced RKHS.

By the Moore-Aronszajn theorem, the mCI kernel induces an RKHS whose elements include the functionals *I*(*s _{i}*, ·). In other words, given any spike train *s _{i}*, this spike train is mapped to an element Λ_{s_i} of the RKHS, given explicitly (although unknown in closed form) as Λ_{s_i}(·) = I(s_i, ·). Then equation 5.3 can be restated in the more usual form as I(s_i, s_j) = ⟨Λ_{s_i}, Λ_{s_j}⟩. It must be remarked that the induced RKHS is in fact a functional space. More specifically, its points are functions of spike trains; that is, they are functions defined on the space of spike trains. This is a key difference between the space of intensity functions explained before and the RKHS, in that the latter allows statistics of the transformed spike trains to be estimated as functions of spike trains.

_{i}## 5.3. Memoryless CI Kernel and the RKHS Induced by κ.

*E*

_{}{Φ

^{i}

_{}},

*E*

_{}{Φ

^{i}

_{}} denotes the expectation of the transformed spike times and are the expected number of spikes for spike trains

*s*and

_{i}*s*, respectively.

_{j}*s*is (Mardia & Jupp, 2000) So the norm of the mean transformed spike times is inversely proportional to the variance of the elements in . This means that the inner product between two spike trains depends also on the dispersion of these average points. This fact is important because data reduction techniques, for example, heavily rely on optimization with the data variance. For instance, kernel principal component analysis (Schölkopf, Smola, & Müller, 1998) directly maximizes the variance expressed by equation 5.8 (Paiva, Xu, & Príncipe, 2006).

_{i}## 5.4. Memoryless CI Kernel as a Covariance Kernel.

In section 4.1 it was shown that the mCI kernel is indeed a symmetric positive-definite kernel. Parzen (1959) showed that any symmetric and positive-definite kernel is also a covariance function of gaussian distributed random processes defined in the original space of the kernel, and the two spaces are congruent (see Wahba, 1990, for a review). In the case of the spike train kernels defined here, this means the random processes are indexed by spike trains on . This is an important result as it sets up a correspondence between the inner product due to a kernel in the RKHS to our intuitive understanding of the covariance function and associated linear statistics. Simply put, due to the congruence between the two spaces, an algorithm can be derived and interpreted in any of the spaces.

Let *X* denote this random process. Then for any spike train *s _{i}*, *X*(*s _{i}*) is a random variable on a probability space (Ω, ℬ, *P*) with measure *P*. As proved by Parzen, this random process is gaussian distributed with zero mean and covariance function

E{X(s_i) X(s_j)} = I(s_i, s_j).

Notice that the expectation is over ω ∈ Ω since *X*(*s _{i}*) is a random variable defined on Ω, a situation that can be written explicitly as *X*(*s _{i}*, ω), ω ∈ Ω. This means that *X* is actually a doubly stochastic random process. An intriguing perspective is that for any given ω, *X*(*s _{i}*, ω) corresponds to an ordered and almost surely nonuniform random sampling of *X*(·, ω). The space spanned by these random variables is a Hilbert space since *X* is obviously square integrable (that is, *X* has finite covariance).

The RKHS induced by the mCI kernel and the space of random functions are congruent. This fact is obvious since there is clearly a congruence mapping between the two spaces. In the light of this theory, we can henceforth reason about the mCI kernel also as a covariance function of random variables directly dependent on the spike trains, with well-defined statistical properties. Allied to our familiarity and intuitive knowledge of the use of covariance (which is nothing but cross-correlation between centered random variables), this concept can be of great importance in the design of optimal learning algorithms that work with spike trains. This is because linear methods are known to be optimal for gaussian distributed random variables.

## 6. Spike Train Distances

The concept of distance is very useful in classification and analysis of data. Spike trains are no exception. The importance of distance can be observed from the attention it has received in the literature (Victor & Purpura, 1997; van Rossum, 2001; Victor, 2005). In this section, we show how the mCI kernel (or any of the other presented kernels, for that matter) could be used to easily define distances between spike trains in a rigorous manner. The aim of this section is not to propose any new distance but to highlight this natural connection and convey the generality of the RKHS framework by suggesting how several spike train distances can be formulated from basic principles as special cases.

### 6.1. Norm Distance.

Using the mCI kernel, a norm distance between two spike trains can be naturally defined as

d_ND(s_i, s_j) = ‖Λ_{s_i} − Λ_{s_j}‖ = √(I(s_i, s_i) − 2I(s_i, s_j) + I(s_j, s_j)).

*d _{ND}* is a valid distance since, for any spike trains *s _{i}*, *s _{j}*, and *s _{k}*, it satisfies the three distance axioms:

- Symmetry: *d _{ND}*(*s _{i}*, *s _{j}*) = *d _{ND}*(*s _{j}*, *s _{i}*)
- Positiveness: *d _{ND}*(*s _{i}*, *s _{j}*) ≥ 0, with equality holding if and only if *s _{i}* = *s _{j}*
- Triangle inequality: *d _{ND}*(*s _{i}*, *s _{j}*) ≤ *d _{ND}*(*s _{i}*, *s _{k}*) + *d _{ND}*(*s _{k}*, *s _{j}*)

This distance is basically a generalization of the idea behind the Euclidean distance in a continuous space of functions.

As already noted, this distance could be formulated directly and with the same result in the space of intensity functions. Then if one considers this situation with a causal decaying exponential function as the smoothing kernel, we immediately observe that *d _{ND}* corresponds, in this particular case, to the distance proposed by van Rossum (2001). If instead a rectangular smoothing function is used, the distance then resembles the distance proposed by Victor and Purpura (1997), as pointed out by Schrauwen and Campenhout (2007), although its definition prevents an exact formulation in terms of the mCI kernel. Finally, when a gaussian kernel is used, the same distance used by Maass et al. (2002) is obtained. Although it had already been noted that other cost (i.e., kernel) functions between spike times could be used instead of those initially described (Schrauwen & Campenhout, 2007), the framework given here fully characterizes the class of valid kernels and explains their role in the time domain.
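A hedged sketch of the norm distance follows, restating the mCI estimator with a Laplacian κ so the block is self-contained. The helper names and the time constant `tau` are our own; expanding ‖Λ_{s_i} − Λ_{s_j}‖² gives the three-term expression under the square root.

```python
import math

def mci(si, sj, tau=0.01):
    # mCI kernel estimator (equation 4.5): double sum of a Laplacian kernel
    return sum(math.exp(-abs(a - b) / tau) for a in si for b in sj)

def norm_distance(si, sj, tau=0.01):
    """d_ND: Euclidean-style norm distance induced by the mCI kernel."""
    d2 = mci(si, si, tau) - 2.0 * mci(si, sj, tau) + mci(sj, sj, tau)
    return math.sqrt(max(d2, 0.0))       # clip tiny negative round-off

a = [0.1, 0.2, 0.3]
print(norm_distance(a, a))               # 0.0
print(norm_distance(a, [0.1, 0.2, 0.31]) < norm_distance(a, [0.1, 0.2, 0.4]))  # True
```

With a causal decaying exponential as the smoothing function, this computation plays the role of the van Rossum (2001) distance, as discussed above.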

### 6.2. Cauchy-Schwarz Distance.

As is generally true of *L*_{2} spaces, alternative measures for spike trains can be defined. In particular, based on the Cauchy-Schwarz inequality (property 5), we can define the Cauchy-Schwarz (CS) distance between two spike trains as

d_CS(s_i, s_j) = arccos [ I(s_i, s_j) / √(I(s_i, s_i) I(s_j, s_j)) ].

From properties 1 and 5 of the mCI kernel, it follows that *d _{CS}* is symmetric and always positive, and thus verifies the first two axioms of distance. Since *d _{CS}* is the angular distance between points, it also verifies the triangle inequality.

The major difference between the norm distance and the CS distance is that the latter is not a Euclidean measure. Indeed, because it measures the angular distance between the spike trains, it is a Riemannian metric. This is the same idea expressed in equation 2.5, which presents the geodesic distance associated with any symmetric positive-definite kernel.

The Cauchy-Schwarz distance can be compared with the “correlation measure” between spike trains proposed by Schreiber et al. (2003). In fact, we can observe that the latter corresponds to the argument of the arc cosine and thus denotes the cosine of an angle between spike trains, with norm and inner product estimated using the gaussian kernel. Notice that the “correlation measure” of Schreiber et al. is only a premetric since it does not verify the triangle inequality. In *d _{CS}*, this is ensured by the arc cosine function.
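The angular interpretation can be sketched as follows, again restating the mCI estimator with a Laplacian κ for self-containment; names and `tau` are illustrative. The arc cosine is what turns the Schreiber et al. correlation-like quantity into a proper metric.

```python
import math

def mci(si, sj, tau=0.01):
    # mCI kernel estimator (equation 4.5): double sum of a Laplacian kernel
    return sum(math.exp(-abs(a - b) / tau) for a in si for b in sj)

def cs_distance(si, sj, tau=0.01):
    """Cauchy-Schwarz distance: the angle between spike trains in the RKHS.
    The argument of acos is the cosine-like normalized kernel."""
    cos = mci(si, sj, tau) / math.sqrt(mci(si, si, tau) * mci(sj, sj, tau))
    return math.acos(min(1.0, max(-1.0, cos)))   # clip round-off

a = [0.1, 0.2, 0.3]
print(cs_distance(a, a))                  # 0.0
print(cs_distance(a, [0.5, 0.6, 0.7]))    # close to pi/2: nearly orthogonal
```

By the Cauchy-Schwarz inequality the normalized kernel lies in [0, 1] for these nonnegative kernels, so the angle lies in [0, π/2].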

## 7. Application Example: Clustering of Spike Trains

Having an RKHS framework for spike trains is important because it facilitates the development of new methods to operate with spike trains. Moreover, all of these methods are developed under the same principles provided by this general theory.

To exemplify the use of kernels proposed under the RKHS framework, we show how a clustering algorithm for spike trains can be obtained naturally from any of the spike train kernel definitions presented here (with results shown for the mCI and nCI kernels). Compared with previous clustering algorithms for spike trains, these ideas result in simpler methods, derived in an integrated manner, with a clear understanding of the features being accounted for, and with greater generality. Our emphasis here is not to propose another algorithm, but merely to illustrate the elegance and usefulness of the RKHS framework. For this reason, we dispense with a comparison with other methods and a more thorough analysis of the performance of the algorithm at this time.

### 7.1. Algorithm.

In this example, we show how spike train kernels defined in the RKHS framework provide the means for clustering spike trains. The algorithm is based on the ideas of spectral clustering. Spectral clustering is advantageous for the purpose of this example because the evaluation of the relation between spike trains and the actual clustering procedure are conceptually distinct steps. It should be possible to extend other established clustering algorithms as well, although one must introduce the inner product directly into the computation, which slightly complicates matters.

Spectral clustering of spike trains operates in two major steps. First, the affinity matrix of the spike trains is computed. Let {*s*_{1}, *s*_{2}, …, *s _{n}*} be the set of *n* spike trains to be clustered into *k* clusters. The affinity matrix is an *n* × *n* matrix describing the similarity among all pairs of spike trains. The second step of the algorithm is to use spectral clustering applied to this affinity matrix to find the actual clustering results. In particular, the spectral clustering algorithm proposed by Ng, Jordan, and Weiss (2001) was used for its simplicity and minimal use of parameters. (We refer the reader to Ng et al. for additional details on the spectral clustering algorithm.)

Clearly, the defining step for the use of this algorithm for clustering of spike trains is how to evaluate affinity among spike trains. Since inner products inherently quantify similarity, any of the kernels proposed could be used, in particular the mCI and nCI kernels, for which we provide results. In this situation, the affinity matrix is simply the Gram matrix of the spike trains computed with the spike train kernel. The algorithm shown here is similar to the method by Paiva, Rao, Park, and Príncipe (2007) but is simpler, since no transformation is needed to map the distance evaluation to a similarity measurement, and the need to adjust the corresponding parameter is avoided. Since distances are derived concepts and can often be defined in terms of inner products, the approach taken here is more straightforward and principled. Moreover, the computation is reduced, since the spike train kernels are simpler to evaluate than a distance. Most important, the algorithm can be generalized simply by using yet another spike train kernel.
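The two steps can be sketched as follows. This is a simplified rendering under our own naming: the affinity matrix is the mCI Gram matrix (estimated with a gaussian kernel of pairwise spike-time differences), followed by the Ng-Jordan-Weiss normalization and embedding with a basic k-means step; the published algorithm's details may differ.

```python
import numpy as np

def mci(s1, s2, sigma=0.01):
    # mCI kernel estimate: gaussian kernel summed over all pairwise spike-time differences
    d = np.asarray(s1)[:, None] - np.asarray(s2)[None, :]
    return np.exp(-d ** 2 / (2 * sigma ** 2)).sum()

def spectral_cluster(trains, k, sigma=0.01, n_iter=50):
    """Step 1: affinity (Gram) matrix via a spike train kernel.
    Step 2: Ng-Jordan-Weiss spectral clustering on that matrix."""
    n = len(trains)
    A = np.array([[mci(trains[i], trains[j], sigma) for j in range(n)]
                  for i in range(n)])
    np.fill_diagonal(A, 0.0)                          # zero self-affinity, as in NJW
    deg = A.sum(axis=1)
    deg[deg == 0] = 1e-12
    L = A / np.sqrt(np.outer(deg, deg))               # D^{-1/2} A D^{-1/2}
    _, vecs = np.linalg.eigh(L)                       # eigenvalues in ascending order
    X = vecs[:, -k:]                                  # k leading eigenvectors
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # row-normalize the embedding
    # deterministic farthest-point initialization, then plain k-means
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels
```

Substituting another spike train kernel for `mci` generalizes the algorithm without touching the clustering step, which is precisely the point of separating the two.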

### 7.2. Simulation.

The goal of this simulation example is to show the importance of spike train kernels that go beyond the first cross-moment (i.e., cross-correlation) between spike trains. For this reason, we applied the algorithm described in the previous section for clustering spike trains generated as homogeneous renewal point processes with a gamma interspike interval (ISI) distribution. This model was chosen since the Poisson process is a particular case and thus can be directly compared.

A three-cluster problem is considered, in which each cluster is defined by the ISI distribution of its spike trains (see Figure 2a). In other words, spike trains within a cluster were generated according to the same point process model. All spike trains were 1 s long, with a constant firing rate of 20 spikes per second. For each Monte Carlo run, 100 spike trains randomly assigned to one of the clusters were generated. The resulting statistics were estimated over 500 Monte Carlo runs. For both the mCI and nCI kernels, the gaussian function was used as a smoothing function, with results shown for three values of the smoothing width: 2, 10, and 100 ms. In addition, the gaussian kernel was utilized for κ in the computation of the nCI kernel, with results for kernel sizes of σ = 1 and σ = 10.
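The spike trains used here can be generated as in the following sketch (our own helper function): ISIs are drawn from a gamma distribution with shape parameter θ and mean 1/rate, so θ = 1 recovers a homogeneous Poisson process.

```python
import numpy as np

def gamma_renewal_train(rate, shape, T, rng):
    """Homogeneous renewal process with gamma-distributed ISIs.
    Mean ISI is 1/rate; shape=1 is Poisson, shape>1 is more regular,
    shape<1 is more irregular ('bursty')."""
    mean_isi = 1.0 / rate
    spikes, t = [], 0.0
    while True:
        t += rng.gamma(shape, mean_isi / shape)  # scale chosen so the mean ISI is 1/rate
        if t >= T:
            break
        spikes.append(t)
    return np.array(spikes)
```

With `rate=20` and `T=1.0`, setting `shape` to 0.5, 1, and 3 reproduces the three cluster models of this simulation.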

The results of the simulation are shown in Figure 2c. The cluster with shape parameter θ = 1 contained Poisson spike trains; spike trains with shape parameter θ = 3 were more regular, whereas θ = 0.5 gave rise to more irregular (i.e., "bursty") spike trains. The results with the mCI kernel are at most 1.4% better on average than random selection. This low performance is not entirely surprising, since all spike trains have a constant firing rate. Using the nCI kernel with the larger smoothing width yielded an average improvement of 14.7% for σ = 10 and 18% for σ = 1. Smaller values of σ did not improve the clustering performance (σ = 0.1 resulted in the same performance as σ = 1), demonstrating that the selection of the kernel size σ for the nCI kernel is not problematic. Most important, the results show that even though the formulation depends only on the memoryless intensity functions, in practice the nonlinear kernel allows different spike train models to be discriminated. This improvement is due to the fact that κ enhances the slight differences in the estimated intensity functions caused by the different point process models expressed in the spike trains (cf. Figure 2b).

## 8. Conclusion

The point process nature of spike trains has made conventional signal processing methods difficult and imprecise to apply to spike trains from first principles (e.g., through binning). The most powerful methodologies for spike train analysis are based on statistical approaches, but these face serious shortcomings with the widespread use of multielectrode array techniques, since they are practical only under an independence assumption.

This letter presents a reproducing kernel Hilbert space formulation for the analysis of spike trains that has the potential to improve the set of algorithms that can be developed for spike train analysis of multielectrode array data. It presents the theory in sufficient detail to establish a solid foundation and, we hope, entice further work along this line of reasoning. Indeed, the letter's dual role is to elucidate the set of possibilities opened by the RKHS formulation and to link a very special case of the theory to methods in common use in computational neuroscience. Much more work is needed to bring the possibilities opened by RKHS theory to fruition in spike train signal analysis.

At the theoretical level, we extend the early work of Parzen (1959) on stochastic processes to spike trains by defining from the bottom up the structure of the RKHS on the statistics of the point process, that is, its intensity function. This result provides a solid foundation for future work, both for practical algorithm development and as a simple way to bring more realistic assumptions about the statistics of spike trains into the analysis. Indeed, we show that the Poisson statistical model is behind the simplest definition of the RKHS (the memoryless cross-intensity kernel) and that this RKHS provides a linear space for doing signal processing on spike trains. However, the same framework can be applied to inhomogeneous Markov interval models or even more general point process models, which are only now beginning to be explored. We emphasize that building an RKHS from the bottom up is a much more principled approach than the conventional way that RKHSs are derived in machine learning, where the link to data statistics is possible only at the level of the estimated quantities, not the statistical operators themselves.

The second theoretical contribution is to show the flexibility of the RKHS methodology. Indeed, it is possible to define alternate, and still unexplored, RKHSs for spike train analysis that are not linearly related to the intensity functions. Obviously this will provide many possible avenues for future research, and there is the hope that it will be possible to derive systematic approaches to tailor the RKHS definition to the goal of the data analysis. We basically see two types of RKHS that exactly mimic the two methodologies being developed in the machine learning and signal processing literatures: kernels that are data independent (κ) and kernels that are data dependent (CI kernels). Specifically, for point processes, we show in a specific case how the former may be used to compose the latter, but they work with the data in very different ways. What is interesting is that these two types of RKHS provide different features in the transformation to the space of functions. The former is a macroscopic descriptor of the spike time intervals that may be usable in a coarse analysis of the data. The latter is a functional descriptor of the data, but it is harder to compute. In computational neuroscience, only the latter is being pursued, but by analogy with the large impact of kernel methods in statistical learning, we foresee an equally important impact of the former in computational neuroscience. The theory and the operators we have presented in this letter will form the foundations for such future developments.

There are also practical implications of the RKHS methodology presented in this letter. Since the RKHS is a special vector space, all of the conventional signal-processing algorithms that involve inner product computations can be immediately implemented in the RKHS. We illustrate this with a spectral clustering application, but many other applications are possible, ranging from filtering to eigendecompositions of spike trains in the RKHS (Paiva, Park, & Príncipe, forthcoming). The spectral clustering algorithm shown could also be derived using common distance measures that have been defined for spike trains, as has been done before (Paiva et al., 2007). But we stress the elegance of the proposed formulation, which first defines the structure of the space (the inner product) and then leaves to users the design of their intended algorithm, unlike other approaches that are specific to the application. It is also important to stress the computational savings for spike timing analysis provided by our RKHS methodology, which has a complexity independent of the data sampling rate, depending only on the spike rate.

There are still many other topics that need to be fully researched for systematic use of the technique. Perhaps the most important one for practical applications is the kernel size parameter of the kernel function. The theory shows clearly the role of this free parameter: it sets the scale of the transformation by changing the inner product. So it provides flexibility to the researcher, but also suggests the need to find tools to help set this parameter according to the data and the analysis goal.

### Appendix: Proofs

The proofs for properties 2 and 5 given in section 4.1 are presented.

**Proof of Property 2**. The symmetry of the matrix results immediately from property 1. By definition, a matrix is nonnegative definite if and only if $\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j I(s_i, s_j) \geq 0$, for any $\mathbf{a}^T = [a_1, \ldots, a_n]$ with $a_i \in \mathbb{R}$. So we have that

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j I(s_i, s_j),$$

which, making use of the general definition for CI kernels (see equation 3.2), yields

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j I(s_i, s_j) = \int_{\mathcal{T}} E\left\{ \left[ \sum_{i=1}^{n} a_i \lambda_{s_i}(t \mid H_t^{i}) \right]^{2} \right\} dt \geq 0.$$

## Acknowledgments

We thank Emory N. Brown for clarifying the estimation of the conditional intensity function using GLM. A.R.C. P. was supported by Fundação para a Ciência e a Tecnologia under grant SFRH/BD/18217/2004. This work was partially supported by NSF grants ECS-0422718 and CISE-0541241.

## Notes

^{1} Distances define Banach spaces, but for signal processing, a Hilbert space (which automatically induces a Banach space) is needed.

^{2} Two spaces are said to be congruent if there exists an isometric isomorphism, that is, a one-to-one inner-product-preserving mapping, between the two spaces. This mapping is called a *congruence*.

^{3} This is a direct consequence of the fact that symmetric positive-definite kernels denote inner products that obey the Cauchy-Schwarz inequality.