## Abstract

Modeling and interpreting spike train data is a task of central importance in computational neuroscience, with significant translational implications. Two popular classes of data-driven models for this task are autoregressive point-process generalized linear models (PPGLM) and latent state-space models (SSM) with point-process observations. In this letter, we derive a mathematical connection between these two classes of models. By introducing an auxiliary history process, we represent a PPGLM exactly in terms of a latent, infinite-dimensional dynamical system, which can then be mapped onto an SSM by basis function projections and moment closure. This representation provides a new perspective on widely used methods for modeling spike data and also suggests novel algorithmic approaches to fitting such models. We illustrate our results on a phasic bursting neuron model, showing that our proposed approach provides an accurate and efficient way to capture neural dynamics.

## 1 Introduction

Connecting single-neuron spiking to the collective dynamics that emerge in neural populations remains a central challenge in systems neuroscience. Besides representing a major barrier to our understanding of fundamental neural function, this challenge has recently acquired new salience due to the rapid improvements in technologies that can measure neural population activity in vitro and in vivo at unprecedented temporal and spatial resolution (Jun et al., 2017; Maccione et al., 2014). Such technologies hold immense promise in elucidating both normal neural functioning and the etiology of many diseases, yet the high dimensionality and complexity of the resulting data pose formidable statistical challenges. In response to these needs, recent years have seen considerable efforts to develop strategies for extracting and modeling information from large-scale spiking neural recordings. Two of the most successful strategies that emerged in the past decade are latent state-space models (SSMs) and autoregressive point-process generalized linear models (PPGLMs).

Latent state-space models describe neural spiking as arising from the unobserved latent dynamics of an auxiliary intensity field, which can model both internal and external factors contributing to the dynamics (Macke, Buesing, & Sahani, 2015; Sussillo, Jozefowicz, Abbott, & Pandarinath, 2016; Zhao & Park, 2016; Yu et al., 2009; Smith & Brown, 2003). Mathematically, such models generally take the form of a Cox process (Kingman, 1993), where the intensity field obeys some (discrete or continuous time) evolution equations. This representation therefore recasts the analysis of spike trains within a well-established line of research in statistical signal processing, leveraging both classical tools and more recent developments (Smith & Brown, 2003; Wu, Roy, Keeley, & Pillow, 2017; Gao, Archer, Paninski, & Cunningham, 2016; Pfau, Pnevmatikakis, & Paninski, 2013; Zhao & Park, 2017; Surace, Kutschireiter, & Pfister, 2017). These models have been used in a variety of tasks, such as describing population spiking activity in the motor system (Aghagolzadeh & Truccolo, 2014, 2016; Churchland et al., 2012; Michaels, Dann, Intveld, & Scherberger, 2017). However, while such models can certainly lead to biological insights, latent state-space models remain phenomenological: the recurrent spiking activity itself does not implement the latent state-space dynamics (see Figure 2B).

Autoregressive PPGLMs (see Figure 2A) treat spiking events from neurons as point events arising from a latent inhomogeneous Poisson process (Truccolo, Eden, Fellows, Donoghue, & Brown, 2005; Truccolo, Hochberg, & Donoghue, 2010; Truccolo, 2010, 2016). To fit such models, generalized linear model (GLM) regression is used to map observed spiking events to both extrinsic variables, such as stimuli or motor output, and intrinsic spiking history (Truccolo et al., 2005). PPGLMs are especially useful for statistical tests on sources of variability in neural spiking (Rule, Vargas-Irwin, Donoghue, & Truccolo, 2015, 2017) and benefit from a simple fitting procedure that can often be solved by convex optimization. However, they may require careful regularization to avoid instability (Hocker & Park, 2017; Gerhard, Deger, & Truccolo, 2017) and can fail to generalize outside the regimes in which they were trained (Weber & Pillow, 2017). Importantly, PPGLMs suffer from confounds if there are unobserved sources of neural variability (Lawhern, Wu, Hatsopoulos, & Paninski, 2010). This is especially apparent when the recorded neurons are a small subsample of the full population, in which case latent state-space models can be more accurate in decoding applications (Aghagolzadeh & Truccolo, 2016).

## 2 Results

We start by recapitulating some basic notations and definitions from both PPGLMs and SSMs. We then provide a detailed derivation of the mathematical connection between the two frameworks, highlighting all the approximations we make in the process. Finally, we illustrate the performance of the method in an application case study.

### 2.1 Point-Process Generalized Linear Models

### 2.2 Latent State-Space Point-Process Models

An alternative strategy for capturing slow dynamics in neural spike trains is to postulate a slow, latent dynamical system responsible for bursting and postburst inhibition. This approach is taken by latent state-space models (SSMs), which are often viewed as functionally distinct from autoregressive PPGLMs (see Figure 2B).

### 2.3 The Auxiliary History Process of a PPGLM

The (possibly infinite) history dependence makes autoregressive PPGLMs non-Markovian dynamical systems (Truccolo, 2016, equation 6). However, a crucial insight is that since the dependence on history is linear, we can reinterpret the history dependence as a linear filter in time and approximate its effect on the conditional intensity using a low-dimensional linear dynamical system. In some PPGLM formulations, the history basis may already be defined in terms of a linear filter impulse response—for example, the exponential filters considered in Toyoizumi, Rad, and Paninski (2009). In this case, the history process is already Markovian. As the Markovian case has been illustrated elsewhere, we show here how to convert the general non-Markovian point process into a Markovian one.
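For the exponential-filter case, the embedding is explicit: when the history filter is a mixture of exponentials, $h(\tau) = \sum_i w_i e^{-\tau/\tau_i}$, the filtered spiking history reduces to a finite set of leaky states that decay between spikes and jump at each spike, making the process Markovian. The following is a minimal sketch of this idea for a single simulated neuron; all parameter values are illustrative and not taken from the letter.

```python
import numpy as np

# Markovian embedding of an exponential-basis history filter:
# h(tau) = sum_i w[i] * exp(-tau / tau_s[i]).  The filtered history is
# carried by leaky states z[i] that jump by w[i] at each spike.
# Parameter values are illustrative only (fast refractoriness plus
# slow adaptation), not the model used in the letter.
rng = np.random.default_rng(0)
dt    = 1e-3                        # time step (s)
beta0 = np.log(20.0)                # baseline log intensity (20 Hz)
w     = np.array([-5.0, -0.5])      # history filter weights
tau_s = np.array([0.005, 0.2])      # filter time constants (s)

T = int(5.0 / dt)                   # 5 s of simulation
z = np.zeros_like(w)                # Markovian history state, one per basis filter
spikes = np.zeros(T, dtype=bool)

for t in range(T):
    lam = np.exp(beta0 + z.sum())   # conditional intensity (Hz)
    if rng.random() < lam * dt:     # Bernoulli approximation of the point process
        spikes[t] = True
        z += w                      # each spike kicks every filter state
    z -= z / tau_s * dt             # exponential decay between spikes

print(f"{int(spikes.sum())} spikes in 5 s")
```

Because the state $z$ summarizes the entire spiking history, the pair (spike train, $z$) evolves as a Markov process, which is the representation exploited in the remainder of this section.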

### 2.4 A Continuous Approximation

### 2.5 Gaussian Moment Closure

It is worth reflecting more on this analogy and on the limitations of the moment-closure representation. Spikes are dramatic all-or-nothing events that cannot be approximated by a continuous stochastic process. Accordingly, one would expect the finite-dimensional moment-closure system to fail to capture rapid fluctuations. However, on slow timescales, this gaussian approximation can be accurate even for a single neuron. In contrast to the neural field interpretation, which averages over a large population at each time instant, one can average over an extended time window and arrive at an approximation for slow timescales (see Figure 3). A pictorial summary of the relationship of the proposed moment-closure approach to PPGLMs, SSMs, and neural field models is given in Figure 2.
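As a concrete illustration of the mechanics, consider a scalar caricature in which the log intensity $x$ obeys $dx = -x/\tau\,dt + w\,dN$ with $\lambda = \exp(\beta_0 + x)$. Under a gaussian closure, $\mathbb{E}[\lambda] = \exp(\beta_0 + \mu + \sigma^2/2)$ and $\mathrm{Cov}(x, \lambda) = \sigma^2\,\mathbb{E}[\lambda]$, which closes the moment equations. The sketch below integrates these equations with illustrative parameters; it is a one-dimensional caricature, not the basis-projected system derived in appendix B.

```python
import numpy as np

# Gaussian moment closure for a scalar log intensity:
#   dx = -x/tau dt + w dN,   lambda = exp(beta0 + x).
# Gaussianity gives E[lambda] and Cov(x, lambda) in closed form,
# yielding ODEs for the mean mu and variance sig2 of x.
# Parameter values are illustrative only.
beta0, tau, w = np.log(20.0), 0.05, -0.5   # baseline, time constant, spike kick
dt, nsteps = 1e-3, 2000                    # 2 s of integration

mu, sig2 = 0.0, 0.0                        # first two moments of x
for _ in range(nsteps):
    lam_bar = np.exp(beta0 + mu + 0.5 * sig2)   # E[lambda] under the closure
    cov_xl  = sig2 * lam_bar                    # Cov(x, lambda) under the closure
    dmu     = -mu / tau + w * lam_bar
    dsig2   = -2 * sig2 / tau + w**2 * lam_bar + 2 * w * cov_xl
    mu, sig2 = mu + dmu * dt, sig2 + dsig2 * dt

print(f"mu = {mu:.3f}, sig2 = {sig2:.3f}")
```

With an inhibitory weight ($w < 0$), the moments settle to a stable fixed point; the variance term captures the contribution of point-process fluctuations that a mean-field treatment discards.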

### 2.6 Case Study: The Izhikevich Neuron Model

We consider the effectiveness of this approach on the case study of a PPGLM emulation of the Izhikevich neuron model, considered in Weber and Pillow (2017). We compare the accuracy of the gaussian moment closure with a mean-field approach (see appendix B). Figure 3 illustrates moment closure of a phasic-bursting Izhikevich neuron emulated with a PPGLM (see Figure 1). By averaging over the history process, slow timescales in the autoregressive point process are captured in the gaussian moment closure. Unlike a mean-field model, which considers the large-population limit of weakly coupled neurons, moment closure is able to capture the influence of Poisson variability on the dynamics in a single neuron.

Additionally, the mean-field approximation considers only a single path through the process history, whereas moment closure provides an approximation for a distribution over paths, with fluctuations and autocorrelations taken into account. This has the benefit that the moment-closure system is sensitive to the combined effects of self-excitation and point-process fluctuations and captures, for example, the self-excitation during a burst through the second-moment terms. This reveals another benefit of the moment-closure approach: runaway self-excitation (Hocker & Park, 2017; Gerhard et al., 2017; Weber & Pillow, 2017) manifests in the moment closure as a divergence of the mean or second-moment terms. This self-excitation, however, introduces some numerical challenges.
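To illustrate how divergence of the moments signals runaway self-excitation, the following self-contained sketch integrates the gaussian moment-closure equations for a scalar caricature of the history process ($dx = -x/\tau\,dt + w\,dN$, $\lambda = \exp(\beta_0 + x)$; illustrative parameters, not the letter's model) with an inhibitory versus an excitatory history weight:

```python
import numpy as np

# Divergence of the moment-closure ODEs as a stability diagnostic.
# Scalar caricature with illustrative parameters: an inhibitory weight
# yields a stable fixed point; an excitatory weight has no fixed point
# and the mean/second-moment terms blow up.
def moments_diverge(w, beta0=np.log(20.0), tau=0.05, dt=1e-4, nsteps=50_000):
    """Integrate the gaussian moment-closure ODEs; report divergence."""
    mu, sig2 = 0.0, 0.0
    for _ in range(nsteps):
        lam = np.exp(beta0 + mu + 0.5 * sig2)     # E[lambda] under the closure
        if not np.isfinite(lam) or lam > 1e4:
            return True                           # moments blew up
        mu   += (-mu / tau + w * lam) * dt
        sig2 += (-2 * sig2 / tau + (w**2 + 2 * w * sig2) * lam) * dt
    return False

print(moments_diverge(w=-0.5), moments_diverge(w=0.5))  # inhibitory vs. excitatory
```

The same check applied to a fitted model gives a cheap stability screen that does not require Monte Carlo simulation of the point process itself.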

Table 1. Normalized RMSE.

| | Langevin | MF/LNA | Gaussian MC | Second Order |
| --- | --- | --- | --- | --- |
| Log mean rate | 0.30 | 0.36 | 0.34 | 0.31 |
| Mean log rate | 0.33 | 0.93 | 0.42 | 0.47 |
| Log rate $\sigma$ | 0.41 | 0.90 | 0.86 | 0.53 |


Notes: Accuracy of various methods compared to Monte Carlo sampling (10,000 samples) of the autoregressive PPGLM model for a phasic bursting neuron, simulated at $dt=1$ ms. Error is reported as root-mean-square error (RMSE) divided by the standard deviation of the relevant signal. The mean rate is captured with similar accuracy by all methods (first row; error computed on the logarithm of the mean rate), with the Langevin and second-order moment equations being the best. The average log rate is captured best by gaussian moment closure (Gaussian MC), whereas the log mean rate and standard deviation of the log rate are handled more accurately by the second-order moment equations. Errors were computed using a stimulus with a baseline inhibitory current of $-$0.5 pA and 49 current pulses, with amplitudes ranging from 0.5 to 1.0 pA and durations ranging from 50 to 500 ms, with added OU noise with a steady-state variance of 1 pA$^2$ and a time constant of 100 ms.

Table 2. Log likelihood.

| | GLM | MF/LNA | Gaussian MC | Second Order |
| --- | --- | --- | --- | --- |
| l.l. (bits/ms) | 0.055 | 0.024 | 0.013 | 0.020 |
| Penalty (bits/ms) | - | 0.031 | 0.042 | 0.034 |
| Relative penalty | - | 1.09 | 1.51 | 1.23 |


Notes: Here, we use Bayesian filtering to estimate the log likelihood of a PPGLM model (reported as normalized log likelihood "l.l." in bits/ms, first row; Ramirez & Paninski, 2014). The model is originally fit using maximum-likelihood GLM regression (first column), which neglects the dynamical effects of fluctuations. All three moment approximations broadly agree on a likelihood penalty (0.031–0.042 bits/ms), the relative size of which (last row; Fernandes et al., 2013) is of the same order as the distance between the GLM-estimated likelihood and the theoretical maximum "saturated" normalized log likelihood of 0.083 bits/ms.

This second-order gaussian moment closure is not only more accurate in the second moment (see Figure 3D and Table 1) but also less stiff and easier to integrate. This highlights a major benefit of the moment-closure approach: numerical issues that prove difficult or intractable in the original GLM representation can be more readily addressed in a state-space model. Importantly, the basis-projected moment-closure system (see appendix B) is an ordinary differential equation with a form reminiscent of nonlinear (extended) continuous-time Kalman-Bucy filtering (Kalman & Bucy, 1961) and can be viewed as a state-space model (see equations 2.4 and 2.5) explaining the observed point process. In particular, the moment-closure state-space equations allow tools for reasoning about the stability of ordinary differential equations to be applied to PPGLMs.

## 3 Discussion

In this letter, we have introduced a mathematical connection between PPGLMs and SSMs that provides an explicit, constructive procedure to fit neural spike train data. Autoregressive point processes and state-space models have been combined before (Zhao & Park, 2016; Smith & Brown, 2003; Lawhern et al., 2010; Eden, Frank, Barbieri, Solo, & Brown, 2004), but so far always in a manner that treats the latent state-space as an extrinsic driver of neural activity. Importantly, the generative, dynamical effects of point-process fluctuations and process autocorrelations are not directly addressed in previous approaches. Additionally, although PPGLMs can condition on population spiking history during training, this conditioning addresses only a single sample path of the process history and does not reflect a recurrent dynamical model in which spiking outputs and their fluctuations lead to the emergence of collective dynamics. The moment-closure approach outlined here can be used to incorporate autohistory effects into a latent state-space model, where conditioning on the process history is replaced by a Bayesian filtering update that updates the moments of the history process at each time step.

Our results highlight the capacity of PPGLMs to implicitly learn hidden causes of neural firing through the autoregressive history filter. For example, a collective network mode at 20 Hz may induce spiking rhythmicity that is detected in the point-process history filter, even if isolated neurons do not exhibit this oscillation. The history dependence of autoregressive PPGLMs defines a latent variable process that captures both process autohistory effects and the influence of unobserved hidden modes or inputs. The moments of the population spiking history can be identified with the latent variables explaining population spiking. This interpretation replaces the pairwise coupling of neurons in the PPGLM formulation with a coupling of single neurons to a shared latent-variable history process.

The identification of latent states with moments of the population history opens up a new interpretation connecting both PPGLMs and latent state-space models to neural field models, a rich area of research in theoretical neuroscience (Amari, 1975, 1977, 1983; Wilson & Cowan, 1972). It suggests that under some conditions, neural field models may be interpreted as latent variable models and fit using modern techniques for latent state-space models. Conversely, this new connection illustrates that some latent state-space models may be viewed not only as modeling hidden causes of spiking but also as capturing statistical moments of the population that are relevant for neural dynamics in a neural field sense. The precise sense in which a moment-closure PPGLM converges to a neural field model remains to be explored mathematically. Convergence of moment-closure approximations has been studied extensively in the area of stochastic chemical reactions (Schnoerr, Sanguinetti, & Grima, 2014; Schnoerr et al., 2017). Indeed, our approach was partly inspired by recent work on chemical reaction-diffusion systems (Schnoerr, Grima, & Sanguinetti, 2016), in which pairwise interactions between pointwise agents in space are replaced by a coupling of single agents to a statistical field. In contrast to chemical reaction systems, however, we model self-interactions of a point process over time and also capture the effects of fluctuations on the system.

Here, we employed a gaussian moment closure, but other distributional assumptions may lead to even more accurate state-space formulations. Gaussian moment closure neglects the contribution of higher moments (e.g., skewness) that may arise owing to the influence of Poisson (spiking) noise. More generally, the expected rate (see equation 2.11) is connected to the expected log-likelihood integral for PPGLMs (Park & Pillow, 2011; Ramirez & Paninski, 2014). Other distributional assumptions can be used to compute this integral by matching the first two moments provided by the gaussian moment-closure system. Alternatively, moment equations could also be derived under an alternative parameterization of the history process, for example, log-gaussian.
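For the gaussian case, the expected rate is available in closed form: if the log intensity satisfies $x \sim \mathcal{N}(\mu, \sigma^2)$, then $\mathbb{E}[e^x] = \exp(\mu + \sigma^2/2)$. A quick illustrative check against direct sampling (the numerical values below are made up):

```python
import numpy as np

# Closed-form expected rate under a gaussian log intensity,
# E[exp(x)] = exp(mu + sig2/2), checked against direct sampling.
# mu and sig2 are illustrative values.
rng = np.random.default_rng(1)
mu, sig2 = np.log(15.0), 0.4
closed_form = np.exp(mu + 0.5 * sig2)
monte_carlo = np.exp(rng.normal(mu, np.sqrt(sig2), size=200_000)).mean()
print(f"closed form: {closed_form:.2f}, Monte Carlo: {monte_carlo:.2f}")
```

Under a different distributional assumption on $x$, this integral would instead be computed by matching the first two moments supplied by the closure.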

There are two major benefits of the moment-closure representation of PPGLMs. First, autoregressive time dependencies are converted to a low-dimensional system of ordinary differential equations, reinterpreting the PPGLM as a dynamical latent state-space model of a form similar to phenomenological latent-dynamics models and population neural field models. Second, moment-closure equations open up new strategies for estimating PPGLMs. A major challenge in fitting PPGLMs to large populations is estimating a large number of pairwise interactions. Our work suggests a different avenue toward estimating such large models: a low-dimensional latent-variable stochastic process with a suitable nonlinearity and Poisson noise can be interpreted as a process governing the moments of a PPGLM. This allows the extensive methodological advancements in identifying low-dimensional state-space models to be applied to autoregressive point processes. For example, efficient estimation methods exist for recurrent linear models (Pachitariu, Petreska, & Sahani, 2013), which can be thought of as a discrete-time, mean-field limit of the system derived here. The interaction of the moment-closure formulation with recent advances in estimating PPGLMs (Sahani, Bohner, & Meyer, 2016; Ramirez & Paninski, 2014) also remains to be explored. In addition to establishing a formal connection to the stochastic process defined by PPGLMs, moment closure provides a second-order model of fluctuations and correlations. This could be especially useful in systems in which the spiking output, and the fluctuations therein, influences the population dynamics.

Another challenge in estimating PPGLMs is ensuring that the fitted model accurately captures dynamics (Hocker & Park, 2017; Gerhard et al., 2017). The moment-closure equations outlined here allow process moments to be estimated, along with model likelihood, using Bayesian filtering. In addition to filtering over a distribution of paths in the process history, filtering can also average over models and thus implicitly capture both fluctuation effects and model uncertainty. However, it remains the subject of future work to apply the moment-closure approach in inference. Other methods, such as particle filtering, may be useful in situations where the latent state-space distribution is highly nongaussian.

## Appendix A: Time Evolution of the Second Moment

## Appendix B: Basis Projection

## Appendix C: Numerical Implementation

We assess the accuracy of approximations to the original PPGLM as a stochastic process by comparing the single-time marginals for the mean log rate, the log mean rate, and the standard deviation of the log rate (see Table 1). Ground-truth estimates of these moments are sampled from the "true" PPGLM using Monte Carlo sampling (10,000 samples) and compared to the predictions from the LNA, Langevin approximation, gaussian moment closure, and second-order moment closure. Of the moment equations, the second-order moment closure best recovers the variance, and the gaussian moment closure best recovers the mean.

Model likelihoods are also a widely used measure of model fit in the point-process literature (see Fernandes, Stevenson, Phillips, Segraves, & Kording, 2013; Ramirez & Paninski, 2014); these are appropriate for the singly stochastic interpretation of the PPGLMs, which treats the latent log intensity as a deterministic, nonstationary parameter. However, the true likelihood of the PPGLM as a doubly stochastic process in principle would result from a computationally intractable marginalization of the latent variables, and it is thus not readily accessible. As a coarse approximation, however, moment equations (see equations B.9–B.11) allow us to estimate a likelihood penalty related to slow dynamics (see Table 2). We estimated likelihood penalties via Bayesian filtering on the moment equations as a state-space model, handling the nonconjugate log-gaussian Poisson measurement update via moment matching. All three moment approximations broadly agree on the magnitude of the likelihood penalty. This slow-timescale likelihood estimate could potentially be merged with fast dynamics to build robust, dynamically accurate estimators for autoregressive PPGLM models, and approximation methods for the likelihood of the doubly stochastic interpretation of the PPGLM remain to be explored in future work.
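A minimal sketch of such a moment-matched measurement update follows, using grid-based numerical integration as an illustrative choice (not necessarily the implementation used for Table 2): given a gaussian prior on the log rate and an observed spike count in one bin, the exact (nonconjugate) posterior is replaced by a gaussian with the same first two moments.

```python
import numpy as np

# Moment-matched log-gaussian Poisson measurement update.  Given a
# gaussian prior N(mu, sig2) on the log rate x and an observed spike
# count y in a bin of width dt, match a gaussian to the exact posterior.
# The grid-based integration is an illustrative choice.
def poisson_moment_match(mu, sig2, y, dt, npts=400):
    x = mu + np.sqrt(sig2) * np.linspace(-6.0, 6.0, npts)   # grid over prior support
    log_post = (-(x - mu) ** 2 / (2.0 * sig2)               # gaussian prior
                + y * x - np.exp(x) * dt)                   # Poisson log likelihood
    wts = np.exp(log_post - log_post.max())
    wts /= wts.sum()
    m = np.sum(wts * x)                                     # posterior mean
    v = np.sum(wts * (x - m) ** 2)                          # posterior variance
    return m, v

# Observing one spike in a 1 ms bin (illustrative prior values): the
# belief about the log rate shifts upward and its variance shrinks.
m, v = poisson_moment_match(mu=np.log(20.0), sig2=0.5, y=1, dt=1e-3)
print(f"posterior: N({m:.2f}, {v:.2f})")
```

Iterating this update along the binned spike train, with the moment equations providing the prediction step between bins, yields the filtered moments and the per-bin likelihood contributions reported above.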

## Acknowledgments

We acknowledge support from the Engineering and Physical Sciences Research Council of the United Kingdom under grant EP/L027208/1, Large Scale Spatio-Temporal Point Processes: Novel Machine Learning Methodologies and Application to Neural Multi-Electrode Arrays. We are indebted to David Schnoerr, Matthias Hennig, Wilson Truccolo, and the anonymous referees for valuable comments on preliminary versions of this letter.