## Abstract

Neurons communicate nonlinearly through spike activities. Generalized linear models (GLMs) describe spike activities with a cascade of a linear combination across inputs, a static nonlinear function, and an inhomogeneous Bernoulli or Poisson process, or Cox process if a self-history term is considered. This structure considers the output nonlinearity in spike generation but excludes the nonlinear interaction among input neurons. Recent studies extend GLMs by modeling the interaction among input neurons with a quadratic function, which considers the interaction between every pair of input spikes. However, quadratic effects may not fully capture the nonlinear nature of input interaction. We therefore propose a staged point-process model to describe the nonlinear interaction among inputs using a few hidden units, which follows the idea of artificial neural networks. The output firing probability conditioned on inputs is formed as a cascade of two linear-nonlinear (a linear combination plus a static nonlinear function) stages and an inhomogeneous Bernoulli process. Parameters of this model are estimated by maximizing the log likelihood on output spike trains. Unlike the iterative reweighted least squares algorithm used in GLMs, where the performance is guaranteed by the concave condition, we propose a modified Levenberg-Marquardt (L-M) algorithm, which directly calculates the Hessian matrix of the log likelihood, for the nonlinear optimization in our model. The proposed model is tested on both synthetic data and real spike train data recorded from the dorsal premotor cortex and primary motor cortex of a monkey performing a center-out task. Performances are evaluated by discrete-time rescaled Kolmogorov-Smirnov tests, where our model statistically outperforms a GLM and its quadratic extension, with a higher goodness-of-fit in the prediction results. 
In addition, the staged point-process model describes nonlinear interaction among input neurons with fewer parameters than quadratic models, and the modified L-M algorithm also demonstrates fast convergence.

## 1  Introduction

Neurons communicate through spike activities, where the temporal pattern of spike timing varies to pass information. The invasive electrophysiology recording technique allows the simultaneous acquisition of neuronal population activities. Neuroscientists are now able to study and describe how the information is transferred through spike activities with respect to inputs such as visual stimuli (Chichilnisky, 2001; Schwartz, Chichilnisky, & Simoncelli, 2002; Pillow & Simoncelli, 2003; Rust, Schwartz, Movshon, & Simoncelli, 2004), and outputs such as movement generation (Wessberg et al., 2000; Sanchez et al., 2002; Ergun, Barbieri, Eden, Wilson, & Brown, 2007; Wang, Paiva, Príncipe, & Sanchez, 2009), or among different areas in the brain (Okatan, Wilson, & Brown, 2005; Zanos et al., 2008; Eldawlatly, Jin, & Oweiss, 2009; Stevenson et al., 2009; Quinn, Coleman, Kiyavash, & Hatsopoulos, 2011). Wessberg et al. (2000) designed a tunable prosthetic retina encoder to partially restore visual function for humans with retinal degenerative dysfunctions. Serruya, Hatsopoulos, Paninski, Fellows, and Donoghue (2002) developed a brain-machine interface that allows robot arm control using spike activities recorded from a monkey's primary motor cortex. Song et al. (2007) proposed to design cognitive prostheses that can compensate for the loss of transregional communication caused by brain injury (Berger et al., 2005, 2012; Guggenmos et al., 2013).

Many computational methods have been proposed to analyze the functional relationship that encodes the stimulus into spike trains or decodes the states from the spike trains and predicts spikes from spike inputs. The encoding model describes how the neurons respond to the stimulus (or receptive fields) and uses two groups of methods to estimate parameters. The first set of methods, the classic moment-based methods, characterizes individual neural gain using the first two moments of spike-triggered stimulus distribution, namely, the spike-triggered average (STA) (De Boer & Kuyper, 1968; Marmarelis & Naka, 1972; Bryant & Segundo, 1976; Eggermont, Johannesma, & Aertsen, 1983; de Ruyter van Steveninck & Bialek, 1988; Ringach, Sapiro, & Shapley, 1997; Chichilnisky, 2001) and the spike-triggered covariance matrix (STC) (Schwartz et al., 2002; Sharpee, Rust, & Bialek, 2004; Horwitz, Chichilnisky, & Albright, 2005; Slee, Higgs, Fairhall, & Spain, 2005; Park & Pillow, 2011; Sharpee, Atencio, & Schreiner, 2011). Both the STA and STC rely on the assumption that inputs follow a gaussian distribution (Chichilnisky, 2001; Paninski, 2003; Schwartz, Pillow, Rust, & Simoncelli, 2006; Samengo & Gollisch, 2013) and cannot be directly used to describe the spike-in and spike-out relationship. The second set of methods, generalized linear models (GLMs), investigates the functional relationship between the stimulus and the spike output by maximizing the log likelihood on output spike trains (Brillinger, 1988; Brown, Kass, & Mitra, 2004), which has also been successfully used in characterizing receptive fields in sensory areas (Calabrese, Schumacher, Schneider, Paninski, & Woolley, 2011), sensorimotor areas (Park, Meister, Huk, & Pillow, 2014) and the hippocampus (Brillinger, 1988; Brown, Kass, & Mitra, 2004). 
GLMs are a powerful technique for fitting output spike trains: they generalize linear regression over each input's history, followed by a canonical link function (Brown, Frank, Tang, Quirk, & Wilson, 1998; Paninski, 2004; Truccolo, Eden, Fellows, Donoghue, & Brown, 2005). However, they assume that the output spike probability depends on a linear combination of the inputs, which ignores the nonlinear interaction across the input neurons.

Many efforts have been made in modeling input nonlinearity. Most of them introduce nonlinearity preprocessing before the GLM. For example, Ahrens, Paninski, and Sahani (2008) proposed a generalized bilinear model, which performs a pointwise nonlinear transformation before the linear filtering of the GLM, while McFarland, Cui, and Butts (2013) added a rectified linear unit to extract visual stimulus patterns as a neuronal input before the GLM. Artificial neural networks (ANNs) have also been developed to encode neural activities from stimulus inputs. Prenger, Wu, David, and Gallant (2004) designed a three-layer feedforward network to predict the smoothed spike rate of V1 neurons from natural image stimuli, and Wu, Park, and Pillow (2015) developed a three-layered neural subunit model to encode spikes of the primary visual cortex from white noise stimulus. In the model, the first layer is formed by an identical and spatially shifted convolutional kernel and a static nonlinearity. Finally, Benjamin et al. (2017) implemented a recurrent neural network (long short-term memory (LSTM)) to predict M1 spikes from macaque hand movement trajectories.

The above studies apply to encoding external stimulus into neural activities, but they are not appropriate models to investigate spike prediction from spiking input neurons. Examples of spike-in/spike-out prediction models include building the relation between spike trains recorded from hippocampal CA1 and CA3 (Song et al., 2007, 2009), the relation between spike trains from neurons in different prefrontal cortical columnar layers (Hampson et al., 2012; Marmarelis et al., 2014), and the relation between spike trains recorded from PMd (dorsal premotor cortex) and M1 (primary motor cortex) (Stevenson et al., 2009; Xing et al., 2017). A few of these have also considered the interaction among input neurons. In Song et al. (2007), a modified Volterra kernel approach, in which the Volterra kernel is used to approximate the correlations between each pair of neurons, was proposed to formulate the nonlinear interaction among hippocampus CA3 neurons. Results of experiments showed that the prediction accuracies increased with kernel complexity, and the second- and third-order Volterra kernel models were found to successfully predict the CA1 output spike distribution from CA3 input spike trains. The introduction of the Volterra kernel into the GLM enables nonlinear interaction among input neurons but increases the number of parameters due to the correlation calculated in a pairwise manner.

To mitigate the overfitting problem introduced by Volterra kernels, Marmarelis and his colleagues introduced and developed the Laguerre expansion techniques (Marmarelis, 1993), which greatly reduce the number of free parameters in a given-ordered Volterra system. Marmarelis et al. (2013) treated the Laguerre filters and polynomial nonlinearity as several “principal dynamic modes,” which were estimated using matrix decomposition techniques (eigenvalue decomposition and singular value decomposition). Alataris, Berger, and Marmarelis (2000) reframed the Laguerre expanded Volterra model as a Laguerre-Volterra network (LVN) with a polynomial activation function in its hidden layer. Geng and Marmarelis (2017) extended the LVN into a recurrent LVN (or autoregressive sparse LVN) by adding an L1 regularization term in its cost function and using one-lag-delayed output as the autoregressive input. Geng et al. (2018) extended the LVN into a neuronal mode network by replacing the polynomial activation function with a nonparametric multidimensional histogram.

The above Laguerre-Volterra modeling studies are based on dimensionality reduction only up to third-order Volterra systems. Although any continuous function could be approximated by a finite-order Volterra kernel, the approximation of the nonlinear function always requires a considerably high kernel order (Rieke, Warland, de Ruyter van Steveninck, & Bialek, 1999), as lower-order Volterra kernels may have limited power in expressing nonlinearity (Fitzgerald, Rowekamp, Sincich, & Sharpee, 2011; Wu et al., 2015). The brain is a highly correlated dynamic nonlinear system, where neurons with dendrites and synapses communicate through intraregional or interregional neuronal connections (Shepherd, 2004; Andersen, Morris, Amaral, Bliss, & O'Keefe, 2007). The neural interaction among input neurons could be arbitrary. In addition, the recording of neural activity is only a sampling of the whole dynamic system. The information transmission between input and output neurons may be affected by spike information generated by other unrecorded neurons. These unrecorded hidden neurons interconnect and form a neural pathway, where the feedforward activity of downstream output neurons may directly or indirectly feed back to upstream input neurons (Gerstner & Kistler, 2002; Trappenberg, 2002). Therefore, the functional relationship between the input and output neurons could be arbitrarily nonlinear. In a feedforward computational model on output spike prediction, the interaction among the recorded input neurons could also be regarded as arbitrarily nonlinear to the generation of the outputs.

In this letter, we follow the idea of ANNs (Cybenko, 1989; Kůrková, 1991) to use hidden units to allow interaction among input neurons as well as the spike generation of output neurons to be arbitrarily nonlinear. Importantly, our model describes the nonlinear interaction among input neurons with two cascaded linear-nonlinear (LN) stages, followed by an inhomogeneous Bernoulli process on the output spike generation. By maximizing the log likelihood on output spike trains, we theoretically demonstrate that this staged point-process model will be equivalent to building a two-layered ANN but probabilistically dealing with spike observation. A new training algorithm is developed to estimate model parameters using a modified L-M method. The model is validated on synthetic spike data and real spike train data recorded from PMd and M1 in a monkey performing a four-direction center-out task (Wang et al., 2015). The performance of the proposed method is evaluated by the goodness-of-fit using a discrete-time rescaling Kolmogorov-Smirnov (DTR-KS) test (Haslinger, Pipa, & Brown, 2010).

The staged point-process model and the modified L-M algorithm are presented in section 2, together with a short review of GLMs and GLMs with second-order polynomial series (second-order GLMs; Song et al., 2007). Performance of our model, GLMs, and second-order GLMs is evaluated and compared in section 3. The characteristics, limitations, and possible extensions are discussed in section 4, and a conclusion is given.

## 2  Method

In this section, we start with a brief description of GLMs and second-order GLMs and then propose the staged point-process model based on the point-process framework. This model is designed to predict the spike train of an output neuron (Y) from the spike trains of input neurons (X). The novelty of the proposed model is the inclusion of hidden units (z) that allow nonlinear interaction among input neurons. Here, parameters of the staged point-process model are estimated probabilistically by maximum likelihood estimation (MLE) using a modified L-M optimization algorithm. In addition, the connection between our model and ANNs is theoretically revealed in the appendix.

### 2.1  GLMs and Second-Order GLMs

Given the observation interval $(0,T]$, $X_i^k$ denotes the spike observation of the $i$th input neuron during the discrete-time interval $(t_{k-1},t_k]$, where discrete-time intervals are indexed with $k$. $Y^k$ denotes the spike observation of the output neuron Y during the time interval $(t_{k-1},t_k]$, $\lambda_Y^k$ denotes the firing probability of the output neuron Y during $(t_{k-1},t_k]$, and $\hat{Y}^k$ denotes the spike prediction of neuron Y during $(t_{k-1},t_k]$. Let $N_X$ be the number of input neurons in X. Hence, $X_{1:N_X}^{1:k}$ represents the input spike trains over all neurons in X during the time interval $(0,t_k]$, $Y^{1:k}$ represents the spike train of neuron Y during $(0,t_k]$, $\hat{Y}^{1:k}$ represents the predicted spike train of neuron Y during $(0,t_k]$, and $\lambda_Y^{1:k}$ represents the firing probability time series of neuron Y during $(0,t_k]$.

A GLM includes a linear filter $ω$, an invertible nonlinearity $σ(·)$, and an exponential-family probabilistic spiking model $P(·)$ and can be expressed as
$\text{spike probability:}\quad \lambda_Y^k=\sigma\!\left(\sum_{i=1}^{N_X}\sum_{h=1}^{H}X_i^{k-h+1}\omega_{i,h}+\omega_0\right),\qquad \text{spiking:}\quad \hat{Y}^k\mid\lambda_Y^k\sim P(\lambda_Y^k),$
(2.1)
where $H$ is the temporal history length of the input spike trains, $P(\lambda)$ denotes the distribution of output spikes, and $\sigma$ is the mean function related to $P$ (McCullagh, 1984; Truccolo et al., 2005). From equation 2.1, we can see that a GLM makes the strong assumption that the output spike $Y^k$ depends on a linear combination of the input spike trains $X_{1:N_X}^{k-H+1:k}$.
A second-order GLM considers the nonlinear correlation among input neurons with the second-order Volterra kernel and can be written as
$\text{spike probability:}\quad \lambda_Y^k=\sigma\!\left(Q\!\left(X_{1:N_X}^{k-H+1:k}\right)\right),\qquad \text{spiking:}\quad \hat{Y}^k\mid\lambda_Y^k\sim P(\lambda_Y^k),$
(2.2)
where $Q(X)=X^{T}CX+b^{T}X+a$ denotes a quadratic function of $X$, $C$ is a symmetric matrix composed of the second-order coefficients, $b$ is a vector of the first-order coefficients, and $a$ represents the background firing level at Y. The input interaction of a second-order GLM is thus described by a second-order Volterra kernel that precedes the GLM.
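As a concrete illustration, the quadratic drive $Q(X)$ and the resulting firing probability of equation 2.2 can be sketched in a few lines of NumPy; the function names are ours, and the logistic $\sigma$ follows the choice made later in the text:

```python
import numpy as np

def quadratic_drive(x, C, b, a):
    """Evaluate Q(x) = x^T C x + b^T x + a for one flattened
    input-history vector x of length N_X * H."""
    return x @ C @ x + b @ x + a

def second_order_glm_prob(x, C, b, a):
    """Firing probability of the second-order GLM: sigma(Q(x))
    with a logistic sigma."""
    return 1.0 / (1.0 + np.exp(-quadratic_drive(x, C, b, a)))
```

With all coefficients zero, the drive is zero and the predicted firing probability is exactly 0.5, which is a quick sanity check on the implementation.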
The weight parameters of GLMs and second-order GLMs can be estimated by maximizing the log likelihood described in Truccolo et al. (2005), which can be expressed in terms of the firing probability defined in equations 2.1 and 2.2. Here, if we assume that the firing of the output spike follows a Bernoulli distribution and the nonlinear function is $\sigma(\cdot)=1/\{1+\exp[-(\cdot)]\}$ (McCullagh, 1984; Truccolo et al., 2005), we have
$L=\sum_{k=1}^{K}\left[Y^k\ln\lambda_Y^k+(1-Y^k)\ln(1-\lambda_Y^k)\right],$
(2.3)
where $K$ denotes the number of time bins over the whole observation interval. To avoid overfitting and acquire a sparse estimate of the GLM and second-order GLM, an L1-regularization term is introduced to the log-likelihood function defined in equation 2.3,
$L'=\sum_{k=1}^{K}\left[Y^k\ln\lambda_Y^k+(1-Y^k)\ln(1-\lambda_Y^k)\right]-\alpha_R\|U\|_1,$
(2.4)
where $U$ denotes the vector of free parameters in the GLM or second-order GLM (for GLMs, $U=\{[(\omega_{i,h})_{i=1}^{N_X}]_{h=1}^{H},\omega_0\}$; for second-order GLMs, $U=\{[(C_{m,n})_{m=2}^{N_X\cdot H}]_{n=1}^{m-1},b,a\}$), and $\|\cdot\|_1$ is the L1 norm (Tibshirani, 1996; Kelly, Smith, Kass, & Lee, 2010). Here, the L1 regularization coefficient $\alpha_R$ is explored within $[0,30]$, and the best results are selected for further comparison. The L1-regularized log likelihood is maximized using the iterative reweighted least squares (IRLS) algorithm (Holland & Welsch, 1977), which applies the Newton-Raphson method to the reweighted least squares problem and is efficient and robust. Accordingly, the vector of weight parameters $U$ in the Newton-Raphson method is updated by
$U_{\text{new}}=U_{\text{old}}+\left(\frac{\partial^2 L'}{\partial U\,\partial U}\right)^{-1}\frac{dL'}{dU},$
(2.5)
where $U_{\text{new}}$ is the vector of free parameters updated from $U_{\text{old}}$.
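The Bernoulli log likelihood of equations 2.3 and 2.4 can be computed directly from a spike train and its predicted firing probabilities. The sketch below is a minimal NumPy version; the function names are ours, and the clipping guard is a numerical-safety addition, not part of the model:

```python
import numpy as np

def bernoulli_log_likelihood(Y, lam, eps=1e-12):
    """Equation 2.3: sum_k [Y_k ln(lam_k) + (1 - Y_k) ln(1 - lam_k)].
    Y is the 0/1 spike train, lam the predicted firing probabilities."""
    lam = np.clip(lam, eps, 1.0 - eps)  # guard against log(0)
    return np.sum(Y * np.log(lam) + (1.0 - Y) * np.log(1.0 - lam))

def penalized_log_likelihood(Y, lam, U, alpha_R):
    """Equation 2.4: L' = L - alpha_R * ||U||_1, with U the vector of
    free parameters of the GLM or second-order GLM."""
    return bernoulli_log_likelihood(Y, lam) - alpha_R * np.sum(np.abs(U))
```

For a chance-level prediction ($\lambda=0.5$ in every bin), the log likelihood is $K\ln 0.5$ regardless of the spike pattern, which is a convenient baseline when comparing fitted models.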

### 2.2  The Staged Point-Process Model

Here, we add an extra layer of hidden neurons that allows arbitrary nonlinear interaction among input neurons. Let $Z_j^k$ be the spike of the $j$th hidden neuron during the time interval $(t_{k-1},t_k]$, $\lambda_{Z_j}^k$ be the value of the $j$th hidden unit, representing the firing probability of $Z_j^k$ during $(t_{k-1},t_k]$, and $N_Z$ be the number of hidden neurons in Z. Thus, $\lambda_{Z_{1:N_Z}}^{1:k}$ denotes the time series of all hidden units' values during the time interval $(0,t_k]$.

The staged point-process model can be described as a cascade structure containing two LN stages and an inhomogeneous Bernoulli or Poisson process. The cascaded LN stages approximate an ANN structure, which allows arbitrary nonlinear mapping between input and output spikes (Cybenko, 1989; Kůrková, 1991); here, specifically, it allows the nonlinear interaction between input spikes. Unlike an ANN, however, our model uses an inhomogeneous Bernoulli or Poisson process to generate spikes for output neuron predictions.

Based on the assumption of nonlinear interaction among input neurons, we probabilistically derive the staged point-process model (see the appendix), which can be specified by
$\text{first layer:}\quad \lambda_{Z_j}^k=\sigma\!\left(\sum_{i=1}^{N_X}\sum_{h=1}^{H}X_i^{k-h+1}\omega_{i,h}^{j}+\omega_0^{j}\right),\qquad \text{second layer:}\quad \lambda_Y^k=\sigma\!\left(\sum_{j=1}^{N_Z}\theta_j\lambda_{Z_j}^k+\theta_0\right),\qquad \text{spiking:}\quad \hat{Y}^k\mid\lambda_Y^k\sim P(\lambda_Y^k),$
(2.6)
where $\omega_{i,h}^{j}$ represents the relationship between the spike of the $i$th input neuron during $(t_{k-h},t_{k-h+1}]$ and the $j$th hidden unit, and $\theta$ is the second-stage linear filter pooling the effect of the hidden units on the firing of Y. The computational model in equation 2.6 forms the structure illustrated in Figure 1.
Figure 1:

General structure of the staged point-process model between a spike of neuron Y and the spike trains of input neurons X. The hidden units are denoted by $\lambda_Z$. The first stage allows the nonlinear interaction among X through $\omega_i$ and $\sigma(\cdot)$, where $\lambda_{Z_i}$ represents the firing probability of the $i$th hidden neuron $Z_i$. The second stage maps the hidden units' firing probabilities to generate the spike of Y by weighted ($\theta$) pooling of the hidden units' effects plus a static nonlinear transformation $\sigma(\cdot)$.

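A minimal forward pass of equation 2.6 for one time bin might look as follows; the array shapes, function names, and random example values are illustrative, and the final Bernoulli draw implements the spiking stage:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def staged_forward(X_hist, omega, omega0, theta, theta0):
    """One forward pass of equation 2.6 for a single time bin.
    X_hist : (N_X, H) spike history of the inputs
    omega  : (N_Z, N_X, H) first-stage filters omega_{i,h}^j
    omega0 : (N_Z,) first-stage biases
    theta  : (N_Z,) second-stage filter; theta0 is its bias.
    Returns (lam_Z, lam_Y): hidden-unit and output firing probabilities."""
    lam_Z = sigmoid(np.tensordot(omega, X_hist, axes=([1, 2], [0, 1])) + omega0)
    lam_Y = sigmoid(theta @ lam_Z + theta0)
    return lam_Z, lam_Y

# Toy example: 3 input neurons, history H = 5, N_Z = 2 hidden units.
rng = np.random.default_rng(0)
X_hist = rng.integers(0, 2, size=(3, 5)).astype(float)
omega = rng.normal(0.0, 0.5, size=(2, 3, 5))
lam_Z, lam_Y = staged_forward(X_hist, omega, np.zeros(2), np.array([1.0, -1.0]), 0.0)
Y_hat = rng.binomial(1, lam_Y)  # spiking stage: inhomogeneous Bernoulli draw
```

Because both stages use a logistic $\sigma$, all hidden-unit values and the output stay in $(0,1)$ and can be read as firing probabilities.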

### 2.3  MLE by Modified L-M Algorithm

In this section, we estimate weight parameters of the staged point-process model by maximizing the log-likelihood $L$ described in equation 2.3. Here, we define the vector of weight parameters in the staged point-process model as
$W=\left\{\left[\left(\left(\omega_{i,h}^{j}\right)_{i=1}^{N_X}\right)_{h=1}^{H},\ \omega_0^{j}\right]_{j=1}^{N_Z},\ \left(\theta_j\right)_{j=1}^{N_Z},\ \theta_0\right\}.$
(2.7)
As our model (see equation 2.6) contains two cascaded LN stages, the optimization surface of the log-likelihood function may not be concave (Boyd & Vandenberghe, 2004), which may lead to weak performance when applying the IRLS algorithm. An alternative choice is a gradient-based search algorithm, such as gradient ascent, whose weight vector is updated as
$W_{\text{new}}=W_{\text{old}}+\nu G(L),$
(2.8)
where $W_{\text{new}}$ is the weight vector updated from $W_{\text{old}}$, $G(L)$ is the gradient vector of the log likelihood defined in equation 2.3, and $\nu$ is the learning rate. However, the gradient ascent algorithm is relatively slow because it searches only along the direction of the gradient vector (Wilamowski & Yu, 2010). Therefore, in order to maximize the log likelihood of the staged point-process model, we propose a new algorithm based on a modification of the L-M algorithm (Levenberg, 1944; Marquardt, 1963). The L-M algorithm balances between the Newton-Raphson algorithm and the gradient ascent algorithm. It considers the curvature information of the optimization surface by using a Hessian matrix and thus takes a relatively direct route to the optimum. The algorithm has mainly been used in nonlinear least squares problems, where the optimization goal is a least squares error. The Hessian matrix of the least squares error can be approximated using a Jacobian matrix (Hagan & Menhaj, 1994), but the Hessian matrix of the log likelihood cannot be approximated in the same way (the residual of the Jacobian-matrix approximation is not negligible in practice). Therefore, we modify the L-M algorithm by directly calculating the Hessian matrix. The vector of weight parameters is updated by
$W_{\text{new}}=W_{\text{old}}+\left(H(L)+\mu I\right)^{-1}G(L),$
(2.9)
where $H(L)$ specifically denotes the Hessian matrix of the log likelihood defined in equation 2.3 and $I$ is the identity matrix. In addition, $\mu$ is a positive combination coefficient that enables the L-M algorithm to shift between the Newton-Raphson algorithm ($\mu\to 0$, $dW\approx H(L)^{-1}G(L)$) and the gradient ascent algorithm ($\mu\to\infty$, $W_{\text{new}}\approx W_{\text{old}}+G(L)/\mu$). By taking the derivative of the log likelihood, the gradient vector $G(L)$ in equations 2.8 and 2.9 can be given as
$G(L)=\left[\left(\left[\left(\frac{\partial L}{\partial\omega_{i,h}^{j}}\right)_{i=1}^{N_X}\right]_{h=1}^{H},\ \frac{\partial L}{\partial\omega_0^{j}}\right)_{j=1}^{N_Z},\ \left(\frac{\partial L}{\partial\theta_j}\right)_{j=1}^{N_Z},\ \frac{\partial L}{\partial\theta_0}\right],$ with
$\frac{\partial L}{\partial\theta_j}=(Y^{1:K}-\lambda_Y^{1:K})(\lambda_{Z_j}^{1:K})^{T},$
$\frac{\partial L}{\partial\theta_0}=\sum_{k=1}^{K}(Y^k-\lambda_Y^k),$
$\frac{\partial L}{\partial\omega_{i,h}^{j}}=\theta_j\left[(Y^{1:K}-\lambda_Y^{1:K})\circ\lambda_{Z_j}^{1:K}\circ(\vec{1}-\lambda_{Z_j}^{1:K})\right]\left(X_i^{H-h+1:K-h+1}\right)^{T},$
$\frac{\partial L}{\partial\omega_0^{j}}=\sum_{k=1}^{K}\theta_j(Y^k-\lambda_Y^k)\lambda_{Z_j}^k(1-\lambda_{Z_j}^k),$
(2.10)
where $\circ$ denotes the elementwise (Hadamard) product of matrices, $(\cdot)^{T}$ denotes the transpose of a matrix, and $\vec{1}$ denotes a vector of $K$ elements equal to 1. The Hessian matrix is given as
$H(L)=\begin{pmatrix}\dfrac{\partial^2 L}{\partial\omega\,\partial\omega} & \dfrac{\partial^2 L}{\partial\omega\,\partial\theta}\\[6pt] \dfrac{\partial^2 L}{\partial\theta\,\partial\omega} & \dfrac{\partial^2 L}{\partial\theta\,\partial\theta}\end{pmatrix},$
(2.11)
where each part in equation 2.11 can be written as
$\frac{\partial^2 L}{\partial\theta\,\partial\theta}=\begin{pmatrix}\tilde{\lambda}_{Z_{1:N_Z}}\,\mathrm{diag}\{(\vec{1}-\lambda_Y^{1:K})\circ\lambda_Y^{1:K}\}\,(\tilde{\lambda}_{Z_{1:N_Z}})^{T} & \tilde{\lambda}_{Z_{1:N_Z}}\,[(\vec{1}-\lambda_Y^{1:K})\circ\lambda_Y^{1:K}]^{T}\\[4pt] [(\vec{1}-\lambda_Y^{1:K})\circ\lambda_Y^{1:K}]\,(\tilde{\lambda}_{Z_{1:N_Z}})^{T} & \sum_{k=1}^{K}(1-\lambda_Y^k)\lambda_Y^k\end{pmatrix},$
$\frac{\partial^2 L}{\partial\omega_i\,\partial\omega_j}=\theta_i\theta_j\,\tilde{X}\,\mathrm{diag}\{(\vec{1}-\lambda_Y^{1:K})\circ\lambda_Y^{1:K}\circ\tilde{\lambda}_{Z_i}\circ(\vec{1}-\tilde{\lambda}_{Z_i})\circ\tilde{\lambda}_{Z_j}\circ(\vec{1}-\tilde{\lambda}_{Z_j})\}\,\tilde{X}^{T}\quad(i\neq j),$
$\frac{\partial^2 L}{\partial\omega_i\,\partial\omega_i}=\tilde{X}\,\mathrm{diag}\{\theta_i^{2}(\vec{1}-\lambda_Y^{1:K})\circ\lambda_Y^{1:K}\circ[\tilde{\lambda}_{Z_i}\circ(\vec{1}-\tilde{\lambda}_{Z_i})]^{2}+\theta_i(Y^{1:K}-\lambda_Y^{1:K})\circ(\vec{1}-2\tilde{\lambda}_{Z_i})\circ\tilde{\lambda}_{Z_i}\}\,\tilde{X}^{T}\quad(i=j),$
$\frac{\partial^2 L}{\partial\omega_i\,\partial\theta_j}=\theta_i\left[(\vec{1}-\lambda_Y^{1:K})\circ\lambda_Y^{1:K}\circ\tilde{\lambda}_{Z_i}\circ(\vec{1}-\tilde{\lambda}_{Z_i})\circ\tilde{\lambda}_{Z_j}\right]\tilde{X}^{T}\quad(i\neq j),$
$\frac{\partial^2 L}{\partial\omega_i\,\partial\theta_i}=\left[\theta_i(\vec{1}-\lambda_Y^{1:K})\circ\lambda_Y^{1:K}\circ(\tilde{\lambda}_{Z_i})^{2}\circ(\vec{1}-\tilde{\lambda}_{Z_i})+(Y^{1:K}-\lambda_Y^{1:K})\circ\tilde{\lambda}_{Z_i}\circ(\vec{1}-\tilde{\lambda}_{Z_i})\right]\tilde{X}^{T}\quad(i=j),$
$\frac{\partial^2 L}{\partial\omega_i\,\partial\theta_0}=\left(\frac{\partial^2 L}{\partial\theta_0\,\partial\omega_i}\right)^{T}=\theta_i\left[(\vec{1}-\lambda_Y^{1:K})\circ\lambda_Y^{1:K}\circ\tilde{\lambda}_{Z_i}\circ(\vec{1}-\tilde{\lambda}_{Z_i})\right]\tilde{X}^{T},$
(2.12)
where $\mathrm{diag}\{\cdot\}$ denotes the diagonal matrix whose diagonal entries, starting from the upper left, are the values of a vector, $\omega_i=\{[(\omega_{j_1,h}^{i})_{j_1=1}^{N_X}]_{h=1}^{H},\omega_0^{i}\}$ is the first-stage weight vector that pools the effect of the inputs into the $i$th hidden unit, $\tilde{X}=\{[(X_i^{H-h+1:K-h+1})_{h=1}^{H}]_{i=1}^{N_X},\vec{1}\}$ represents the spike train ensemble corresponding to $\lambda_Y^{1:K}$, and $\tilde{\lambda}_{Z_i}$ represents $\lambda_{Z_i}^{1:K}$.
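As a sanity check on the second-stage terms of the gradient in equation 2.10, the sketch below compares the analytical derivatives $\partial L/\partial\theta_j=\sum_k(Y^k-\lambda_Y^k)\lambda_{Z_j}^k$ and $\partial L/\partial\theta_0=\sum_k(Y^k-\lambda_Y^k)$ against a central finite difference, holding the hidden activations fixed; all names and the random test data are illustrative:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def second_stage_gradient(Y, lam_Z, theta, theta0):
    """Second-stage terms of equation 2.10.
    lam_Z has shape (N_Z, K); Y has shape (K,)."""
    lam_Y = sigmoid(theta @ lam_Z + theta0)
    resid = Y - lam_Y
    return lam_Z @ resid, resid.sum()  # dL/dtheta, dL/dtheta_0

# Finite-difference check of dL/dtheta (hidden activations held fixed).
rng = np.random.default_rng(1)
lam_Z = rng.uniform(0.1, 0.9, (2, 50))
Y = rng.integers(0, 2, 50).astype(float)
theta, theta0 = np.array([0.3, -0.2]), 0.1

def L(th):
    lam_Y = sigmoid(th @ lam_Z + theta0)
    return np.sum(Y * np.log(lam_Y) + (1 - Y) * np.log(1 - lam_Y))

g_theta, g_theta0 = second_stage_gradient(Y, lam_Z, theta, theta0)
eps = 1e-6
num = np.array([(L(theta + eps * np.eye(2)[j]) - L(theta - eps * np.eye(2)[j])) / (2 * eps)
                for j in range(2)])
```

The analytical and numerical gradients agree to roughly the finite-difference precision, which is the standard way to validate hand-derived gradients before trusting a second-order optimizer with them.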

The modified L-M algorithm updates weight parameters not only by calculating the gradient vector and Hessian matrix but also by adjusting the combination coefficient $\mu$. When an update does not increase the log likelihood, $\mu$ is multiplied by a factor, moving the algorithm closer to gradient ascent; otherwise, $\mu$ is decreased and the weight update is accepted. The algorithm stops when the change in log likelihood stays below a threshold $\delta$ for several (seven in this letter) successive steps. The initial value of $\mu$ is set to 0.01 and $\delta=0.001$ (Hagan & Menhaj, 1994). In practice, it is important to try several different initial settings of $\mu$ and $\delta$ to obtain a better optimum. The optimum is obtained by maximizing the log likelihood on the training data; in this letter, we calculate the goodness-of-fit on the validation data to select the optimal $\mu$ and $\delta$.
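The $\mu$-adaptation loop can be sketched as follows, restricted here to the second-stage weights so the exact gradient and Hessian stay compact. Treating the solved matrix as the negated (positive-semidefinite) Hessian plus damping is our reading of equation 2.9, so that the step is an ascent direction; the constants follow the text ($\mu=0.01$ initially, a multiplicative factor, $\delta=0.001$):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def lm_fit_second_stage(Y, lam_Z, n_iter=50, mu=0.01, factor=10.0, delta=1e-3):
    """Modified L-M ascent on [theta, theta_0] only, with the exact
    Bernoulli log-likelihood gradient and (negated) Hessian."""
    NZ, K = lam_Z.shape
    Z = np.vstack([lam_Z, np.ones(K)])      # append a bias row for theta_0
    W = np.zeros(NZ + 1)

    def loglik(W):
        lam_Y = np.clip(sigmoid(W @ Z), 1e-12, 1 - 1e-12)
        return np.sum(Y * np.log(lam_Y) + (1 - Y) * np.log(1 - lam_Y))

    L_old = loglik(W)
    for _ in range(n_iter):
        lam_Y = sigmoid(W @ Z)
        G = Z @ (Y - lam_Y)                   # gradient (cf. equation 2.10)
        H = (Z * (lam_Y * (1 - lam_Y))) @ Z.T  # negated Hessian (cf. equation 2.12)
        step = np.linalg.solve(H + mu * np.eye(NZ + 1), G)
        L_new = loglik(W + step)
        if L_new > L_old:                     # accept step: lean toward Newton
            W, mu = W + step, mu / factor
            if L_new - L_old < delta:
                L_old = L_new
                break
            L_old = L_new
        else:                                 # reject step: lean toward gradient ascent
            mu *= factor
    return W, L_old
```

Because only likelihood-increasing steps are accepted, the fitted log likelihood can never fall below its starting value, and on data generated from a known second-stage weight vector it rises well above the chance-level baseline $K\ln 0.5$.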

The estimation of the staged point-process model is equivalent to the backpropagation training of a two-layered ANN, but the cost function is switched from the traditional least squares error to the cross-entropy error (the point-process log likelihood), which deals with 0-1 point-process observations. Due to this difference, the Hessian matrix here is directly calculated from the log likelihood of Bernoulli spike events, without approximation by the Jacobian matrix.

### 2.4  Assessing Goodness-of-Fit

In this letter, goodness-of-fit is assessed by the DTR-KS test (Haslinger et al., 2010), where the maximal distance (denoted by $D$) between the DTR-KS curve and the 45$^\circ$ line is calculated for model evaluation. To acquire stable results, the statistic $D$ in this letter is averaged over $n$ repeated calculations ($n=10$). In addition, since the 95% confidence bound is influenced by the actual spike count of the output neuron (Brown, Barbieri, Ventura, Kass, & Frank, 2002; Haslinger et al., 2010), statistics $D$ with the same value may reflect different goodness-of-fit when evaluating different output spike trains. Therefore, to evaluate the general spike prediction performance across different output neurons, we compare the statistic $D$ to the 95% confidence bound and define the distance-bound ratio (DBR), which is written as
$\mathrm{DBR}=\frac{D}{1.36/\sqrt{N_{\text{spike}}}},$
(2.13)
where $N_{\text{spike}}$ denotes the actual spike count of the given output spike train.
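Equation 2.13 amounts to a single line of code; a minimal sketch (the function name is ours, and the $1.36/\sqrt{N}$ term is the standard 95% KS confidence bound):

```python
import numpy as np

def distance_bound_ratio(D, n_spike):
    """Equation 2.13: KS distance D divided by the 95% confidence
    bound 1.36 / sqrt(N_spike). A DBR below 1 means the DTR-KS curve
    stays within the 95% bound."""
    return D / (1.36 / np.sqrt(n_spike))
```

For example, with 100 spikes the bound is $1.36/\sqrt{100}=0.136$, so a KS distance of exactly 0.136 gives a DBR of 1.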

### 2.5  Data Collection and Preprocessing

To obtain real data, we simultaneously recorded neural spike trains from the PMd and M1 cortical areas of a rhesus macaque monkey in a behavioral experiment. The monkey was trained to perform a four-direction center-out task by controlling a joystick. In each trial, a directional target randomly appeared at one of four locations on a monitor, and the macaque pulled the joystick to move the cursor to match the target displayed on the screen 100 cm in front of it. Raw neural data were recorded by a Cerebus Data Acquisition System (Blackrock Microsystems, Salt Lake City, UT, U.S.A.) with a sampling rate of 30 kHz. Spike trains from a total of 127 neurons in the macaque's PMd cortex and 102 neurons in the M1 cortex were recorded for 636.4 seconds. The raw data were then filtered by a bandpass filter with a passband from 250 Hz to 7.5 kHz. Offline spike sorting was performed on the filtered neural data using the software Offline Sorter (Plexon, Dallas, TX, U.S.A.; Nicolelis et al., 2003) to restore the spike timings for further processing. The bin size of the spike train was set to 10 msec, such that time intervals having more than one spike constitute 1.29% (PMd) and 1.05% (M1) of all time slots containing spikes (Wang et al., 2009). The animal-handling procedures were approved by the Animal Care Committee at Zhejiang University, China, abiding strictly by the Guide for the Care and Use of Laboratory Animals (China Ministry of Health). Further details of this experiment can be found in Xu et al. (2013) and Wang et al. (2015).

Since not all recorded neural activities are related to the movement task (Xu et al., 2013), we preprocessed the spike trains to identify neurons that are more correlated with the behavior, as well as between the two cortical areas. For each M1 spike train, the top 10 PMd neurons having maximum mutual information (Nirenberg & Latham, 2003) with that M1 spike train were selected for the prediction model. The selection helps reduce the parameter space for a second-order GLM, where the number of parameters grows quadratically with input dimensionality. After preprocessing, spike trains from 54 M1 neurons and their corresponding top 10 PMd neurons were used for our study.
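A plug-in estimate of the mutual information between two binned spike trains, and the top-10 selection described above, could be sketched as follows; the histogram-based estimator is a simple stand-in for the method of Nirenberg and Latham (2003), and all function names are ours:

```python
import numpy as np

def binary_mutual_information(x, y, eps=1e-12):
    """Mutual information (bits) between two binary spike-count
    sequences, estimated from the empirical joint histogram."""
    x, y = np.asarray(x, int), np.asarray(y, int)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))   # joint probability
            p_a, p_b = np.mean(x == a), np.mean(y == b)
            if p_ab > eps:
                mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return mi

def top_k_inputs(candidates, target, k=10):
    """Rank candidate input spike trains by MI with the target train
    and return the indices of the top k."""
    scores = [binary_mutual_information(c, target) for c in candidates]
    return np.argsort(scores)[::-1][:k]
```

A train has exactly 1 bit of information about itself when it spikes in half of the bins, and independent trains score near zero, which makes the ranking easy to spot-check.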

## 3  Results

The staged point-process model is evaluated first on synthetic spike data and then applied to the real neural data recorded from the PMd and M1 cortex of a monkey performing center-out tasks. The overall performance is compared with GLMs and second-order GLMs. Both the simulated and real data are divided into three parts: 60% for training, 10% for validation, and 30% for testing. The training stops when the log likelihood is maximized on the validation set. In both cases, the model performances on the test data are evaluated by the DBR defined in equation 2.13 and the correlation between the prediction and real spike train. Besides this, the performance using the modified L-M algorithm is evaluated for both the synthetic data and real data and compared with the gradient ascent algorithm and Newton-Raphson algorithm.

### 3.1  Simulation Study on Synthetic Spike Data

We start by synthesizing input spike trains for two interacting neurons and transform the inputs into one output spike sequence using a linear, a quadratic, and a nonlinear function, respectively. This implies that the generation of the output depends on different levels of nonlinear interaction between the input neurons.

The spike probability of the input spike trains is designed to be a sinusoidal function $\lambda_X(t)=\alpha_X\sin(\beta_X t+\gamma_X)+\delta_X$, where $\alpha_X$ is the fluctuating amplitude determining the distance between the maximum and minimum firing probability, $\beta_X$ is the fluctuating frequency of the firing probability, $\gamma_X$ determines the initial phase of the fluctuation, and $\delta_X$ determines the background firing probability. For each time bin, a spike is generated as a Bernoulli random variable with the probability determined by $\lambda_X(t)$, where the size of the time bins is set to 10 msec (Wang et al., 2009). Here, the background firing probabilities for both input neurons are set to 0.30, the parameter $\alpha_X$ is set to 0.30, and the parameter $\beta_X$ is set to 0.01. Specifically, $\gamma_X$ is set to 1.05 for input neuron 1 ($\gamma_{X1}$) and $-1.05$ for input neuron 2 ($\gamma_{X2}$). Here, the spike probabilities of the two input neurons ($\lambda_{X1}$ and $\lambda_{X2}$) share the same fluctuating frequency $\beta_X$ but different initial phases $\gamma_X$. This could happen when both neurons are driven by the same external stimulus or the same behavior. An example of synthetic input spike trains is shown in Figure 2.

Figure 2:

Two simulated input spike trains for each input neuron. Only one segment of the input spike train is presented here. Panels A.1 and B.1 illustrate the synthetic spike train of input neurons 1 and 2. Panels A.2 and B.2 illustrate the synthetic firing probabilities of the two input neurons.

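The input generation described above can be sketched as follows. Whether $t$ counts bins or milliseconds is an assumption on our part, the clipping keeps $\lambda_X$ a valid probability, and the default parameter values follow the text:

```python
import numpy as np

def synthetic_input_spikes(K, alpha=0.30, beta=0.01, gamma=1.05, delta=0.30, seed=0):
    """Generate one synthetic input spike train as in section 3.1:
    lambda_X(t) = alpha*sin(beta*t + gamma) + delta, with an independent
    Bernoulli spike draw in each 10 ms bin."""
    rng = np.random.default_rng(seed)
    t = np.arange(K)                      # bin index (time unit is an assumption)
    lam = alpha * np.sin(beta * t + gamma) + delta
    lam = np.clip(lam, 0.0, 1.0)          # keep it a valid probability
    spikes = rng.binomial(1, lam)
    return lam, spikes
```

Calling the function with `gamma=1.05` and `gamma=-1.05` (and different seeds) gives the two phase-shifted input trains used in the simulation.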

We design three different bivariate nonlinear functions $\lambda_Y(t)=g(\lambda_{X1}(t),\lambda_{X2}(t))$ to generate the output spike trains from the inputs and test the expressive power of different models in approximating different levels of nonlinearity. Note that the firing probability is a nonnegative value, and $\lambda_Y(t)$ is set to 0 if the transformed value is negative. The first transfer function is a linear function written as $\lambda_{Y1}(t)=g_1(\lambda_{X1},\lambda_{X2})=\alpha_1\lambda_{X1}+\beta_1\lambda_{X2}+\gamma_1$ (illustrated in Figure 3A.1), where $\alpha_1$ and $\beta_1$ modulate the linear dependence between the input and output spikes and $\gamma_1$ is the background level of this transformation. Here, in order to restrict the maximum and minimum firing probability of the output spike, we set $\alpha_1$ and $\beta_1$ to 0.83 and $\gamma_1$ to $-0.25$. The linearly transformed firing probability is illustrated in Figure 3A.3. One realization of the spike train generated according to the firing probability is illustrated in Figure 3A.2. We can see that both the raster and the firing probability of the output spike train are similar to those of the input spike trains due to the linear dependence between input and output spikes.

Figure 3:

Synthetic output spike trains with different dependencies. Only one segment of firing probabilities and the corresponding realization of the spike trains are presented here. Left-hand plots A.1, B.1, and C.1 represent the linear, quadratic, and nonlinear transformations defined in section 3.1, respectively, where the color represents the transformed firing probability of the output neuron. Right-hand curves A.3, B.3, and C.3 show the output firing probability transformed for three different dependencies. Right-hand raster plots A.2, B.2, and C.2 represent synthetic output spike trains correspondingly.

The second transfer function is a quadratic function written as $\lambda_{Y_2}(t)=g_2(\lambda_{X_1},\lambda_{X_2})=\alpha_2\lambda_{X_1}^2+\beta_2\lambda_{X_2}^2+\gamma_2\lambda_{X_1}\lambda_{X_2}+\delta_2\lambda_{X_1}+\varepsilon_2\lambda_{X_2}+\zeta_2$ (illustrated in Figure 3B.1), where $\alpha_2$ and $\beta_2$ represent the nonlinearity within each input neuron, $\gamma_2$ represents the nonlinear interaction between input neurons, $\delta_2$ and $\varepsilon_2$ represent the linear dependence between the input and output spikes, and $\zeta_2$ is the background level of this quadratic transformation. Here we set $\alpha_2$ to be $-5.56$, $\beta_2$ to be $-5.56$, $\gamma_2$ to be $-11.11$, $\delta_2$ to be 6.67, $\varepsilon_2$ to be 6.67, and $\zeta_2$ to be $-1.50$. The quadratically transformed spike probability and one realization of the corresponding spike trains are illustrated in Figures 3B.3 and 3B.2, respectively.

The third transfer function is a sinusoidal function written as $\lambda_{Y_3}(t)=g_3(\lambda_{X_1},\lambda_{X_2})=\alpha_3\sin(\beta_3\lambda_{X_1}+\gamma_3\lambda_{X_2}+\delta_3)+\varepsilon_3$ (illustrated in Figure 3C.1). Here the output spike train is generated using a nonlinear transformation function on two input neurons. If we expand the transfer function using a Taylor series, this nonlinear transfer function can be equivalently expressed by combining the different orders of interaction between the two input neurons; for this sinusoidal case, the nonlinearity clearly cannot be fully described using information up to the third-order Volterra kernels. Here, the parameter $\alpha_3$ determines the amplitude of the fluctuation, $\beta_3$ and $\gamma_3$ determine the frequency of interactions that arise from the different input signals, $\delta_3$ determines the initial phase, and $\varepsilon_3$ is the background level of the transformation. We set $\alpha_3$ to be 0.25, $\beta_3$ to be 15.71, $\gamma_3$ to be 15.71, $\delta_3$ to be 3.14, and $\varepsilon_3$ to be 0.25. The nonlinearly transformed firing probability and one realization of the corresponding spike train are illustrated in Figures 3C.3 and 3C.2. Compared to the input firing probabilities (see Figures 2A.2 and 2B.2), we can see that the output firing probability fluctuates with more complex patterns due to the higher degree of nonlinearity.
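With the parameter values listed above, the three transfer functions can be sketched as follows; rectification at zero implements the rule that a negative transformed value gives a firing probability of 0:

```python
import numpy as np

def g_linear(l1, l2):
    """Linear transfer: 0.83*l1 + 0.83*l2 - 0.25, rectified at zero."""
    return np.clip(0.83 * l1 + 0.83 * l2 - 0.25, 0.0, None)

def g_quadratic(l1, l2):
    """Quadratic transfer with within-neuron and cross-neuron second-order terms."""
    return np.clip(-5.56 * l1**2 - 5.56 * l2**2 - 11.11 * l1 * l2
                   + 6.67 * l1 + 6.67 * l2 - 1.50, 0.0, None)

def g_sinusoidal(l1, l2):
    """Sinusoidal transfer; not fully expressible with low-order Volterra kernels."""
    return np.clip(0.25 * np.sin(15.71 * l1 + 15.71 * l2 + 3.14) + 0.25, 0.0, None)
```

At the background input probability of 0.30 for both inputs, all three functions return values inside $[0, 1]$, consistent with their use as Bernoulli spike probabilities.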

We generate output spike trains (see Figures 3A.2, 3B.2, and 3C.2) for the three types of dependence (see Figures 3A.1, 3B.1, and 3C.1). The temporal length of each spike train is 200 sec. The temporal history ($H$) of the input spike train is explored ranging from 10 msec to 1 sec for all models; the optimal input spike train history $H$ is set to be 500 msec. The number of hidden units ($N_Z$) is explored ranging from 2 to 100 for the staged point-process model, and the optimal number of hidden units $N_Z$ is set to be 15. The maximum number of iterative steps is set to be 1000 for the modified L-M algorithm in the staged point-process model. For this model, the first-stage weight parameters $\theta$ and the second-stage weight parameters $\omega$ are initialized as uniformly distributed random variables in $[-\xi_1 N_Z, \xi_1 N_Z]$ and $[-\xi_2 H N_X, \xi_2 H N_X]$, respectively, where $\xi_1$ and $\xi_2$ denote the initialized weight variances of $\theta$ and $\omega$. Similarly, weights of the GLM and second-order GLM are initialized as uniformly distributed random variables in $[-\xi N_{\mathrm{in}}, \xi N_{\mathrm{in}}]$, where $\xi$ denotes the initialized weight variance. For the GLM, $N_{\mathrm{in}}$ denotes the dimension of the inputs, and for the second-order GLM, $N_{\mathrm{in}}$ denotes the number of input spike multiplication pairs. Here, $\xi_1$, $\xi_2$, and $\xi$ are explored within $[0.01, 10]$ during training.

To reduce the randomness of the spike generation, the synthetic input and output spike trains for both training and validation are generated according to the firing probability 10 times. The distance-bound ratios (DBRs) of the different models in approximating the three output transformations are shown in Table 1, where the number of hidden units ($N_Z$) for the staged point-process model is set to be 15. We can see that the averaged DBR of the staged point-process model is below 1 in predicting all types of output spike trains. These results indicate that the staged point-process model is able to predict output spike trains with different levels of nonlinear dependence on the interactive input spikes. One-tailed paired Student's $t$-tests are performed to compare the model performances on spike prediction. When predicting linearly transformed output spike trains, no significant difference in performance is found. When predicting quadratically transformed output spike trains, the performance of the GLM is significantly lower than those of the staged point-process model ($p<10^{-6}$) and the second-order GLM ($p<10^{-6}$). And when predicting nonlinearly transformed output spike trains, the performance of the staged point-process model is significantly higher than those of the GLM ($p<10^{-8}$) and second-order GLM ($p<10^{-5}$).

Table 1:
Averaged DBR of Three Models on Output Spike Prediction.
| Model | GLM | Second-Order GLM | Staged Point-Process Model |
| --- | --- | --- | --- |
| Linear dependence | $0.76±0.09$ | $0.76±0.17$ | $0.68±0.07$ |
| Quadratic dependence | $1.78±0.05$ | $1.03±0.08$ | $0.96±0.09$ |
| Nonlinear dependence | $1.90±0.06$ | $1.59±0.11$ | $0.92±0.14$ |
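The one-tailed paired comparison described above can be reproduced with SciPy. The per-trial DBR arrays below are hypothetical placeholders (the paper reports only means and standard deviations over the 10 generations, not individual trial values):

```python
import numpy as np
from scipy import stats

# Hypothetical per-trial DBRs over the 10 repeated spike generations
# (illustrative values only, not the paper's actual trial data).
dbr_glm    = np.array([1.92, 1.88, 1.95, 1.85, 1.91, 1.89, 1.97, 1.86, 1.90, 1.87])
dbr_staged = np.array([0.95, 0.88, 1.02, 0.81, 0.93, 0.90, 1.10, 0.79, 0.92, 0.86])

# One-tailed paired Student's t-test: is the staged model's DBR lower than the GLM's?
t_stat, p_two = stats.ttest_rel(dbr_staged, dbr_glm)
p_one = p_two / 2 if t_stat < 0 else 1 - p_two / 2
```

A lower DBR is better, so a negative $t$ statistic with a small one-tailed $p$ favors the staged model.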

To give a straightforward illustration of the learning capability of the different models, we present the learned transformation across different initial phase differences ($\Delta\gamma_X=\gamma_{X_1}-\gamma_{X_2}$) between the two input neurons. By training the three models on all phase-difference cases and interpolating the output predictions with respect to the two input firing probabilities $(\lambda_{X_1},\lambda_{X_2})$, we obtain the learned transformation patterns. The DTR-KS plot and prediction results using one phase difference as an example (columns 1 to 3), together with the learned transformations across all phase differences (column 4), are shown in Figures 4, 5, and 6. In addition, for the convenience of comparing the predicted and synthetic firing probabilities, in column 3 of Figures 4 to 6, the prediction results are smoothed using a 500 msec normalized gaussian kernel, with the smoothed curves illustrated in the same plot.

Figure 4:

Prediction results of linearly transformed spike trains. Only one segment of firing probabilities and spike trains is presented when the initial phase difference of two input neurons is $π/3$. (A.1) The color bar represents the level of the output firing probability. (A.2) The linearly transformed output firing probabilities. (A.3) The raster plot is the actual linearly transformed output spike train. (A.4) The actual linear transformation. (B.1, C.1, D.1) The DTR-KS plots of the GLM, second-order GLM, and staged point-process model, respectively. (B.2, C.2, D.2) Raster plots of the predicted spike trains of the GLM, second-order GLM, and staged point-process model, respectively. (B.3, C.3, D.3) Firing probabilities of the GLM, second-order GLM, and staged point-process model, respectively. (B.4, C.4, D.4) The learned transformations across all phase differences using the GLM, second-order GLM, and staged point-process model, respectively.

Figure 5:

Prediction results of quadratically transformed spike trains. Only one segment of the firing probabilities and spike trains is presented here when the initial phase difference of two input neurons is $π/3$. (A.1) The color bar representing the level of the output firing probability. (A.2) The quadratically transformed output firing probabilities. (A.3) The actual quadratically transformed output spike train in the raster plot. (A.4) The actual quadratic transformation. (B.1, C.1, D.1) The DTR-KS plots of the GLM, second-order GLM, and staged point-process model, respectively. (B.2, C.2, D.2) Raster plots of the predicted spike trains of the GLM, second-order GLM, and staged point-process model, respectively. (B.3, C.3, D.3) Firing probabilities of the GLM, second-order GLM, and staged point-process model, respectively. (B.4, C.4, D.4) The learned transformations across all phase differences using the GLM, second-order GLM, and staged point-process model, respectively.

Figure 6:

Prediction results of nonlinearly transformed spike trains. Only one segment of the firing probabilities and spike trains when the initial phase difference of two input neurons is $π/3$ is presented here. (A.1) The color bar representing the level of the output firing probability. (A.2) The nonlinearly transformed output firing probabilities. (A.3) The actual nonlinearly transformed output spike train in the raster plot. (A.4) The actual nonlinear transformation. (B.1, C.1, D.1) The DTR-KS plots of the GLM, second-order GLM, and staged point-process model, respectively. (B.2, C.2, D.2) Raster plots of the predicted spike trains of the GLM, second-order GLM, and staged point-process model, respectively. (B.3, C.3, D.3) Firing probabilities of the GLM, second-order GLM, and staged point-process model, respectively. (B.4, C.4, D.4) The learned transformations across all phase differences using the GLM, second-order GLM, and staged point-process model, respectively.

Figure 4 illustrates the prediction results of the linearly transformed output spike train. Here, the optimal regularization coefficient $\alpha_R$ for the second-order GLM is 0.003 and that for the GLM is 0. The first row shows the ground truth of the output spike train and the input-output dependence. Rows 2, 3, and 4 represent the outputs of the GLM, second-order GLM, and staged point-process model, respectively. We can see that the raster plots (see Figures 4B.2, 4C.2, and 4D.2), the predicted firing probabilities (see Figures 4B.3, 4C.3, and 4D.3), and the learned transformations (see Figures 4B.4, 4C.4, and 4D.4) of the different models are similar to each other and match well with the synthetic output spike train (see Figure 4A.3), the transformed output firing probabilities (see Figure 4A.2), and the original transformation (see Figure 4A.4). All the points in the KS plots of these three models (see Figures 4B.1, 4C.1, and 4D.1) lie within the 95% confidence bounds. These results indicate that all three models successfully predict the linearly transformed output spike train with performances that are close to one another.

Figure 5 illustrates the prediction results of the quadratically transformed output spike train. Here, the optimal regularization coefficient $\alpha_R$ is 0 for both the second-order GLM and the GLM. The GLM fails to match the quadratically transformed output spike train (see Figures 5B.2, 5B.3, and 5B.4), while the second-order GLM and staged point-process model match it well, as shown by their spike train predictions (see Figures 5C.2 and 5D.2), KS plots (see Figures 5C.1 and 5D.1), and learned transformations (see Figures 5C.4 and 5D.4). This indicates that the second-order GLM is capable of capturing the quadratic transformation, as is our staged point-process model.

Figure 6 illustrates the prediction results of the nonlinearly transformed output spike train. Here, the optimal regularization coefficient $\alpha_R$ is 0 for both the second-order GLM and the GLM. Only our staged point-process model is able to predict the output firing activity (see Figures 6D.2 and 6D.3) and capture the nonlinear transformation (see Figure 6D.4), while both the GLM and second-order GLM fail, with their KS plots outside the 95% confidence bounds (see Figures 6B.1 and 6C.1) and their learned transformation patterns (see Figures 6B.4 and 6C.4) very different from the ground truth (see Figure 6A.4). These results indicate that our staged point-process model is capable of modeling neural interactions when the output neurons depend on the interactive input neurons with high-order nonlinearity.

Here we validate the modified L-M algorithm during training. We apply the staged point-process model together with the modified L-M algorithm, gradient ascent algorithm, and Newton-Raphson algorithm, respectively, to maximize the log likelihood. We demonstrate the case illustrated in Figure 3C.2 due to its high degree of nonlinearity. The maximum number of iteration steps is 1000. The averaged convergence curve over 10 trials is shown in Figure 7. From Figure 7, we can see that the convergence of the modified L-M algorithm is faster than that of the gradient ascent algorithm. The Newton-Raphson algorithm converges fast initially but stops at less optimal results; this is because the optimization surface is not concave, and the existence of local maxima limits the performance of the algorithm. The modified L-M algorithm performs best, with fast convergence and a shift to better optima, because it balances the behavior of the above two algorithms. Overall, the modified L-M algorithm needs 428 steps on average to converge, while the gradient ascent algorithm does not converge within its 1000 iteration steps. The final DBR on the test set of the modified L-M algorithm is 0.92 on average, which is 39.4% lower than that of the gradient ascent algorithm and 51.3% lower than that of the Newton-Raphson algorithm. Besides this, the convergence curves of the modified L-M algorithm differ noticeably among the 10 trials; this is because the performance shifts controlled by the combination coefficient $\mu$ differ among the trials.
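A minimal sketch of the damped-Newton ascent loop underlying the modified L-M algorithm, assuming the standard L-M damping rule (shrink $\mu$ after an improving step, grow it otherwise); the paper's exact update schedule for $\mu$ may differ:

```python
import numpy as np

def modified_lm_step(theta, grad_fn, hess_fn, mu):
    """One damped-Newton ascent step on the log likelihood.

    mu blends Newton-Raphson (mu -> 0) with gradient ascent (mu large);
    grad_fn/hess_fn return the gradient and Hessian of the log likelihood.
    """
    g = grad_fn(theta)
    H = hess_fn(theta)
    # Damped Newton direction for *maximization*: solve (mu*I - H) d = g
    d = np.linalg.solve(mu * np.eye(len(theta)) - H, g)
    return theta + d

def maximize_loglik(theta0, loglik_fn, grad_fn, hess_fn,
                    mu=1.0, max_steps=1000, tol=1e-8):
    """L-M loop: accept only improving steps, adapting mu as it goes."""
    theta, ll = theta0, loglik_fn(theta0)
    for _ in range(max_steps):
        cand = modified_lm_step(theta, grad_fn, hess_fn, mu)
        ll_cand = loglik_fn(cand)
        if ll_cand > ll:
            improved = ll_cand - ll
            theta, ll = cand, ll_cand
            mu *= 0.5          # trust the Newton direction more
            if improved < tol:
                break
        else:
            mu *= 2.0          # back off toward gradient ascent
    return theta, ll
```

On a concave surface this reduces to near-Newton steps as $\mu$ shrinks; on nonconcave regions the growing $\mu$ keeps the damped matrix invertible and the step small, which is the balance the text describes.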

Figure 7:

The convergence of the staged point-process model. Left, center, and right panels illustrate the change of DBR on the training, validation, and test set, respectively. Red, blue, and green curves and ribbons represent the convergence of the Newton-Raphson algorithm, gradient ascent algorithm, and modified L-M algorithm, respectively.

From the simulation study, we can see that the staged point-process model demonstrates the best capability to capture the linear, quadratic, and nonlinear dependences between the input and output spike trains. The model works well with the modified L-M optimization algorithm and outperforms the GLM and second-order GLM. Moreover, compared with the Newton-Raphson algorithm and gradient ascent algorithm, the modified L-M algorithm learns faster and converges to better performance.

### 3.2  Application on the Prediction from PMd to M1

In this section, we test the performance of the staged point-process model trained with the modified L-M algorithm on real neural data recorded from the PMd and M1 cortices of a macaque. Performance is evaluated on the test data over several segments, where each segment contains 3000 samples with 50% temporal overlap. The performance averaged over segments is compared among the GLM, second-order GLM, and staged point-process model.
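The overlapping-segment evaluation can be sketched as follows (segment boundaries are in samples; how a trailing partial segment is handled is an assumption):

```python
def overlapping_segments(n_samples, seg_len=3000, overlap=0.5):
    """Start/end indices of evaluation segments with fractional overlap.

    With seg_len=3000 and overlap=0.5, consecutive segments start 1500
    samples apart. Trailing samples that do not fill a whole segment are
    dropped (an assumption; the paper does not specify).
    """
    step = int(seg_len * (1.0 - overlap))
    return [(s, s + seg_len) for s in range(0, n_samples - seg_len + 1, step)]
```

Per-segment metrics (such as the DBR) would then be computed on each `(start, end)` slice and averaged.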

#### 3.2.1  Predicting Spike Trains from PMd to M1 with Different Models

In this section, we test our proposed model in predicting M1 spike trains from the selected PMd neurons and compare its performance with the GLM and second-order GLM. The temporal history ($H$) of the PMd spike train is explored ranging from 10 msec to 100 msec, as well as the number of hidden units ($N_Z$), ranging from 2 to 40 (see Figure 8). Note that both the GLM and the second-order GLM are L1-regularized. From Figure 8, we can see that the optimal length of the input spike train history $H$ is 50 msec, where the performance of the second-order GLM is the best. The optimal number of hidden units $N_Z$ is set to be 20 considering the trade-off between performance and computational burden. The maximum number of iterative steps is set to be 2000 for the modified L-M algorithm. Initialization of the weight parameters is set as in section 3.1, where the initialized weight variances $\xi_1$, $\xi_2$, and $\xi$ are explored within $[0.01, 10]$ during training.

Figure 8:

Parameter exploration in M1 spike prediction. (A) The distance-bound ratio (DBR) of the staged point-process model on the test data versus the number of hidden units ($N_Z$), where the length of the PMd history ($H$) is set to be 50 msec. (B) The DBR of the staged point-process model (green line), GLM (blue line), and second-order GLM (orange line) on the test data versus the temporal length of the PMd history ($H$), where $N_Z$ for the staged point-process model is set to be 20.

The predicted M1 firing probabilities of the GLM, second-order GLM, and staged point-process model for the 19th and 28th M1 neurons are illustrated in Figure 9. The predicted firing probability, as well as the actual spike observation of the output, is smoothed by convolving with a normalized gaussian kernel (mean 0, standard deviation 1). Here, the optimal size of this gaussian kernel is explored within the range from 0.1 sec to 10 sec, and the correlation coefficients (CCs) between the smoothed M1 spike train and the smoothed prediction results under different kernel sizes are illustrated in Figure 9 (see panels 9A.3 and 9B.3). From these panels, we can see that the CCs of the three different models are relatively better under a kernel size of 1.5 sec: too large a kernel size smooths out the detail of the prediction, and too small a size leads to poor performance of the GLM and second-order GLM. The predicted firing probabilities smoothed using the 1.5-sec kernel are illustrated in Figures 9A.2 and 9B.2. From these figures, we can see that the predicted firing probabilities of the GLM and second-order GLM fail to match the smoothed M1 spike train at both the 19th and 28th M1 neurons, while the predicted firing probability of the staged point-process model matches well. For the spike prediction of neuron 19, the staged point-process model achieves the highest CC, at 0.94, while the GLM achieves a CC of 0.55 and the second-order GLM a CC of 0.59. For the spike prediction of neuron 28, the staged point-process model achieves the highest CC, at 0.97, while the GLM achieves a CC of 0.80 and the second-order GLM a CC of 0.82.
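The smoothing-and-correlation evaluation can be sketched as follows. Here the kernel is parameterized by its standard deviation in 10 msec bins, so 150 bins corresponds to the 1.5 sec kernel; the paper's exact kernel parameterization may differ:

```python
import numpy as np

def smooth_spikes(signal, kernel_sd_bins):
    """Smooth a spike train (or rate) with a normalized gaussian kernel."""
    half = int(4 * kernel_sd_bins)              # truncate at +/- 4 SD
    t = np.arange(-half, half + 1)
    k = np.exp(-t**2 / (2.0 * kernel_sd_bins**2))
    k /= k.sum()                                # normalize so the kernel sums to 1
    return np.convolve(signal, k, mode="same")

def correlation_coefficient(pred_rate, spikes, kernel_sd_bins=150):
    """CC between the smoothed prediction and the smoothed spike train."""
    a = smooth_spikes(pred_rate, kernel_sd_bins)
    b = smooth_spikes(np.asarray(spikes, dtype=float), kernel_sd_bins)
    return np.corrcoef(a, b)[0, 1]
```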

Figure 9:

Comparison of the predicted firing probabilities on test data using the GLM (blue line), second-order GLM (orange line), staged point-process model (green line), and actual spike train (red line). Results of neurons 19 and 28 are illustrated in columns A and B, respectively. The actual spike train is shown in the top plot, and the predicted spike probabilities smoothed by a gaussian kernel are shown in the middle. The bottom plot illustrates the cross-correlation between the predicted spike rates and the smoothed M1 spike train versus the smoothing kernel size.

DTR-KS plots of several individual M1 neurons are illustrated in Figure 10. The two $45^\circ$ black dashed lines form the 95% confidence bounds of the DTR-KS test. We expect the maximal distance between the DTR-KS curve and the $45^\circ$ black solid line to be smaller for better goodness-of-fit. From Figure 10, we can see that the DTR-KS curve of the staged point-process model is always the closest to the $45^\circ$ black solid line. This further validates that the staged point-process model statistically encodes the input spikes best.
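The distance-bound ratio used throughout can be sketched as follows, assuming the definition implied by the text: the maximum deviation of the empirical distribution of the rescaled times from the $45^\circ$ line, divided by the 95% KS bound $1.36/\sqrt{n}$ (the exact discrete-time rescaling and bound in the paper may differ):

```python
import numpy as np

def ks_dbr(rescaled):
    """Distance-bound ratio of a KS plot (assumed definition).

    rescaled: rescaled interspike times, ~Uniform(0,1) if the model fits.
    Returns max |empirical CDF - 45-degree line| / (1.36 / sqrt(n)),
    so DBR < 1 means the curve stays inside the 95% confidence bounds.
    """
    z = np.sort(np.asarray(rescaled, dtype=float))
    n = len(z)
    quantiles = (np.arange(1, n + 1) - 0.5) / n   # model quantiles on the 45-deg line
    distance = np.max(np.abs(z - quantiles))
    bound = 1.36 / np.sqrt(n)
    return distance / bound
```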

Figure 10:

DTR-KS plots for individual M1 neurons on test data. The results of three models are shown: GLM (blue curve), second-order GLM (orange curve), and staged point-process model (green curve).

We calculate the DBR of the GLM, second-order GLM, and staged point-process model for each neuron. The statistical results of the DBR across all M1 neurons using the three models are compared in the box plot of Figure 11. From the figure, we can see that the staged point-process model achieves the smallest neuron-averaged DBR. The one-tailed paired Student's $t$-test shows that the DBR of the staged point-process model across all M1 neurons is significantly lower than those of the GLM ($p<0.05$) and second-order GLM ($p<0.05$). We also count the number of neurons whose prediction results pass the DTR-KS test. For the GLM and second-order GLM, the numbers are 18 and 16, respectively. Although the GLM has more M1 neurons passing the DTR-KS test than the second-order GLM, the predictions of the passing M1 neurons in the second-order GLM achieve significantly higher goodness-of-fit than those of the GLM. For the staged point-process model, the number is 21, which is higher than for the GLM and second-order GLM.

Figure 11:

Comparison of DBR results among the staged point-process model (green box), GLM (blue box), and second-order GLM (orange box) on test data. $*$ denotes that DBR results of the staged point-process model are significantly smaller than those of the GLM (one-tailed paired $t$-test, $p<0.05$); $**$ denotes that DBR results of the staged point-process model are significantly smaller than those of the second-order GLM (one-tailed paired $t$-test, $p<0.05$).

#### 3.2.2  Comparison of the Modified L-M Algorithm with the Newton-Raphson and Gradient Ascent Algorithms

To evaluate the efficiency and robustness of the modified L-M algorithm on real data, we show the prediction on two M1 neurons (neuron 1 and neuron 19) as examples using the staged point-process model. The performance is then compared with that of the gradient ascent algorithm and Newton-Raphson algorithm on each M1 neuron prediction. Initialization and other settings of the staged point-process model are the same as in sections 3.1 and 3.2.1. The convergence curves are averaged over 10 trials and illustrated with a mean curve and a standard deviation ribbon in Figure 12. As expected, the modified L-M algorithm converges in 93.8 iteration steps on average, while the gradient ascent algorithm needs 646.8 iteration steps; the convergence of the modified L-M algorithm and Newton-Raphson algorithm is faster than that of the gradient ascent algorithm. When the learning curve converges, the final DBR of the modified L-M algorithm, averaged over 54 M1 neurons, is 1.22. The final DBR of the gradient ascent algorithm is 1.33, so the modified L-M result is 8.3% lower; the final DBR of the Newton-Raphson algorithm is 1.98, so the modified L-M result is 38.4% lower. These results indicate that the modified L-M algorithm is more efficient than the gradient ascent algorithm on real data, as on simulated data.

Figure 12:

The convergence of DBR during training. (A) The convergence of the first M1 neuron. (B) The convergence of the 19th M1 neuron. Red, blue, and green curves and ribbons represent the convergence of the Newton-Raphson algorithm, gradient ascent algorithm, and modified L-M algorithm, respectively.

We further statistically validate the convergence of the different algorithms on all M1 channels. The convergence steps and final performance of the different algorithms on all M1 neurons are illustrated in Figure 13. We can see that the Newton-Raphson algorithm converges very fast at the early steps (67.8 on average) but at a relatively worse (higher) final DBR compared with the other algorithms. This supports the point that the Newton-Raphson algorithm is affected by the existence of local optima, while our algorithm is less affected. From Figure 13, we can also see that both the convergence steps and the final DBR of the modified L-M algorithm are distributed in the lower area of the plot, which is expected, as we obtained the same result in the simulation.

Figure 13:

Convergence steps and final performance (distance-bound ratio and steps for convergence) on all the 54 M1 neurons of different optimization algorithms.

## 4  Discussion and Conclusion

Modeling the dependence of spike trains among brain regions enables us to understand how information is communicated among neurons, which makes it possible to design prosthetic devices that compensate for the loss of neuronal communication. The widely used generalized linear model assumes the output spike depends on the linearly weighted firing history of inputs, and it considers neither the nonlinear interaction among input neurons nor the nonlinearity across the input history. Volterra kernels were introduced in GLMs to consider the nonlinear interaction among pairs of input spikes, which improved the capability in modeling nonlinear dynamics and in turn improved the performance of prediction. However, a lower-order Volterra kernel may limit the capability of neural interaction modeling, where the interaction could be arbitrarily nonlinear.

To address the nonlinear interaction among input neurons and the nonlinearity across the input history, we propose a staged point-process model, which can be described as a cascade of two LN (linear-nonlinear) stages and an inhomogeneous Bernoulli process. These two LN stages theoretically allow arbitrary nonlinear interaction among input neurons as well as nonlinearity across the input firing history. The final stage of the model is the output spike generation using the inhomogeneous Bernoulli process. The model is estimated by maximizing the log likelihood on output spike trains. We show that it is theoretically equivalent to a two-layer neural network (see the appendix) but deals with spike observations probabilistically.

Compared to GLMs and GLMs with a Volterra kernel, the staged point-process model allows arbitrary nonlinearity in modeling the interactions among input neurons. According to Cybenko (1989), any continuous nonlinear function can be approximated by a linear combination of finitely many sigmoidal hidden units, which in our model corresponds to the arbitrary nonlinear interaction among the input neurons. Therefore, we anticipated much more powerful modeling of nonlinear interaction among input neurons than with GLMs and GLMs with a second-order Volterra kernel. This is clearly supported in our synthetic data with different levels of nonlinear interaction between input neurons. When input spikes are linearly projected to generate output spikes, the performance of our model is similar to that of a GLM and a GLM with a Volterra kernel. But when the level of nonlinear interaction increases, our model demonstrates its advantage, as it achieves a much lower distance-bound ratio in the DTR-KS results. This is further validated on real spike train data recorded from PMd and M1 in a monkey performing a four-direction center-out task. The distance-bound ratios of our model are lower than those of the GLM and second-order GLM on average, with statistical significance. Besides this, our model has the most M1 neurons whose spike prediction results on the test set pass the DTR-KS test. This indicates that the staged point-process model describes the nonlinearity among input spike trains, which leads to better prediction performance on real data.

In addition, the staged point-process model uses fewer parameters to approximate the nonlinear relationship between input and output neurons. If we consider the correlation in a pairwise manner within the input space, the number of parameters in the second-order GLM increases by $n(n+1)/2+1$, where $n$ is the input dimension (the product of the input history length $H$ and the number of input neurons $N_X$). Laguerre expansion methods in the time domain can help reduce the number of parameters under the Volterra kernel framework when a longer history is considered. We solve this problem by introducing hidden units, the same idea as in ANNs, but we add a spike-generation stage for each output neuron. In our model, the number of parameters increases by $N_Z(n+2)+1$, where $N_Z$ is the number of hidden units. From our results, the number of parameters, as well as the computational burden, increases much more slowly with the input dimension in the staged point-process model than in the second-order GLM; $N_Z$ is generally much smaller than $n/2$ for both cases in section 3. Thus, our proposed model uses fewer parameters to achieve better performance. This is validated on both the simulated and real data, where our model performs better with relatively fewer free parameters than the second-order GLM.
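The two parameter counts are easy to compare directly. A minimal sketch; the example dimensions below are illustrative, not the configurations used in section 3:

```python
def second_order_glm_params(NX, H):
    """Parameters added by pairwise (quadratic) input interactions: n(n+1)/2 + 1."""
    n = NX * H
    return n * (n + 1) // 2 + 1

def staged_model_params(NX, H, NZ):
    """Parameters in the staged model: NZ hidden units, each with n input weights
    and a bias, plus NZ output weights and an output bias: NZ*(n+2) + 1."""
    n = NX * H
    return NZ * (n + 2) + 1

# e.g., 10 input neurons, 20 history bins, 5 hidden units
print(second_order_glm_params(10, 20))  # 20101 (quadratic in n)
print(staged_model_params(10, 20, 5))   # 1011  (linear in n)
```

The quadratic model's count grows with $n^2$, while the staged model's grows linearly in $n$ for fixed $N_Z$, which is why the gap widens with the input dimension.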

Here, the maximization of the log likelihood is nonconcave due to the cascade of two L-N stages, which may lead to local maxima. We develop a modified L-M optimization algorithm that enables adaptive learning by directly calculating the Hessian matrix of the log-likelihood function. It inherits the properties of the L-M algorithm: it searches efficiently and is not easily trapped in local optima (Levenberg, 1944; Marquardt, 1963). On both simulated and real data, the modified L-M algorithm demonstrates faster convergence and better prediction performance than both the gradient ascent and Newton-Raphson algorithms. All of these results indicate that the staged point-process model, together with the modified L-M algorithm, is a powerful and efficient tool that allows nonlinear interaction among input spikes as well as nonlinearity in spike generation. Thus, this model could be applied to improve nonlinear modeling of neuronal spike activities recorded from other brain regions, such as hippocampal CA1 and CA3.
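The adaptive damping at the heart of an L-M-style ascent can be sketched generically. The following is our own illustration on a simple Bernoulli (logistic) log likelihood, not the paper's exact modified algorithm: it solves a damped Newton system and adjusts the damping factor `mu` according to whether the step improves the log likelihood:

```python
import numpy as np

def lm_ascent_step(ll, grad, hess, w, mu, mu_scale=10.0):
    """One Levenberg-Marquardt-style ascent step on a log likelihood.

    Solves (mu*I - H) d = g: small mu approximates a Newton step, large mu
    approximates gradient ascent; mu adapts to the outcome of each step.
    """
    g, H = grad(w), hess(w)
    d = np.linalg.solve(mu * np.eye(len(w)) - H, g)
    if ll(w + d) > ll(w):
        return w + d, mu / mu_scale   # accept step, trust the Newton direction more
    return w, mu * mu_scale           # reject step, damp more heavily

# Toy example: Bernoulli observations with a logistic link
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (rng.random(200) < 1 / (1 + np.exp(-X @ np.array([1.0, -2.0, 0.5])))).astype(float)

def ll(w):
    p = 1 / (1 + np.exp(-X @ w))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad(w):
    p = 1 / (1 + np.exp(-X @ w))
    return X.T @ (y - p)

def hess(w):
    p = 1 / (1 + np.exp(-X @ w))
    return -(X * (p * (1 - p))[:, None]).T @ X

w, mu = np.zeros(3), 1.0
for _ in range(50):
    w, mu = lm_ascent_step(ll, grad, hess, w, mu)
```

The key design choice is that failed steps are free to be rejected: the damping grows until a step improves the objective, which is what makes the method robust on nonconcave likelihoods.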

The staged point-process model is based on the assumption of hidden neurons that may not be recorded during neural information processing between input and output neurons. On real data, we show that only some of the selected input neurons successfully predict output spikes. We consider that hidden neurons are interconnected to form neural pathways whose individual units may not be picked up by an electrode array, although their net effect is observable. We computationally model the effects of these hidden neurons as several hidden units in a two-stage LN-LN Bernoulli model. Our proposed model with hidden units objectively better predicts the output spike trains, but the actual anatomical structure between input and output neurons is not necessarily two-staged. Thus, the interpretation of our model regarding the interaction among input neurons is not as straightforward as that of the second-order GLM. A model that approximates the real anatomical structure (Qi et al., 2018; Xu et al., 2018) and is easy to interpret should be investigated in future work.

## Appendix:  Derivation of the Staged Point-Process Model

In this appendix, we derive the staged point-process model using the theory of statistical learning (Bishop, 1995, 2006; Hastie, Friedman, & Tibshirani, 2013). Our model allows the Bernoulli firing probability of the output neuron, $p(Y_k \mid X)$, to change over time to describe an inhomogeneous spike train. If we collect all the samples of the output spike $y$ over all time indices and assume each sample is conditionally independent given the input $x$, then $p(y \mid x)$ can be described by a static distribution, and we can estimate stationary weights from the distribution of $p(y \mid x)$. We first denote by $x$ the input spike trains that precede and determine the firing of hidden neurons; $x$ contains variables indexed by input neuron and by time in history. Let $z_j$ denote the spike event of the $j$th hidden neuron and $\bar{z}_j$ its complement (no spike). Using Bayes' rule, we can write the posterior probability of $z_j$ conditioned on $x$ as
$p(z_j \mid x) = \dfrac{p(z_j)\,p(x \mid z_j)}{p(z_j)\,p(x \mid z_j) + p(\bar{z}_j)\,p(x \mid \bar{z}_j)},$
(A.1)
which could be written in the form
$p(z_j \mid x) = \sigma\!\left(\ln\dfrac{p(z_j)\,p(x \mid z_j)}{p(\bar{z}_j)\,p(x \mid \bar{z}_j)}\right),$
(A.2)
where $\sigma(u) = 1/[1+\exp(-u)]$ is the logistic sigmoid function. We now make the naive Bayes assumption that the spikes of $x$ at different history bins and neurons are independent, conditioned on $z_j$ or $\bar{z}_j$. Since the values in $x$ are binary spikes, we assume the distribution of $x$ conditioned on $z_j$ or $\bar{z}_j$ is Bernoulli. Let $H$ be the temporal length of the spike train ensemble $x$. Then we have
$p(x \mid z_j) = \prod_{i=1}^{N_X}\prod_{h=1}^{H}\left[p(x_{ih} \mid z_j)\right]^{x_{ih}}\left[1-p(x_{ih} \mid z_j)\right]^{1-x_{ih}}, \qquad p(x \mid \bar{z}_j) = \prod_{i=1}^{N_X}\prod_{h=1}^{H}\left[p(x_{ih} \mid \bar{z}_j)\right]^{x_{ih}}\left[1-p(x_{ih} \mid \bar{z}_j)\right]^{1-x_{ih}},$
(A.3)
where $x_{ih}$ denotes the spike of the $i$th input neuron in the $h$th history bin of the input spike train ensemble $x$. Substituting equation A.3 into A.2, the posterior probability of $z_j$ conditioned on $x$ can be written as
$\begin{aligned}
p(z_j \mid x) &= \sigma\!\left(\ln\frac{p(z_j)\prod_{i=1}^{N_X}\prod_{h=1}^{H}[p(x_{ih} \mid z_j)]^{x_{ih}}[1-p(x_{ih} \mid z_j)]^{1-x_{ih}}}{p(\bar{z}_j)\prod_{i=1}^{N_X}\prod_{h=1}^{H}[p(x_{ih} \mid \bar{z}_j)]^{x_{ih}}[1-p(x_{ih} \mid \bar{z}_j)]^{1-x_{ih}}}\right)\\
&= \sigma\!\left(\sum_{i=1}^{N_X}\sum_{h=1}^{H}\left[x_{ih}\ln\frac{p(x_{ih} \mid z_j)}{p(x_{ih} \mid \bar{z}_j)} + (1-x_{ih})\ln\frac{1-p(x_{ih} \mid z_j)}{1-p(x_{ih} \mid \bar{z}_j)}\right] + \ln\frac{p(z_j)}{p(\bar{z}_j)}\right)\\
&= \sigma\!\left(\sum_{i=1}^{N_X}\sum_{h=1}^{H}x_{ih}\ln\frac{p(x_{ih} \mid z_j)[1-p(x_{ih} \mid \bar{z}_j)]}{p(x_{ih} \mid \bar{z}_j)[1-p(x_{ih} \mid z_j)]} + \sum_{i=1}^{N_X}\sum_{h=1}^{H}\ln\frac{1-p(x_{ih} \mid z_j)}{1-p(x_{ih} \mid \bar{z}_j)} + \ln\frac{p(z_j)}{p(\bar{z}_j)}\right),
\end{aligned}$
(A.4)
where the distributions $p(x_{ih} \mid z_j)$, $p(x_{ih} \mid \bar{z}_j)$, $p(z_j)$, and $p(\bar{z}_j)$ are unknown and must be estimated. We use a free parameter $\omega_{i,h}^{j}$ to refer to the term $\ln\frac{p(x_{ih} \mid z_j)[1-p(x_{ih} \mid \bar{z}_j)]}{p(x_{ih} \mid \bar{z}_j)[1-p(x_{ih} \mid z_j)]}$, which represents the relationship between $x_{ih}$ and the firing of the $j$th hidden neuron, and $\omega_0^{j}$ to refer to the term $\sum_{i=1}^{N_X}\sum_{h=1}^{H}\ln\frac{1-p(x_{ih} \mid z_j)}{1-p(x_{ih} \mid \bar{z}_j)} + \ln\frac{p(z_j)}{p(\bar{z}_j)}$, which represents the background level of the firing probability of the $j$th hidden neuron. Let $\lambda_{z_j}$ denote the $j$th hidden unit, which represents the conditional firing probability of the $j$th hidden neuron. Equation A.4 can then be formed as an L-N structure that linearly combines the effect of the input spike ensemble $x$ and nonlinearly transforms it into the hidden unit:
$\lambda_{z_j} = \sigma\!\left(\sum_{i=1}^{N_X}\sum_{h=1}^{H}x_{ih}\,\omega_{i,h}^{j} + \omega_0^{j}\right).$
(A.5)
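The algebra leading to equation A.5 can be verified numerically: for any Bernoulli naive Bayes model, the exact posterior of equation A.1 coincides with the sigmoid of a linear function of $x$ with the weights defined above. A small check with arbitrary illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                    # n = NX * H flattened input bins
pz = 0.3                                 # prior p(z_j)
p1 = rng.uniform(0.1, 0.9, size=n)       # p(x_ih = 1 | z_j)
p0 = rng.uniform(0.1, 0.9, size=n)       # p(x_ih = 1 | z_j complement)
x = rng.integers(0, 2, size=n).astype(float)

# Exact posterior via Bayes' rule (equation A.1)
lik1 = np.prod(p1**x * (1 - p1)**(1 - x))
lik0 = np.prod(p0**x * (1 - p0)**(1 - x))
posterior = pz * lik1 / (pz * lik1 + (1 - pz) * lik0)

# Sigmoid of a linear function of x (equations A.4 and A.5)
w = np.log(p1 * (1 - p0) / (p0 * (1 - p1)))                       # omega_{i,h}^j
w0 = np.sum(np.log((1 - p1) / (1 - p0))) + np.log(pz / (1 - pz))  # omega_0^j
lam_z = 1 / (1 + np.exp(-(x @ w + w0)))

assert np.isclose(posterior, lam_z)      # the two forms agree exactly
```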
Based on the assumption of hidden neurons, the firing of $Y$ is determined by the firing of hidden neurons. However, the firing of $Y$ cannot be modeled directly on the firing of hidden neurons, because the parameters of our model are estimated by maximum likelihood estimation (MLE), and gradient-based maximization requires all internal and output variables in the computation to be differentiable. Thus, we use the hidden units $\lambda_z$, which are continuous, differentiable variables representing the firing probabilities of the hidden neurons. Let $y$ be the spike firing of neuron $Y$ and $\bar{y}$ its complement. Using Bayes' rule, the conditional firing probability of $y$ can be formed as the posterior probability of $y$ conditioned on the hidden units $\lambda_z$:
$p(y \mid \lambda_z) = \dfrac{p(y)\,p(\lambda_z \mid y)}{p(y)\,p(\lambda_z \mid y) + p(\bar{y})\,p(\lambda_z \mid \bar{y})} = \sigma\!\left(\ln\dfrac{p(y)\,p(\lambda_z \mid y)}{p(\bar{y})\,p(\lambda_z \mid \bar{y})}\right).$
(A.6)
We again make the naive Bayes assumption and treat the hidden unit values as independent, conditioned on $y$ or $\bar{y}$. Thus, we have
$p(\lambda_z \mid y) = \prod_{j=1}^{N_Z}p(\lambda_{z_j} \mid y), \qquad p(\lambda_z \mid \bar{y}) = \prod_{j=1}^{N_Z}p(\lambda_{z_j} \mid \bar{y}).$
(A.7)
Here, $\lambda_z$ denotes conditional firing probabilities, which are continuous values between 0 and 1, but the data distribution of $\lambda_z$ is unknown. From the perspective of statistical sufficiency, we can assume the distributions of $\lambda_{z_j}$ conditioned on $y$ or $\bar{y}$ are members of the exponential family (Andersen, 1970; Bishop, 1995; Abramovich & Ritov, 2013), which can be written as
$p(\lambda_{z_j} \mid y) = \exp\{A(\Theta_{1j}) + B(\lambda_{z_j},\varphi) + \Theta_{1j}\lambda_{z_j}\}, \qquad p(\lambda_{z_j} \mid \bar{y}) = \exp\{A(\Theta_{2j}) + B(\lambda_{z_j},\varphi) + \Theta_{2j}\lambda_{z_j}\},$
(A.8)
where $A(\cdot)$ and $B(\cdot)$ are known functions. If $\varphi$ is known, $\Theta_{1j}$ and $\Theta_{2j}$ are canonical parameters of the exponential family distributions. Here, $\Theta_{1j}$, $\Theta_{2j}$, and $\varphi$ are all unknown parameters.
Substituting equations A.7 and A.8 into A.6, the posterior probability of $y$ conditioned on hidden units can be written as
$\begin{aligned}
p(y \mid \lambda_z) &= \sigma\!\left(\ln\frac{p(y)\prod_{j=1}^{N_Z}\exp\{A(\Theta_{1j}) + B(\lambda_{z_j},\varphi) + \Theta_{1j}\lambda_{z_j}\}}{p(\bar{y})\prod_{j=1}^{N_Z}\exp\{A(\Theta_{2j}) + B(\lambda_{z_j},\varphi) + \Theta_{2j}\lambda_{z_j}\}}\right)\\
&= \sigma\!\left(\sum_{j=1}^{N_Z}\ln\frac{\exp\{\Theta_{1j}\lambda_{z_j}\}}{\exp\{\Theta_{2j}\lambda_{z_j}\}} + \sum_{j=1}^{N_Z}\ln\frac{\exp\{A(\Theta_{1j})\}}{\exp\{A(\Theta_{2j})\}} + \ln\frac{p(y)}{p(\bar{y})}\right)\\
&= \sigma\!\left(\sum_{j=1}^{N_Z}(\Theta_{1j}-\Theta_{2j})\lambda_{z_j} + \sum_{j=1}^{N_Z}[A(\Theta_{1j})-A(\Theta_{2j})] + \ln\frac{p(y)}{p(\bar{y})}\right),
\end{aligned}$
(A.9)
where $p(y)$ and $p(\bar{y})$ are unknown and must be estimated. Similar to equation A.5, we use a free parameter $\theta_j$ to refer to the term $(\Theta_{1j}-\Theta_{2j})$, which represents the relationship between the firing of the output neuron and the $j$th hidden unit, and $\theta_0$ to refer to the term $\sum_{j=1}^{N_Z}[A(\Theta_{1j})-A(\Theta_{2j})] + \ln\frac{p(y)}{p(\bar{y})}$, which represents the background level of the firing probability of the output neuron. Thus, equation A.9 can be formed as an L-N structure that linearly combines the effects of the hidden units $\lambda_z$ and nonlinearly transforms them into the firing probability of the output neuron. Letting $\lambda_y$ denote the conditional firing probability of the output neuron, we have
$\lambda_y = \sigma\!\left(\sum_{j=1}^{N_Z}\theta_j\lambda_{z_j} + \theta_0\right).$
(A.10)
By combining equations A.5 and A.10, we can write the conditional firing probability of the output neuron as
$\lambda_y = \sigma\!\left(\sum_{j=1}^{N_Z}\theta_j\,\sigma\!\left(\sum_{i=1}^{N_X}\sum_{h=1}^{H}x_{ih}\,\omega_{i,h}^{j} + \omega_0^{j}\right) + \theta_0\right).$
(A.11)
Equation A.11 demonstrates that the probabilistic model that maps spikes between input and output neurons is equivalent to a two-layer LN-LN Bernoulli model, which contains two cascaded L-N stages followed by an instantaneous Bernoulli spike generator.

## Acknowledgments

The corresponding authors are G.P. and Y.W. G.P.'s work is supported by the National Key Research and Development Program of China (2017YFB1002503) and the Zhejiang Provincial Natural Science Foundation of China (LR15F020001). Y.W.'s work is supported by the National Natural Science Foundation of China (grant 61836003) and partially supported by the HKUST Initiation Grant for New Faculty (IGN16EG10) and HKUST start-up fund (R9395).

## References

Abramovich, F., & Ritov, Y. (2013). Statistical theory: A concise introduction. Boca Raton, FL: CRC Press.
Ahrens, M. B., Paninski, L., & Sahani, M. (2008). Inferring input nonlinearities in neural encoding models. Network: Computation in Neural Systems, 19(1), 35–67.
Alataris, K., Berger, T., & Marmarelis, V. (2000). A novel network for nonlinear modeling of neural systems with arbitrary point-process inputs. Neural Networks, 13(2), 255–266.
Andersen, E. B. (1970). Sufficiency and exponential families for discrete sample spaces. Journal of the American Statistical Association, 65(331), 1248–1255.
Andersen, P., Morris, R., Amaral, D., Bliss, T., & O'Keefe, J. (2007). The hippocampus book. New York: Oxford University Press.
Benjamin, A. S., Fernandes, H. L., Tomlinson, T., Ramkumar, P., VerSteeg, C., Miller, L., & Kording, K. P. (2017). Modern machine learning far outperforms GLMs at predicting spikes. bioRxiv:111450.
Berger, T. W., Ahuja, A., Courellis, S. H., Deadwyler, S. A., Erinjippurath, G., Gerhardt, G. A., … Wills, J. (2005). Restoring lost cognitive function. IEEE Engineering in Medicine and Biology Magazine, 24(5), 30–44.
Berger, T. W., Song, D., Chan, R. H., Marmarelis, V. Z., LaCoss, J., Wills, J., … Granacki, J. J. (2012). A hippocampal cognitive prosthesis: Multi-input, multi-output nonlinear modeling and VLSI implementation. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 20(2), 198–211.
Bishop, C. M. (1995). Neural networks for pattern recognition. New York: Oxford University Press.
Bishop, C. M. (2006). Pattern recognition and machine learning. Berlin: Springer.
Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge: Cambridge University Press.
Brillinger, D. R. (1988). Maximum likelihood analysis of spike trains of interacting nerve cells. Biological Cybernetics, 59(3), 189–200.
Brown, E. N., Barbieri, R., Ventura, V., Kass, R. E., & Frank, L. M. (2002). The time-rescaling theorem and its application to neural spike train data analysis. Neural Computation, 14(2), 325.
Brown, E. N., Frank, L. M., Tang, D., Quirk, M. C., & Wilson, M. A. (1998). A statistical paradigm for neural spike train decoding applied to position prediction from ensemble firing patterns of rat hippocampal place cells. Journal of Neuroscience, 18(18), 7411–7425.
Brown, E. N., Kass, R. E., & Mitra, P. P. (2004). Multiple neural spike train data analysis: State-of-the-art and future challenges. Nature Neuroscience, 7(5), 456.
Bryant, H. L., & Segundo, J. P. (1976). Spike initiation by transmembrane current: A white-noise analysis. Journal of Physiology, 260(2), 279–314.
Calabrese, A., Schumacher, J. W., Schneider, D. M., Paninski, L., & Woolley, S. M. (2011). A generalized linear model for estimating spectrotemporal receptive fields from responses to natural sounds. PLoS One, 6(1), e16104.
Chichilnisky, E. (2001). A simple white noise analysis of neuronal light responses. Network: Computation in Neural Systems, 12(2), 199–213.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(4), 303–314.
De Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Transactions on Biomedical Engineering, 3, 169–179.
de Ruyter van Steveninck, R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proceedings of the Royal Society of London B: Biological Sciences, 234(1277), 379–414.
Eggermont, J., Johannesma, P., & Aertsen, A. (1983). Reverse-correlation methods in auditory research. Quarterly Reviews of Biophysics, 16(3), 341–414.
Eldawlatly, S., Jin, R., & Oweiss, K. G. (2009). Identifying functional connectivity in large-scale neural ensemble recordings: A multiscale data mining approach. Neural Computation, 21(2), 450–477.
Ergun, A., Barbieri, R., Eden, U. T., Wilson, M. A., & Brown, E. N. (2007). Construction of point process adaptive filter algorithms for neural systems using sequential Monte Carlo methods. IEEE Transactions on Biomedical Engineering, 54(3), 419–428.
Fitzgerald, J. D., Rowekamp, R. J., Sincich, L. C., & Sharpee, T. O. (2011). Second order dimensionality reduction using minimum and maximum mutual information models. PLoS Computational Biology, 7(10), e1002249.
Geng, K., & Marmarelis, V. Z. (2017). Methodology of recurrent Laguerre–Volterra network for modeling nonlinear dynamic systems. IEEE Transactions on Neural Networks and Learning Systems, 28(9), 2196–2208.
Geng, K., Shin, D. C., Song, D., Hampson, R. E., Deadwyler, S. A., Berger, T. W., & Marmarelis, V. Z. (2018). Mechanism-based and input-output modeling of the key neuronal connections and signal transformations in the CA3-CA1 regions of the hippocampus. Neural Computation, 30(1), 149–183.
Gerstner, W., & Kistler, W. M. (2002). Spiking neuron models: Single neurons, populations, plasticity. Cambridge: Cambridge University Press.
Guggenmos, D. J., Azin, M., Barbay, S., Mahnken, J. D., Dunham, C., Mohseni, P., & Nudo, R. J. (2013). Restoration of function after brain damage using a neural prosthesis. Proceedings of the National Academy of Sciences, 110(52), 21177–21182.
Hagan, M. T., & Menhaj, M. B. (1994). Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks, 5(6), 989–993.
Hampson, R. E., Gerhardt, G. A., Marmarelis, V., Song, D., Opris, I., Santos, L., … Deadwyler, S. A. (2012). Facilitation and restoration of cognitive function in primate prefrontal cortex by a neuroprosthesis that utilizes minicolumn-specific neural firing. Journal of Neural Engineering, 9(5), 056012.
Haslinger, R., Pipa, G., & Brown, E. (2010). Discrete time rescaling theorem: Determining goodness of fit for discrete time statistical models of neural spiking. Neural Computation, 22(10), 2477–2506.
Hastie, T., Friedman, J., & Tibshirani, R. (2013). The elements of statistical learning: Data mining, inference, and prediction. Berlin: Springer.
Holland, P. W., & Welsch, R. E. (1977). Robust regression using iteratively reweighted least-squares. Communications in Statistics—Theory and Methods, 6(9), 813–827.
Horwitz, G. D., Chichilnisky, E., & Albright, T. D. (2005). Blue-yellow signals are enhanced by spatiotemporal luminance contrast in macaque V1. Journal of Neurophysiology, 93(4), 2263–2278.
Kelly, R., Smith, M., Kass, R., & Lee, T. S. (2010). Accounting for network effects in neuronal responses using L1 regularized point process models. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, & A. Culotta (Eds.), Advances in neural information processing systems, 23 (pp. 1099–1107). Red Hook, NY: Curran.
Kůrková, V. (1991). Kolmogorov's theorem is relevant. Neural Computation, 3(4), 617–622.
Levenberg, K. (1944). A method for the solution of certain non-linear problems in least squares. Quarterly of Applied Mathematics, 2(2), 164–168.
Marmarelis, P. Z., & Naka, K. I. (1972). White-noise analysis of a neuron chain: An application of the Wiener theory. Science, 175(4027), 1276–1278.
Marmarelis, V. Z. (1993). Identification of nonlinear biological systems using Laguerre expansions of kernels. Annals of Biomedical Engineering, 21(6), 573–589.
Marmarelis, V. Z., Shin, D. C., Song, D., Hampson, R. E., Deadwyler, S. A., & Berger, T. W. (2013). Nonlinear modeling of dynamic interactions within neuronal ensembles using principal dynamic modes. Journal of Computational Neuroscience, 34(1), 73–87.
Marmarelis, V. Z., Shin, D. C., Song, D., Hampson, R. E., Deadwyler, S. A., & Berger, T. W. (2014). On parsing the neural code in the prefrontal cortex of primates using principal dynamic modes. Journal of Computational Neuroscience, 36(3), 321–337.
Marquardt, D. W. (1963). An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11(2), 431–441.
McCullagh, P. (1984). Generalized linear models. European Journal of Operational Research, 16(3), 285–292.
McFarland, J. M., Cui, Y., & Butts, D. A. (2013). Inferring nonlinear neuronal computation based on physiologically plausible inputs. PLoS Computational Biology, 9(7), e1003143.
Nicolelis, M. A., Dimitrov, D., Carmena, J. M., Crist, R., Lehew, G., Kralik, J. D., & Wise, S. P. (2003). Chronic, multisite, multielectrode recordings in macaque monkeys. Proceedings of the National Academy of Sciences, 100(19), 11041–11046.
Nirenberg, S., & Latham, P. E. (2003). Decoding neuronal spike trains: How important are correlations? Proceedings of the National Academy of Sciences, 100(12), 7348–7353.
Okatan, M., Wilson, M. A., & Brown, E. N. (2005). Analyzing functional connectivity using a network likelihood model of ensemble neural spiking activity. Neural Computation, 17(9), 1927–1961.
Paninski, L. (2003). Convergence properties of some spike-triggered analysis techniques. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems, 15 (pp. 189–196). Cambridge, MA: MIT Press.
Paninski, L. (2004). Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15(4), 243–262.
Park, I. M., Meister, M. L., Huk, A. C., & Pillow, J. W. (2014). Encoding and decoding in parietal cortex during sensorimotor decision-making. Nature Neuroscience, 17(10), 1395–1403.
Park, I. M., & Pillow, J. W. (2011). Bayesian spike-triggered covariance analysis. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, & K. Q. Weinberger (Eds.), Advances in neural information processing systems, 24 (pp. 1692–1700). Red Hook, NY: Curran.
Pillow, J. W., & Simoncelli, E. P. (2003). Biases in white noise analysis due to non-Poisson spike generation. Neurocomputing, 52, 109–115.
Prenger, R., Wu, M. C. K., David, S. V., & Gallant, J. L. (2004). Nonlinear V1 responses to natural scenes revealed by neural network analysis. Neural Networks, 17(5–6), 663–679.
Qi, Y., Shen, J., Wang, Y., Tang, H., Yu, H., Wu, Z., & Pan, G. (2018). Jointly learning network connections and link weights in spiking neural networks. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (pp. 1597–1603). International Joint Conferences on Artificial Intelligence.
Quinn, C. J., Coleman, T. P., Kiyavash, N., & Hatsopoulos, N. G. (2011). Estimating the directed information to infer causal relationships in ensemble neural spike train recordings. Journal of Computational Neuroscience, 30(1), 17–44.
Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1999). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.
Ringach, D. L., Sapiro, G., & Shapley, R. (1997). A subspace reverse-correlation technique for the study of visual neurons. Vision Research, 37(17), 2455–2464.
Rust, N. C., Schwartz, O., Movshon, J. A., & Simoncelli, E. (2004). Spike-triggered characterization of excitatory and suppressive stimulus dimensions in monkey V1. Neurocomputing, 58, 793–799.
Samengo, I., & Gollisch, T. (2013). Spike-triggered covariance: Geometric proof, symmetry properties, and extension beyond gaussian stimuli. Journal of Computational Neuroscience, 34(1), 137–161.
Sanchez, J. C., Kim, S.-P., Erdogmus, D., Rao, Y. N., Principe, J. C., Wessberg, J., & Nicolelis, M. (2002). Input-output mapping performance of linear and nonlinear models for estimating hand trajectories from cortical neuronal firing patterns. In Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing (pp. 139–148). Piscataway, NJ: IEEE.
Schwartz, O., Chichilnisky, E., & Simoncelli, E. P. (2002). Characterizing neural gain control using spike-triggered covariance. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in neural information processing systems, 14 (pp. 269–276). Cambridge, MA: MIT Press.
Schwartz, O., Pillow, J. W., Rust, N. C., & Simoncelli, E. P. (2006). Spike-triggered neural characterization. Journal of Vision, 6(4), 13.
Serruya, M. D., Hatsopoulos, N. G., Paninski, L., Fellows, M. R., & Donoghue, J. P. (2002). Brain-machine interface: Instant neural control of a movement signal. Nature, 416(6877), 141–142.
Sharpee, T. O., Atencio, C. A., & Schreiner, C. E. (2011). Hierarchical representations in the auditory cortex. Current Opinion in Neurobiology, 21(5), 761–767.
Sharpee, T., Rust, N. C., & Bialek, W. (2004). Analyzing neural responses to natural signals: Maximally informative dimensions. Neural Computation, 16(2), 223–250.
Shepherd, G. M. (2004). The synaptic organization of the brain. New York: Oxford University Press.
Slee, S. J., Higgs, M. H., Fairhall, A. L., & Spain, W. J. (2005). Two-dimensional time coding in the auditory brainstem. Journal of Neuroscience, 25(43), 9978–9988.
Song, D., Chan, R. H., Marmarelis, V. Z., Hampson, R. E., Deadwyler, S. A., & Berger, T. W. (2007). Nonlinear dynamic modeling of spike train transformations for hippocampal-cortical prostheses. IEEE Transactions on Biomedical Engineering, 54(6), 1053–1066.
Song, D., Chan, R. H., Marmarelis, V. Z., Hampson, R. E., Deadwyler, S. A., & Berger, T. W. (2009). Nonlinear modeling of neural population dynamics for hippocampal prostheses. Neural Networks, 22(9), 1340–1351.
Stevenson, I. H., Rebesco, J. M., Hatsopoulos, N. G., Haga, Z., Miller, L. E., & Kording, K. P. (2009). Bayesian inference of functional connectivity and network structure from spikes. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 17(3), 203–213.
Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society, Series B (Methodological), 58, 267–288.
Trappenberg, T. P. (2002). Fundamentals of computational neuroscience. New York: Oxford University Press.
Truccolo, W., Eden, U. T., Fellows, M. R., Donoghue, J. P., & Brown, E. N. (2005). A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. Journal of Neurophysiology, 93(2), 1074–1089.
Wang, Y., Paiva, A. R., Príncipe, J. C., & Sanchez, J. C. (2009). Sequential Monte Carlo point-process estimation of kinematics from neural spiking activity for brain-machine interfaces. Neural Computation, 21(10), 2894–2930.
Wang, Y., Wang, F., Xu, K., Zhang, Q., Zhang, S., & Zheng, X. (2015). Neural control of a tracking task via attention-gated reinforcement learning for brain-machine interfaces. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 23(3), 458–467.
Wessberg, J., Stambaugh, C. R., Kralik, J. D., Beck, P. D., Laubach, M., Chapin, J. K., … Nicolelis, M. A. (2000). Real-time prediction of hand trajectory by ensembles of cortical neurons in primates. Nature, 408(6810), 361.
Wilamowski, B. M., & Yu, H. (2010). Improved computation for Levenberg-Marquardt training. IEEE Transactions on Neural Networks, 21(6), 930–937.
Wu, A., Park, I. M., & Pillow, J. W. (2015). Convolutional spike-triggered covariance analysis for neural subunit models. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in neural information processing systems, 28 (pp. 793–801). Red Hook, NY: Curran.
Xing, D., Qian, C., Li, H., Zhang, S., Zhang, Q., Hao, Y., … Pan, G. (2017). Predicting spike trains from PMd to M1 using discrete time rescaling targeted GLM. IEEE Transactions on Cognitive and Developmental Systems, 10, 194–204.
Xu, K., Wang, Y., Wang, Y., Wang, F., Hao, Y., Zhang, S., … Zheng, X. (2013). Local-learning-based neuron selection for grasping gesture prediction in motor brain machine interfaces. Journal of Neural Engineering, 10(2), 026008.
Xu, Q., Qi, Y., Yu, H., Shen, J., Tang, H., & Pan, G. (2018). CSNN: An augmented spiking based framework with perceptron-inception. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (pp. 1646–1652). International Joint Conferences on Artificial Intelligence.
Zanos, T. P., Courellis, S. H., Berger, T. W., Hampson, R. E., Deadwyler, S. A., & Marmarelis, V. Z. (2008). Nonlinear modeling of causal interrelationships in neuronal ensembles. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 16(4), 336–352.