## Abstract

Motor brain machine interfaces (BMIs) translate neural activity from motor-related cortical areas in the brain into movement commands to control a prosthesis. As the subject adapts to control the neural prosthesis, the medial prefrontal cortex (mPFC), upstream of the primary motor cortex (M1), is heavily involved in reward-guided motor learning. Thus, considering mPFC and M1 functionality within a hierarchical structure could potentially improve the effectiveness of BMI decoding while subjects are learning. The commonly used Kalman decoding method, with only one simple state model, may not be able to represent the multiple brain states that evolve over time as well as along the neural pathway. In addition, the performance of Kalman decoders degrades in heavy-tailed nongaussian noise, which usually arises from the nonlinearity of the neural system or from movement-related artifacts in online neural recording. In this letter, we propose a hierarchical model to represent the brain states from multiple cortical areas that evolve along the neural pathway. We then introduce correntropy theory into the hierarchical structure to address the heavy-tailed noise existing in neural recordings. We test the proposed algorithm on in vivo recordings collected from the mPFC and M1 of two rats while the subjects were learning to perform a lever-pressing task. Compared with the classic Kalman filter, our results demonstrate better movement decoding performance, owing to the hierarchical structure that integrates past failed-trial information across multisite recordings and to the combination with the correntropy criterion to deal with noisy, heavy-tailed neural recordings.

## 1  Introduction

Brain-machine interfaces (BMIs) provide a direct communication channel between the brain and prosthetic devices that bypasses the peripheral nervous system. The neural activities in multiple cortices are recorded for further analysis. Signal processing methods, such as population coding and Kalman filters, were applied to decode neural activity to predict movement intention to assist people with disabilities (Nicolelis, 2003; Lebedev & Nicolelis, 2006; Taylor, Tillery, & Schwartz, 2002; Musallam, Corneil, Greger, Scherberger, & Andersen, 2004; Velliste, Perel, Spalding, Whitford, & Schwartz, 2008; O'Doherty, Lebedev, Hanson, Fitzsimmons, & Nicolelis, 2009; Gilja et al., 2012; Churchland et al., 2012; Orsborn et al., 2014; Vyas et al., 2018; Hochberg et al., 2006, 2012; Truccolo, Friehs, Donoghue, & Hochberg, 2008; Moritz, Perlmutter, & Fetz, 2008; Collinger et al., 2013; Gilja et al., 2015; Bouton et al., 2016; Ajiboye et al., 2017; Brandman et al., 2018; Wolpaw et al., 2000; Chapin, Moxon, Markowitz, & Nicolelis, 1999; Chhatbar & Francis, 2013). These cortical areas were mainly motor-related areas, such as primary motor cortex (M1) (Li et al., 2017), premotor cortex (PMd) (Churchland, Yu, Ryu, Santhanam, & Shenoy, 2006), and parietal cortex (Pesaran, Pezaris, Sahani, Mitra, & Andersen, 2002).

Motor control in the central nervous system has a hierarchical structure. In a descending motor hierarchical pathway, the premotor cortex receives information from the prefrontal cortex and works with the supplementary motor areas to pass information to the primary motor cortex, and then to the spinal cord to execute the movement (Gazzaniga, Ivry, & Mangun, 2008). In particular, the medial prefrontal cortex (mPFC), anatomically related to both the anterior cingulate cortex (ACC) and the dorsolateral PFC of primates, is involved heavily in the learning process of motor tasks (Uylings, Groenewegen, & Kolb, 2003; Brand, Recknor, Grabenhorst, & Bechara, 2007; Rushworth, Noonan, Boorman, Walton, & Behrens, 2011; Otis et al., 2017; Kamigaki, 2019; Ito, Stuphorn, Brown, & Schall, 2003; Matsumoto & Hikosaka, 2007; Wang et al., 2018; Wunderlich, Rangel, & O'Doherty, 2009; Alexander & Brown, 2019; Silvetti, Seurinck, & Verguts, 2011; Vassena, Holroyd, & Alexander, 2017). The neural activity in mPFC is closely correlated with the prediction error over the course of adaptation (Silvetti, Seurinck, & Verguts, 2011; Vassena et al., 2017; Alexander & Brown, 2011). In other words, mPFC signals the discrepancies between actual and predicted outcomes during learning. This error will be used to improve movement generation and is important when the subject adapts his or her brain signals to control the prosthesis. Thus, considering the hierarchical structure in the brain that involves the functionality of mPFC could potentially improve the effectiveness of BMI while the subject is learning.

Kalman filters were widely used to decode the neural activity in BMIs (Malik, Truccolo, Brown, & Hochberg, 2011; Homer et al., 2014; Wu, Gao, Bienenstock, Donoghue, & Black, 2006). The neural activity was taken as the observation and was linked to the movement states by a linear observation function. The movement was modeled as the internal state and estimated from the observation at each time instance. The advantage of using a Kalman filter as the decoder was to model the temporal dynamics of the brain state (movement intention). It was especially helpful for BMIs in pure brain control without actual movement, where the brain state continuously changed over time without the need to detect the initialization of a certain movement. Moreover, neural states differ across cortical areas (Kamigaki, 2019). The state in mPFC indicates the discrepancies between the actual and predicted outcomes in learning, the state in premotor cortex represents the planning of motion, and the state in primary motor cortex reflects the execution of the movement. All the states evolve over time and pass information along the neural pathway. However, a typical Kalman filter decoder uses only one brain state to roughly represent the neural activity observations recorded from multiple cortical areas. This may ignore the dynamics of the neural information passing along the pathway and potentially limit the decoding performance. Furthermore, the Kalman filter adopts the minimum mean square error (MMSE) criterion, which assumes gaussian distributed noise. In practice, nongaussian noise may occur due to the nonlinearity of neural systems or the influence of movement-related artifacts during neural recording on freely behaving subjects. In both cases, the performance of Kalman filter decoders can be severely degraded (Schick & Mitter, 1994).
As a solution to the problem of nongaussian noise, information theoretic learning (ITL) was proposed and greatly improves performance in nongaussian noise environments (Principe, 2010; Liu, Pokharel, & Principe, 2007; Singh & Principe, 2009). In ITL, a new optimization criterion, correntropy, was proposed to measure all even-order statistics of the distribution rather than just the second-order statistics (Shi & Lin, 2014; He, Hu, Zheng, & Kong, 2011; He, Zheng, & Hu, 2011; Du, Tang, Wang, Zhang, & Tao, 2019; Mandanas & Kotropoulos, 2017; Liu, Qu, Zhao, & Chen, 2017). In recent work, a linear maximum correntropy Kalman filter was developed, and it exhibited superior performance in the presence of heavy-tailed nongaussian noise by taking advantage of the robustness of correntropy (Chen, Liu, Zhao, & Principe, 2017).

In this letter, we propose a hierarchical dynamic model, the correntropy-based hierarchical Kalman filter (CHKF), to improve movement decoding accuracy from multiple cortical neural recordings. The hierarchical structure models the brain states in the different cortical areas as hidden states, which evolve over time and along the neural pathway. In particular, we use the hierarchical model to integrate the error information extracted from the medial prefrontal cortex while the subject is learning a new task. This failure-related information is then passed on to better estimate the state of the movement execution in the upcoming trial. Moreover, we introduce correntropy theory into the hierarchical structure to address the possible presence of heavy-tailed nongaussian noise that results from the linear approximation of the neural system or from movement-related artifacts during neural recording. We verify our model on in vivo neural recordings of two rats' learning process of a lever-pressing task to decode the movement from multiple cortical recordings. The neural activities are simultaneously recorded from M1 and mPFC. The mPFC and M1 cortical states estimated from each cortical recording both evolve over time as described by the state function in the hierarchical dynamical model. The discrepancy signal during the reward stage, indicated by the mPFC states in the previous trial, contributes to the estimation of the movement state in the upcoming trial. The proposed algorithm is applied to decode movement states from observations of mPFC and motor cortical activity. To demonstrate the advantage of the hierarchical structure, the decoding performance of CHKF is compared with those of a classic Kalman filter (KF) with observation from M1 activity only, as well as from pooling multiple cortical activities into higher-dimensional observations.
We also compare CHKF with the hierarchical Kalman filter (HKF) to demonstrate the advantage of combining correntropy with the hierarchical structure in addressing the presence of heavy-tailed nongaussian noise in neural recordings. Applying a square root transform (SRT) to the observations is a commonly used preprocessing step that makes binned neural firing counts more nearly gaussian distributed. We apply SRT to the neural firing rates before running the above algorithms and compare the performance of the hierarchical model with that of the classic Kalman model. We use the mean square error (MSE) between the movement estimate and the ground truth for evaluation.

The rest of the letter is organized as follows. In section 2, the hierarchical model, the proposed correntropy-based hierarchical Kalman filter, and related algorithms are introduced. In section 3, the real data application on two rats' learning procedure of a lever-pressing task is used to demonstrate the superior performance of the proposed algorithm compared with those of related algorithms. In section 4, the conclusion is given.

## 2  Theory

### 2.1  Hierarchical Model

In this section, we propose a hierarchical model according to the physiological pathway of action control. When the external stimuli are presented, the signals pass along the neural pathway to generate the action. The hierarchical model can be built as follows:
$q(k)=I(q(k-1))+w_3(k-1),$
(2.1)
$u(k)=B(u(k-1))+T(q(k-1))+w_1(k-1),$
(2.2)
$x(k)=F(x(k-1))+A(u(k-1))+w_2(k-1),$
(2.3)
where $k$ denotes the time instance; $q(k)$ denotes the cue or external stimuli sequence; $u(k)$ denotes the brain state of the cortical area in the upstream of the neural pathway (e.g., the anticipation of the coming reward with the prepared action); $x(k)$ denotes the brain state of the cortical area in the downstream of the neural pathway (e.g., the intention of the movement execution); $I(\cdot)$, $B(\cdot)$, and $F(\cdot)$ stand for the state transition functions for $q(k)$, $u(k)$, and $x(k)$, respectively, which indicate that the brain states evolve with respect to their past values; $T(\cdot)$ stands for the coupling function that links the state $u(k)$ to the external stimuli $q(k)$; $A(\cdot)$ stands for the coupling function that links the state $x(k)$ to the upstream state $u(k)$; and $w_1(k)$, $w_2(k)$, and $w_3(k)$ denote the process noises.
The measurement information involves recording from the multiple cortical areas when available. The measurement model is built as follows:
$z(k)=H_1(u(k))+v_1(k),$
(2.4)
$y(k)=H_2(x(k))+v_2(k),$
(2.5)
where $z(k)$ denotes the observation (recording of multiple neural firings) in the upstream of the neural pathway with respect to $u(k)$; $y(k)$ denotes the observation (recording of multiple neural firings) in the downstream of the neural pathway with respect to $x(k)$; $H_1(\cdot)$ and $H_2(\cdot)$ stand for the corresponding measurement functions, which represent the neural tuning property; and $v_1(k)$ and $v_2(k)$ denote the measurement noises.
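For concreteness, the generative structure of equations 2.1 to 2.5 can be simulated with scalar states. The transition gains, coupling gains, and noise levels below are hypothetical stand-ins chosen only to illustrate the cascade, not values fitted in this letter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D instances of the maps in equations 2.1-2.5.
B, T = 0.9, 0.5    # upstream transition and cue coupling
F, A = 0.8, 0.6    # downstream transition and upstream coupling
H1, H2 = 1.2, 0.7  # measurement (tuning) gains

K = 200
q = np.zeros(K)
q[10:20] = 1.0     # a brief external cue; eq. 2.1 taken as given
u = np.zeros(K)    # upstream state (e.g., mPFC), eq. 2.2
x = np.zeros(K)    # downstream state (e.g., M1), eq. 2.3

for k in range(1, K):
    u[k] = B * u[k-1] + T * q[k-1] + 0.01 * rng.standard_normal()
    x[k] = F * x[k-1] + A * u[k-1] + 0.01 * rng.standard_normal()

z = H1 * u + 0.05 * rng.standard_normal(K)  # upstream observation, eq. 2.4
y = H2 * x + 0.05 * rng.standard_normal(K)  # downstream observation, eq. 2.5
```

The cue drives the upstream state $u$, which in turn drives the downstream state $x$ one step later, mirroring the descending pathway the model encodes.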

### 2.2  Correntropy-Based Hierarchical Kalman Filter

In general, we assume that $q(k)$ is a known quantity, regarded as a sequence of instructions, such as cues or stimuli. Here we approximate the nonlinear transition and coupling functions in the state model with linear matrices to derive a simple closed-form solution. Then equations 2.2 and 2.3 are rewritten as
$u(k)=Bu(k-1)+Tq(k-1)+w_1(k-1),$
(2.6)
$x(k)=Fx(k-1)+Au(k-1)+w_2(k-1),$
(2.7)
where $B$ and $F$ stand for the state transition matrices with respect to $u(k)$ and $x(k)$, $T$ and $A$ stand for the coupling matrices with respect to $q(k)$ and $u(k)$, and $w_1(k)$ and $w_2(k)$ denote the process noises.
The hierarchical structure models the states from different cortical areas, considering the temporal dynamics evolving between the brain states along the neural pathway. Here, the nonlinear measurement equations are also approximated by linear equations, and equations 2.4 to 2.7 can be rewritten in matrix form as equations 2.8 and 2.9:
$\tilde{X}(k)=\tilde{F}\tilde{X}(k-1)+\tilde{T}q(k-1)+\tilde{w}(k-1),$
(2.8)
$\tilde{Y}(k)=\tilde{H}\tilde{X}(k)+\tilde{v}(k),$
(2.9)
where $\tilde{X}(k)=\begin{bmatrix}u(k)\\x(k)\end{bmatrix}$, $\tilde{F}=\begin{bmatrix}B&0\\A&F\end{bmatrix}$, $\tilde{T}=\begin{bmatrix}T\\0\end{bmatrix}$, $\tilde{w}(k-1)=\begin{bmatrix}w_1(k-1)\\w_2(k-1)\end{bmatrix}$, $\tilde{Y}(k)=\begin{bmatrix}z(k)\\y(k)\end{bmatrix}$, $\tilde{H}=\begin{bmatrix}H_1&0\\0&H_2\end{bmatrix}$, and $\tilde{v}(k)=\begin{bmatrix}v_1(k)\\v_2(k)\end{bmatrix}$.
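The stacking in equations 2.8 and 2.9 can be written out explicitly. The dimensions and block values below are hypothetical, chosen only to show the sparsity pattern of the stacked matrices.

```python
import numpy as np

# Hypothetical dimensions: u in R^p, x in R^n, z in R^mz, y in R^my.
p, n, mz, my = 2, 3, 4, 5
B  = 0.9 * np.eye(p)         # upstream transition
F  = 0.8 * np.eye(n)         # downstream transition
A  = 0.1 * np.ones((n, p))   # upstream-to-downstream coupling
T  = 0.5 * np.ones((p, 1))   # cue coupling
H1 = np.ones((mz, p))        # upstream tuning
H2 = np.ones((my, n))        # downstream tuning

# Stacked matrices of equations 2.8 and 2.9.
F_tilde = np.block([[B, np.zeros((p, n))],
                    [A, F]])
T_tilde = np.vstack([T, np.zeros((n, 1))])
H_tilde = np.block([[H1, np.zeros((mz, n))],
                    [np.zeros((my, p)), H2]])
```

The zero blocks are exactly the sparsity that the next paragraph points to as a source of reduced computational cost.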

Note that our goal is to estimate $u(k)$ and $x(k)$ from $z(k)$ and $y(k)$. The current design of the hierarchical model yields sparse transition matrices $\tilde{F}$ and $\tilde{T}$ and a sparse observation matrix $\tilde{H}$, which potentially reduces computational complexity.

We introduce correntropy as the new criterion into the hierarchical model to better estimate the state, which has the potential to address the nongaussian noise in real neural recordings. Correntropy measures the similarity between two random variables considering all the even-order statistics as in equation 2.10,
$V(X,Y)=E[\kappa(X,Y)]=\int \kappa(x,y)\,dF_{XY}(x,y),$
(2.10)
where $E$ denotes the expectation operator, $F_{XY}(x,y)$ represents the joint distribution function with respect to the two random variables $X$ and $Y$, and $\kappa(\cdot,\cdot)$ is a shift-invariant Mercer kernel; the gaussian kernel is chosen in this letter, that is,
$\kappa(x,y)=G_\sigma(e)=\exp\left(-\frac{e^2}{2\sigma^2}\right),$
(2.11)
where $e=x-y$ and $\sigma>0$ denotes the kernel bandwidth. Taking a Taylor series expansion of the gaussian kernel shows that correntropy contains not only the second-order moment but also all higher even-order statistical information. In particular, when the kernel bandwidth $\sigma$ is 20 times larger than the value selected according to Silverman's rule (Liu, Pokharel, & Principe, 2007), the second-order moment plays the leading role in correntropy.
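A quick numerical check of the robustness implied by equation 2.10: with a sample-mean estimator, a handful of large outliers dominates the mean square error but barely moves correntropy. The data below are synthetic and the noise levels hypothetical.

```python
import numpy as np

def correntropy(x, y, sigma):
    """Sample estimate of V(X, Y) = E[G_sigma(x - y)], equation 2.10."""
    e = np.asarray(x) - np.asarray(y)
    return np.mean(np.exp(-e ** 2 / (2.0 * sigma ** 2)))

rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
clean = x + 0.1 * rng.standard_normal(2000)   # small gaussian disturbance
dirty = clean.copy()
dirty[:20] += 50.0                            # 1% heavy-tailed outliers

mse_clean = np.mean((x - clean) ** 2)
mse_dirty = np.mean((x - dirty) ** 2)         # blown up by the outliers
v_clean = correntropy(x, clean, sigma=1.0)
v_dirty = correntropy(x, dirty, sigma=1.0)    # nearly unchanged
```

Because the gaussian kernel saturates for large errors, each outlier contributes at most a bounded amount to the correntropy estimate, whereas its contribution to the MSE grows quadratically.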
For our neural decoding, we first estimate the prior mean $\hat{\tilde{X}}(k|k-1)$ and covariance matrix $\tilde{P}(k|k-1)$ of the state $\tilde{X}(k)$, which are obtained by
$\hat{\tilde{X}}(k|k-1)=\tilde{F}\hat{\tilde{X}}(k-1|k-1)+\tilde{T}q(k-1),$
(2.12)
$\tilde{P}(k|k-1)=\tilde{F}\tilde{P}(k-1|k-1)\tilde{F}^T+\tilde{W}(k-1).$
(2.13)
Then we use the sample mean estimator to define the new cost function,
$J(\tilde{X}(k))=\frac{1}{n+m}\sum_{i=1}^{n+m}G_\sigma\big(\tilde{d}_i(k)-\tilde{m}_i(k)\tilde{X}(k)\big),$
(2.14)
where $n$ is the dimension of $\tilde{X}(k)$, $m$ is the dimension of $\tilde{Y}(k)$, $\tilde{d}_i(k)$ denotes the $i$th element of $\tilde{D}(k)$, and $\tilde{m}_i(k)$ denotes the $i$th row of $\tilde{M}(k)$, with $\tilde{D}(k)=\tilde{S}^{-1}(k)\begin{bmatrix}\hat{\tilde{X}}(k|k-1)\\\tilde{Y}(k)\end{bmatrix}$ and $\tilde{M}(k)=\tilde{S}^{-1}(k)\begin{bmatrix}I\\\tilde{H}\end{bmatrix}$, where $\tilde{S}(k)$ can be obtained by a Cholesky decomposition of the covariance of the noise term $\tilde{\delta}(k)=\begin{bmatrix}-(\tilde{X}(k)-\hat{\tilde{X}}(k|k-1))\\\tilde{v}(k)\end{bmatrix}$ and is denoted $\tilde{S}(k)=\begin{bmatrix}\tilde{S}_p(k)&0\\0&\tilde{S}_v(k)\end{bmatrix}$.
By maximizing the cost function in equation 2.14, we set the first derivative of the cost function to be zero, which yields
$\tilde{X}(k)=\big(\tilde{M}^T(k)\tilde{C}(k)\tilde{M}(k)\big)^{-1}\tilde{M}^T(k)\tilde{C}(k)\tilde{D}(k),$
(2.15)
where
$\tilde{C}(k)=\begin{bmatrix}\tilde{C}_X(k)&0\\0&\tilde{C}_Y(k)\end{bmatrix},\quad \tilde{C}_X(k)=\mathrm{diag}\big(G_\sigma(\tilde{e}_1(k)),\ldots,G_\sigma(\tilde{e}_n(k))\big),\quad \tilde{C}_Y(k)=\mathrm{diag}\big(G_\sigma(\tilde{e}_{n+1}(k)),\ldots,G_\sigma(\tilde{e}_{n+m}(k))\big),\quad \tilde{e}_i(k)=\tilde{d}_i(k)-\tilde{m}_i(k)\tilde{X}(k).$
Equation 2.15 can be further expressed as equation 2.16 (the detailed derivation is in appendix A). The posterior mean and the covariance matrix of $\tilde{X}(k)$ are given as
$\tilde{X}(k)=\hat{\tilde{X}}(k|k-1)+\bar{\tilde{K}}(k)\big(\tilde{Y}(k)-\tilde{H}\hat{\tilde{X}}(k|k-1)\big),$
(2.16)
$\tilde{P}(k|k)=\big(I-\bar{\tilde{K}}(k)\tilde{H}\big)\bar{\tilde{P}}(k|k-1)\big(I-\bar{\tilde{K}}(k)\tilde{H}\big)^T+\bar{\tilde{K}}(k)\bar{\tilde{V}}(k)\bar{\tilde{K}}^T(k),$
(2.17)
where
$\bar{\tilde{K}}(k)=\bar{\tilde{P}}(k|k-1)\tilde{H}^T\big(\tilde{H}\bar{\tilde{P}}(k|k-1)\tilde{H}^T+\bar{\tilde{V}}(k)\big)^{-1},\quad \bar{\tilde{P}}(k|k-1)=\tilde{S}_p(k|k-1)\tilde{C}_X^{-1}(k)\tilde{S}_p^T(k|k-1),\quad \bar{\tilde{V}}(k)=\tilde{S}_v(k)\tilde{C}_Y^{-1}(k)\tilde{S}_v^T(k).$
In particular, $\bar{\tilde{K}}(k)$ is the revised Kalman gain, which is a function of the revised prior covariance matrix $\bar{\tilde{P}}(k|k-1)$ and the revised covariance matrix of the measurement noise $\bar{\tilde{V}}(k)$. $\tilde{C}_X(k)$ is a function of $\tilde{X}(k)$, and $\bar{\tilde{P}}(k|k-1)$ is a function of $\tilde{C}_X(k)$. Similarly, $\tilde{C}_Y(k)$ is also a function of $\tilde{X}(k)$, and $\bar{\tilde{V}}(k)$ is a function of $\tilde{C}_Y(k)$. Therefore, equation 2.16 is actually a fixed-point equation with respect to $\tilde{X}(k)$. Here we solve for $\tilde{X}(k)$ in equation 2.16 using the fixed-point iterative method (Chen et al., 2017; Singh & Principe, 2010), which is faster than the gradient-based iterative method (Singh & Principe, 2009; Shi & Lin, 2014) and does not involve a free parameter (i.e., step size). The iteration stops when
$\frac{\left\|\hat{\tilde{X}}^{(t)}(k|k)-\hat{\tilde{X}}^{(t-1)}(k|k)\right\|}{\left\|\hat{\tilde{X}}^{(t-1)}(k|k)\right\|}\leq\varepsilon,$
(2.18)
with superscript $(t)$ denoting the iteration index and $\varepsilon$ being a small positive value, or when the iteration index reaches a preset maximum.
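The measurement update of equations 2.14 to 2.18 can be sketched as a single fixed-point routine. This is a simplified illustration rather than the authors' implementation: the prediction step (equations 2.12 and 2.13) is assumed already done, and a small floor on the kernel weights is added to guard against numerical overflow when a residual is extremely large.

```python
import numpy as np

def chkf_update(x_prior, P_prior, y, H, V, sigma=2.0, eps=1e-6, max_iter=50):
    """Fixed-point correntropy measurement update (equations 2.14-2.18)."""
    n = x_prior.size
    Sp = np.linalg.cholesky(P_prior)   # S_p from the prior covariance
    Sv = np.linalg.cholesky(V)         # S_v from the measurement noise
    d = np.concatenate([np.linalg.solve(Sp, x_prior), np.linalg.solve(Sv, y)])
    M = np.vstack([np.linalg.solve(Sp, np.eye(n)), np.linalg.solve(Sv, H)])

    x = x_prior.copy()
    for _ in range(max_iter):
        e = d - M @ x                                   # normalized residuals
        w = np.maximum(np.exp(-e ** 2 / (2 * sigma ** 2)), 1e-12)
        P_bar = Sp @ np.diag(1.0 / w[:n]) @ Sp.T        # revised prior cov
        V_bar = Sv @ np.diag(1.0 / w[n:]) @ Sv.T        # revised noise cov
        K = P_bar @ H.T @ np.linalg.inv(H @ P_bar @ H.T + V_bar)
        x_new = x_prior + K @ (y - H @ x_prior)         # equation 2.16
        if np.linalg.norm(x_new - x) <= eps * np.linalg.norm(x):
            x = x_new
            break
        x = x_new
    I = np.eye(n)
    P_post = (I - K @ H) @ P_bar @ (I - K @ H).T + K @ V_bar @ K.T  # eq. 2.17
    return x, P_post

# An outlier on the second channel barely moves the robust estimate
# (prior, noise levels, and the outlier magnitude are hypothetical):
x_prior = np.array([0.9, 1.9])
y = np.array([1.05, 12.0])          # second measurement hit by peaky noise
x_hat, _ = chkf_update(x_prior, np.eye(2), y, np.eye(2), 0.1 * np.eye(2))
```

The outlier residual earns a near-zero kernel weight, which inflates its entry of $\bar{\tilde{V}}(k)$ and drives the corresponding gain toward zero, so the contaminated channel is effectively ignored while the clean channel is fused as usual.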

The convergence of the estimate $\tilde{X}(k)$ is influenced by the kernel bandwidth $\sigma$. A sufficient condition on the choice of kernel bandwidth to ensure the convergence of the fixed-point iteration in CHKF is provided in appendix B.

In general, the smaller the kernel bandwidth is, the more robust the algorithm becomes, but the kernel bandwidth has a lower bound that guarantees the convergence of the fixed-point method. On the other hand, as the kernel bandwidth grows, the algorithm behaves increasingly like the corresponding hierarchical Kalman filter (HKF). In particular, if $\sigma\to\infty$, then $\tilde{C}(k)\to I$, and the proposed algorithm reduces to the HKF.
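The role of the bandwidth is easy to see from the diagonal weights $G_\sigma(\tilde{e}_i(k))$ in $\tilde{C}(k)$: a small $\sigma$ suppresses outlier residuals, while a very large $\sigma$ drives every weight to 1 and recovers the HKF. The residual values below are illustrative only.

```python
import numpy as np

def kernel_weight(e, sigma):
    """Diagonal entry of C(k): G_sigma(e) = exp(-e^2 / (2 sigma^2))."""
    return np.exp(-e ** 2 / (2.0 * sigma ** 2))

e = np.array([0.5, 10.0])               # a typical and an outlier residual
w_small = kernel_weight(e, sigma=1.0)   # outlier is strongly downweighted
w_large = kernel_weight(e, sigma=1e6)   # all weights -> 1: the HKF limit
```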

The CHKF algorithm is summarized in algorithm 1.

### 2.3  Related Algorithms

In this section, we introduce two related methods for comparison. The first is the classic Kalman filter, which has only one simple state model but takes in observations from single-site or multisite cortical recordings; it is compared with CHKF to show the superior performance of the hierarchical model. The second is the hierarchical Kalman filter (HKF), which is compared with CHKF to show the improvement brought by correntropy theory in the presence of heavy-tailed, nongaussian noise. A square root transform (SRT) is used as a preprocessing step on the firing rate observations to bring the data closer to a gaussian distribution for all the algorithms, including the classic Kalman and hierarchical models.
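As a sanity check on the SRT step, Poisson-like bin counts (a common stand-in for real binned spike data; the rates below are hypothetical) have variance equal to their rate, while the square root of the counts has approximately constant variance near 1/4. This variance stabilization is what makes the transformed data behave more like gaussian observations.

```python
import numpy as np

rng = np.random.default_rng(2)

for lam in (8.0, 32.0):                    # hypothetical mean counts per bin
    counts = rng.poisson(lam=lam, size=100_000)
    srt = np.sqrt(counts)                  # square root transform (SRT)
    # raw variance tracks the rate; SRT variance is stabilized near 1/4
    print(f"rate {lam}: var(counts)={counts.var():.2f}, var(sqrt)={srt.var():.3f}")
```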

#### 2.3.1  Kalman Filter as a Neural Decoder

In this letter, the classic Kalman filter has a shallow structure, as there is only one state model to describe the dynamics of the cortical states, and the observation can come from M1 activity alone or from pooling multiple cortical activities into a higher-dimensional observation vector. The corresponding state and observation models are written as
$x(k)=Fx(k-1)+w(k-1),$
(2.19)
$y(k)=Hx(k)+v(k),$
(2.20)
where $x(k)$ denotes the movement state at time $k$ and $y(k)$ denotes the observation on neural activity. The observation can be the binned spike counts per 100 ms from one (KF1) or two (KF2) cortical areas. The observation could also be the square root of the neural firing rate of each channel from only one (KF1_SRT) or two (KF2_SRT) cortical areas. $F$ and $H$ are the movement state transition matrix and the neural tuning matrix, which are obtained by the least squares method; $w(k)$ and $v(k)$ are the uncorrelated process noise and measurement noise, and their means and covariance matrices, $W(k)$ and $V(k)$, are obtained from the residuals of the linear approximation by $F$ and $H$. The estimation of the state follows the standard Kalman filter method,
$\hat{x}(k|k-1)=F\hat{x}(k-1|k-1),$
(2.21)
$P(k|k-1)=FP(k-1|k-1)F^T+W(k-1),$
(2.22)
$\hat{x}(k|k)=\hat{x}(k|k-1)+K(k)\big(y(k)-H\hat{x}(k|k-1)\big),$
(2.23)
$P(k|k)=\big(I-K(k)H\big)P(k|k-1)\big(I-K(k)H\big)^T+K(k)V(k)K^T(k),$
(2.24)
where $\hat{x}(k|k-1)$ and $P(k|k-1)$ are the prior mean and covariance matrix, $\hat{x}(k|k)$ and $P(k|k)$ are the posterior mean and covariance matrix, and the Kalman gain $K(k)$ is computed as
$K(k)=P(k|k-1)H^T\big(HP(k|k-1)H^T+V(k)\big)^{-1}.$
(2.25)
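The least squares fitting of $F$, $H$, $W$, $V$ from training data and the recursion in equations 2.21 to 2.25 can be sketched as follows. The simulated 1-D movement state and the random tuning model are hypothetical stand-ins for real training data.

```python
import numpy as np

def fit_kf(X, Y):
    """Least squares estimates of F, W, H, V from training states X (n x K)
    and observations Y (m x K); noise statistics come from the residuals."""
    F = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    W = np.atleast_2d(np.cov(X[:, 1:] - F @ X[:, :-1]))
    H = Y @ np.linalg.pinv(X)
    V = np.atleast_2d(np.cov(Y - H @ X))
    return F, W, H, V

def kf_decode(F, W, H, V, Y, x0, P0):
    """Standard Kalman recursion, equations 2.21-2.25."""
    n = x0.size
    x, P, out = x0.copy(), P0.copy(), []
    for y in Y.T:
        x = F @ x                                        # eq. 2.21
        P = F @ P @ F.T + W                              # eq. 2.22
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + V)     # eq. 2.25
        x = x + K @ (y - H @ x)                          # eq. 2.23
        P = (np.eye(n) - K @ H) @ P @ (np.eye(n) - K @ H).T + K @ V @ K.T
        out.append(x.copy())
    return np.array(out).T

# Hypothetical 1-D movement state observed through 4 tuned channels.
rng = np.random.default_rng(5)
K_steps = 600
x_true = np.zeros((1, K_steps))
for k in range(1, K_steps):
    x_true[0, k] = 0.95 * x_true[0, k-1] + 0.1 * rng.standard_normal()
h = rng.standard_normal((4, 1))                          # tuning vector
Y = h @ x_true + 0.2 * rng.standard_normal((4, K_steps))

F, W, H, V = fit_kf(x_true[:, :400], Y[:, :400])         # train
x_hat = kf_decode(F, W, H, V, Y[:, 400:], np.zeros(1), np.eye(1))  # test
```

On this synthetic example, the decoded trace tracks the held-out state with error well below the state's own variance, which is the behavior the MSE comparison in section 3 quantifies on real data.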

#### 2.3.2  Hierarchical Kalman Filter

Here we show the solution to the hierarchical model in equations 2.8 and 2.9 without the correntropy criterion. The main difference from the classic Kalman filter is the definition of $\tilde{X}(k)$ in equation 2.8, where the state $u(k)$ is modeled as the error-signaling mPFC activity and passed through the hierarchical structure to influence the movement state $x(k)$. The observation $\tilde{Y}(k)$ denotes the original neural firing rate of each channel from the two cortical areas, or its square root if SRT is applied as preprocessing. $\tilde{F}$ and $\tilde{H}$ are the state transition matrix and neural tuning matrix, which are obtained by the least squares method. $\tilde{w}(k)$ and $\tilde{v}(k)$ are the uncorrelated process noise and measurement noise, and their means and covariance matrices $\tilde{W}(k)$ and $\tilde{V}(k)$ are obtained from the residuals of the linear approximation by $\tilde{F}$ and $\tilde{H}$. The hierarchical Kalman filter (HKF) algorithm is summarized as follows:
$\hat{\tilde{X}}(k|k-1)=\tilde{F}\hat{\tilde{X}}(k-1|k-1)+\tilde{T}q(k-1),$
(2.26)
$\tilde{P}(k|k-1)=\tilde{F}\tilde{P}(k-1|k-1)\tilde{F}^T+\tilde{W}(k-1),$
(2.27)
$\hat{\tilde{X}}(k|k)=\hat{\tilde{X}}(k|k-1)+\tilde{K}(k)\big(\tilde{Y}(k)-\tilde{H}\hat{\tilde{X}}(k|k-1)\big),$
(2.28)
$\tilde{P}(k|k)=\big(I-\tilde{K}(k)\tilde{H}\big)\tilde{P}(k|k-1)\big(I-\tilde{K}(k)\tilde{H}\big)^T+\tilde{K}(k)\tilde{V}(k)\tilde{K}^T(k),$
(2.29)
where $\hat{\tilde{X}}(k|k-1)$ and $\tilde{P}(k|k-1)$ are the prior mean and covariance matrix, $\hat{\tilde{X}}(k|k)$ and $\tilde{P}(k|k)$ are the posterior mean and covariance matrix, and the Kalman gain $\tilde{K}(k)$ is written as
$\tilde{K}(k)=\tilde{P}(k|k-1)\tilde{H}^T\big(\tilde{H}\tilde{P}(k|k-1)\tilde{H}^T+\tilde{V}(k)\big)^{-1}.$
(2.30)

## 3  Application

In this section, we apply the proposed correntropy-based hierarchical Kalman filter (CHKF) as the neural decoding model to interpret movement intention from multiple cortical observations. The in vivo neural data were collected during the learning process of two rats performing a lever-pressing task. Multiple cortical neural observations, using activities from the primary motor cortex (M1) and the medial prefrontal cortex (mPFC), are available to estimate the corresponding brain states. The newly proposed CHKF is applied to estimate the brain states in each cortical area and to show how the brain states change over time and space. To illustrate the need to use different brain states to represent the information along the neural pathway, we make comparisons with classic Kalman filters (KF) with only one movement state model. Here, the classic KF may take observations from M1 activity only (KF1) or may pool the multiple cortical activities together as high-dimensional observations (KF2). To illustrate the benefit of the correntropy theory, we compare the performance of the proposed CHKF with that of the hierarchical Kalman filter (HKF) using the same observations from multiple cortical activities. Moreover, we also apply the square root transform (SRT) to the neural firing rate observations as a commonly used preprocessing step for each algorithm, denoted KF_SRT, HKF_SRT, and CHKF_SRT, and compare their performances.

### 3.1  Experiment and Data Collection

The BMI experimental paradigm was designed and implemented at the Hong Kong University of Science and Technology. All animal handling procedures were approved by the Animal Care Committee of the Hong Kong University of Science and Technology, strictly complying with the Guide for Care and Use of Laboratory Animals. Two male Sprague Dawley (SD) rats were trained to learn to perform a one-lever-pressing task. The task trials were initialized by an audio cue (10 kHz). The rats needed to press a lever with their right paw within 5 s and then hold the lever for 500 ms to obtain a water reward. If the subjects completed the task, a feedback tone (10 kHz) was presented for 90 ms, and a water drop was delivered right after the tone ended. Trials with no press or with an early release were regarded as failed trials; in those trials, the rats heard no feedback and received no water. The intertrial intervals were set randomly, ranging from 3 s to 6 s.

The rats were surgically implanted with two 16-channel microelectrode arrays in both M1 and mPFC in the left hemisphere. When the rats were performing the task, the neural signals were recorded from the implanted electrodes with a Plexon system (OmniPlex Neural Recording Data Acquisition System, Plexon, Dallas, Texas). Neural activities were recorded from the mPFC and M1 simultaneously. The raw signal was sampled at 40 kHz and high-pass filtered at 500 Hz with a four-pole Butterworth filter. The spikes were detected from the processed neural signal with a $-3\theta$ to $-5\theta$ threshold, where $\theta$ denotes the standard deviation of the histogram of the amplitudes. An offline sorter (Plexon) was used to sort the single units from each channel, and the spike timing information was used. Spike firing rates were counted with a nonoverlapping 100 ms time window. Meanwhile, the corresponding behavioral events were obtained using a behavioral recording system (Lafayette Instrument, USA) and synchronized through Plexon digital input.
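The detection pipeline (500 Hz four-pole Butterworth high-pass followed by thresholding at $-3\theta$ to $-5\theta$) can be reproduced on synthetic data. The embedded deflection, the noise level, and the use of the filtered trace's standard deviation for $\theta$ are all simplifying assumptions made for illustration.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 40_000                            # 40 kHz sampling rate
rng = np.random.default_rng(3)
raw = rng.standard_normal(fs)          # 1 s of synthetic broadband noise
raw[20_000:20_012] -= 10.0             # an embedded spike-like deflection

# Four-pole Butterworth high-pass at 500 Hz, zero-phase filtered.
b, a = butter(4, 500.0, btype="highpass", fs=fs)
hp = filtfilt(b, a, raw)

# Threshold at -4*theta (within the -3*theta to -5*theta range used in
# the recording pipeline), with theta taken as the filtered trace's std.
theta = hp.std()
crossings = np.where(hp < -4.0 * theta)[0]
```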

A common phenomenon in the neural recording is peaky noise related to subject movement, especially when the rat reaches, presses the lever, or reaches the hose for its water reward. Figure 1 shows the noisy neural activity with movement-related noise. The $x$-axis is the time, the left $y$-axis is the amplitude of the multi-unit activity, and the bars indicate the spike events sorted from the multi-unit activity. We can see that the peaky noise happens randomly across the channels, which leads to a dramatic increase in the neural firing. Sometimes the neural firing bursts (magenta shadows) across all channels, which can be removed by common average referencing. In some cases, large noise happens only on a few channels, and we have to manually remove the affected short time segments because offline sorting may not be able to remove all of them. Note that it is always tricky to explore good threshold values, as the best threshold can differ across channels or change over time. Especially in online processing of neural data, the decoding results degrade badly due to the random large noise. The idea is to account probabilistically for the existence of the random large noise and suppress it by a new criterion, such as correntropy.
Figure 1:

The noisy neural recording in the experiment. The $x$-axis is the time, and the sampling rate is 40,000 samples per second. The $y$-axis is the multi-unit activity, and the unit is millivolts (mV). The spike events are shown after offline sorting. Yellow bars, red bars, and blue bars denote the timing of the cue start, press, and reward, respectively, and magenta shadows denote the timing of large noise.


Here we choose one day of the neural data during the learning process, where the subject's success rate increases significantly. There are 114 successful trials and 53 failed trials. Here we segment time periods of 500 ms before the cue, 500 ms of action (press or not), and 300 ms during the reward-presenting stage for each trial and connect the segments, convolving them with a sigmoid function to smooth the state value over time. Figure 2a shows the ground truth of the movement $x(k)$ over time. The value of the movement changes from 0 to 1 for each successful trial (green dots) but stays at 0 for the failed trials (red crosses label the timing of the cues for each failed trial). Figure 2b shows the expected discrepancy signal $u(k)$ changing from 0 to 1, indicating the error information after each failed trial during the reward-presenting stage.
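A minimal sketch of this label construction, assuming a logistic-derivative smoothing kernel; the exact kernel and segment lengths used in the letter are not specified, so the values below are hypothetical:

```python
import numpy as np

def smooth_labels(step_trace, width=2.0, half=5):
    """Smooth a 0/1 step trace by convolving with a normalized
    logistic (sigmoid-derivative) kernel."""
    t = np.arange(-half, half + 1)
    s = 1.0 / (1.0 + np.exp(-t / width))   # sigmoid sampled on the window
    kern = np.gradient(s)                  # its derivative: a smooth bump
    kern /= kern.sum()
    return np.convolve(step_trace, kern, mode="same")

# One hypothetical successful trial in 100-ms bins: 5 bins pre-cue (state 0),
# 5 bins press-and-hold, then 3 bins reward stage (state 1).
x_step = np.concatenate([np.zeros(5), np.ones(8)])
x_smooth = smooth_labels(x_step)
```

Because the kernel is nonnegative and normalized, the smoothed trace stays in [0, 1] and replaces the hard 0-to-1 step with a gradual sigmoid-shaped transition.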
Figure 2:

The ground truth labeling of brain states. Green dots and red crosses denote the timing of the cues for successful trials and failed trials, respectively. (a) Movement state over time (1 for press and holding, 0 for no press). (b) Expected discrepancy state over time (1 for discrepancy signaling after each failed trial during the reward-presenting stage).


We split the data into training (60%) and testing (40%) sets. We obtain the state transition matrix and tuning matrix from the training data and then run the proposed method on the testing set to model the information passing along the neural pathway and to estimate the movement state $x(k)$ from both M1 and mPFC activity. In particular, we extract the error information of the failed trials, $u(k)$, from mPFC activity and expect it to contribute to a stronger movement intention in the upcoming trial through the hierarchical model.

### 3.2  Neural Decoding Using the Hierarchical Dynamic Model

We apply the proposed CHKF to derive all the movement states from the multiple cortical observations and compare the performance with HKF and classic KF, and the effects of the square root transform on spike rates as preprocessing for all algorithms are also discussed. We evaluate the performance by the mean square error (MSE) between the estimated state and the ground truth.

We build the state model on $u(k)$, extracted from mPFC, following equation 2.6 without considering the external cue information. The movement state model is built following equation 2.7. The main difference is that we consider a window of past data points of $u(k)$ with a delay, $u(k)=[u(k-L-n+1), \ldots, u(k-L)]^T$, which captures the possible effect from the previous trial. The parameters $L$ and $n$ in the state vector $u(k)$ are chosen such that the estimation performance is optimal. The collected neural data in mPFC and M1 are taken as the observations in equation 2.9. Note that $y(k)$ contains the observations from mPFC and M1, the same as KF2. The desired state in the training set is used to obtain the transition matrix $\tilde{F}$ and the observation matrix $\tilde{H}$, and the residuals of the linear approximation are used to build the noise terms in the state model and observation model. Note that a bias term is considered in building both models.

Here we examine the noise in the observation model of our recordings. Figure 3 shows the noise distribution over 1900 data points. The $x$-axis is the noise error, and the $y$-axis is the probability density. The blue solid curve is computed from the real data after offline removal of the segments with movement-related peaky noise, and the red dash-dotted curve is the normal distribution for comparison. According to the definition of a heavy-tailed distribution in DeCarlo (1997), the kurtosis of the blue solid curve is 16.22, which is larger than the kurtosis of the normal distribution (Kur $=3$), indicating that the distribution is heavy-tailed. We also plot the distribution without offline removal of the movement-related noise (the black dotted curve). Its kurtosis is 23.13, making the distribution even more heavy-tailed.
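The kurtosis test used above is straightforward to reproduce. The synthetic mixture below only illustrates how sporadic large noise inflates kurtosis; it does not reproduce the actual recordings or their reported values.

```python
import numpy as np

def kurt(a):
    """Sample kurtosis; the normal distribution gives 3 (DeCarlo, 1997)."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    return np.mean(a ** 4) / np.mean(a ** 2) ** 2

rng = np.random.default_rng(4)
gauss = rng.standard_normal(1900)              # same length as the noise sample
heavy = gauss.copy()
idx = rng.choice(1900, size=40, replace=False)
heavy[idx] += 10.0 * rng.standard_normal(40)   # sporadic movement-like noise
```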
Figure 3:

The probability density of the noise. The $x$-axis is the noise error, and the $y$-axis is the probability density. The real data after offline removal of the movement-related noise (blue solid curve), the real data with movement-related peaky noise (black dotted curve), and the normal distribution (red dash-dotted curve) are shown for comparison.


Decoding from the noisy neural activity using the traditional Kalman filter degrades noticeably. A segment of the movement reconstruction is shown in Figure 4; the $x$-axis is the time instance, and the $y$-axis is the estimation value. The cyan solid curve is the classic KF with observations from both the primary motor cortex and medial prefrontal cortex (KF2), the blue solid curve is the hierarchical Kalman filter (HKF), and the magenta solid curve is the correntropy-based HKF (CHKF). The red dots denote the timing of the cue for a failed action, and the black asterisks denote the timing of the movement-related noise. From the 193rd to the 200th time step and from the 270th to the 275th time step, where the subject failed in the previous trial, our CHKF, as well as HKF, is superior to KF2. This is because our hierarchical model considers the brain states passing along mPFC and M1. Figure 5 shows the estimation values of $u(k)$ by HKF (blue solid curve) and CHKF (magenta solid curve) and the ground-truth label (red dashed curve). The state estimated from mPFC activity always increases at the rewarding stage of each failed trial, indicating that mPFC activity does carry the discrepancy signal. Our hierarchical model integrates the past failure experience into the new trial and contributes to better movement decoding. In addition, from the 232nd to the 237th time step and from the 252nd to the 257th time step in Figure 4, where the movement-related noise occurs (the black asterisks), our CHKF is superior to HKF and KF2 in movement reconstruction. A similar phenomenon appears from the 247th to the 252nd time step in Figure 5. This indicates that the correntropy criterion eliminates the effects of the heavy-tailed noise and improves the state estimation.
Figure 4:

The estimation of the movement $x(k)$. The $x$-axis is the time, and the $y$-axis is the estimation values of $x(k)$. Red dots denote failed trials, and black asterisks denote the peaky noise. The classic KF with observations from the motor cortex and medial prefrontal cortex (KF2) (cyan solid curve), hierarchical Kalman filter (HKF) (blue solid curve), and correntropy-based hierarchical Kalman filter (CHKF) (magenta solid curve) are for comparison.


Figure 5:

The estimation values of the discrepancy signal $u(k)$. The $x$-axis is the time, and the $y$-axis is the estimation values of $u(k)$. Red dots denote failed trials, and black asterisks denote the peaky noise. The hierarchical Kalman filter (HKF) (blue solid curve) and correntropy-based hierarchical Kalman filter (CHKF) (magenta solid curve) are for comparison.


From section 2.2, we know that the kernel bandwidth has a great effect on the performance of CHKF. Figure 6 shows the relationship between decoding performance and kernel bandwidth. If the kernel bandwidth is too small, the algorithm performs relatively poorly and may even diverge. If the kernel bandwidth is too large, the performance of CHKF (magenta solid curve) approaches that of HKF (blue solid curve). With a proper kernel bandwidth, CHKF is superior to HKF; in particular, CHKF shows the best performance at kernel bandwidth $σ=3$. Thus, unless otherwise specified, the kernel bandwidth in CHKF is set to 3.
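The bandwidth effect has a simple mechanical reading: correntropy implicitly weights each residual by the gaussian kernel $G_σ(e)$, so $σ$ controls how strongly outliers are suppressed. A minimal sketch (the residual values are illustrative, not from the recordings):

```python
import numpy as np

def corr_weight(e, sigma):
    """Gaussian kernel G_sigma(e): the implicit weight correntropy
    assigns to a residual e. A small sigma suppresses large residuals
    hard (and can destabilize the iteration); a very large sigma
    weights all residuals nearly equally, recovering least squares."""
    return np.exp(-np.asarray(e) ** 2 / (2 * sigma ** 2))

residuals = np.array([0.1, 0.5, 8.0])   # the last one mimics peaky noise
for sigma in (0.5, 3.0, 50.0):
    print(sigma, corr_weight(residuals, sigma).round(4))
```

With $σ=3$ the outlier at 8.0 is strongly downweighted while the small residuals keep near-unit weight, which matches the sweet spot seen in Figure 6.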
Figure 6:

The relationship between the performance and kernel bandwidth. The $x$-axis is the kernel bandwidth, and the $y$-axis is the mean square error (MSE) between the estimation values and the ground truth. The hierarchical Kalman filter (HKF) (blue solid curve) and correntropy-based hierarchical Kalman filter (CHKF) (magenta solid curve) are for comparison.


The square root transform (SRT) of the observations is a common preprocessing method to make the binned spike counts more gaussian distributed. We apply this preprocessing to all the above algorithms, denoting the algorithms with SRT preprocessing as KF_SRT, HKF_SRT, and CHKF_SRT. We then statistically test the decoding performance of all the algorithms, with and without SRT preprocessing, over multiple segments of testing data on two rats. Rat A and rat B have 15 and 11 segments of data, respectively, and each segment contains 80 data points (approximately eight trials). The performance of the different algorithms is shown in Figures 7 and 8 for rat A and rat B, respectively. The $x$-axis is the name of the algorithm, and the $y$-axis is the distribution of decoding performance across multiple segments measured by MSE. The performance of the models applied to the original neural firing rates is shown in Figures 7a and 8a; the SRT preprocessing on the spike rates is used for each model in Figures 7b and 8b. With the same preprocessing, all the hierarchical models perform better than the nonhierarchical ones. In particular, HKF performs 31.48% and 5.79% better on average than KF1 and KF2, respectively, in rat A, and 9.22% and 6.74% better on average than KF1 and KF2, respectively, in rat B. With SRT preprocessing, HKF_SRT performs 36.26% and 11.07% better on average than KF1_SRT and KF2_SRT, respectively, in rat A, and 10.25% and 9.99% better on average than KF1_SRT and KF2_SRT, respectively, in rat B. In addition, with the same preprocessing, the hierarchical model with the correntropy criterion is as good as or better than the hierarchical model alone, depending on whether the real data contain heavy-tailed noise. In particular, CHKF performs 13.27% better on average than HKF in rat A, CHKF_SRT performs 0.30% better on average than HKF_SRT in rat A, CHKF performs 19.33% better on average than HKF in rat B, and CHKF_SRT performs similarly to HKF_SRT in rat B. Overall, we perform a pairwise Student's t-test on decoding performance across multiple test segments against the alternative specified by the right-tailed test for each pair of comparisons. All tests use Bonferroni correction at an $α=0.01$ significance level. Under the null hypothesis, the probability of observing an equal or higher value of the test statistic is given by the p-value. We find that CHKF achieves a better decoding result than the classic Kalman filter using one (KF1) or two cortical observations (KF2), with or without SRT preprocessing. We label statistical significance with two asterisks on the pairs of comparison in Figures 7 and 8.
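The statistical procedure above can be sketched as follows (toy per-segment MSEs, not the actual values; assumes SciPy ≥ 1.6 for the `alternative` argument of `ttest_rel`):

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(1)
n_seg = 15                                    # e.g., segments of rat A
mse_chkf = rng.normal(1.0, 0.05, n_seg)       # toy per-segment MSEs
mse_kf2 = mse_chkf + rng.normal(0.3, 0.05, n_seg)  # clearly worse decoder

# Right-tailed paired test: is KF2's MSE greater than CHKF's on average?
t, p = ttest_rel(mse_kf2, mse_chkf, alternative='greater')

# Bonferroni: divide alpha by the number of pairwise comparisons made
n_comparisons = 6                             # hypothetical count
print(p < 0.01 / n_comparisons)
```

Pairing by segment matters here: the same test segments are decoded by every algorithm, so the paired test removes segment-to-segment variability.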
Figure 7:

The decoding performances on rat A using different algorithms without SRT preprocessing (a) and with SRT preprocessing (b). The $x$-axis is the name of the algorithm, and the $y$-axis is the mean square error (MSE) between the movement estimation of the corresponding algorithm and the ground truth.


Figure 8:

The decoding performances on rat B using different algorithms without SRT preprocessing (a) and with SRT preprocessing (b). The $x$-axis is the name of the algorithm, and the $y$-axis is the mean square error (MSE) between the movement estimation of the corresponding algorithm and the ground truth.


Note that in the comparison between the same CHKF model with and without SRT, CHKF_SRT (the magenta box in Figure 7b) performs better than CHKF (the magenta box in Figure 7a) in rat A, while CHKF (the magenta box in Figure 8a) performs better than CHKF_SRT (the magenta box in Figure 8b) in rat B. This indicates that SRT should be regarded as an optional preprocessing step. For data mostly distributed close to gaussian but contaminated with some peaky noise, the SRT acts deterministically on every data point, which could reduce the noise effect but also make the clean data much less gaussian at the same time. In comparison, the correntropy criterion mainly considers the probability that peaky noise exists and suppresses it when necessary, and thus is a more reliable criterion.

Note the performance difference between the two rats, which arises mainly because the noisy neural recordings can have different degrees of heavy-tailed distribution across subjects. We further manipulate the occurrence rate of the peaky noise (the percentage of time instances with movement-related noise out of the total recording) to study its effect on decoding. The occurrence rate of the noise is controlled by manually removing the noise-contaminated segments or by randomly adding peaky noise following the same amplitude range as in our actual recordings. Figures 9 and 10 show the relationship between the movement decoding performance and the occurrence rate of the movement-related noise on rats A and B, respectively. The $x$-axis is the occurrence rate of the movement-related noise, and the $y$-axis is the mean square error (MSE) between the movement decoding results and the ground-truth data. The classic KF with observations from both M1 and mPFC, without or with the square root transform on the firing rate observations (KF2 and KF2_SRT), is shown as the cyan solid curve and the cyan dotted curve, respectively. The blue solid curve is the hierarchical Kalman filter (HKF), the magenta solid curve is the correntropy-based HKF (CHKF), the blue dotted curve with square markers is the HKF with the square root transform on the firing rate observations (HKF_SRT), and the magenta dotted curve is the CHKF with the square root transform (CHKF_SRT). We see the same trends: with the same preprocessing, all the hierarchical models perform better than the nonhierarchical ones, and the hierarchical models with the correntropy criterion can improve the performance further. The proposed hierarchical model remains better than the classic Kalman filter even when the noise occurrence rate is very low (corresponding to the neural data after offline sorting). The superiority of the hierarchical structure and the correntropy criterion becomes clearer as the occurrence rate of the noise increases. In Figure 10, HKF and CHKF without SRT give better results when the noise occurrence is low, further indicating that SRT is an optional preprocessing step that depends on the neural data distribution and does not necessarily lead to better performance.
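The noise-injection side of this manipulation can be sketched as below (a hypothetical helper; the amplitude range in the experiments follows the actual recordings, not these placeholder numbers):

```python
import numpy as np

def add_peaky_noise(y, rate, amp_range=(5.0, 10.0), seed=0):
    """Corrupt a fraction `rate` of time bins with large random-sign
    spikes, mimicking movement-related peaky noise. The amplitude
    range here is a placeholder, not the measured range."""
    rng = np.random.default_rng(seed)
    y = y.copy()
    hits = rng.random(len(y)) < rate                 # noisy time instances
    signs = rng.choice([-1.0, 1.0], size=hits.sum())
    amps = rng.uniform(*amp_range, size=hits.sum())
    y[hits] += signs * amps
    return y, hits.mean()                            # realized occurrence rate

clean = np.zeros(1000)
noisy, realized = add_peaky_noise(clean, rate=0.05)
print(round(realized, 3))
```

Sweeping `rate` over a grid and rerunning each decoder yields curves like those in Figures 9 and 10.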
Figure 9:

The relationship between the estimation performance and occurrence rate of movement-related noise on rat A. The $x$-axis is the occurrence rate of movement-related noise, and the $y$-axis is the mean square error (MSE) between the movement estimation and the ground truth. The classic KF with observations from the motor cortex and medial prefrontal cortex (KF2) (cyan solid curve), KF2 using square root transform on observations (KF2_SRT) (cyan dotted curve), hierarchical Kalman filter (HKF) (blue solid curve), correntropy-based hierarchical Kalman filter (CHKF) (magenta solid curve), HKF using square root transform on observations (HKF_SRT) (blue dotted curve with squares), and CHKF using square root transform on observations (CHKF_SRT) (magenta dotted curve) are for comparison.


Figure 10:

The relationship between the estimation performance and occurrence rate of movement-related noise on rat B. The $x$-axis is the occurrence rate of movement-related noise, and the $y$-axis is the mean square error (MSE) between the movement estimation and the ground truth. The classic KF with observations from the motor cortex and medial prefrontal cortex (KF2) (cyan solid curve), KF2 using square root transform on observations (KF2_SRT) (cyan dotted curve), hierarchical Kalman filter (HKF) (blue solid curve), correntropy-based hierarchical Kalman filter (CHKF) (magenta solid curve), HKF using square root transform on observations (HKF_SRT) (blue dotted curve with squares), and CHKF using square root transform on observations (CHKF_SRT) (magenta dotted curve) are for comparison.


## 4  Conclusion

BMI technology records and decodes neural signals to predict movement behavior and assist the lives of those with disabilities. Previous BMIs generally use recordings from motor-related areas. In fact, motion control in the central nervous system has a hierarchical structure, and along the hierarchical neural pathway, many cortical areas are heavily involved in learning a motor task. As a commonly used decoding algorithm, the Kalman filter models the movement intention as a neural state and derives the state optimally from the observations under linearity and gaussian-noise assumptions. However, a single simple state model cannot represent the brain states at multiple cortical areas that evolve over time and space.

In this letter, we propose a hierarchical model to represent the brain states from multiple cortical areas that evolve along the neural pathway. We then introduce correntropy theory into the hierarchical model to address the heavy-tailed nongaussian noise possibly existing in the noisy neural recordings. We test the proposed algorithm, the correntropy-based hierarchical Kalman filter (CHKF), on in vivo neural recordings of two rats learning a lever-pressing task. We compare the proposed method with the classic Kalman filter (KF) with only one movement state model, whose observations come either from the motor cortical activities alone or from the multiple cortical recordings. The MSE of the CHKF decreases by 33.67% and 21.53% on average across the multisegment data of the two rats compared with the KF using motor cortical recordings only (KF1) and the KF using multiple cortical recordings (KF2), respectively. With the square root transform as preprocessing on the neural firing rates, the MSE of the correntropy-based hierarchical Kalman filter (CHKF_SRT) decreases by 23.35% and 10.66% on average compared with KF1_SRT and KF2_SRT, respectively. These results demonstrate that involving multiple brain states in the hierarchical structure allows the information to evolve over time as well as along the neural pathway, which contributes to statistically better estimation of movement intention. We also compare the proposed CHKF with the hierarchical Kalman filter (HKF) using the same observations from multiple cortical activities. The results show that the correntropy criterion introduced into the proposed hierarchical structure makes CHKF superior to HKF in decoding movement when the noisy neural recordings have a strongly heavy-tailed distribution. The correntropy criterion especially helps on data without offline sorting and benefits online processing scenarios by eliminating the effect of large noise randomly appearing across channels and over time.

Moreover, the square root transform, a common preprocessing method, is applied to the neural observations of all algorithms for comparison. The hierarchical model with correntropy (CHKF_SRT) remains the best among all algorithms under the same preprocessing. However, the results on the two rats by CHKF with or without SRT show that SRT preprocessing does not necessarily lead to better performance. For neural data mostly distributed close to gaussian but contaminated with some peaky noise, the SRT could reduce the noise effect but also make the clean data much less gaussian at the same time. In comparison, the correntropy criterion mainly considers the probability that peaky noise exists and eliminates it when necessary, and thus is a more reliable criterion.

Overall, the proposed hierarchical dynamic model provides a more accurate estimation of movement intention from noisy multiple cortical recordings, which potentially improves the performance of BMI technology.

## Appendix A: Derivation of the Posterior Estimation from Equation 2.15 to Equation 2.16

We theoretically derive equation 2.15 to obtain equation 2.16. For simplicity, we denote $\tilde{S}_p(k|k-1)$ by $\tilde{S}_p$, $\tilde{S}_v(k)$ by $\tilde{S}_v$, $\tilde{C}_X(k)$ by $\tilde{C}_X$, and $\tilde{C}_Y(k)$ by $\tilde{C}_Y$ here. According to equations 2.14 and 2.15, we have the following equations:
$\tilde{M}(k)=\tilde{S}^{-1}(k)\begin{bmatrix}I\\ \tilde{H}\end{bmatrix}=\begin{bmatrix}\tilde{S}_p^{-1}&0\\ 0&\tilde{S}_v^{-1}\end{bmatrix}\begin{bmatrix}I\\ \tilde{H}\end{bmatrix}=\begin{bmatrix}\tilde{S}_p^{-1}\\ \tilde{S}_v^{-1}\tilde{H}\end{bmatrix},$
(A.1)
$\tilde{C}(k)=\begin{bmatrix}\tilde{C}_X&0\\ 0&\tilde{C}_Y\end{bmatrix},$
(A.2)
$\tilde{D}(k)=\tilde{S}^{-1}(k)\begin{bmatrix}\hat{\tilde{X}}(k|k-1)\\ \tilde{Y}(k)\end{bmatrix}=\begin{bmatrix}\tilde{S}_p^{-1}\hat{\tilde{X}}(k|k-1)\\ \tilde{S}_v^{-1}\tilde{Y}(k)\end{bmatrix}.$
(A.3)
By equations A.1 and A.2, we have
$\left[\tilde{M}^T(k)\tilde{C}(k)\tilde{M}(k)\right]^{-1}=\left[\left(\tilde{S}_p^{-1}\right)^T\tilde{C}_X\tilde{S}_p^{-1}+\tilde{H}^T\left(\tilde{S}_v^{-1}\right)^T\tilde{C}_Y\tilde{S}_v^{-1}\tilde{H}\right]^{-1}.$
(A.4)
Using the matrix inversion lemma (Riedel, 1992) with the following identification,
$\left(\tilde{S}_p^{-1}\right)^T\tilde{C}_X\tilde{S}_p^{-1}\to A,\quad \tilde{H}^T\to X_1,\quad \tilde{H}\to X_2^T,\quad \left(\tilde{S}_v^{-1}\right)^T\tilde{C}_Y\tilde{S}_v^{-1}\to G,$
equation A.4 can be written as
$\left[\tilde{M}^T(k)\tilde{C}(k)\tilde{M}(k)\right]^{-1}=\tilde{S}_p\tilde{C}_X^{-1}\tilde{S}_p^T-\tilde{S}_p\tilde{C}_X^{-1}\tilde{S}_p^T\tilde{H}^T\left(\tilde{S}_v\tilde{C}_Y^{-1}\tilde{S}_v^T+\tilde{H}\tilde{S}_p\tilde{C}_X^{-1}\tilde{S}_p^T\tilde{H}^T\right)^{-1}\tilde{H}\tilde{S}_p\tilde{C}_X^{-1}\tilde{S}_p^T.$
(A.5)
Moreover, by equations A.1 to A.3, we have
$\tilde{M}^T(k)\tilde{C}(k)\tilde{D}(k)=\left(\tilde{S}_p^{-1}\right)^T\tilde{C}_X\tilde{S}_p^{-1}\hat{\tilde{X}}(k|k-1)+\tilde{H}^T\left(\tilde{S}_v^{-1}\right)^T\tilde{C}_Y\tilde{S}_v^{-1}\tilde{Y}(k).$
(A.6)
Substituting equations A.5 and A.6 into equation 2.15, we obtain equation 2.16.
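The Woodbury step from equation A.4 to equation A.5 can be verified numerically; the sketch below checks the identity on random, well-conditioned matrices (names mirror the appendix symbols; all values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 6                                        # state / observation dims
Sp = np.tril(rng.random((n, n))) + n * np.eye(n)   # invertible stand-in for S_p
Sv = np.tril(rng.random((m, m))) + m * np.eye(m)   # invertible stand-in for S_v
Cx = np.diag(rng.uniform(0.5, 1.5, n))             # positive weights C_X
Cy = np.diag(rng.uniform(0.5, 1.5, m))             # positive weights C_Y
H = rng.random((m, n))                             # observation matrix

inv = np.linalg.inv
A = inv(Sp).T @ Cx @ inv(Sp)                       # (S_p^{-1})^T C_X S_p^{-1}
G = inv(Sv).T @ Cy @ inv(Sv)                       # (S_v^{-1})^T C_Y S_v^{-1}

lhs = inv(A + H.T @ G @ H)                         # equation A.4
P = Sp @ inv(Cx) @ Sp.T                            # A^{-1} in closed form
rhs = P - P @ H.T @ inv(Sv @ inv(Cy) @ Sv.T + H @ P @ H.T) @ H @ P  # eq. A.5
print(np.allclose(lhs, rhs))
```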

## Appendix B: A Sufficient Condition for Convergence of the Fixed-Point Iteration in CHKF

In this section, we give a sufficient condition that ensures the convergence of the fixed-point iteration in CHKF. In fact, equation 2.15 is equivalent to equation 2.16 according to the derivation in appendix A. Since equation 2.16 is a fixed-point equation, equation 2.15 is also a fixed-point equation and can be expressed as
$\tilde{X}(k)=f\left(\tilde{X}(k)\right).$
(B.1)
According to the Banach fixed-point theorem (Agarwal, Meehan, & Regan, 2001), if there exist $\beta>0$ and $0<\alpha<1$ such that the initial state estimate satisfies $\|\hat{\tilde{X}}^{(0)}(k|k)\|_1\le\beta$, where $\|\cdot\|_1$ denotes the 1-norm of a vector or the induced norm of a matrix, any state estimate in the iterative process, $\tilde{X}(k)\in\{\tilde{X}(k)\in\mathbb{R}^n:\|\tilde{X}(k)\|_1\le\beta\}$, needs to satisfy the following two conditions to guarantee the convergence of the fixed-point algorithm in equation 2.15 in CHKF:
$\left\|f\left(\tilde{X}(k)\right)\right\|_1\le\beta,$
(B.2)
$\left\|\nabla_{\tilde{X}(k)}f\left(\tilde{X}(k)\right)\right\|_1\le\alpha,$
(B.3)
where $\nabla_{\tilde{X}(k)}f(\tilde{X}(k))$ denotes the $n\times n$ Jacobian matrix of $f(\tilde{X}(k))$ with respect to $\tilde{X}(k)$, that is,
$\nabla_{\tilde{X}(k)}f\left(\tilde{X}(k)\right)=\left[\frac{\partial}{\partial\tilde{X}_1(k)}f\left(\tilde{X}(k)\right)\;\cdots\;\frac{\partial}{\partial\tilde{X}_n(k)}f\left(\tilde{X}(k)\right)\right],$
(B.4)
with
$\frac{\partial}{\partial\tilde{X}_j(k)}f\left(\tilde{X}(k)\right)=-N_{mm}^{-1}\left[\frac{1}{\sigma^2}\sum_{i=1}^{n+m}\tilde{e}_i(k)\tilde{m}_{ij}(k)G_\sigma\left(\tilde{e}_i(k)\right)\tilde{m}_i^T(k)\tilde{m}_i(k)\right]f\left(\tilde{X}(k)\right)+N_{mm}^{-1}\left[\frac{1}{\sigma^2}\sum_{i=1}^{n+m}\tilde{e}_i(k)\tilde{m}_{ij}(k)G_\sigma\left(\tilde{e}_i(k)\right)\tilde{m}_i^T(k)\tilde{d}_i(k)\right],$
where $N_{mm}=\sum_{i=1}^{n+m}G_\sigma\left(\tilde{e}_i(k)\right)\tilde{m}_i^T(k)\tilde{m}_i(k)$ and $\tilde{m}_{ij}(k)$ is the $j$th element of $\tilde{m}_i(k)$.

In the following, we adopt the theorem in Chen, Wang, Zhao, Zheng, and Principe (2015) to find the condition that satisfies equations B.2 and B.3.

Theorem 1.
If $\beta>\zeta=\frac{\sqrt{n}\sum_{i=1}^{n+m}\|\tilde{m}_i^T(k)\|_1|\tilde{d}_i(k)|}{\lambda_{\min}\left[\sum_{i=1}^{n+m}\tilde{m}_i^T(k)\tilde{m}_i(k)\right]}$, where $\lambda_{\min}[\cdot]$ denotes the minimum eigenvalue of a matrix, and the kernel bandwidth $\sigma\ge\max\{\sigma_1,\sigma_2\}$, in which $\sigma_1$ is the solution of the equation $\phi(\sigma)=\beta$, with
$\phi(\sigma)=\frac{\sqrt{n}\sum_{i=1}^{n+m}\|\tilde{m}_i^T(k)\|_1|\tilde{d}_i(k)|}{\lambda_{\min}\left[\sum_{i=1}^{n+m}G_\sigma\left(\beta\|\tilde{m}_i(k)\|_1+|\tilde{d}_i(k)|\right)\tilde{m}_i^T(k)\tilde{m}_i(k)\right]},\quad\sigma\in(0,\infty),$
(B.5)
and $\sigma_2$ is the solution of the equation $\psi(\sigma)=\alpha$ $(0<\alpha<1)$, with
$\psi(\sigma)=\frac{\sqrt{n}\sum_{i=1}^{n+m}\left(\beta\|\tilde{m}_i(k)\|_1+|\tilde{d}_i(k)|\right)\|\tilde{m}_i(k)\|_1\left(\beta\|\tilde{m}_i^T(k)\tilde{m}_i(k)\|_1+\|\tilde{m}_i^T(k)\tilde{d}_i(k)\|_1\right)}{\sigma^2\lambda_{\min}\left[\sum_{i=1}^{n+m}G_\sigma\left(\beta\|\tilde{m}_i(k)\|_1+|\tilde{d}_i(k)|\right)\tilde{m}_i^T(k)\tilde{m}_i(k)\right]},\quad\sigma\in(0,\infty),$
(B.6)
then $\left\|f\left(\tilde{X}(k)\right)\right\|_1\le\beta$ and $\left\|\nabla_{\tilde{X}(k)}f\left(\tilde{X}(k)\right)\right\|_1\le\alpha$ hold for all estimates in the iterative process $\tilde{X}(k)\in\{\tilde{X}(k)\in\mathbb{R}^n:\|\tilde{X}(k)\|_1\le\beta\}$.

Consequently, if the kernel bandwidth $\sigma$ is larger than a certain value (e.g., $\max\{\sigma_1,\sigma_2\}$), the fixed-point iteration in CHKF converges.
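The convergence guarantee is the standard Banach contraction argument; a generic sketch of the iteration follows (with a toy contraction mapping, not the CHKF update itself):

```python
import numpy as np

def fixed_point(f, x0, tol=1e-10, max_iter=100):
    """Generic fixed-point iteration x <- f(x). It converges when f is
    a contraction on a ball containing x0 -- the Banach condition that
    the kernel-bandwidth bound in appendix B guarantees for CHKF."""
    x = x0
    for _ in range(max_iter):
        x_new = f(x)
        if np.linalg.norm(x_new - x, 1) < tol:   # 1-norm, as in appendix B
            return x_new
        x = x_new
    raise RuntimeError("no convergence; try a larger kernel bandwidth")

# Toy affine contraction with Lipschitz constant 0.5 < 1 in the 1-norm;
# its unique fixed point solves x = 0.5 x + b, that is, x = 2 b.
b = np.array([1.0, -2.0])
x_star = fixed_point(lambda x: 0.5 * x + b, np.zeros(2))
print(x_star)   # approximately [2., -4.]
```

The same loop structure applies to equation 2.15: each pass recomputes the correntropy-weighted update until successive estimates agree.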

## Acknowledgments

This work was supported by Shenzhen-Hong Kong Innovation Circle (Category D) (SGDX2019081623021543), the Hong Kong Innovation and Technology Fund (ITS/092/17), the National Natural Science Foundation of China (61836003), Chau Hoi Shuen Foundation's Donation (R9051), and the sponsorship for targeted strategic partnership (FP902).

## References

Agarwal
,
R. P.
,
Meehan
,
M.
, &
Regan
,
D. O.
(
2001
).
Fixed point theory and applications
.
Cambridge
:
Cambridge University Press
.
Ajiboye
,
A. B.
,
Willett
,
F. R.
,
Young
,
D. R.
,
Memberg
,
W. D.
,
Murphy
,
B. A.
,
Miller
,
J. P.
, …
Kirsch
,
R. F.
(
2017
).
Restoration of reaching and grasping movements through brain-controlled muscle stimulation in a person with tetraplegia: A proof-of-concept demonstration
.
Lancet
,
389
(
10081
),
1821
1830
.
Alexander
,
W. H.
, &
Brown
,
J. W.
(
2011
).
Medial prefrontal cortex as an action-outcome predictor
.
Nature Neuroscience
,
14
(
10
),
1338
1344
.
Alexander
,
W. H.
, &
Brown
,
J. W.
(
2019
).
The role of the anterior cingulate cortex in prediction error and signaling surprise
.
Topics in Cognitive Science
,
11
(
1
),
119
135
.
Bouton
,
C. E.
,
Shaikhouni
,
A.
,
Annetta
,
N. V.
,
,
M. A.
,
Friedenberg
,
D. A.
,
Nielson
,
D. M.
, …
Rezai
,
A. R.
(
2016
).
Restoring cortical control of functional movement in a human with quadriplegia
.
Nature
,
533
,
247
250
.
Brand
,
M.
,
Recknor
,
E. C.
,
Grabenhorst
,
F.
, &
Bechara
,
A.
(
2007
).
Decisions under ambiguity and decisions under risk: Correlations with executive functions and comparisons of two different gambling tasks with implicit and explicit rules
.
Journal of Clinical and Experimental Neuropsychology
,
29
(
1
),
86
99
.
Brandman
,
D. M.
,
Hosman
,
T.
,
Saab
,
J.
,
Burkhart
,
M. C.
,
Shanahan
,
B. E.
,
Ciancibello
,
J. G.
, …
Hochberg
,
L. R.
(
2018
).
Rapid calibration of an intracortical brain-computer interface for people with tetraplegia
.
Journal of Neural Engineering
,
15
(
2
), 026007.
Chapin
,
J. K.
,
Moxon
,
K. A.
,
Markowitz
,
R. S.
, &
Nicolelis
,
M. A. L.
(
1999
).
Real-time control of a robot arm using simultaneously recorded neurons in the motor cortex
.
Nature Neurosience
,
2
,
664
670
.
Chen
,
B.
,
Liu
,
X.
,
Zhao
,
H.
, &
Principe
,
J. C.
(
2017
).
Maximum correntropy Kalman filter
.
Automatica
,
76
,
70
77
.
Chen
,
B.
,
Wang
,
J.
,
Zhao
,
H.
,
Zheng
,
N.
, &
Principe
,
J. C.
(
2015
).
Convergence of a fixed-point algorithm under maximum correntropy criterion
.
IEEE Signal Processing Letters
,
22
(
10
),
1723
1727
.
Chhatbar
,
P. Y.
, &
Francis
,
J. T.
(
2013
).
Towards a naturalistic brain-machine interface: Hybrid torque and position control allows generalization to novel dynamics
.
PLOS One
,
8
(
1
), 52286.
Churchland
,
M. M.
,
Cunningham
,
J. P.
,
Kaufman
,
M. T.
,
Foster
,
J. D.
,
Nuyujukian
,
P.
,
Ryu
,
S. I.
, &
Shenoy
,
K. V.
(
2012
).
Neural population dynamics during reaching
.
Nature
,
487
,
51
56
.
Churchland
,
M. M.
,
Yu
,
B. M.
,
Ryu
,
S. I.
,
Santhanam.
,
G.
, &
Shenoy
,
K. V.
(
2006
).
Neural variability in premotor cortex provides a signature of motor preparation
.
Journal of Neuroscience
,
26
(
14
),
3697
3712
.
Collinger
,
J. L.
,
Wodlinger
,
B.
,
Downey
,
J. E.
,
Wang
,
W.
,
Tyler-Kabara
,
E. C.
,
Weber
,
D.
, …
Schwartz
,
A. B.
, (
2013
).
High-performance neuroprosthetic control by an individual with tetraplegia
.
Lancet
,
381
(
9866
),
557
564
.
Decarlo
,
L. T.
(
1997
).
On the meaning and use of kurtosis
.
Psychological Methods
,
2
(
3
),
292
307
.
Du
,
B.
,
Tang
,
X.
,
Wang
,
Z.
,
Zhang
,
L.
, &
Tao
,
D.
(
2019
).
Robust graph-based semisupervised learning for noisy labeled data via maximum correntropy criterion
.
IEEE Transactions on Cybernetics
,
49
(
4
),
1440
1453
.
Gazzaniga
,
M. S.
,
Ivry
,
R. B.
, &
Mangun
,
G. R.
(
2008
).
Cognitive neuroscience: The biology of the mind
(3rd ed.).
New York
:
Norton
.
Gilja
,
V.
,
Nuyujukian
,
P.
,
Chestek
,
C. A.
,
Cunningham
,
J. P.
,
Yu
,
B. M.
,
Fan
,
J. P.
, …
Shenoy
,
K. V.
(
2012
).
A high-performance neural prosthesis enabled by control algorithm design
.
Nature Neuroscience
,
15
(
12
),
1752
1757
.
Gilja
,
V.
,
Pandarinath
,
C.
,
Blabe
,
C. H.
,
Nuyujukian
,
P.
,
Simeral
,
J. D.
,
Sarma
,
A. A.
, …
Henderson
,
J. M.
(
2015
).
Clinical translation of a high-performance neural prosthesis
.
Nature Medicine
,
21
,
1142
1145
.
He
,
R.
,
Hu
,
B.-G.
Zheng
,
W.-S.
, &
Kong
,
X.-W.
(
2011
).
Robust principal component analysis based on maximum correntropy criterion
.
IEEE Transactions on Image Processing
,
20
(
6
),
1485
1494
.
He
,
R.
,
Zheng
,
W.-S.
, &
Hu
,
B.-G.
(
2011
).
Maximum correntropy criterion for robust face recognition
.
IEEE Transactions on Pattern Analysis and Machine Intelligence
,
33
(
8
),
1561
1576
.
Hochberg
,
L. R.
,
Bacher
,
D.
,
Jarosiewicz
,
B.
,
Masse
,
N. Y.
,
Simeral
,
J. D.
,
Vogel
,
J.
, …
Donoghue
,
J. P.
(
2012
).
Neuronal ensemble control of prosthetic devices by a human with tetraplegia
.
Nature
,
485
,
372
375
.
Hochberg
,
L. R.
,
Serruya
,
M. D.
,
Friehs
,
G. M.
,
Mukand
,
J. A.
,
Saleh
,
M.
,
Caplan
,
A. H.
, …
Donoghue
,
J. P.
(
2006
).
Neuronal ensemble control of prosthetic devices by a human with tetraplegia
.
Nature
,
442
,
164
171
.
Homer
,
M. L.
,
Perge
,
J. A.
,
Black
,
M. J.
,
Harrison
,
M. T.
,
Cash
,
S. S.
, &
Hochberg
,
L. R.
(
2014
).
Efficient decoding with steady-state Kalman filter in neural interface systems
.
IEEE Transactions on Neural Systems and Rehabilitation Engineering
,
22
(
2
),
239
248
.
Ito
,
S.
,
Stuphorn
,
V.
,
Brown
,
J. W.
, &
Schall
,
J. D.
(
2003
).
Performance monitoring by the anterior cingulate cortex during saccade countermanding
.
Science
,
302
(
5642
),
120
122
.
Kamigaki
,
T.
(
2019
).
Prefrontal circuit organization for executive control
.
Neuroscience Research
,
140
,
23
36
.
Lebedev
,
M. A.
, &
Nicolelis
,
M. A. L.
(
2006
).
Brain-machine interfaces: Past, present and future
.
Trends in Neurosciences
,
29
(
9
),
536
546
.
Li
,
W.
,
Guo
,
Y.
,
Fan
,
J.
,
Ma
,
C.
,
Ma
,
X.
,
Chen
,
X.
, &
He
,
J.
(
2017
).
The neural mechanism exploration of adaptive motor control: dynamical economic cell allocation in the primary motor cortex
.
IEEE Transactions on Neural Systems and Rehabilitation Engineering
,
25
(
5
),
492
501
.
Liu
,
W.
,
Pokharel
,
P. P.
, &
Principe
,
J. C.
(
2007
).
Correntropy: Properties and applications in non-gaussian signal processing
.
IEEE Transactions on Signal Processing
,
55
(
11
),
5286
5298
.
Liu
,
X.
,
Qu
,
H.
,
Zhao
,
J.
, &
Chen
,
B.
(
2017
).
State space maximum correntropy filter
.
Signal Processing
,
130
,
152
158
.
Malik
,
W. Q.
,
Truccolo
,
W.
,
Brown
,
E. N.
, &
Hochberg
,
L. R.
(
2011
).
Efficient decoding with steady-state Kalman filter in neural interface systems
.
IEEE Transactions on Neural Systems and Rehabilitation Engineering
,
19
(
1
),
25
34
.
Mandanas
,
F. D.
, &
Kotropoulos
,
C. L.
(
2017
).
Robust multidimensional scaling using a maximum correntropy criterion
.
IEEE Transactions on Signal Processing
,
65
(
4
),
919
932
.
Matsumoto
,
M.
, &
Hikosaka
,
O.
(
2007
).
Lateral habenula as a source of negative reward signals in dopamine neurons
.
Nature
,
447
,
1111
1115
.
Moritz
,
C. T.
,
Perlmutter
,
S. I.
, &
Fetz
,
E. E.
(
2008
).
Direct control of paralysed muscles by cortical neurons
.
Nature
,
456
,
639
642
.
Musallam
,
S.
,
Corneil
,
B. D.
,
Greger
,
B.
,
Scherberger
,
H.
, &
Andersen
,
R. A.
(
2004
).
Cognitive control signals for neural prosthetics
.
Science
,
305
(
5681
),
258
262
.
Nicolelis
,
M. A. L.
(
2003
).
Brain-machine interfaces to restore motor function and probe neural circuits
.
Nature Reviews Neuroscience
,
4
(
5
),
417
422
.
O'Doherty
,
J. E.
,
Lebedev
,
M. A.
,
Hanson
,
T. L.
,
Fitzsimmons
,
N. A.
, &
Nicolelis
,
M. A. L.
(
2009
).
A brain-machine interface instructed by direct intracortical microstimulation
.
Frontiers in Integrative Neuroscience
,
3
, 20.
Orsborn
,
A. L.
,
Moorman
,
H. G.
,
Overduin
,
S. A.
,
Shanechi
,
M. M.
,
Dimitrov
,
D. F.
, &
Carmena
,
J. M.
(
2014
).
Closed-loop decoder adaptation shapes neural plasticity for skillful neuroprosthetic control
.
Neuron
,
82
(
6
),
1380
1393
.
Otis
,
J. M.
,
Namboodiri
,
V. M. K.
,
Matan
,
A. M.
,
Voets
,
E. S.
,
Mohorn
,
E. P.
,
Kosyk
,
O.
, …
Stuber
,
G. D.
(
2017
).
Prefrontal cortex output circuits guide reward seeking through divergent cue encoding
.
Nature
,
543
,
103
107
.
Pesaran
,
B.
,
Pezaris
,
J. S.
,
Sahani
,
M.
,
Mitra
,
P. P.
, &
Andersen
,
R. A.
(
2002
).
Temporal structure in neuronal activity during working memory in macaque parietal cortex
.
Nature Neuroscience
,
5
(
8
),
805
811
.
Principe
,
J.
(
2010
).
Information theoretic learning: Renyis entropy and kernel perspectives
.
New York
:
Springer
.
Riedel, K. S. (1992). A Sherman-Morrison-Woodbury identity for rank augmenting matrices with application to centering. SIAM Journal on Matrix Analysis and Applications, 13(2), 659–662.

Rushworth, M. F. S., Noonan, M. P., Boorman, E. D., Walton, M. E., & Behrens, T. E. (2011). Frontal cortex and reward-guided learning and decision-making. Neuron, 70(6), 1054–1069.

Schick, I. C., & Mitter, S. K. (1994). Robust recursive estimation in the presence of heavy-tailed observation noise. Annals of Statistics, 22(2), 1045–1080.

Shi, L., & Lin, Y. (2014). Convex combination of adaptive filters under the maximum correntropy criterion in impulsive interference. IEEE Signal Processing Letters, 21(11), 1385–1388.

Silvetti, M., Seurinck, R., & Verguts, T. (2011). Value and prediction error in medial frontal cortex: Integrating the single-unit and systems levels of analysis. Frontiers in Human Neuroscience, 5, 75.

Singh, A., & Principe, J. C. (2009). Using correntropy as a cost function in linear adaptive filters. In Proceedings of the International Joint Conference on Neural Networks (pp. 2950–2955). Piscataway, NJ: IEEE.

Singh, A., & Principe, J. C. (2010). A closed form recursive solution for maximum correntropy training. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 2070–2073). Piscataway, NJ: IEEE.

Taylor, D. M., Tillery, S. I. H., & Schwartz, A. B. (2002). Direct cortical control of 3D neuroprosthetic devices. Science, 296(5574), 1829–1832.

Truccolo, W., Friehs, G. M., Donoghue, J. P., & Hochberg, L. R. (2008). Primary motor cortex tuning to intended movement kinematics in humans with tetraplegia. Journal of Neuroscience, 28(5), 1163–1178.

Uylings, H. B. M., Groenewegen, H. J., & Kolb, B. (2003). Do rats have a prefrontal cortex? Behavioural Brain Research, 146(1–2), 3–17.

Vassena, E., Holroyd, C. B., & Alexander, W. H. (2017). Computational models of anterior cingulate cortex: At the crossroads between prediction and effort. Frontiers in Neuroscience, 11, 316.

Velliste, M., Perel, S., Spalding, M. C., Whitford, A. S., & Schwartz, A. B. (2008). Cortical control of a prosthetic arm for self-feeding. Nature, 453, 1098–1101.

Vyas, S., Even-Chen, N., Stavisky, S. D., Ryu, S. I., Nuyujukian, P., & Shenoy, K. V. (2018). Neural population dynamics underlying motor learning transfer. Neuron, 97(5), 1177–1186.

Wang, J. X., Kurth-Nelson, Z., Kumaran, D., Tirumala, D., Soyer, H., Leibo, J. Z., … Botvinick, M. (2018). Prefrontal cortex as a meta-reinforcement learning system. Nature Neuroscience, 21, 860–868.

Wolpaw, J. R., Birbaumer, N., Heetderks, W. J., McFarland, D. J., Peckham, P. H., Schalk, G., … Vaughan, T. M. (2000). Brain-computer interface technology: A review of the first international meeting. IEEE Transactions on Rehabilitation Engineering, 8(2), 164–173.

Wu, W., Gao, Y., Bienenstock, E., Donoghue, J. P., & Black, M. J. (2006). Bayesian population decoding of motor cortical activity using a Kalman filter. Neural Computation, 18(1), 80–118.
Wunderlich, K., Rangel, A., & O'Doherty, J. P. (2009). Neural computations underlying action-based decision making in the human brain. Proceedings of the National Academy of Sciences of the United States of America, 106(40), 17199–17204.

## Author notes

Xi Liu is the first author, and Xiang Shen is the co-first author. Yiwen Wang is the corresponding author, and Yueming Wang is the co-corresponding author.