## Abstract

Motor brain-machine interfaces (BMIs) translate neural activity from motor-related cortical areas of the brain into movement commands to control a prosthesis. As the subject adapts to control the neural prosthesis, the medial prefrontal cortex (mPFC), upstream of the primary motor cortex (M1), is heavily involved in reward-guided motor learning. Thus, considering mPFC and M1 functionality within a hierarchical structure could potentially improve the effectiveness of BMI decoding while subjects are learning. The commonly used Kalman decoding method, with only one simple state model, may not be able to represent the multiple brain states that evolve over time as well as along the neural pathway. In addition, the performance of Kalman decoders degrades in heavy-tailed nongaussian noise, which usually arises from the nonlinearity of the neural system or from movement-related noise in online neural recording. In this letter, we propose a hierarchical model to represent the brain states from multiple cortical areas that evolve along the neural pathway. We then introduce correntropy theory into the hierarchical structure to address the heavy-tailed noise present in neural recordings. We test the proposed algorithm on in vivo recordings collected from the mPFC and M1 of two rats while the subjects were learning to perform a lever-pressing task. Compared with the classic Kalman filter, our results demonstrate better movement decoding performance, owing to the hierarchical structure, which integrates past failed-trial information across multisite recordings, and to the correntropy criterion, which copes with noisy, heavy-tailed neural recordings.

## 1 Introduction

Brain-machine interfaces (BMIs) provide a direct communication channel between the brain and prosthetic devices that bypasses the peripheral nervous system. The neural activities in multiple cortices are recorded for further analysis. Signal processing methods, such as population coding and Kalman filters, were applied to decode neural activity to predict movement intention to assist people with disabilities (Nicolelis, 2003; Lebedev & Nicolelis, 2006; Taylor, Tillery, & Schwartz, 2002; Musallam, Corneil, Greger, Scherberger, & Andersen, 2004; Velliste, Perel, Spalding, Whitford, & Schwartz, 2008; O'Doherty, Lebedev, Hanson, Fitzsimmons, & Nicolelis, 2009; Gilja et al., 2012; Churchland et al., 2012; Orsborn et al., 2014; Vyas et al., 2018; Hochberg et al., 2006, 2012; Truccolo, Friehs, Donoghue, & Hochberg, 2008; Moritz, Perlmutter, & Fetz, 2008; Collinger et al., 2013; Gilja et al., 2015; Bouton et al., 2016; Ajiboye et al., 2017; Brandman et al., 2018; Wolpaw et al., 2000; Chapin, Moxon, Markowitz, & Nicolelis, 1999; Chhatbar & Francis, 2013). These cortical areas were mainly motor-related areas, such as primary motor cortex (M1) (Li et al., 2017), premotor cortex (PMd) (Churchland, Yu, Ryu, Santhanam, & Shenoy, 2006), and parietal cortex (Pesaran, Pezaris, Sahani, Mitra, & Andersen, 2002).

Motor control in the central nervous system has a hierarchical structure. In a descending motor hierarchical pathway, the premotor cortex receives information from the prefrontal cortex and works with the supplementary motor areas to pass information to the primary motor cortex, and then to the spinal cord to execute the movement (Gazzaniga, Ivry, & Mangun, 2008). In particular, the medial prefrontal cortex (mPFC), anatomically related to both the anterior cingulate cortex (ACC) and the dorsolateral PFC of primates, is involved heavily in the learning process of motor tasks (Uylings, Groenewegen, & Kolb, 2003; Brand, Recknor, Grabenhorst, & Bechara, 2007; Rushworth, Noonan, Boorman, Walton, & Behrens, 2011; Otis et al., 2017; Kamigaki, 2019; Ito, Stuphorn, Brown, & Schall, 2003; Matsumoto & Hikosaka, 2007; Wang et al., 2018; Wunderlich, Rangel, & O'Doherty, 2009; Alexander & Brown, 2019; Silvetti, Seurinck, & Verguts, 2011; Vassena, Holroyd, & Alexander, 2017). The neural activity in mPFC is closely correlated with the prediction error over the course of adaptation (Silvetti, Seurinck, & Verguts, 2011; Vassena et al., 2017; Alexander & Brown, 2011). In other words, mPFC signals the discrepancies between actual and predicted outcomes during learning. This error will be used to improve movement generation and is important when the subject adapts his or her brain signals to control the prosthesis. Thus, considering the hierarchical structure in the brain that involves the functionality of mPFC could potentially improve the effectiveness of BMI while the subject is learning.

Kalman filters have been widely used to decode neural activity in BMIs (Malik, Truccolo, Brown, & Hochberg, 2011; Homer et al., 2014; Wu, Gao, Bienenstock, Donoghue, & Black, 2006). The neural activity is taken as the observation and is linked to the movement states by a linear observation function. The movement is modeled as the internal state and estimated from the observation at each time instance. The advantage of using a Kalman filter as the decoder is that it models the temporal dynamics of the brain state (movement intention). This is especially helpful for BMIs in pure brain control without actual movement, where the brain state continuously changes over time without the need to detect the initiation of a particular movement. In fact, neural states differ across cortical areas (Kamigaki, 2019). The state in mPFC indicates the discrepancies between the actual and predicted outcomes in learning, the state in the premotor cortex reflects the planning of motion, and the state in the primary motor cortex reflects the execution of the movement. All the states evolve over time and pass information along the neural pathway. However, a typical Kalman filter decoder uses only one brain state to roughly represent the neural activity observations recorded from multiple cortical areas. This may ignore the dynamics of the neural information passing along the pathway and potentially limit the decoding performance. Moreover, the Kalman filter adopts the minimum mean square error (MMSE) criterion, which assumes gaussian distributed noise. Nongaussian noise may arise from the nonlinearity of the neural system or from the influence of movement-related noise during neural recording on freely behaving subjects. In either case, the performance of Kalman filter decoders can be severely degraded (Schick & Mitter, 1994).
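To make this decoding setup concrete, the following is a minimal sketch of such a Kalman filter decoder in Python; the dimensions, noise covariances, and plain predict-update loop are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def kalman_decode(Y, F, H, Q, R, x0, P0):
    """Classic Kalman filter decoder: estimate movement states x(k) from
    neural observations y(k) under the linear-gaussian model
        x(k) = F x(k-1) + w,  w ~ N(0, Q)
        y(k) = H x(k)   + v,  v ~ N(0, R).
    Y has shape (T, n_neurons); returns (T, n_states) state estimates."""
    x, P = x0, P0
    estimates = []
    for y in Y:
        # Predict step: propagate state and covariance forward in time.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step: MMSE-optimal correction under gaussian noise.
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (y - H @ x)
        P = (np.eye(len(x)) - K @ H) @ P
        estimates.append(x)
    return np.array(estimates)
```

In practice, `F`, `H`, `Q`, and `R` would be fit from training data rather than chosen by hand.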
As a solution to the problem of nongaussian noise, information theoretic learning (ITL) was proposed, and it greatly improves performance in nongaussian noise environments (Principe, 2010; Liu, Pokharel, & Principe, 2007; Singh & Principe, 2009). In ITL, a new optimization criterion, correntropy, was proposed to measure all even-order statistics of the distribution rather than just the second-order statistics (Shi & Lin, 2014; He, Hu, Zheng, & Kong, 2011; He, Zheng, & Hu, 2011; Du, Tang, Wang, Zhang, & Tao, 2019; Mandanas & Kotropoulos, 2017; Liu, Qu, Zhao, & Chen, 2017). In recent work, a linear maximum correntropy Kalman filter was developed that exhibits superior performance in the presence of heavy-tailed nongaussian noise by taking advantage of the robustness of correntropy (Chen, Liu, Zhao, & Principe, 2017).
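As an illustration of why correntropy is robust, below is a sketch of the sample correntropy estimator with a gaussian kernel, the standard form in the ITL literature; the bandwidth value in the usage is an arbitrary choice for the example.

```python
import numpy as np

def gaussian_kernel(e, sigma):
    """Gaussian kernel used by the correntropy criterion: values near 1
    for small errors, decaying rapidly toward 0 for large (outlier) errors."""
    return np.exp(-e**2 / (2.0 * sigma**2))

def correntropy(x, y, sigma=1.0):
    """Sample estimate of correntropy V(X, Y) = E[k_sigma(X - Y)].
    Unlike the MSE, a single huge error contributes almost nothing,
    which is what makes the criterion robust to heavy-tailed noise."""
    return np.mean(gaussian_kernel(np.asarray(x) - np.asarray(y), sigma))
```

A small error keeps a kernel value near 1, while an outlier is driven toward 0 instead of dominating the cost the way a squared error would.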

In this letter, we propose a hierarchical dynamic model, the correntropy-based hierarchical Kalman filter (CHKF), to improve the accuracy of movement decoding from multiple cortical neural recordings. The hierarchical structure models the brain states in the different cortical areas as hidden states, which evolve over time and along the neural pathway. In particular, we use the hierarchical model to integrate the error information extracted from the medial prefrontal cortex while the subject is learning a new task. The failure information is passed forward to better estimate the state of movement execution in the upcoming trial. Moreover, we introduce correntropy theory into the hierarchical structure to address the possible presence of heavy-tailed nongaussian noise resulting from the linear approximation of the neural system or from movement-related noise during neural recording. We verify our model on in vivo neural recordings collected while two rats learned a lever-pressing task, decoding the movement from multiple cortical recordings. The neural activities are simultaneously recorded from M1 and mPFC. The mPFC and M1 cortical states estimated from each cortical recording both evolve over time as described by the state function in the hierarchical dynamic model. The discrepancy signal during the rewarding stage, indicated by the mPFC states in the previous trial, contributes to the estimation of the movement state in the upcoming trial. The proposed algorithm is applied to decode movement states from observations of mPFC and motor cortical activity. To demonstrate the advantage of the hierarchical structure, the decoding performance of CHKF is compared with that of a classic Kalman filter (KF) with observations from M1 activity only, as well as with observations formed by pooling multiple cortical activities into higher-dimensional vectors.
We also compare CHKF with the hierarchical Kalman filter (HKF) to demonstrate the advantage of combining correntropy with the hierarchical structure in addressing heavy-tailed nongaussian noise in neural recordings. Applying a square root transform (SRT) to the observations is a commonly used preprocessing step that makes the binned neural firing counts more gaussian. We apply SRT to the neural firing rates before running the above algorithms and compare the performance of the hierarchical model with that of the classic Kalman model. We use the mean square error (MSE) against the ground truth to evaluate the movement estimation.

The rest of the letter is organized as follows. In section 2, the hierarchical model, the proposed correntropy-based hierarchical Kalman filter, and related algorithms are introduced. In section 3, the real data application on two rats' learning procedure of a lever-pressing task is used to demonstrate the superior performance of the proposed algorithm compared with those of related algorithms. In section 4, the conclusion is given.

## 2 Theory

### 2.1 Hierarchical Model

### 2.2 Correntropy-Based Hierarchical Kalman Filter

Note that our goal is to estimate $u(k)$ and $x(k)$ from $z(k)$ and $y(k)$. The current design of the hierarchical model yields sparse transition matrices $\tilde{F}$ and $\tilde{T}$ and a sparse observation matrix $\tilde{H}$, which potentially reduces computational complexity.
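As an illustration of this sparsity, the augmented matrices can be assembled block-wise; the dimensions and entries below are hypothetical placeholders, since the paper's actual model matrices are learned from data.

```python
import numpy as np

# Hypothetical dimensions: du-dim mPFC state u(k), dx-dim movement state x(k),
# dz mPFC neurons observed in z(k), dy M1 neurons observed in y(k).
du, dx, dz, dy = 3, 2, 4, 6

F = 0.9 * np.eye(du)            # mPFC state transition (placeholder values)
T = np.full((dx, du), 0.1)      # coupling from the past mPFC state into x(k)
A = 0.95 * np.eye(dx)           # movement state transition (placeholder)

# Augmented transition for X(k) = [u(k); x(k)]: u evolves on its own,
# x receives input from both its own past and the mPFC state, so the
# upper-right block is structurally zero.
F_tilde = np.block([
    [F, np.zeros((du, dx))],
    [T, A],
])

# Block-diagonal observation matrix: mPFC neurons tune only to u,
# M1 neurons tune only to x (placeholder tuning weights).
Hz = np.full((dz, du), 0.2)
Hy = np.full((dy, dx), 0.3)
H_tilde = np.block([
    [Hz, np.zeros((dz, dx))],
    [np.zeros((dy, du)), Hy],
])
```

The zero blocks are fixed by the model structure, so a careful implementation can skip those multiplications entirely.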

The convergence of the estimation $\tilde{X}(k)$ is influenced by the kernel bandwidth $\sigma$. A sufficient condition on the choice of kernel bandwidth to ensure the convergence of the fixed-point iteration in CHKF is provided in appendix B.

In general, the smaller the kernel bandwidth is, the more robust the algorithm. But the kernel bandwidth has a lower bound that guarantees the convergence of the fixed-point method. On the other hand, as the kernel bandwidth becomes increasingly large, the algorithm behaves like the corresponding hierarchical Kalman filter (HKF). In particular, if $\sigma \to \infty$, then $\tilde{C}(k) \to I$, and the proposed algorithm reduces to the HKF.
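This limiting behavior can be seen directly from the gaussian kernel. The sketch below assumes the standard maximum-correntropy form in which $\tilde{C}(k)$ is diagonal with entries given by the gaussian kernel of the whitened errors; the specific error values in the usage are made up for illustration.

```python
import numpy as np

def correntropy_weights(errors, sigma):
    """Diagonal weighting matrix C(k) used in the correntropy update:
    each entry downweights its component by the gaussian kernel of its
    (whitened) error. As sigma -> infinity, every entry tends to 1, so
    C(k) -> I and the filter reduces to the ordinary (hierarchical)
    Kalman update; small sigma strongly suppresses outlier components."""
    errors = np.asarray(errors, dtype=float)
    return np.diag(np.exp(-errors**2 / (2.0 * sigma**2)))
```

With a small bandwidth the large-error entry is driven toward zero, while a very large bandwidth recovers the identity matrix and hence the plain HKF update.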

The CHKF algorithm is summarized in algorithm 1.

### 2.3 Related Algorithms

In this section, we introduce two related methods for comparison. The first is the classic Kalman filter, which has only one simple state model but takes observations from single-site or multisite cortical recordings; it is compared with CHKF to show the superior performance of the hierarchical model. The second is the hierarchical Kalman filter (HKF), which is compared with CHKF to show the improvement brought by correntropy theory in the presence of heavy-tailed, nongaussian noise. A square root transform (SRT) is used as a preprocessing step on the firing rate observations to make the data closer to a gaussian distribution for all the algorithms, including the classic Kalman and hierarchical models.
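A minimal sketch of the SRT preprocessing step follows; it takes the plain square root of the binned counts (some implementations add a small offset such as 3/8, which we do not assume here).

```python
import numpy as np

def square_root_transform(counts):
    """Square root transform (SRT) on binned spike counts: a
    variance-stabilizing step that makes Poisson-like firing counts
    closer to gaussian before Kalman-style decoding."""
    return np.sqrt(np.asarray(counts, dtype=float))
```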

#### 2.3.1 Kalman Filter as a Neural Decoder

#### 2.3.2 Hierarchical Kalman Filter

## 3 Application

In this section, we apply the proposed correntropy-based hierarchical Kalman filter (CHKF) as the neural decoding model to interpret movement intention from multiple cortical observations. The in vivo neural data were collected while two rats learned to perform a lever-pressing task. Neural observations from the primary motor cortex (M1) and the medial prefrontal cortex (mPFC) are available to estimate the corresponding brain states. The newly proposed CHKF is applied to estimate the brain states in each cortical area and to show how the brain states change over time and across cortical areas. To illustrate the need for different brain states to represent the information along the neural pathway, we compare against classic Kalman filters (KF) with only one movement state model. Here, the classic KF may take observations from M1 activity only (KF1) or may pool the multiple cortical activities together as high-dimensional observations (KF2). To illustrate the superiority of the correntropy theory, we compare the performance of the proposed CHKF with that of the hierarchical Kalman filter (HKF) using the same observations from multiple cortical activities. Moreover, we also apply the square root transform (SRT) to the neural firing rate observations as a commonly used preprocessing step for each algorithm, denoted KF_SRT, HKF_SRT, and CHKF_SRT, and compare their performances.

### 3.1 Experiment and Data Collection

The BMI experimental paradigm was designed and implemented at the Hong Kong University of Science and Technology. All animal handling procedures were approved by the Animal Care Committee of the Hong Kong University of Science and Technology, in strict compliance with the Guide for the Care and Use of Laboratory Animals. Two male Sprague Dawley (SD) rats were trained to perform a one-lever-pressing task. Trials were initialized by an audio cue (10 kHz). The rats needed to press a lever with the right paw within 5 s and then hold the lever for 500 ms to obtain a water reward. If the subject completed the task, a feedback tone (10 kHz) was presented for 90 ms, and a water drop was delivered right after the tone ended. Trials with no press or an early release were regarded as failures, in which the rats heard no feedback and received no water. The intertrial intervals were set randomly, ranging from 3 s to 6 s.

The rats were surgically implanted with two 16-channel microelectrode arrays, one in M1 and one in mPFC, in the left hemisphere. When the rats were performing the task, the neural signals were recorded from the implanted electrodes with a Plexon system (OmniPlex Neural Recording Data Acquisition System, Plexon, Dallas, Texas). Neural activities were recorded from the mPFC and M1 simultaneously. The raw signal was sampled at 40 kHz and high-passed at 500 Hz with a four-pole Butterworth filter. Spikes were detected from the processed neural signal with a $-3\theta$ to $-5\theta$ threshold, where $\theta$ denotes the standard deviation of the histogram of the amplitudes. An offline sorter (Plexon) was used to sort the single units from each channel, and the spike timing information was retained. Spike firing rates were counted in nonoverlapping 100 ms time windows. Meanwhile, the corresponding behavioral events were obtained using a behavioral recording system (Lafayette Instrument, USA) and synchronized through the Plexon digital input.
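The spike detection and binning pipeline described above can be sketched as follows; the threshold-crossing logic and the synthetic usage are simplifications (a real pipeline also extracts waveforms for offline sorting, which is omitted here).

```python
import numpy as np

def detect_spikes(signal, fs, k=4.0):
    """Threshold-crossing detection on a high-passed trace: a spike is
    flagged where the signal falls below -k * theta, with theta the
    standard deviation of the trace (k between 3 and 5 in the text).
    Returns spike times in seconds, keeping only the first sample of
    each crossing."""
    theta = np.std(signal)
    below = signal < -k * theta
    crossings = np.flatnonzero(below & ~np.r_[False, below[:-1]])
    return crossings / fs

def bin_firing_rates(spike_times, duration, bin_width=0.1):
    """Count spikes in nonoverlapping bins (100 ms by default)."""
    edges = np.arange(0.0, duration + bin_width, bin_width)
    counts, _ = np.histogram(spike_times, bins=edges)
    return counts
```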

We split the data into training (60%) and test (40%) sets. We obtain the state transition matrix and tuning matrix from the training data and then implement the proposed method on the test set to model the information passing along the neural pathway and to estimate the movement state $x(k)$ from both M1 and mPFC activity. In particular, we extract the error information of the failed trials, $u(k)$, from mPFC activity and expect it to contribute to a stronger movement intention in the upcoming trial through the hierarchical model.
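A sketch of fitting the transition and tuning matrices from the training set by least squares, as is standard for Kalman-style decoders; the variable names and the absence of bias terms are simplifying assumptions for the example.

```python
import numpy as np

def fit_linear_models(X, Y):
    """Least-squares fit of the transition matrix F (x(k) ~ F x(k-1))
    and the tuning/observation matrix H (y(k) ~ H x(k)) from training
    states X of shape (T, dx) and neural observations Y of shape (T, dn)."""
    X_prev, X_next = X[:-1], X[1:]
    # lstsq solves X_prev @ B = X_next; transposing gives x(k) = F x(k-1).
    F = np.linalg.lstsq(X_prev, X_next, rcond=None)[0].T
    H = np.linalg.lstsq(X, Y, rcond=None)[0].T
    return F, H
```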

### 3.2 Neural Decoding Using the Hierarchical Dynamic Model

We apply the proposed CHKF to derive all the movement states from the multiple cortical observations and compare its performance with those of the HKF and the classic KF; we also discuss the effect of applying the square root transform to spike rates as preprocessing for all algorithms. We evaluate performance by the mean square error (MSE) between the estimated state and the ground truth.

We build the state model on $u(k)$ extracted from mPFC following equation 2.6, without considering the external cue information. The movement state model is built following equation 2.7. The main difference is that we consider a window of past values of $u(k)$ with a delay, $u(k) = [u(k-L-n+1), \ldots, u(k-L)]^T$, which captures the possible effect of the previous trial. The parameters $L$ and $n$ in the state vector $u(k)$ are chosen for optimal estimation performance. The collected neural data in mPFC and M1 serve as the observations in equation 2.9. Note that $y(k)$ contains the observations from both mPFC and M1, the same as in KF2. The desired state in the training set is used to obtain the transition matrix $\tilde{F}$ and the observation matrix $\tilde{H}$, and the residuals of the linear approximation are used to build the noise terms in the state and observation models. A bias term is included in both models.
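The lagged state vector described above can be constructed as in the following sketch, where `u` holds the per-bin mPFC error signal and `k >= L + n - 1` is assumed so that the indices stay valid.

```python
import numpy as np

def lagged_state(u, k, L, n):
    """Window of n past values of u delayed by L bins: the augmented
    state u(k) = [u(k-L-n+1), ..., u(k-L)]^T described in the text.
    Assumes k >= L + n - 1 so all indices are nonnegative."""
    idx = np.arange(k - L - n + 1, k - L + 1)
    return u[idx]
```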

We perform a right-tailed *t*-test on decoding performance across multiple test segments for each pair of comparisons, with Bonferroni correction at an $\alpha = 0.01$ significance level. The *p*-value gives the probability, under the null hypothesis, of observing an equal or higher value of the test statistic. We find that the CHKF achieves a better decoding result than the classic Kalman filter using one (KF1) or two cortical observations (KF2). The conclusion stands with or without SRT preprocessing. We label statistical significance with two asterisks on the compared pairs in Figures 7 and 8.
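The significance test described above can be reproduced with a standard paired, right-tailed *t*-test and a Bonferroni-corrected threshold. The sketch below uses SciPy, and the MSE values in the test are fabricated purely for illustration.

```python
import numpy as np
from scipy import stats

def compare_decoders(mse_baseline, mse_proposed, n_comparisons, alpha=0.01):
    """Right-tailed paired t-test that the baseline decoder's MSE exceeds
    the proposed decoder's MSE across test segments, judged against a
    Bonferroni-corrected significance level alpha / n_comparisons.
    Returns the p-value and whether the improvement is significant."""
    t_stat, p = stats.ttest_rel(mse_baseline, mse_proposed,
                                alternative="greater")
    return p, bool(p < alpha / n_comparisons)
```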

Note that in the comparison between the same CHKF model with and without SRT, CHKF_SRT (the magenta box in Figure 7b) performs better than CHKF (the magenta box in Figure 7a) for rat A, while CHKF (the magenta box in Figure 8a) performs better than CHKF_SRT (the magenta box in Figure 8b) for rat B. This indicates that SRT should be regarded as an optional preprocessing step. For data that are mostly close to gaussian but contaminated by occasional peaky noise, the SRT acts deterministically on every data point: it can reduce the effect of the noise but at the same time make the clean data less gaussian. In comparison, the correntropy criterion weighs the probability that peaky noise is present and suppresses the noise only when necessary, making it the more reliable criterion.

## 4 Conclusion

BMI technology records and decodes neural signals to predict movement behavior and assist the lives of those with disabilities. Previous BMIs generally use recordings from motor-related areas. In fact, motor control in the central nervous system has a hierarchical structure. Along the hierarchical neural pathway, many cortical areas are heavily involved in the learning process of a motor task. As a commonly used decoding algorithm, the Kalman filter models the movement intention as a neural state and derives the state from observations optimally under the assumptions of linear models and gaussian noise. However, a single, simple state model cannot represent the brain states at multiple cortical areas that evolve over time and space.

In this letter, we propose a hierarchical model to represent the brain states from multiple cortical areas that evolve along the neural pathway. We then introduce correntropy theory into the hierarchical model to address heavy-tailed nongaussian noise possibly present in the noisy neural recordings. We test the proposed algorithm, the correntropy-based hierarchical Kalman filter (CHKF), on the in vivo neural recordings of two rats learning a lever-pressing task. We compare the performance of the proposed method with that of the classic Kalman filter (KF) with only one movement state model, whose observations are taken either from motor cortical activity alone or from multiple cortical recordings. The MSE of the CHKF decreases by 33.67% and 21.53% on average across the multisegment data of the two rats when compared with those of the KF with motor cortical recordings only (KF1) and the KF with multiple cortical recordings (KF2), respectively. With the square root transform as preprocessing on the neural firing rates, the MSE of the correntropy-based hierarchical Kalman filter (CHKF_SRT) decreases by 23.35% and 10.66% on average across the multisegment data of the two rats when compared with those of the KF with motor cortical recordings only (KF1_SRT) and the KF with multiple cortical recordings (KF2_SRT), respectively. The results demonstrate that involving multiple brain states in the hierarchical structure allows the information to evolve over time as well as along the neural pathway, which contributes to the statistically better estimation of movement intention. We also compare the performance of the proposed CHKF with that of the hierarchical Kalman filter (HKF) using the same observations from multiple cortical activities. The results show that the correntropy criterion introduced into the proposed hierarchical structure is superior to the HKF in decoding the movement when the noisy neural recording is strongly heavy-tailed. The correntropy criterion helps especially on data without offline sorting and benefits online processing scenarios, as it eliminates the effect of large noise appearing randomly across channels and over time.

Moreover, as a common preprocessing method, the square root transform is applied to the neural observations for all algorithms in the comparison. The hierarchical model with correntropy (CHKF_SRT) remains the best among all algorithms with the same preprocessing. However, the results on the two rats by CHKF with and without SRT show that SRT preprocessing does not necessarily lead to better performance. For neural data that are mostly close to gaussian but contaminated by occasional peaky noise, the SRT can reduce the effect of the noise but also make the clean data less gaussian at the same time. In comparison, the correntropy criterion weighs the probability that peaky noise is present and eliminates it when necessary, and thus is the more reliable criterion.

Overall, the proposed hierarchical dynamic model provides a more accurate estimation of movement intention from noisy multiple cortical recordings, which potentially improves the performance of BMI technology.

## Appendix A: Derivation of the Posterior Estimation from Equation 2.15 to Equation 2.16

## Appendix B: The Sufficient Convergence of the Fixed-Point Iteration in CHKF

In the following, we adopt the theorem in Chen, Wang, Zhao, Zheng, and Principe (2015) to find the condition to satisfy equations B.2 and B.3.

**Theorem 1.**

Consequently, if the kernel bandwidth $\sigma$ is larger than a certain value (e.g., $\max\{\sigma_1, \sigma_2\}$), the fixed-point iteration in CHKF converges.

## Acknowledgments

This work was supported by Shenzhen-Hong Kong Innovation Circle (Category D) (SGDX2019081623021543), the Hong Kong Innovation and Technology Fund (ITS/092/17), the National Natural Science Foundation of China (61836003), Chau Hoi Shuen Foundation's Donation (R9051), and the sponsorship for targeted strategic partnership (FP902).

## References

*Fixed Point Theory and Applications*

*Lancet*

*Nature Neuroscience*

*Topics in Cognitive Science*

*Nature*

*Journal of Clinical and Experimental Neuropsychology*

*Journal of Neural Engineering*

*Nature Neuroscience*

*Automatica*

*IEEE Signal Processing Letters*

*PLOS One*

*Nature*

*Journal of Neuroscience*

*Lancet*

*Psychological Methods*

*IEEE Transactions on Cybernetics*

*Cognitive neuroscience: The biology of the mind*

*Nature Neuroscience*

*Nature Medicine*

*IEEE Transactions on Image Processing*

*IEEE Transactions on Pattern Analysis and Machine Intelligence*

*Nature*

*Nature*

*IEEE Transactions on Neural Systems and Rehabilitation Engineering*

*Science*

*Neuroscience Research*

*Trends in Neurosciences*

*IEEE Transactions on Neural Systems and Rehabilitation Engineering*

*IEEE Transactions on Signal Processing*

*Signal Processing*

*IEEE Transactions on Neural Systems and Rehabilitation Engineering*

*IEEE Transactions on Signal Processing*

*Nature*

*Nature*

*Science*

*Nature Reviews Neuroscience*

*Frontiers in Integrative Neuroscience*

*Neuron*

*Nature*

*Nature Neuroscience*

*Information theoretic learning: Renyi's entropy and kernel perspectives*

*SIAM Journal on Matrix Analysis and Applications*

*Neuron*

*Annals of Statistics*

*IEEE Signal Processing Letters*

*Frontiers in Human Neuroscience*

*Proceedings of the International Joint Conference on Neural Networks*

*Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing*

*Science*

*Journal of Neuroscience*

*Behavioural Brain Research*

*Frontiers in Neuroscience*

*Nature*

*Neuron*

*Nature Neuroscience*

*IEEE Transactions on Rehabilitation Engineering*

*Neural Computation*

*Proceedings of the National Academy of Sciences of the United States of America*

## Author notes

Xi Liu is the first author, and Xiang Shen is the co-first author. Yiwen Wang serves as corresponding author, and Yueming Wang serves as co-corresponding author.