Abstract

Although many real-time neural decoding algorithms have been proposed for brain-machine interface (BMI) applications over the years, an optimal, consensual approach remains elusive. Recent advances in deep learning algorithms provide new opportunities for improving the design of BMI decoders, including the use of recurrent artificial neural networks to decode neuronal ensemble activity in real time. Here, we developed a long-short term memory (LSTM) decoder for extracting movement kinematics from the activity of large (N = 134–402) populations of neurons, sampled simultaneously from multiple cortical areas, in rhesus monkeys performing motor tasks. Recorded regions included primary motor, dorsal premotor, supplementary motor, and primary somatosensory cortical areas. The LSTM's capacity to retain information for extended periods of time enabled accurate decoding for tasks that required both movements and periods of immobility. Our LSTM algorithm significantly outperformed the state-of-the-art unscented Kalman filter when applied to three tasks: center-out arm reaching, bimanual reaching, and bipedal walking on a treadmill. Notably, LSTM units exhibited a variety of well-known physiological features of cortical neuronal activity, such as directional tuning and neuronal dynamics across task epochs. LSTM modeled several key physiological attributes of cortical circuits involved in motor tasks. These findings suggest that LSTM-based approaches could yield a better algorithm strategy for neuroprostheses that employ BMIs to restore movement in severely disabled patients.

1  Introduction

In the brain-machine interface (BMI) field, neural decoders are computational algorithms that aim at producing the best possible mapping of large-scale neural activity to complex motor behaviors such as limb movements (Carmena et al., 2003; Fitzsimmons, Lebedev, Peikon, & Nicolelis, 2009; Ifft, Shokur, Li, Lebedev, & Nicolelis, 2013; Taylor, Tillery, & Schwartz, 2002; Wu et al., 2003) and whole-body navigation (Rajangam et al., 2016). Here we report a decoder that exceeds the high performance of previously reported decoding methods based on the traditional Kalman filter and newer variations of this classic method. To this date, different versions of Kalman filters (Kalman, 1960; Kalman & Bucy, 1961) have been successfully utilized in various BMI applications, including experiments in rhesus monkeys (Aggarwal, Mollazadeh, Davidson, Schieber, & Thakor, 2013; Chen & Takahashi, 2013; Dangi, Gowda, & Carmena, 2013; Ifft et al., 2013; Li, Li, & Li, 2016; Li et al., 2009; Makin, O'Doherty, Cardoso, & Sabes, 2017; Wu et al., 2003) and clinical BMIs tested in human patients (Hochberg et al., 2012; Homer et al., 2013; Jarosiewicz et al., 2013). Kalman filters explicitly model neuronal tuning properties (i.e., the relationship between neuronal activity and behavioral parameters of interest) using a series of assumptions, such as the statistical distribution of noise (e.g., gaussian or Poisson), characteristics of the tuning curve (e.g., linear or quadratic), and a predetermined number of “taps” (time bins of spike counts) used to capture neuronal response dynamics. However, the performance of a Kalman filter can be considered suboptimal in many cases because it does not account for the temporal characteristics of spiking activity and limb kinematics over sufficiently long periods of time. This is because the number of taps should not be too high to avoid overfitting (Li et al., 2016).

Conversely, recurrent artificial neural networks offer a solution to this problem, since they are capable of keeping information for a long time without increasing the number of free parameters. Recurrent artificial neural networks used to be hard to train if the time lag between the input and the output exceeded 5 to 10 steps. However, the development of long-short term memory (LSTM) architecture (Gers, Schmidhuber, & Cummins, 2000; Hochreiter & Schmidhuber, 1997) allowed this major obstacle to be overcome by letting learning of input-output relationships happen with an arbitrary and large number of time lags. This property is very useful for neural decoding of motor parameters because a typical motor task lasts several seconds—a time interval difficult to model with a Kalman filter. Therefore, using LSTM to retain information for long intervals could significantly improve the decoding performance of neuroprosthetic devices that are based on brain-machine interfaces.

In this study, we tested two standard versions of LSTM against three alternative decoders: unscented Kalman filter (Li et al., 2016), Kalman filter (Wu et al., 2003), and Wiener filter. We used each algorithm to reconstruct the kinematic parameters (position, velocity, and acceleration) of both upper and lower limb movements, based on the combined electrical activity of cortical neuronal ensembles recorded in four rhesus monkeys chronically implanted with multielectrode arrays. These chronic implants were placed in the primary motor (M1), dorsal premotor (PMd), supplementary motor (SMA), and primary somatosensory (S1) cortical areas. Data from three experimental tasks were used: (1) center-out arm reaching (Carmena et al., 2003; Li et al., 2016; Li et al., 2009), (2) bimanual reaching (Ifft et al., 2013), and (3) bipedal walking on a treadmill (Fitzsimmons et al., 2009; Peikon, Fitzsimmons, Lebedev, & Nicolelis, 2009). Overall, LSTM-based algorithms outperformed the other decoders for all three tasks. Moreover, LSTM applications needed fewer training data to achieve the same decoding performance as the other decoders. LSTM decoding performance also remained robust to perturbations of the network hyperparameters, such as structure, length of training snippet, and the ratio of neurons randomly lost during both training and testing.

2  Materials and Methods

All animal procedures were performed in accordance with the National Research Council's Guide for the Care and Use of Laboratory Animals and were approved by the Duke University Institutional Animal Care and Use Committee.

2.1  Experimental Tasks

Data were taken from three experimental tasks: (1) center-out arm reaching (Carmena et al., 2003; Li et al., 2016; Li et al., 2009), (2) bimanual reaching (Ifft et al., 2013), and (3) bipedal walking on a treadmill (Fitzsimmons et al., 2009; Peikon et al., 2009).

In the center-out task, monkeys continuously controlled a hand-held joystick (CH-400R-P3, Hangzhou ChuangHong Electric Co.) to place a screen cursor over screen targets (see Figure 1B). The computer monitor was placed 55 cm in front of the monkey. The targets were circular, with a diameter of 7 cm. Each trial started with a target appearing at the center of the screen. Then the monkey had to hold the cursor at that central location for 500 ms. After this period, a peripheral target appeared on the screen, 9 cm from the center and at a random angle. The monkey had 5 s to move the cursor to the target. Next, the monkey held the cursor at the target location for 200 to 1000 ms, and finally received a juice reward (Hawaiian Punch). We decoded six kinematics variables: x and y components of joystick position, velocity, and acceleration.

Figure 1:

Neural recording and the three motor tasks used in this study. (A) Recording sites in monkey C, monkey N, monkey R, and monkey P. M1, primary motor cortex; PMd, dorsal premotor cortex; SMA, supplementary motor cortex; and S1, primary somatosensory cortex. (B) The center-out arm-reaching task requires the monkey to move the cursor (small black circle) into the target (large black circle). The target alternatively appears between the screen center and peripherals (anywhere on the gray dashed circle). (C) The upper plot shows the cursor position for three consecutive trials (center out-in-out), and the bottom plot illustrates the corresponding z-scored neuronal firing rate (z-FR) for 137 concurrently recorded M1 neurons. The vertical black line indicates the time that a new target appears, the red line represents the time the cursor reaches the target, and the blue line shows the time the monkey receives a juice reward. (D) The bimanual reaching task, where the monkey moves left and right avatar arms (small black circles) in left and right targets (large black circle). Dashed circles are the four potential target locations for each side. (E) The bipedal walking task had the monkey walk forward or backward on a treadmill, where its hips, knees, and ankles (yellow dots) are video-tracked.

Figure 1:

Neural recording and the three motor tasks used in this study. (A) Recording sites in monkey C, monkey N, monkey R, and monkey P. M1, primary motor cortex; PMd, dorsal premotor cortex; SMA, supplementary motor cortex; and S1, primary somatosensory cortex. (B) The center-out arm-reaching task requires the monkey to move the cursor (small black circle) into the target (large black circle). The target alternatively appears between the screen center and peripherals (anywhere on the gray dashed circle). (C) The upper plot shows the cursor position for three consecutive trials (center out-in-out), and the bottom plot illustrates the corresponding z-scored neuronal firing rate (z-FR) for 137 concurrently recorded M1 neurons. The vertical black line indicates the time that a new target appears, the red line represents the time the cursor reaches the target, and the blue line shows the time the monkey receives a juice reward. (D) The bimanual reaching task, where the monkey moves left and right avatar arms (small black circles) in left and right targets (large black circle). Dashed circles are the four potential target locations for each side. (E) The bipedal walking task had the monkey walk forward or backward on a treadmill, where its hips, knees, and ankles (yellow dots) are video-tracked.

In the bimanual task, a monkey held two joysticks, with the right and left hands, to control the movements of two avatar hands shown on the screen. The task required placing the avatar hands over their initial targets close to the center of the screen, holding these positions for 750 to 1500 ms (drawn from a uniform distribution), and then reaching toward the peripheral targets designated for each hand (see Figure 1D). The initial targets were 8 cm by 8 cm squares located 12 cm to the left and right from the screen center. The peripheral targets were circles, 8 cm in diameter, shifted 8 cm leftward, rightward, up, or down with respect to the initial targets. The four potential target locations for each hand resulted in a configuration with 16 potential target locations, each appearing with equal probability. After the monkeys acquired the peripheral targets, within a 5 s timeout limit, they had to hold the avatar hands over these targets for 50 ms to receive a juice reward. The behavioral task was identical to the one described in our previous study (Ifft et al., 2013), but here we used data from a different monkey. We decoded 12 kinematics variables: x and y coordinates of both joysticks and the corresponding velocity and acceleration.

In the bipedal walking task, a monkey bipedally walked on a treadmill while holding a horizontal bar with its hand. An adjustable neckplate restrained the monkey. Three different treadmill speeds were tested: 12.5, 25, and 50 cm/s (see Figure 1E). The monkeys walked in both forward and backward directions. Each walking session lasted 10 min, after which the monkey rested for 5 min; then the treadmill parameters were changed and the next walking session conducted. The monkey was rewarded with raisins every 5 to 10 s. Fluorescent markers were attached to the hip, knee, and ankle of both legs, and two video cameras captured the marker locations at 30 Hz as described in Peikon et al. (2009). We decoded 36 kinematics variables: the position, velocity, and acceleration of the ankle, the knee, and the hip of both legs in the sagittal plane.

2.2  Neuronal Ensemble Recordings and Data Sets

Monkeys were chronically implanted with microwire arrays in multiple cortical areas in both cerebral hemispheres. The implant locations were different for different monkeys; in all of them, we recorded from M1 and in some from S1, PMd, and SMA (see Figure 1A). The extracellular activity of neuronal populations was amplified, digitized, and processed using a multichannel recording system (Plexon, TX, USA).

Neuronal spikes were sorted using a template matching algorithm. While our neuronal recordings mostly represent single neurons (Schwarz et al., 2014), we chose to stay conservative and use the term unit to allow the eventual inclusion of residual multiunit activity. As repeatedly reported in the literature, multi-units can be used for decoding, with decoding performance compared to that of single units (Carmena et al., 2003; Chestek et al., 2011; Stark & Abeles, 2007) (see Supplementary Figure S1 for the decoding performance for emulated multi-units). The data set for the center-out task included six recording sessions (each 14–30 min long) from monkey N and three sessions (40–120 min) from monkey C. Both monkeys performed the task with their right arms. In monkey C, trials during which the monkey did not perform were removed from the analysis, which resulted in 30 to 45 min per session being analyzed. In monkey N, 97 to 107 units were sampled in M1 and 59 to 61 in S1, all in the left hemisphere. In one session from monkey C, 92 M1 and 103 SMA units were recorded bilaterally. In the other two sessions, we recorded 97 to 103 M1 units bilaterally and 52 to 55 SMA units and 60 to 72 PMd units in the left hemisphere (see Figure 1A).

The bimanual data set included three sessions from monkey R (50–75 min, with 40–65 min analyzed after the no-performance trials were removed). In this monkey, 164 to 171 units were recorded from the left M1 while the monkey manipulated two joysticks with its hands.

The bipedal data set consisted of five days of recordings in monkey P. A range of 225 to 250 units was sampled in M1 bilaterally and 150 to 159 in the right and left S1. For two recording sessions, a pair of treadmill speeds was tested (12.5 and 25 cm/s, forward and backward). The duration of each of these sessions was approximately 40 min, but up to 4 min of data per session were excluded because of the occasional inability of the video system to track the markers correctly. For the other three sessions, three speeds were tested (12.5, 25, and 50 cm/s, forward and backward walking). Each session lasted approximately 60 minutes, with 3 to 16 minutes per session excluded from data analysis.

2.3  LSTM Architecture

We used the basic form of LSTM (Gers et al., 2000; Hochreiter & Schmidhuber, 1997) with full gradient training. The LSTM recurrent network structure was specially designed to store information for an extended period of time. The information was read, written, and forgotten by three different gates: input, output, and forget. The LSTM schematics is shown in Figure 2.

Figure 2:

Schematics of LSTM. (A) The LSTM block keeps its cell memory ct-1 from the previous time step t-1. The LSTM block also receives an input vector xt at time t (e.g., neural firing rates), which is concatenated with an output vector mt-1 at the previous time step (e.g., cursor positions). The concatenated vector is forward to four learned network layers (ft, forget gate; it, input gate; jt, block input; ot, output gate) with nonlinear activation functions (sigmoid or tanh) that control the flow of information. To make the next prediction mt, a learned projection network layer projected regulated ct onto the dimension of mt. Note that if there are multiple layers of LSTM as in panel B, only the last layer (not the earlier layers) has the projection network. For the illustrated two-layer LSTM, the input vector of the second LSTM layer is the output vector m1,t from the first LSTM layer.

Figure 2:

Schematics of LSTM. (A) The LSTM block keeps its cell memory ct-1 from the previous time step t-1. The LSTM block also receives an input vector xt at time t (e.g., neural firing rates), which is concatenated with an output vector mt-1 at the previous time step (e.g., cursor positions). The concatenated vector is forward to four learned network layers (ft, forget gate; it, input gate; jt, block input; ot, output gate) with nonlinear activation functions (sigmoid or tanh) that control the flow of information. To make the next prediction mt, a learned projection network layer projected regulated ct onto the dimension of mt. Note that if there are multiple layers of LSTM as in panel B, only the last layer (not the earlier layers) has the projection network. For the illustrated two-layer LSTM, the input vector of the second LSTM layer is the output vector m1,t from the first LSTM layer.

For the forward pass, at any given time t, an LSTM block took three input vectors: the instantaneous firing rates xt of the neuronal ensemble, the LSTM output mt-1 (i.e., decoded kinematics), and the state of the memory cells ct-1 from the previous time point. The forget gate ft first determined the information to be forgotten. Then the input gate it determined the information to be added to the memory cells, and finally the output gate ot determined the information from the memory cells to project onto the kinematics of interests mt. In the equations below, Ws are learned weight matrices, and σ and g are point-wise sigmoid (σ(α)=11+e-α) and hyperbolic tangent (tanh(α)=sinh(α)cosh(α)) functions, respectively:

Concatenatedinputzt=[xtTmt-1T1]TForgetgateft=σ(Wfzt)Inputgateit=σ(Wizt)Blockinputjt=g(Wjzt)Outputgateot=σ(Wozt)Memorycellstatect=ct-1·ft+it·jtBlockoutputmt=Wpg(ct)·ot,lastLSTMblockg(ct)·ot,else

Because these equations are differentiable, gradient-based learning algorithms can be deployed to learn the weight matrices, using an error signal derived from the differences between mt of the last LSTM block and the recorded kinematics. We propagated this error signal back in time through the memory cell whose activation function was linear with the derivative of one. When the value for the forget gate remained equal to one, the error signal stayed constant in time, and LSTM learned the relationship between the target values and the input values at arbitrary time lags.

We implemented two LSTMs in TensorFlow 1.3.0: one was the basic LSTM with 128 hidden units, as shown Figure 2A (LSTM1), and the other was a two-layer LSTM with 128 hidden units in each layer, as shown Figure 2B (LSTM2). The settings that produced the main results were as follows: 1000 iterations of training with a batch size of 64 and unrolling of 30 steps (equivalent to 1.5 s as each time step was 50 ms). We used an Adam optimizer (Kingma & Ba, 2014) with the learning rate α=0.001,β1=0.9,β2=0.999, and ε=1e-08 (default values in Kingma & Ba, 2014) and optimized a Huber loss function (delta = 16) with L2-regularization (alpha = 0.2). To further regularize the LSTM, a dropout technique was applied with the probability of 0.2. (20% of the input and output units were randomly set to 0 during training.)

2.4  Comparison of Different Decoders

LSTM decoding performance was compared to three alternative algorithms: the Wiener filter, Kalman filter, and unscented Kalman filter. The Wiener filter, WIENER10 (10 past taps were used), performed a regression of the z-scored spike counts for 10 50 ms taps, representing time in the past with L2-regularization (ridge regression; best alpha was searched using cross-validation).

The linear Kalman filter, KF10 (i.e., 10 taps), decoded the kinematics (position, velocity, and acceleration) with the iterations that consisted of two steps. During the first step, the filter's movement model predicted the current kinematics from the same parameter calculated during the previous iteration. During the second step, this kinematic prediction was corrected in a Bayesian way based on the difference between the measured neural firing rates and the estimated firing rates derived using the observation model. The observation model (also called neuronal tuning model) represented firing rate as a linear function of the kinematic parameters for the 10 50 ms taps. The observation model and the movement model were fitted with ridge regression, where the best alpha was found using cross-validation.

The unscented Kalman filter, UKF5 (i.e., 5 taps), utilized the observation model that represented firing rate as a quadratic function of kinematics (x and y position and the corresponding values of velocity and acceleration; Li et al., 2016; Li et al., 2009). Five taps, instead of 10, were used for better filter stability as suggested in S. Li et al. (2016). Both the observation model and the movement models were fitted with ridge regression.

2.5  Evaluation of Decoding Performance

Both limb kinematics and neuronal firing rates were converted to z-scores to perform decoding. The decoded values were converted back to their original units to quantify the decoder's performance. The first 30 seconds of decoded kinematics were excluded from the evaluation because the Kalman filters needed time to stabilize decoding. Decoding performance was evaluated using three metrics: coefficient of determination (R2), correlation coefficient (r), and signal-to-noise ratio in decibels (SNRdb). R2 was calculated as
R2=1-i(yi-fi)2i(yi-y¯)2,
where yi is limb kinematics at time i, fi is decoded kinematics at time i, and y¯ is the mean for limb kinematics. The correlation coefficient was calculated as
ry,f=cov(y,f)σyσf,
where cov is covariance and σz is the standard deviation of z. SNRdb was calculated as
SNRdb=10×log10i(yi-y¯)2i(yi-fi)2.
We performed a five-fold cross-validation, which yielded five values for each metric. The median of these five values was taken to represent the decoder's performance for each recording session. To compare the decoding performance across the decoders and different kinematic parameters, we performed a mixed-design ANOVA. The ANOVA treated the metrics as repeated measures because all data points were obtained from the same sessions, whereas the kinematic parameters and subjects (i.e., monkeys) were nonrepeated measures. The sum of squares of type 3 was employed for unbalanced data. If an ANOVA main effect on the decoder was observed, post hoc multiple comparisons were conducted using a paired permutation test with 10,000 permutations for every pair of decoders, and the p-values were adjusted by the false discovery rate (FDR). If the main effect of kinematics or subject was observed, post hoc multiple comparisons were conducted as a conventional permutation test with 10,000 permutations; the p-values were adjusted by FDR.

3  Results

The performance of the decoders is presented below for different tasks.

3.1  Center-Out Task

For the center-out task, the performance of each decoder was first assessed using the correlation coefficient (r) as a metric (see Figure 3A). An ANOVA revealed main effects for the following categories: decoders (F(4,192)=374.8,p<0.05), kinematics (F(2,48)=211.0,p<0.05), and subjects (F(1,48)=135.5,p<0.05). Post hoc multiple comparison of different decoders showed that the performance of LSTM2 (mean r=0.66) was better (paired permutation test, FDR corrected, p<0.005) compared to the other four decoders (LSTM1, mean r=0.58; UKF5, r=0.48; KF10, r=0.51; WIENER10, r=0.57). Additionally, LSTM1 outperformed UKF5, KF10, and WIENER10 (paired permutation test, FDR corrected, p<0.023). A post hoc multiple comparison conducted for kinematics showed that all decoders performed best for predicting position (mean r=0.77), followed by velocity (r=0.59) and acceleration (r=0.30). (These values of r are different from each other; permutation test, FDR corrected, p<0.005.) Finally, all decoders performed better (permutation test, p<0.005) for monkey N (mean r=0.63) than for monkey C (r=0.41).

Figure 3:

Quality of offline reconstruction of cursor position in the center-out task, evaluated as the correlation coefficient (A) and coefficient of determination (B) for monkey N (first row) and C (second row). Median values and 95% confidence intervals (error bar) obtained from 1000 bootstrap replicates are plotted. LSTM2: two-layer LSTM; LSTM1: single-layer LSTM; UKF5: 5th order UKF; KF10: 10th order KF; WIENER10: 10th order Wiener filter.

Figure 3:

Quality of offline reconstruction of cursor position in the center-out task, evaluated as the correlation coefficient (A) and coefficient of determination (B) for monkey N (first row) and C (second row). Median values and 95% confidence intervals (error bar) obtained from 1000 bootstrap replicates are plotted. LSTM2: two-layer LSTM; LSTM1: single-layer LSTM; UKF5: 5th order UKF; KF10: 10th order KF; WIENER10: 10th order Wiener filter.

Next, the decoding performance was assessed using the coefficient of determination (R2; see Figure 3B). This metric is stricter than the correlation coefficient because it is more affected by the drifts and changes in amplitude of the decoded variable compared to the correlation coefficient. The ANOVA showed main effects for the decoders (F(4,192)=227.2,p<0.05), kinematics (F(2,48)=93.0,p<0.05), and subjects (F(1,48)=81.5,p<0.05). The post hoc multiple comparison showed that LSTM2 (mean R2=0.45) outperformed (paired permutation test, FDR corrected, p<0.005) the other four decoders (LSTM1, R2=0.37; UKF5, R2=0.23; KF10, R2=0.23; WIENER10, R2=0.34), and LSTM1 outperformed UKF5, KF10, and WIENER10 (paired permutation test, FDR corrected, p<0.018). UKF5 and KF10 had similar R2 (paired permutation test, FDR corrected, p=0.48). The decoding performance was best (permutation test, FDR corrected, p<0.005) for position (mean R2=0.52), followed by velocity (R2=0.36) and acceleration (R2=0.09. All decoders performed better (permutation test, FDR corrected, p<0.005) for monkey N (mean R2=0.40) than for monkey C (R2=0.18). The data for the SNRdb metric are presented in the supplementary materials (see supplementary Figure S2).

After showing that both types of LSTM outperformed UKF5, KF10, and WIENER10, we examined the decoded traces to assess the contribution of different task epochs to the performance improvement. Figure 4A shows a 10 s snippet showing the x-coordinates for the true and decoded joystick position. Two features of LSTM performance can be seen. First, when the monkeys did not move the joystick (i.e., initial central hold), both LSTMs generated more stable output compared to the other decoders. Second, the joystick trajectories obtained with LSTM2 were visibly smoother (i.e., contained less high-frequency noise) compared to those generated by LSTM1. These observations were confirmed by the analysis of mean squared error (MSE) for the difference between the decoded and true joystick positions during the initial central hold period of the behavioral task (see Figure 4B). An ANOVA showed the main effect of the decoder (repeated measure, F(4,68)=97.0,p<0.05), and post hoc multiple comparisons showed that the LSTM2 output was the most stable for the analyzed epoch (mean MSE = 1.31), followed by LSTM1 (MSE = 1.54), WIENER10 (MSE = 2.56), UKF5 (MSE = 4.75), and KF10 (MSE = 6.27). These values were statistically different, as confirmed by the paired permutation test (FDR corrected, p<0.005 for every pair). Trajectory smoothness was quantified using normalized mean absolute jerk (Rohrer et al., 2002) (NMAJ=Δ-1vpeak(t2-t1)t1t2|d2vdt2|dt, where v is the cursor speed and vpeak=Δmaxt[t1,t2]v(t)); (see Figure 4C), where an ANOVA showed the main effect of the decoder (repeated measure, F(4,32)=19.4,p<0.05). The post hoc analysis showed that LSTM2 produced the smoothest trajectories (mean NMAJ = -0.27, paired permutation test FDR corrected, p<0.01), followed by UKF5 (NMAJ = -0.42), KF10 (NMAJ = -0.45), LSTM1 (NMAJ = -0.54), and WIENER10 (NMAJ = -0.56) (see Figure 4C).

Figure 4:

Comparisons on the decoded cursor position. (A) A 10 s snippet of true and decoded joystick position to visualize the differences of decoded traces among decoders. (B) Stability of decoded joystick position when monkey holds it in the center. The stability was measured by mean squared error (MSE) between the decoded position and the center, so the lower the value, the more stable the decoder. x, decoded position in x-axis; y, decoded position in y-axis. (C) The smoothness of the decoded joystick position was measured as a normalized mean absolute jerk (NMAJ). Median values and its 95% confidence interval (error bar) obtained from 1000 bootstrap replicates are plotted. Moreover, one session of online BMI control was conducted in monkey C (supplementary video S1). As evident from the videos, when LSTM was used as the decoder, the monkey was able to generate straight movement trajectories and hold the cursor steady during the preparatory interval. This performance was almost as good as the one during manual control. By contrast, the trajectories generated with UKF were not as straight, and the central hold was noisier compared to the LSTM and manual control conditions.

Figure 4:

Comparisons on the decoded cursor position. (A) A 10 s snippet of true and decoded joystick position to visualize the differences of decoded traces among decoders. (B) Stability of decoded joystick position when monkey holds it in the center. The stability was measured by mean squared error (MSE) between the decoded position and the center, so the lower the value, the more stable the decoder. x, decoded position in x-axis; y, decoded position in y-axis. (C) The smoothness of the decoded joystick position was measured as a normalized mean absolute jerk (NMAJ). Median values and its 95% confidence interval (error bar) obtained from 1000 bootstrap replicates are plotted. Moreover, one session of online BMI control was conducted in monkey C (supplementary video S1). As evident from the videos, when LSTM was used as the decoder, the monkey was able to generate straight movement trajectories and hold the cursor steady during the preparatory interval. This performance was almost as good as the one during manual control. By contrast, the trajectories generated with UKF were not as straight, and the central hold was noisier compared to the LSTM and manual control conditions.

3.1.1  LSTM Hyperparameters

Next, we examined the effect of LSTM hyperparameters on the decoding performance. We first explored how the LSTM performance was affected by the network width (i.e., the number of cells: 16, 32, 64, 128, 256, or 512) and height (i.e., number of layers: 1 to 3; see Figure 5). The top section of Figure 5 shows the effect of the number of layers on the decoding performance. In the center-out data set, six kinematic parameters were decoded from 134 to 223 units. For these data, we found that the performance of the two-layer LSTMs (mean R2=0.41) was significantly better (paired permutation test, p<0.05 with all kinematics combined) than the performance of the one-layer LSTMs (mean R2=0.34). However, adding the third LSTM layer did not improve the performance (mean R2=0.41; paired permutation test, p=0.36). The middle section of Figure 5 compares the decoding performance of LSTMs with the widening and narrowing architectures; the number of cells is the same in different LSTMs. In the widening LSTM, the second layer has more cells than the first layer, and in a narrowing LSTM, the second layer has fewer cells. We observed that the widening LSTMs performed better than the narrowing LSTMs (widening LSTM, mean R2=0.42; narrowing LSTM, mean R2=0.39; paired permutation test, p<0.005 with all kinematic variables combined).

Figure 5:

Decoding performance for different LSTM structures. Each row represents a decoder, where the number of cells is listed from the first layers to the last layers. For example, LSTM2 was LSTM-128-128 and LSTM1 was LSTM-128. Each column is a kinematic variable, and the decoding performance is color-coded (yellow representing the highest, navy blue the lowest). The first group of decoders (above the first white horizontal line) has the same number of cells but a different number of layers. The second group of decoders is LSTMs with narrowing or widening structure. The third group of decoders is the (unscented) Kalman filters and the Wiener filter. The vertical white line separates the results from the two monkeys. x: joystick position in x-axis; y: joystick position in y-axis; vx: joystick velocity in x-axis; vy: joystick velocity in y-axis; ax: joystick acceleration in x-axis: ay, joystick acceleration in y-axis.

Figure 5:

Decoding performance for different LSTM structures. Each row represents a decoder, where the number of cells is listed from the first layers to the last layers. For example, LSTM2 was LSTM-128-128 and LSTM1 was LSTM-128. Each column is a kinematic variable, and the decoding performance is color-coded (yellow representing the highest, navy blue the lowest). The first group of decoders (above the first white horizontal line) has the same number of cells but a different number of layers. The second group of decoders is LSTMs with narrowing or widening structure. The third group of decoders is the (unscented) Kalman filters and the Wiener filter. The vertical white line separates the results from the two monkeys. x: joystick position in x-axis; y: joystick position in y-axis; vx: joystick velocity in x-axis; vy: joystick velocity in y-axis; ax: joystick acceleration in x-axis: ay, joystick acceleration in y-axis.

Next, we examined decoding performance as a function of the training snippet length. We tested the following length values: 0.05 s, 0.25 s, 0.5 s, 0.75 s, 1 s, 1.25 s, 1.5 s, 2 s, 2.5 s, 3 s, and 10 s (see Figure 6A). To estimate the point where the decoding performance stopped improving, we pooled the data from the two monkeys together; the difference in R2 between the neighboring points (e.g., 0.05 s versus 0.25 s, but not 0.05 s versus 0.5 s) was used as a measure of improvement. For decoding of position, LSTM2 performance stopped improving when the snippet length reached 2 s, and LSTM1 stopped improving at 2.5 s (paired permutation test, FDR corrected, p>0.05). For decoding of velocity, LSTM2 stopped improving at 0.8 s for LSTM2 and LSTM1 at 2.5 s. For acceleration, LSTM2 stopped improving at 1.2 s, while LSTM1 continued to improve for a snippet length of up to 3 s (paired permutation test, FDR corrected, p<0.05 for 2.5 s versus 3 s). Thus, for all three kinematic parameters, a shorter snippet was needed for LSTM2 to reach the maximum performance as compared to LSTM1.

Figure 6:

Effect of hyperparameters on the LSTM decoding performance. (A) Decoding performance as a function of the length of past activities the LSTM trained with. (B) Performance as a function of the total number of the training data. (C) Performance as a function of the dropout ratio of the hidden cells during training. (D) Performance as a function of the dropout ratio of the input neurons during testing. Median values and its 95% confidence interval (error bar) obtained from 1000 bootstrap replicates are plotted.

Figure 6:

Effect of hyperparameters on the LSTM decoding performance. (A) Decoding performance as a function of the length of past activities the LSTM trained with. (B) Performance as a function of the total number of the training data. (C) Performance as a function of the dropout ratio of the hidden cells during training. (D) Performance as a function of the dropout ratio of the input neurons during testing. Median values and its 95% confidence interval (error bar) obtained from 1000 bootstrap replicates are plotted.

In addition to these tests, we assessed how a very long history (i.e., 10 s) affected the decoding performance. The difference between the R2 values for a snippet length of 3 s or 10 s was small but statistically significant. For LSTM1, the decoding for 10 s (position, R2=0.66; velocity, R2=0.41; acceleration, R2=0.11) was better than for 3 s (position, R2=0.65; velocity, R2=0.41; acceleration, R2=0.10) (paired permutation test, FDR corrected, p<0.021), while for LSTM2, it was the opposite (10 s versus 3 s: position, R2=0.71 versus 0.71, p=0.13; velocity, R2=0.46 versus 0.47, p<0.01; acceleration, R2=0.18 versus 0.20, p<0.01; paired permutation test, FDR corrected).

We also assessed the decoding performance as a function of the total number of training samples. In this analysis, we increased the length of the training segment from 30 s to 5 min at 30 s steps (see Figure 6B). We used the same cross-validation paradigm as in the other analyses; the training fold was randomly sampled to reduce the number training data. Because sampling different parts of the training fold created variance in the testing results, we repeated the procedure 10 times and took the average of the 10 testing results as the final result. Three observations emerged. First, for all three types of kinematic parameters, the decoding performance continued to improve as the training data set increased from 0.5 to 3.5 min (paired permutation test, FDR corrected, p<0.05). Second, LSTMs trained with the full training fold outperformed those trained on 5 minutes of data for decoding of the joystick position (LSTM2, R2=0.69 versus 0.56; LSTM1, R2=0.63 versus 0.54; paired permutation test, FDR corrected, p<0.005) and velocities (LSTM2, R2=0.47 versus 0.38; LSTM1, R2=0.39 versus 0.36; p<0.005), but not acceleration (LSTM2, R2=0.19 versus 0.12; LSTM1, R2=0.08 versus 0.09; p<0.005). Third, LSTMs trained with a small number of data outperformed UKF5, KF10, and WIENER10 trained on full data. For the decoding of position, LSTM2 needed at most 2.5 min of training data to perform comparably to its three competitors (paired permutation test, FDR corrected, p>0.069) and at most 4 min of data to outperform them (p<0.05). For the decoding of velocity, LSTM2 needed at most 3.5 min of training data to perform comparably to its competitors (p>0.069). Although LSTM2 trained on 5 min did not outperform WIENER10 (R2=0.38 versus 0.37, p=0.25), the LSTM2 trained with 2.5 min of data outperformed UKF5 (R2=0.25,p<0.005) and KF10 (R2=0.30,p<0.005). For acceleration, LSTM2 trained with 2 min of data (R2=0.06) surpassed the performance of UKF5 (R2=0.01,p<0.005) and KF10 (R2=0.04,p<0.008), but not WIENER10 (R2=0.14), even when the LSTM2 was trained on 5 min of data (R2=0.12,p=0.01). LSTM1 needed additional 30 s to 1 min of data to reach the performance level of LSTM2; otherwise, it performed about the same to LSTM2 in comparison to the other decoders.

Next, we looked at the decoding performance as a function of the percentage of hidden units randomly dropped out during training (see Figure 6C). This dropout method has been proposed as a regularization technique to reduce overfitting (Hinton, Srivastava, Krizhevsky, Sutskever, & Salakhutdinov, 2012). We varied the percentage of dropped units, from 0% to 90%, and observed that LSTM2 had the highest R2 for all three kinematic parameters when 20% of units were dropped (paired permutation test, FDR corrected, p<0.031). For LSTM1, the best decoding performance was achieved for all three kinematics when no units were dropped out (paired permutation test, FDR corrected, p<0.031).

Finally, we tested the decoding performance as a function of the percentage of neurons randomly dropped out during testing (see Figure 6D). This test assessed the decoder robustness to unstable recordings (e.g., due to a failure of some electrodes). We randomly selected the neurons to drop in 10 different tests and calculated the average R2 for these tests as a measure of decoder performance. We found that both LSTMs performed the best for all three kinematic parameters when no neurons were dropped (paired permutation test, p<0.05 after FDR correction). The deterioration of decoding was evaluated using a dropping curve. The x-axis of the curve was the percentage of neurons dropped, and the y-axis represented normalized R2 (R2/max(R2)). To quantify the rate of performance decrease with the neurons being dropped, we computed the area under the dropping curve (AUDC). AUDC ranged between 0 and 1; the higher the AUDC, the less pronounced was the decrease in performance. We found that the AUDC of LSTM1 (mean = 0.65) was slightly but significantly better than that of LSTM2 (mean = 0.64; paired permutation test, p=0.017).

3.1.2  LSTM Internal Dynamics

Figures 7A to 7H illustrate the activities of different LSTM2 components, for a short 10 s snippet, which depicts offline reconstruction of kinematics from the normalized firing rate of input units. We observed that the same type of cells had a higher intragroup correlation in the second LSTM layer than in the first layer (e.g., input gate cells in the second layer versus in the first layer; diagonal of Figure 7I). To compute the intragroup correlation, we first randomly split the cells of the group into two halves and randomly sampled 2500 data points. Then we quantified the correlation between the two halves using a bias-corrected distance correlation (Székely & Rizzo, 2013), which measured the dependency (possibly nonlinear) between two random vectors. This procedure was repeated 10 times, and the mean was taken to represent the intragroup correlation. High correlation values were observed in the second layer for the input gates (0.96±0.01 versus 0.43±0.03,p<0.01, paired permutation test), the output gate (0.99±0.00 versus 0.68±0.02,p<0.01), the forget gate (0.92±0.01 versus 0.27±0.02,p<0.01), the block input (0.78±0.01 versus 0.37±0.01,p<0.01), and the memory c state (0.84±0.01 versus 0.60±0.01,p<0.01). Intergroup correlation between the three gates was higher in the second layer than in the first layer (0.77±0.01 versus 0.49±0.01,p<0.01, permutation test).

Figure 7:

Internal dynamics of a two-layer LSTM (LSTM2) for offline reconstruction during a 10 s recording snippet. (A) The z-scored neuronal firing rate of 91 M1 and 100 SMA neurons from monkey C and (B) the corresponding true (true x, true y) and LSTM reconstructed cursor positions (pred x, pred y). (C) The values of input gates of 128 cells of each LSTM layer during the same period, where the upper subplot is the first LSTM layer and the lower subplot is the second LSTM layer. Other internal dynamics are plotted in the same layer for (D) block input, (E) forget gate, (F) memory cell state, (G) output gate, and (H) output values. These LSTM cells are sorted by their correlation in memory cells for each layer. (I) Correlations between different populations of cells during offline reconstruction for an entire recording session. White lines separate the input neurons and the two LSTM layers. (J) The proportion of time during a session that gates were fully opened (x-axis) and closed (y-axis). (K) Preferred direction and modulation depth of the input neurons and the memory cells of the two LSTM layers.

Figure 7:

Internal dynamics of a two-layer LSTM (LSTM2) for offline reconstruction during a 10 s recording snippet. (A) The z-scored neuronal firing rate of 91 M1 and 100 SMA neurons from monkey C and (B) the corresponding true (true x, true y) and LSTM reconstructed cursor positions (pred x, pred y). (C) The values of input gates of 128 cells of each LSTM layer during the same period, where the upper subplot is the first LSTM layer and the lower subplot is the second LSTM layer. Other internal dynamics are plotted in the same layer for (D) block input, (E) forget gate, (F) memory cell state, (G) output gate, and (H) output values. These LSTM cells are sorted by their correlation in memory cells for each layer. (I) Correlations between different populations of cells during offline reconstruction for an entire recording session. White lines separate the input neurons and the two LSTM layers. (J) The proportion of time during a session that gates were fully opened (x-axis) and closed (y-axis). (K) Preferred direction and modulation depth of the input neurons and the memory cells of the two LSTM layers.

We also observed that the filtering performed by the gates in the second layer had a higher contrast compared to the gates in the first layer. Namely, the second-layer gates alternated between being fully open and fully closed, whereas the first-layer gates exhibited such operations less frequently. To describe this effect quantitatively, we defined gate open state as the one with gate value greater than 0.95 (while 1.0 was the maximum) and a closed state as the one with gate value below 0.05 (0.0 was the minimum). An analysis of these states showed that the gates in the second layer were more likely to be closed than the gates in the first layer (see Figure 7J; paired permutation test, FDR corrected, p<0.01). This effect was especially clear for the input gates (29.4%±0.7% versus 3.8%±0.7%) and the output gates (35.8%±8.0% versus 7.7%±1.0%). For the open state, there was a slight but significant trend for the gates in the first layer to enter this state more often compared to the second layer (paired permutation test, FDR corrected, p<0.01).

Another interesting finding was the observation that the input units and the memory cells from both layers exhibited directional tuning; their activity was different depending on the decoded movement direction (see Figure 7K). The tuning properties were evaluated by regressing the joystick position to the unit/cell firing rates for different lags: firingrate(t-lag)=b+bxx(t)+byy(t). The lag that resulted in the best fit (according to the correlation coefficient value) was taken to represent the delay between unit/cell activity and x,y coordinates. Modulation depth and preferred direction were defined as the correlation coefficient and arctan(b2,b1), respectively, for the same lag. We found that memory cells in the second LSTM layer exhibited higher modulation depths than those in the first layer (0.28±0.01 versus 0.18±0.01, paired permutation test, p<0.01); memory cells in the first layer were more strongly modulated than the input units (0.18±0.01 versus 0.05±0.00,p<0.01).

3.2  Bimanual Target Acquisition Task

Like the analysis of the center-out data, the performance in the bimanual task was first assessed using the correlation coefficient r (see Figure 8A). The ANOVA showed main effects for both the decoder (F(4,132)=115.3,p<0.005) and kinematics (F(2,33)=47.2,p<0.005). Post hoc multiple comparison showed that LSTM2 outperformed the other four decoders (mean r=0.58; paired permutation test, FDR corrected, p<0.005), while LSTM1 (r=0.42) did not outperform WIENER10 (r=0.42), KF10 (r=0.38) and UKF5 (r=0.36) (paired permutation test, FDR corrected, p>0.16) in terms of r. The position was decoded more accurately (mean r=0.64) compared the other kinematic parameters (paired permutation test, FDR corrected, p<0.005), followed by the velocities (r=0.44) and accelerations (r=0.21).

Figure 8:

Quality of offline reconstruction of cursor position in bimanual target acquisition task, evaluated as the correlation coefficient (first row) and coefficient of determination (second row). The results for the x-axis and y-axis were pooled together to compute the median and the 95% confidence interval (error bar), obtained from 1000 bootstrap replicates.

Figure 8:

Quality of offline reconstruction of cursor position in bimanual target acquisition task, evaluated as the correlation coefficient (first row) and coefficient of determination (second row). The results for the x-axis and y-axis were pooled together to compute the median and the 95% confidence interval (error bar), obtained from 1000 bootstrap replicates.

For the assessment of performance with the coefficient of determination, R2 (see Figure 8B), the ANOVA showed main effects for both decoders (F(4,132)=59.4,p<0.005) and kinematics (F(2,33)=9.6,p<0.005). Post hoc multiple comparisons showed that LSTM2 (mean R2=0.34) surpassed the performance of LSTM1 (R2=0.19), UKF5 (R2=0.10), KF10 (R2=0.03), and WIENER10 (R2=0.20) (paired permutation test, FDR corrected, p<0.005). LSTM1 outperformed UKF5 and KF10 (p<0.05) but not WIENER10 (p=0.40). Multiple comparisons showed that decoders most accurately predicted joystick positions (R2=0.30), followed by velocity (R2=0.18) and acceleration (R2=0.04) (paired permutation test, FDR corrected, p<0.05 for each pair of kinematics). The data for SNRdb metric are presented in the supplementary materials (supplementary Figure S3).

3.3  Bipedal Walking Task

To accommodate the larger number of kinematics variables (N=36), we increased the width of the second layer of LSTM2 from 128 to 1024 cells and named it LSTM2x. This was done because our analysis of the center-out data showed that the widening LSTM performed better than the narrowing LSTM.

We first assessed the decoding performance of the six decoders using the correlation coefficient as the metric (see Figure 9A). An ANOVA showed main effects for both decoders (F(5,4665)=466.5,p<0.005) and kinematics (F(2,933)=445.7,p<0.005). Post hoc multiple comparisons revealed that LSTM2x (mean r=0.70) outperformed the other decoders (LSTM2, r=0.65; LSTM1, r=0.60; UKF5, r=0.62; KF10, r=0.64; WIENER10, r=0.65), and the performance differed significantly between every pair of the decoders (paired permutation test, FDR corrected, p<0.005). For kinematics, the ankle, knee, and hip positions were the best decoded parameters (mean r=0.84), followed by the corresponding velocities (r=0.66) and accelerations (r=0.42) (paired permutation test, FDR corrected, p<0.005).

Figure 9:

Quality of offline reconstruction of hip, knee, and ankle positions in the bipedal walking task, evaluated as the correlation coefficient (first row) and coefficient of determination (second row). The results for the x-axis and y-axis of the hips, knees, and ankles were pooled together to compute the median, and the 95% confidence intervals (error bar), obtained from 1000 bootstrap replicates, are plotted.

Figure 9:

Quality of offline reconstruction of hip, knee, and ankle positions in the bipedal walking task, evaluated as the correlation coefficient (first row) and coefficient of determination (second row). The results for the x-axis and y-axis of the hips, knees, and ankles were pooled together to compute the median, and the 95% confidence intervals (error bar), obtained from 1000 bootstrap replicates, are plotted.

We assessed the performance of different decoders using the coefficient of determination, R2 (see Figure 9B). The ANOVA showed main effects for decoders (F(5,4665)=18.5,p<0.005) and kinematics (F(2,933)=28.8,p<0.005). Post hoc multiple comparisons showed that LSTM2x was the best-performing decoder (mean R2=0.51), followed by LSTM2 (R2=0.39), WIENER10 (R2=0.35), LSTM1 (R2=0.32), UKF5 (R2=0.14), and KF10 (R2=-0.21) (paired permutation test, FDR corrected, p<0.005). Pairwise comparison of different decoders revealed significant differences in all cases (p<0.05), except for the comparison of LSTM1 with WIENER10 (p=0.23). For decoding of kinematics, the position was decoded the best (R2=0.61), followed by velocity (R2=0.25) and acceleration (R2=-0.11) (paired permutation test, FDR corrected, p<0.005). (See supplementary Figures S4– S6 for the decoding performance of each kinematic variable.)

To assess how the decoding performance was affected by leaving out units from either M1 or S1, we analyzed data from the bipedal walking task, where substantial numbers of units were recorded from both areas for all 26 sessions (on average 388.3 units; M1, 60.5%, S1, 39.5%). For a fair comparison between conditions, we ensured that the number of units remained the same after leaving out data from different cortical areas. When more M1 units were recorded than S1 units in a session, we randomly selected and discarded M1 units so that the number of remaining units was the same for both, leaving out M1 and S1 data. This analysis showed that the decoding performance of LSTM2, evaluated as R2, decreased slightly (but significantly) when leaving out both M1 and S1 data for all the decoding kinematics (paired permutation test, p<0.01 after FDR correction; see Figure 10). Additionally, R2 was slightly lower when leaving out M1 compared to leaving out S1 (paired permutation test, p<0.01 after FDR correction).

Figure 10:

Decoding performance of LSTM2 after leaving either M1 or S1 out. All, with all areas; -M1, without M1; -S1, without S1.

Figure 10:

Decoding performance of LSTM2 after leaving either M1 or S1 out. All, with all areas; -M1, without M1; -S1, without S1.

4  Discussion

We developed an LSTM-based decoder for translating extracellular discharges of an ensemble of cortical neurons into predictions of up to 36 kinematic parameters. The decoder was applied to two arm movement tasks (unimanual and bimanual reaching) and the bipedal locomotion task. LSTM proved to be suitable for decoding motor parameters because it captured well several features of the behavioral tasks: the time course with a characteristic interval of 1 to 5 s (trial duration in the arm movement tasks or step cycle duration in the bipedal walking task) and the state transitions (from resting to moving and from leg stance to leg swing). These features were captured by gating the neuronal activity in and out of the LSTM memory cells. Motor information could be kept in the memory cells for long periods of time, which allowed the neuronal decoder to capture the temporal characteristics of both upper and lower limb movements.

We demonstrated that given the same number of training data, LSTM outperformed the conventional neural decoders: the Wiener filter, and Kalman filters, including the unscented Kalman filter. Moreover, with many fewer training samples, LSTM could reach the same level of decoding performance as the other decoders. We suggest that LSTM outperformed the Wiener and Kalman filters because the latter two decoders did not capture the nonlinearities between the spiking activities and the limb kinematics. By contrast, LSTM captured nonlinearities by its nonlinear gates and activation functions. Because of such representation of nonlinearities, LSTM outperformed even the unscented Kalman filter that was also designed to incorporate nonlinearities. We suggest that LSTM had higher accuracy than the Kalman filter because its memory performed better compared to the unscented Kalman filter tap structure.

While the formation of the weight structure in LSTM networks is not well understood, some believe that it may mimic physiological properties of biological neuronal circuits (Banino et al., 2018). We conducted a systematic analysis of the effect of hyperparameters on LSTM performance. As a result, we found that when LSTM2 did not perform well, the decoder's performance could be improved by using a widening LSTM (i.e., LSTM2x). We also found that LSTM was robust to various perturbations in hyperparameter settings. For example, the decoding performance was comparable when using training snippets longer than 1 s, dropout ratio in training from 0 to 0.3, and even with a different number of memory cells and LSTM layer structures.

While LSTM was robust to our hyperparameter perturbations, some LSTM structures appeared to fit certain tasks particularly well. For example, the two-layer LSTM outperformed the single-layer LSTM, but the three-layer LSTM did not improve the decoding performance for the center-out task. Also, the widening LSTM outperformed the narrowing LSTM for the same number of neurons. The superior performance of the widening LSTM should be examined in future studies. At present, we suggest that the first LSTM layer could be lower in size than the neuronal sample because kinematic information is represented redundantly in the activity of different neurons. Additionally, the second LSTM layer should be larger than the number of variables being decoded because multiple memory cells are needed to fully represent the properties of each variable.

The most striking advantage of our LSTM-based decoder was its ability to stabilize the cursor during the delay periods of the arm-reaching tasks, that is, the periods corresponding to monkeys holding the cursor in the center of the screen, a task none of the other algorithms was able to adequately match. For example, the unscented Kalman filter generated noisy kinematic patterns during the delay intervals even though it incorporated nonlinear neuronal tuning to behavioral variables. Several studies previously attempted to solve this problem with decoder add-ons. Homer and his colleagues (2013) proposed that the cursor position should not be updated if the decoded velocity was close to zero. Golub, Yu, Schwartz, and Chase (2014) developed a speed-dampening Kalman filter that assumed that the cursor speed was inversely correlated with its angular velocity, that is, the cursor moved slower when movement direction changed. Gilja and his colleagues (2012) used the cursor position as feedback in their ReFIT Kalman filter to reduce the jitter during online control. Here, we showed that LSTM does not require such add-ons to stabilize the cursor. Instead, the hold state was detected by the gating mechanism that regulated the transformation of neuronal spikes to limb kinematics. According to this mechanism, the decoded position, velocity, and acceleration from the previous time point provided recurrent input to the LSTM for the next time point.

In the case of previously proposed BMI decoders, the length of spiking history was usually set arbitrarily or adjusted empirically. In nonrecurrent decoders such as the Wiener filter and Kalman filters, the trial history is defined as the number of discrete steps, called taps. Although many taps could reproduce the neuronal dynamics better, excessive taps cause an exponential increase in computational cost, overfitting, and numerical instability. In previous studies, the history was typically set to 0.5 to 1.0 s and the number of taps to 5 to 10. Recurrent LSTM is immune to these problems because no taps are utilized. Instead, LSTM keeps the entire trial history rather than limiting its memory to a short time span. In our case, we obtained very good results with a spiking history as long as 4 s, which corresponded to 80 taps of nonrecurrent decoders. We found that when the LSTMs were trained with a longer history than the typical trial duration, these decoders' performance neither deteriorated nor improved, probably because the memory span was exceeded by the timing required for the neurophysiological mechanism involved in executing the motor task to unfold properly.

We also found that LSTM2 (i.e., two-layer LSTM) reached a performance plateau with a shorter spiking history compared to LSTM1 (i.e., single-layer LSTM). One possible explanation of this observation is that each of these LSTMs captured a different input-output relationship in different ways. LSTM2 has twice as many free parameters to fit as LSTM1, which probably explains why a short temporal lag was sufficient for training.

Previously, an RNN-based neural decoder has been used by for decoding arm-reaching kinematics from cortical activities for the center-out task (Sussillo et al., 2012; Sussillo, Stavisky, Kao, Ryu, & Shenoy, 2016). The decoder was based on echo state networks (ESN; Jaeger & Haas, 2004) that consisted of two layers: a recurrent layer and an output layer. The recurrent layer had a large number of nonlinear units that were sparsely and randomly connected with random initial weights. The output layer was a linear readout of the recurrent layer. Because the learning took place only in the output layer, not the recurrent layer, the ESN did not have to propagate error signals back through time, which made training more efficient. However, since the recurrent layer was not trained, the decoder performance was highly sensitive to several hyperparameters of the recurrent layer (Bianchi, Maiorino, Kampffmeyer, Rizzi, & Jenssen, 2017), such as the number of neurons, connectivity between neurons, reservoir spectral radius, and noise terms. While good results have been reported for ESN in the literature (see Bianchi et al., 2017, for examples), this algorithm was compared with only a few other methods. For instance, Sussillo et al. (2012) compared ESN only to a velocity-KF instead of ReFIT-KF, presumably a better algorithm developed by the same group (Gilja et al., 2012).

As is well known, neural recording conditions can change in a matter of hours or days. When they change, a decoder that has established a fixed mapping between neural activities and limb movement would perform poorly. To this end, Sussillo et al. (2016) applied multiplicative recurrent neural networks (MRNNs; Martens & Sutskever, 2011) to improve the robustness of neural decoders across days. MRNN addresses this issue by learning a library of the mapping and selecting the right mapping based on the input. The authors showed that MRNN not only outperformed FIT-KF (similar to ReFIT-KF; Fan et al., 2014) when tested on many daily recording sessions, but it also outperformed ReFIT-KF. While MRNN is a powerful framework and can be integrated with LSTM (Krause, Lu, Murray, & Renals, 2016), it is data thirsty and time-consuming to train (Sussillo et al., 2016).

Despite outperforming other classic decoders, some limitations and future directions should be noted for our LSTM approach. While LSTM accurately predicted behaviors from neural data, unlike Kalman filters, it lacked explicit neural tuning models. Nevertheless, because the LSTMs we used in this study had a relatively simple architecture, many limitations could be addressed in the future. For example, LSTM could learn the motion and measurement models of the Kalman filter (Coskun, Achilles, Dipietro, Navab, & Tombari, 2017). It could use target position as a training signal to improve decoding performance (Li et al., 2016). For the sake of a fair comparison of different decoders, the use of target position was not assessed in this study because the Wiener filter, unlike the other decoders, could not use target information. In another direction for future research, LSTM could incorporate adaptive features based on reinforcement learning (Bakker, 2002). This capacity could be used to improve decoding from nonstationary neural signals. For practical BMI applications, it is essential that decoders detect the instances when a user wants to halt BMI control and switch attention to a different task. LSTM can incorporate such a detection (Bahdanau, Cho, & Bengio, 2014; Chorowski, Bahdanau, Serdyuk, Cho, & Bengio, 2015).

Overall, our study showed that LSTM architectures provide a robust neural decoding framework with many features that can be adjusted to improve BMI performance. Since LSTM is a neural network, it can be viewed as an artificial circuit attached to the brain in order to recover and augment physiological functions. In this respect, it is interesting that we observed the emergence of directional tuning and neuronal dynamics in our LSTM units—the patterns resembling activity of real neurons. Given the accurate performance of LSTM as a neural decoder, we propose that this method should be considered for the design of future neuroprosthetic devices for restoring mobility in severely disabled patients.

References

References
Aggarwal
,
V.
,
Mollazadeh
,
M.
,
Davidson
,
A. G.
,
Schieber
,
M. H.
, &
Thakor
,
N. V.
(
2013
).
State-based decoding of hand and finger kinematics using neuronal ensemble and LFP activity during dexterous reach-to-grasp movements
.
Journal of Neurophysiology
,
109
(
12
),
3067
3081
. https://doi.org/10.1152/jn.01038.2011
Bahdanau
,
D.
,
Cho
,
K.
, &
Bengio
,
Y.
(
2014
).
Neural machine translation by jointly learning to align and translate
. https://arXiv.org/abs/1409.0473
Bakker
,
B.
(
2002
). Reinforcement learning with long short-term Memory. In
T. G.
Dietterich
,
S.
Becker
, &
Z.
Ghahramani
(Eds.),
Advances in neural information processing systems
,
14
(pp.
1475
1482
).
Cambridge, MA
:
MIT Press
.
Banino
,
A.
,
Barry
,
C.
,
Uria
,
B.
,
Blundell
,
C.
,
Lillicrap
,
T.
,
Mirowski
,
P.
, …
Kumaran
,
D.
(
2018
).
Vector-based navigation using grid-like representations in artificial agents
.
Nature
,
557
(
7705
),
429
433
. https://doi.org/10.1038/s41586-018-0102-6
Bianchi
,
F. M.
,
Maiorino
,
E.
,
Kampffmeyer
,
M. C.
,
Rizzi
,
A.
, &
Jenssen
,
R.
(
2017
).
An overview and comparative analysis of recurrent neural networks for short term load forecasting
,
1
41
.
Carmena
,
J. M.
,
Lebedev
,
M. a
,
Crist
,
R. E.
,
O'Doherty
,
J. E.
,
Santucci
,
D. M.
,
Dimitrov
,
D. F.
, …
Nicolelis
,
M. a L.
(
2003
).
Learning to control a brain-machine interface for reaching and grasping by primates
.
PLoS Biology
,
1
(
2
),
E42
. https://doi.org/10.1371/journal.pbio.0000042
Chen
,
Z.
, &
Takahashi
,
K.
(
2013
).
Sparse Bayesian inference methods for decoding 3D reach and grasp kinematics and joint angles with primary motor cortical ensembles
. In
Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society
(pp.
5930
5933
).
Piscataway, NJ
:
IEEE
. https://doi.org/10.1109/EMBC.2013.6610902
Chestek
,
C. A.
,
Gilja
,
V.
,
Nuyujukian
,
P.
,
Foster
,
J. D.
,
Fan
,
J. M.
,
Kaufman
,
M. T.
, …
Shenoy
,
K. V.
(
2011
).
Long-term stability of neural prosthetic control signals from silicon cortical arrays in rhesus macaque motor cortex
.
Journal of Neural Engineering
,
8
(
4
),
045005
. https://doi.org/10.1088/1741-2560/8/4/045005
Chorowski
,
J. K.
,
Bahdanau
,
D.
,
Serdyuk
,
D.
,
Cho
,
K.
, &
Bengio
,
Y.
(
2015
).
Attention-based models for speech recognition
. In
C.
Cortes
,
N. D.
Lawrence
,
D. D.
Lee
,
M.
Sugiyama
, &
R.
Garnett
(Eds.),
Advances in neural information processing systems
,
28
(pp.
577
585
).
Red Hook, NY
:
Curran
. https://arXiv.org/abs/1506.07503
Coskun
,
H.
,
Achilles
,
F.
,
Dipietro
,
R.
,
Navab
,
N.
, &
Tombari
,
F.
(
2017
).
Long short-term memory Kalman filters: Recurrent neural estimators for pose regularization
. In
Proceedings of the IEEE International Conference on Computer Vision
(pp.
5525
5533
).
Piscataway, NJ
:
IEEE
. https://doi.org/10.1109/ICCV.2017.589
Dangi
,
S.
,
Gowda
,
S.
, &
Carmena
,
J. M.
(
2013
).
Likelihood gradient ascent (LGA): A closed-loop decoder adaptation algorithm for brain-machine interfaces
. In
Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society
(pp.
2768
2771
).
Piscataway, NJ
:
IEEE
. https://doi.org/10.1109/EMBC.2013.6610114
Fan
,
J. M.
,
Nuyujukian
,
P.
,
Kao
,
J. C.
,
Chestek
,
C. A.
,
Ryu
,
S. I.
, &
Shenoy
,
K. V.
(
2014
).
Intention estimation in brain-machine interfaces
.
Journal of Neural Engineering
,
11
(
1
),
016004
. https://doi.org/10.1088/1741-2560/11/1/016004
Fitzsimmons
,
N. A.
,
Lebedev
,
M. A.
,
Peikon
,
I. D.
, &
Nicolelis
,
M. A. L.
(
2009
).
Extracting kinematic parameters for monkey bipedal walking from cortical neuronal ensemble activity
.
Frontiers in Integrative Neuroscience
,
3
(
March
),
3
. https://doi.org/10.3389/neuro.07.003.2009
Gers
,
F. A.
,
Schmidhuber
,
J.
, &
Cummins
,
F.
(
2000
).
Learning to forget: Continual prediction with LSTM
.
Neural Computation
,
12
(
10
),
2451
2471
. https://doi.org/10.1162/089976600300015015
Gilja
,
V.
,
Nuyujukian
,
P.
,
Chestek
,
C. a
,
Cunningham
,
J. P.
,
Yu
,
B. M.
,
Fan
,
J. M.
, …
Shenoy
,
K. V.
(
2012
).
A high-performance neural prosthesis enabled by control algorithm design
.
Nature Neuroscience
,
15
(
12
),
1752
1757
. https://doi.org/10.1038/nn.3265
Golub
,
M. D.
,
Yu
,
B. M.
,
Schwartz
,
A. B.
, &
Chase
,
S. M.
(
2014
).
Motor cortical control of movement speed with implications for brain-machine interface control
.
Journal of Neurophysiology
,
112
(
2
),
411
429
. https://doi.org/10.1152/jn.00391.2013
Hinton
,
G. E.
,
Srivastava
,
N.
,
Krizhevsky
,
A.
,
Sutskever
,
I.
, &
Salakhutdinov
,
R. R.
(
2012
).
Improving neural networks by preventing co-adaptation of feature detectors
,
1
18
. https://arXiv.org/abs/1207.0580
Hochberg
,
L. R.
,
Bacher
,
D.
,
Jarosiewicz
,
B.
,
Masse
,
N. Y.
,
Simeral
,
J. D.
,
Vogel
,
J.
, …
Donoghue
,
J. P.
(
2012
).
Reach and grasp by people with tetraplegia using a neurally controlled robotic arm
.
Nature
,
485
(
7398
),
372
375
. https://doi.org/10.1038/nature11076
Hochreiter
,
S.
, &
Schmidhuber
,
J.
(
1997
).
Long short-term memory
.
Neural Computation
,
9
(
8
),
1735
1780
. https://doi.org/10.1162/neco.1997.9.8.1735
Homer
,
M. L.
,
Harrison
,
M. T.
,
Black
,
M. J.
,
Perge
,
J. A.
,
Cash
,
S. S.
,
Friehs
,
G.
, &
Hochberg
,
L. R.
(
2013
).
Mixing decoded cursor velocity and position from an offline Kalman filter improves cursor control in people with tetraplegia
. In
Proceedings of the International IEEE/EMBS Conference on Neural Engineering
(pp.
715
718
).
Piscataway, NJ
:
IEEE
. https://doi.org/10.1109/NER.2013.6696034
Ifft
,
P. J.
,
Shokur
,
S.
,
Li
,
Z.
,
Lebedev
,
M. A.
, &
Nicolelis
,
M. A. L.
(
2013
).
A brain-machine interface enables bimanual arm movements in monkeys
.
Science Translational Medicine
,
5
(
210
),
210ra154
. https://doi.org/10.1126/scitranslmed.3006159
Jaeger
,
H.
, &
Haas
,
H.
(
2004
).
Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication
.
Science
,
304
(
5667
),
78
80
. https://doi.org/10.1126/science.1091277
Jarosiewicz
,
B.
,
Masse
,
N. Y.
,
Bacher
,
D.
,
Cash
,
S. S.
,
Eskandar
,
E.
,
Friehs
,
G.
, …
Hochberg
,
L. R.
(
2013
).
Advantages of closed-loop calibration in intracortical brain-computer interfaces for people with tetraplegia
.
Journal of Neural Engineering
,
10
(
4
),
046012
. https://doi.org/10.1088/1741-2560/10/4/046012
Kalman
,
R. E.
(
1960
).
A new approach to linear filtering and prediction problems
.
Journal of Basic Engineering
,
82
(
1
),
35
. https://doi.org/10.1115/1.3662552
Kalman
,
R. E.
, &
Bucy
,
R. S.
(
1961
).
New results in linear filtering and prediction theory
.
Journal of Basic Engineering
,
83
(
1
),
95
. https://doi.org/10.1115/1.3658902
Kingma
,
D. P.
, &
Ba
,
J.
(
2014
).
Adam: A method for stochastic optimization
. In
Proceedings of the International Conference on Learning Representations 2015
(pp.
1
15
). http://arxiv.org/abs/1412.6980
Krause
,
B.
,
Lu
,
L.
,
Murray
,
I.
, &
Renals
,
S.
(
2016
).
Multiplicative LSTM for sequence modelling
(pp.
1
11
). http://arxiv.org/abs/1609.07959
Li
,
S.
,
Li
,
J.
, &
Li
,
Z.
(
2016
).
An improved unscented Kalman filter based decoder for cortical brain-machine interfaces
.
Frontiers in Neuroscience
,
10
. https://doi.org/10.3389/fnins.2016.00587
Li
,
Z.
,
O'Doherty
,
J. E.
,
Hanson
,
T. L.
,
Lebedev
,
M. A.
,
Henriquez
,
C. S.
, &
Nicolelis
,
M. A. L.
(
2009
).
Unscented Kalman filter for brain-machine interfaces
.
PloS One
,
4
(
7
),
e6243
. https://doi.org/10.1371/journal.pone.0006243
Makin
,
J. G.
,
O'Doherty
,
J.
,
Cardoso
,
M. M. B.
, &
Sabes
,
P. N.
(
2017
).
Superior arm-movement decoding from cortex with a new, unsupervised-learning algorithm
.
Journal of Neural Engineering
,
15
,
026010
. https://doi.org/10.1088/1741-2552/aa9e95
Martens
,
J.
, &
Sutskever
,
I.
(
2011
).
Learning recurrent neural networks with Hessian-free optimization
. In
Proceedings of the 28th International Conference on Machine
(pp.
1033
1040
).
Madison, WI
:
Omnipress
.
Peikon
,
I. D.
,
Fitzsimmons
,
N. A.
,
Lebedev
,
M. A.
, &
Nicolelis
,
M. A. L.
(
2009
).
Three-dimensional, automated, real-time video system for tracking limb motion in brain-machine interface studies
.
Journal of Neuroscience Methods
,
180
(
2
),
224
233
. https://doi.org/10.1016/j.jneumeth.2009.03.010
Rajangam
,
S.
,
Tseng
,
P.-H.
,
Yin
,
A.
,
Lehew
,
G.
,
Schwarz
,
D.
,
Lebedev
,
M. A.
, &
Nicolelis
,
M. A. L.
(
2016
).
Wireless cortical brain-machine interface for whole-body navigation in primates
.
Scientific Reports
,
6
,
22170
. https://doi.org/10.1038/srep22170
Rohrer
,
B.
,
Fasoli
,
S.
,
Krebs
,
H. I.
,
Hughes
,
R.
,
Volpe
,
B.
,
Frontera
,
W. R.
, …
Hogan
,
N.
(
2002
).
Movement smoothness changes during stroke recovery
.
Journal of Neuroscience
,
22
(
18
),
8297
8304
. https://doi.org/10.1523/JNEUROSCI.22-18-08297.2002
Schwarz
,
D. a
,
Lebedev
,
M. a
,
Hanson
,
T. L.
,
Dimitrov
,
D. F.
,
Lehew
,
G.
,
Meloy
,
J.
, …
Nicolelis
,
M. a L.
(
2014
).
Chronic, wireless recordings of large-scale brain activity in freely moving rhesus monkeys
.
Nature Methods
,
11
(
6
),
670
676
. https://doi.org/10.1038/nmeth.2936
Stark
,
E.
, &
Abeles
,
M.
(
2007
).
Predicting movement from multiunit activity
.
Journal of Neuroscience
,
27
(
31
),
8387
8394
. https://doi.org/10.1523/JNEUROSCI.1321-07.2007
Sussillo
,
D.
,
Nuyujukian
,
P.
,
Fan
,
J. M.
,
Kao
,
J. C.
,
Stavisky
,
S. D.
,
Ryu
,
S.
, &
Shenoy
,
K.
(
2012
).
A recurrent neural network for closed-loop intracortical brain-machine interface decoders
.
Journal of Neural Engineering
,
9
(
2
),
026027
. https://doi.org/10.1088/1741-2560/9/2/026027
Sussillo
,
D.
,
Stavisky
,
S. D.
,
Kao
,
J. C.
,
Ryu
,
S. I.
, &
Shenoy
,
K. V.
(
2016
).
Making brain-machine interfaces robust to future neural variability
.
Nature Communications
,
7
,
1
12
. https://doi.org/10.1038/ncomms13749
Székely
,
G. J.
, &
Rizzo
,
M. L.
(
2013
).
The distance correlation t-test of independence in high dimension
.
Journal of Multivariate Analysis
,
117
,
193
213
. https://doi.org/10.1016/j.jmva.2013.02.012
Taylor
,
D. M.
,
Tillery
,
S. I. H.
, &
Schwartz
,
A. B.
(
2002
).
Direct cortical control of 3D neuroprosthetic devices
.
Science
,
296
(
5574
),
1829
1832
. https://doi.org/10.1126/science.1070291
Wu
,
W.
,
Black
,
M. J.
,
Gao
,
Y.
,
Bienenstock
,
E.
,
Serruya
,
M.
,
Shaikhouni
,
A.
, &
Donoghue
,
J. P.
(
2003
). Neural decoding of cursor motion using a Kalman filter. In
S.
Becker
,
S.
Thrun
, &
K.
Obermayer
(Eds.),
Advances in neural information processing systems
,
15
(pp.
133
140
).
Cambridge, MA
:
MIT Press
.